DistOS 2021F 2021-12-09

From Soma-notes
Revision as of 00:15, 10 December 2021 by Soma

Notes

Lecture 22
----------

Final exam review

 - themes of the class
 - key papers & concepts
 - potential questions

Format of final exam
 - 3 hours
 - essay questions, answer 3 or 4 chosen from a longer list
   (haven't created final yet)
 - open book, open note, open internet
 - just, NO COLLABORATION
 - and please, no plagiarism
   - all words should be your own
   - wasn't an issue on the midterm



Themes of the class
 - distributed OSs can't just copy single-system OSs
   because the abstractions don't scale
   - UNIX-like systems work well for individual systems
   - people know how to program them, so we want
     to keep them as much as possible
 - but to scale, we need to approach problems from
   different angles, make use of different abstractions

Why don't single system OSs scale?
Key assumptions are violated
 - process memory is strongly consistent
   and access to it is roughly uniform
    - low latency
 - files are consistent
 - the whole system either works or is completely broken
    - partial failures not a concern
 - so, communication is consistent and reliable

This all changes when systems are connected by networks
 - network is *unreliable* and insecure
   - lose packets or entire connections
 - individual hosts fail independently yet
   distributed application should keep running
     - has to, because otherwise nothing would work
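
A minimal sketch of the point above: hosts fail independently, so the
client has to treat "no answer" as normal and move on to another host.
Everything here is hypothetical and simulated in-process (the Replica
class, ReplicaDown, and read_with_failover are names made up for this
illustration, not from any paper we covered).

```python
class ReplicaDown(Exception):
    """Raised when a (simulated) host does not answer."""


class Replica:
    """Hypothetical stand-in for one host holding a copy of the data."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up  # hosts fail independently of each other

    def get(self, key):
        if not self.up:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in turn; succeed if any one of them answers."""
    failed = []
    for replica in replicas:
        try:
            return replica.get(key)
        except ReplicaDown as exc:
            failed.append(str(exc))  # partial failure: note it, keep going
    # Only when every replica is down does the application see an error.
    raise RuntimeError(f"all replicas failed: {failed}")


if __name__ == "__main__":
    hosts = [
        Replica("a", {"x": 1}, up=False),
        Replica("b", {"x": 1}),
        Replica("c", {"x": 1}),
    ]
    print(read_with_failover(hosts, "x"))
```

A real system would also need timeouts, since over a network a dead host
and a slow host look the same to the client.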

(You know videos for all classes are available through
Brightspace, right?)

Note that older attempts at distributed OSs dealt with
distribution but not with reliability
 - the whole system could go down when individual hosts did

Later there was replication to enable reliability

But only when we had jobs that needed to be bigger than
one host did we really scale things
  - the web

(Scientific computing has always needed lots of computing
power, but it was a specialized discipline with very
different kinds of production requirements.)

So, what are the strategies to make a platform for distributed applications?
 - keep UNIX, but in the form of containers
   - ephemeral, often immutable hosts
 - orchestration to manage containers
 - distributed data stores (filesystems, key-value,
   relational databases) built on immutability
   as much as possible
     - record append files
     - immutable tablets
     - immutable objects/chunks
 - relax consistency as much as possible
 - where we need consistency, need special mechanisms
   - use consensus algorithms (Paxos, etc)
     as little as possible to avoid bottlenecks
   - use synchronized time/order of events
   - make use of domain-specific regularities
     to optimize
 - replicate data and metadata for reliability &
   performance
   - replication leads to consistency issues,
     which are solved through a combination of
     immutability and consensus
 - if you want performance, assume system is trusted
   - nodes can fail, but any bad behavior is strictly
     bounded
 - if you can't trust systems centrally because
   they are controlled by mutually non-trusting parties,
   you have to address the Byzantine generals problem
     - and solutions tend to be expensive!
     - this is where you get cryptocurrencies and
       proof of work
 - these systems are best suited for embarrassingly
   parallel tasks, e.g., ones that fit the MapReduce
   model or are serving independent customers concurrently
     - great for major online app/service providers,
       Facebook, Google, Amazon, etc
 - however, we can do computations that require more
   coordination, but we have to optimize each specific
   type separately (e.g., we can optimize for ML, but
   that won't help with large-scale simulations)
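
To make the "replication + immutability" strategy concrete, here is a
minimal sketch of immutable, content-addressed chunks copied to several
nodes. It is an illustration under assumptions, not how any particular
system from the class does it: ChunkStore, replicated_put, and
replicated_get are hypothetical names, and the "nodes" are just
in-memory dictionaries. Because a chunk's ID is the hash of its bytes
and chunks are never overwritten, replicas cannot disagree about what
an ID means, so no consensus is needed for reads.

```python
import hashlib


class ChunkStore:
    """One node's store; chunks are keyed by the SHA-256 of their content."""

    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> str:
        chunk_id = hashlib.sha256(data).hexdigest()
        # Immutable: an existing chunk is never replaced.
        self._chunks.setdefault(chunk_id, data)
        return chunk_id

    def get(self, chunk_id: str):
        return self._chunks.get(chunk_id)

    def lose_everything(self):
        """Simulate this node failing and losing its data."""
        self._chunks.clear()


def replicated_put(stores, data: bytes) -> str:
    """Write the same chunk to every store; all must agree on the ID."""
    ids = {store.put(data) for store in stores}
    assert len(ids) == 1  # same bytes always hash to the same ID
    return ids.pop()


def replicated_get(stores, chunk_id: str) -> bytes:
    """Read from any store that has the chunk, verifying it by its hash."""
    for store in stores:
        data = store.get(chunk_id)
        if data is not None and hashlib.sha256(data).hexdigest() == chunk_id:
            return data
    raise KeyError(chunk_id)


if __name__ == "__main__":
    nodes = [ChunkStore(), ChunkStore(), ChunkStore()]
    cid = replicated_put(nodes, b"record 1\n")
    nodes[0].lose_everything()  # one replica fails
    print(replicated_get(nodes, cid))  # still readable from the others
```

Mutable metadata (e.g., which chunk IDs make up a file) is where the
consistency problem remains, which is where consensus would come in.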

Single system OSs got us used to having a single infrastructure for everything
 - in a distributed world, the infrastructure has to mirror
   application requirements
   - except, maybe, when we do lots of engineering
     (think Ceph and Spanner)
   - and even there, we have to be careful, will need
     to do app-specific tweaking


The best essays will
 - have a clear argument
 - support it with logical reasoning
 - many specific examples drawing upon multiple papers
   from the class
     - you can bring in other sources, but focus
       should be on what we covered

Remember, you're demonstrating that you know the class material