DistOS 2021F 2021-10-19


Notes

Midterm Review
--------------
What did we cover this semester so far?

Systems covered:
 Solaris Zones
 Locus
 GFS
 Plan 9
 UNIX
 Borg
 Omega
 NASD
 ZooKeeper
 Chubby
 Bigtable
 DSM (many systems)
 MapReduce
 NFS

Lots of systems.  But you should know what each of them is, at least at a high level.
  - why are they related to the idea of distributed operating systems?

Midterm is not strictly recall based
 - essays, not multiple choice
 - you need to synthesize the information

So what's been the narrative of the course?
 - don't distribute processes, distribute the OS

Early vs later systems

Early: UNIX, LOCUS, NFS, Plan 9
Later: GFS, NASD, MapReduce, Bigtable, etc.

What changed between early and later systems?
 - the web!

Early systems had more of a workstation focus
Later systems were built to do web services

But weren't there any early systems designed to do bigger scale things?
 - yes, there were supercomputers
 - but they were mostly used for non-business purposes
   - scientific & military, e.g., simulations,
     code breaking, intelligence gathering

Big machines for big businesses were mainframes
 - fundamentally centralized

When the web came along, workloads arose that could consume as much computational & storage power as we threw at them
 - no single machine, not even a supercomputer or mainframe,
   was sufficient
 - how else could we search the web, give email to everyone, and serve everyone ads?

We can't build bigger individual computers, so we had better
use the machines we have, i.e., commodity PCs, to scale to the web

 - if we build the right abstractions for the problems
   we're trying to solve, we can scale almost arbitrarily

* Early systems simply tried to scale existing abstractions
* Later systems made new abstractions that were designed for scaling

Plan 9, DSM - pinnacle of scaling standard OS abstractions
  (really UNIX abstractions)
   - memory & files

Later systems used new abstractions
 - MapReduce: divide computations into "map" and "reduce" operations
 - append-oriented, record-containing files (GFS)
 - filesystems organized around objects/chunks, not blocks
    - higher level abstraction better suited to
      parallel operations, separating data and metadata
 - immutable key-value stores (SSTables in Bigtable)
 - coordination mechanisms: coarse locks (Chubby),
   wait-free coordination data structures (ZooKeeper)
    - only have complete consistency where absolutely
      necessary
 - parallel computation/storage wherever possible by
   minimizing coordination costs
   (see relaxation on consistency)
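
As a rough illustration of the "map" and "reduce" split mentioned above, here is a minimal single-process sketch in Python. It is not Google's implementation: the names map_fn, reduce_fn, and run_mapreduce are made up for this example, and a real system would run the map and reduce calls on many machines with the grouping ("shuffle") step moving data between them.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map step (hypothetical example): emit (word, 1) for every word."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy driver: run map, group by key, then reduce each key."""
    groups = defaultdict(list)
    # Each map call depends only on its own input record, so these
    # calls could run in parallel with no coordination between them.
    for key, value in inputs.items():
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Each reduce call depends only on one key's values, so these
    # calls are also independent of each other.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


docs = {"d1": "the web the ads", "d2": "the email"}
counts = run_mapreduce(docs, map_fn, reduce_fn)
print(counts)
```

The point of the abstraction is that the programmer only writes map_fn and reduce_fn; everything in run_mapreduce is the system's job, which is where distribution and fault handling would live.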

Note the abstractions aren't standard OS ones.  In particular, the process and the file are no longer central.
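
To make the contrast with an ordinary mutable file concrete, here is a small Python sketch of an immutable, sorted key-value table in the spirit of an SSTable. This is an assumption-laden toy, not Bigtable's on-disk format: the class ImmutableTable and the helpers lookup and compact are hypothetical names, everything lives in memory, and deletes are not handled.

```python
import bisect


class ImmutableTable:
    """Write-once, sorted key-value table (SSTable-like sketch)."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples cannot be modified after construction.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key, default=None):
        # Binary search works because the keys are kept sorted.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def items(self):
        return zip(self._keys, self._values)


def lookup(tables, key):
    """Search tables newest first; the first hit wins."""
    for table in tables:
        value = table.get(key)
        if value is not None:
            return value
    return None


def compact(tables):
    """Merge several tables (newest first) into one new table."""
    merged = {}
    for table in reversed(tables):  # oldest first, so newer values overwrite
        merged.update(table.items())
    return ImmutableTable(merged)


old = ImmutableTable({"a": 1, "b": 2})
new = ImmutableTable({"b": 20, "c": 30})
tables = [new, old]  # an update is a new table, never an in-place edit
print(lookup(tables, "b"), lookup(tables, "a"), compact(tables).get("b"))
```

Because no table ever changes after it is written, readers need no locks and copies can be shared or replicated freely, which is the kind of property that makes the abstraction scale.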

But we still need classic files and processes
 - most of our code assumes this sort of setup
 - still very useful on individual hosts
    - flexible
    - we know how to make them perform well
    - understood by developers

But they get in the way when really trying to distribute things.  So what do we do?  Containers!
 - packages of processes & files

This feels like biology to me
 - going from cells to multicellular organisms
 - cell = container

So, what are some questions I could ask on the midterm?

* Large-scale distributed systems are often built on other distributed systems. Why do you think this is the case, and what influences it?
  - good, but a bit too open ended
  - maybe ask it with some specifics, things that are built
    on each other, or ask for examples of systems
    built on each other

* Explain how the Google-related systems fit together.
  - good, but a bit tricky since we don't have all the pieces (but we have a lot)
  - Maybe ask in a more specific way, say by mentioning a few systems and their connections and asking to expand on it

* What are the drawbacks of the early implementations of distributed systems (LOCUS, Plan 9)? How did the newer implementations tackle those limitations?
  - older systems were solving a different problem than
    newer systems
  - so maybe focus on the different problems being solved
    and how that informed their design

* How do independent facilities like Chubby/ZooKeeper (added on top of distributed systems) relate to similar but dependent facilities within the UNIX filesystem?
  - so what are these things providing that comes automatically with UNIX?  And why are these important?

* UNIX was written in the 1970s, yet UNIX-like operating systems (mostly Linux) are still used for most computing tasks. What are some reasons for this, and what limitations does it impose?
  - how has the UNIX design been good and bad
    (helped and held things back)
  - I'd want to connect this to later systems somehow,
    maybe by focusing on how certain abstractions had
    limited scalability, why that was, and how later systems
    got around these limitations

* Ease of use tends to influence which distributed systems win out in the long term. Give some examples of these systems and how they worked to make developers' lives easier.
  - very interesting take
  - but we haven't really discussed usability,
    so may not be that fair

* Why did distributed operating systems largely fail in the past (Plan 9, LOCUS, distributed shared memory)? Why do we no longer follow these implementations, and what is different about modern implementations (cloud computing, GFS, Omega, Borg)?
  - "failure" is a bit ambiguous, maybe can focus
    on how it isn't the dominant approach
    (or, why don't we all use Plan 9)

* Compare & contrast two papers from the same class (e.g., Bigtable & MapReduce, Chubby & ZooKeeper)
 - well, Bigtable and MapReduce are pretty different
 - but Chubby and ZooKeeper are definitely comparable
 - probably a better type of question for the final

I'm expecting small essays
 - introduction & conclusion
 - should have an argument
 - 3-5 paragraphs normally
 - you should briefly outline before you write
   - I will grade you on writing as well as technical content
 - you don't need to summarize what we read
   - make references that show you know what they are
     as part of an argument
 - your essay should have a thesis that you argue for/against

 - the questions will be open ended and subject to interpretation
   - use them as inspiration to write essays that
     show off that you understand the concepts of the class
   - make sure they demonstrate how you've synthesized
     knowledge, not memorized
       - you should have mental models of how things fit together