Notes
Midterm Review
--------------
What did we cover this semester so far?
Systems covered:
Solaris Zones
Locus
GFS
Plan 9
UNIX
Borg
Omega
NASD
ZooKeeper
Chubby
Bigtable
DSM (many systems)
MapReduce
NFS
Lots of systems. But you should know what each of them is, at least at a high level.
- why are they related to the idea of distributed operating systems?
Midterm is not strictly recall based
- essays, not multiple choice
- you need to synthesize the information
So what's been the narrative of the course?
- don't distribute processes, distribute the OS
Early vs later systems
Early: UNIX, LOCUS, NFS, Plan 9
Later: GFS, NASD, MapReduce, Bigtable, etc.
What changed between early and later systems?
- the web!
Early systems had more of a workstation focus
Later systems were built to do web services
But weren't there any early systems designed to do bigger scale things?
- yes, there were supercomputers
- but they were used mostly for non-business purposes
- scientific & military, e.g., simulations,
code breaking, intelligence gathering
Big machines for big businesses were mainframes
- fundamentally centralized
When the web came along, workloads arose that could consume as much computational & storage power as we threw at them
- no single machine, not even a supercomputer or mainframe,
was sufficient
- how else could we search the web, give email to everyone, and serve everyone ads?
We can't build bigger individual computers, so we had better
use the machines we have, i.e., commodity PCs, to scale to the web
- if we build the right abstractions for the problems
we're trying to solve, we can scale almost arbitrarily
* Early systems simply tried to scale existing abstractions
* Later systems made new abstractions that were designed for scaling
Plan 9, DSM - pinnacle of scaling standard OS abstractions
(really UNIX abstractions)
- memory & files
Later systems used new abstractions
- MapReduce: divide computations into "map" and "reduce" operations
- append-oriented, record-containing files (GFS)
- filesystems organized around objects/chunks, not blocks
- higher level abstraction better suited to
parallel operations, separating data and metadata
- immutable key-value stores (SSTables in Bigtable)
- coordination mechanisms: coarse locks (Chubby),
wait-free coordination data structures (ZooKeeper)
- only have complete consistency where absolutely
necessary
- parallel computation/storage wherever possible by
minimizing coordination costs
(see relaxation on consistency)
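To make the "map" and "reduce" split concrete, here is a minimal single-machine sketch in Python. It is only an illustration of the programming model, not how Google's MapReduce is implemented; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and a real framework would run the map and reduce calls on many machines and handle failures.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit one (word, 1) pair for each word in a single input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate values by key (the framework's job between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    intermediate = []
    # Each map call only looks at its own document, so these calls
    # need no coordination and could run in parallel on different hosts.
    for doc in documents:
        intermediate.extend(map_phase(doc))
    grouped = shuffle(intermediate)
    # Likewise, each key can be reduced independently of the others.
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["the web is big", "the web grows"])
```

The point to notice is that neither phase shares mutable state with its siblings, which is what lets the abstraction scale with little coordination.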
Note the abstractions aren't standard OS ones. In particular, the process and the file are no longer central.
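As one illustration of an abstraction that is not a classic mutable file, here is a rough Python sketch of an immutable, sorted key-value table in the spirit of an SSTable. The class and function names (ImmutableTable, lookup) are hypothetical, and this is an in-memory toy under simplifying assumptions (no on-disk format, no deletes), not Bigtable's actual design.

```python
import bisect

class ImmutableTable:
    """A sorted, read-only key-value table (hypothetical, SSTable-like)."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples cannot be modified after construction.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key, default=None):
        # Keys are sorted, so a binary search finds the entry.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

def lookup(tables, key):
    """Search tables from newest to oldest; newer entries shadow older ones."""
    for table in tables:
        value = table.get(key)
        if value is not None:
            return value
    return None

old = ImmutableTable({"a": 1, "b": 2})
new = ImmutableTable({"b": 3})   # an "update" is a new table, not an in-place write
result_b = lookup([new, old], "b")
result_a = lookup([new, old], "a")
```

Because a table never changes once written, readers need no locks, and copies on different machines cannot drift apart.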
But we still need classic files and processes
- most of our code assumes this sort of setup
- still very useful on individual hosts
- flexible
- we know how to make them perform well
- understood by developers
But they get in the way when really trying to distribute things. So what do we do? Containers!
- packages of processes & files
This feels like biology to me
- going from cells to multicellular organisms
- cell = container
So, what are some questions I could ask on the midterm?
* Large-scale distributed systems are often built on other distributed systems. Why do you think this is the case, and what influences it?
- good, but a bit too open ended
- maybe ask it with some specifics, things that are built
on each other, or ask for examples of systems
built on each other
* Explain how the Google-related systems fit together.
- good, but a bit tricky since we don't have all the pieces (but we have a lot)
- Maybe ask in a more specific way, say by mentioning a few systems and their connections and asking to expand on it
* What are the drawbacks of the early implementations of distributed systems (LOCUS, Plan 9)? How did the newer implementations tackle those limitations?
- older systems were solving a different problem than
newer systems
- so maybe focus on the different problems being solved
and how that informed their design
* How do independent facilities like Chubby/ZooKeeper (added on top of distributed systems) relate to similar but dependent facilities within the UNIX filesystem?
- so what are these things providing that comes automatically with UNIX? And why are these important?
* UNIX was written in the 1970s, yet UNIX-like operating systems (mostly Linux) are still used for most computing tasks. What are some reasons for this, and what limitations does this impose?
- how has the UNIX design been good and bad
(helped and held things back)
- I'd want to connect this to later systems somehow,
maybe by focusing on how certain abstractions had
limited scalability, why that was, and how later systems
got around these limitations
* Ease of use tends to influence which distributed systems win out in the long term. Give some examples of these systems and how they worked to make the developer's life easier.
- very interesting take
- but we haven't really discussed usability,
so may not be that fair
* Why did distributed operating systems largely fail in the past (Plan 9, LOCUS, distributed shared memory)? Why do we no longer follow these implementations, and what is different about modern implementations (cloud computing, GFS, Omega, Borg)?
- "failure" is a bit ambiguous, maybe can focus
on how it isn't the dominant approach
(or, why don't we all use Plan 9)
* Compare & contrast two papers from the same class (e.g., Bigtable & MapReduce, Chubby & ZooKeeper)
- well, Bigtable and MapReduce are pretty different
- but Chubby and ZooKeeper are definitely comparable
- probably a better type of question for the final
I'm expecting small essays
- introduction & conclusion
- should have an argument
- 3-5 paragraphs normally
- you should briefly outline before you write
- I will grade you on writing as well as technical content
- you don't need to summarize what we read
- make references that show you know what they are
as part of an argument
- your essay should have a thesis that you argue for/against
- the questions will be open ended and subject to interpretation
- use them as inspiration to write essays that
show off that you understand the concepts of the class
- make sure they demonstrate how you've synthesized
knowledge, not just memorized it
- you should have mental models of how things fit together