DistOS 2021F 2021-10-19
Notes
Midterm Review
--------------

What did we cover this semester so far?

Systems covered:
  Solaris Zones
  LOCUS
  GFS
  Plan 9
  UNIX
  Borg
  Omega
  NASD
  ZooKeeper
  Chubby
  Bigtable
  DSM (many systems)
  MapReduce
  NFS

Lots of systems. But you should know what each of them is, at least at a high level.
 - why are they related to the idea of distributed operating systems?

Midterm is not strictly recall based
 - essays, not multiple choice
 - you need to synthesize the information

So what's been the narrative of the course?
 - don't distribute processes, distribute the OS

Early vs later systems
  Early: UNIX, LOCUS, NFS, Plan 9
  Later: GFS, NASD, MapReduce, Bigtable, etc.

What changed between early and later systems?
 - the web!

Early systems had more of a workstation focus
Later systems were built to do web services

But weren't there any early systems designed to do bigger scale things?
 - yes, there were supercomputers
 - but they were used mostly for non-business purposes
   - scientific & military, i.e., simulations, code breaking, intelligence gathering

Big machines for big businesses were mainframes
 - fundamentally centralized

When the web came along, workloads arose that could consume as much computational & storage power as we threw at them
 - no single machine, not even a supercomputer or a mainframe, was sufficient
 - how else could we search the web, give email to everyone, and serve everyone ads?

We can't build bigger individual computers, so we had better use the machines we have, i.e., commodity PCs, to scale to the web
 - if we build the right abstractions for the problems we're trying to solve, we can scale almost arbitrarily

 * Early systems simply tried to scale existing abstractions
 * Later systems made new abstractions that were designed for scaling

Plan 9, DSM
 - pinnacle of scaling standard OS abstractions (really UNIX abstractions)
 - memory & files

Later systems used new abstractions
 - MapReduce: divide computations into a "map" and a "reduce" operation
 - append-oriented, record-containing files (GFS)
 - filesystems organized around objects/chunks, not blocks
   - higher level abstraction better suited to parallel operations, separating data and metadata
 - immutable key-value stores (SSTables in Bigtable)
 - coordination mechanisms: coarse locks (Chubby), wait-free coordination data structures (ZooKeeper)
 - only have complete consistency where absolutely necessary
 - parallel computation/storage wherever possible by minimizing coordination costs (see the relaxation of consistency)

Note the abstractions aren't standard OS ones. In particular, the process and the file are no longer central.

But we still need classic files and processes
 - most of our code assumes this sort of setup
 - still very useful on individual hosts
   - flexible
   - we know how to make them perform well
   - understood by developers

But they get in the way when really trying to distribute things.

So what do we do? Containers!
 - packages of processes & files

This feels like biology to me
 - going from cells to multicellular organisms
 - cell = container

So, what are some questions I could ask on the midterm?

 * Large scale distributed systems are often built on other distributed systems. Why do you think this is the case, and what is it influenced by?
   - good, but a bit too open ended
   - maybe ask it with some specifics, things that are built on each other, or ask for examples of systems built on each other

 * Explain how the Google-related systems fit together.
   - good, but a bit tricky since we don't have all the pieces (but we have a lot)
   - maybe ask in a more specific way, say by mentioning a few systems and their connections and asking to expand on it

 * What are the drawbacks of the early implementations of distributed systems (LOCUS, Plan 9)? How did the newer implementations tackle those limitations?
   - older systems were solving a different problem than newer systems
   - so maybe focus on the different problems being solved and how that informed their design

 * How do independent facilities like Chubby/ZooKeeper (added on top of distributed systems) relate to similar but dependent facilities within the UNIX filesystem?
   - so what are these things providing that comes automatically with UNIX? And why are these important?

 * UNIX was written in the 1970s, yet UNIX-like operating systems (mostly Linux) are still used for most computing tasks. What are some reasons for this and what limitations does this impose?
   - how has the UNIX design been good and bad (helped and held things back)
   - I'd want to connect this to later systems somehow, maybe by focusing on how certain abstractions had limited scalability, why that was, and how later systems got around these limitations

 * Ease of use tends to influence which distributed systems win out in the long term. Give some examples of these systems and how they worked to make the developer's life easier.
   - very interesting take
   - but we haven't really discussed usability, so may not be that fair

 * Why did distributed operating systems largely fail in the past (Plan 9, LOCUS, distributed shared memory)?
   Why do we no longer follow these implementations, and what is different about modern implementations (cloud computing, GFS, Omega, Borg)?
   - "failure" is a bit ambiguous, maybe can focus on how it isn't the dominant approach (or, why don't we all use Plan 9)

 * Compare & contrast two papers from the same class (e.g., Bigtable & MapReduce, Chubby & ZooKeeper)
   - well, Bigtable and MapReduce are pretty different
   - but Chubby and ZooKeeper are definitely comparable
   - probably a better type of question for the final

I'm expecting small essays
 - introduction & conclusion
 - should have an argument
 - 3-5 paragraphs normally
 - you should briefly outline before you write
 - I will grade you on writing as well as technical content
 - you don't need to summarize what we read
   - make references that show you know what they are as part of an argument
 - your essay should have a thesis that you argue for/against
 - the questions will be open ended and subject to interpretation
   - use them as inspiration to write essays that show off that you understand the concepts of the class
   - make sure they demonstrate how you've synthesized knowledge, not memorized
   - you should have mental models of how things fit together