DistOS 2023W 2023-01-09
Video from today's lecture is available on Brightspace, in Zoom->Cloud Recordings.
Notes
Lecture 1
---------

What is a distributed operating system?

First, it is an operating system
 - the code that transforms the computer that you have into the one you want to program
 - abstraction, resource management in the service of applications

A distributed OS is one that runs across multiple computers connected via a network (or networks)

So, why do we need a separate class on this? Why can't this be covered in a regular OS class?
 - what's so hard about making an OS distributed?

Networking is hard!
 - but why?

State management
 - yes, but why?

In a single-system OS, if we have a significant hardware or software issue, the system crashes/fails and we're fine with it

In a distributed OS, individual computers can fail, be disconnected, etc.
 - but the system should go on!
 - we have to, because at scale errors are common, cannot be avoided

Failures make state a much harder problem
 - because when we try to synchronize state, it can fail, and then what do we do?
 - and, state sync is inherently slow over a network

The problem of this semester is that generally-used single-system operating systems (UNIX-like, Windows, etc.) use abstractions that are fundamentally incompatible with large-scale distributed operation
 - we can't keep the abstractions and also distribute the computation
 - we CAN keep using them, but only on individual systems
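
A rough way to see why "at scale errors are common, cannot be avoided": assume (purely for illustration, not from the lecture) that each machine fails independently with probability p on a given day. Then the chance that at least one of n machines fails that day is 1 - (1 - p)^n, which climbs toward 1 as n grows. The function name and the value p = 0.001 below are hypothetical, and real failures are often correlated, so treat this as a sketch rather than a model of any actual system.

```python
def p_any_failure(n_machines: int, p_fail: float) -> float:
    """Probability that at least one of n_machines fails, assuming each
    fails independently with probability p_fail (a simplifying assumption)."""
    if n_machines < 0:
        raise ValueError("n_machines must be non-negative")
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    # P(no machine fails) = (1 - p_fail) ** n_machines
    return 1.0 - (1.0 - p_fail) ** n_machines


if __name__ == "__main__":
    P_FAIL = 0.001  # hypothetical per-machine, per-day failure probability
    for n in (1, 100, 10_000):
        print(f"{n:>6} machines: P(at least one failure) = "
              f"{p_any_failure(n, P_FAIL):.5f}")
```

With one machine a failure on any given day is rare, so crashing and rebooting is an acceptable answer. With thousands of machines, some failure on any given day is close to certain under these assumptions, so the system as a whole has to keep going while parts of it are down.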