DistOS 2023W 2023-01-09

From Soma-notes
Revision as of 21:58, 10 January 2023 by Soma

Video from today's lecture is available on Brightspace, in Zoom->Cloud Recordings.

Notes

Lecture 1
---------

What is a distributed operating system?

First, it is an operating system
 - the code that transforms the computer that you have into the one you want
   to program
   - abstraction, resource management in the service of applications

A distributed OS is one that runs across multiple computers connected via a network (or networks)

So, why do we need a separate class on this?  Why can't this be covered in a regular OS class?
 - what's so hard about making an OS distributed?

Networking is hard!
 - but why?

State management
 - yes, but why?

In a single-system OS, if we have a significant hardware or software issue,
the whole system crashes/fails, and we accept that

In a distributed OS, individual computers can fail, be disconnected, etc.
 - but the system should go on!
 - it has to, because at scale errors are common and cannot be avoided

Failures make state a much harder problem
 - because when we try to synchronize state, the synchronization itself
   can fail, and then what do we do?
 - and state sync is inherently slow over a network
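
A minimal sketch of the "and then what do we do?" problem, not anything
from the lecture itself. The Replica class, the sync function, and the
machine names are all hypothetical, and the network is faked with a flag:
one machine is unreachable, so the update reaches only some of the copies.

```python
class Replica:
    """One machine holding a copy of the shared state (hypothetical)."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable  # stands in for a real network link
        self.value = None

    def apply(self, value):
        # A real system would send a message here; we only simulate failure.
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.value = value


def sync(replicas, value):
    """Push value to every replica; return the names of those that failed."""
    failed = []
    for replica in replicas:
        try:
            replica.apply(value)
        except ConnectionError:
            failed.append(replica.name)
    return failed


replicas = [Replica("a"), Replica("b", reachable=False), Replica("c")]
failed = sync(replicas, 42)
values = {r.name: r.value for r in replicas}
print(failed)  # names of replicas that missed the update
print(values)  # the copies now disagree
```

After the call, machines a and c hold the new value and b does not, so
the copies disagree. The sketch deliberately stops there: whether to retry,
roll back a and c, or carry on with a stale b is exactly the open question.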


The central problem of this semester: commonly used single-system operating systems (UNIX-like, Windows, etc.) use abstractions that are fundamentally incompatible with large-scale distributed operation
  - we can't keep the abstractions and also distribute the computation
  - we CAN keep using them, but only on individual systems