DistOS 2023W 2023-04-05

Class Discussion

For class discussion today:

  • What did we learn this semester?
  • Specifically, what are the big ideas, the patterns we covered?

Notes

Exam Review
-----------

EXAM LOGISTICS

Exam is on April 21, 2-5 PM (COMP 4000)

(COMP 5102 is scheduled on April 27, 2-5 PM, but please take the exam on April 21 unless you have a good reason.  I will need to manually open up the exam for you on April 27th.)

Exam is online just like the midterm, same format, but three hours.

Projects are due on April 27th by midnight.  I have until May 1st to get grades in.

Exam will require three essays, chosen from three or more questions (I haven't written it yet).  Final exam is cumulative.


PRESENTATION LOGISTICS

To facilitate presentations on Monday, please send me your slides by Sunday night as a PDF file, via email (my cunet address) or Teams.  You may have at most 4 slides (including the title); plan to talk for 3 minutes.  I will record the presentations so I can grade them after class.

Be sure to present the thesis of your lit review and give evidence in support of the thesis by referring to specific papers/systems.

EVERYONE SHOULD COME TO CLASS ON MONDAY.
 - will probably have topics of relevance to the final exam

DISCUSSION SUMMARY

What was this class about?

"distributed abstractions for application development"

Why are Ceph and Spanner similar?
 - both take old APIs that were hard to scale and make them scalable
   (UNIX files & SQL)
 - but at the cost of complexity

When studying for the exam, you have to go beyond labels
 - what are the real commonalities?  What are the patterns?
 - really try to compare papers and see how they relate

What abstractions work in a distributed context, and how can those abstractions be implemented?

Ongoing considerations
 - how to manage state?  Make it immutable wherever you can (append-mostly files), and turn mutable state into immutable state everywhere else (e.g., use logs)

 - minimise consistency requirements, but don't be afraid to use strong consistency where needed using consensus protocols such as Paxos or Raft

 - build for failure of hosts: this will happen, so no computer can be essential; all must be replaceable

 - abstract the OS away from the hardware to facilitate distribution of load (e.g., containers)

 - (scheduling is hard and generally requires domain knowledge of workload
   (lots of hacks in practice))

 - keep things simple initially, but later don't be afraid to make the system complex to support easier/more conventional abstractions for developers (users of the system)

 - abstractions for distributed applications need to be specialized to be performant (because parallel is hard)
 
 - much better performance is possible if you're running trusted workloads
   - compare BOINC, OceanStore (untrusted hosts) with GFS, Dynamo, MapReduce, etc. (trusted infrastructure)
      - BOINC vs MapReduce
      - OceanStore vs f4
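
The state-management consideration above (turn mutable state into immutable state by using logs) can be illustrated with a small sketch.  This is not taken from any of the papers covered; the names Entry, AppendOnlyLog, append, and replay are hypothetical, and the log lives in memory on a single machine.  It only shows the core idea: records are never modified, an "update" is a new appended record, and the current (or any earlier) state is derived by replaying the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One immutable log record: set `key` to `value`."""
    key: str
    value: str


class AppendOnlyLog:
    """Hypothetical in-memory log; entries are appended, never modified."""

    def __init__(self) -> None:
        self._entries: list = []

    def append(self, key: str, value: str) -> int:
        """Append a new record and return its index in the log."""
        self._entries.append(Entry(key, value))
        return len(self._entries) - 1

    def replay(self, upto: Optional[int] = None) -> dict:
        """Rebuild key-value state from the first `upto` entries (all if None)."""
        state = {}
        for entry in self._entries[:upto]:
            state[entry.key] = entry.value  # later records override earlier ones
        return state


log = AppendOnlyLog()
log.append("owner", "alice")
log.append("size", "10")
log.append("owner", "bob")    # an "update" is just another appended record

current = log.replay()        # state after all three entries
earlier = log.replay(upto=2)  # state as it was before the last append
```

Because old records are never overwritten, a replica that has seen only a prefix of the log holds a valid earlier state rather than a corrupted one, which is part of why logs are easier to replicate than mutable data.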