DistOS 2021F 2021-11-11

Notes

Lecture 16
----------
Ceph's big win over GFS:
 - mostly POSIX semantics
 - recall that GFS files are really weird and are expected to be big
    - they are streams of records that may be duplicated,
      not clean byte streams

So Ceph is a UNIX-like filesystem scaled up to arbitrarily large workloads
 - scalability & functionality, but at the
   cost of complexity

Note how metadata is treated differently
 - GFS: in the master node (one node, with hot spares)
    - so it can't hold too much metadata, but
      can track lots of data (big files, not many files)

 - Ceph: metadata cluster that can scale
    - so as many files as you want, as large or small as
      you want
    - separate from data storage (OSDs)

But how do you split up metadata access across a cluster?
 - consider hot spots, e.g., a directory with millions of files,
   or a directory that everyone keeps accessing
 - solution: dynamic subtree partitioning (sketched below)
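
A toy sketch of the idea (illustration only; the class names, threshold, and
delegation policy below are invented, not Ceph's actual MDS implementation):
each metadata server (MDS) owns some subtrees of the directory hierarchy,
access counters are kept per directory, and a directory that gets hot is
delegated to the least-loaded server.

  # Toy dynamic subtree partitioning: hot directories get handed off
  # to a less-loaded metadata server (MDS). Names and policies are made up.
  class MDS:
      def __init__(self, name):
          self.name = name
          self.subtrees = set()   # roots of subtrees this MDS owns
          self.load = 0           # recent metadata-request count

  class MetadataCluster:
      def __init__(self, names, hot_threshold=1000):
          self.mds = [MDS(n) for n in names]
          self.hot_threshold = hot_threshold
          self.owner = {"/": self.mds[0]}   # subtree root -> owning MDS
          self.mds[0].subtrees.add("/")
          self.counts = {}                  # per-directory access counts

      def owner_of(self, path):
          # Longest matching subtree root wins: the hierarchy is
          # partitioned at subtree granularity.
          root = max((r for r in self.owner if path.startswith(r)), key=len)
          return self.owner[root]

      def access(self, path):
          mds = self.owner_of(path)
          mds.load += 1
          self.counts[path] = self.counts.get(path, 0) + 1
          # The "dynamic" part: a hot directory's subtree is delegated
          # to whichever MDS currently has the least load.
          if self.counts[path] > self.hot_threshold and path not in self.owner:
              target = min(self.mds, key=lambda m: m.load)
              if target is not mds:
                  self.owner[path] = target
                  target.subtrees.add(path)
          return mds

  cluster = MetadataCluster(["mds0", "mds1", "mds2"])
  cluster.access("/home/alice")   # served by mds0 until /home/alice gets hot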

Note that the solutions to filesystem metadata we've seen up to this point have been specialized
 - Ceph's is general-purpose

But dynamic subtree partitioning only works because Ceph also simplifies the metadata problem
 - normally, file metadata includes where the file's data is stored
 - Ceph replaces that per-file location data with CRUSH

CRUSH
 - lets clients compute where data is stored
   (as objects on the OSDs)
 - clients just need the topology and a few parameters from the
   metadata servers (see the sketch below)
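
The spirit of this can be shown with a tiny stand-in (this uses rendezvous
hashing, not the real CRUSH algorithm, and all of the names below are
invented): given an object's name and the current list of OSDs, every client
computes the same placement locally, with no lookup table and no round trip
to a server.

  # CRUSH-like idea in miniature: placement is *computed* from the object
  # name plus the cluster map, not looked up in per-file metadata.
  # (Rendezvous hashing stands in for the real CRUSH algorithm here.)
  import hashlib

  def place(object_name, osds, replicas=3):
      """Deterministically pick which OSDs should store object_name."""
      def score(osd):
          h = hashlib.sha256(f"{object_name}:{osd}".encode()).hexdigest()
          return int(h, 16)
      # Every client with the same OSD list computes the same answer.
      return sorted(osds, key=score, reverse=True)[:replicas]

  cluster_map = ["osd0", "osd1", "osd2", "osd3", "osd4"]
  print(place("inode123.chunk0", cluster_map))   # same 3 OSDs on every client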

Note that Ceph assumes a trusted environment
 - just like almost all of the other systems we've discussed
 - clients need to be updated on topology changes
   (e.g., servers being added or removed)
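
With placement computed this way, a topology change just means the inputs to
the placement function change: when an OSD is added or removed, clients need
the new cluster map before they compute placements, and objects whose
computed location changed migrate to their new homes. Only a fraction of
objects should move, which is far cheaper than rebuilding a central table of
file locations.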

Metadata takes the form of a function with parameters,
not a list of what's been stored and where it is
 - this is cool and different

Consider PostScript (& Display PostScript)
 - a programming language for printers
 - idea: the computer sends a program to the printer rather than raw bitmaps
   or other plain image data
    - the language is a bit like Forth


Back in the days of the original Macintosh, the LaserWriter printers Apple made had more powerful CPUs and more RAM than the Macs that drove them
  - they needed it to rasterize page images at 300 dpi

CRUSH is in this spirit
 - use math rather than raw data

PDF largely replaced PostScript
 - because PostScript couldn't be parallelized easily
   (it's a sequential program, after all)

The original NeXT machines used Display PostScript
 (the interface was implemented in PostScript)

And macOS's Quartz was essentially Display PDF
 - advantage for Apple: no royalties to Adobe for Display PostScript

Sending around code rather than data is a key way we overcome the latency inherent in distributed systems
 - this is JavaScript!