DistOS 2021F 2021-11-11
Notes
Lecture 16
----------

Ceph's big win over GFS:
 - mostly POSIX semantics
 - recall GFS files are really weird and have to be big
   - records that could be duplicated, not a byte stream

So Ceph is a UNIX-like filesystem scaled up to arbitrarily large workloads
 - scalability & functionality, but at the cost of complexity

Note how metadata is treated differently
 - GFS: in the master node (one node, with hot spares)
   - so can't have too much metadata, but can have lots of data
     (big files, not many files)
 - Ceph: metadata cluster that can scale
   - so as many files as you want, as large or small as you want
   - separate from data storage (OSDs)

But how do you split up metadata access across a cluster?
 - consider hot spots, e.g., a directory with millions of files, or a
   directory that everyone keeps accessing
 - solution: dynamic subtree partitioning

Note that solutions to filesystem metadata up to this point have been specialized
 - Ceph is generalized

But dynamic subtree partitioning only works because they did simplify the
metadata problem as well
 - normally file metadata includes where data is stored
 - Ceph replaces this with CRUSH

CRUSH
 - lets clients figure out where data is stored (in objects in the OSDs)
 - just need info on topology and parameters from metadata servers

Note that Ceph assumes a trusted environment
 - just like almost all of the other systems we've discussed
 - clients need to be updated on topology changes
   (i.e., servers being added or removed)

Metadata in the form of a function with parameters, not a list of what's
been used and where it is
 - this is cool and different

Consider PostScript (& Display PostScript)
 - programming language for printers
 - idea: computer sends a program to the printer rather than raw bitmaps
   or other plain image info
 - programming language is a bit like Forth

Back in the days of the original Macintosh, the LaserWriter printers Apple
made had more powerful CPUs and more RAM than the Macs that drove them
 - needed it to create images at 300 dpi for paper

CRUSH is in this spirit
 - use math rather than raw data

PDF replaced PostScript
 - because PostScript couldn't be parallelized easily
   (sequential program after all)

Original NeXT used Display PostScript (interface was implemented in PostScript)

And MacOS's Quartz was just display PDF
 - advantage for Apple: no royalties for PostScript

Sending around code rather than data is a key way we overcome the latency
inherent to distributed systems
 - this is JavaScript!
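
A minimal sketch of the "function with parameters, not a list" idea behind CRUSH, in Python. This is not the actual CRUSH algorithm, which also takes the cluster hierarchy and placement rules into account; it swaps in rendezvous (highest-random-weight) hashing as a stand-in. The names `place` and `_score` and the `osdN` identifiers are hypothetical, chosen only for illustration.

```python
import hashlib
from typing import List


def _score(object_name: str, osd: str) -> int:
    # Hypothetical helper: a deterministic pseudo-random score for an
    # (object, OSD) pair, taken from the first 8 bytes of a SHA-256 digest.
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: List[str], replicas: int = 3) -> List[str]:
    # Rank every OSD by its score for this object and keep the top few.
    # No lookup table is consulted: the object name and the OSD list
    # (the "topology") are the only inputs.
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


# Assumed example topology: eight OSDs with made-up names.
cluster = [f"osd{i}" for i in range(8)]
before = place("inode1234.chunk0", cluster)

# Topology change: the first OSD in the placement is removed.
shrunk = [osd for osd in cluster if osd != before[0]]
after = place("inode1234.chunk0", shrunk)
```

Any client holding the same OSD list computes the same placement without asking a server where the object lives. When an OSD disappears, only the replicas that lived on it move: the surviving entries of `before` are still in `after`.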