Notes
Lecture 16
----------
Ceph's big win over GFS:
- mostly POSIX semantics
- recall GFS files are unusual and have to be big
- they hold records that could be duplicated, not a byte stream
So Ceph is a UNIX-like filesystem scaled up to arbitrarily large workloads
- scalability & functionality, but at the cost of complexity
Note how metadata is treated differently
- GFS: in the master node (one node, with hot spares)
- so can't have too much metadata, but can have lots of data (big files, not many files)
- Ceph: metadata cluster that can scale
- so as many files as you want, as large or small as you want
- separate from data storage (OSDs)
But how do you split up metadata access across a cluster?
- consider hot spots, e.g., a directory with millions of files, or a directory that everyone keeps accessing
- solution: dynamic subtree partitioning
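A toy sketch of the idea, not Ceph's actual algorithm: each directory subtree is owned by one metadata server (MDS), and a subtree that gets too hot is handed off to another MDS. The class name, method names, server names, and the simple threshold rule are all hypothetical, chosen only for illustration.

```python
from collections import Counter


class SubtreePartition:
    """Toy model: each directory subtree is owned by exactly one MDS."""

    def __init__(self, root_mds):
        # Maps a subtree root path to the MDS that is authoritative for it.
        self.authority = {"/": root_mds}
        # Per-directory access counts since the last rebalance.
        self.load = Counter()

    def owner(self, path):
        """Return the MDS for the deepest delegated subtree containing path."""
        parts = path.rstrip("/").split("/")
        # Try the full path first, then each shorter prefix up to the root.
        for i in range(len(parts), 0, -1):
            prefix = "/".join(parts[:i]) or "/"
            if prefix in self.authority:
                return self.authority[prefix]
        return self.authority["/"]

    def access(self, path):
        """Record one access to a file and return the MDS that serves it."""
        directory = path.rsplit("/", 1)[0] or "/"
        self.load[directory] += 1
        return self.owner(path)

    def rebalance(self, threshold, spare_mds):
        """Hand any directory hotter than threshold over to spare_mds."""
        for directory, count in self.load.items():
            if count > threshold and directory not in self.authority:
                self.authority[directory] = spare_mds
        self.load.clear()


partition = SubtreePartition("mds0")
for i in range(5):
    partition.access(f"/hot/f{i}")      # everyone keeps hitting /hot
partition.access("/cold/x")
partition.rebalance(threshold=3, spare_mds="mds1")
```

After the rebalance, everything under /hot is served by the second server while the rest of the tree stays where it was. The real system is more involved (it also has to cope with a single directory that is itself too big for one server), so treat this only as a picture of "move hot subtrees around".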
Note that solutions to filesystem metadata up to this point have been specialized
- Ceph is generalized
But dynamic subtree partitioning only works because Ceph also simplified the metadata problem
- normally file metadata includes where data is stored
- Ceph replaces this with CRUSH
CRUSH
- lets clients figure out where data is stored (in objects on the OSDs)
- just need info on topology and parameters from metadata servers
Note that Ceph assumes a trusted environment
- just like almost all of the other systems we've discussed
- clients need to be updated on topology changes (i.e., servers being added or removed)
Metadata is in the form of a function with parameters, not a list of what's been used and where it is
- this is cool and different
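A minimal sketch of "placement as a function", using rendezvous (highest-random-weight) hashing as a stand-in. This is not the real CRUSH algorithm, which also handles device weights and failure domains; the function names, OSD names, and object name below are hypothetical. The point is only that any client holding the same server list computes the same answer without looking anything up.

```python
import hashlib


def score(object_name, osd):
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, osds, replicas=3):
    """Return the OSDs that should hold object_name, computed from the list alone."""
    ranked = sorted(osds, key=lambda osd: score(object_name, osd), reverse=True)
    return ranked[:replicas]


# Assumed topology: five OSDs, then a sixth is added.
cluster_map = ["osd0", "osd1", "osd2", "osd3", "osd4"]
before = place("inode42.chunk0", cluster_map)
after = place("inode42.chunk0", cluster_map + ["osd5"])
```

Adding a server only changes the placement of objects for which the new server scores highest, which is why clients just need to hear about topology changes rather than receive a new table of locations.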
Consider PostScript (& Display PostScript)
- programming language for printers
- idea: computer sends a program to printer rather than raw bitmaps or other plain image info
- programming language is a bit like Forth
Back in the days of the original Macintosh, the LaserWriter printers Apple made had more powerful CPUs and more RAM than the Macs that drove them
- needed that power to render page images at 300 dpi for paper
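A rough back-of-the-envelope check of why the rendering had to happen in the printer. The page size (US Letter) and 1 bit per pixel are assumptions, and the one-line PostScript program is just an illustrative example that draws a circle in the middle of the page.

```python
DPI = 300
WIDTH_IN, HEIGHT_IN = 8.5, 11          # US Letter, assumed

# Raw page bitmap at 1 bit per pixel (black and white), assumed.
pixels = int(WIDTH_IN * DPI) * int(HEIGHT_IN * DPI)
bitmap_bytes = pixels // 8

# A complete PostScript page description: one circle, then print the page.
program = b"newpath 306 396 200 0 360 arc stroke showpage\n"

ratio = bitmap_bytes // len(program)
print(f"bitmap: {bitmap_bytes} bytes, program: {len(program)} bytes, ratio ~{ratio}x")
```

The full-page bitmap comes to about 1 MB, while the program describing the page is a few dozen bytes, so it is far cheaper to send the program and let the printer do the work.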
CRUSH is in this spirit
- use math rather than raw data
PDF replaced PostScript
- because PostScript couldn't be parallelized easily (it's a sequential program, after all)
Original NeXT used Display PostScript (interface was implemented in PostScript)
And Mac OS X's Quartz was just display PDF
- advantage for Apple: no royalties for PostScript
Sending around code rather than data is a key way we overcome the latency inherent to distributed systems
- this is JavaScript!