DistOS 2023W 2023-03-22

Discussion Questions

What problems are Cassandra and Dynamo built to solve? How do these problems inform their design?
What are the key technical insights or algorithms behind Cassandra and Dynamo?
What infrastructure do Cassandra and Dynamo seem to rely on? How does this compare with the systems made by Google?
Notes

March 22
--------

Ploblems designed to solve
* Cassandra: inbox search
* Dynamo: shopping carts & user sessions

Infrastructure
* Cassandra: Ganglia (perf monitoring), Zookeeper (consistent state), MapReduce, MySQL
* Dynamo: local node storage (MySQL, Berkeley DB, others)

Why does Google rely so much on a distributed filesystem while Facebook and Amazon don't?
 - Facebook and Amazon just had to deal with user/merchant data
 - Google crawled the web, indexed it

Amazon is different from the other two in that it uses a service-oriented architecture
 - focus on decentralization, connection via well-defined APIs
 - versus more monolithic systems with lots of interdependencies
    - Google uses one single source code repository that everyone can access!

The real difference is about trust
 - monolithic but distributed system: everything is assumed to work together
 - SOA: everyone else may abuse your system, you must design around that

The difference is between implicit vs explicit guarantees of functionality

"two pizza rule" for Amazon?
 - organized into small teams

Every distributed system has to be built around smaller parts, the question is how trusted each of these parts are by other parts of the system

Amazon was uniquely suited to build AWS, because its services were mostly already ready for "outsiders" to use

Both systems make use of gossip protocols
 - scuttlebutt for Cassandra

What is a gossip protocol?
 - sharing state by "talking" with peers
 - should have some guarantees that everyone learns what they need to know
 - gossip can include local and global data

Dynamo has a weird approach to consistency issues
 - most systems resolve consistency when writing data, or after
   data has been written and before reading
 - dynamo resolves while reading

If you do consistency on writes, you can slow down writes
By moving conflict resolution to reads, you ensure writes happen with minimal delay (maximizing performance and durability)

Dynamo's strategy makes sense when you realize their focus is on shopping carts
 - better to have more items in the cart than fewer!
 - definitely a domain-specific optimization

Cassandra is focused on storing inboxes, enabling search
 - so they also have to prioritize writes, but want to keep it more consistent than amazon
 - focus is more on making the inbox responsive so users don't get annoyed
   (don't want them leaving facebook after all)
 - prioritize newer messages over older messages, but keep all the data

Note Cassandra's local storage looks like a log-structured filesystem


Traditional filesystems have metadata and data all intermixed
 - wide trees of blocks for storing data, lots of pointers (block references)
 - poor locality, may require many seeks to get file contents
   (read inode, read indirect blocks, read data blocks)
     - extents only partially solve this

The big problem with traditional filesystems is with writes
 - have to write to random parts of the disk, and random writes are *much* slower than sequential writes
    - and if data isn't written to disk, you'll lose it

Idea: write data in a sequential log first, then write it to the tree
 - journaled filesystems do this, but typically just journal metadata, not data
 - journaled fs make fsck MUCH faster

log structured filesystem gets rid of the regular tree, puts everything in the journal (the log)
  - add in in-memory indexes to track what is current
  - add compaction to remove old versions on disk and free up space