DistOS 2023W 2023-03-22
==Discussion Questions==
* What problems are Cassandra and Dynamo built to solve? How do these problems inform their design?
* What are the key technical insights or algorithms behind Cassandra and Dynamo?
* What infrastructure do Cassandra and Dynamo seem to rely on? How does this compare with the systems made by Google?
==Notes==
<pre>
March 22
--------
Problems designed to solve
* Cassandra: inbox search
* Dynamo: shopping carts & user sessions

Infrastructure
* Cassandra: Ganglia (perf monitoring), Zookeeper (consistent state), MapReduce, MySQL
* Dynamo: local node storage (MySQL, Berkeley DB, others)
Why does Google rely so much on a distributed filesystem while Facebook and Amazon don't?
- Facebook and Amazon mostly just had to deal with user/merchant data
- Google crawled the web and indexed it

Amazon is different from the other two in that it uses a service-oriented architecture
- focus on decentralization, connection via well-defined APIs
- versus more monolithic systems with lots of interdependencies
- Google uses one single source code repository that everyone can access!

The real difference is about trust
- monolithic but distributed system: everything is assumed to work together
- SOA: everyone else may abuse your system, so you must design around that
The difference is between implicit vs. explicit guarantees of functionality

"two pizza rule" at Amazon
- organized into small teams (small enough to feed with two pizzas)

Every distributed system has to be built from smaller parts; the question is how much each of these parts is trusted by the other parts of the system
Amazon was uniquely suited to build AWS, because its services were mostly already ready for "outsiders" to use
Both systems make use of gossip protocols
- Scuttlebutt-style anti-entropy for Cassandra

What is a gossip protocol?
- sharing state by "talking" with peers
- should have some guarantees that everyone eventually learns what they need to know
- gossip can include local and global data
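A minimal push-style gossip sketch (illustrative only — these class names are hypothetical, and this is not Cassandra's actual Scuttlebutt implementation): each node holds versioned key/value state and, once per round, pushes it to one random peer, which keeps whichever version is newer.

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.state = {}                  # key -> (version, value)

    def update(self, key, value, version):
        self.state[key] = (version, value)

    def gossip_with(self, peer):
        # Peer adopts any entry it is missing or has an older version of.
        for key, (version, value) in self.state.items():
            if key not in peer.state or peer.state[key][0] < version:
                peer.state[key] = (version, value)

def run_gossip(nodes, rounds):
    for _ in range(rounds):
        for node in nodes:
            peer = random.choice([n for n in nodes if n is not node])
            node.gossip_with(peer)

nodes = [Node("n%d" % i) for i in range(8)]
nodes[0].update("load", 0.7, version=1)  # local fact known to one node only
run_gossip(nodes, rounds=50)
# After enough rounds, every node has (almost surely) learned the entry.
print(all("load" in n.state for n in nodes))
```

Because every infected node pushes every round, the number of nodes that know an entry grows roughly exponentially — which is why gossip gives probabilistic guarantees that everyone eventually learns the state.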
Dynamo has an unusual approach to consistency issues
- most systems resolve consistency when writing data, or after data has been written and before it is read
- Dynamo resolves conflicts while reading

If you enforce consistency on writes, you slow down writes
By moving conflict resolution to reads, you ensure writes happen with minimal delay (maximizing performance and durability)

Dynamo's strategy makes sense when you realize their focus is on shopping carts
- better to have more items in the cart than fewer!
- definitely a domain-specific optimization
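The read-time reconciliation above can be sketched for the cart case (function names here are illustrative, not taken from the Dynamo paper): vector clocks detect that two versions are concurrent, and the application merges them by unioning the items, so no write ever blocks on conflict resolution and an added item is never lost.

```python
# Hedged sketch of Dynamo-style read-time reconciliation for a shopping cart.
# "Add wins": concurrent cart versions are combined at read time by union.

def concurrent(vc_a, vc_b):
    """Two vector clocks conflict if neither dominates the other."""
    a_ge = all(vc_a.get(k, 0) >= v for k, v in vc_b.items())
    b_ge = all(vc_b.get(k, 0) >= v for k, v in vc_a.items())
    return not a_ge and not b_ge

def merge_on_read(versions):
    """Application-level reconciliation: union the items, max the clocks."""
    merged_items = set()
    merged_clock = {}
    for clock, items in versions:
        merged_items |= items
        for node, count in clock.items():
            merged_clock[node] = max(merged_clock.get(node, 0), count)
    return merged_clock, merged_items

# Two replicas accepted writes independently (e.g. during a partition):
v1 = ({"nodeA": 2}, {"book", "lamp"})
v2 = ({"nodeB": 1, "nodeA": 1}, {"book", "socks"})
assert concurrent(v1[0], v2[0])      # neither clock dominates the other
clock, items = merge_on_read([v1, v2])
print(sorted(items))                 # ['book', 'lamp', 'socks']
```

Note the domain-specific trade-off: the union can resurrect a deleted item, which Dynamo's designers judged acceptable for carts — better an extra item than a missing one.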
Cassandra is focused on storing inboxes, enabling search
- so they also have to prioritize writes, but want to stay more consistent than Amazon
- focus is more on making the inbox responsive so users don't get annoyed
  (don't want them leaving Facebook, after all)
- prioritize newer messages over older messages, but keep all the data

Note that Cassandra's local storage looks like a log-structured filesystem
Traditional filesystems have metadata and data all intermixed
- wide trees of blocks for storing data, lots of pointers (block references)
- poor locality, may require many seeks to get file contents
  (read inode, read indirect blocks, read data blocks)
- extents only partially solve this
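To make the seek cost concrete, a back-of-the-envelope sketch (the constants assume a classic ext2-style layout with 4 KiB blocks and 12 direct pointers — not any particular filesystem's exact numbers): the deeper into the file you read, the more indirection blocks stand between the inode and the data.

```python
# Count the block reads (after reading the inode itself) needed to fetch
# the data block at a given file offset in an ext2-style inode layout.

BLOCK = 4096                   # bytes per block (assumed)
PTR = 4                        # bytes per block pointer (assumed)
DIRECT = 12                    # direct pointers held in the inode
PER_INDIRECT = BLOCK // PTR    # 1024 pointers per indirect block

def reads_for_offset(offset):
    """Indirection levels traversed, plus one read for the data block."""
    block_index = offset // BLOCK
    if block_index < DIRECT:
        return 1                       # data block only
    block_index -= DIRECT
    if block_index < PER_INDIRECT:
        return 2                       # single-indirect block + data
    block_index -= PER_INDIRECT
    if block_index < PER_INDIRECT ** 2:
        return 3                       # double-indirect + indirect + data
    return 4                           # triple-indirect path

print(reads_for_offset(8 * 1024))           # 1 (direct pointer)
print(reads_for_offset(2 * 1024 * 1024))    # 2 (single-indirect)
print(reads_for_offset(100 * 1024 * 1024))  # 3 (double-indirect)
```

Each of those reads can be a seek to a different part of the disk — which is exactly the locality problem the notes describe.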
The big problem with traditional filesystems is writes
- have to write to random parts of the disk, and random writes are *much* slower than sequential writes
- and if data isn't written to disk, you'll lose it

Idea: write data to a sequential log first, then write it to the tree
- journaled filesystems do this, but typically journal just metadata, not data
- journaling makes fsck MUCH faster
A log-structured filesystem gets rid of the regular tree and puts everything in the journal (the log)
- add in-memory indexes to track which version of each record is current
- add compaction to remove old versions on disk and free up space
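The log-plus-index idea can be sketched as a toy key/value store (illustrative only — not Cassandra's actual memtable/SSTable machinery): every write is a sequential append, reads go through an in-memory index, and compaction rewrites just the live entries to reclaim space.

```python
# Toy log-structured store: append-only log, in-memory index, compaction.

class LogStore:
    def __init__(self):
        self.log = []        # append-only list of (key, value) records
        self.index = {}      # key -> offset of the current version

    def put(self, key, value):
        self.index[key] = len(self.log)   # "write" is just an append
        self.log.append((key, value))

    def get(self, key):
        return self.log[self.index[key]][1]

    def compact(self):
        """Drop superseded versions; keep only what the index points at."""
        live = [(k, self.log[off][1]) for k, off in self.index.items()]
        self.log = []
        self.index = {}
        for key, value in live:
            self.put(key, value)

store = LogStore()
store.put("inbox:42", "v1")
store.put("inbox:42", "v2")   # old version stays in the log until compaction
store.put("inbox:7", "hello")
print(len(store.log))                        # 3 records before compaction
store.compact()
print(len(store.log), store.get("inbox:42")) # 2 records, latest value kept
```

Writes never touch old data in place — exactly the property that makes this layout friendly to write-heavy workloads like inboxes.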
</pre>
Latest revision as of 17:32, 22 March 2023