DistOS 2018F 2018-11-12

Readings

DeCandia et al., "Dynamo: Amazon’s Highly Available Key-value Store" (SOSP 2007)
Lakshman & Malik, "Cassandra - A Decentralized Structured Storage System" (LADIS 2009)

Notes

Lecture:

Dynamo:

What is Dynamo the solution to? Shopping carts and web sessions.

When you put something in a shopping cart, it stays there, even months later. Shopping carts matter, but their priorities are unusual compared to older systems: availability comes first. If there is a session for the user, use it; if not, give them a new session. This matters most at peak times, e.g. Christmas.

This does not look like how Google builds things. What is different? What does BigTable depend on? GFS and Chubby. Dynamo is a standalone service built to solve one problem. Why doesn’t Google do that? It is about philosophy.

The two-pizza rule: teams should be no bigger than can be fed with two pizzas. Amazon wants a bunch of small teams, each autonomous, without a hierarchy of management. Relatively small teams need to be able to work on a self-contained project that does not tightly couple them to another team. Hence service-oriented architecture: when other folks inside Amazon use your service, they use an API that could be exposed to the world. Contrast with Google, whose systems are designed to run in a trusted environment that is not exposed to the world. Denial-of-service attacks have to be considered from the beginning at Amazon more than at Google.

Amazon tries to build a platform, Google tries to provide a product. Microsoft does this, kind of, with internal APIs. With AWS: this is the API we use ourselves, you don’t get a different one. If you don’t like AWS, go do your own thing. The number of services offered by AWS is insane, with redundancy among them and overlapping capabilities. Kind of Darwinian. The AWS webpage is a thin layer over internal APIs.

Org chart: look at a product, and what a company is selling is a reflection of the way the organization is organized. Amazon’s is very flat, Google’s is hierarchical. Not that one is better than the other, just different. Amazon is winning the platform race.

Facebook Cassandra:

Inbox search: users have messages they want to search. Why is this a funny thing for Facebook to develop? Google is based on searching; Facebook is not. Facebook has the social graph and the wall, and now needs to let people search. What is the search optimized for? Writes. They want inbox search running consistently. Consider how often the data is queried vs. written: it is optimized for writes, not queries, since it is mostly data dumps with occasional searching. The output is almost log-structured, and everything is pulled into memory for indexing and searching. Someone’s inbox can probably fit on a node, so this is not a big data problem in the sense of a Google search. It is a limited search: load the inbox into memory and do a quick search of it there, which is very different. Why does that matter? They are solving a specialized problem: no need for a relational database, no schema, free-form search.
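The write-optimized idea above (append like a log, then search one inbox in memory) can be sketched roughly as follows. This is a minimal illustration, not Cassandra's actual storage engine: the class name `InboxStore`, its methods, and the per-user list layout are all hypothetical.

```python
from collections import defaultdict


class InboxStore:
    """Toy write-optimized store: appends are cheap, search is a linear scan."""

    def __init__(self):
        # One append-only log of message bodies per user (hypothetical layout).
        self._logs = defaultdict(list)

    def append(self, user, message):
        # A write only appends to the end of the user's log; no index is updated.
        self._logs[user].append(message)

    def search(self, user, term):
        # Take the whole inbox (here it already lives in memory) and scan it.
        inbox = self._logs.get(user, [])
        needle = term.lower()
        return [m for m in inbox if needle in m.lower()]


store = InboxStore()
store.append("alice", "Lunch on Friday?")
store.append("alice", "Distributed OS notes attached")
store.append("bob", "Friday deadline moved")
hits = store.search("alice", "friday")
```

The trade-off is the one from the lecture: every write is a cheap append, and the cost is paid only on the occasional search, which scans a single user's inbox rather than a global index.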

What is similar in how they are implemented? Do these use Paxos? Cassandra uses a gossip protocol....a ring, n-1 nodes. Scuttlebutt .... elects a leader. All the master does is what the replicas do...the problem is partitioning. This is where consistent hashing comes in: when the set of nodes grows or shrinks, you don’t have to rehash everything. Without it, you have to reorganize all the data; with consistent hashing, only a fraction of it moves. We are seeing the same ideas applied again and again. Gossip protocols we have not seen before. With Paxos, a small number of nodes maintain state, and a consistent view of that state is replicated in a hierarchy. It is harder to get strict guarantees out of gossip, but you get better performance: the priority is handling incoming data rather than consistency .... Cassandra’s consistency model can be changed. Ring structures and gossip protocols go together: communication is on the ring, so you talk to your neighbours and a few more neighbours. Redundant but not too redundant.
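The consistent hashing point above can be illustrated with a small sketch. It assumes MD5 as the hash function and one ring position per node (no virtual nodes); the class `HashRing`, the node names, and the key names are made up for illustration and are not taken from either paper.

```python
import bisect
import hashlib


def ring_position(name):
    # Map any string to a point on the ring using MD5 (an arbitrary choice here).
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=()):
        self._points = []   # sorted ring positions
        self._owner = {}    # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        point = ring_position(node)
        bisect.insort(self._points, point)
        self._owner[point] = node

    def lookup(self, key):
        # The first node at or after the key's position owns it, wrapping around.
        i = bisect.bisect_left(self._points, ring_position(key))
        return self._owner[self._points[i % len(self._points)]]


keys = [f"user{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now belongs to the new node, and all other keys stay where they were. With a plain `hash(key) % num_nodes` scheme, going from three to four nodes would instead reassign most keys.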

Difference between inbox search and the shopping cart: Dynamo is a key-value store with no search. Cassandra needs to do search on a node. It is optimized for writes, but for a search the data is pulled into a node’s memory and searched there. There is not a lot of indexing, just metadata for where messages are stored. How often will people search their inbox? If searches are relatively infrequent, you might as well load the data into memory; why bother generating an index?

These are specialized systems for solving specific problems. Notice how much space there is in the design when you make specialized solutions. General solutions are either limited or broken. Compared to Ceph, this is simpler but optimized for a specific use case. In computer science we always want to make a general solution, but in experience the only way to get systems that are useful in many different scenarios is to use them in many different scenarios. This is why large organizations have success solving internal problems and then exporting the solutions. Google had the scale and the resources but was not in the business of selling its infrastructure (internal use only). Amazon did the opposite. Google is catching up to AWS, but does not have the right culture; it requires a major culture shift.

The point is seeing patterns and how infrastructure is developing. The goal: when you see a new paper, see how it fits in with the rest. When you see a design and think "let’s build this", ask what problem you are solving. You can go out there with an idea of the systems that exist and what problem each one solves, rather than building your own system. If you have a problem that doesn’t fit, there may not be an off-the-shelf solution, but you need to be able to recognize that. Think in terms of availability, consistency....and get away from "this is the perfect system" or "the best solution". What is right for the problem you are trying to solve? What infrastructure do you have? Do you want to use that infrastructure?

More notes

Dynamo: Solution to always available shopping carts.
- First care about availability. They want you to be able to add to your shopping cart no matter what. Matters because: Christmas
- BigTable needs GFS and Chubby, Dynamo depends on nothing, it is a standalone service.
  - 2 pizza rule at Amazon: Amazon doesn't want teams to be any bigger than can be fed with 2 pizzas. Don't want to build big systems. Each team needs to work on a self-contained project. Amazon adopted a service-oriented architecture. Amazon needs to plan for DoS attacks. Amazon is trying to build a platform, Google tries to build products. AWS is the API that Amazon uses. Google makes more tightly coupled systems, Amazon loosely coupled.
  - Lots of redundancy between AWS systems. It is Darwinian - they just put things out there.
  - Organizations sell their org chart - it is a reflection of how things work internally. Amazon’s is far flatter, Google is more hierarchical.
- Cassandra: Inbox search problem. You have a lot of messages and you want to be able to search them.
  - Optimized for writes, not for queries. Looks like a log-structured file system. Searching done in memory. Much smaller data set than Google - Google had the entire web, Facebook just has an inbox.
  - Consistent Hashing: If you change the number of hash buckets (nodes), you don't have to rehash everything
- Gossip protocols vs Paxos: Replication in gossiping is done using p2p gossiping, like an infection. With Paxos you have a small number of nodes with lots of replication.
- Ring structures and gossip protocols tend to go together.
- In Cassandra you find your inbox first and then search on it. Dynamo is just a key-value store.
  - Not a lot of indexing in Cassandra. Few searches, just load it into memory and do a linear search. People rarely search their inbox, so there is no need to index it like Google does.
- These later systems are simpler, unlike Ceph which is a general solution and very complicated. General solutions that solve a problem completely rarely work.
- Amazon has been very good at exporting their services. Google doesn't have the right culture to catch up to Amazon.
- Big idea of this course is to see what's out there and try to reuse other people's solutions. No such thing as the best solution, just what's best for your particular problem.
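
The "gossiping like an infection" bullet above can be sketched as a small simulation. It assumes each informed node picks peers uniformly at random in every round, which is a simplification and not the Scuttlebutt protocol that Cassandra uses; the function name `gossip_rounds` and its parameters are hypothetical.

```python
import random


def gossip_rounds(num_nodes, fanout=2, seed=0):
    """Return how many rounds pass until every node has heard the update."""
    rng = random.Random(seed)
    informed = {0}                      # node 0 starts with the update
    rounds = 0
    while len(informed) < num_nodes:
        newly = set()
        for node in informed:
            # Each informed node tells `fanout` peers chosen at random
            # (a peer may already be informed, or be the node itself).
            newly.update(rng.sample(range(num_nodes), fanout))
        informed |= newly
        rounds += 1
    return rounds


rounds = gossip_rounds(num_nodes=64)
```

Since each informed node contacts at most `fanout` peers per round, the informed set can at most triple per round with `fanout=2`, so 64 nodes need at least 4 rounds. Random choices overlap, so the real count is somewhat higher, but it still grows roughly with the logarithm of the cluster size, which is why the spread is redundant but not too redundant.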