DistOS 2014W Lecture 19
Dynamo
- Key-value store.
- Build a distributed storage system:
- Scale
- Simple: key-value
- Highly available
- Guarantee Service Level Agreements (SLA).
- Highly concurrent.
- No dynamic routing.
- 0-hop DHT: a node does not route a request through intermediate nodes; it keeps enough routing information locally to send the request directly to the destination.
- Dynamo sacrifices consistency under certain failure scenarios.
- It has a partitioning algorithm.
- Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring”.
- The key space is linear, and it is partitioned among the nodes.
- “Virtual nodes”: each physical node can be responsible for more than one virtual node.
- Each data item is replicated at N hosts.
- “Preference list”: the list of nodes that are responsible for storing a particular key.
- Sacrifice strong consistency for availability
- It works at a scale of about 100 servers; it does not go much bigger than that.
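The partitioning ideas above (a hash ring, virtual nodes, and a preference list of N replicas) can be sketched in a few lines of Python. This is a minimal illustration, not Dynamo's implementation: the names `Ring`, `ring_position`, and `preference_list`, the choice of 8 virtual nodes per host, and the `node#i` token naming are all assumptions made for the example. MD5 is used only to get a fixed circular output range.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string onto the ring (the 128-bit output range of MD5)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node owns several virtual nodes (tokens) on the ring.
        self._tokens = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._tokens]

    def preference_list(self, key, n=3):
        """Return up to n distinct physical nodes, walking clockwise from the key."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        chosen = []
        for offset in range(len(self._tokens)):
            if len(chosen) == n:
                break
            # Wrap around the end of the sorted token list: the space is circular.
            _, node = self._tokens[(start + offset) % len(self._tokens)]
            if node not in chosen:  # skip extra virtual nodes of the same host
                chosen.append(node)
        return chosen


ring = Ring(["A", "B", "C", "D"])
replicas = ring.preference_list("cart:42", n=3)
print(replicas)  # three distinct hosts; the first one acts as the key's owner
```

Because every node can build the same sorted token list, any node can compute the preference list locally and contact the owner directly, which is the zero-hop property mentioned earlier.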
Bigtable
- Bigtable is a distributed storage system for managing structured data.
- Designed to scale to a very large size
- It stores columns together; in the web example, the rows are web pages and the columns are their contents.
- Each page has incoming links.
- A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map.
- It has many columns and looks like a table.
- Each row can have an arbitrary set of columns.
- It is a multi-dimensional map.
- An SSTable provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings.
- Large tables are broken into tablets at row boundaries, and each tablet holds a contiguous range of rows.
- Metadata operations: Create/delete tables, column families, change metadata.
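The data model above can be made concrete with a toy in-memory version. This is only a sketch under stated assumptions: `SparseTable` and `tablet_for` are hypothetical names, the timestamp dimension follows the Bigtable paper's (row, column, timestamp) indexing, and the row and column names are illustrative. A real tablet server reads from SSTables on disk rather than from a Python dict.

```python
import bisect


class SparseTable:
    """Toy model: a map from (row, column, timestamp) to bytes, read in sorted order."""

    def __init__(self):
        # Only cells that were actually written occupy space (sparse).
        self._cells = {}

    def put(self, row, column, timestamp, value: bytes):
        self._cells[(row, column, timestamp)] = value

    def get(self, row, column):
        """Return the most recent value stored for (row, column), or None."""
        versions = [
            (ts, value)
            for (r, c, ts), value in self._cells.items()
            if r == row and c == column
        ]
        return max(versions, key=lambda tv: tv[0])[1] if versions else None

    def scan(self, start_row, end_row):
        """Yield (key, value) for start_row <= row < end_row, in sorted key order."""
        for key in sorted(self._cells):
            if start_row <= key[0] < end_row:
                yield key, self._cells[key]


def tablet_for(row, split_points):
    """Return the index of the tablet whose contiguous row range contains row.

    split_points is a sorted list of row keys; tablet i holds the rows in
    [split_points[i - 1], split_points[i]), with open ends for the first and last.
    """
    return bisect.bisect_right(split_points, row)


table = SparseTable()
table.put("com.cnn.www", "contents:", 1, b"<html>v1")
table.put("com.cnn.www", "contents:", 2, b"<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 1, b"CNN")  # an incoming link
table.put("org.example", "contents:", 1, b"<html>ex")

print(table.get("com.cnn.www", "contents:"))
print(tablet_for("org.example", ["com.m", "org"]))
```

Rows here have different sets of columns, which is what "sparse" means in practice, and splitting on sorted row keys keeps each tablet's rows contiguous so a range scan touches few tablets.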
The question to consider is whether Bigtable can be used in a shopping-cart scenario, where latency and availability are the main focus (or, to rephrase, whether Bigtable can be used in place of Dynamo and vice versa). The answer is that it can, but it would not match Dynamo on latency. Dynamo would probably do a lot better, because Bigtable was not designed for such a scenario; its use cases were different. No single solution solves every problem in the world of distributed file systems: there is no silver bullet and no one-size-fits-all design. File systems are usually designed for specific use cases and work best for them. Later, if need be, they can be adapted to other scenarios and may perform well enough for the added goals, but they will still work best for the use cases that were their original targets.
General talk
- Read the introduction and conclusion of each paper, and think about the use cases in the paper more than about how the authors solve the problem.