DistOS 2015W Session 11
BigTable
- Google's system for storing structured data, used by many Google products, for instance Google Analytics, Google Finance, Orkut, Personalized Search, Writely, Google Earth and many more
- BigTable is a
- Sparse
- Persistent
- Multi-dimensional sorted map
- It is indexed by
- Row Key: Every read or write of data under a single row key is atomic. Each row range is called a Tablet. Clients can select row keys to get good locality for data access.
- Column Key: Column keys are grouped into sets called Column Families, which form the basic unit of access control. All data stored in a column family is usually of the same type. Syntax used: family:qualifier
- Timestamp: Each cell can hold multiple versions of the same data, indexed by timestamp. Timestamps can be assigned by BigTable or by the application; applications that need to avoid collisions must generate unique timestamps themselves. (A data-model sketch follows at the end of this section.)
- BigTable API: Provides functions for
- Creating and deleting
- Tables
- Column Families
- Changing cluster, table, and column family metadata, such as access control rights
- A set of wrappers allows a BigTable to be used with MapReduce both as an
- Input source
- Output target
- The timestamp mechanism in BigTable lets clients read the most recent versions of a cell while still addressing data simply by row and column.
- Its integration with Google's cluster management infrastructure and with parallel computation frameworks such as MapReduce makes BigTable flexible and highly scalable.
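Below is a minimal sketch of the data model described above, using an in-memory Python dictionary as a stand-in for the distributed, persistent store; the class and method names are illustrative only, not Google's API.

```python
from collections import defaultdict
import time

class ToyBigTable:
    """Toy model of BigTable's sparse, multi-dimensional sorted map:
    (row key, 'family:qualifier', timestamp) -> value.
    The real system is distributed and persistent, keeps rows sorted,
    and splits row ranges into tablets; this sketch only mimics the
    indexing and versioning behaviour in memory."""

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp=None):
        # Mutations under a single row key are atomic in BigTable; a
        # plain single-threaded dict update stands in for that here.
        ts = timestamp if timestamp is not None else time.time()
        self.rows[row][column][ts] = value

    def get(self, row, column):
        # Return the most recent version stored in the cell, if any.
        versions = self.rows[row].get(column, {})
        return versions[max(versions)] if versions else None

# Example loosely modelled on the paper's "webtable": the row key is a
# reversed URL, columns use the family:qualifier syntax.
t = ToyBigTable()
t.put("com.cnn.www", "anchor:cnnsi.com", "CNN")
t.put("com.cnn.www", "contents:", "<html>v1</html>", timestamp=1)
t.put("com.cnn.www", "contents:", "<html>v2</html>", timestamp=2)
print(t.get("com.cnn.www", "contents:"))  # newest version: <html>v2</html>
```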
Dynamo
- Amazon's highly available key-value store
- Availability is the buzzword for Dynamo: Dynamo = Availability
- Shifted the prevailing paradigm from caring primarily about consistency to caring about availability.
- Sacrifices consistency under certain failure scenarios.
- Treats failure handling as the normal case, without impact on availability or performance.
- Data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning (see the sketches at the end of this section).
- This system has certain requirements such as:
- Query Model: Simple read and write operations to a data item that is uniquely identified by a key.
- ACID properties: Atomicity, Consistency, Isolation, Durability; Dynamo targets applications that tolerate weaker consistency and gives no isolation guarantees in exchange for availability.
- Efficiency: The system needs to function on a commodity hardware infrastructure.
- Service Level Agreements (SLA): A negotiated contract between a client and a service regarding system characteristics such as request rate and response time. They are used to guarantee that an application can deliver its functionality within a bounded time period.
- System Architecture: It consists of the System Interface, Partitioning Algorithm, Replication, and Data Versioning.
- Successfully handles
- Server Failure
- Data Centre Failure
- Network Partitions
- Allows service owners to customize their own storage system to meet the desired performance, durability, and consistency SLAs.
- Building block for highly available applications.
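A minimal sketch of Dynamo-style partitioning and replication via consistent hashing, assuming MD5 as the ring hash and a handful of virtual nodes per physical node; the names (Ring, preference_list) are illustrative, not Dynamo's actual code.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # Position on the consistent-hashing ring (MD5 chosen for the sketch).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hashing ring with virtual nodes.
    Each physical node owns several points ('tokens') on the ring;
    a key is stored on the first N distinct nodes found walking
    clockwise from the key's position (its preference list)."""

    def __init__(self, nodes, vnodes=8, n_replicas=3):
        self.n_replicas = n_replicas
        self.tokens = []            # sorted ring positions
        self.owner = {}             # token -> physical node
        for node in nodes:
            for i in range(vnodes):
                token = ring_hash(f"{node}#{i}")
                self.tokens.append(token)
                self.owner[token] = node
        self.tokens.sort()

    def preference_list(self, key: str):
        # Walk clockwise from the key's position, skipping tokens that
        # map to nodes we have already chosen (virtual-node duplicates).
        start = bisect.bisect(self.tokens, ring_hash(key))
        chosen = []
        for i in range(len(self.tokens)):
            node = self.owner[self.tokens[(start + i) % len(self.tokens)]]
            if node not in chosen:
                chosen.append(node)
            if len(chosen) == self.n_replicas:
                break
        return chosen

ring = Ring(["A", "B", "C", "D", "E"])
print(ring.preference_list("shopping-cart:alice"))  # e.g. ['C', 'A', 'E']
```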
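A companion sketch of Dynamo's object versioning with vector clocks: each write is tagged with (node, counter) pairs so that obsolete versions can be discarded and truly concurrent versions can be detected and handed back for reconciliation. The node names echo the Sx/Sy/Sz example in the paper; the class itself is illustrative.

```python
class VectorClock:
    """Vector clock as a {node: counter} map, as used for Dynamo-style
    object versioning: one counter per coordinating node."""

    def __init__(self, counters=None):
        self.counters = dict(counters or {})

    def increment(self, node):
        # Called by the node that coordinates a write; returns a new clock.
        clock = VectorClock(self.counters)
        clock.counters[node] = clock.counters.get(node, 0) + 1
        return clock

    def descends_from(self, other):
        # True if this clock is a (non-strict) successor of `other`,
        # i.e. the other version is obsolete and can be discarded.
        return all(self.counters.get(n, 0) >= c
                   for n, c in other.counters.items())

    def conflicts_with(self, other):
        # Neither clock descends from the other: the two versions
        # diverged and must be reconciled (often by the application).
        return not self.descends_from(other) and not other.descends_from(self)

v1 = VectorClock().increment("Sx")   # write coordinated by node Sx
v2 = v1.increment("Sy")              # later write coordinated by Sy
v3 = v1.increment("Sz")              # concurrent write coordinated by Sz
print(v2.descends_from(v1))          # True: v1 is obsolete
print(v2.conflicts_with(v3))         # True: divergent versions to reconcile
```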
Cassandra
- Facebook's storage system, built to meet the needs of the Inbox Search problem
- Partitions data across the cluster using consistent hashing.
- Distributed, multi-dimensional map indexed by a key
- In its data model:
- Columns are grouped together into sets called column families. Column families are further of 2 types:
- Simple column families
- Super column families (a column family within a column family)
- API consists of three simple methods (sketched at the end of this section):
- insert(table, key, rowMutation)
- get(table, key, columnName)
- delete(table, key, columnName)
- System Architecture consists of:
- Partitioning: Takes place using consistent hashing
- Replication: Each data item is replicated at N hosts, where N is the replication factor configured per instance.
- Membership: Cluster membership is based on Scuttlebutt, a highly efficient anti-entropy gossip-based mechanism. Membership further involves:
- Failure Detection
- Bootstrapping
- Scaling the cluster
- It can run on cheap commodity hardware and handle high write throughput
- Its flexible, decentralized structure makes it very scalable
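A minimal sketch of the data model and the three-call API listed above, using a single-node, in-memory stand-in; the "Inbox" table, the family:column naming, and all internals are illustrative, and the distributed machinery (partitioning, replication, gossip) is omitted.

```python
from collections import defaultdict
import time

class ToyCassandra:
    """Toy single-node model of Cassandra's data model:
    table -> row key -> column family -> column name -> (value, timestamp).
    Super column families (a column family within a column family)
    are left out of this sketch."""

    def __init__(self):
        self.tables = defaultdict(              # table
            lambda: defaultdict(                # row key
                lambda: defaultdict(dict)))     # column family -> columns

    def insert(self, table, key, row_mutation):
        # row_mutation: {column_family: {column_name: value}}
        for family, columns in row_mutation.items():
            for name, value in columns.items():
                self.tables[table][key][family][name] = (value, time.time())

    def get(self, table, key, column_name):
        # column_name uses the 'family:column' convention.
        family, _, column = column_name.partition(":")
        cell = self.tables[table][key][family].get(column)
        return cell[0] if cell else None

    def delete(self, table, key, column_name):
        family, _, column = column_name.partition(":")
        self.tables[table][key][family].pop(column, None)

db = ToyCassandra()
db.insert("Inbox", "user42", {"messages": {"msg-001": "hello world"}})
print(db.get("Inbox", "user42", "messages:msg-001"))  # 'hello world'
db.delete("Inbox", "user42", "messages:msg-001")
```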
Spanner
- Google's scalable, multi-version, globally distributed database.
- Evolved from a BigTable-like versioned key-value store into a temporal multi-version database.
- Provides externally consistent reads and writes and supports an SQL-like interface.
- Uses TrueTime to guarantee the correctness properties around concurrency control (see the sketch at the end of this section).
- Commit timestamps assigned via TrueTime reflect the real-time order in which transactions commit.
- It shards data across machines and automatically migrates data between machines (even between datacenters) to balance load and cope with failures.
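A minimal sketch of the TrueTime idea above: TT.now() returns an interval [earliest, latest] guaranteed to contain the true absolute time, and a transaction waits out the uncertainty ("commit wait") before its timestamp becomes visible, which is what makes timestamps reflect real-time commit order. The epsilon value and helper names here are made up for illustration; Spanner derives its uncertainty bound from GPS and atomic-clock references.

```python
import time
from dataclasses import dataclass

# Assumed uncertainty bound (epsilon) for the sketch only.
EPSILON = 0.005  # seconds

@dataclass
class TTInterval:
    earliest: float
    latest: float

def tt_now() -> TTInterval:
    # TrueTime exposes clock uncertainty explicitly: the true absolute
    # time is guaranteed to lie within [earliest, latest].
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)

def commit(transaction_name: str) -> float:
    # Pick a commit timestamp no smaller than TT.now().latest ...
    ts = tt_now().latest
    # ... then "commit wait": block until the timestamp is guaranteed
    # to be in the past everywhere, i.e. until TT.now().earliest > ts.
    while tt_now().earliest <= ts:
        time.sleep(EPSILON / 10)
    print(f"{transaction_name} committed at timestamp {ts:.6f}")
    return ts

# Two transactions committed one after another get timestamps that
# respect their real-time order (external consistency).
t1 = commit("T1")
t2 = commit("T2")
assert t2 > t1
```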