DistOS 2015W Session 11: Difference between revisions
| Line 68: | Line 68: | ||
* Has been built on top of the Google's Big table.  | * Has been built on top of the Google's Big table.  | ||
*Provided data consistency and Supports SQL like Interface.  | *Provided data consistency and Supports SQL like Interface.  | ||
* Uses   | * Uses a separate high-reliability time service to guarantee the correctness properties around concurrency control.  | ||
** The timestamps are utilized.  | ** The timestamps are utilized.  | ||
*It shares data across machines and migrates data automatically across machines  | *It shares data across machines and migrates data automatically across machines  | ||
*Data Control Functions in spanner controls latency and performance  | *Data Control Functions in spanner controls latency and performance  | ||
Revision as of 05:13, 6 April 2015
BigTable
- Google System used for storing data of various Google Products, for instance Google Analytics, Google Finance, Orkut, Personalized Search, Writely, Google Earth and many more
 - Big table is
- Sparse
 - Persistant
 - Muti dimensional Sorted Map
 
 - It is indexed by
- Row Key: Every read or write of data under single row key is atomic. Each row range is called Tablet. Select Row key to get good locality for data access.
 - Column Key: Grouped into sets called Column Families. Forms basic unit of Access Control.All data stored is of same type.Syntax used: family:qualifier
 - Time Stamp:Each cell consists of multiple versions of same data which are indexed by Timestamps.In order to avoid collisions, Timestamps need to be generated by applications.
 
 - Big Table API: Provides functions for
- Creating and Deleting
- Tables
 - Column Families
 
 - Changing Cluster
 - Changing Table
 - Column Family metadata like Access Control Rights.
 - Set of wrappers which allow Big Data to be used both as
- Input source
 - Output Target
 
 
 - Creating and Deleting
 - The timestamp mechanism in BIG table helps clients to access recent versions of data with simple accessing aspects of using row and column.
 - Parallel computation and cluster management system makes BIG table flexible and highly scalable.
 
Dynamo
- Amazon's Key Value Store
 - Availability is the buzz word for Dynamo. Dynamo=Availability
 - Shifted Computer Science paradigm from caring about the consistency to availability.
 - Sacrifices consistency under certain failure scenarios.
 - Treats failure handling as normal case without impact on availability and performance.
 - Data is partitioned and replicated using consistent hashing and consistency is facilitated by use of object versioning.
 - This system has certain requirements such as:
- Query Model: Simple read and write operations to data item that are uniquely identified by a key.
 - ACID properties: Atomicity, Consistency, Isolation, Durability.
 - Efficiency: System needs to function on a commodity hardware infrastructure.
 
 - Service Level Agreements(SLA): They are a negotiated contract between a client and a service regarding characteristics related to systems. They are used in order to guarantee that in a bounded time period, an application can deliver it's functionality.
 - System Architecture: It consists of System Interface, Partitioning Algorithm, Replication,Data Versioning.
 - Successfully handles
- Server Failure
 - Data Centre Failure
 - Network Partitions
 
 - Allows service owners to customize their own storage systems according to their storage systems to meet the desired performance, durability and consistency SLAs.
 - Building block for highly available applications.
 
Cassandra
- Facebook's storage system to fulfil needs of the Inbox Search Problem
 - Partitions data across the cluster using consistent hashing.
 - Distributed multi dimensional map indexed by a key
 - In it's data model:
- Columns grouped together into sets called column families. Column Families further of 2 types:
- Simple column families
 - Super column families
 
 
 - Columns grouped together into sets called column families. Column Families further of 2 types:
 - API consists of :
- Insert
 - Get
 - Delete
 
 - System Architecture consists of :
- Partitioning: Takes place using consistent hashing
 - Replication: Each item replicated at n hosts where "n" is the replication factor configured per system.
 - Membership: Cluster membership is based on Scuttle butt which is a highly efficient anti-entropy Gossip based mechanism.The Membership further has sub part such as:
- Failure Detection
 
 - Bootstrapping
 - Scaling the cluster
 
 - It can run cheap commodity hardware and handle high throughput
 - Its multiple usable structure makes it very scalable
 
Spanner
- Google's scalable, multi version, globally distributed database.
 - Has been built on top of the Google's Big table.
 - Provided data consistency and Supports SQL like Interface.
 - Uses a separate high-reliability time service to guarantee the correctness properties around concurrency control.
- The timestamps are utilized.
 
 - It shares data across machines and migrates data automatically across machines
 - Data Control Functions in spanner controls latency and performance