Revision as of 05:13, 6 April 2015

BigTable

Google's system for storing data of many Google products, for instance Google Analytics, Google Finance, Orkut, Personalized Search, Writely, Google Earth and many more.
A Bigtable is a
- Sparse
- Distributed
- Persistent
- Multi-dimensional sorted map
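It is indexed by
- Row key: Every read or write of data under a single row key is atomic. Data is kept in lexicographic order by row key, and the row range of a table is dynamically partitioned; each row range is called a tablet, the unit of distribution and load balancing. Row keys should be chosen so that data accessed together sorts together, which gives good locality for data access.
- Column key: Column keys are grouped into sets called column families, which form the basic unit of access control. All data stored in a column family is usually of the same type. Syntax used: family:qualifier
- Timestamp: Each cell can contain multiple versions of the same data, indexed by timestamp. Timestamps can be assigned by Bigtable (real time in microseconds) or explicitly by the client; applications that need to avoid collisions must generate unique timestamps themselves.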
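To make the (row, column, timestamp) indexing concrete, here is a minimal in-memory sketch in Python. It is not Bigtable's client API: the class `ToyBigtable` and its `put`/`get`/`scan` methods are hypothetical names, and everything runs in a single process with no tablets, persistence or atomicity. It only illustrates three properties from the list above: sparse storage, sorted row keys that make row-range scans cheap, and cell versions kept newest first.

```python
import bisect
from typing import Dict, List, Tuple

# Hypothetical in-memory model of the Bigtable data model:
#   (row key, "family:qualifier", timestamp) -> value
# Class and method names are illustrative, not the real Bigtable client API.


class ToyBigtable:
    def __init__(self, families: List[str]) -> None:
        # Column families must exist before data is written under them.
        self.families = set(families)
        # Row keys are kept sorted, so a contiguous row range is cheap to scan.
        self._row_keys: List[str] = []
        # Sparse storage: only cells that were actually written exist.
        self._rows: Dict[str, Dict[str, List[Tuple[int, bytes]]]] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self._rows:
            bisect.insort(self._row_keys, row)
            self._rows[row] = {}
        versions = self._rows[row].setdefault(column, [])
        versions.append((ts, value))
        # Versions are kept in decreasing timestamp order, newest first.
        versions.sort(key=lambda version: version[0], reverse=True)

    def get(self, row: str, column: str, max_versions: int = 1) -> List[Tuple[int, bytes]]:
        """Return up to max_versions (timestamp, value) pairs, newest first."""
        return self._rows.get(row, {}).get(column, [])[:max_versions]

    def scan(self, start_row: str, end_row: str) -> List[str]:
        """Return the row keys in [start_row, end_row), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start_row)
        hi = bisect.bisect_left(self._row_keys, end_row)
        return self._row_keys[lo:hi]


# Row keys are reversed hostnames, as in the Bigtable paper's Webtable example,
# so pages from the same domain sort next to each other.
table = ToyBigtable(["contents", "anchor"])
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("com.cnn.www/tech", "contents:", 4, b"<html>tech")
table.put("org.example", "contents:", 1, b"<html>example")

print(table.get("com.cnn.www", "contents:"))  # [(5, b'<html>v5')]
print(table.scan("com.cnn", "com.cno"))       # ['com.cnn.www', 'com.cnn.www/tech']
```

A real Bigtable also garbage-collects old versions according to per-column-family settings; this sketch simply keeps every version.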
The Bigtable API provides functions for
- Creating and deleting
  - Tables
  - Column families
- Changing cluster, table and column family metadata, such as access control rights
- A set of wrappers that allow a Bigtable to be used by MapReduce jobs both as an
  - Input source
  - Output target
Because the versions of each cell are stored in decreasing timestamp order, clients that access data by row and column read the most recent versions first.
Integration with parallel computation (MapReduce) and Google's cluster management system makes Bigtable flexible and highly scalable.

Amazon's Key Value Store (Dynamo)
Availability is the key design goal of Dynamo (Dynamo = availability).
Shifts the design priority from consistency to availability, providing an "always writeable" data store.
Sacrifices consistency under certain failure scenarios.
Treats failure handling as the normal case, without impact on availability or performance.
Data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning.
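Dynamo's partitioning places each physical node at several positions ("virtual nodes") on a hash ring. A key is stored on the first node found walking clockwise from the key's hash and replicated on the next distinct physical nodes; together these form the key's preference list. The sketch below assumes MD5 ring positions (as in the Dynamo paper) and a hypothetical `node#i` naming scheme for virtual nodes; it ignores failures, hinted handoff and membership changes.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_hash(key: str) -> int:
    # MD5 of the key gives a 128-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes_per_node: int = 8) -> None:
        # Each physical node owns several ring positions ("virtual nodes").
        # The "node#i" naming is a hypothetical scheme for this sketch.
        self._ring: List[Tuple[int, str]] = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [position for position, _ in self._ring]

    def preference_list(self, key: str, n: int) -> List[str]:
        """First n distinct physical nodes met walking clockwise from the key."""
        start = bisect.bisect_right(self._positions, ring_hash(key))
        result: List[str] = []
        for step in range(len(self._ring)):
            _, node = self._ring[(start + step) % len(self._ring)]
            # Skip further virtual nodes of a physical node already chosen.
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


ring = ConsistentHashRing(["A", "B", "C", "D", "E"])
replicas = ring.preference_list("cart:12345", n=3)
print(replicas)  # three distinct physical nodes; the first one coordinates the key
```

The benefit of consistent hashing shows up when a node joins: only keys whose clockwise successor becomes one of the new node's positions move, and every other key keeps its coordinator.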
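The system's requirements include:
- Query model: Simple read and write operations on a data item that is uniquely identified by a key; no operation spans multiple data items.
- ACID properties: Dynamo deliberately does not provide full ACID guarantees. It targets applications that can accept weaker consistency (the "C" in ACID) in exchange for high availability, gives no isolation guarantees, and permits only single-key updates.
- Efficiency: The system needs to function on a commodity hardware infrastructure while meeting stringent latency requirements.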
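Service Level Agreements (SLA): An SLA is a formally negotiated contract in which a client and a service agree on system-related characteristics, such as the client's expected request rate and the service's expected latency. It guarantees that, within a bounded time, an application can deliver its functionality; for example, a service might guarantee a response within 300 ms for 99.9% of its requests at a peak client load of 500 requests per second.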
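Dynamo states SLAs at the 99.9th percentile of the latency distribution rather than the mean or median, so that all customers get a good experience rather than just the majority. Below is a small sketch of checking recorded latencies against an SLA such as "300 ms for 99.9% of requests"; the nearest-rank percentile definition and the function names are choices made for this example, not part of Dynamo.

```python
import math
from fractions import Fraction
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of an empty sample")
    ordered = sorted(samples)
    # Exact rational arithmetic avoids float rounding in p/100 * len(samples).
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_sla(latencies_ms: Sequence[float], bound_ms: float = 300.0, p: float = 99.9) -> bool:
    """True if the p-th percentile latency is within the SLA bound."""
    return percentile(latencies_ms, p) <= bound_ms


# 1,000 requests, one of them slow: the 99.9th percentile is still fast.
print(meets_sla([12.0] * 999 + [950.0]))      # True
# Two slow requests in 1,000 push the 99.9th percentile over 300 ms.
print(meets_sla([12.0] * 998 + [950.0] * 2))  # False
```

In both cases the mean latency stays far below 300 ms, which is why a mean-based SLA would hide the slow tail that the percentile check exposes.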
System architecture: It consists of the system interface, partitioning algorithm, replication and data versioning.
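For data versioning, Dynamo attaches a vector clock, a list of (node, counter) pairs, to every object version. If every counter in one clock is less than or equal to the corresponding counter in another, the first version is an ancestor and can be forgotten; otherwise the versions are concurrent and must be reconciled, typically by the client. The Python sketch below (function names are illustrative) replays the version history that the Dynamo paper uses as its example.

```python
from typing import Dict

# A vector clock maps a node id to the number of writes that node coordinated.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a new clock that records one more write handled by `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"  # b is an ancestor and can be discarded
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # conflicting branches: the client must reconcile them


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Pointwise maximum, used when writing back a reconciled version."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}


# Version history from the Dynamo paper's example (nodes Sx, Sy, Sz).
d1 = increment({}, "Sx")             # {'Sx': 1}
d2 = increment(d1, "Sx")             # {'Sx': 2}
d3 = increment(d2, "Sy")             # written via Sy
d4 = increment(d2, "Sz")             # written via Sz, concurrently with d3
print(compare(d3, d4))               # concurrent
d5 = increment(merge(d3, d4), "Sx")  # client reconciles, Sx coordinates the write
print(compare(d5, d3), "/", compare(d5, d4))  # a supersedes b / a supersedes b
```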
Successfully handles
- Server Failure
- Data Centre Failure
- Network Partitions
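Allows service owners to customize their storage system, by tuning the parameters N (number of replicas), R (minimum number of nodes in a successful read) and W (minimum number of nodes in a successful write), to meet their desired performance, durability and consistency SLAs.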
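In Dynamo, N is the number of replicas of each key, R the minimum number of nodes that must take part in a successful read, and W the minimum number for a successful write. Setting R + W > N gives a quorum-like system, because any R replicas and any W replicas out of N must share at least one node. The sketch below checks that claim by brute force for small N. It assumes a fixed set of N replicas, whereas Dynamo's sloppy quorum uses the first N healthy nodes of the preference list, so the guarantee can weaken during failures.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """With a fixed set of N replicas, R + W > N forces every read quorum to
    share at least one replica with every write quorum (pigeonhole)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("expected 1 <= R <= N and 1 <= W <= N")
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Brute-force check of the same property over all possible quorums."""
    replicas = range(n)
    return all(
        set(read_q) & set(write_q)  # empty intersection is falsy
        for read_q in combinations(replicas, r)
        for write_q in combinations(replicas, w)
    )


print(quorums_overlap(3, 2, 2))  # True: a common (N, R, W) configuration
print(quorums_overlap(3, 1, 1))  # False: faster, but a read may miss the latest write
```

Lowering R or W reduces latency because fewer replicas must respond, at the cost of losing the guaranteed overlap.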
Building block for highly available applications.

Cassandra: Facebook's storage system, built to fulfil the needs of the Inbox Search problem
Partitions data across the cluster using consistent hashing.
A table is a distributed multi-dimensional map indexed by a key.
In its data model:
- Columns are grouped together into sets called column families. Column families are of two types:
  - Simple column families
  - Super column families (a column family within a column family)
The API consists of:
- Insert: insert(table, key, rowMutation)
- Get: get(table, key, columnName)
- Delete: delete(table, key, columnName)
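The Cassandra paper gives the API as insert(table, key, rowMutation), get(table, key, columnName) and delete(table, key, columnName). The toy class below mirrors those three calls on a single in-memory table, so the table argument becomes the object itself. Column paths follow the paper's "column family : column" and "column family : super column : column" conventions; the dict-based row mutation and the class name are assumptions of this sketch, which models none of Cassandra's partitioning, replication, timestamps or storage format. The example data follows the shape of the paper's term-search index for Inbox Search: user id as row key, words as super columns, message ids as columns.

```python
from typing import Any, Dict


class ToyCassandraTable:
    """Hypothetical in-memory stand-in for one table.

    A simple column family is one level deep (column -> value); a super
    column family is two levels deep (super column -> column -> value).
    """

    def __init__(self) -> None:
        self.rows: Dict[str, Dict[str, Any]] = {}

    def insert(self, key: str, row_mutation: Dict[str, Any]) -> None:
        # row_mutation maps column paths such as "profile:city" or
        # "inbox:lunch:m007" to new values.
        row = self.rows.setdefault(key, {})
        for path, value in row_mutation.items():
            *parents, leaf = path.split(":")
            node = row
            for part in parents:
                node = node.setdefault(part, {})
            node[leaf] = value

    def get(self, key: str, column_name: str) -> Any:
        node: Any = self.rows[key]
        for part in column_name.split(":"):
            node = node[part]
        return node

    def delete(self, key: str, column_name: str) -> None:
        *parents, leaf = column_name.split(":")
        node = self.rows[key]
        for part in parents:
            node = node[part]
        del node[leaf]


users = ToyCassandraTable()
users.insert("user42", {
    "profile:city": "London",   # simple column family "profile"
    "inbox:lunch:m007": "",     # super column family "inbox": word -> message ids
    "inbox:hello:m001": "",
    "inbox:hello:m007": "",
})
print(users.get("user42", "profile:city"))         # London
print(sorted(users.get("user42", "inbox:hello")))  # ['m001', 'm007']
users.delete("user42", "inbox:hello:m001")
print(sorted(users.get("user42", "inbox:hello")))  # ['m007']
```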
System architecture consists of:
- Partitioning: Takes place using consistent hashing.
- Replication: Each data item is replicated at N hosts, where N is the replication factor configured per instance.
- Membership: Cluster membership is based on Scuttlebutt, a very efficient anti-entropy gossip-based mechanism. Membership further includes:
  - Failure detection, using a modified φ Accrual Failure Detector
- Bootstrapping
- Scaling the cluster
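Cassandra's failure detector is a modified φ Accrual Failure Detector: instead of a binary up/down verdict, each node computes a suspicion level φ for its peers from heartbeat inter-arrival times, which the Cassandra paper approximates with an exponential distribution. Under that model the probability of hearing nothing for Δt seconds is e^(-Δt/μ), where μ is the mean inter-arrival time, so φ = -log10 of that probability = Δt / (μ · ln 10); each extra unit of φ means the silence is ten times less likely. The class name, window size and conviction threshold below are illustrative values for this sketch, not settings taken from the paper.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Sketch of a phi accrual failure detector for one peer, using the
    exponential approximation of heartbeat inter-arrival times."""

    def __init__(self, window: int = 100, threshold: float = 8.0) -> None:
        # Only the most recent `window` inter-arrival times are kept.
        self.intervals: Deque[float] = deque(maxlen=window)
        self.last_heartbeat: Optional[float] = None
        self.threshold = threshold  # illustrative conviction threshold

    def heartbeat(self, now: float) -> None:
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        if self.last_heartbeat is None or not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # P(no heartbeat for `elapsed`) = exp(-elapsed / mean), and
        # phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def suspect(self, now: float) -> bool:
        return self.phi(now) > self.threshold


detector = PhiAccrualDetector()
for t in range(11):            # heartbeats arrive once per second, t = 0..10
    detector.heartbeat(float(t))
print(detector.suspect(10.5))  # False: only half an interval has elapsed
print(detector.suspect(30.0))  # True: phi = 20 / ln(10), about 8.7
```

Because φ grows continuously with the silence, each application can pick its own threshold, trading faster detection against more false suspicions.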
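It is designed to run on cheap commodity hardware and handle high write throughput without sacrificing read efficiency.
Its simple, flexible data model, which gives clients dynamic control over data layout and format, together with its decentralized design, makes it very scalable.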

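Spanner: Google's scalable, multi-version, globally distributed database.
Evolved from a Bigtable-like versioned key-value store into a temporal multi-version database.
Provides data consistency and supports an SQL-like interface.
Uses a separate high-reliability time service (TrueTime) to guarantee the correctness properties around concurrency control.
- The time service reports clock uncertainty explicitly, and Spanner uses it to assign globally meaningful commit timestamps to transactions, which makes them externally consistent.
It shards data across machines and migrates data automatically across machines, even across datacenters, to balance load and respond to failures.
Applications can control data replication configurations (which datacenters hold which data, how far data is from its users, how far replicas are from each other) to tune read latency, write latency, durability and performance.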
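Spanner's commit protocol relies on the interval API of its time service (called TrueTime in the Spanner paper): TT.now() returns an interval [earliest, latest] guaranteed to contain the absolute time, and TT.after(t) is true once t has definitely passed. A coordinator picks a commit timestamp s no earlier than TT.now().latest and then performs "commit wait", not making the commit visible until TT.after(s) is true. As a result, a transaction that starts after another has committed always gets a larger timestamp. The Python sketch below simulates this with a fixed, assumed uncertainty epsilon and a manually advanced clock; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy time service: callers never see the true time, only an interval
    of width 2 * epsilon that contains it. Epsilon is an assumed constant."""

    def __init__(self, epsilon: float) -> None:
        self.epsilon = epsilon
        self._true_time = 0.0

    def advance(self, dt: float) -> None:
        # Simulation hook only; a real clock advances on its own.
        self._true_time += dt

    def now(self) -> TTInterval:
        return TTInterval(self._true_time - self.epsilon, self._true_time + self.epsilon)

    def after(self, t: float) -> bool:
        """True once t has definitely passed on every clock."""
        return self.now().earliest > t


def commit(tt: SimulatedTrueTime, tick: float = 0.001) -> float:
    # Choose a commit timestamp no earlier than any clock could currently read.
    s = tt.now().latest
    # Commit wait: keep the commit invisible until s has definitely passed.
    # This takes roughly 2 * epsilon of simulated time.
    while not tt.after(s):
        tt.advance(tick)  # stands in for sleeping in a real system
    return s


tt = SimulatedTrueTime(epsilon=0.005)  # assumed 5 ms uncertainty
s1 = commit(tt)
s2 = commit(tt)  # a transaction that starts after the first one committed
print(s2 > s1)   # True
```

The design choice is to pay a short, bounded wait on every commit instead of requiring perfectly synchronized clocks; the smaller the uncertainty, the shorter the wait.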
