Revision as of 22:34, 30 March 2015

BigTable

Google's system for storing the data of various Google products, for instance Google Analytics, Google Finance, Orkut, Personalized Search, Writely, Google Earth and many more.
BigTable is a
- Sparse
- Persistent
- Multi-dimensional sorted map
It is indexed by
- Row Key: Every read or write of data under a single row key is atomic. Each row range is called a tablet. Choose row keys so that data accessed together gets good locality.
- Column Key: Column keys are grouped into sets called column families, which form the basic unit of access control. All data stored in a column family is usually of the same type. Syntax used: family:qualifier
- Timestamp: Each cell can hold multiple versions of the same data, indexed by timestamp. Applications that need to avoid collisions must generate unique timestamps themselves.
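The three index dimensions above can be pictured as a map from (row key, column key, timestamp) to a value, with rows kept in sorted order. The following is a minimal in-memory Python sketch of that idea. It is not BigTable's actual API: the class name `TinyTable` and its methods `put`, `get` and `scan` are hypothetical names chosen for illustration.

```python
import bisect
from collections import defaultdict


class TinyTable:
    """Toy model of a sparse, sorted, multi-dimensional map.

    Hypothetical illustration only; this is not the real BigTable API.
    """

    def __init__(self):
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))
        # Row keys are kept sorted so that row ranges can be scanned.
        self._sorted_row_keys = []

    def put(self, row, column, timestamp, value):
        """Store one version of a cell; missing cells take no space (sparse)."""
        if row not in self._rows:
            bisect.insort(self._sorted_row_keys, row)
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, end_row):
        """Return the row keys in [start_row, end_row) in lexicographic order."""
        lo = bisect.bisect_left(self._sorted_row_keys, start_row)
        hi = bisect.bisect_left(self._sorted_row_keys, end_row)
        return self._sorted_row_keys[lo:hi]
```

Because row keys are sorted, keys that share a prefix (for example reversed host names such as `com.cnn.www`) end up next to each other, which is the locality that a good choice of row key is meant to exploit.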
BigTable API: Provides functions for
- Creating and deleting
  - Tables
  - Column families
- Changing cluster, table and column family metadata, such as access control rights.
- A set of wrappers which allow BigTable to be used in MapReduce jobs both as
  - Input source
  - Output target

Dynamo: Amazon's key-value store.
Availability is the key word for Dynamo: Dynamo = availability.
Shifts the design priority from consistency to availability.
Sacrifices consistency under certain failure scenarios.
Treats failure handling as the normal case, without impact on availability or performance.
Data is partitioned and replicated using consistent hashing, and consistency is facilitated by object versioning.
This system has certain requirements such as:
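- Query Model: Simple read and write operations on data items that are uniquely identified by a key.
- ACID properties (Atomicity, Consistency, Isolation, Durability): Dynamo targets applications that accept weaker consistency in exchange for high availability. It provides no isolation guarantees and permits only single-key updates.
- Efficiency: The system needs to function on commodity hardware infrastructure.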
Service Level Agreements (SLAs): A negotiated contract between a client and a service covering several system-related characteristics. SLAs are used to guarantee that an application can deliver its functionality within a bounded time.
System Architecture: It consists of the System Interface, Partitioning Algorithm, Replication and Data Versioning.
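To make the data versioning component more concrete: the Dynamo paper describes vector clocks as the way object versions are tracked. The sketch below shows the general vector-clock technique in Python, under the assumption that a clock is a plain dict from node name to counter. The function names `increment`, `descends` and `reconcile` are hypothetical and do not come from Dynamo itself.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if version `a` has seen everything that version `b` has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def reconcile(versions):
    """Drop every (clock, value) pair that is a strict ancestor of another.

    If more than one pair survives, the writes were concurrent and the
    conflict has to be resolved by the reader.
    """
    survivors = []
    for i, (clock, value) in enumerate(versions):
        dominated = any(
            j != i and other != clock and descends(other, clock)
            for j, (other, _) in enumerate(versions)
        )
        if not dominated:
            survivors.append((clock, value))
    return survivors
```

A write handled by node A and then updated through node B gives a clock such as `{"A": 2, "B": 1}`. A sibling updated through node C instead gives `{"A": 2, "C": 1}`. Neither descends from the other, so `reconcile` keeps both.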
Successfully handles
- Server Failure
- Data Centre Failure
- Network Partitions
Allows service owners to customize their storage system to meet the desired performance, durability and consistency SLAs.
Building block for highly available applications.

Cassandra: Facebook's storage system, built to fulfil the needs of the Inbox Search problem.
Partitions data across the cluster using consistent hashing.
Distributed multi-dimensional map indexed by a key.
In its data model:
- Columns are grouped together into sets called column families. Column families are of 2 types:
  - Simple column families
  - Super column families
API consists of:
- Insert
- Get
- Delete
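The three calls above can be illustrated with a toy in-memory store. This is only a sketch of the idea, not Cassandra's client API: the class name `ToyStore`, the argument names, and the choice to address a column as `"family:column"` are assumptions made for the example, and super column families are left out.

```python
class ToyStore:
    """In-memory stand-in for tables of rows (hypothetical, for illustration)."""

    def __init__(self):
        # table -> row key -> column family -> column -> value
        self._tables = {}

    def insert(self, table, key, row_mutation):
        """Apply `row_mutation`, a dict of {"family:column": value}, to one row."""
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        for name, value in row_mutation.items():
            family, column = name.split(":", 1)
            row.setdefault(family, {})[column] = value

    def get(self, table, key, column_name):
        """Return the value of one column, or None if it is absent."""
        family, column = column_name.split(":", 1)
        row = self._tables.get(table, {}).get(key, {})
        return row.get(family, {}).get(column)

    def delete(self, table, key, column_name):
        """Remove one column from a row; deleting an absent column is a no-op."""
        family, column = column_name.split(":", 1)
        row = self._tables.get(table, {}).get(key, {})
        row.get(family, {}).pop(column, None)
```

Every operation names a single row key, so all the data touched by one call lives in one row.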
System Architecture consists of:
- Partitioning: Takes place using consistent hashing.
- Replication: Each item is replicated at n hosts, where "n" is the replication factor configured per system.
- Membership: Cluster membership is based on Scuttlebutt, a highly efficient anti-entropy gossip-based mechanism. Membership also covers:
  - Failure Detection
- Bootstrapping
- Scaling the cluster
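Partitioning and replication as listed above can be sketched with a small consistent-hashing ring. This is a simplified illustration rather than the system's real implementation. The names `ring_position` and `Ring` are hypothetical, MD5 is used only as a convenient hash, the ring is assumed to be non-empty, and refinements such as virtual nodes or load-based node placement are omitted.

```python
import bisect
import hashlib


def ring_position(name):
    """Map a string to a position on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent-hashing ring with n-way replication (assumes at least one node)."""

    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        # Each node sits at the ring position given by the hash of its name.
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def replicas(self, key):
        """Return the nodes that store `key`.

        The first node clockwise from the key's position is followed by
        its successors until n distinct nodes are collected.
        """
        positions = [p for p, _ in self._ring]
        start = bisect.bisect_right(positions, ring_position(key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

With this layout, adding or removing a node only affects keys whose replica set includes that node or its immediate neighbours on the ring. The rest of the key space keeps its placement, which is what makes scaling the cluster incremental.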

Spanner

- Google's scalable, multi-version, globally distributed database.
- Provides data consistency and supports an SQL-like interface.
- Main focus is managing cross-datacentre replicated data.