DistOS 2015W Session 7
= Ceph =
Unlike GFS, discussed previously, Ceph is a general-purpose distributed file system. It follows the same general model of distribution as GFS and Amoeba.
== Main Components ==
* Client
** Runs the application; it performs metadata operations through an MDS and reads and writes data directly from the OSDs
* Cluster of Object Storage Devices (OSD)
** Stores both data and metadata; clients communicate directly with OSDs to perform I/O operations
** Data is stored as objects (variable-size chunks)
* Meta-data Server (MDS)
** Manages files and directories. Clients interact with it to perform metadata operations such as open and rename, and it manages client capabilities.
** Clients '''do not''' need to ask an MDS where data is stored, which improves scalability (more on that below; a rough sketch of the resulting I/O path follows this list)
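To make the division of labour concrete, here is a minimal sketch of a client operation under the model above. ''FakeMDS'', ''FakeOSD'' and ''locate'' are invented stand-ins, not the real Ceph API; the point is only that metadata operations go through an MDS while data I/O goes straight to an OSD whose location the client computes for itself.

<syntaxhighlight lang="python">
import hashlib

class FakeMDS:
    """Stand-in for a metadata server: maps paths to (inode, capability)."""
    def __init__(self):
        self.namespace = {"/etc/motd": 42}       # path -> inode number

    def open(self, path):
        inode = self.namespace[path]
        capability = "read"                      # the MDS hands out capabilities
        return inode, capability

class FakeOSD:
    """Stand-in for an object storage device holding variable-size objects."""
    def __init__(self):
        self.objects = {}                        # object name -> bytes

def locate(object_name, num_osds):
    # Placeholder for CRUSH: a deterministic hash tells the client which OSD
    # holds an object, so no lookup service sits on the data path.
    digest = hashlib.sha1(object_name.encode()).hexdigest()
    return int(digest, 16) % num_osds

mds = FakeMDS()
osds = [FakeOSD() for _ in range(4)]

inode, cap = mds.open("/etc/motd")               # 1. metadata operation via the MDS
obj_name = "%d.0" % inode                        # 2. object name derived from the inode
osd = osds[locate(obj_name, len(osds))]          # 3. client computes the location itself
osd.objects[obj_name] = b"written straight to the OSD"   # 4. I/O bypasses the MDS
print(osd.objects[obj_name])
</syntaxhighlight>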
== Key Features ==
* Decoupled data and metadata
* Dynamic Distributed Metadata Management
** Metadata is distributed among multiple metadata servers using dynamic sub-tree partitioning: directories that are used more often have their metadata replicated to more servers, spreading the load. This happens completely automatically.
* Object-based storage
** Uses the cluster of OSDs to form a Reliable Autonomic Distributed Object Store (RADOS), which handles failure detection and recovery for Ceph
* CRUSH (Controlled Replication Under Scalable Hashing)
** The hashing algorithm used to calculate the location of objects instead of looking them up
** This significantly reduces the load on the MDSs
** Responsible for automatically moving data when OSDs are added or removed (can be simplified as ''location = CRUSH(filename) % num_servers''; see the placement sketch after this list)
** The CRUSH paper on Ceph's website can be [http://ceph.com/papers/weil-crush-sc06.pdf viewed here]
* RADOS (Reliable Autonomic Distributed Object-Store) is the object store for Ceph
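The ''location = CRUSH(filename) % num_servers'' simplification above can be sketched a little more faithfully. The real CRUSH is a hierarchical, weighted pseudo-random mapping (see the linked paper); the sketch below is '''not''' CRUSH, just rendezvous hashing, but it shows the core idea that placement is computed from the object name and the current set of OSDs rather than looked up, and that changing the OSD set only moves a fraction of the objects.

<syntaxhighlight lang="python">
import hashlib

def score(object_name, osd_id):
    # Per-(object, OSD) pseudo-random score derived only from the names.
    h = hashlib.sha1(("%s:%s" % (object_name, osd_id)).encode()).hexdigest()
    return int(h, 16)

def place(object_name, osd_ids, replicas=2):
    # Rendezvous (highest-random-weight) hashing: rank the OSDs by their score
    # for this object and take the top ones. Adding or removing an OSD only
    # moves the objects whose top-ranked OSDs actually change.
    ranked = sorted(osd_ids, key=lambda osd: score(object_name, osd), reverse=True)
    return ranked[:replicas]

osds = ["osd0", "osd1", "osd2", "osd3"]
print(place("inode42.0", osds))                  # same answer on every client, no lookup
print(place("inode42.0", osds + ["osd4"]))       # most placements survive the change
</syntaxhighlight>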
= Chubby =
Chubby is a coarse-grained lock service, built and used internally by Google, that serves a large number of clients with a small number of servers (a Chubby cell).
== System Components ==
* Chubby Cell
** Handles the actual locks
** Typically consists of five servers, known as replicas
** A consensus protocol ([https://en.wikipedia.org/wiki/Paxos_(computer_science) Paxos]) is used to elect the master from among the replicas
* Client
** Used by programs to request and use locks
== Main Features ==
* Implemented as a semi-POSIX-compliant file system with a 256 KB file-size limit
** Permissions apply to files only, not directories, which breaks some compatibility
** Trivial for programs to use, since the client library exposes a familiar open/read/write-style file interface
* Uses a consensus algorithm among the set of servers to agree on which replica is the master in charge of the metadata
* Meant for locks that last hours or days, not seconds (thus, "coarse grained")
* A master server can handle tens of thousands of simultaneous connections
** This can be further improved by using caching servers, since most of the traffic consists of keep-alive messages
* Used by GFS to elect its master server (a sketch of this pattern follows this list)
* Also used by Google as a name server
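The GFS master-election use above follows a simple pattern: every candidate tries to acquire the same lock file, and the winner writes its identity into a small file so that everyone else can find it. Below is a minimal sketch of that pattern; ''ChubbyLikeCell'', its methods and the paths are invented stand-ins (the real Chubby client library has its own API), so this only illustrates the idea, not Google's implementation.

<syntaxhighlight lang="python">
import threading

class ChubbyLikeCell:
    """In-memory stand-in for a Chubby cell: exclusive locks plus small files."""
    def __init__(self):
        self._locks = {}            # lock path -> owner
        self._files = {}            # file path -> contents
        self._guard = threading.Lock()

    def try_acquire(self, path, owner):
        # Coarse-grained, exclusive lock: the first caller wins and keeps it.
        with self._guard:
            if path not in self._locks:
                self._locks[path] = owner
                return True
            return False

    def set_contents(self, path, data):
        self._files[path] = data

    def get_contents(self, path):
        return self._files.get(path)

def run_candidate(name, cell):
    if cell.try_acquire("/ls/example/gfs-master-lock", name):
        # The winner advertises itself; this is how the lock service doubles as a name server.
        cell.set_contents("/ls/example/gfs-master", name)
    # Winner or not, every candidate can now look up the current master.
    return cell.get_contents("/ls/example/gfs-master")

cell = ChubbyLikeCell()
print(run_candidate("replica-a", cell))   # replica-a acquires the lock and becomes master
print(run_candidate("replica-b", cell))   # replica-b loses the race but learns the master
</syntaxhighlight>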
== Issues ==
* Due to the use of Paxos, the Chubby cell is limited to only five servers. While this limits fault tolerance, in practice this is more than enough.
* Since it has a file-system client interface, many programmers tend to abuse the system and need education (even inside Google)