DistOS 2015W Session 7
= Ceph =
Unlike GFS, which was discussed previously, Ceph is a general-purpose distributed file system. It follows the same general model of distribution as GFS and Amoeba.
== Main Components ==
* Client
* Cluster of Object Storage Devices (OSDs)
** Stores both data and metadata; clients communicate with OSDs directly to perform I/O operations
** Data is stored in objects (variable-sized chunks)
* Metadata Server (MDS)
** Manages files and directories. Clients interact with it for metadata operations such as open and rename, and it manages each client's capabilities.
** Clients '''do not''' need to ask an MDS where data is stored, which improves scalability (more on that below; a minimal sketch of this split follows this list)
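Below is a minimal, self-contained sketch (in Python, and '''not''' Ceph's real API) of that split: the client goes to an MDS for namespace operations, but computes object locations itself and talks to OSDs directly for I/O. The <code>MDS</code>, <code>OSD</code> and <code>locate()</code> names are illustrative stand-ins.

<pre>
import hashlib

class MDS:
    """Handles namespace operations only (open, rename, permissions)."""
    def __init__(self):
        self.namespace = {}          # path -> file metadata (e.g. capabilities)

    def open(self, path, client_id):
        meta = self.namespace.setdefault(path, {"caps": set()})
        meta["caps"].add(client_id)  # grant this client a capability
        return path                  # real Ceph would also return striping info


class OSD:
    """Stores objects; clients talk to it directly for reads and writes."""
    def __init__(self):
        self.objects = {}

    def write(self, obj_id, data):
        self.objects[obj_id] = data

    def read(self, obj_id):
        return self.objects.get(obj_id)


def locate(obj_id, osds):
    """Hash-based placement: no lookup table and no MDS involvement."""
    h = int(hashlib.sha1(obj_id.encode()).hexdigest(), 16)
    return osds[h % len(osds)]


mds, osds = MDS(), [OSD() for _ in range(4)]
path = mds.open("/logs/app.log", client_id="client-1")   # metadata: via MDS
locate(path + ".0", osds).write(path + ".0", b"hello")   # data: direct to OSD
print(locate(path + ".0", osds).read(path + ".0"))       # b'hello'
</pre>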
== Key Features ==
* Decoupled data and metadata
* Dynamic distributed metadata management
** Metadata is distributed among multiple metadata servers using dynamic sub-tree partitioning: directories that are used more often get their metadata replicated to more servers, spreading the load. This happens completely automatically.
* Object-based storage
** Uses the cluster of OSDs to form a Reliable Autonomic Distributed Object Store (RADOS), which handles failure detection and recovery for Ceph
* CRUSH (Controlled Replication Under Scalable Hashing)
** The hashing algorithm used to calculate where objects are located instead of looking them up
** This significantly reduces the load on the MDSs
** Responsible for automatically moving data when OSDs are added or removed (can be simplified as ''location = CRUSH(filename) % num_servers''; see the sketch after this list)
** The CRUSH paper on Ceph's website can be [http://ceph.com/papers/weil-crush-sc06.pdf viewed here]
* RADOS (Reliable Autonomic Distributed Object Store) is the object store for Ceph
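As a toy illustration of the simplified placement rule above (and '''not''' the real CRUSH algorithm), the sketch below places objects by hashing their names modulo the number of OSDs and counts how many objects move when an OSD is added. Plain modulo hashing relocates a large fraction of the objects; real CRUSH is designed to move far less data.

<pre>
import hashlib

def place(obj_name, num_osds):
    """Simplified rule: location = hash(name) % num_servers."""
    h = int(hashlib.md5(obj_name.encode()).hexdigest(), 16)
    return h % num_osds                         # index of the OSD holding the object

objects = [f"obj-{i}" for i in range(1000)]

before = {o: place(o, 4) for o in objects}      # cluster of 4 OSDs
after  = {o: place(o, 5) for o in objects}      # one OSD added

moved = sum(1 for o in objects if before[o] != after[o])
print(f"{moved}/{len(objects)} objects moved")  # most objects move under modulo
                                                # hashing; CRUSH moves far fewer
</pre>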
= Chubby =
Chubby is a coarse-grained lock service, built and used internally by Google, that serves many clients with a small number of servers (a Chubby cell).
== System Components ==
* Chubby cell
** Handles the actual locks
** Typically consists of five servers known as replicas
** A consensus protocol ([https://en.wikipedia.org/wiki/Paxos_(computer_science) Paxos]) is used to elect a master from among the replicas (a minimal sketch follows this list)
* Client
** A library used by programs to request and use locks
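The following is a minimal single-decree Paxos sketch (an illustration only; Chubby's actual implementation is considerably more involved), showing how five in-memory replicas could agree on a single value such as the identity of the master.

<pre>
class Acceptor:
    """One replica's Paxos acceptor state."""
    def __init__(self):
        self.promised = 0       # highest proposal number promised
        self.accepted_n = 0     # proposal number of any accepted value
        self.accepted_v = None  # the accepted value, if any

    def prepare(self, n):
        # Phase 1b: promise not to accept proposals numbered below n
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, v):
        # Phase 2b: accept unless a higher-numbered promise was made
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors, n, value):
    """Run one Paxos round with proposal number n; return the chosen value."""
    # Phase 1a/1b: gather promises from a majority
    promises = [a.prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None                       # no majority; retry with a higher n
    # If some acceptor already accepted a value, that value must be proposed
    prev_n, prev_v = max(granted, key=lambda t: t[0])
    if prev_v is not None:
        value = prev_v
    # Phase 2a/2b: ask the acceptors to accept the value
    accepts = sum(a.accept(n, value) for a in acceptors)
    return value if accepts > len(acceptors) // 2 else None


replicas = [Acceptor() for _ in range(5)]             # a cell of five replicas
print(propose(replicas, n=1, value="replica-3 is master"))
</pre>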
== Main Features ==
* Exposed as a semi-POSIX-compliant file system with a 256 KB file-size limit
** Permissions apply to files only, not directories, which breaks some compatibility
** Simple for programs to use; the interface resembles the standard ''fopen()'' family of calls
* Uses a consensus algorithm among a set of servers to agree on which one is the master in charge of the metadata
* Meant for locks that are held for hours or days, not seconds (hence "coarse-grained")
* A master server can handle tens of thousands of simultaneous connections
** This can be scaled further with caching/proxy servers, since most of the traffic consists of keep-alive messages
* Used by GFS to elect its master server (see the sketch after this list)
* Also used by Google as a name server
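The sketch below is a toy, in-process model of how a GFS-like service could use a Chubby-style lock file for master election. The class and method names (<code>ChubbyCell</code>, <code>try_acquire()</code>, the lock path) are assumptions for illustration only, not Chubby's real API.

<pre>
class LockFile:
    """A small (< 256 KB) file that doubles as an advisory exclusive lock."""
    def __init__(self):
        self.holder = None
        self.contents = b""

    def try_acquire(self, client):
        if self.holder is None:
            self.holder = client
            return True
        return False

    def release(self, client):
        if self.holder == client:
            self.holder = None


class ChubbyCell:
    """Stands in for the five-replica cell; exposes a file-like namespace."""
    def __init__(self):
        self.files = {}

    def open(self, path):
        return self.files.setdefault(path, LockFile())


def elect_master(cell, candidate):
    """Every candidate tries the same lock file; the winner records its address."""
    lock = cell.open("/ls/cell/gfs-master-lock")   # hypothetical lock path
    if lock.try_acquire(candidate):
        lock.contents = candidate.encode()         # winner advertises itself
        return candidate
    return lock.contents.decode()                  # losers learn who the master is


cell = ChubbyCell()
print(elect_master(cell, "server-a:7000"))  # server-a:7000 (wins the lock)
print(elect_master(cell, "server-b:7000"))  # server-a:7000 (reads the master)
</pre>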
== Issues ==
* Because the replicas must run a consensus protocol (Paxos, at the time essentially the only well-known algorithm for the problem), a Chubby cell is limited to a small number of servers (five). While this limits fault tolerance, in practice it is more than enough.
* Since it presents a file-system interface to clients, many programmers tend to abuse the system and need to be educated about its intended use (even inside Google)