This paper is a survey of existing approaches to naming and locating
resources in distributed file systems. We survey proposals from the past 20
years and find that while there have been many improvements in the hardware
that powers distributed file systems, only a small number of well-known
approaches to resource location and naming have emerged.
 
 
=Introduction=
The ability to name resources is important in any file system. Mapping machine-readable
identifiers to human-readable names lets users ignore how the operating
system (OS) handles file access and focus on completing their tasks.
 
In traditional file systems, users are mostly responsible for creating
meaningful file hierarchies for storing and later searching for files. Users
must be aware of file system restrictions (e.g., file name length, file size,
etc.). The
underlying file system is only in charge of moving data to or from physical
storage media. Distributed file systems offer a series of advantages to users
(e.g., increased storage space and data reliability), but must be designed
such that end-users are not aware of all the logic and processing occurring in
the background. Indeed, a distributed file system loses its appeal if the user
is required to do all the heavy lifting.
 
Take, for example, an end-user wanting to access a PDF document. In a local
file system, the user simply locates the PDF file in the file hierarchy and
retrieves it from disk. In a distributed file system, the PDF file might be
stored on a remote server, or perhaps stored multiple times on multiple
servers. The problem then becomes how to enable end-users to locate the
correct copy of a file amongst a large volume of shared data.
 
This paper focuses on two important aspects of distributed file systems: (1)
how files are named or identified uniquely; and (2) how files are found by
clients or metadata servers once they are stored in the network. We survey
distributed file systems and file system designs from as early as 1989 and as
recently as 2006. We find that there are a relatively small number of ways a
distributed file system can approach the problem of naming and locating files,
and the selected approach is always dependent on the requirements of the
system.
 
The remainder of this paper is structured as follows. We first discuss
approaches to naming resources in a distributed file system, then look at how
data can be distributed in a network and later located for client use, and
finally offer some conclusions.
 
 
=Naming Resources=
In non-distributed systems (e.g., a stand-alone desktop computer), the file
system uses an object's absolute path as the unique identifier for that object.
In practice this means that two objects cannot share the same name in the same
location (e.g., a directory such as <code>/home/dbarrera/files/</code> cannot
contain two files called <code>file1</code>). In distributed file systems,
there is an obvious need to allow multiple files with the same human-readable
name, and perhaps even the same absolute path (relative to a particular client)
as files belonging to other clients sharing storage on the system. This section
reviews methods used by existing distributed file systems to handle object
naming at a massive (sometimes global) scale.
 
Depending on the requirements of the file system (maximum number of clients,
concurrent read/writes, etc.), different approaches to naming might be taken.
Some file systems, such as Coda [[coda]], aim to mimic UNIX-like file
naming. Other systems relax POSIX-like behaviour to allow for
better scalability and speed.
 
==Local Naming==
The Sun Network File System (NFS) specifies that each client sees a UNIX file
namespace with a private root. Because each client is free to manage
its own namespace, several workstations mounting the same remote directory
might not have the same view of the files contained in that directory. However,
if file-sharing or location transparency is required, it can be achieved by
convention (e.g., users agreeing on calling a file a specific name) rather than
by design.
 
One of the first distributed file systems, the Apollo DOMAIN File System
[[apollo]], uses 64-bit unique identifiers (UIDs) for every object in the
system. Each Apollo workstation also has a UID assigned at the time of its
manufacture. When a new file is created, the UID for that file is derived from
the current time and the UID of the workstation creating it; this guarantees
uniqueness of UIDs per file without a central server assigning them.
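To make the idea concrete, the following is a minimal sketch of this style of decentralized identifier generation. The 32/32-bit field split, helper names, and counter are illustrative assumptions, not the actual DOMAIN format.

<syntaxhighlight lang="python">
import time

def make_uid(workstation_uid: int, counter: int) -> int:
    """Sketch of decentralized UID generation in the DOMAIN style.

    The real DOMAIN layout differs; here we simply pack a per-workstation
    identifier together with a timestamp and local counter so that two
    workstations can never mint the same UID without a central server.
    """
    timestamp = int(time.time())
    # 64-bit UID: high 32 bits identify the workstation, low 32 bits
    # combine time and a local counter to disambiguate rapid creations.
    return ((workstation_uid & 0xFFFFFFFF) << 32) | ((timestamp + counter) & 0xFFFFFFFF)

# Example: two different workstations create files concurrently
# and still obtain distinct UIDs.
uid_a = make_uid(workstation_uid=0x00001234, counter=0)
uid_b = make_uid(workstation_uid=0x00005678, counter=0)
assert uid_a != uid_b
</syntaxhighlight>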
 
The Andrew file system [[andrew]] uses an internal 96-bit identifier for
uniquely identifying files. These identifiers are used in the background to
refer to files, but are never shown to users. Andrew clients see a partitioned
namespace consisting of a local and a shared namespace. The shared namespace is
identical on all workstations, managed by a central server which can be
replicated. The local namespace is typically only used for files required to
boot an Andrew client, and to initialize the distributed client operation.
 
==Cryptographic Naming==
OceanStore [[oceanstore]] identifies objects at the lowest level with a
globally unique identifier (GUID). GUIDs are convenient in distributed
systems because they do not require a central authority to hand them out: any
client on the system can autonomously generate a valid GUID with a low
probability of collision (GUIDs are typically long bit strings, e.g., more than
128 bits). At the same time, autonomous, decentralized namespace management
allows malicious clients to hijack someone else's namespace and intentionally
create collisions. To address this issue, OceanStore uses a technique proposed
by Mazieres et al. [[mazieres]] called ''self-certifying pathnames''.
 
Self-certifying pathnames have all the benefits of public key cryptography
without the burden of key management, which is known to be difficult,
especially at a very large scale. One of the design goals of self-certifying
pathnames is for clients to cryptographically verify the contents of any file
on the network, without requiring external information. The novelty of this
approach is that file names inherently contain all information necessary to
communicate with remote servers. Essentially, an object's GUID is the secure
hash (SHA-1 or similar) of the object's owner's key and some human readable
name. By embedding a client key into the GUID, servers and other clients can
verify the identity and ownership of an object without querying a
third-party server.
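The following sketch captures the essence of this scheme. The exact encoding and hash construction used by OceanStore and self-certifying file systems differ; the field layout and key bytes below are assumptions for illustration only.

<syntaxhighlight lang="python">
import hashlib

def make_guid(owner_public_key: bytes, human_readable_name: str) -> str:
    """Sketch of a self-certifying object name.

    The GUID binds the object's human-readable name to its owner's public
    key, so anyone holding the GUID can check that a server really is serving
    an object published under that key, without asking a third party.
    """
    digest = hashlib.sha1(owner_public_key + human_readable_name.encode("utf-8"))
    return digest.hexdigest()

def verify_guid(guid: str, owner_public_key: bytes, human_readable_name: str) -> bool:
    """Recompute the hash and compare; no central lookup is required."""
    return guid == make_guid(owner_public_key, human_readable_name)

alice_key = b"(Alice's public key bytes)"
guid = make_guid(alice_key, "papers/distributed-naming.pdf")
assert verify_guid(guid, alice_key, "papers/distributed-naming.pdf")
</syntaxhighlight>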
 
Freenet [[freenet]] also uses keypair-based naming, but in a slightly
different way than OceanStore. Freenet identifies every file by a binary key
obtained by applying a hash function. There are three types of keys in
this distributed file system:

'''Keyword-signed key (KSK).''' This is the simplest identifier because it
is derived from an arbitrary text string chosen by the user who is storing the
file on the network. A user storing a PDF document might use the text string
"freenet/distributed/file/system" to describe the file. The string is used to
deterministically generate a private/public keypair; the public part of the key
is hashed and becomes the file identifier.
 
We note that files can be recovered by guessing or brute-forcing the text
string. Also, nothing stops two different users from choosing the same
descriptive string; the second user's file would be rejected by the system,
since it would collide in the namespace.
 
'''Signed-subspace key (SSK).''' This method enables personal namespaces
for users. To use it, a user generates a public/private keypair with a
good random number generator. The user also creates a descriptive text string,
but in this case it is XORed with the public key to generate the file key.
This method lets users manage their own namespace (collisions can still occur
locally if the user picks the same string for two files). Users can also
publish a list of keywords together with a public key if they want to make
those files publicly available.
 
'''Content-hash key (CHK).''' In this method, the file key is derived by
hashing the contents of the file. Files are also encrypted with a random
encryption key specific to that file. For others to retrieve the file, the
owner publishes the file hash along with the decryption key.
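A rough sketch of how the three key types could be derived follows. Freenet's actual construction uses specific hash and signature algorithms and differs in detail; the helper functions and the use of SHA-256 below are illustrative assumptions.

<syntaxhighlight lang="python">
import hashlib
import os

def ksk(descriptive_string: str) -> str:
    """Keyword-signed key: a keypair is derived deterministically from the
    string; here a hash of the string stands in for the derived public half,
    which is then hashed again to give the file key."""
    derived_public_part = hashlib.sha256(descriptive_string.encode()).digest()
    return hashlib.sha256(derived_public_part).hexdigest()

def ssk(user_public_key: bytes, descriptive_string: str) -> str:
    """Signed-subspace key: combine the user's public key with the
    description (the scheme XORs the two), then hash the result."""
    key_hash = hashlib.sha256(user_public_key).digest()
    desc_hash = hashlib.sha256(descriptive_string.encode()).digest()
    combined = bytes(a ^ b for a, b in zip(key_hash, desc_hash))
    return hashlib.sha256(combined).hexdigest()

def chk(file_contents: bytes) -> tuple:
    """Content-hash key: the key is a hash of the file's contents; a random
    per-file encryption key is handed out alongside it so others can decrypt
    the file after retrieval."""
    encryption_key = os.urandom(16)  # random per-file key
    file_key = hashlib.sha256(file_contents).hexdigest()
    return file_key, encryption_key

print(ksk("freenet/distributed/file/system"))
print(ssk(b"alice-public-key-bytes", "my-private-notes"))
print(chk(b"%PDF-1.4 ... contents of some document ..."))
</syntaxhighlight>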
 
==Hierarchical Naming==
Cheriton et al. [[decentralizing]] suggest naming objects using a long
name that includes multiple pieces of information: (1) the resource's name
and location on the file server where it resides; (2) the organization where
that file server is located; and (3) a global administrative domain
representing all the organizations participating in the distributed file
system. For example, a file name of <code>[edu/stanford/server4/bin/listdir</code>
is split into:
* <code>[edu</code>: the global domain
* <code>/stanford/server4</code>: the organization and server
* <code>/bin/listdir</code>: the directory and file
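As a small illustration, the sketch below parses such a globally rooted name into its routing components. The component names and the assumption that exactly one path element follows the organization are ours, not from the original paper.

<syntaxhighlight lang="python">
def parse_global_name(name: str) -> dict:
    """Split a globally rooted name like '[edu/stanford/server4/bin/listdir'
    into the pieces a client needs to route the request: the global domain,
    the organization and server, and the server-local path."""
    if not name.startswith("["):
        raise ValueError("global names start with '[' to mark the global root")
    parts = name[1:].split("/")
    if len(parts) < 4:
        raise ValueError("expected domain/organization/server/path...")
    return {
        "global_domain": parts[0],                 # e.g. 'edu'
        "organization": parts[1],                  # e.g. 'stanford'
        "server": parts[2],                        # e.g. 'server4'
        "local_path": "/" + "/".join(parts[3:]),   # e.g. '/bin/listdir'
    }

print(parse_global_name("[edu/stanford/server4/bin/listdir"))
</syntaxhighlight>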
 
This naming scheme gives clients all the information necessary (using only the
file name) to locate a file in a globally distributed file system. While this
may seem like a good solution, there are a few inherent limitations to the
proposal.
 
First, file replication and load balancing can only be done at the lowest level
(i.e., in the file server selected by the organization hosting the file). This
can lead to a bottleneck when multiple files in the same organization become
"hot". The authors suggest using caching and multicast to improve performance
and avoid congestion on inter-organization links. Second, the scheme requires
all organizations participating in the system to agree on and regulate the
common namespace, much like the current Domain Name System (DNS). For this to
work there must be a body in which every stakeholder in the system is equally
represented. While such bodies do exist today (e.g., ICANN, the Internet
Corporation for Assigned Names and Numbers, a non-profit organization that
represents regional registrars, the Internet Engineering Task Force (IETF),
and Internet users and providers in keeping the Internet secure, stable and
inter-operable), they carry large amounts of administrative overhead and
therefore limit the speed at which changes to deployed implementations can
take place.
 
One advantage of the approach of Cheriton et al. is that names and directory
structures must only be unique within an organization/server. The system as a
whole does not have to keep track of every organization-level implementation,
yet different organizations should still be able to exchange data.
 
==Metadata Servers==
The Google File System (GFS) [[googlefs]] takes a different approach to
naming files. GFS assumes that all clients communicate with a single master
server, which keeps a table mapping full pathnames to metadata (file locks and
location). The namespace is therefore centrally managed, and all clients must
register file operations with the master before they can be performed. While
this architecture has an obvious central point of failure (which can be
addressed by replication), it has the advantage of not having to manage a
distributed namespace. This central design also improves data consistency
across multi-level distribution nodes, and it allows data to be moved to
optimal nodes to increase performance or distribute load. It is worth noting
that lookup tables are a fundamentally different way of finding the contents
of a directory compared to UNIX ''inodes'' and related data structures; this
approach has inherent limitations, such as not being able to support symlinks.
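The following is a highly simplified sketch of a centralized pathname-to-metadata table of this kind. The field names, locking granularity, and chunk-location representation are assumptions for illustration, not GFS's actual data structures.

<syntaxhighlight lang="python">
import threading
from dataclasses import dataclass, field

@dataclass
class FileMetadata:
    chunk_locations: list = field(default_factory=list)       # servers holding the data
    lock: threading.Lock = field(default_factory=threading.Lock)

class MasterNamespace:
    """Sketch of a single-master namespace: full pathnames map directly to
    metadata records, with no per-directory inode-like structures."""

    def __init__(self):
        self._table = {}
        self._table_lock = threading.Lock()

    def create(self, path: str, locations: list) -> None:
        with self._table_lock:
            if path in self._table:
                raise FileExistsError(path)
            self._table[path] = FileMetadata(chunk_locations=locations)

    def lookup(self, path: str) -> list:
        """Clients ask the master where the data lives, then talk to the
        storage servers directly."""
        with self._table_lock:
            return list(self._table[path].chunk_locations)

master = MasterNamespace()
master.create("/logs/2011-02-25/web.log", ["server12", "server47", "server03"])
print(master.lookup("/logs/2011-02-25/web.log"))
</syntaxhighlight>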
 
Ceph [[ceph]] client nodes use near-POSIX file system interfaces which are
relayed back to a central metadata cluster. The metadata cluster is responsible
for managing the system-wide namespace, coordinating security and verifying
consistency. Ceph decouples data from metadata which enables the system to also
distribute metadata servers themselves. The metadata servers store pointers to
"object-storage clusters" which hold the actual data portion of the file. The
metadata servers also coordinate file read and write operations, redirecting
clients to the appropriate object storage cluster or device.
 
=Locating Resources=
==Local File Systems==
In some distributed systems, files are copied locally and replicated to remote
servers in the background. NFS [[survey]] is one example where clients
mount the remote file system locally. The remote directory structure is mapped
onto a local namespace, which makes files transparently accessible to
clients. In this scheme, there is no need for distributing indexes or metadata,
since all files appear to be local. A client can find files on the
"distributed" file system in the same way local files are found.
 
 
==Metadata Servers==
File systems that use lookup tables to store the location and metadata of
files (e.g., [[googlefs,ceph]]) can locate resources trivially by querying the
lookup table. The table usually contains a pointer either to the file itself
or to a server hosting that file, which can in turn handle the file operation
request.
 
A very basic implementation of a metadata lookup is used in the Apollo DOMAIN
File System [[apollo]]. A central name server maps client-readable strings
(e.g., <code>/home/dbarrera/file1</code>) to UIDs. The name server can be
distributed by replicating it at multiple locations, allowing clients to query
the nearest replica instead of a central server.
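A minimal sketch of such a replicated name lookup is shown below. The nearest-replica selection by measured distance and the data structures are assumptions for illustration.

<syntaxhighlight lang="python">
class NameServerReplica:
    """One copy of the name server: maps human-readable paths to UIDs."""

    def __init__(self, location: str, table: dict):
        self.location = location
        self.table = dict(table)

    def resolve(self, path: str) -> int:
        return self.table[path]

def nearest_replica(replicas, distances):
    """Pick the replica with the lowest network distance from this client."""
    return min(replicas, key=lambda r: distances[r.location])

table = {"/home/dbarrera/file1": 0x1234_5678_9ABC_DEF0}
replicas = [NameServerReplica("ottawa", table), NameServerReplica("waterloo", table)]
client_distances = {"ottawa": 5, "waterloo": 42}  # e.g., round-trip times in ms

uid = nearest_replica(replicas, client_distances).resolve("/home/dbarrera/file1")
print(hex(uid))
</syntaxhighlight>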
 
The Andrew file system [[andrew]] uses unique file identifiers to
populate a ''location database'' on the central server, which maps file
identifiers to locations. The server is therefore responsible for forwarding
file access requests to the node hosting that file.
 
==Distributed Index Search==
Freenet [[freenet]] is designed to make it difficult for unauthorized users
to access restricted files. This is a hard problem, since the system aims to
be highly distributed while still guaranteeing that files will not be read or
modified by unauthorized third parties. Freenet has developed an interesting
approach to locating files: when a file is requested from the network, the
user must first obtain or calculate the file key (see the Cryptographic Naming
section above). The user's node requests that file from a neighboring node,
which in turn checks whether the file is stored locally and, if not, forwards
the request to its own nearest neighbor. If a node cannot forward a request
any further (because a loop would be created or all of its neighbors have
already been queried), a failure message is transmitted back to the previous
node. If the file is found at some point along the request path, it is sent
back through all the intermediate nodes until it reaches the request
originator, and each intermediate node may keep a cached copy of the file. The
next time that file key is requested, a closer node might already have it,
which speeds up retrieval. Nodes "forget" cached copies of files in a least
recently used (LRU) manner, allowing the network to automatically balance load
and use available space efficiently.
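A toy sketch of this depth-first style of key-based routing with caching on the return path follows. Neighbor selection by numeric key distance and the cache size are illustrative assumptions rather than Freenet's actual routing rules.

<syntaxhighlight lang="python">
from collections import OrderedDict

class Node:
    def __init__(self, node_id: int, cache_size: int = 4):
        self.node_id = node_id
        self.neighbors = []
        self.store = OrderedDict()   # LRU cache: oldest entries first
        self.cache_size = cache_size

    def put_local(self, key: int, data: bytes) -> None:
        self.store[key] = data
        self.store.move_to_end(key)
        while len(self.store) > self.cache_size:
            self.store.popitem(last=False)   # evict least recently used

    def request(self, key: int, visited=None):
        visited = visited or set()
        visited.add(self.node_id)
        if key in self.store:                # found locally
            self.store.move_to_end(key)
            return self.store[key]
        # Forward to the unvisited neighbor whose id is "closest" to the key.
        candidates = [n for n in self.neighbors if n.node_id not in visited]
        for neighbor in sorted(candidates, key=lambda n: abs(n.node_id - key)):
            data = neighbor.request(key, visited)
            if data is not None:
                self.put_local(key, data)    # cache on the return path
                return data
        return None                          # dead end: report failure upstream

# Tiny three-node chain a <-> b <-> c, with the data stored only on c.
a, b, c = Node(10), Node(20), Node(30)
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
c.put_local(29, b"file contents")
print(a.request(29))     # first request walks a -> b -> c
print(29 in b.store)     # True: b cached the file on the way back
</syntaxhighlight>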
 
Distributing a file index was also proposed by Plaxton et al. [[nearby]].
Their proposal has all nodes in the network collectively maintain a
''virtual tree''. The tree information is distributed such that each node
knows about copies of files residing on itself and on the nodes that form the
subtree rooted at that node. Nodes are continually updated with neighbor
information, so new nodes gradually obtain the tree information needed to
become the roots of their own subtrees. This method has the advantage of
distributing load and providing a hierarchical search structure that can use
well-known algorithms (BFS, DFS) to find resources on the network.
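A schematic sketch of the idea described above is shown below. It simplifies Plaxton et al.'s actual routing scheme considerably; the upward walk and the per-node subtree index are assumptions made for illustration.

<syntaxhighlight lang="python">
class TreeNode:
    def __init__(self, name: str, parent=None):
        self.name = name
        self.parent = parent
        self.local_files = set()
        self.subtree_index = {}   # filename -> node name, for the whole subtree

    def publish(self, filename: str) -> None:
        """Store a file locally and advertise it up the tree, so every
        ancestor knows the file exists somewhere in its subtree."""
        self.local_files.add(filename)
        node = self
        while node is not None:
            node.subtree_index[filename] = self.name
            node = node.parent

    def locate(self, filename: str):
        """Walk upward until some ancestor's subtree index knows the file."""
        node = self
        while node is not None:
            if filename in node.subtree_index:
                return node.subtree_index[filename]
            node = node.parent
        return None

root = TreeNode("root")
left, right = TreeNode("left", parent=root), TreeNode("right", parent=root)
right.publish("report.pdf")
print(left.locate("report.pdf"))   # 'right': found via the shared ancestor 'root'
</syntaxhighlight>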
 
==Pseudo-random Data Distribution==
Ceph [[ceph]] distributes data through a method that maximizes bandwidth and
uses storage resources efficiently. Ceph also avoids data imbalance (e.g., new
devices being under-used) and load asymmetries (e.g., frequently requested
data placed only on new devices) with a globally known algorithm called CRUSH
(Controlled Replication Under Scalable Hashing). Using a predefined number of
''placement groups'' (the smallest unit of object placement), the CRUSH
algorithm stores and replicates data across the network in a pseudo-random
way. Because placement is computed by a deterministic algorithm, CRUSH tells
the metadata servers both where data should be stored and where it can be
found later, which helps clients and metadata servers locate resources.
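A toy sketch of deterministic, hash-based placement of this flavour appears below. It is not the actual CRUSH algorithm; the placement-group count, replica count, and seeding scheme are assumptions chosen only to show how every participant can compute the same placement without a lookup table.

<syntaxhighlight lang="python">
import hashlib

NUM_PLACEMENT_GROUPS = 128
DEVICES = [f"osd{i}" for i in range(12)]
REPLICAS = 3

def placement_group(object_name: str) -> int:
    """Objects hash deterministically into a fixed number of placement groups."""
    digest = hashlib.sha256(object_name.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PLACEMENT_GROUPS

def devices_for_pg(pg: int) -> list:
    """Pick REPLICAS distinct devices pseudo-randomly, seeded only by the
    placement group id, so any client or server computes the same answer."""
    chosen = []
    attempt = 0
    while len(chosen) < REPLICAS:
        digest = hashlib.sha256(f"{pg}:{attempt}".encode()).digest()
        device = DEVICES[int.from_bytes(digest[:4], "big") % len(DEVICES)]
        if device not in chosen:
            chosen.append(device)
        attempt += 1
    return chosen

obj = "video/lecture-01.mp4"
pg = placement_group(obj)
print(pg, devices_for_pg(pg))   # everyone computes the same location, no lookup table needed
</syntaxhighlight>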
 
=Conclusions=
This paper has presented a brief survey of distributed file system research
conducted over the past 20 years. A wide range of distributed file systems
have been designed, with varying levels of scalability, usability and
efficiency. Depending on the requirements of a distributed file system,
different approaches may be taken to address the two main concerns: file
naming and file retrieval. Unfortunately, there is no clear winner in either
category, which means that selecting the "right" method for a given file
system will always depend on the requirements and users of that system.
