COMP 3000 Essay 1 2010 Question 11
Question
Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.
Answer
Introduction
Storage needs grow every year as the world's information increases exponentially and businesses increasingly choose to archive and retain all the data they produce. The storage industry has kept up with demand through matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has remained essentially unchanged since the 1950s. The dominant storage mechanism is still block-based storage technology. This has been sufficient for most needs of modern businesses, but as we enter an age where "store everything, forever" is the common mantra of storage administrators and unstructured data with little metadata is the norm, we have to look for technology that provides better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions.
Enter object-based storage. An object store holds objects, each consisting of data together with metadata that describes it. Each object carries a unique ID and is accessed through defined methods such as read and write. Object storage devices (OSDs) themselves handle the necessary low-level storage, space management, and security functions. This storage technology has the potential to address some of the problems with block-based storage.
With increased scalability, better security through per-object access control, data integrity ensured by a unique hash key for each object, and management and business intelligence benefits from rich metadata, OSD can be seen as a viable way to improve on the standard SAN and NAS architectures.
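The object interface described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not a real OSD protocol: the class names (StoredObject, ObjectStore), the method names, and the use of a random UUID as the object ID are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: a payload plus the metadata that describes it."""
    data: bytearray = field(default_factory=bytearray)
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Hypothetical in-memory stand-in for an object storage device."""

    def __init__(self):
        self._objects = {}  # object ID -> StoredObject

    def create(self, **metadata):
        # The device, not the client, picks the ID and finds the space.
        object_id = uuid.uuid4().hex
        self._objects[object_id] = StoredObject(metadata=dict(metadata))
        return object_id

    def write(self, object_id, offset, payload):
        obj = self._objects[object_id]
        end = offset + len(payload)
        # Grow the object with zero bytes if the write extends past its end.
        if end > len(obj.data):
            obj.data.extend(b"\x00" * (end - len(obj.data)))
        obj.data[offset:end] = payload

    def read(self, object_id, offset, length):
        return bytes(self._objects[object_id].data[offset:offset + length])

    def get_metadata(self, object_id):
        return dict(self._objects[object_id].metadata)


store = ObjectStore()
oid = store.create(owner="alice", content_type="text/plain")
store.write(oid, 0, b"hello object store")
print(store.read(oid, 6, 6))
print(store.get_metadata(oid))
```

The client never sees a block number. It names an object by ID and gives a byte offset within that object, and where the bytes actually live is left to the store.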
Overview of Block-Based Storage
Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[1] Hard disks store data in blocks, which are fixed-length sequences of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[2] This interface simply allows the operating system to read or write blocks on the disk. As a result, the job of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must work out which blocks on the disk to write to. The scope of a filesystem therefore extends from high-level constructs like files down to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to meet the complex expectations of a user.
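To make the division of labour concrete, here is a minimal Python sketch of a block interface and a toy filesystem on top of it. Everything in it is hypothetical: the 512-byte block size, the class names, and the first-free allocation policy are assumptions chosen for brevity, and rewriting an existing file does not reclaim its old blocks.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed value for illustration


class BlockDevice:
    """Hypothetical disk: all it offers is read and write of whole blocks."""

    def __init__(self, num_blocks):
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self._blocks[n]

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block device only accepts whole blocks")
        self._blocks[n] = bytes(data)


class TinyFilesystem:
    """Toy filesystem: maps file names to lists of block numbers."""

    def __init__(self, device, num_blocks):
        self._device = device
        self._free = list(range(num_blocks))  # block allocation lives here
        self._files = {}                      # name -> (block numbers, length)

    def write_file(self, name, data):
        blocks = []
        for start in range(0, len(data), BLOCK_SIZE):
            chunk = data[start:start + BLOCK_SIZE]
            block_no = self._free.pop(0)  # the filesystem chooses the block
            # Pad the last chunk so the device always receives a full block.
            self._device.write_block(block_no, chunk.ljust(BLOCK_SIZE, b"\x00"))
            blocks.append(block_no)
        self._files[name] = (blocks, len(data))

    def read_file(self, name):
        blocks, length = self._files[name]
        raw = b"".join(self._device.read_block(b) for b in blocks)
        return raw[:length]  # drop the padding added on write


disk = BlockDevice(16)
fs = TinyFilesystem(disk, 16)
fs.write_file("notes.txt", b"x" * 700)
print(fs.read_file("notes.txt") == b"x" * 700)
```

Note that the free list, the name-to-block map, and the file length all sit in the filesystem. The device knows nothing about files, which is the wide scope described above.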
Comparison of Object-Based and Block-Based Stores
Scalability
The Association of Storage Networking Professionals (ASNP) estimates that there were over 1 million full-time or part-time storage administrators in 2004. A survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data. Many Fortune 500 companies are known to be approaching 1 petabyte of data, which, at 100 gigabytes per drive, amounts to 10,000 individual drives. Ref.
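The drive count follows from simple division. The snippet below assumes decimal units (1 petabyte = 1,000,000 gigabytes); with binary units the figure would be somewhat higher.

```python
PETABYTE_IN_GB = 1_000_000  # decimal units: 1 PB = 10**6 GB (an assumption)
drive_capacity_gb = 100     # per-drive capacity assumed in the text

drives_needed = PETABYTE_IN_GB // drive_capacity_gb
print(drives_needed)  # 10000
```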
Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that add metadata to deal with device diversity or with the remapping of blocks for archiving or duplication. Managing this metadata becomes increasingly complex and has a major impact on scalability. We see this in both SAN and NAS systems.
Although SAN filesystems gain scalability from shared access, coordinating that shared access creates scalability problems of its own. The filesystems must coordinate the allocation of blocks, and for clients to share read-write access they must coordinate their use of data blocks through metadata.
NAS systems use metadata to map blocks into files and to manage file security within a single system. This is done through a single NAS head, which usually has thousands of gigabytes of storage behind it. Ref. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed at several layers of the storage system, and applications each add their own policy and management metadata, so metadata ends up spread throughout the system and becomes very hard to manage.
With an OSD interface, however, metadata is associated with and stored directly alongside each data object, and it is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device. Ref. This allows metadata layers to be folded together, reducing server overhead and processing, and allows for larger clusters of storage than traditional block-based interfaces.
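A small Python sketch can illustrate metadata travelling with an object and space accounting staying inside the device. The classes (DataObject, StorageDevice), the migrate helper, the device names, and the metadata keys are all hypothetical, and byte counts stand in for real capacity management.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    object_id: str
    data: bytes
    metadata: dict = field(default_factory=dict)


class StorageDevice:
    """Hypothetical OSD: stores whole objects and manages its own space."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # in bytes
        self._objects = {}

    def used(self):
        return sum(len(o.data) for o in self._objects.values())

    def put(self, obj):
        # Space accounting is the device's job, not the client's.
        existing = self._objects.get(obj.object_id)
        freed = len(existing.data) if existing else 0
        if self.used() - freed + len(obj.data) > self.capacity:
            raise OSError(f"{self.name} is full")
        self._objects[obj.object_id] = obj

    def get(self, object_id):
        return self._objects[object_id]

    def remove(self, object_id):
        return self._objects.pop(object_id)


def migrate(object_id, source, target):
    """Move an object; its metadata travels with it in the same step."""
    obj = source.get(object_id)
    target.put(obj)  # may raise if the target is full; the source is untouched
    source.remove(object_id)


fast = StorageDevice("fast-tier", capacity=1024)
archive = StorageDevice("archive-tier", capacity=4096)
fast.put(DataObject("report-2004", b"quarterly figures",
                    {"retention": "7y", "owner": "finance"}))
migrate("report-2004", fast, archive)
print(archive.get("report-2004").metadata)
```

In this model there is no separate metadata server to update when the object moves, which is the sense in which the metadata layers are folded.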
Integrity
References
[2] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.