COMP 3000 Essay 1 2010 Question 11: Difference between revisions

From Soma-notes
Npradhan (talk | contribs)
Revision as of 07:55, 15 October 2010

Question

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

Answer

Introduction

Each year we face growing storage needs as the world's information increases exponentially. Businesses are increasingly choosing to archive and retain all the data they produce, and "store everything, forever" (Dell, 2010)1 is the common mantra of storage administrators. The storage industry has kept up with this demand through matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has hardly changed since the 1950s. The dominant storage mechanism is still block-based storage technology.

Innovation in storage technology is especially pertinent to businesses that use network storage. The two dominant network storage technologies, storage area network (SAN) and network-attached storage (NAS), each have their own benefits and drawbacks, and both would benefit greatly from improvements in storage technology. Specifically, improvements that provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions would be ideal.

Object-Based Storage Devices (OSD) address these issues by design. They store objects that consist of both data and metadata; each object carries a unique identifier and is accessed through defined methods such as read and write. The devices also handle the underlying security, space allocation and basic storage routines.2 This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, integrity checking through content-derived hash keys, and benefits in management and business intelligence from rich metadata, OSD can be seen as a viable way to improve the standard SAN and NAS architectures.

Overview of Block-Based Storage

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.3 Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.4 This interface simply allows the operating system to read or write blocks on the disk. As a result, the work of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into one or more blocks on the disk to write to. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
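The division of labour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 512-byte block size, the class names `BlockDevice` and `ToyFilesystem`, and the naive in-order allocator are all hypothetical and do not correspond to any real driver or filesystem. The point is that the device sees only numbered blocks, while the filesystem alone knows which blocks belong to which file.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed value for illustration


class BlockDevice:
    """A toy disk that exposes only read_block and write_block."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, lba):
        return self.blocks[lba]

    def write_block(self, lba, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block write must be exactly BLOCK_SIZE bytes")
        self.blocks[lba] = bytes(data)


class ToyFilesystem:
    """Maps file names to block numbers; the device knows nothing about files."""

    def __init__(self, device):
        self.device = device
        self.block_map = {}  # file name -> list of block numbers
        self.sizes = {}      # file name -> length in bytes
        self.next_free = 0   # naive allocator: hand out blocks in order

    def write_file(self, name, data):
        lbas = []
        for offset in range(0, len(data), BLOCK_SIZE):
            # Pad the final chunk so every write is a whole block.
            chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            self.device.write_block(self.next_free, chunk)
            lbas.append(self.next_free)
            self.next_free += 1
        self.block_map[name] = lbas
        self.sizes[name] = len(data)

    def read_file(self, name):
        raw = b"".join(self.device.read_block(lba) for lba in self.block_map[name])
        return raw[:self.sizes[name]]  # strip the padding added on write
```

A 700-byte file written through `ToyFilesystem.write_file` occupies two blocks, and all knowledge of that mapping lives in `block_map` rather than on the device.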

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions" (Bandulet, 2007)2. This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these dated commands.

Overview of Object-Based Storage

Object-based storage is much newer than block-based storage: research on it started in the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto the physical storage medium. These objects also have metadata and access controls immediately associated with them. This allows the filesystem to work at a higher level of abstraction, which allows for much more flexibility and in turn gives rise to numerous capabilities not present in block-based storage. This is important because the needs placed on filesystems have changed. As we compare object-based storage with block-based storage, we will see that the design of objects is better suited than blocks to the needs of today's filesystems, especially networked filesystems.
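To make the object interface concrete, here is a minimal in-memory sketch. It is not the ANSI T10 OSD command set; the class name `ObjectStorageDevice`, its method signatures, and the attribute names are assumptions chosen only to mirror the four operations named above (create, read, write, destroy) plus per-object metadata.

```python
import itertools


class ObjectStorageDevice:
    """Toy device exposing create/read/write/destroy on variable-sized objects."""

    def __init__(self):
        self._objects = {}              # object ID -> bytearray of data
        self._metadata = {}             # object ID -> dict of attributes
        self._ids = itertools.count(1)  # source of unique object IDs

    def create(self, **attributes):
        oid = next(self._ids)
        self._objects[oid] = bytearray()
        self._metadata[oid] = dict(attributes)
        return oid

    def write(self, oid, offset, data):
        obj = self._objects[oid]
        if offset > len(obj):
            obj.extend(bytes(offset - len(obj)))  # zero-fill any gap
        obj[offset:offset + len(data)] = data     # the object grows as needed

    def read(self, oid, offset, length):
        return bytes(self._objects[oid][offset:offset + length])

    def get_attributes(self, oid):
        return dict(self._metadata[oid])

    def destroy(self, oid):
        del self._objects[oid]
        del self._metadata[oid]
```

The caller never names a block: it passes an object ID and a byte range, and the device decides where the bytes live.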

Changing Storage Needs

Storage needs have changed significantly since the first hard disks were developed in the 1950s and the interface was standardized in the 1970s. The functionality of storage devices must change to reflect these needs. Storage has become increasingly networked, and networked storage must deal with several issues. Firstly, the storage architecture must be able to scale to terabytes (10^12 bytes), petabytes (10^15 bytes) and beyond with many servers and clients while avoiding bottlenecks. Bottlenecks in networks are easier to avoid if data storage is distributed rather than centralized. The data stored on these networks has also become more sensitive. Personal information, such as financial records, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has increased, it has become more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues by design.

Comparison of object and block based stores

Scalability

Scalability is very important for large businesses that need to manage large data centers. Managing metadata while preserving data access speed as the system grows is paramount.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or the remapping of blocks for archiving or duplication. Building systems that scale with the metadata becomes a major issue, yet the current speeds of block-based storage must be maintained at the same time.

NAS coordinates the interface between file-level access and clients. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.5 All data traffic must flow through this single access point. The benefits of NAS come from its ability to manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, passing all the data through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed at different layers of the storage system, and applications individually add their own policy and management metadata, so this information is spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, offer file systems that are distributed but provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can or should be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern.6 Object stores can make user privilege management a much more manageable task, since each object is aware of who is allowed to access it.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and better scalability of metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.1 This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

Integrity

Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hampers scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The affected block may not even contain any data. Such errors usually occur during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only has to manage objects. An OSD achieves this by managing its metadata and maintaining internal copies of that metadata. Hence, OSDs know their object layout even when groups of objects are spread across different OSDs. In this way OSDs know which space is used or unused and can scan for and correct errors without losing data. When a file or a number of files cannot be recovered, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover data for a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from, and unique to, the contents of the file. The file can therefore be checked against its hash key to ensure integrity and guard against data corruption. The hash key can also be used in disk management to quickly detect and flag duplicate data.1
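The two uses of a content-derived hash key described above, integrity checking and duplicate detection, can be sketched as follows. The source does not say which hash function OSD products use, so SHA-256 is an assumption here, and the function names `content_key`, `verify` and `find_duplicates` are hypothetical.

```python
import hashlib


def content_key(data):
    """Return a hex digest derived only from the object's contents."""
    return hashlib.sha256(data).hexdigest()


def verify(data, stored_key):
    """True if the data still matches the key recorded when it was stored."""
    return content_key(data) == stored_key


def find_duplicates(objects):
    """Group object names by content key; return groups with more than one member."""
    by_key = {}
    for name, data in objects.items():
        by_key.setdefault(content_key(data), []).append(name)
    return [sorted(names) for names in by_key.values() if len(names) > 1]
```

If even one byte of an object changes, `verify` fails against the stored key, and two objects with identical contents always land in the same group regardless of their names.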

Security

Security is an issue that must be confronted in all modern storage networks. Security issues come in a wide variety of types, so they can be difficult to deal with. Both SAN and NAS have a variety of ways of handling security, but an object-based approach can make the implementation of security measures more effective and easier to manage.

SANs have traditionally run on Fibre Channel.7 For the sake of security, running a SAN on Fibre Channel helps to isolate its network, since its devices do not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and logical unit number (LUN) masking are typical security measures in SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and their members are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the type of device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical units identified by logical unit numbers; hence the term LUN masking.8
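As a rough illustration of how the two checks combine, the sketch below models zoning as sets of ports that may talk to each other and LUN masking as a per-host list of visible LUNs on an array. The table contents and names (`host_a`, `array_1`, `zone_db`) are hypothetical; real SANs identify ports by World Wide Names and configure these tables through vendor-specific tools.

```python
# Hypothetical zone table, as a switch might hold it: zone name -> member ports.
ZONES = {
    "zone_db": {"host_a", "array_1"},
    "zone_mail": {"host_b", "array_1"},
}

# Hypothetical mask table, as a disk array controller might hold it:
# array -> host -> set of LUNs that host is allowed to see.
LUN_MASKS = {
    "array_1": {"host_a": {0, 1}, "host_b": {2}},
}


def same_zone(initiator, target):
    """Switch-level check: do the two ports share at least one zone?"""
    return any(initiator in members and target in members
               for members in ZONES.values())


def lun_visible(array, host, lun):
    """Array-level check: is this LUN unmasked for this host?"""
    return lun in LUN_MASKS.get(array, {}).get(host, set())


def can_access(host, array, lun):
    """A request succeeds only if both the switch and the array permit it."""
    return same_zone(host, array) and lun_visible(array, host, lun)
```

Note that neither check involves the data itself: once a host can see a LUN, every block on it is equally accessible, which is the granularity limit the object-based approach addresses.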

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations much like those of other traditional OSs, which can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSDs handle security requests directly. The set of protocols used by OSD gives it a fair amount of flexibility in controlling access. Clients can access an OSD by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.9 This can prevent a wide range of potential attacks, which gives OSD systems an advantage over block-based systems.
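The idea of a capability can be sketched with a keyed hash. This is not the OSD security protocol itself, which the cited paper specifies in far more detail; it is a simplified illustration that assumes a secret shared between a security manager and the device, and the names `issue_capability`, `device_allows` and the `rights` string format are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret known only to the security manager and the storage device.
SHARED_SECRET = b"example-secret-not-for-real-use"


def _message(osd_name, partition_id, object_id, rights):
    return f"{osd_name}|{partition_id}|{object_id}|{rights}".encode()


def issue_capability(osd_name, partition_id, object_id, rights):
    """Security manager side: sign the object tuple and the permitted rights."""
    msg = _message(osd_name, partition_id, object_id, rights)
    tag = hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()
    return {"osd": osd_name, "partition": partition_id, "object": object_id,
            "rights": rights, "tag": tag}


def device_allows(cap, operation):
    """Device side: recompute the tag, then check the requested operation."""
    msg = _message(cap["osd"], cap["partition"], cap["object"], cap["rights"])
    expected = hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cap["tag"]):
        return False  # the capability was forged or altered
    return operation in cap["rights"].split(",")
```

A client holding a read-only capability cannot upgrade it to write access, because changing the `rights` field invalidates the tag and the client does not know the secret needed to recompute it.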

Real World Implementation

Ceph is an example of a real-world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions.10 Since Ceph is based on OSDs, it takes advantage of the ability of clients to interact directly with the devices, which avoids the traditional performance bottlenecks caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls, Ceph can allow this direct access safely, unlike other network storage architectures.
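One way to see why direct access removes the central bottleneck is that a client can compute where an object lives instead of asking a head node. The sketch below uses a plain hash-modulo mapping purely for illustration; it is not Ceph's placement algorithm (Ceph uses its own function, CRUSH, which is considerably more sophisticated), and the device names and the `locate` function are hypothetical.

```python
import hashlib

DEVICES = ["osd0", "osd1", "osd2", "osd3"]  # hypothetical device names


def locate(object_name, devices=DEVICES):
    """Pick a device from the object name alone, so no central lookup is needed."""
    digest = hashlib.sha256(object_name.encode()).digest()
    # Interpret the first 8 bytes of the digest as an integer and map it to a device.
    return devices[int.from_bytes(digest[:8], "big") % len(devices)]
```

Every client that runs `locate` on the same name arrives at the same device independently, so requests spread across the devices rather than funnelling through one access point.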

Conclusion

Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. However, there remain challenges to its adoption in industry. One is that OSD is currently needed only in high-end business solutions, which keeps it from reaching smaller businesses.11 As newer features are added and the standards mature, we can expect increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data through the rich metadata of objects, the security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancy will together make object storage an attractive choice for storage administrators in the future.

References

1 Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

2 C. Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

3 IBM 350 disk storage unit, IBM Archives. [online] IBM Available at: <http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html> [Accessed 14 October 2010].

4 M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

5 TechRepublic Guest Contributor, Foundations of Network Storage, Lesson Two: NAS. [online] Available at <http://articles.techrepublic.com.com/5100-22_11-5841266.html> [Accessed 14 October 2010].

6 Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

7 J. Tate, F. Lucchese, R. Moore. Introduction to Storage Area Networks. [online] Available at <http://www.redbooks.ibm.com/redbooks/pdfs/sg245470.pdf> [Accessed 14 October 2010].

8 H. Yoshida. LUN Security Considerations for Storage Area Networks. [online] Available at <http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf> [Accessed 14 October 2010].

9 M. Factor, D. Nagle, D. Naor, E. Riedel, J. Satran, 2005. The OSD Security Protocol. [online] Available at <http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf> [Accessed 14 October 2010].

10 S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proc. OSDI, 2006. [online] Available at: <http://www.usenix.org/events/osdi06/tech/full_papers/weil/weil_html/> [Accessed 14 October 2010].

11 M. Factor, K. Meth, D. Naor, O. Rodeh, J. Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].