COMP 3000 Essay 1 2010 Question 11: Difference between revisions

From Soma-notes
Mbingham (talk | contribs)
Smcilroy (talk | contribs)
added a small paragraph to integrity
Line 39: Line 39:
SAN's on the other hand, allow data access through fiber cables directly accessing the storage. The storage management and file system is connected separately to both the client and the storage, separating the data channel with the management channel and acts as the mediator with the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefits of shared access for scalability, coordination of this shared access leads to scalability problems. File systems must coordinate allocation of blocks. For clients to share read-write access, they must coordinate usage of data blocks through metadata. Security also must be addressed as it opens up a host of security issues as the clients must be trusted to access the data.
SAN's on the other hand, allow data access through fiber cables directly accessing the storage. The storage management and file system is connected separately to both the client and the storage, separating the data channel with the management channel and acts as the mediator with the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefits of shared access for scalability, coordination of this shared access leads to scalability problems. File systems must coordinate allocation of blocks. For clients to share read-write access, they must coordinate usage of data blocks through metadata. Security also must be addressed as it opens up a host of security issues as the clients must be trusted to access the data.


Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server and metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.<sup> [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] </sup> This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.
Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server and metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.<sup> [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]</sup> This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.


=== Integrity ===
=== Integrity ===
Line 47: Line 47:


OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used, it only worries about managing objects. This is done through managing metadata as well as maintaining internal copies of its metadata. Hence, OSDs have knowledge of its object layout even though one or more groups of objects are on different OSDs. In this way OSDs know what kind of space is being used or unused and can scan and correct errors without losing data. In the event of a failure in recovering a file or a number of files, traditional systems may have to do a complete file system restore. However, an OSDs awareness of its object layout enables it to recover data specific to a byte range and thus restore files in an efficient manner.
OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used, it only worries about managing objects. This is done through managing metadata as well as maintaining internal copies of its metadata. Hence, OSDs have knowledge of its object layout even though one or more groups of objects are on different OSDs. In this way OSDs know what kind of space is being used or unused and can scan and correct errors without losing data. In the event of a failure in recovering a file or a number of files, traditional systems may have to do a complete file system restore. However, an OSDs awareness of its object layout enables it to recover data specific to a byte range and thus restore files in an efficient manner.
OSDs have another powerful feature. Each object file has an associated hash key that is generated uniquely to the contents of the file. Thus the file can be verified for accuracy to ensure the contents remain the same and integrity to ensure the data has not been corrupted. Also it can be used for management of data to flag duplicate data.<sup> [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]</sup>


=== Security ===
=== Security ===
Line 84: Line 86:
[8] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]
[8] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]


[9] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]
[9] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].
 
[10] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]


[10] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]
[11] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]


[11] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]
[12] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

Revision as of 01:44, 14 October 2010

Question

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

Answer

Introduction

Each year we are faced with growing storage needs as the world's information increases exponentially and business' are increasingly choosing to archive and retain all the data they produce. The storage industry has been able to keep up with demand with matching increases in storage capacity. Unfortunately the interfaces between clients and storage devices has remained unchanged since the 1950's. The dominate storage mechanism is still block-based storage technology. This has been sufficient for meeting most needs of modern businesses, but as we enter an age where "store everything, forever"[1] is the common mantra of storage administrators and unstructured data with little meta-data is the norm, we have to look for technology that can provide better scalability, business intelligence, and management while ensuring security and data access speed of traditional storage solutions.

Object Based Storage Devices (OSD) solve these issues because of how they are designed. Object storage uses objects that consists of data and meta-data that describe the object. They are accessed with defined methods such as read and write and carry a unique ID. They manage all necessary low-level storage, space management, and security functions.[2] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object level access and insured integrity of data with unique hash key's for each object along with some benefits in management and business intelligence with rich meta-data, OSD can be seen as a viable alternative to improve the standard architectures of storage area network (SAN) and network-attached storage (NAS).

Overview of Block-Based Storage

Hard disks as a storage medium date back to the 1950's with the introduction of the IBM 350 disk storage unit.[3] Hard disks store data in blocks, which are a fixed length series' of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[4] This interface simply allows the operating system to read or write to blocks on the disk. This means that the goal of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left completely in the space of the operating system's filesystem. For example, when the filesystem wants to write data to a file it must translate that into what block on the disk to write to. In this way, the scope of a filesystem extends from high level constructs like files to low level constructs like blocks. This wide scope is necessary because of the simple interface presented to the filesystem that must be abstracted up to the complex expectations of a user.

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[5](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

Overview of Object-Based Storage

Anyone feel free to expand on this section

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of the interface presenting the filesystem with blocks to read and write to, the interface presents the filesystem with "objects" which it can read to, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping onto physical blocks of memory. These objects also have meta-data and access controls immediately associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems has changed, and we will see as we compare object based storage with block based storage that the design of objects are more suited to the needs of todays filesystems than blocks.

Changing Storage Needs

Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. This means that the functionality of storage devices must also change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data, massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[6] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it becomes more important to ensure the data's integrity and security. Block based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in it's design. Object based storage is more suited to address these issues because of how it has been designed.

Comparison of object and block based stores

Scalability

Today's storage systems consist of two main technologies, SAN and NAS storage. They both have their benefits and drawbacks. The key issues being managing metadata and ensuring data access speed as the systems grow.

Most block based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or remapping of blocks for archiving or duplication. Building systems to scale with the metadata becomes a major issue. But at the same time the current speeds of block-based storage needs to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients access to files. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.[7] All data traffic must flow through this single access point. The benefits of the NAS file system is through its ability to set block access, manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, this causes a bottleneck issue with all the data passing through one point. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation management on different storage system layers and applications that add policy and management metadata individually is spread throughout the system. So this results in the metadata becoming very hard to manage.

SAN's on the other hand, allow data access through fiber cables directly accessing the storage. The storage management and file system is connected separately to both the client and the storage, separating the data channel with the management channel and acts as the mediator with the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefits of shared access for scalability, coordination of this shared access leads to scalability problems. File systems must coordinate allocation of blocks. For clients to share read-write access, they must coordinate usage of data blocks through metadata. Security also must be addressed as it opens up a host of security issues as the clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server and metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device. [8] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

Integrity

Block based file systems in archive solutions usually have no built in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and scalability. The mechanisms for ensuring data integrity in OSDs have mechanisms that operate differently from block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. It may be the case that the error in a particular block may not even contain any data. This usually happens during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used, it only worries about managing objects. This is done through managing metadata as well as maintaining internal copies of its metadata. Hence, OSDs have knowledge of its object layout even though one or more groups of objects are on different OSDs. In this way OSDs know what kind of space is being used or unused and can scan and correct errors without losing data. In the event of a failure in recovering a file or a number of files, traditional systems may have to do a complete file system restore. However, an OSDs awareness of its object layout enables it to recover data specific to a byte range and thus restore files in an efficient manner.

OSDs have another powerful feature. Each object file has an associated hash key that is generated uniquely to the contents of the file. Thus the file can be verified for accuracy to ensure the contents remain the same and integrity to ensure the data has not been corrupted. Also it can be used for management of data to flag duplicate data. [9]

Security

Security threats can be thought of as having four quadrants. External, internal, accidental and malicious. Block based stores have a variety of ways for handling security but there are basic concepts that SAN and NAS technologies use to secure data.

SAN has traditionally run on fibre channels, although this is a trend that is changing. [10] For the sake of security, running a SAN on fibre channels help isolate its network as they do not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it's up to the network infrastructure and host system to handle its security.

Zoning and LUN masking are typical ways SAN systems could use as security measures. Zoning allocates a certain amount of storage to clients. These zones are isolated and are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, however, they differ in the type of devices being used. Switches utilize zoning while disk array controllers use LUN masking. A disk array controller is a device which manages the physical disk drives and interprets them as logical unit numbers. Thus, the term LUN masking. [11]

NAS has its own vulnerabilities but as with SAN, it is only as secure as the network they operate on. NAS security is conceptually simpler than SAN. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system it runs on has access control configurations much like other traditional OSs that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSD devices handle security requests directly. The set of protocols used by OSD enable it to cover the four quadrants of security threats outlined above. Clients can access an OSD device by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object. [12] This can prevent accidental or even malicious access to an OSD externally or internally.

Conclusion

Note: overall conclusions? In conclusion, object based storage devices...

References

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] IBM 350 Disk Storage Unit

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] Object-Based Storage Devices Christian Bandulet, July 2007

[6] Seagate

[7] Foundations of Network Storage

[8] Dell Object Storage Overview

[9] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[10] Storage Area Network

[11] Fibre Channel zoning

[12] IBM OSD Security Protocol Overview