Talk:COMP 3000 Essay 1 2010 Question 11

Last minute changes

Ok guys, so it's due early tomorrow. We have the essay pretty much completed aside from a few things.

First. Are we getting rid of the headings? Other groups have them in at the moment, I know the prof said the essay should read as if they weren't there but it might not hurt for them to be there.

Second. The essay needs to flow better. Some intro and outro sentences acknowledging the next section and referring to the previous ones would be nice.

Otherwise, what else remains? --Smcilroy 23:12, 14 October 2010 (UTC)

I'm trying to clean up the references. Is this format acceptable? --Dagar 23:45, 14 October 2010 (UTC)

Yes, that looks a lot better --Smcilroy 00:34, 15 October 2010 (UTC)
I think we can keep some of the main headings, but I don't think we need them all. I think the real meat of the essay is in the comparisons with networked storage like NAS and especially SAN, so those sections should probably have headings of some kind. I also agree on the flow needing some work, some of the sections have a bit of overlap.
Anil mentioned to me today an example of a networked file system based on object store devices: Ceph. Here is the full paper on the system. I was thinking it might be worth mentioning at least, maybe even giving it a small section, just so we get in a real-world example of this technology. What do you guys think?
--Mbingham 01:56, 15 October 2010 (UTC)
Here's a quick example section. I know this is pretty last minute, but what do you guys think?
Ceph is an example of a real world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions. (insert reference to paper) Since Ceph is based on OSDs, it takes advantage of the ability for clients to interact directly with the devices, which avoids the traditional bottlenecks to performance caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls it can allow this direct access safely, unlike other network storage architectures.
--Mbingham 02:09, 15 October 2010 (UTC)
Also (sorry for all the comments), where does the first sentence of the Security section come from? It sounds like something that should be referenced, and seems kind of out of place because I don't think those four "quadrants" are brought up again?
--Mbingham 02:11, 15 October 2010 (UTC)
Ok if Anil mentioned it, it's probably a good idea to include it, maybe after the 3 comparisons. I got an email back from Anil and he said that headings are OK as long as they add to the essay. So I think we can leave them in. --Smcilroy 02:30, 15 October 2010 (UTC)
Cool, I added the section in. --Mbingham 02:39, 15 October 2010 (UTC)
The four quadrants thing is something I came up with because that's how I visualized it. You can imagine how secure something is with some points mapped on those quadrants (external, internal, malicious, accidental). I was trying to point out the strength of an OSD's security with this analogy but I guess it didn't flow well.
--Myagi 23:38, 15 October 2010 (UTC)

Tightening up the Intro

Hey everyone,

I think it might be useful to re-work the intro a bit so that it better represents the direction the essay has taken since then. Here's a quick mockup of a reworked intro. It could be expanded on in some parts and worked on, etc. I would like any comments on whether you guys think this better represents the essay, or what you think needs changing in the introduction. Here it is:

Storage needs have evolved over the past 60 years, and as a result the functionality expected from filesystems and storage solutions has evolved as well. The low level interface that a storage device implements, however, has remained mostly the same. A block based interface is still the most common mechanism for accessing storage devices. Recently, however, especially with the growth of networked storage architectures such as NAS and SAN, this interface needs to be reworked to accommodate changing needs. Object based storage is increasingly becoming an attractive alternative to block based storage. The design of object based storage devices (OSDs), which store objects rather than blocks, easily associates data with metadata. Objects can be created, destroyed, read from, and written to, and each carries a unique ID. The device itself manages the physical space and can handle security on a per-object level. A storage network which is based on OSDs can provide better scalability without bottlenecks, better security with per-object access controls, and better integrity with unique hash keys. In this way, the OSD interface is looking increasingly attractive as a building block for filesystems, especially in the context of networked storage.

I think the main thing is that it brings up networked storage earlier and puts a bit more focus on it. I think the main arguments for object based storage are its applicability to large storage networks and the advantages it has over block based architectures. For this reason I think the intro should put a bit more focus on it. Does that make sense? Any comments or suggestions you guys have are welcome.

--Mbingham 21:18, 14 October 2010 (UTC)

I know what you mean, putting a focus on network storage is a good idea. Let me see if I can add your suggestions to the intro and maybe combine the two.--Smcilroy 23:12, 14 October 2010 (UTC)

Wikipedia Sources

I think we may want to replace the references to Wikipedia with something more authoritative. This massive PDF from IBM supports the idea that Fibre Channel is the dominant infrastructure of SANs, but I'm not sure if it mentions how that is changing.

The Wikipedia page for LUN masking has this as its reference for the definitions; there's also this Microsoft article and this paper from Hitachi. I'm not sure which of these is most relevant since I just did a quick Google search and haven't really read up on LUN masking or zoning, so someone else would probably be better suited to decide which one, if any, to use.

How does that sound to everyone?

--Mbingham 02:55, 14 October 2010 (UTC)

I agree, the Wikipedia references need to go. Whoever included those references should be able to find alternate sources from the ones you gave. --Smcilroy 17:45, 14 October 2010 (UTC)

Some Sourcing Issues and Other Stuff

Just a reminder: if we're taking direct quotes from a source, they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple of sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--Mbingham 19:32, 12 October 2010 (UTC)

Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
--Myagi 15:47, 12 October 2010 (UTC)
No problem, it's just something to watch out for. I'll integrate it with the other section.
Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
--Mbingham 20:02, 12 October 2010 (UTC)
I apologize for the delay, this has been an easy thing to neglect during a busy week. What's the proper way to reference with this wiki? --Dagar 21:29, 13 October 2010 (UTC)
Check out this reference guide; it explains how to reference any material you find online. Harvard System of Reference --Smcilroy 22:46, 13 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that more people aren't tagging the document outline and volunteering responsibilities is kind of unnerving...

--Myagi 07:57, 13 October 2010 (UTC)

I'm going to expand the scalability and integrity sections. Then once the security section is done, I think that just leaves the section on the OSD standard and future plans for the tech. Then in the conclusion we can recap. --Smcilroy 22:54, 13 October 2010 (UTC)

Sounds like a plan. I'll clean up/expand what I have written and get started with some initial stuff for the object sections. Anyone else is welcome to expand and edit as well.
--Mbingham 00:44, 14 October 2010 (UTC)

Essay Format and Assigned Tasks

So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit. Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think? --Smcilroy 15:16, 10 October 2010 (UTC)

Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:
  • Overview and history of block-based storage -Mbingham (I added a useful diagram here -Npradhan)
  • Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
  • Networked storage architectures: SAN and NAS -Smcilroy
  • How storage needs have changed since the development of block-based storage -Npradhan
(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)
  • Overview and History of object-based storage -Npradhan
  • Object-based storage standards (ANSI OSD specification)
  • Object-based storage applied to networked storage -dagar
Comparison of object and block based stores focusing on:
  • Scalability -Myagi
  • Integrity -Myagi
  • Security -Myagi
  • Conclusion -Smcilroy
Also, it would probably be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There are 11 or 12 sections there, and I think there are six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
--Mbingham 16:45, 10 October 2010 (UTC)
Good plan, I took Scalability and Integrity comparisons of object and block stores.
--Myagi 13:26, 10 October 2010 (UTC)

Initial Outline

Introduction

  • Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
  • What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

Block based storage

  • NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages

- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead

Disadvantages

- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High-latency protocols, using TCP/IP, that clog LANs
- Not suitable for data transfer intensive apps 
  • SAN filesystem is a local network of multiple devices that operate on disk blocks and provides a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages

- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism

Disadvantages

- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate
  • OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
  • Block storage has limitations that have become more apparent as demand for scalability and security has grown

Overview of OSD

  • An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.
  • ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
  • The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications


Scalability

  • Metadata is associated and stored directly with data objects and carried between layers and across devices
  • Space allocation delegated to storage device
  • Server has reduced overhead and processing, allowing larger clusters of storage

Integrity

  • OSDs have knowledge of their object layout
  • Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know which space is unused in this way
- Can scan and correct errors without losing data
  • OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently
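
To make the byte-range point concrete, here is a minimal Python sketch of how a device that knows its own object layout could find and repair only the damaged range. Everything in it is an assumption for illustration: the CheckedObject class, the tiny CHUNK size, the use of CRC32 checksums, and repairing from a replica are not taken from the OSD specification or from any of our sources.

```python
import zlib

CHUNK = 4  # hypothetical chunk size, kept tiny so the example is easy to follow


class CheckedObject:
    """An object that keeps one checksum per fixed-size byte range."""

    def __init__(self, data):
        self.data = bytearray(data)
        # One CRC32 per chunk, computed when the object is written.
        self.sums = [
            zlib.crc32(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)
        ]

    def damaged_ranges(self):
        """Return (start, end) byte ranges whose checksum no longer matches."""
        bad = []
        for idx, expected in enumerate(self.sums):
            start = idx * CHUNK
            end = min(start + CHUNK, len(self.data))
            if zlib.crc32(bytes(self.data[start:end])) != expected:
                bad.append((start, end))
        return bad

    def repair(self, replica):
        """Copy only the damaged ranges back from a good copy (assumed to exist)."""
        for start, end in self.damaged_ranges():
            self.data[start:end] = replica[start:end]
```

The point of the sketch is that only the bytes in the bad range get restored, rather than the whole object or the whole file system.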

Security

  • Suited for network based storage
  • Associate security attributes directly with data object
  • Security requests handled directly by storage device
  • Computer system can access OSD device by providing cryptographically secure credentials (a capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
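
As a rough illustration of the capability idea, here is a minimal Python sketch in which a capability is signed with a key the device also holds, so the device can validate each request on its own. The function names, the dictionary layout, the "r"/"w" rights string, and the use of HMAC-SHA256 with a shared secret are all assumptions for the example; the actual OSD standard defines its own credential format.

```python
import hashlib
import hmac


def issue_capability(secret, object_id, rights):
    """Hypothetical security manager: sign (object_id, rights) with the shared key."""
    msg = f"{object_id}:{rights}".encode()
    tag = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def device_allows(secret, cap, object_id, op):
    """Hypothetical device-side check: recompute the tag, then check object and right."""
    msg = f"{cap['object_id']}:{cap['rights']}".encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, cap["tag"])  # capability was not tampered with
        and cap["object_id"] == object_id          # capability is for this object
        and op in cap["rights"]                    # requested operation is granted
    )
```

A client holding a read-only capability for object 42 would pass the check for a read of object 42, while a write, a request for a different object, or a capability with edited rights would be rejected by the device itself.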

Conclusion

  • Reiteration of thesis statement

--Myagi 18:15, 7 October 2010 (UTC)


Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--Mbingham 02:31, 8 October 2010 (UTC)

It's all good.
--Myagi 10:00, 8 October 2010 (UTC)
This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.
For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.
Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course anyone else is welcome to as well.
--Mbingham 02:31, 8 October 2010 (UTC)

Quick Overview

So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it, but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, fixed-length sequences of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable-sized, and they can be created, read, written, and deleted. The device itself, rather than the OS, handles mapping these objects to blocks and all the issues that come with that.
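
To make the difference between the two interfaces concrete, here is a rough Python sketch. It is only a toy model under stated assumptions: the class names, the 512-byte BLOCK_SIZE, the whole-object write, and the simple free list are all made up for illustration and are not how any real device or the OSD command set works.

```python
BLOCK_SIZE = 512  # assumed block size for the sketch


class BlockDevice:
    """Block interface: fixed-size blocks addressed by number, nothing else."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        assert len(data) == BLOCK_SIZE
        self.blocks[n] = data


class ObjectDevice:
    """Object interface: variable-sized objects; the device maps them to blocks."""

    def __init__(self, num_blocks):
        self._disk = BlockDevice(num_blocks)
        self._free = list(range(num_blocks))  # space management lives in the device
        self._objects = {}  # object id -> (block numbers, length in bytes)
        self._next_id = 1

    def create(self):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = ([], 0)
        return oid

    def write(self, oid, data):
        # Replaces the whole object; old blocks go back on the free list first.
        old_blocks, _ = self._objects[oid]
        self._free.extend(old_blocks)
        blocks = []
        for off in range(0, len(data), BLOCK_SIZE):
            chunk = data[off:off + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            n = self._free.pop(0)
            self._disk.write_block(n, chunk)
            blocks.append(n)
        self._objects[oid] = (blocks, len(data))

    def read(self, oid):
        blocks, length = self._objects[oid]
        raw = b"".join(self._disk.read_block(n) for n in blocks)
        return raw[:length]  # drop the padding in the last block

    def delete(self, oid):
        blocks, _ = self._objects.pop(oid)
        self._free.extend(blocks)
```

With the block interface the OS has to decide which block numbers belong to which file; with the object interface the caller only ever sees an object ID, and the allocation bookkeeping stays inside the device.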

Here's some papers that give an overview of object-based storage:

Object Storage: The Future Building Block for Storage Systems

Object-Based Storage

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--Mbingham 23:56, 1 October 2010 (UTC)

Some more links

I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

Object Storage Overview

and

File Systems for OSD's

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

The advantages of OSD's

--Myagi 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--Myagi 10:42, 6 October 2010 (UTC)

I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?
I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?
--Mbingham 01:55, 7 October 2010 (UTC)
You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
--Dagar 12:59, 7 October 2010 (UTC)
Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. T10 OSD Working Drafts. But then again I'm probably misunderstanding something...
--Myagi 10:08, 7 October 2010 (UTC)
I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
--Mbingham 15:44, 7 October 2010 (UTC)


I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
--Myagi 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--Myagi 18:15, 7 October 2010 (UTC)

(moved Myagi's outline to top of page) --Mbingham 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems: http://dsc.sun.com/solaris/articles/osd.html http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--Npradhan 23:45, 9 October 2010 (UTC)

Other

-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks