Soma-notes - User contributions [en]

COMP 3000 Essay 2 2010 Question 5

2010-11-24T22:39:33Z

Myagi: /* Background Concepts */

==Paper==
'''Title:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]

'''Authors:''' Jingyue Wu, Heming Cui, Junfeng Yang

'''Affiliations:''' Computer Science Department, Columbia University

'''Supplementary Information:''' Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]

==Background Concepts==
A race condition is a system flaw that "occurs when two threads access a shared variable at the same time." [[#References | [4]]] Race conditions can be complex, time-consuming and expensive to fix. Often the most challenging part of a race condition is not fixing it but finding it: races are notorious for being difficult to find, isolate and reproduce. To limit the harm a known race causes while a proper fix is developed, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose LOOM.

LOOM is a system for working around known race conditions in live applications. Its strength is that it operates on running applications: developers write execution filters, and LOOM installs them at runtime without a restart. Execution filters are short declarations of synchronization intent, for example that two code regions must be mutually exclusive. Before installing a filter, LOOM's evacuation algorithm moves threads out of the affected code regions so that the update does not introduce new errors. The authors report that LOOM has very little performance overhead and scales well as the number of application threads increases.

The authors tested LOOM on real-world race conditions found in common applications. They report that every tested race was worked around with little performance overhead, in a scalable and easy-to-implement manner.

The paper uses several terms that the reader should know before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:

'''Deadlock:''' A deadlock typically involves two threads. Each thread tries to acquire a lock that the other thread already holds, so each waits for the other to release it. Neither thread can ever proceed.

'''Execution Filters:''' In this paper, an execution filter is a short declaration of a synchronization intent that LOOM enforces at runtime. The filters discussed here are mutual exclusion filters, which state that certain code regions must not run at the same time.

'''Hot Patches:''' "Hot patching provides a mechanism to update system files without rebooting or stopping services and processes."[[#References | [1]]]

'''Hybrid Instrumentation Engine:''' "Instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and writing trace information." [[#References | [2]]] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.

'''Lock:''' A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock operations are usually called at the beginning and end of a critical section, respectively. "Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code." [[#References | [3]]]

'''Mutex:''' Short for mutual exclusion: the guarantee that no two threads are inside the same critical section at the same time.

'''Race Condition:''' "A race condition occurs when two threads access a shared variable at the same time." [[#References | [4]]]

'''Semaphore:''' A semaphore is an integer variable that counts stored wakeups, with two operations, down and up, that generalize sleep and wakeup. The down operation checks whether the value is greater than 0 and, if so, decrements the value, using up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done as a single indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. The up operation increments the value and, if processes are sleeping on the semaphore, lets one of them complete its down. Semaphores are an essential part of solving synchronization problems. [[#References | [5]]]
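The Lock, Race Condition and Semaphore definitions above can be illustrated with a minimal Python sketch. It is not taken from the paper; the function names <code>count_with_lock</code> and <code>peak_concurrency</code> are hypothetical and chosen only for this example. The first function protects a shared counter with a mutex so that no increment is lost, and the second uses a counting semaphore to cap how many threads are inside a region at once.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads under a mutex."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # The read-modify-write below is the critical section. Without
            # the lock, two threads could read the same old value and one
            # update would be lost (a race condition).
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def peak_concurrency(num_threads=6, permits=2):
    """Let at most `permits` threads into a region using a semaphore."""
    sem = threading.Semaphore(permits)  # starts with `permits` stored wakeups
    state_lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with sem:  # "down" on entry, "up" on exit
            with state_lock:
                active += 1
                peak = max(peak, active)
            with state_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With the lock in place the final count is always <code>num_threads * increments</code>, and the observed peak never exceeds the number of permits.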

==Research problem==
===Problem being addressed===
With the rise of multicore systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are immature and proper fixes take time, this paper presents a new approach to working around races in live applications through the use of LOOM.
===Related work===
Two common solutions for fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can be unsafe and can lead to new errors because of a multithreaded application's complexity. Releasing a reliable patch takes time, and under performance or work pressure developers often resort to expedient fixes rather than placing proper locks in the application.

==Contribution==
===Current solution expressed===
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to let developers quickly deploy safe, optimized, temporary workarounds while a permanent fix is developed. LOOM is also easy to use. It is compiled into a developer's application through a compiler plugin and kept separate from the source code. The plugin injects the LOOM update engine into the application binary.

Mutual exclusion filters are written by the developer and refer to locations in the source code, keeping racy threads from running concurrently. The filter declaration is easy to understand: the developer names the code regions that need to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live.
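The idea of declaring regions mutually exclusive, rather than editing lock calls into them, can be sketched in Python. This is only an illustration under stated assumptions: it is neither LOOM's filter language nor its mechanism, and <code>FilterTable</code>, <code>install_mutex_filter</code> and the region names are hypothetical. Regions are marked once, and a filter installed later at runtime decides which of them share a lock.

```python
import threading


class FilterTable:
    """Hypothetical registry mapping region names to a shared lock."""

    def __init__(self):
        self._locks = {}  # region name -> lock shared by a filter
        self._guard = threading.Lock()

    def install_mutex_filter(self, *regions):
        """Declare that the named regions must never run concurrently."""
        shared = threading.RLock()  # reentrant, so one region may call another
        with self._guard:
            for name in regions:
                self._locks[name] = shared

    def remove_filter(self, *regions):
        """Drop the filter, for example once a real fix has shipped."""
        with self._guard:
            for name in regions:
                self._locks.pop(name, None)

    def region(self, name):
        """Mark a function as a named region that honours installed filters."""
        def decorate(func):
            def wrapper(*args, **kwargs):
                with self._guard:
                    lock = self._locks.get(name)
                if lock is None:  # no filter installed: run unmodified
                    return func(*args, **kwargs)
                with lock:
                    return func(*args, **kwargs)
            return wrapper
        return decorate


filters = FilterTable()
balance = {"value": 0}


@filters.region("deposit")
def deposit(amount):
    current = balance["value"]  # unsynchronized read-modify-write
    balance["value"] = current + amount


@filters.region("withdraw")
def withdraw(amount):
    current = balance["value"]
    balance["value"] = current - amount


def run_workload(rounds=2000):
    """Run one depositing and one withdrawing thread; return the balance."""
    balance["value"] = 0
    workers = [
        threading.Thread(target=lambda: [deposit(2) for _ in range(rounds)]),
        threading.Thread(target=lambda: [withdraw(1) for _ in range(rounds)]),
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return balance["value"]


# The workaround is a one-line declaration; deposit and withdraw are unchanged.
filters.install_mutex_filter("deposit", "withdraw")
```

Note that this sketch looks up the lock only when a thread enters a region, so a thread already inside a region when the filter is installed is not covered by it.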

LOOM is flexible in that developers can trade off performance against reliability when writing filters. For example, they can make two code regions mutually exclusive even when the regions access different objects or, as an extreme measure, make them run in single-threaded mode.

An evacuation algorithm is used for safety, so as not to introduce new errors. Static analysis marks the critical region affected by a filter. All threads in that region are then evacuated. Once the evacuation is complete, the threads are paused at safe locations, the execution filter is installed, and the threads are resumed.
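The ordering described above (empty the region, install the filter, resume) can be sketched as follows. This is a simplified illustration, not LOOM's algorithm: the class <code>GuardedRegion</code> and its methods are hypothetical, and the sketch only waits for threads to leave the region on their own while holding new arrivals at the entry point.

```python
import threading


class GuardedRegion:
    """Hypothetical region that is drained before a filter is installed."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inside = 0         # threads currently executing the region
        self._updating = False   # True while an update is in progress
        self.filter_lock = None  # installed filter; None means no filter

    def run(self, func, *args):
        """Execute func inside the region, honouring any installed filter."""
        with self._cond:
            # Safe location: new arrivals pause here during an update.
            while self._updating:
                self._cond.wait()
            self._inside += 1
            lock = self.filter_lock
        try:
            if lock is None:
                return func(*args)
            with lock:
                return func(*args)
        finally:
            with self._cond:
                self._inside -= 1
                self._cond.notify_all()  # wake a waiting updater, if any

    def install_filter(self, lock):
        """Install a filter only once no thread is inside the region."""
        with self._cond:
            while self._updating:      # one update at a time
                self._cond.wait()
            self._updating = True
            while self._inside > 0:    # wait for the region to empty
                self._cond.wait()
            self.filter_lock = lock    # region is empty: safe to install
            self._updating = False
            self._cond.notify_all()    # resume paused threads
```

Because the filter is only swapped in while the region is empty, no thread can run part of the region without the filter and the rest with it.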

LOOM's hybrid instrumentation engine reduces its overhead. The engine statically transforms an application's binary to anticipate dynamic updates.

Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.

==Critique==
===Good===
The authors of this paper deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This keeps the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the writing back up the reliability of the paper and let the reader keep track of the sources and check the information.

The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The paper ends with a conclusion that wraps up the whole paper in a clear and concise manner.

===Not-So-Good===
One of the problems with this paper is that, although many of the examples are simplified in order to speed up the reader's understanding, some are oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process visually. Unfortunately, it ends up making the problem seem almost trivial and does little more than water down the information.

The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. LOOM is heavily promoted without much discussion of its possible problems. For example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.

==References==
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].

[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx]

[3] A. D. Marshall. Further Threads Programming: Synchronization. Cardiff University, 1999. [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]

[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]

[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-24T15:46:56Z

Myagi:

Maybe we can all add our names below so we know who's still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

Group members:

* Michael Yagi
* Nicolas Lessard
* Julie Powers
* Derek Langlois
* Dustin Martin

Jeffrey Francom contacted me earlier so I know he is also still in the course. <strike>Now we are only waiting on Dustin Martin.</strike> Everyone has been accounted for. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)

Just kicking things off. Feel free to make suggestions or change anything. --[[User:Myagi|Myagi]] 11:36, 17 November 2010 (UTC)

Edited and filled out the critique section. Edited a little bit here and there. --[[User:Afranco2|Afranco2]] 17:41, 22 November 2010 (UTC)

Moved stuff to the front page and cleaned up references. Still waiting for people to expand if possible. Also, spellcheck ;) --[[User:Myagi|Myagi]] 10:37, 24 November 2010 (UTC)

==Essay==
===Paper===
<blockquote>The paper's title, authors, and their affiliations. Include a link to the paper and any particularly helpful supplementary information.</blockquote>
* Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
* Authors: Jingyue Wu, Heming Cui, Junfeng Yang
* Affiliations: Computer Science Department, Columbia University
* Supplementary Information: [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 Video], [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf Slides]

===Background Concepts===
<blockquote>Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.</blockquote>

-------------
A race condition is a system flaw that "occurs when two threads access a shared variable at the same time." Race conditions can be complex, time-consuming and expensive to fix. Often the most challenging part of a race condition is not fixing it but finding it: races are notorious for being difficult to find, isolate and reproduce. To limit the harm a known race causes while a proper fix is developed, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose LOOM.

LOOM is a system for working around known race conditions in live applications. Its strength is that it operates on running applications: developers write execution filters, and LOOM installs them at runtime without a restart. Execution filters are short declarations of synchronization intent, for example that two code regions must be mutually exclusive. Before installing a filter, LOOM's evacuation algorithm moves threads out of the affected code regions so that the update does not introduce new errors. The authors report that LOOM has very little performance overhead and scales well as the number of application threads increases.

The authors tested LOOM on real-world race conditions found in common applications. They report that every tested race was worked around with little performance overhead, in a scalable and easy-to-implement manner.
-------------

The paper uses several terms that the reader should know before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:

* Race Condition: "A race condition occurs when two threads access a shared variable at the same time." [http://support.microsoft.com/kb/317723 Race Condition]
* Execution Filters: In this paper, an execution filter is a short declaration of a synchronization intent that LOOM enforces at runtime. The filters discussed here are mutual exclusion filters, which state that certain code regions must not run at the same time.
* Hot Patches: "Hot patching provides a mechanism to update system files without rebooting or stopping services and processes."[http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx Hot Patching]
* Hybrid Instrumentation Engine: "Instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and writing trace information." [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx Instrumentation] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.
* Lock: A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock operations are usually called at the beginning and end of a critical section, respectively. "Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code." [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 Mutex Locks]
* Mutex: Short for mutual exclusion: the guarantee that no two threads are inside the same critical section at the same time.
* Semaphore: "A semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment." [http://en.wikipedia.org/wiki/Semaphore_%28programming%29 Semaphore]

===Research problem===
<blockquote>What is the research problem being addressed by the paper? How does this problem relate to past related work?</blockquote>
====Problem being addressed====
With the rise of multicore systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are immature and proper fixes take time, this paper presents a new approach to working around races in live applications through the use of LOOM.
====Past related work====
Two common solutions for fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can be unsafe and can lead to new errors because of a multithreaded application's complexity. Releasing a reliable patch takes time, and under performance or work pressure developers often resort to expedient fixes rather than placing proper locks in the application.
===Contribution===
<blockquote>What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)</blockquote>
====Current solution expressed====
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to let developers quickly deploy safe, optimized, temporary workarounds while a permanent fix is developed. LOOM is also easy to use. It is compiled into a developer's application through a compiler plugin and kept separate from the source code. The plugin injects the LOOM update engine into the application binary.

Mutual exclusion filters are written by the developer and refer to locations in the source code, keeping racy threads from running concurrently. The filter declaration is easy to understand: the developer names the code regions that need to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live.

LOOM is flexible in that developers can trade off performance against reliability when writing filters. For example, they can make two code regions mutually exclusive even when the regions access different objects or, as an extreme measure, make them run in single-threaded mode.

An evacuation algorithm is used for safety, so as not to introduce new errors. Static analysis marks the critical region affected by a filter. All threads in that region are then evacuated. Once the evacuation is complete, the threads are paused at safe locations, the execution filter is installed, and the threads are resumed.

LOOM's hybrid instrumentation engine reduces its overhead. The engine statically transforms an application's binary to anticipate dynamic updates.

Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.

===Critique===
<blockquote>What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.</blockquote>
====Good====
The authors of this paper deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This keeps the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the writing back up the reliability of the paper and let the reader keep track of the sources and check the information.

The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The paper ends with a conclusion that wraps up the whole paper in a clear and concise manner.

====Not-So-Good====
One of the problems with this paper is that, although many of the examples are simplified in order to speed up the reader's understanding, some are oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process visually. Unfortunately, it ends up making the problem seem almost trivial and does little more than water down the information.

The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. LOOM is heavily promoted without much discussion of its possible problems. For example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.

===References===
<blockquote>You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.</blockquote>

COMP 3000 Essay 2 2010 Question 5

2010-11-24T15:46:26Z

Myagi: /* Good */

==Paper==
'''Title:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]

'''Authors:''' Jingyue Wu, Heming Cui, Junfeng Yang

'''Affiliations:''' Computer Science Department, Columbia University

'''Supplementary Information:''' Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]

==Background Concepts==
This paper consists of multiple terms which must be familiar to the reader in order to assist in reading the Bypassing Races in Live Applications with Execution Filters paper. These terms are listed and explained below:

'''Deadlock:''' Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked and vice versa. The result of this is that each thread is waiting for each others thread to release the variable. Thus a deadlock occurs and nothing can happen.

'''Execution Filters:''' Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.

'''Hot Patches:''' "Hot patching provides a mechanism to update system files without rebooting or stopping services and processes."[[#References | [1]]]

'''Hybrid Instrumentation Engine:''' "Instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and writing trace information." [[#References | [2]]] Instrument programs can have low runtime overhead, but instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incur high overhead. A hybrid instrumentation is an implementation of combined static and dynamic instrumentation.

'''Lock:''' A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. "Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code." [[#References | [3]]]

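'''Mutex:''' Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true at the same time. A mutex lock applies this idea to threads, so that only one thread can hold the lock at a time.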

'''Race Condition:''' "A race condition occurs when two threads access a shared variable at the same time." [[#References | [4]]]
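
To make the Lock and Race Condition entries more concrete, here is a minimal Python sketch of a shared counter. The class <code>Counter</code> and the helper <code>run_workers</code> are hypothetical names chosen for this example. The unprotected method reads and writes the shared variable in two steps, which leaves a window for another thread to interfere, while the protected method wraps the update in a mutex.

```python
import threading


class Counter:
    """Shared counter with an unprotected and a lock-protected increment."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def unsafe_increment(self):
        current = self.value       # read the shared variable
        self.value = current + 1   # write back; another thread may have written in between

    def safe_increment(self):
        with self._lock:           # only one thread at a time runs this critical section
            self.value += 1


def run_workers(method_name, n_threads=4, n_iters=10_000):
    """Call the chosen increment method from several threads and return the final count."""
    counter = Counter()
    increment = getattr(counter, method_name)

    def worker():
        for _ in range(n_iters):
            increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

<code>run_workers("safe_increment")</code> always returns <code>n_threads * n_iters</code>. <code>run_workers("unsafe_increment")</code> may return less, depending on how the threads happen to be scheduled, which is part of why races are hard to reproduce.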

'''Semaphore:''' A semaphore is an integer variable that counts stored wakeups. It has two operations, down and up, which generalize sleep and wakeup. The down operation checks whether the value is greater than 0 and, if so, decrements the value and uses up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done as a single, indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems. [[#References | [5]]]
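
The down and up operations can be sketched in a few lines of Python. This is only an illustration built on <code>threading.Condition</code>. The class name <code>CountingSemaphore</code> and the helper <code>max_concurrency</code> are assumptions for this example, and in real code Python's built-in <code>threading.Semaphore</code> would normally be used instead.

```python
import threading
import time


class CountingSemaphore:
    """Toy semaphore with down/up operations, for illustration only."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    def down(self):
        with self._cond:                 # check and decrement happen atomically
            while self._value == 0:
                self._cond.wait()        # value is 0: put the caller to sleep
            self._value -= 1             # use up one stored wakeup

    def up(self):
        with self._cond:
            self._value += 1             # store one wakeup
            self._cond.notify()          # wake one sleeper, if any


def max_concurrency(limit=2, workers=6):
    """Return the largest number of workers seen inside the guarded section at once."""
    sem = CountingSemaphore(limit)
    guard = threading.Lock()
    inside = 0
    peak = 0

    def worker():
        nonlocal inside, peak
        sem.down()
        with guard:
            inside += 1
            peak = max(peak, inside)
        time.sleep(0.01)                 # pretend to use the shared resource
        with guard:
            inside -= 1
        sem.up()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With an initial value of 2, no more than two workers are ever inside the guarded section at the same time, however many threads are started.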

==Research problem==
===Problem being addressed===
With the rise of multicore systems, multithreaded programs are increasingly common, and they are often prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are immature, this paper presents a new approach to race detection and workarounds through the use of LOOM.
===Related work===
Two common solutions for fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe because of a multithreaded application's complexity. Releasing a reliable patch takes time, and developers often resort to more efficient fixes rather than placing proper locks in the application because of performance or work pressure.

==Contribution==
===Current solution expressed===
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed for quickly developing safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. It is compiled with a developer's application as a plugin and kept separate from the source code. The plugin injects the LOOM update into the application binary.

Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter declarations are easy to understand and can be applied to any code region that needs to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live.

LOOM is flexible in that developers can trade off performance against reliability in their application. For example, they can make two code regions mutually exclusive even when they access different objects or, as an extreme measure, make them run in single-threaded mode.

An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. After the evacuation, the execution filter is installed, and the threads are resumed once the live-update pause has completed at a safe location.
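
The general idea of waiting for threads to leave a region before installing a fix can be sketched as follows. This is a loose Python analogy under stated assumptions, not LOOM's actual algorithm, which operates on application binaries and relies on static analysis. The names <code>PatchableRegion</code>, <code>install_filter</code> and <code>demo</code> are hypothetical.

```python
import threading


class PatchableRegion:
    """Code region whose guard can be installed while threads are running."""

    def __init__(self):
        self._state = threading.Condition()
        self._inside = 0         # threads currently executing the region
        self._updating = False   # True while an update is being installed
        self._filter = None      # lock installed by the update, if any

    @property
    def filter_installed(self):
        return self._filter is not None

    def run(self, body):
        with self._state:
            while self._updating:         # pause newcomers during an update
                self._state.wait()
            self._inside += 1
            guard = self._filter
        try:
            if guard is None:
                return body()             # no filter yet: run unprotected
            with guard:
                return body()             # filter installed: run mutually exclusive
        finally:
            with self._state:
                self._inside -= 1
                self._state.notify_all()

    def install_filter(self):
        with self._state:
            self._updating = True         # stop new threads from entering
            while self._inside > 0:       # evacuate: wait for threads inside to leave
                self._state.wait()
            self._filter = threading.Lock()
            self._updating = False
            self._state.notify_all()      # resume the paused threads


def demo():
    """Show that the install waits until the region is empty."""
    region = PatchableRegion()
    entered, leave = threading.Event(), threading.Event()

    def body():
        entered.set()
        leave.wait()                      # stay inside the region until told to leave

    occupant = threading.Thread(target=region.run, args=(body,))
    occupant.start()
    entered.wait()

    installer = threading.Thread(target=region.install_filter)
    installer.start()
    installer.join(timeout=0.1)
    blocked_while_occupied = installer.is_alive()

    leave.set()
    occupant.join()
    installer.join()
    return blocked_while_occupied, region.filter_installed
```

In this sketch the installer cannot finish while a thread is still inside the region, so no thread ever runs the region half under the old behaviour and half under the new one.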

LOOM's hybrid instrumentation engine is used to reduce its overhead. The engine statically transforms an application's binary to anticipate dynamic updates.
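
One way to picture the combination of static preparation and dynamic updates is a hook that is inserted ahead of time but does nothing until a handler is installed at runtime. The Python sketch below is an analogy only. LOOM itself instruments binaries, and the names <code>Slot</code>, <code>instrument</code>, <code>handle_request</code> and <code>logging_handler</code> are hypothetical.

```python
import functools


class Slot:
    """A pre-inserted hook point; does nothing until a handler is installed."""

    def __init__(self):
        self.handler = None


def instrument(slot):
    """'Static' step: wrap the function once, ahead of time."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            handler = slot.handler            # cheap check on the common path
            if handler is None:
                return func(*args, **kwargs)  # no update installed: run as before
            return handler(func, *args, **kwargs)
        return wrapper
    return decorate


slot = Slot()


@instrument(slot)
def handle_request(x):
    return x * 2


calls = []


def logging_handler(func, *args, **kwargs):
    """'Dynamic' step: behaviour added later without redefining handle_request."""
    calls.append(args)
    return func(*args, **kwargs)
```

Assigning <code>slot.handler = logging_handler</code> while the program is running changes what happens around <code>handle_request</code> without touching its definition. Until then, the only cost is a single check per call.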

Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.

==Critique==
===Good===
The authors of this paper deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with too much unnecessary information. References throughout the paper back up its reliability and let the reader keep track of the sources in order to check the information.

The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The paper ends with a conclusion that does a good job of wrapping up the whole paper in a clear and concise manner.

===Not-So-Good===
One of the problems with this paper is that although many of the examples are simplified in order to expedite the reader's understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process visually. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.

The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing any problems later. There is a large amount of play-up for LOOM without much discussion of its possible problems. For example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.

==References==
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].

[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx]

[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]

[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]

[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008

COMP 3000 Essay 2 2010 Question 5

2010-11-24T15:45:16Z

Myagi: /* Research problem */

COMP 3000 Essay 2 2010 Question 5

2010-11-24T15:44:36Z

Myagi: /* Background Concepts */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-24T15:37:37Z

Myagi:

Maybe we can all add our names below so we know who's still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

Group members:

* Michael Yagi
* Nicolas Lessard
* Julie Powers
* Derek Langlois
* Dustin Martin

Jeffrey Francom contacted me earlier so I know he is also still in the course. <strike>Now we are only waiting on Dustin Martin.</strike> Everyone has been accounted for. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)

Just kicking things off. Feel free to make suggestions or change anything. --[[User:Myagi|Myagi]] 11:36, 17 November 2010 (UTC)

Edited and filled out the critique section. Edited a little bit here and there. --[[User:Afranco2|Afranco2]] 17:41, 22 November 2010 (UTC)

Moved stuff to the front page and cleaned up references. Still waiting for people to expand if possible. --[[User:Myagi|Myagi]] 10:37, 24 November 2010 (UTC)

==Essay==
===Paper===
<blockquote>The paper's title, authors, and their affiliations. Include a link to the paper and any particularly helpful supplementary information.</blockquote>
* Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
* Authors: Jingyue Wu, Heming Cui, Junfeng Yang
* Affiliations: Computer Science Department, Columbia University
* Supplementary Information: [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 Video], [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf Slides]

===Background Concepts===
<blockquote>Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.</blockquote>

-------------
A race condition is a system flaw that "occurs when two threads access a shared variable at the same time." Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.

LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filters, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.

The authors tested LOOM on existing real-world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy-to-implement manner.
-------------

This paper uses several terms that the reader should be familiar with before reading the Bypassing Races in Live Applications with Execution Filters paper. These terms are listed and explained below:

* Race Condition: "A race condition occurs when two threads access a shared variable at the same time." [http://support.microsoft.com/kb/317723 Race Condition]
* Execution Filters: Otherwise known as request filters, these allow you to inspect a request before and after the main logic is executed. In the context of this paper, they are mutual exclusion filters.
* Hot Patches: "Hot patching provides a mechanism to update system files without rebooting or stopping services and processes."[http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx Hot Patching]
* Hybrid Instrumentation Engine: "Instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and writing trace information." [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx Instrumentation] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.
* Lock: A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. "Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code." [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 Mutex Locks]
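* Mutex: Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true at the same time. A mutex lock applies this idea to threads, so that only one thread can hold the lock at a time.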
* Semaphore: "A semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment." [http://en.wikipedia.org/wiki/Semaphore_%28programming%29 Semaphore]

===Research problem===
<blockquote>What is the research problem being addressed by the paper? How does this problem relate to past related work?</blockquote>
====Problem being addressed====
With the rise of multicore systems, multithreaded programs are increasingly common, and they are often prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are immature, this paper presents a new approach to race detection and workarounds through the use of LOOM.
====Past related work====
Two common solutions for fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe because of a multithreaded application's complexity. Releasing a reliable patch takes time, and developers often resort to more efficient fixes rather than placing proper locks in the application because of performance or work pressure.
===Contribution===
<blockquote>What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)</blockquote>
====Current solution expressed====
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed for quickly developing safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. It is compiled with a developer's application as a plugin and kept separate from the source code. The plugin injects the LOOM update into the application binary.

Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter declarations are easy to understand and can be applied to any code region that needs to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live.

LOOM is flexible in that developers can trade off performance against reliability in their application. For example, they can make two code regions mutually exclusive even when they access different objects or, as an extreme measure, make them run in single-threaded mode.

An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. After the evacuation, the execution filter is installed, and the threads are resumed once the live-update pause has completed at a safe location.

LOOM's hybrid instrumentation engine is used to reduce its overhead. The engine statically transforms an application's binary to anticipate dynamic updates.

Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.

===Critique===
<blockquote>What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.</blockquote>
====Good====
The authors deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This keeps the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented, and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the text back up the reliability of the paper and let the reader trace and check its sources.

The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The paper ends with a conclusion that wraps up the whole paper clearly and concisely.

====Not-So-Good====
One problem with this paper is that, although many of the examples are simplified in order to speed the reader's understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process visually. Unfortunately, it ends up making the problem seem almost trivial and does little more than water down the information.

The writers are also a little one-sided (understandably) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing any problems later. LOOM is played up considerably without much discussion of its possible problems; for example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.

===References===
<blockquote>You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.</blockquote>

COMP 3000 Essay 2 2010 Question 5

2010-11-24T15:24:28Z

Myagi:

==Paper==
'''Title:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]

'''Authors:''' Jingyue Wu, Heming Cui, Junfeng Yang

'''Affiliations:''' Computer Science Department, Columbia University

'''Supplementary Information:''' Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]

==Background Concepts==
The paper Bypassing Races in Live Applications with Execution Filters uses several terms that the reader should be familiar with. These terms are listed and explained below:

'''Deadlock:''' A deadlock typically involves two threads. Each thread tries to lock a variable that the other thread has already locked, so each thread ends up waiting for the other to release its variable. Neither can proceed, and nothing further can happen.

'''Execution Filters:''' Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. In the context of this paper, execution filters are mutual exclusion filters.

'''Hot Patches:''' "Hot patching provides a mechanism to update system files without rebooting or stopping services and processes."[[#References | [1]]]

'''Hybrid Instrumentation Engine:''' "Instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and writing trace information." [[#References | [2]]] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. Hybrid instrumentation combines static and dynamic instrumentation.

'''Lock:''' A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. "Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code." [[#References | [3]]]

'''Mutex:''' Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true, or both happen, at the same time.

'''Race Condition:''' "A race condition occurs when two threads access a shared variable at the same time." [[#References | [4]]]

'''Semaphore:''' A semaphore is a special type of variable that counts stored wakeups, with two operations, down and up (generalizations of sleep and wakeup). The down operation checks whether the value is greater than 0; if so, it decrements the value, using up one stored wakeup. If the value is 0, the process is put to sleep. Checking the value, changing it and possibly going to sleep are all done as a single indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems. [[#References | [5]]]
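
The race condition, lock and semaphore definitions above can be tied together in a short sketch. This is only an illustration written for this essay, not code from the paper: the <code>CountingSemaphore</code> class and the <code>count</code> helper are hypothetical names, and the semaphore is built on Python's standard <code>threading.Condition</code>. Several threads perform a read-then-write on a shared total, and a semaphore initialised to 1 is used as a mutual exclusion lock around that critical section.

```python
import threading


class CountingSemaphore:
    """Illustrative semaphore built on a condition variable (hypothetical name)."""

    def __init__(self, value=1):
        self._value = value                  # number of stored wakeups
        self._cond = threading.Condition()

    def down(self):
        # The check and the decrement happen while holding the condition's
        # lock, so other threads observe them as one indivisible step.
        with self._cond:
            while self._value == 0:
                self._cond.wait()            # sleep until an up() arrives
            self._value -= 1                 # use up one stored wakeup

    def up(self):
        with self._cond:
            self._value += 1
            self._cond.notify()              # wake one sleeping thread, if any


def count(workers, iterations, guard=None):
    """Increment a shared total from several threads; the guard is optional."""
    total = 0

    def worker():
        nonlocal total
        for _ in range(iterations):
            if guard is not None:
                guard.down()                 # enter the critical section
            try:
                current = total              # read the shared variable
                total = current + 1          # write it back (racy if unguarded)
            finally:
                if guard is not None:
                    guard.up()               # leave the critical section

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

With a semaphore initialised to 1, <code>count(4, 2000, CountingSemaphore(1))</code> always returns the full number of increments. Without a guard, updates may occasionally be lost, although any given run can still happen to produce the right answer, which is part of what makes races so hard to reproduce.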

==Research problem==
===Problem being addressed===
With the rise of multicore systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are still immature, this paper presents a new approach to race detection and workarounds through the use of LOOM.
===Past related work===
Two common ways of fixing races in deployed software are software updates and hot patches. Software updates require a restart, whereas hot patches are applied to live systems. However, conventional patches can introduce new errors and can be unsafe to apply because of the complexity of multithreaded applications. Releasing a reliable patch takes time, and developers under performance or work pressure often resort to more expedient fixes rather than placing proper locks in the application.
==Contribution==
===Current solution expressed===
LOOM differs from traditional solutions in its approach to fixing races. It is designed to let developers quickly produce safe, optimized, temporary workarounds while a permanent fix is developed. LOOM is also easy to use: it is compiled with a developer's application as a plugin and kept separate from the source code. The plugin injects the LOOM update into the application binary.

Mutual exclusion filters are written by the developer and synced with the source code to filter out racy thread interleavings. The filter declaration is easy to understand and can be applied to any code regions that need to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live.

LOOM is flexible in that developers can trade off performance against reliability when using it with their application. For example, they can make two code regions mutually exclusive even when the regions access different objects or, as an extreme measure, make them run in single-threaded mode.
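
The idea of declaring code regions mutually exclusive from outside the code can be sketched as follows. This is a loose analogy only, assuming a much simpler setting than LOOM's: it does not use LOOM's filter language or its binary instrumentation, and <code>FilterTable</code>, <code>region</code> and the region names are hypothetical. Each region is wrapped in a hook that does nothing until a filter naming it is installed; installing a filter gives all named regions one shared lock, so they exclude each other even if they touch different objects. Naming every region in a single filter approximates the single-threaded extreme.

```python
import threading
from contextlib import contextmanager


class FilterTable:
    """Hypothetical table mapping region names to locks installed at runtime."""

    def __init__(self):
        self._locks = {}
        self._table_lock = threading.Lock()

    def install_mutual_exclusion(self, *regions):
        # All named regions share one lock, so at most one thread runs in
        # any of them at a time, whatever objects they access.
        shared = threading.RLock()
        with self._table_lock:
            for name in regions:
                self._locks[name] = shared

    def remove(self, *regions):
        with self._table_lock:
            for name in regions:
                self._locks.pop(name, None)

    @contextmanager
    def region(self, name):
        # Hook placed around a code region ahead of time; it is a no-op
        # until a filter naming this region has been installed.
        with self._table_lock:
            lock = self._locks.get(name)
        if lock is None:
            yield
        else:
            with lock:
                yield


def demo(workers=4, iterations=2000):
    filters = FilterTable()
    # Installed before any thread starts; installing while threads are
    # already inside a region would need extra care (not handled here).
    filters.install_mutual_exclusion("region_a", "region_b")
    state = {"hits": 0}

    def worker(region_name):
        for _ in range(iterations):
            with filters.region(region_name):
                seen = state["hits"]          # read
                state["hits"] = seen + 1      # write

    names = ["region_a", "region_b"]
    threads = [threading.Thread(target=worker, args=(names[i % 2],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["hits"]
```

The application code only contains the <code>region</code> hooks; which regions exclude each other is decided by the filter, which is the property the paragraphs above describe.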

An evacuation algorithm is used for safety, so that installing a filter does not introduce new errors. A critical region is marked using static analysis, and all threads in the critical region are then evacuated. Once the evacuation is complete, the execution filter is installed during a live-update pause taken at a safe location, and the threads are then resumed.
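
The pause, evacuate, install and resume sequence can be modelled with a toy sketch. It rests on strong simplifying assumptions that are ours, not the paper's: the critical region is marked by explicit <code>enter_region</code> and <code>exit_region</code> calls rather than by static analysis, threads are ordinary Python threads, and the <code>Evacuator</code> class is a hypothetical name. The point is only the ordering: new entries are held back, threads already inside are allowed to drain out, and the filter is installed when the region is empty.

```python
import threading


class Evacuator:
    """Toy model of evacuating a region before installing a filter."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inside = 0          # threads currently inside the marked region
        self._updating = False    # True while an update is in progress

    def enter_region(self):
        with self._cond:
            while self._updating:         # paused at a safe point outside
                self._cond.wait()
            self._inside += 1

    def exit_region(self):
        with self._cond:
            self._inside -= 1
            self._cond.notify_all()       # the updater may be waiting for this

    def install(self, install_filter):
        with self._cond:
            self._updating = True         # stop new threads from entering
            while self._inside > 0:       # evacuate: wait for the region to drain
                self._cond.wait()
            try:
                # The callback receives the number of threads still inside.
                install_filter(self._inside)
            finally:
                self._updating = False    # resume the paused threads
                self._cond.notify_all()


def demo(workers=3, rounds=200):
    evac = Evacuator()
    observed = []

    def worker():
        for _ in range(rounds):
            evac.enter_region()
            evac.exit_region()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    evac.install(observed.append)   # record how many threads were inside
    for t in threads:
        t.join()
    return observed
```

Because the install callback only runs after the wait loop has seen an empty region, the recorded count is zero regardless of how the worker threads are scheduled.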

LOOM's hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application's binary to anticipate dynamic updates.

LOOM was evaluated on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL, driven by the multithreaded benchmarks ApacheBench and SysBench, respectively.

==Critique==
===Good===
The authors deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This keeps the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented, and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the text back up the reliability of the paper and let the reader trace and check its sources.

The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The paper ends with a conclusion that wraps up the whole paper clearly and concisely.

===Not-So-Good===
One problem with this paper is that, although many of the examples are simplified in order to speed the reader's understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process visually. Unfortunately, it ends up making the problem seem almost trivial and does little more than water down the information.

The writers are also a little one-sided (understandably) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing any problems later. LOOM is played up considerably without much discussion of its possible problems; for example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.

==References==
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].

[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx]

[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]

[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]

[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-24T14:23:41Z

Myagi: /* Background Concepts */

Maybe we can all add our names below so we know who's still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

Group members:

* Michael Yagi
* Nicolas Lessard
* Julie Powers
* Derek Langlois
* Dustin Martin

Jeffrey Francom contacted me earlier so I know he is also still in the course. <strike>Now we are only waiting on Dustin Martin.</strike> Everyone has been accounted for. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)

Just kicking things off. Feel free to make suggestions or change anything. --[[User:Myagi|Myagi]] 11:36, 17 November 2010 (UTC)

Edited and filled out the critique section. Edited a little bit here and there. --[[User:Afranco2|Afranco2]] 17:41, 22 November 2010 (UTC)

==Essay==
===Paper===
<blockquote>The paper's title, authors, and their affiliations. Include a link to the paper and any particularly helpful supplementary information.</blockquote>
* Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
* Authors: Jingyue Wu, Heming Cui, Junfeng Yang
* Affiliations: Computer Science Department, Columbia University
* Supplementary Information: [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 Video], [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf Slides]

===Background Concepts===
<blockquote>Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.</blockquote>

The paper Bypassing Races in Live Applications with Execution Filters uses several terms that the reader should be familiar with. These terms are listed and explained below:

* Race Condition: "A race condition occurs when two threads access a shared variable at the same time." [http://support.microsoft.com/kb/317723 Race Condition]
* Execution Filters: Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. In the context of this paper, execution filters are mutual exclusion filters.
* Hot Patches: "Hot patching provides a mechanism to update system files without rebooting or stopping services and processes."[http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx Hot Patching]
* Hybrid Instrumentation Engine: "Instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and writing trace information." [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx Instrumentation] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. Hybrid instrumentation combines static and dynamic instrumentation.
* Lock: A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. "Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code." [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 Mutex Locks]
* Mutex: Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true, or both happen, at the same time.
* Semaphore: "A semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment." [http://en.wikipedia.org/wiki/Semaphore_%28programming%29 Semaphore]


Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-21T07:47:07Z

Myagi: /* Current solution expressed */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-21T07:44:54Z

Myagi: /* Current solution expressed */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-21T07:44:02Z

Myagi: /* Background Concepts */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-21T07:13:25Z

Myagi: /* Current solution expressed */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-21T06:54:06Z

Myagi: /* Contribution */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-21T06:35:22Z

Myagi: /* Current solution expressed */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-21T06:29:38Z

Myagi: /* Paper */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-20T21:32:29Z

Myagi:

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-20T21:26:49Z

Myagi: /* Contribution */

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-17T16:37:05Z

Myagi:

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-17T16:35:56Z

Myagi: Kicking things off

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-14T17:49:09Z

Myagi:

Maybe we can all add our names below so we know who's still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

Group members:

* Michael Yagi

==Essay==
* Paper
** Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
** Authors: Jingyue Wu, Heming Cui, Junfeng Yang
** Affiliations: Computer Science Department, Columbia University
** Supplementary Information:
* Background Concepts
* Research problem
* Contribution
* Critique
* References

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-14T17:48:22Z

Myagi:

Maybe we can all add our names below so we know who's still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

Group members:

* Michael Yagi

==Essay==
* Paper
** Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
** Authors: Jingyue Wu, Heming Cui, Junfeng Yang
** Affiliations: Computer Science Department, Columbia University
* Background Concepts
* Research problem
* Contribution
* Critique
* References

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-14T17:46:55Z

Myagi:

Group members:

* Michael Yagi

---------------
Maybe we can all add our names above so we know who's still in this course. --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

==Essay==
* Paper
** Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
** Authors: Jingyue Wu, Heming Cui, Junfeng Yang
** Affiliations: Computer Science Department, Columbia University
* Background Concepts
* Research problem
* Contribution
* Critique
* References

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-14T17:45:21Z

Myagi: /* Essay */

Group members:

* Michael Yagi

---------------
Maybe we can all add our names above? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

==Essay==
* Paper
** Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
** Authors: Jingyue Wu, Heming Cui, Junfeng Yang
** Affiliations: Computer Science Department, Columbia University
* Background Concepts
* Research problem
* Contribution
* Critique
* References

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-14T17:44:34Z

Myagi:

Group members:

* Michael Yagi

---------------
Maybe we can all add our names above? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

==Essay==
* Paper
* Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
* Authors: Jingyue Wu, Heming Cui, Junfeng Yang
* Affiliations: Computer Science Department, Columbia University
* Background Concepts
* Research problem
* Contribution
* Critique
* References

Talk:COMP 3000 Essay 2 2010 Question 5

2010-11-14T17:43:39Z

Myagi: Created page with "Group members: * Michael Yagi --------------- Maybe we can all add our names above? --Myagi 12:38, 14 November 2010 (UTC) ==Essay== * Paper Title: [http://www.…"

Group members:

* Michael Yagi

---------------
Maybe we can all add our names above? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)

==Essay==
* Paper
Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]
Authors: Jingyue Wu, Heming Cui, Junfeng Yang
Affiliations: Computer Science Department, Columbia University
* Background Concepts
* Research problem
* Contribution
* Critique
* References

COMP 3000 Essay 2 2010 Question 5

2010-11-14T17:36:27Z

Myagi: Created page with "See discussion"

See discussion

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-15T03:43:38Z

Myagi: /* Last minute changes */

== Last minute changes ==
Ok guys, so it's due early tomorrow. We have the essay pretty much completed aside from a few things.

First. Are we getting rid of the headings? Other groups have them in at the moment, I know the prof said the essay should read as if they weren't there but it might not hurt for them to be there.

Second. The essay needs to flow better. Some intro and outro sentences acknowledging the next section and referring to the previous ones would be nice.

Otherwise, what else remains?
--[[User:Smcilroy|Smcilroy]] 23:12, 14 October 2010 (UTC)

I'm trying to clean up the references, is this format acceptable? --[[User:Dagar|Dagar]] 23:45, 14 October 2010 (UTC)
: Yes, that looks a lot better --[[User:Smcilroy|Smcilroy]] 00:34, 15 October 2010 (UTC)

::I think we can keep some of the main headings, but I don't think we need them all. I think the real meat of the essay is in the comparisons with networked storage like NAS and especially SAN, so those sections should probably have headings of some kind. I also agree on the flow needing some work, some of the sections have a bit of overlap.

::Anil had mentioned to me today an example of a networked file system based on object store devices - [http://ceph.newdream.net/about/ Ceph]. [http://www.usenix.org/events/osdi06/tech/full_papers/weil/weil_html/ here is the full paper] on the system. I was thinking it might be worth it to mention it at least, maybe even have a small section about it, just so we get in a real world example of this technology. What do you guys think?

::--[[User:Mbingham|Mbingham]] 01:56, 15 October 2010 (UTC)

::Here's a quick example section, I know this is pretty last minute but what do you guys think?

::Ceph is an example of a real world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions. (insert reference to paper) Since Ceph is based on OSDs, it takes advantage of the ability for clients to interact directly with the devices, which avoids the traditional bottlenecks to performance caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls it can allow this direct access safely, unlike other network storage architectures.

::--[[User:Mbingham|Mbingham]] 02:09, 15 October 2010 (UTC)

::Also (sorry for all the comments), where does the first sentence of the Security section come from? It sounds like something that should be referenced, and seems kind of out of place because I don't think those four "quadrants" are brought up again?

::--[[User:Mbingham|Mbingham]] 02:11, 15 October 2010 (UTC)

::: Ok if Anil mentioned it, it's probably a good idea to include it, maybe after the 3 comparisons. I got an email back from Anil and he said that headings are OK as long as they add to the essay. So I think we can leave them in. --[[User:Smcilroy|Smcilroy]] 02:30, 15 October 2010 (UTC)

::::Cool, I added the section in. --[[User:Mbingham|Mbingham]] 02:39, 15 October 2010 (UTC)

::The four quadrants thing is something I came up with cause that's how I visualized it. You can imagine how secure something is with some points mapped on those quadrants(external, internal, malicious, accidental). I was trying to point out the strength of an OSDs security with this analogy but I guess it didn't flow well.

::--[[User:Myagi|Myagi]] 23:38, 15 October 2010 (UTC)

== Tightening up the Intro ==
Hey everyone,

I think it might be useful to re-work the intro a bit so that it better represents the direction the essay has taken since then. Here's a quick mockup of a reworked intro. It could be expanded on in some parts and worked on, etc. I would like any comments, if you guys think this better represents the essay, or what you think needs changing in the introduction. Here it is:

:Storage needs have evolved over the past 60 years, and as a result the functionality expected from filesystems and storage solutions has evolved as well. The low level interface that a storage device implements, however, has remained mostly the same. A block based interface is still the most common mechanism for accessing storage devices. Recently, however, especially with the growth of networked storage architectures such as NAS and SAN, this interface needs to be reworked to accommodate changing needs. Object based storage is increasingly becoming an attractive alternative to block based storage. The design of object based storage devices (OSD), which store objects rather than blocks, easily associates data with meta-data. Objects can be created, destroyed, read from, and written to, and each carries a unique ID. The device itself manages the physical space and can handle security on a per-object level. A storage network which is based on OSDs can provide better scalability without bottlenecks, better security with per-object access controls, and better integrity with unique hash keys. In this way, the OSD interface is looking increasingly attractive as a building block for filesystems, especially in the context of networked storage.

I think the main thing is that it brings up networked storage earlier and puts a bit more focus on it. I think the main arguments for object based storage is its applicability to large storage networks, and the advantages it has over block based architectures. For this reason I think the intro should put a bit more focus on it. Does that make sense? Any comments or suggestions you guys have are welcome.

--[[User:Mbingham|Mbingham]] 21:18, 14 October 2010 (UTC)

:I know what you mean, putting a focus on network storage is a good idea. Let me see if I can add your suggestions to the intro and maybe combine the two.--[[User:Smcilroy|Smcilroy]] 23:12, 14 October 2010 (UTC)

== Wikipedia Sources ==
I think we may want to replace the references to wikipedia with something more authoritative. [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open this massive pdf] from IBM supports the idea that fiber channels are the dominant infrastructure of SANs, but i'm not sure if it mentions how that is changing.

The Wikipedia page for LUN masking has [http://www.sansecurity.com/san-security-faq.shtml this] as its reference for the definitions, there's also [http://technet.microsoft.com/en-us/library/cc758640(WS.10).aspx this] Microsoft article and [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf this] paper from Hitachi. I'm not sure which of these is most relevant since I just did a quick Google search and haven't really read up on LUN masking or zoning, so someone else would probably be better suited to decide which one, if any, to use.

How does that sound to everyone?

--[[User:Mbingham|Mbingham]] 02:55, 14 October 2010 (UTC)

:I agree, the Wikipedia references need to go. Whoever included those references should be able to find alternate sources from the ones you gave. --[[User:Smcilroy|Smcilroy]] 17:45, 14 October 2010 (UTC)

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section, what do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

::I apologize for the delay, this has been an easy thing to neglect during a busy week. What's the proper way to reference with this wiki? --[[User:Dagar|Dagar]] 21:29, 13 October 2010 (UTC)

:::Check out this reference guide, it explains how to reference any material you find online. [http://libweb.anglia.ac.uk/referencing/harvard.htm Harvard System of Reference] --[[User:Smcilroy|Smcilroy]] 22:46, 13 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that more people aren't tagging the document outline and volunteering responsibilities is kind of unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

I'm going to expand the scalability and integrity sections. Then once the security section is done, I think that just leaves the section on the OSD standard and future plans for the tech. Then in the conclusion we can recap.
--[[User:Smcilroy|Smcilroy]] 22:54, 13 October 2010 (UTC)

:Sounds like a plan. I'll clean up/expand what I have written and get started with some initial stuff for the object sections. Anyone else is welcome to expand and edit as well.
:--[[User:Mbingham|Mbingham]] 00:44, 14 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham (I added a useful diagram here -Npradhan)
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage -Npradhan
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage -Npradhan
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage -dagar

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security -Myagi

:*Conclusion -Smcilroy

:Also, it would probably be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* SAN filesystem is a local network of multiple devices that operate on disk blocks and provides a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know which space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials (a capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
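The capability check in the last point could look roughly like the Python sketch below. It is only an illustration: the shared key, the credential fields, and the function names <code>issue_capability</code> and <code>device_allows</code> are all assumptions made up for the example, and it does not reproduce the credential format from the actual OSD specification.

```python
import hashlib
import hmac

# Assumption for this sketch: the device and a trusted security manager share
# a secret key. The real OSD credential format is not reproduced here.
DEVICE_KEY = b"example-shared-secret"


def issue_capability(object_id: int, rights: str, key: bytes = DEVICE_KEY) -> dict:
    """Security manager side: sign which rights a client has on which object."""
    message = f"{object_id}:{rights}".encode()
    tag = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def device_allows(capability: dict, object_id: int, operation: str,
                  key: bytes = DEVICE_KEY) -> bool:
    """Device side: recompute the tag, then check the requested operation."""
    message = f"{capability['object_id']}:{capability['rights']}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, capability["tag"]):
        return False  # forged or tampered credential
    if capability["object_id"] != object_id:
        return False  # credential was issued for a different object
    return operation in capability["rights"].split(",")
```

A client holding a "read" capability for one object gets refused if it asks to write, asks about a different object, or edits the rights field itself, since the recomputed tag no longer matches.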

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed-length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.
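To make the difference concrete, here's a rough Python sketch of the two interfaces. Everything in it is hypothetical (the class names, the 512-byte block size, the very naive allocator); it's only meant to show where the object-to-block mapping lives, not how a real device implements it.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 512  # bytes per block; an arbitrary size chosen for this sketch


class BlockDevice:
    """Block interface: the caller reads and writes fixed-size blocks by number."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]

    def write_block(self, n: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block write must be exactly BLOCK_SIZE bytes")
        self.blocks[n] = data


@dataclass
class StoredObject:
    block_numbers: list = field(default_factory=list)  # where the data lives
    length: int = 0                                     # object size in bytes
    attributes: dict = field(default_factory=dict)      # per-object metadata


class ObjectDevice:
    """Object interface: the device, not the OS, maps objects onto blocks."""

    def __init__(self, num_blocks: int) -> None:
        self._disk = BlockDevice(num_blocks)
        self._free = list(range(num_blocks))
        self._objects = {}
        self._next_id = 1

    def create(self, **attributes) -> int:
        object_id = self._next_id
        self._next_id += 1
        self._objects[object_id] = StoredObject(attributes=dict(attributes))
        return object_id

    def write(self, object_id: int, data: bytes) -> None:
        obj = self._objects[object_id]
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        if needed > len(self._free) + len(obj.block_numbers):
            raise OSError("device is full")
        # Release the object's old blocks, then allocate what the new data needs.
        self._free.extend(obj.block_numbers)
        obj.block_numbers = [self._free.pop(0) for _ in range(needed)]
        for i, n in enumerate(obj.block_numbers):
            chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self._disk.write_block(n, chunk.ljust(BLOCK_SIZE, b"\0"))
        obj.length = len(data)

    def read(self, object_id: int) -> bytes:
        obj = self._objects[object_id]
        raw = b"".join(self._disk.read_block(n) for n in obj.block_numbers)
        return raw[:obj.length]

    def delete(self, object_id: int) -> None:
        obj = self._objects.pop(object_id)
        self._free.extend(obj.block_numbers)
```

With <code>BlockDevice</code> the caller has to decide which block numbers hold which data. With <code>ObjectDevice</code> the caller only ever sees an object ID, variable-sized data, and attributes, and the device works out the block placement on its own.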

Here are some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on google scholar you can access the pdf without even being inside carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-15T03:40:43Z

Myagi: /* Last minute changes */

== Last minute changes ==
Ok guys, so it's due early tomorrow. We have the essay pretty much completed aside from a few things.

First. Are we getting rid of the headings? Other groups have them in at the moment, I know the prof said the essay should read as if they weren't there but it might not hurt for them to be there.

Second. The essay needs to flow better. Some intro and outro sentences acknowledging the next section and referring to the previous ones would be nice.

Otherwise, what else remains?
--[[User:Smcilroy|Smcilroy]] 23:12, 14 October 2010 (UTC)

I'm trying to clean up the references, is this format acceptable? --[[User:Dagar|Dagar]] 23:45, 14 October 2010 (UTC)
: Yes, that looks a lot better --[[User:Smcilroy|Smcilroy]] 00:34, 15 October 2010 (UTC)

::I think we can keep some of the main headings, but I don't think we need them all. I think the real meat of the essay is in the comparisons with networked storage like NAS and especially SAN, so those sections should probably have headings of some kind. I also agree on the flow needing some work, some of the sections have a bit of overlap.

::Anil had mentioned to me today an example of a networked file system based on object store devices - [http://ceph.newdream.net/about/ Ceph]. [http://www.usenix.org/events/osdi06/tech/full_papers/weil/weil_html/ here is the full paper] on the system. I was thinking it might be worth it to mention it at least, maybe even have a small section about it, just so we get in a real world example of this technology. What do you guys think?

::--[[User:Mbingham|Mbingham]] 01:56, 15 October 2010 (UTC)

::Here's a quick example section, I know this is pretty last minute but what do you guys think?

::Ceph is an example of a real world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions. (insert reference to paper) Since Ceph is based on OSDs, it takes advantage of the ability for clients to interact directly with the devices, which avoids the traditional bottlenecks to performance caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls it can allow this direct access safely, unlike other network storage architectures.

::--[[User:Mbingham|Mbingham]] 02:09, 15 October 2010 (UTC)

::Also (sorry for all the comments), where does the first sentence of the Security section come from? It sounds like something that should be referenced, and seems kind of out of place because I don't think those four "quadrants" are brought up again?

::--[[User:Mbingham|Mbingham]] 02:11, 15 October 2010 (UTC)

::: Ok if Anil mentioned it, it's probably a good idea to include it, maybe after the 3 comparisons. I got an email back from Anil and he said that headings are OK as long as they add to the essay. So I think we can leave them in. --[[User:Smcilroy|Smcilroy]] 02:30, 15 October 2010 (UTC)

::::Cool, I added the section in. --[[User:Mbingham|Mbingham]] 02:39, 15 October 2010 (UTC)

::The four quadrants thing is something I came up with because that's how I visualized it. You can imagine how secure something is with some points mapped on quadrants. I was trying to point out the strength of an OSD's security with this analogy but I guess it didn't flow well.

::--[[User:Myagi|Myagi]] 23:38, 15 October 2010 (UTC)

== Tightening up the Intro ==
Hey everyone,

I think it might be useful to re-work the intro a bit so that it better represents the direction the essay has taken since then. Here's a quick mockup of a reworked intro. It could be expanded on in some parts and worked on, etc. I would like any comments, if you guys think this better represents the essay, or what you think needs changing in the introduction. Here it is:

:Storage needs have evolved over the past 60 years, and as a result the functionality expected from filesystems and storage solutions has evolved as well. The low level interface that a storage device implements, however, has remained mostly the same. A block based interface is still the most common mechanism for accessing storage devices. Recently, however, especially with the growth of networked storage architectures such as NAS and SAN, this interface needs to be reworked to accommodate changing needs. Object based storage is increasingly becoming an attractive alternative to block based storage. The design of object based storage devices (OSD), which store objects rather than blocks, easily associates data with meta-data. Objects are created, destroyed, read from, and written to, and each carries a unique ID. The device itself manages the physical space and can handle security on a per-object level. A storage network which is based on OSDs can provide better scalability without bottlenecks, better security with per-object access controls, and better integrity with unique hash keys. In this way, the OSD interface is looking increasingly attractive as a building block for filesystems, especially in the context of networked storage.

I think the main thing is that it brings up networked storage earlier and puts a bit more focus on it. I think the main arguments for object based storage are its applicability to large storage networks and the advantages it has over block based architectures. For this reason I think the intro should put a bit more focus on it. Does that make sense? Any comments or suggestions you guys have are welcome.

--[[User:Mbingham|Mbingham]] 21:18, 14 October 2010 (UTC)

:I know what you mean, putting a focus on network storage is a good idea. Let me see if I can add your suggestions to the intro and maybe combine the two.--[[User:Smcilroy|Smcilroy]] 23:12, 14 October 2010 (UTC)

== Wikipedia Sources ==
I think we may want to replace the references to Wikipedia with something more authoritative. [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open this massive pdf] from IBM supports the idea that fiber channels are the dominant infrastructure of SANs, but I'm not sure if it mentions how that is changing.

The Wikipedia page for LUN masking has [http://www.sansecurity.com/san-security-faq.shtml this] as its reference for the definitions, there's also [http://technet.microsoft.com/en-us/library/cc758640(WS.10).aspx this] Microsoft article and [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf this] paper from Hitachi. I'm not sure which of these is most relevant since I just did a quick Google search and haven't really read up on LUN masking or zoning, so someone else would probably be better suited to decide which one, if any, to use.

How does that sound to everyone?

--[[User:Mbingham|Mbingham]] 02:55, 14 October 2010 (UTC)

:I agree, the Wikipedia references need to go. Whoever included those references should be able to find alternate sources from the ones you gave. --[[User:Smcilroy|Smcilroy]] 17:45, 14 October 2010 (UTC)

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section, what do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

::I apologize for the delay, this has been an easy thing to neglect during a busy week. What's the proper way to reference with this wiki? --[[User:Dagar|Dagar]] 21:29, 13 October 2010 (UTC)

:::Check out this reference guide, it explains how to reference any material you find online. [http://libweb.anglia.ac.uk/referencing/harvard.htm Harvard System of Reference] --[[User:Smcilroy|Smcilroy]] 22:46, 13 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that more people aren't tagging the document outline and volunteering responsibilities is kind of unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

I'm going to expand the scalability and integrity sections. Then once the security section is done, I think that just leaves the section on the OSD standard and future plans for the tech. Then in the conclusion we can recap.
--[[User:Smcilroy|Smcilroy]] 22:54, 13 October 2010 (UTC)

:Sounds like a plan. I'll clean up/expand what I have written and get started with some initial stuff for the object sections. Anyone else is welcome to expand and edit as well.
:--[[User:Mbingham|Mbingham]] 00:44, 14 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham (I added a useful diagram here -Npradhan)
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage -Npradhan
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage -Npradhan
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage -dagar

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security -Myagi

:*Conclusion -Smcilroy

:Also, it would probably be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* SAN filesystem is a local network of multiple devices that operate on disk blocks and provides a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know what space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* A computer system can access an OSD device by providing cryptographically secure credentials (capabilities) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope i'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, a fixed length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read to or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on google scholar you can access the pdf without even being inside carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

COMP 3000 Essay 1 2010 Question 11

2010-10-14T19:31:24Z

Myagi: /* Security */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Storage needs grow each year as the world's information increases exponentially and businesses increasingly choose to archive and retain all the data they produce. The storage industry has kept up with demand through matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has remained largely unchanged since the 1950s. The dominant storage mechanism is still block-based storage technology. This has been sufficient for most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little metadata is the norm, we have to look for technology that provides better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions.

Object-based storage devices (OSDs) address these issues by design. Object storage uses objects that consist of data and metadata describing the object. Objects carry a unique ID and are accessed with defined methods such as read and write. The devices themselves manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, assured data integrity through unique hash keys, and benefits in management and business intelligence from rich metadata, OSDs can be seen as a viable way to improve the standard architectures of storage area networks (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write blocks on the disk. This means that the job of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into which blocks on the disk to write. In this way, the scope of a filesystem extends from high level constructs like files to low level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
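The division of labour described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration (the 512-byte block size, the class names <code>BlockDevice</code> and <code>ToyFilesystem</code>, and the naive allocator); it does not model any particular disk or filesystem. The point is only that the device knows nothing beyond numbered blocks, while the filesystem must keep the file-to-block mapping itself.

```python
BLOCK_SIZE = 512  # assumed block size in bytes


class BlockDevice:
    """A disk that only knows how to read and write fixed-size blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block must be exactly BLOCK_SIZE bytes")
        self.blocks[n] = bytes(data)


class ToyFilesystem:
    """All knowledge of files lives here, on the host side."""

    def __init__(self, device):
        self.device = device
        self.block_map = {}  # file name -> list of block numbers
        self.sizes = {}      # file name -> length in bytes
        self.next_free = 0   # naive allocator: never reuses blocks

    def write_file(self, name, data):
        numbers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            chunk = chunk.ljust(BLOCK_SIZE, b"\0")  # pad the last block
            self.device.write_block(self.next_free, chunk)
            numbers.append(self.next_free)
            self.next_free += 1
        self.block_map[name] = numbers
        self.sizes[name] = len(data)

    def read_file(self, name):
        raw = b"".join(self.device.read_block(n) for n in self.block_map[name])
        return raw[:self.sizes[name]]  # drop the padding
```

Writing a 700-byte file with this sketch occupies two blocks, and only <code>ToyFilesystem</code> knows that those two blocks belong together.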

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[http://developers.sun.com/solaris/articles/osd.html](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read, write, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical blocks of storage. These objects also have metadata and access controls directly associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object-based storage with block-based storage that the design of objects is better suited to the needs of today's filesystems than that of blocks.
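A rough sketch of what such an interface might look like follows. It is not the ANSI T10 OSD command set; the class name <code>ObjectStorageDevice</code>, its method names, and the use of an in-memory byte array in place of physical blocks are all assumptions chosen to keep the example short. What it shows is that object creation, variable-sized data, and per-object attributes are handled by the device rather than by the host.

```python
import itertools


class ObjectStorageDevice:
    """Toy OSD: the device, not the host, maps objects to storage."""

    def __init__(self):
        self._data = {}   # object ID -> bytearray (stands in for blocks)
        self._attrs = {}  # object ID -> dict of metadata attributes
        self._ids = itertools.count(1)

    def create(self, **attributes):
        """Create an empty object and return its device-assigned ID."""
        oid = next(self._ids)
        self._data[oid] = bytearray()
        self._attrs[oid] = dict(attributes)
        return oid

    def write(self, oid, offset, data):
        buf = self._data[oid]
        if offset > len(buf):
            buf.extend(bytes(offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data     # object grows as needed

    def read(self, oid, offset, length):
        return bytes(self._data[oid][offset:offset + length])

    def get_attributes(self, oid):
        return dict(self._attrs[oid])

    def delete(self, oid):
        del self._data[oid]
        del self._attrs[oid]
```

A filesystem built on this interface would only need to remember which object IDs make up a file; block allocation never appears in its code.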

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. This means that the functionality of storage devices must also change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data, and massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it becomes more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues because of how it has been designed.

One application where the utility of object stores has become increasingly apparent is in SAN (Storage Area Network) systems. SAN file systems are distributed, yet they provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern.[http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf] Object stores can make user privilege management a much more manageable task, since each object can 'know' who is allowed to access it.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. Both have their benefits and drawbacks. The key issues are managing metadata and maintaining data access speed as the systems grow.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or remapping of blocks for archiving or duplication. Building systems that scale with the metadata becomes a major issue, while at the same time the current speeds of block-based storage need to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefits of the NAS file system come from its ability to set block access, manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, passing all the data through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts, while space allocation management on different storage system layers, and applications that individually add policy and management metadata, are spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow clients to access the storage directly over fibre cables. The storage management and file system is connected separately to both the client and the storage; it separates the data channel from the management channel and acts as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems gain scalability from shared access, coordinating this shared access leads to scalability problems of its own. File systems must coordinate the allocation of blocks, and for clients to share read-write access they must coordinate usage of data blocks through metadata. Security must also be addressed, since clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and better scalability of metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hampers scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. It may even be the case that the block in error does not contain any data. This usually happens during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only has to manage objects. The OSD does this by managing metadata as well as maintaining internal copies of its metadata. Hence, OSDs have knowledge of their object layout even when one or more groups of objects are on different OSDs. In this way OSDs know what space is used or unused and can scan and correct errors without losing data. When a file or a number of files cannot be recovered, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover data specific to a byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object file has an associated hash key that is generated from the contents of the file. The file can thus be verified for accuracy, to ensure the contents remain the same, and for integrity, to ensure the data has not been corrupted. The hash key can also be used in data management to flag duplicate data.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]
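As a rough illustration of how a content-derived key supports both verification and duplicate detection, consider the sketch below. The choice of SHA-256 and the names <code>content_key</code> and <code>HashedObjectStore</code> are assumptions made for the example; the cited overview does not say which hash function or data layout a real OSD uses.

```python
import hashlib


def content_key(data):
    """Return a hex digest that depends only on the object's contents."""
    return hashlib.sha256(data).hexdigest()  # SHA-256 is an assumed choice


class HashedObjectStore:
    def __init__(self):
        self.objects = {}  # object ID -> (data, key recorded at write time)

    def put(self, oid, data):
        self.objects[oid] = (bytes(data), content_key(data))

    def verify(self, oid):
        """Recompute the key and compare it with the one stored at write time."""
        data, key = self.objects[oid]
        return content_key(data) == key

    def duplicates_of(self, oid):
        """List other objects whose contents hash to the same key."""
        _, key = self.objects[oid]
        return sorted(o for o, (_, k) in self.objects.items()
                      if k == key and o != oid)
```

If the stored bytes of an object change without its recorded key being updated, <code>verify</code> returns <code>False</code>, which is the signal a device could use to trigger repair from a redundant copy.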

=== Security ===

Security threats can be thought of as having four quadrants: external, internal, accidental and malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SANs have traditionally run on Fibre Channel.[http://www.redbooks.ibm.com/abstracts/sg245470.html?Open] For the sake of security, running a SAN on Fibre Channel helps isolate its network, as the devices do not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and LUN masking are typical security measures in SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical units, each identified by a logical unit number (LUN), hence the term LUN masking.[http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf]
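The idea behind LUN masking can be pictured as a lookup table held by the disk array controller. The sketch below is a simplification under stated assumptions: the class name <code>DiskArrayController</code>, the string host identifiers, and the method names are hypothetical and do not correspond to any vendor's configuration interface.

```python
class DiskArrayController:
    """Toy controller that hides logical units from hosts not on the mask."""

    def __init__(self):
        self.luns = {}   # LUN -> name of the backing disk group
        self.masks = {}  # host identifier -> set of LUNs that host may see

    def add_lun(self, lun, backing):
        self.luns[lun] = backing

    def allow(self, host, lun):
        self.masks.setdefault(host, set()).add(lun)

    def visible_luns(self, host):
        """Return only the existing LUNs this host has been granted."""
        return sorted(self.masks.get(host, set()) & set(self.luns))
```

Note that the decision is made per host and per logical unit; nothing in the table refers to individual files or objects, which is the granularity gap that object-based security aims to close.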

NAS has its own vulnerabilities, but as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations, much like traditional OSs, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSD devices handle security requests directly. The set of protocols used by OSD enables it to cover the four quadrants of security threats outlined above. Clients can access an OSD device by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.[http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, whether external or internal.
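A simplified sketch of capability checking is given below. It is not the protocol from the cited IBM paper: the shared secret, the use of HMAC-SHA256, the <code>rights</code> string, and the function names <code>issue_capability</code> and <code>osd_allows</code> are assumptions for illustration. The tuple fields (OSD name, partition ID, object ID) are the ones named above.

```python
import hashlib
import hmac


def _mac(secret, osd_name, partition_id, object_id, rights):
    message = f"{osd_name}|{partition_id}|{object_id}|{rights}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_capability(secret, osd_name, partition_id, object_id, rights):
    """Run by a (hypothetical) security manager that shares `secret` with the OSD."""
    return {
        "osd_name": osd_name,
        "partition_id": partition_id,
        "object_id": object_id,
        "rights": rights,  # e.g. "r" or "rw"
        "mac": _mac(secret, osd_name, partition_id, object_id, rights),
    }


def osd_allows(secret, capability, requested_object, requested_right):
    """Run by the OSD itself before serving a request."""
    expected = _mac(secret, capability["osd_name"], capability["partition_id"],
                    capability["object_id"], capability["rights"])
    return (hmac.compare_digest(expected, capability["mac"])
            and capability["object_id"] == requested_object
            and requested_right in capability["rights"])
```

Because the device recomputes the keyed hash itself, a client that edits the rights in its capability, or presents it for a different object, is refused without the OSD having to consult any other server.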

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. There remain challenges to its adoption in industry, however. One of these is that it is currently needed only in high-end business solutions, which keeps it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] As newer features are added and the standards mature, we expect to see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. With better tools for managing data through the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancy, object-based storage will be an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

[8] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[9] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[10] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[11] [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open Storage Area Network]

[12] [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf Fibre Channel zoning]

[13] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[14] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

COMP 3000 Essay 1 2010 Question 11

2010-10-14T19:25:37Z

Myagi: /* References */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially and business' are increasingly choosing to archive and retain all the data they produce. The storage industry has been able to keep up with demand with matching increases in storage capacity. Unfortunately the interfaces between clients and storage devices has remained unchanged since the 1950's. The dominate storage mechanism is still block-based storage technology. This has been sufficient for meeting most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little meta-data is the norm, we have to look for technology that can provide better scalability, business intelligence, and management while ensuring security and data access speed of traditional storage solutions.

Object Based Storage Devices (OSD) solve these issues because of how they are designed. Object storage uses objects that consists of data and meta-data that describe the object. They are accessed with defined methods such as read and write and carry a unique ID. They manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object level access, ensured integrity of data with unique hash key's and benefits in management and business intelligence with rich meta-data, OSD can be seen as a viable alternative to improve the standard architectures of storage area network (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950's with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are a fixed length series' of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write to blocks on the disk. This means that the goal of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left completely in the space of the operating system's filesystem. For example, when the filesystem wants to write data to a file it must translate that into what block on the disk to write to. In this way, the scope of a filesystem extends from high level constructs like files to low level constructs like blocks. This wide scope is necessary because of the simple interface presented to the filesystem that must be abstracted up to the complex expectations of a user.

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[http://developers.sun.com/solaris/articles/osd.html](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of the interface presenting the filesystem with blocks to read and write to, the interface presents the filesystem with "objects" which it can read to, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping onto physical blocks of memory. These objects also have meta-data and access controls immediately associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems has changed, and we will see as we compare object based storage with block based storage that the design of objects are more suited to the needs of todays filesystems than blocks.

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. The functionality of storage devices must change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data, and massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it has become more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues because of how it has been designed.

One application where the utility of object stores has become increasingly apparent is in SAN (Storage Area Network) systems. SAN file systems are distributed; however, they provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for the SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern.[http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf] Object stores can make user privilege management a much more manageable task, since each object can 'know' who is allowed to access it.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. Both have their benefits and drawbacks. The key issues are managing metadata and maintaining data access speed as the systems grow.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or the remapping of blocks for archiving or duplication. Building systems whose metadata handling scales becomes a major issue, while at the same time the current speeds of block-based storage need to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head, which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefit of the NAS file system is its ability to set block access, manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, passing all the data through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed on different layers of the storage system, and applications individually add policy and management metadata that is spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow clients to access the storage directly over fibre cables. The storage management and file system is connected separately to both the client and the storage, which separates the data channel from the management channel, and it acts as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefit of shared access, coordinating this shared access leads to scalability problems. File systems must coordinate the allocation of blocks. For clients to share read-write access, they must coordinate their usage of data blocks through metadata. Direct access also opens up a host of security issues, since the clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and better scalability of metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, which reduces server overhead and processing and allows for larger clusters of storage than traditional block-based interfaces do.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hampers scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The block containing the error may not even hold any data. This usually happens during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only has to manage objects. An OSD does this by managing metadata as well as maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way, OSDs know which space is used or unused and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover data for a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from the contents of the file, so that it is unique to those contents. The file can thus be checked against the key to verify that its contents remain the same and that the data has not been corrupted. The key can also be used in data management to flag duplicate data. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]
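A content-derived key of this kind can be sketched with a standard cryptographic hash. The cited source does not say which hash function an OSD uses, so SHA-256 here is an assumption, and the helper names <code>content_key</code>, <code>verify</code> and <code>find_duplicates</code> are hypothetical.

```python
import hashlib


def content_key(data):
    """Derive a key from the object's contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def verify(data, stored_key):
    """Return True if the data still matches the key recorded when it was stored."""
    return content_key(data) == stored_key


def find_duplicates(objects):
    """Group object names by content key; return the groups with more than one name."""
    by_key = {}
    for name, data in objects.items():
        by_key.setdefault(content_key(data), []).append(name)
    return [sorted(names) for names in by_key.values() if len(names) > 1]


stored = {"a.txt": b"quarterly report", "b.txt": b"quarterly report", "c.txt": b"notes"}
key = content_key(stored["a.txt"])
print(verify(stored["a.txt"], key))       # unchanged contents still match
print(verify(b"quarterly rep0rt", key))   # a single changed byte does not
print(find_duplicates(stored))
```

The same key serves both purposes mentioned above: recomputing it detects corruption, and comparing keys across objects flags identical contents without comparing the data byte by byte.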

=== Security ===

Security threats can be thought of as falling into four quadrants: external, internal, accidental and malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SANs have traditionally run on Fibre Channel, although this trend is changing. [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open]
For the sake of security, running a SAN on Fibre Channel helps isolate its network, as it does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and LUN masking are typical security measures that SAN systems use. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the type of device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical unit numbers (LUNs), hence the term LUN masking. [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf]

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system that a NAS device runs has access control configurations, much like those of traditional operating systems, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSDs handle security requests directly. The set of protocols used by an OSD enables it to cover the four quadrants of security threats outlined above. Clients can access an OSD by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object. [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, whether external or internal.
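The idea of a capability that the device can check on its own can be illustrated with a keyed hash (HMAC). This is only a sketch under stated assumptions and not the actual OSD security protocol: it assumes a secret shared between a security manager and the device, and the field layout, the <code>rights</code> string and both function names are made up for the example.

```python
import hashlib
import hmac


def _tag(secret, osd_name, partition_id, object_id, rights):
    """Compute a keyed hash over the capability fields."""
    fields = f"{osd_name}|{partition_id}|{object_id}|{rights}".encode()
    return hmac.new(secret, fields, hashlib.sha256).hexdigest()


def issue_capability(secret, osd_name, partition_id, object_id, rights):
    """Security manager side: sign the tuple naming the object plus the granted rights."""
    return {
        "osd": osd_name,
        "partition": partition_id,
        "object": object_id,
        "rights": rights,  # e.g. "r" or "rw" (hypothetical encoding)
        "tag": _tag(secret, osd_name, partition_id, object_id, rights),
    }


def device_allows(secret, capability, requested_right):
    """Device side: recompute the tag, then check the requested right was granted."""
    expected = _tag(secret, capability["osd"], capability["partition"],
                    capability["object"], capability["rights"])
    genuine = hmac.compare_digest(expected, capability["tag"])
    return genuine and requested_right in capability["rights"]


shared_secret = b"demo-secret"
cap = issue_capability(shared_secret, "osd1", 2, 77, "r")
print(device_allows(shared_secret, cap, "r"))                     # granted right
print(device_allows(shared_secret, cap, "w"))                     # right not granted
print(device_allows(shared_secret, dict(cap, rights="rw"), "w"))  # tampered capability
```

Because the device verifies the tag itself, a client that edits the rights or the object ID in its capability is rejected without the device having to consult a separate server on every request.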

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. But there remain challenges to its adoption in industry. One of these is that it is currently needed only in high-end business solutions, which prevents it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] But as newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. With better tools for managing data through the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancies, object-based storage will be an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

[8] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[9] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[10] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[11] [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open Storage Area Network]

[12] [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf Fibre Channel zoning]

[13] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[14] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

COMP 3000 Essay 1 2010 Question 11

2010-10-14T19:24:38Z

Myagi: /* Security */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially and businesses increasingly choose to archive and retain all the data they produce. The storage industry has been able to keep up with demand through matching increases in storage capacity. Unfortunately, the interfaces between clients and storage devices have remained largely unchanged since the 1950s. The dominant storage mechanism is still block-based storage technology. This has been sufficient for meeting most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little metadata is the norm, we have to look for technology that can provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions.

Object-Based Storage Devices (OSDs) address these issues through their design. Object storage uses objects that consist of data and metadata that describes the object. Objects are accessed with defined methods such as read and write, and each carries a unique ID. OSDs manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, ensured integrity of data through unique hash keys, and benefits in management and business intelligence from rich metadata, OSD can be seen as a viable alternative that improves on the standard architectures of storage area networks (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write blocks on the disk. This means that the work of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into the disk blocks to write to. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.

Multiple standards exist to implement this interface. The Small Computer System Interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular in industry. Parallel ATA, another standard, which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions" (Bandulet).[http://developers.sun.com/solaris/articles/osd.html] Because all functionality must be built on top of these commands, the functionality that the interface allows has also remained mostly the same.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back only to the 1990s. See, for example, the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical blocks of storage. These objects also have metadata and access controls directly associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object-based storage with block-based storage that the design of objects is better suited to the needs of today's filesystems than that of blocks.

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. The functionality of storage devices must change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data, and massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it has become more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues because of how it has been designed.

One application where the utility of object stores has become increasingly apparent is in SAN (Storage Area Network) systems. SAN file systems are distributed; however, they provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for the SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern.[http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf] Object stores can make user privilege management a much more manageable task, since each object can 'know' who is allowed to access it.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. Both have their benefits and drawbacks. The key issues are managing metadata and maintaining data access speed as the systems grow.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or the remapping of blocks for archiving or duplication. Building systems whose metadata handling scales becomes a major issue, while at the same time the current speeds of block-based storage need to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head, which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefit of the NAS file system is its ability to set block access, manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, passing all the data through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed on different layers of the storage system, and applications individually add policy and management metadata that is spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow clients to access the storage directly over fibre cables. The storage management and file system is connected separately to both the client and the storage, which separates the data channel from the management channel, and it acts as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefit of shared access, coordinating this shared access leads to scalability problems. File systems must coordinate the allocation of blocks. For clients to share read-write access, they must coordinate their usage of data blocks through metadata. Direct access also opens up a host of security issues, since the clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and better scalability of metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, which reduces server overhead and processing and allows for larger clusters of storage than traditional block-based interfaces do.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hampers scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The block containing the error may not even hold any data. This usually happens during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only has to manage objects. An OSD does this by managing metadata as well as maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way, OSDs know which space is used or unused and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover data for a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from the contents of the file, so that it is unique to those contents. The file can thus be checked against the key to verify that its contents remain the same and that the data has not been corrupted. The key can also be used in data management to flag duplicate data. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]

=== Security ===

Security threats can be thought of as falling into four quadrants: external, internal, accidental and malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SANs have traditionally run on Fibre Channel, although this trend is changing. [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open]
For the sake of security, running a SAN on Fibre Channel helps isolate its network, as it does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and LUN masking are typical security measures that SAN systems use. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the type of device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical unit numbers (LUNs), hence the term LUN masking. [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf]
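LUN masking amounts to a lookup table on the disk array controller that decides which logical units each host may see. The sketch below is a toy model with invented host names and a hypothetical <code>visible_luns</code> helper; real controllers identify hosts by hardware identifiers and are configured through vendor tools.

```python
# Hypothetical masking table kept by a disk array controller:
# host identifier -> set of logical unit numbers that host is allowed to see.
LUN_MASKS = {
    "host-a": {0, 1},
    "host-b": {2},
}


def visible_luns(host, all_luns):
    """Return the LUNs the controller would present to the given host."""
    allowed = LUN_MASKS.get(host, set())  # unknown hosts see nothing
    return sorted(lun for lun in all_luns if lun in allowed)


print(visible_luns("host-a", [0, 1, 2, 3]))   # prints [0, 1]
print(visible_luns("intruder", [0, 1, 2, 3])) # prints []
```

Note that the check is per host and per logical unit, not per file or per object, which is the coarse granularity that the object-based approach is meant to refine.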

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system that a NAS device runs has access control configurations, much like those of traditional operating systems, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSDs handle security requests directly. The set of protocols used by an OSD enables it to cover the four quadrants of security threats outlined above. Clients can access an OSD by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object. [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, whether external or internal.

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. But there remain challenges to its adoption in industry. One of these is that it is currently needed only in high-end business solutions, which prevents it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] But as newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. With better tools for managing data through the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancies, object-based storage will be an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

[8] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[9] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[10] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[11] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]

[12] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]

[13] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[14] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T22:40:59Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple of sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

::I apologize for the delay, this has been an easy thing to neglect during a busy week. What's the proper way to reference with this wiki? --[[User:Dagar|Dagar]] 21:29, 13 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that more people aren't tagging the document outline and volunteering responsibilities is kind of unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage -dagar

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security -Myagi

:*Conclusion

:Also, it would probably be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operate individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know what space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* A computer system can access an OSD device by providing cryptographically secure credentials (a capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
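
To make the capability idea a bit more concrete, here is a rough sketch in Python. It is not the actual OSD protocol. It assumes a security manager and the storage device share a secret key, and that a capability is just an object id plus a rights string signed with an HMAC. The names <code>issue_capability</code> and <code>device_allows</code> are made up for illustration.

```python
import hashlib
import hmac


def issue_capability(secret, object_id, rights):
    """Hypothetical security manager: signs (object_id, rights) with the shared secret."""
    message = f"{object_id}:{rights}".encode()
    tag = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def device_allows(secret, capability, object_id, operation):
    """Hypothetical storage device check, run before serving a request.

    The request is allowed only if the tag is valid, the capability names
    the requested object, and the operation (one letter, e.g. "r" or "w")
    appears in the rights string.
    """
    message = f"{capability['object_id']}:{capability['rights']}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, capability["tag"])  # constant-time compare
        and capability["object_id"] == object_id
        and operation in capability["rights"]
    )
```

A client that edits the rights in its capability, or reuses it for another object, fails the check because it cannot recompute the tag without the secret. A misconfigured machine without any capability is rejected the same way.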

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course anyone else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope i'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed-length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.
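
Here's a toy sketch in Python of the difference between the two interfaces. It's only meant as an illustration: the class names, the 512-byte default block size and the "rewrite the whole object on every write" allocation policy are all assumptions, not how a real OSD works.

```python
class BlockDevice:
    """Block interface: fixed-size blocks addressed by number, nothing more."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[n] = bytes(data)


class ObjectStore:
    """Object interface on top of a BlockDevice.

    The store, not the caller, decides which blocks hold each object.
    """

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))  # unallocated block numbers
        self.table = {}    # object id -> (block numbers, length in bytes)
        self.next_id = 0

    def create(self, data=b""):
        oid = self.next_id
        self.next_id += 1
        self.table[oid] = ([], 0)
        self.write(oid, data)
        return oid

    def write(self, oid, data):
        old_blocks, _ = self.table[oid]
        size = self.device.block_size
        needed = -(-len(data) // size)  # ceiling division
        if needed > len(self.free) + len(old_blocks):
            raise OSError("not enough free blocks")
        self.free.extend(old_blocks)  # release the old blocks, then reallocate
        new_blocks = [self.free.pop() for _ in range(needed)]
        for i, n in enumerate(new_blocks):
            chunk = data[i * size:(i + 1) * size]
            self.device.write_block(n, chunk.ljust(size, b"\0"))  # pad last block
        self.table[oid] = (new_blocks, len(data))

    def read(self, oid):
        blocks, length = self.table[oid]
        raw = b"".join(self.device.read_block(n) for n in blocks)
        return raw[:length]  # drop the padding

    def delete(self, oid):
        blocks, _ = self.table.pop(oid)
        self.free.extend(blocks)
```

With the block device the caller has to track block numbers itself. With the object store the caller only holds an object id, and variable-sized data, allocation and freeing are handled behind the interface.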

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on google scholar you can access the pdf without even being inside carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

COMP 3000 Essay 1 2010 Question 11

2010-10-13T22:39:02Z

Myagi: /* References */

COMP 3000 Essay 1 2010 Question 11

2010-10-13T22:34:22Z

Myagi: /* Security */

COMP 3000 Essay 1 2010 Question 11

2010-10-13T22:29:14Z

Myagi: /* Comparison of object and block based stores */

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T21:10:28Z

Myagi: /* Essay Format and Assigned Tasks */


Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T12:40:31Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple of sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that nobody else is tagging the document outline and taking any more responsibility is unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, I would add that it would be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There are 11 or 12 sections there, and I think there are six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know which space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* A computer system can access an OSD device by providing cryptographically secure credentials (a capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
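
A rough sketch of the capability idea above, for anyone who wants to see it concretely. This is only an illustration and not the protocol from the OSD specification: the function names, the use of HMAC-SHA256, and the assumption that a trusted manager and the device share a secret key are all made up for the example.

```python
import hashlib
import hmac


def _message(object_id, rights):
    # Canonical byte string covering everything the capability grants.
    return f"{object_id}:{','.join(sorted(rights))}".encode()


def issue_capability(secret_key, object_id, rights):
    """Hypothetical manager side: sign (object id, rights) with the shared key."""
    tag = hmac.new(secret_key, _message(object_id, rights), hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": sorted(rights), "tag": tag}


def device_allows(secret_key, capability, object_id, operation):
    """Hypothetical device side: recompute the tag and check the request against it."""
    expected = hmac.new(
        secret_key,
        _message(capability["object_id"], capability["rights"]),
        hashlib.sha256,
    ).hexdigest()
    # Constant-time comparison so the tag cannot be guessed byte by byte.
    if not hmac.compare_digest(expected, capability["tag"]):
        return False
    return capability["object_id"] == object_id and operation in capability["rights"]
```

The point is that the device itself can reject a forged or altered credential, or a request for an object the credential does not cover, without asking a separate server on every access.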

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.
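
To make the difference between the two interfaces concrete, here is a toy sketch. It is not taken from any of the papers and is not how a real device is implemented; the class names, method names and the simple whole-object write are all assumptions made for illustration.

```python
class BlockDevice:
    """Toy block device: fixed-size blocks addressed by number."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[n] = bytes(data)


class ObjectStore:
    """Toy object store: the device, not the OS, maps objects to blocks."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))
        self.objects = {}  # object id -> (block numbers, length in bytes)
        self.next_id = 0

    def create(self):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = ([], 0)
        return oid

    def write(self, oid, data):
        # Replaces the whole object; objects can be any length.
        old_blocks, _ = self.objects[oid]
        bs = self.device.block_size
        needed = (len(data) + bs - 1) // bs
        if needed > len(self.free) + len(old_blocks):
            raise OSError("not enough free blocks")
        self.free.extend(old_blocks)
        blocks = [self.free.pop() for _ in range(needed)]
        for i, block_no in enumerate(blocks):
            chunk = data[i * bs:(i + 1) * bs]
            # Pad the final chunk so the block device always sees full blocks.
            self.device.write_block(block_no, chunk.ljust(bs, b"\0"))
        self.objects[oid] = (blocks, len(data))

    def read(self, oid):
        blocks, length = self.objects[oid]
        raw = b"".join(self.device.read_block(b) for b in blocks)
        return raw[:length]

    def delete(self, oid):
        blocks, _ = self.objects.pop(oid)
        self.free.extend(blocks)
```

A caller of `ObjectStore` never sees a block number: it creates, reads, writes and deletes variable-sized objects, and the free-block bookkeeping stays behind the interface.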

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T12:06:16Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that nobody is tagging the document outline and taking any more responsibility is unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, I would add that it would be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There are 11 or 12 sections there, and I think there are six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know which space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* A computer system can access an OSD device by providing cryptographically secure credentials (a capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T12:04:45Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. We should already have a finished essay so we can edit it, so the lack of progress is unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, it would probably be useful for people to read over each other's work and make suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There are 11 or 12 sections there, and I think there are six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High-latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- In this way, OSDs know which space is unused
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently
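
The byte-range point above can be illustrated with a small sketch. It assumes the device keeps a map from each object to the list of blocks holding it plus the object's length; the function name <code>damaged_ranges</code>, the map layout and the 512-byte block size are all made up for illustration and do not come from the OSD specification.

```python
def damaged_ranges(object_map, bad_block, block_size=512):
    """Return (object name, first byte, last byte) for each object touching bad_block.

    object_map is a hypothetical layout table: name -> (list of block numbers, length in bytes).
    """
    hits = []
    for name, (blocks, length) in object_map.items():
        for index, block in enumerate(blocks):
            if block == bad_block:
                start = index * block_size
                # The last block of an object may be only partly used.
                end = min(start + block_size, length) - 1
                hits.append((name, start, end))
    return hits
```

Because the device knows the layout, a bad block translates into "bytes 512 to 699 of object a" rather than an anonymous block number, and a block that belongs to no object yields an empty list, so nothing needs restoring.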

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials (a capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
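
A minimal sketch of how a capability check like this might look, assuming the credential is an HMAC over the object id, the granted rights and an expiry time, computed with a secret shared between whoever issues capabilities and the storage device. The function names, field layout and choice of SHA-256 are assumptions for illustration, not the actual OSD credential format.

```python
import hashlib
import hmac


def issue_capability(secret, object_id, rights, expires):
    """Issuer side (hypothetical): sign (object, rights, expiry) with the shared secret."""
    fields = f"{object_id}|{rights}|{expires}".encode()
    tag = hmac.new(secret, fields, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "expires": expires, "tag": tag}


def device_allows(secret, cap, object_id, op, now):
    """Device side: recompute the tag, then check object, operation and expiry."""
    fields = f"{cap['object_id']}|{cap['rights']}|{cap['expires']}".encode()
    expected = hmac.new(secret, fields, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cap["tag"]):
        return False  # forged or altered capability
    return cap["object_id"] == object_id and op in cap["rights"] and now < cap["expires"]
```

Here <code>rights</code> is a string such as "r" or "rw". A client that edits the rights field without knowing the secret fails the tag check, which is the sense in which the device itself can reject unauthorized or misconfigured requests.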

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course anyone else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it, but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, which are fixed-length sequences of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.
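
To make the difference concrete, here is a rough sketch of the two interfaces. The class and method names are made up for illustration and are not taken from any real driver or from the OSD command set. The block device only exposes numbered, fixed-size blocks, while the object device keeps its own table mapping objects to blocks.

```python
class BlockDevice:
    """Block-based storage: the host reads and writes fixed-size blocks by number."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[n] = bytes(data)


class ObjectDevice:
    """Object-based storage: the device maps variable-sized objects to blocks itself."""

    def __init__(self, num_blocks, block_size=512):
        self.dev = BlockDevice(num_blocks, block_size)
        self.free = list(range(num_blocks))
        self.objects = {}  # object id -> (list of block numbers, length in bytes)
        self.next_id = 0

    def create(self):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = ([], 0)
        return oid

    def write(self, oid, data):
        """Replace the object's contents; block allocation is hidden from the caller."""
        old_blocks, _ = self.objects[oid]
        size = self.dev.block_size
        needed = (len(data) + size - 1) // size
        if needed > len(self.free) + len(old_blocks):
            raise IOError("device full")
        self.free.extend(old_blocks)
        blocks = [self.free.pop() for _ in range(needed)]
        for i, blk in enumerate(blocks):
            chunk = data[i * size:(i + 1) * size]
            self.dev.write_block(blk, chunk.ljust(size, b"\0"))
        self.objects[oid] = (blocks, len(data))

    def read(self, oid):
        blocks, length = self.objects[oid]
        return b"".join(self.dev.read_block(blk) for blk in blocks)[:length]

    def delete(self, oid):
        blocks, _ = self.objects.pop(oid)
        self.free.extend(blocks)
```

With the block device the caller has to decide which block numbers hold which data. With the object device the caller only ever sees object ids, which is the extra level of abstraction described above.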

Here are some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like: object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T12:03:28Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder: if we're taking direct quotes from a source, they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple of sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed, so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I already have a draft written up. We should already have a finished essay so we can edit it so the lack of progress is unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, it would probably be useful for people to read over each other's work and make suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There are 11 or 12 sections there, and I think there are six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High-latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- In this way, OSDs know which space is unused
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials (a capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course anyone else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it, but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, which are fixed-length sequences of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.

Here are some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like: object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T11:58:45Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder: if we're taking direct quotes from a source, they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple of sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed, so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I already have a draft written up.

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, I would add that it would be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* SAN filesystem is a local network of multiple devices that operate on disk blocks and provides a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know what space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently
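
To make the byte-range idea above a bit more concrete, here is a rough Python sketch. It assumes the device keeps a checksum per fixed-size extent of an object and has a second copy to restore from; the extent size, the use of CRC32 and the function names are all made up for illustration and are not taken from the OSD specification.

```python
import zlib

EXTENT_SIZE = 4  # bytes per checksummed extent; tiny on purpose (assumed value)


def checksum_extents(data):
    """Compute one CRC32 per extent of the object's data."""
    return [zlib.crc32(data[i:i + EXTENT_SIZE])
            for i in range(0, len(data), EXTENT_SIZE)]


def find_damaged_ranges(data, checksums):
    """Return (start, end) byte ranges whose checksum no longer matches."""
    damaged = []
    for index, expected in enumerate(checksums):
        start = index * EXTENT_SIZE
        chunk = data[start:start + EXTENT_SIZE]
        if zlib.crc32(chunk) != expected:
            damaged.append((start, start + len(chunk)))
    return damaged


def repair(data, damaged, backup):
    """Restore only the damaged byte ranges from a backup copy."""
    repaired = bytearray(data)
    for start, end in damaged:
        repaired[start:end] = backup[start:end]
    return bytes(repaired)
```

The point is only that a device that knows an object's layout can narrow a failure down to a byte range and restore just that range, instead of forcing a restore of the whole file system.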

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials(capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
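
A minimal Python sketch of the capability check described above, assuming a security manager and the OSD share a secret key and that a capability is just an object ID plus a rights string signed with HMAC-SHA256. The field names, the rights encoding and the choice of hash are assumptions for illustration, not the actual format from the OSD specification.

```python
import hashlib
import hmac


def issue_capability(secret, object_id, rights):
    """Security manager side: sign (object_id, rights) with the shared secret."""
    message = f"{object_id}:{rights}".encode()
    tag = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def validate_request(secret, capability, object_id, operation):
    """OSD side: recompute the tag, then check the object and the operation."""
    message = f"{capability['object_id']}:{capability['rights']}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, capability["tag"]):
        return False  # forged or altered capability
    return capability["object_id"] == object_id and operation in capability["rights"]
```

Here a client holding a read-only capability for object 42 can read it, but a request to write, or a capability whose rights were edited after it was issued, fails validation on the device itself.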

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope i'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, fixed-length sequences of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.
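
To illustrate the difference between the two interfaces, here is a toy Python sketch. The class and method names (<code>BlockDevice</code>, <code>ObjectStore</code>, <code>create</code>, <code>write</code>, etc.) are invented for this example and are not from any standard; the only point is that the object layer keeps the object-to-block mapping inside the device.

```python
class BlockDevice:
    """Block interface: fixed-size blocks addressed by number."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[n] = bytes(data)


class ObjectStore:
    """Object interface: the device maps object IDs to blocks itself."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))  # unallocated block numbers
        self.objects = {}  # object ID -> (block numbers, length in bytes)
        self.next_id = 0

    def create(self):
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = ([], 0)
        return oid

    def write(self, oid, data):
        """Replace the whole contents of an object (variable sized)."""
        old_blocks, _ = self.objects[oid]
        size = self.device.block_size
        needed = (len(data) + size - 1) // size
        if needed > len(self.free) + len(old_blocks):
            raise OSError("device full")
        self.free.extend(old_blocks)
        blocks = [self.free.pop(0) for _ in range(needed)]
        for i, blk in enumerate(blocks):
            chunk = data[i * size:(i + 1) * size]
            self.device.write_block(blk, chunk.ljust(size, b"\0"))
        self.objects[oid] = (blocks, len(data))

    def read(self, oid):
        blocks, length = self.objects[oid]
        raw = b"".join(self.device.read_block(blk) for blk in blocks)
        return raw[:length]

    def delete(self, oid):
        blocks, _ = self.objects.pop(oid)
        self.free.extend(blocks)
```

With the block interface the caller has to decide which block numbers hold which data; with the object interface the caller only ever sees object IDs, and allocation and freeing of blocks happen behind the interface.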

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on google scholar you can access the pdf without even being inside carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-13T11:58:16Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that that information might be better worked into the "Changing Storage Needs" section, what do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I already have a draft written up.
--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, I would add that it would be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* SAN filesystem is a local network of multiple devices that operate on disk blocks and provides a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know what space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials(capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope i'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, fixed-length sequences of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on google scholar you can access the pdf without even being inside carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-12T19:49:00Z

Myagi: /* Some Sourcing Issues and Other Stuff */

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that that information might be better worked into the "Changing Storage Needs" section, what do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, I would add that it would be useful for people to be reading over each other's work and making suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High-latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their own object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- This way, OSDs know which space is unused
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently
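
The byte-range point above can be sketched roughly as follows. This is only an illustration in Python, assuming the device keeps a table from each object to the ordered list of blocks it occupies; the table layout, the function name <code>damaged_ranges</code>, and the example object names are all made up for the sketch and are not taken from the OSD specification.

```python
def damaged_ranges(object_table, bad_block, block_size):
    """Return (object name, start byte, end byte) for every object hit by a bad block.

    object_table is a hypothetical layout table:
    object name -> (ordered list of block numbers, object length in bytes).
    """
    hits = []
    for name, (blocks, length) in object_table.items():
        for index, block in enumerate(blocks):
            if block == bad_block:
                start = index * block_size
                # The last block of an object may be only partly used.
                end = min(start + block_size, length)
                hits.append((name, start, end))
    return hits


# Hypothetical layout: two objects spread over 512-byte blocks.
table = {
    "report.txt": ([4, 7, 9], 1100),
    "photo.jpg": ([2, 3], 1024),
}
print(damaged_ranges(table, 9, 512))
```

Because the device owns this mapping, only the affected range of one object needs restoring; a plain block store would only know that block 9 is bad.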

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials (capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
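
A rough sketch of the capability idea, in Python. It assumes a secret shared between whoever issues capabilities and the storage device, and uses an HMAC as the "cryptographically secure credential". The capability format (<code>object_id:rights</code>) and both function names are invented for illustration; the real OSD specification defines its own formats.

```python
import hashlib
import hmac


def issue_capability(secret, object_id, rights):
    """Build a capability for one object and sign it with the shared secret."""
    cap = f"{object_id}:{','.join(rights)}".encode()
    tag = hmac.new(secret, cap, hashlib.sha256).hexdigest()
    return cap, tag


def device_allows(secret, cap, tag, object_id, op):
    """What the storage device would do: check the signature, then the rights."""
    expected = hmac.new(secret, cap, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False  # forged or tampered capability
    cap_object, rights = cap.decode().split(":", 1)
    return cap_object == str(object_id) and op in rights.split(",")


secret = b"shared-between-issuer-and-device"  # hypothetical key
cap, tag = issue_capability(secret, 42, ["read"])
print(device_allows(secret, cap, tag, 42, "read"))
print(device_allows(secret, cap, tag, 42, "write"))
```

The device can make the decision on its own, without asking a server, which is why this approach suits network-attached storage.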

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed-length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.
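
To make the difference concrete, here's a rough Python sketch of the two interfaces. It is a toy model under my own assumptions, not how any real device or the OSD standard works: the class names, method names, and the simple free-list allocator are all made up for illustration.

```python
class BlockDevice:
    """Block interface: fixed-size blocks addressed by number, nothing more."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self.blocks[n] = bytes(data)


class ObjectStore:
    """Object interface: variable-sized objects; the device picks the blocks."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))
        self.table = {}  # object id -> (block numbers, length in bytes)
        self.next_id = 1

    def create(self):
        oid = self.next_id
        self.next_id += 1
        self.table[oid] = ([], 0)
        return oid

    def write(self, oid, data):
        # Replace the whole object; the caller never sees block numbers.
        old_blocks, _ = self.table[oid]
        self.free.extend(old_blocks)
        size = self.device.block_size
        blocks = []
        for offset in range(0, len(data), size):
            chunk = data[offset:offset + size].ljust(size, b"\0")
            n = self.free.pop(0)
            self.device.write_block(n, chunk)
            blocks.append(n)
        self.table[oid] = (blocks, len(data))

    def read(self, oid):
        blocks, length = self.table[oid]
        raw = b"".join(self.device.read_block(n) for n in blocks)
        return raw[:length]

    def delete(self, oid):
        blocks, _ = self.table.pop(oid)
        self.free.extend(blocks)


store = ObjectStore(BlockDevice(num_blocks=8, block_size=16))
oid = store.create()
store.write(oid, b"hello object world, longer than one block")
print(store.read(oid))
```

With the block interface the OS would have to track which blocks belong to which file; with the object interface that bookkeeping (the <code>table</code> and <code>free</code> list here) lives inside the device.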

Here are some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, Thursday or Friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

COMP 3000 Essay 1 2010 Question 11

2010-10-11T16:52:54Z

Myagi: /* Integrity */

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-11T16:52:28Z

Myagi: /* Essay Format and Assigned Tasks */

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside one and work on it; that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

COMP 3000 Essay 1 2010 Question 11

2010-10-11T16:51:14Z

Myagi: /* Scalability */

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-11T16:48:55Z

Myagi: /* Essay Format and Assigned Tasks */

== Essay Format and Assigned Tasks ==

:I put my portion up. I'm going to do edits here and there but a peer review would be nice. I've put links up that are meant to be references, but I won't put them in the references section yet as more people will be adding to it.
:--[[User:Myagi|Myagi]] 12:48, 11 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems has changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clogs LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* SAN filesystem is a local network of multiple devices that operate on disk blocks and provides a file system abstraction
- Block level/local network of multiple device
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs closes the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases stores and retrieves data objects
- work has continued in ratifying OSD-2 and OSD-3 specificiations

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSD's have knowledge of its object layout
* Unlike block stores, OSD's can recover data specific to a byte range
- OSD's know what space is being unused in this way
- Can scan and correct errors without losing data
* OSD's maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or few unrecoverable files
- OSD's can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials(capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed-length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable-sized, and they can be created, read, written, and deleted. The device itself, rather than the OS, handles mapping these objects to blocks and all the issues that come with that.
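
Here is a rough sketch in Python of the two interfaces described above. The class and method names are invented for illustration and do not come from the SCSI or OSD command sets. The allocation strategy (grab any free block) is also just an assumption to keep it short.

```python
class BlockDevice:
    """Block-based interface: fixed-size blocks addressed by number."""
    BLOCK_SIZE = 512

    def __init__(self, num_blocks):
        self.blocks = [bytes(self.BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != self.BLOCK_SIZE:
            raise ValueError("must write exactly one block")
        self.blocks[n] = bytes(data)


class ObjectStore:
    """Object-based interface: the device maps variable-sized objects to blocks."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))  # unallocated block numbers
        self.table = {}  # object id -> (block numbers, length in bytes)
        self.next_id = 0

    def create(self):
        oid = self.next_id
        self.next_id += 1
        self.table[oid] = ([], 0)
        return oid

    def write(self, oid, data):
        # Release the old blocks, then lay the new contents out block by block.
        old_blocks, _ = self.table[oid]
        self.free.extend(old_blocks)
        size = self.device.BLOCK_SIZE
        blocks = []
        for offset in range(0, len(data), size):
            chunk = data[offset:offset + size].ljust(size, b"\0")
            n = self.free.pop()  # raises IndexError if the device is full
            self.device.write_block(n, chunk)
            blocks.append(n)
        self.table[oid] = (blocks, len(data))

    def read(self, oid):
        blocks, length = self.table[oid]
        raw = b"".join(self.device.read_block(n) for n in blocks)
        return raw[:length]  # drop the padding in the last block

    def delete(self, oid):
        blocks, _ = self.table.pop(oid)
        self.free.extend(blocks)
```

The caller of ObjectStore never sees a block number. That bookkeeping (the free list and the object table) is exactly what moves from the OS into the device.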

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, Thursday or Friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

COMP 3000 Essay 1 2010 Question 11

2010-10-11T16:36:02Z

Myagi: /* Answer */

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-10T17:26:41Z

Myagi: /* Essay Format and Assigned Tasks */

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security

:*Conclusion

:Also, it would probably be useful for people to read over each other's work and make suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High-latency protocols over TCP/IP that clog LANs
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- This way, OSDs know which space is unused
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently
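
As a rough illustration of the byte-range point, here is a small Python sketch. It assumes the device keeps a checksum per fixed-size chunk of each object, which is an assumption for the sake of the example and not something the sources above specify. The names and the tiny chunk size are made up.

```python
import zlib

CHUNK = 4  # bytes per chunk; tiny on purpose so the example is easy to follow


def store_object(data):
    """Split an object into chunks and remember a CRC32 for each one."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    return {"chunks": chunks, "checksums": [zlib.crc32(c) for c in chunks]}


def damaged_ranges(obj):
    """Scan the object and return the (start, end) byte ranges that fail their checksum."""
    ranges = []
    offset = 0
    for chunk, checksum in zip(obj["chunks"], obj["checksums"]):
        if zlib.crc32(chunk) != checksum:
            ranges.append((offset, offset + len(chunk)))
        offset += len(chunk)
    return ranges


def repair(obj, backup):
    """Restore only the damaged byte ranges from a backup copy of the object."""
    for start, end in damaged_ranges(obj):
        obj["chunks"][start // CHUNK] = backup[start:end]
```

Because the device knows which bytes belong to which object, it can report "bytes 4 to 8 of this object are bad" and fetch just that range, rather than forcing a restore of the whole file system.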

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* A computer system can access an OSD by providing cryptographically secure credentials (a capability) that the OSD can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course anyone else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed-length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable-sized, and they can be created, read, written, and deleted. The device itself, rather than the OS, handles mapping these objects to blocks and all the issues that come with that.

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the PDF without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, Thursday or Friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks