Soma-notes - User contributions [en]

COMP 3000 Essay 2 2010 Question 4

2010-12-03T09:39:13Z

Npradhan: /* Critique */ added paragraph on merits of AVM cheat detection

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliations:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its execution to a log file so that the execution can later be replayed to follow exactly what happened on the machine. Remus [[#References | [1]]] contributed a highly efficient snapshotting mechanism for these replays.

'''Accountability:''' In the context of this paper, accountability means that every action performed on the virtual machine is recorded, and that this record can be used to verify whether the machine or user ran the application correctly. The AVM is responsible for its actions and must answer for them to an auditor.

'''Remote Fault Detection:''' Programs like GridCop[[#References | [2]]] can monitor the progress and execution of a remotely executing program by requesting beacon packets. While the remote computer is sending the packets, the receiving/logging computer must be trusted (hardware, software, and OS) so that the packets are received consistently. To detect a fault in a remote system, every packet must arrive safely, and any interruption during logging must be handled, or the resulting inconsistencies will produce an inaccurate outcome. The AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games, or any specific modification of a program, can either be scanned for[[#References | [3][4]]] or prevented[[#References | [5][6]]] by dedicated software. The weakness of such scanning and prevention software is that it only handles the specific cheats or situations it already knows about. An AVM is designed to detect cheats in general, without prior knowledge of them.

'''Integrity Violations:''' An integrity violation occurs when the operations of an execution do not match the normal/expected operations of the trusted host/reference execution.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tackles a problem that has haunted computer scientists for a long time: how can you be sure that the software running on a remote machine is working correctly, or as intended? Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relationship between users and a host. When a node (user or computer) expects some result or feedback from another node, it hopes that the interaction is independent of the particular node and depends only on the intended software. Suppose node A interacts with node B, which runs execution exe1, and node A also interacts with node C, which is supposed to run exe1 but has been modified and responds with exe2. We can then expect the responses of B and C to differ. Being able to prove beyond doubt that node C has been modified is the purpose of this paper.

Previous work on preventing or detecting '''integrity violations''' can be separated into several categories. The first is '''Cheat Detection'''. Many games have cheats that users employ to gain benefits that the original game did not intend.[[#References |[4]]] Existing detectors are not dynamic, in the sense that they do not detect whether cheating is actually taking place; instead, they check whether a cheating program that they already know about is running on the user's system. For example, suppose a known cheating program named aimbot.exe can run in the background of a game such as Counter-Strike, and the PunkBuster system installed on the user's machine already has aimbot.exe listed as a cheating program by its developers. PunkBuster might then notify the current game server, or even prevent the user from playing any games until aimbot.exe is no longer running.

'''Accountability''' is another important problem that has been the subject of much research. An accountable system provides a method to ascertain whether execution took place correctly and as expected. Such a system can also provide reliable evidence of proper execution to third parties, if required, and this evidence can be used to defend a node against false accusations. Numerous systems already provide accountability, but most are tied to specific applications, where a point of reference must be available for comparison. For example, PeerReview[[#References |[7]]], a system closely related to the research team's work, must be built into the application, which makes it less portable and harder to deploy than an '''AVM'''. PeerReview verifies inbound and outbound packets to determine whether the software is running as intended.

Another problem related to the paper is '''remote fault detection''' in a distributed system, that is, having a method to verify proper code execution and machine functionality on a node. A common solution is to monitor network activity, looking at the node's inbound and outbound traffic. This reveals how the software is operating or, in the case of an AVM, how the whole virtual machine is working. GridCop[[#References |[8]]] is another example; it inspects a small number of packets periodically. Another way of detecting a fault remotely is to use a trusted node, which can tell immediately if a fault occurs or if a modification is made where none should have been.

Logging and auditing the execution of a specific node (computer) depends heavily on prior work on '''deterministic replay'''. Deterministic replay programs create a log file that can be used to replay the operations of an execution on a node. Replaying those operations shows what the node was doing, which would seem sufficient to find out whether the node caused integrity violations. The issue with deterministic replay is not the snapshotting/recording itself; it is that the node may tamper with the data written to the log so that the replay shows the results it wants. By faking the results of the operations, the node leads the auditing computer to falsely believe that all operations ran normally. The logging done by these recording programs is nevertheless directly related to the work needed to detect integrity violations.

=Contribution=
The paper proposes the accountable virtual machine (AVM), and its most useful contribution is the implementation of the accountable virtual machine monitor (AVMM). The AVMM is what allows fault checking of virtual machines in a cloud computing environment. It can be broken down into three parts: the virtual machine monitor (VMM), the tamper-evident log, and the auditing mechanisms. The VMM is based on the VMM found in VMware Workstation 6.5.1[[#References |[9]]], the tamper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are eventually received, with retransmission if needed.

2. Machines and users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.

The job of the AVMM is to record all incoming and outgoing messages in a tamper-evident log, together with enough information about the execution to enable deterministic replay.

The hash function used is a cryptographic hash function, which maps an arbitrary block of data to a fixed-size string. While such a function is not impossible to break, breaking one that has the three properties listed in the assumptions is considered a "hard" problem, infeasible for a malicious attacker to use as an attack vector in the foreseeable future, and the function is therefore treated as secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts). Because these inputs are asynchronous, their exact timing must be recorded so that they can be injected at the same point during replay. Wall-clock time is not accurate enough for this, so the AVMM uses a combination of the instruction pointer, a branch counter, and additional registers. Other inputs (such as software interrupts) do not have to be recorded this way, because they are requested by the AVM itself, and the same requests will be issued again during replay.
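The record-and-inject idea can be illustrated with a toy machine. This is only a sketch under stated assumptions: the loop counter stands in for the instruction pointer and branch counter, and the functions <code>record</code>, <code>replay</code>, <code>step</code>, and <code>deliver</code> are hypothetical names, not part of the AVMM.

```python
import random

MOD = 1_000_003

def step(state, count):
    """Ordinary deterministic work done by one 'instruction'."""
    return (state + count) % MOD

def deliver(state, value):
    """Effect of an asynchronous input (e.g. an interrupt) on the state."""
    return (state * 31 + value) % MOD

def record(steps, seed):
    """Run the machine, logging each input with the count at which it arrived."""
    rng = random.Random(seed)
    state, log = 0, []
    for count in range(steps):
        if rng.random() < 0.1:              # an input arrives at an unpredictable point
            value = rng.randrange(256)
            log.append((count, value))      # timing is the instruction count, not wall-clock time
            state = deliver(state, value)
        state = step(state, count)
    return state, log

def replay(steps, log):
    """Re-run the machine, injecting each logged input at the recorded count."""
    inputs = dict(log)
    state = 0
    for count in range(steps):
        if count in inputs:
            state = deliver(state, inputs[count])
        state = step(state, count)
    return state

final_state, log = record(1000, seed=42)
print(replay(1000, log) == final_state)
```

Because every input is re-injected at the same count, the replayed run reaches the same final state as the recorded one.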

Two parallel streams appear in the tamper-evident log: message exchanges and nondeterministic inputs.
The AVMM must be able to detect inconsistencies between the user's log and the machine's log (in case of foul play), so it cross-references messages and inputs during replay, which exposes any discrepancies.

The AVMM periodically takes snapshots of the AVM's current state. This facilitates fine-grained audits for the user, but it also increases overhead. The overhead is lowered slightly by making the snapshots incremental (only the state that has changed since the last snapshot is saved). The user can authenticate a snapshot using a hash tree of the state, which the AVMM generates and updates after each snapshot.
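A minimal sketch of incremental snapshots authenticated by a hash tree follows. It assumes the state is a fixed-length list of pages and uses SHA-256; the page layout and the function names are hypothetical, and this sketch recomputes the whole tree instead of updating it in place as the AVMM is described as doing.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(pages):
    """Root of a binary hash tree built over a list of page contents."""
    level = [h(p) for p in pages]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:                  # odd level: duplicate the last node
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def incremental_snapshot(old_pages, new_pages):
    """Keep only the pages that changed since the previous snapshot."""
    return {i: p for i, p in enumerate(new_pages) if old_pages[i] != p}

def apply_snapshot(pages, delta):
    """Rebuild the full state from a base state plus an incremental snapshot."""
    pages = list(pages)
    for i, p in delta.items():
        pages[i] = p
    return pages

base = [b"page0", b"page1", b"page2", b"page3"]
current = [b"page0", b"PAGE1", b"page2", b"page3"]
delta = incremental_snapshot(base, current)
# The auditor rebuilds the state and compares its root with the logged root.
print(merkle_root(apply_snapshot(base, delta)) == merkle_root(current))
```

Only the changed page travels in the snapshot, yet a single root hash still commits to the entire state.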

'''Tamper-Evident Log'''

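The log is a hash chain of entries. Each log entry has the form <math>e_i = (s_i, t_i, c_i, h_i)</math>, where:

<math>s_i</math> = monotonically increasing sequence number

<math>t_i</math> = type

<math>c_i</math> = data of the type

<math>h_i</math> = hash value

The hash value is calculated by: <math>h_i = H(h_{i-1} \parallel s_i \parallel t_i \parallel H(c_i))</math>

<math>H()</math> is a hash function.

<math>\parallel</math> stands for concatenation.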
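The hash chain above can be sketched in a few lines. The choice of SHA-256, the 8-byte encoding of the sequence number, and the all-zero initial hash are assumptions made for illustration; the essay does not specify them.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def entry_hash(prev_hash, seq, typ, content):
    # h_i = H(h_{i-1} || s_i || t_i || H(c_i))
    return H(prev_hash + seq.to_bytes(8, "big") + typ.encode() + H(content))

def append(log, typ, content):
    """Append an entry (s, t, c, h) whose hash covers the previous entry's hash."""
    prev_hash = log[-1][3] if log else bytes(32)   # assumed all-zero starting hash
    seq = log[-1][0] + 1 if log else 1
    log.append((seq, typ, content, entry_hash(prev_hash, seq, typ, content)))

def verify(log, start_hash=bytes(32)):
    """Recompute every hash; any edited, reordered, or removed entry breaks the chain."""
    prev_hash, prev_seq = start_hash, 0
    for seq, typ, content, h in log:
        if seq <= prev_seq or h != entry_hash(prev_hash, seq, typ, content):
            return False
        prev_hash, prev_seq = h, seq
    return True

log = []
append(log, "SEND", b"hello")
append(log, "RECV", b"world")
append(log, "ACK", b"")
print(verify(log))
```

Changing the content of any entry without recomputing all later hashes makes <code>verify</code> return <code>False</code>.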

Each message sent is signed with the sender's private key. The AVMM logs each message with its signature attached, but removes the signature before passing the message on to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect when a message is dropped, each party sends an acknowledgement for every message it receives. If no acknowledgement is received, the message is resent a few times; if the user still receives no messages, the machine is presumed to have failed.

To perform a log check, the user retrieves a pair of authenticators and then challenges the machine to produce the log segment between the two. It is computationally infeasible to edit the log without breaking the hash chain. If the log has been tampered with, the hash chain will differ and the user will be notified of the tampering.
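The challenge between two authenticators can be sketched as follows. Here an authenticator is reduced to a (sequence number, hash) pair whose signature is assumed to have been verified already, since signature checking is outside this sketch; <code>check_segment</code> and the entry encoding are hypothetical.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def entry_hash(prev_hash, seq, typ, content):
    return H(prev_hash + seq.to_bytes(8, "big") + typ.encode() + H(content))

def check_segment(auth_start, auth_end, segment):
    """Check that `segment` links auth_start to auth_end.

    auth_start and auth_end are (seq, hash) pairs taken from authenticators.
    segment lists the (seq, type, content) entries after auth_start, up to
    and including the entry that auth_end refers to.
    """
    seq, h = auth_start
    for s, t, c in segment:
        if s <= seq:                       # sequence numbers must increase
            return False
        h = entry_hash(h, s, t, c)
        seq = s
    return (seq, h) == auth_end

# The machine's side: build a chain and hand out authenticators.
entries = [(1, "SEND", b"a"), (2, "RECV", b"b"), (3, "ACK", b"c"), (5, "SEND", b"d")]
h, auths = bytes(32), []
for s, t, c in entries:
    h = entry_hash(h, s, t, c)
    auths.append((s, h))

print(check_segment(auths[0], auths[3], entries[1:4]))
```

A machine that drops or rewrites an entry inside the challenged range cannot reach the hash committed to by the second authenticator.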

'''Auditing Mechanism'''

From the VMM's perspective, all things are deterministic.

To perform an audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify the events in the log are the correct execution of the software.

The user can verify the execution of software through three different checks: verifying the log, verifying the snapshot, and verifying the execution.

When the user wants to verify a log segment, the user retrieves the authenticators from the machine whose sequence numbers fall in the range of the log segment. The user then downloads the log segment from the machine, starting with the most recent snapshot before the log segment and ending with the most recent snapshot before the end of the log segment. The user then checks the segment against the authenticators for tampering. If this step succeeds, the user can assume the log segment is genuine. If the machine is faulty, the segment will be unavailable for download, or the machine may return a corrupted log segment. This can be used to convince a third party of the fault.

When the user wants to verify the snapshot, the user downloads from the machine a snapshot of the AVM's state at the beginning of the log segment, and the hash tree of that snapshot is recomputed. The new hash tree is compared to the hash tree contained in the original log segment. If any discrepancies are detected, the user can use this to convince a third party of the machine's faults.

To verify the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and of any users of the machine. The auditing tool performs two checks on the log segment: a syntactic check (which determines whether the log is well-formed) and a semantic check (which determines whether the information in the log corresponds to a correct execution of the machine).

The syntactic check verifies that all log entries are in the proper format, that the signatures on each message and acknowledgement are valid, that each message was acknowledged, and that the sequence of sent and received messages matches the sequence of messages that entered and exited the AVM.
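A rough sketch of such a well-formedness pass is given below. The entry layout (dictionaries with <code>seq</code>, <code>type</code>, and <code>msg_id</code>) and the set of entry types are hypothetical, and signature verification is left out because it needs a public-key library.

```python
KNOWN_TYPES = {"SEND", "RECV", "ACK", "INPUT"}   # assumed entry types

def syntactic_check(entries):
    """Return a list of problems; an empty list means the segment is well-formed."""
    problems = []
    last_seq = 0
    unacked = set()
    for e in entries:
        if set(e) != {"seq", "type", "msg_id"} or e["type"] not in KNOWN_TYPES:
            problems.append(f"malformed entry: {e!r}")
            continue
        if e["seq"] <= last_seq:
            problems.append(f"sequence number {e['seq']} is not increasing")
        last_seq = e["seq"]
        if e["type"] == "SEND":
            unacked.add(e["msg_id"])
        elif e["type"] == "ACK":
            if e["msg_id"] not in unacked:
                problems.append(f"ack for unknown message {e['msg_id']}")
            unacked.discard(e["msg_id"])
    problems.extend(f"message {m} was never acknowledged" for m in sorted(unacked))
    return problems

segment = [
    {"seq": 1, "type": "SEND", "msg_id": 1},
    {"seq": 2, "type": "ACK", "msg_id": 1},
    {"seq": 3, "type": "SEND", "msg_id": 2},
]
print(syntactic_check(segment))
```

In this example the second message has no acknowledgement inside the segment, so the check reports it.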

The semantic check creates a local VM that replays the machine's log segment; the VM is initialized with a snapshot from the machine if possible. The local VM then runs the log segment and the resulting data is recorded. The auditing tool compares the inputs, outputs, and snapshot hashes of the replayed execution against the original log. If any discrepancy is detected, the fault is reported and can be used as evidence against the machine.
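The replay-and-compare idea can be sketched with a toy reference program. Here <code>reference_step</code> stands in for the reference VM image, and the log is reduced to pairs of an input message and the output the machine claims to have produced; both are hypothetical simplifications.

```python
def reference_step(state, message):
    """Hypothetical reference software: keeps a running total and echoes it."""
    state += message
    return state, state

def semantic_check(initial_state, logged):
    """Replay logged inputs through the reference software.

    `logged` is a list of (input_message, claimed_output) pairs. Returns the
    index of the first entry whose claimed output differs from the replayed
    output, or None if the whole segment matches.
    """
    state = initial_state
    for i, (message, claimed) in enumerate(logged):
        state, output = reference_step(state, message)
        if output != claimed:
            return i
    return None

honest = [(1, 1), (2, 3), (3, 6)]
cheating = [(1, 1), (2, 30), (3, 6)]   # second output does not follow from the inputs
print(semantic_check(0, honest), semantic_check(0, cheating))
```

The auditor never needs to know how the machine cheated; it only needs the replayed output to disagree with the logged one.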

=Critique=

The layout of the paper is essential to the reader's comprehension. The introduction clearly describes what the reader should expect in the following pages, especially which problems are addressed and how they are solved.

The paper gives multiple examples of the advantages and disadvantages of an AVM. A good example is cheat detection. Cheaters use programs that work around the original game code to gain a major advantage over other players. Because an AVM is generic in its cheat detection, it casts a much wider net than most other cheat detection algorithms. The logs also make it possible to replay a game, so players using an AVM can see how other players played by replaying the game from that player's log.

The downside is that the player may suffer from the AVM's overhead. Everything is logged and stored on the hard drive, which takes a large amount of space: in the example in the paper, 148 MB per hour after compression. The AVM also reduces the frame rate (fps), and the connection through the AVM increases the ping time to the server.

As a proof of concept, the authors ran their AVM with the online game Counter-Strike and tried to detect online cheats. They used “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more powerful than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A 10-year-old game [[#References |[10]]] uses only a small share of the resources of a Dell Precision T1500 workstation. Newer games consume far more resources than Counter-Strike, which leaves less room to run the AVM. A 13% slowdown [pg 12.] in a game that only reaches 30 to 40 fps is quite noticeable. This is detrimental to game play, because 60 fps or more is considered optimal performance.
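The effect of the slowdown can be worked out directly. This small calculation assumes that the 13% figure applies to the frame rate itself; the function names are illustrative only.

```python
def fps_after_slowdown(fps, slowdown=0.13):
    """Frame rate that remains after a fractional slowdown (0.13 means 13%)."""
    return fps * (1.0 - slowdown)

def frame_time_ms(fps):
    """Time spent on each frame, in milliseconds."""
    return 1000.0 / fps

for base in (30, 40, 60):
    slowed = fps_after_slowdown(base)
    print(f"{base} fps -> {slowed:.1f} fps, "
          f"{frame_time_ms(base):.1f} ms -> {frame_time_ms(slowed):.1f} ms per frame")
```

Under that assumption, a game running at 30 to 40 fps drops to roughly 26 to 35 fps, further below the 60 fps target.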

While the AVM cheat detection method comes with a performance penalty, it was successful in its goal. It detected all 26 of the downloaded cheats for Counter-Strike without prior knowledge of the nature of the cheats. This is an important step, since most cheat detection is reactive: detection for a specific type of cheat is added only after that cheat has been developed, used, and discovered. The AVM cheat detection method shuts the door on a wide variety of cheats without any prior knowledge of how each cheat works.

When discussing the Counter-Strike use case, the authors did not mention that in Counter-Strike a user can record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league's website, where everyone in the league can watch it. Without this demo, the team loses the match immediately. Additionally, some leagues require the player to start an extra program (e.g. Electronic Sports League WIRE) that checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file, and uploads it to one of the league's servers, where it can be checked by any player.

In the paper, the authors state that the AVM adds only an extra 5 ms of latency. While this does not seem like much, the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This sample does not accurately represent real-life situations and therefore lacks external validity: many online games are played over the Internet, with participants sometimes not even on the same continent, and the latency overhead of the AVM would certainly increase due to the added distance. [[#References |[12]]]

While the paper does test a scenario slightly larger than one-to-one, it does not test a real-world environment in which 16, 32, or even 64 players are playing at the same time.

Spot checking can be used for applications that take snapshots every x seconds. Although this removes a lot of overhead and data storage, it only verifies that the application or user is behaving as intended once every x seconds. Someone could therefore work out the pattern of those snapshots and render the AVM useless.
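One way to reason about this trade-off is to assume the audited segments are picked uniformly at random instead of on a fixed schedule, so there is no pattern to learn. The sketch below computes, under that assumption, the chance that a cheat escapes; the function name and parameters are hypothetical.

```python
from math import comb

def miss_probability(total_segments, cheated_segments, audited_segments):
    """Probability that none of the audited segments contains the cheat.

    Assumes the audited segments are chosen uniformly at random, without
    repetition, from all segments of the log.
    """
    clean = total_segments - cheated_segments
    return comb(clean, audited_segments) / comb(total_segments, audited_segments)

# Cheating in 1 of 10 segments while the auditor checks 5 of them at random.
print(miss_probability(10, 1, 5))
```

Auditing more segments lowers the chance of a miss, at the cost of the overhead that spot checking was meant to avoid.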

AVMs are extremely effective against two types of cheats: those that send incorrect network messages and those that have to be loaded along with the game. This is ideal for tournament-style competition, but in the real world it would not be of much use. Games get patched, users download add-ons for the game, and so on. Every patch or add-on would require a new AVM, which is unreasonable given the number of people playing the game. One solution proposed by the team was to remove the right to install anything on the AVM. While this could work in a tournament environment, a normal user at home would not be pleased with this limitation.

An AVM will not catch a bug or exploit in the program itself that a malicious user takes advantage of, because the exploit appears on both the user's and the auditor's systems and behaves the same on each. This type of exploit goes undetected, since both the real-time execution and the replay against the known-correct reference show the exploited execution as correct.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and
A. Warfield. Remus: High availability via asynchronous virtual
machine replication. In Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI), Apr.
2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware.
http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof
playout for centralized and peer-to-peer gaming. IEEE/ACM
Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online
games against cheating. In Proceedings of the Workshop on Network
and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical
accountability for distributed systems. In Proceedings of
the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[9] VMware Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike. http://store.steampowered.com/app/10/

[12] L. L. Peterson and B. S. Davie. Computer Networks: A Systems Approach, 2007.

COMP 3000 Essay 2 2010 Question 4

2010-12-03T09:25:07Z

Npradhan: /* Critique */ better expression

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliates:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)]

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its executions into a file so that it can be replayed in order to see the executions and follow what was happening on the machine. Remus [[#References | [1]]] has contributed a highly efficient snap-shotting mechanism for these replays.

'''Accountability:''' Accountability in the context of this paper means that every action done on the virtual machine is recorded and will be used against the machine or user to verify the correctness of the application. The AVM is responsible of its action and will answers for its action against an auditor.

'''Remote Fault Detection:''' There are programs like GridCop[[#References | [2]]] that can be used to monitor the progress and execution of a remotely executing program by requesting a beacon packet. When the remote computer is sending the packets, the receiving/logging computer must be a trusted computer (hardware,software, OS) so that the receiving of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interrupts during the logging must be handled or the inconsistencies will result in an inaccurate outcome. The AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games or any specific modification in a program can be either scanned[[#References | [3][4]]] for or prevented[[#References | [5][6]]] by certain programs. The issue with these scanning and preventative software is the knowledge/awareness of specific cheats or situations that the software can handle. An AVM is designed to counter any kind of general cheat.

'''Integrity Violations:''' This refers how the consistency of normal/expected operations of an execution does not equal to that of the host/reference (Trusted) execution, hence a violation has occurred.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tries to tackle a problem that has haunted computer scientists for a long time. How can you be sure that the software running on a remote machine is working correctly or as intended. Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relation between users and a host. When a node (user or computer) expects some sort of result or feedback from another node, they would hope that that interaction being done would be independent of the node and only dependent on the intended software. Let's say, that node A interacts with node B with execution exe1 and node A interacts with node C also with ex1, but node C has been modified and respond with exe2. Thus, we can assume that the respond of B and C will be different. Being able to prove that the node C has been modified without any doubt is the purpose of this paper.

Previous work that has been done in efforts to prevent or detect '''integrity violations''' can be separated into different categories of operations. The first would be '''Cheat Detection''', where in many different games there are cheats that users use to usually create benefits for themselves that was not intended by the original game.[[#References |[4]]] These detectors are not dynamic, in the sense that they do not actually detect whether a cheat is being used, more so they are checking if there is a cheating operation that they have logged before, being operated on the user's system. For example, if there was a known cheating program named aimbot.exe that can be run in the background of a game such as CounterStrike, and the PunkBuster system that was implemented on the user's system had the aimbot.exe program already logged as a cheating program from the developers, the PunkBuster program might notify the current game servers of this or even prevent the user from playing any games until the aimbot.exe operation is no longer running.

'''Accountability''' is another important problem that has been the subject of much research. An accountable system provides a method to ascertain whether execution took place correctly and as expected. These systems can also provide reliable evidence of proper execution to third parties, if required. This evidence can also be used to defend a node when threatened with false accusation. Numerous systems already use accountability in their system, but they were mostly all linked to specific applications, where a point of reference must be used to compare. As example PeerReview[[#References |[7]]], which is a system closely related to what the research team have worked on, must be implemented into the application which makes it less portable and cannot be implemented as easily as an '''AVM'''. PeerReview verifies the inbound and outbound packets and can see if the software is running as intended.

Another problem that is related to the paper is '''remote fault detection''' in a distributed system. That is, having a method to verify proper code execution and machine functionality of a node. Network activity is a common solution to this problem, as it looks at the inbound and outbound of the node. This can let them know how the software is operating, or in the case of AVM how the whole virtual machine is working. Gridcop[[#References |[8]]] is another example that inspects a small number of packets periodically. Another way of determining the fault remotely is to use a trusted node, where it can tell immediately if a fault occurs or a modification is made where it should not have been made.

The problem of logging and auditing the processes of an execution of a specific node (computer) is greatly dependent on the work done for '''deterministic replay'''. Deterministic replay programs can create a log file that can be used to replay the operations done for some execution that occurs on a node. Replaying the operations done on the node can show what the node was doing, and this would seem like it is sufficient in finding out whether a node was causing integrity violations or not. The concept of snap-shoting/recording the operations is not the issue with deterministic replay, it is the fact that the data being outputted into the replay may be tampered with by the node itself so that it generates optimal results in replay. By faking the results of the operations, the auditing computer will falsely believe that the tested computer is running all operations as normal. The logging operations done by these recording programs can be directly related to the work needed to detect integrity violations.

=Contribution=
The accountable virtual machine (AVM), that was proposed in this essay, most useful contribution was the implementation of the accountable virtual machine monitor (AVMM). It is what allows for the fault checking of virtual machines in a cloud computing environment. The AVMM can be broken down into different parts: the virtual machine monitor (VMM), the temper-evident log, and auditing mechanisms. The VMM is based off the VMM found in VMWare Workstation 6.5.1[[#References |[9]]], the temper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built up from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are received, retransmitted if needed.

2. Machines and Users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair, that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.
The job of the AVMM is to record all incoming and outgoing messages to a tamper-evident log
and enough info of the execution to enable deterministic replay.

The hash function used is a cyrptographic hash function, which is a way of translating a arbituary block of data into a string. While not impossible to spoof or break, if it has the three properities specified in the assumptions it is concidered a "hard" problem that is infeasible for a malicious attacker to use as a attack vector in the foreseable future, and thus is secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts), because the input is asynchronous, and the exact timing of input must be recorded so the inputs can be injected at the same moment during the replay. Wall-clock time is not accurate enough for this recording, so the AVMM must use a combination of instruction pointer, branch counter, and additional registers. Not all inputs have to be recorded this way (software interrupts) because they send requests to the AVM, which will be issued again during replay.

Two parallels streams appear in the tamper-evident log: message exchanges and nondeterministic inputs.
It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM simply cross-references messages and inputs during replay, thus, easily detecting any discrepancies.

The AVMM periodically takes snapshots of the AVM's current state, this facilitates fine-grain audits for the user, but it also increases overhead. The overhead is lowered slightly by the snapshots being incremental (only save the state that has been changed since the last snapshot). The user can authenticate the snapshot using a hash tree of the state (generated by the AVMM) and it can update the hash tree after each snapshot.

'''Tamper-Evident Log'''

The log is made up of hash code entries.
Each log entry in form e = (s,t,c,h)
s = monotonically increasing sequence number
t = type
c = data of the type
h = hash value

The hash value is calculated by: h = H(hi-1 || s || t || H(c))
H() is a hash function.
|| stands for concatenation.

Each message sent gets signed with a private key, when the AVMM logs the messages with the signature attached but removes it before sending it to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect when a message is dropped, each party sends an acknowledgement for each message they receive. If an acknowledgement is not received the message is resent a few times, if the user stops receiving messages, then the machine is presumed to have failed.

To preform a log check, the user retrieves a pair of authenticators, then challenges the machine to produce the log segment between the two. The log is computationally infeasible to edit without breaking the hash chain, thus, if the log has been tampered with, the hash chain will be different and the user will notified of the tampering.

'''Auditing Mechanism'''

From the VMM's perspective, everything is deterministic.

To perform an audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify that the events in the log correspond to a correct execution of the software.

The user can verify the execution of software through three different checks: verifying the log, verifying the snapshot, and verifying the execution.

When the user wants to verify a log segment, the user retrieves the authenticators from the machine with the sequence numbers in the range of the log segment. The user then downloads the log segment from the machine, starting with the most recent snapshot before the start of the segment and ending with the most recent snapshot before the end of the segment. The user then checks the segment against the authenticators for tampering. If this step succeeds, the user can assume the log segment is genuine. If the machine is faulty, it may be unable to provide the segment or may return a corrupted one. Either outcome can be used to convince a third party of the fault.

When the user wants to verify the snapshot, the user downloads from the machine a snapshot of the AVM's state at the beginning of the log segment, and the hash tree is recomputed over it. The new hash tree is compared to the hash tree recorded in the original log segment. If any discrepancies are detected, the user can use this to convince a third party of the machine's faults.

To verify the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and any users of the machine. The auditing tool performs two checks on the log segment: a syntactic check (which determines whether the log is well-formed) and a semantic check (which determines whether the information in the log reflects a correct execution of the machine).

The syntactic check verifies that all log entries are in the proper format, that the signatures on each message and acknowledgement are valid, that each message was acknowledged, and that the sequence of sent and received messages matches the sequence of messages that enter and exit the AVM.
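The checks listed above can be sketched roughly as follows. The entry fields (<code>seq</code>, <code>type</code>, <code>msg_id</code>, <code>sig</code>), the entry types, and the <code>verify_sig</code> callback are hypothetical names chosen for illustration. Real signature verification and the comparison against the messages entering and exiting the AVM are left out.

```python
from typing import Callable, Dict, List

KNOWN_TYPES = {"SEND", "RECV", "ACK"}   # hypothetical entry types

def syntactic_check(log: List[Dict], verify_sig: Callable[[Dict], bool]) -> List[str]:
    """Return a list of problems; an empty list means the log looks well-formed."""
    problems: List[str] = []
    acked = {e.get("msg_id") for e in log if e.get("type") == "ACK"}
    prev_seq = None
    for e in log:
        # 1. Every entry must have the expected fields and a known type.
        if e.get("type") not in KNOWN_TYPES or "seq" not in e or "msg_id" not in e:
            problems.append(f"malformed entry: {e}")
            continue
        # 2. Sequence numbers must increase monotonically.
        if prev_seq is not None and e["seq"] <= prev_seq:
            problems.append(f"sequence number not increasing at seq {e['seq']}")
        prev_seq = e["seq"]
        # 3. Every entry must carry a valid signature (delegated to the callback).
        if not verify_sig(e):
            problems.append(f"bad signature at seq {e['seq']}")
        # 4. Every sent message must have been acknowledged.
        if e["type"] == "SEND" and e["msg_id"] not in acked:
            problems.append(f"message {e['msg_id']} was never acknowledged")
    return problems

# Placeholder verifier: a real one would check a public-key signature.
verify_sig = lambda e: e.get("sig") == "ok"

good_log = [
    {"seq": 1, "type": "SEND", "msg_id": "m1", "sig": "ok"},
    {"seq": 2, "type": "ACK", "msg_id": "m1", "sig": "ok"},
]
bad_log = [
    {"seq": 1, "type": "SEND", "msg_id": "m1", "sig": "ok"},
    {"seq": 1, "type": "RECV", "msg_id": "m2", "sig": "forged"},
]
good_problems = syntactic_check(good_log, verify_sig)
bad_problems = syntactic_check(bad_log, verify_sig)
```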

The semantic check creates a local VM that replays the machine's log segment. The VM is initialized with a snapshot from the machine if possible. The local VM then runs the log segment and the resulting data is recorded. The auditing tool then checks the inputs, outputs, and snapshot hashes of the replayed execution against the original log. If any discrepancies are detected, the fault is reported and can be used as evidence against the machine.
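The principle behind the semantic check can be shown with a toy sketch: replay the logged inputs on a trusted reference copy of the software and compare the outputs with the logged ones. The "software" here is a trivial accumulator, and the names <code>reference_step</code>, <code>cheating_step</code>, <code>run_and_log</code>, and <code>semantic_check</code> are hypothetical. A real audit replays an entire virtual machine and also compares snapshot hashes.

```python
from typing import Callable, List, Tuple

LogEntry = Tuple[str, int]                       # ("IN", value) or ("OUT", value)
Step = Callable[[int, int], Tuple[int, int]]     # (state, input) -> (new state, output)

def reference_step(state: int, inp: int) -> Tuple[int, int]:
    """Toy reference software: accumulate inputs and output the running total."""
    state += inp
    return state, state

def cheating_step(state: int, inp: int) -> Tuple[int, int]:
    """Modified software that counts every input twice."""
    state += 2 * inp
    return state, state

def run_and_log(step: Step, snapshot: int, inputs: List[int]) -> List[LogEntry]:
    state, log = snapshot, []
    for inp in inputs:
        state, out = step(state, inp)
        log += [("IN", inp), ("OUT", out)]
    return log

def semantic_check(log: List[LogEntry], snapshot: int) -> bool:
    """Replay the logged inputs on the reference software and compare the resulting logs."""
    inputs = [value for kind, value in log if kind == "IN"]
    return run_and_log(reference_step, snapshot, inputs) == log

honest_log = run_and_log(reference_step, 0, [1, 2, 3])
cheat_log = run_and_log(cheating_step, 0, [1, 2, 3])
honest_ok = semantic_check(honest_log, 0)
cheat_ok = semantic_check(cheat_log, 0)
```

The modified software is caught because its logged outputs cannot be reproduced by the reference copy from the same inputs.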

=Critique=

The layout of the paper greatly helps the reader's comprehension. The introduction clearly describes what the reader can expect in the following pages, especially which problems are addressed and how they are solved.

This paper gives multiple examples of the advantages and disadvantages of an AVM. A good example is "Cheat Detection". Cheaters use programs to get around the original game code and gain a major advantage over other players. Since an AVM detects cheats generically, it casts a much wider net than most other cheat detection algorithms. The logs also make it possible to replay the game. Thus, players using an AVM can see how other players play by replaying the game from a player's log.

The negative side is that the player might have to suffer from the AVM. Everything is logged and stored on the hard drive, which takes up a large amount of space. In the example in the paper it is 148 MB per hour after compression. The logging also reduces the frame rate (fps). Additionally, the connection to the AVM increases the ping time to the server.

As a proof of concept, the authors used their AVM with the online game Counter-Strike and tried to detect online cheats. They were using “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg. 10]. These machines are considerably more powerful than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A 10-year-old game [[#References |[10]]] should use only a small share of the resources of a Dell Precision T1500 workstation. In comparison, newer games consume far more resources than Counter-Strike, leaving less room to run the AVM. A 13% slowdown [pg. 12] is quite noticeable in a game that only runs at 30 to 40 fps. This is very detrimental to gameplay, because a frame rate above 60 fps is considered optimal.
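To make the numbers concrete, the following small calculation applies the 13% figure to a few base frame rates. It assumes, purely for illustration, that the slowdown translates directly into a 13% lower frame rate; the function names are hypothetical.

```python
def fps_with_overhead(base_fps: float, slowdown: float = 0.13) -> float:
    """Frame rate after applying a relative slowdown (assumed to scale fps directly)."""
    return base_fps * (1.0 - slowdown)

def frame_time_ms(fps: float) -> float:
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps

for base in (30, 40, 60):
    slowed = fps_with_overhead(base)
    print(f"{base} fps -> {slowed:.1f} fps "
          f"({frame_time_ms(base):.1f} ms -> {frame_time_ms(slowed):.1f} ms per frame)")
```

Under this assumption a game running at 30 to 40 fps drops to roughly 26 to 35 fps, while a game at 60 fps stays above 50 fps.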

When discussing the Counter-Strike use case, the authors do not mention that in Counter-Strike the user can already record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league's website, where every person in the league can watch it. Without this demo, the team loses the match immediately. Additionally, some leagues require the player to start an extra program (e.g. Electronic Sports League WIRE), which checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file, and uploads it to one of the servers of the online league, where it can be checked by any player.

In the paper the authors state that the AVM will only generate an extra 5 ms of latency. While this does not seem like a lot, the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This sample does not accurately represent real-life situations and therefore lacks external validity. Many of these online games are played over the Internet, with participants sometimes not even on the same continent, so the latency overhead of the AVM would likely increase due to the added distance. [[#References |[12]]]

While the paper does test a slightly larger than one-to-one scenario, it certainly does not test a real-world environment where 16, 32, or even 64 players would be playing at the same time.

Spot checking can be used for applications that require snapshots every x seconds. Although this approach removes a lot of overhead and data storage, it only verifies that the application or user is behaving as intended every x seconds. Thus, someone could discover the pattern of those snapshots and render the AVM useless.
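The concern about predictable spot checks can be illustrated with a small sketch. It assumes the log is split into numbered segments and compares a fixed audit schedule with a randomly chosen one. The function names and parameter values are hypothetical, and the probability formula is the standard result for sampling without replacement, not a figure from the paper.

```python
import math
import random
from typing import Set

def periodic_audit(n_segments: int, period: int) -> Set[int]:
    """Audit every period-th segment: a schedule an attacker could learn."""
    return set(range(0, n_segments, period))

def random_audit(n_segments: int, count: int, rng: random.Random) -> Set[int]:
    """Audit `count` segments chosen uniformly at random."""
    return set(rng.sample(range(n_segments), count))

def detection_probability(n_segments: int, cheated: int, audited: int) -> float:
    """Chance that a random audit hits at least one of the cheated segments."""
    return 1.0 - math.comb(n_segments - cheated, audited) / math.comb(n_segments, audited)

n = 100
schedule = periodic_audit(n, period=10)
# A cheater who knows the schedule cheats only in segments that are never audited.
cheat_segments = {s for s in range(n) if s not in schedule}
evaded = cheat_segments.isdisjoint(schedule)

sample = random_audit(n, 10, random.Random(0))
p = detection_probability(n, cheated=5, audited=10)
print(f"periodic schedule evaded: {evaded}, random audit detection chance: {p:.2f}")
```

With a known schedule the cheater is never caught, whereas with random selection each audit has a nonzero chance of catching any cheated segment.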

AVMs are extremely effective against two types of cheats: those that send incorrect network messages and those that have to be loaded along with the game. This is ideal for tournament-style competitive games, but in the real world it would not be of much use. Games get patched, users download add-ons for the game, and so on. Every patch or add-on would require a new AVM, which is unreasonable given the number of people playing the game. One solution proposed by the authors is to disable the ability to install anything on the AVM. While this could work in a tournament environment, normal users at home would not be pleased with this limitation.

An AVM will not catch a bug or exploit in a program that a malicious user could take advantage of, because the exploit would behave identically on both the user's system and the monitoring system. This type of exploit would not be detected, since both the real-time execution and the reference execution that is assumed to be correct would show the exploited behaviour as correct.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield. Remus: High availability via asynchronous virtual machine replication. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI), Apr. 2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but verify: Monitoring remotely executing programs for progress and correctness. In Proceedings of the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware. http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof playout for centralized and peer-to-peer gaming. IEEE/ACM Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online games against cheating. In Proceedings of the Workshop on Network and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical accountability for distributed systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but verify: Monitoring remotely executing programs for progress and correctness. In Proceedings of the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2005.

[9] VMWare Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike. http://store.steampowered.com/app/10/

[12] Larry L. Peterson and Bruce S. Davie. Computer Networks: A Systems Approach, 2007.

COMP 3000 Essay 2 2010 Question 4

2010-12-03T09:14:56Z

Npradhan: /* Critique */ organisation

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliates:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)]

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its executions into a file so that it can be replayed in order to see the executions and follow what was happening on the machine. Remus [[#References | [1]]] has contributed a highly efficient snap-shotting mechanism for these replays.

'''Accountability:''' Accountability in the context of this paper means that every action done on the virtual machine is recorded and will be used against the machine or user to verify the correctness of the application. The AVM is responsible of its action and will answers for its action against an auditor.

'''Remote Fault Detection:''' There are programs like GridCop[[#References | [2]]] that can be used to monitor the progress and execution of a remotely executing program by requesting a beacon packet. When the remote computer is sending the packets, the receiving/logging computer must be a trusted computer (hardware,software, OS) so that the receiving of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interrupts during the logging must be handled or the inconsistencies will result in an inaccurate outcome. The AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games or any specific modification in a program can be either scanned[[#References | [3][4]]] for or prevented[[#References | [5][6]]] by certain programs. The issue with these scanning and preventative software is the knowledge/awareness of specific cheats or situations that the software can handle. An AVM is designed to counter any kind of general cheat.

'''Integrity Violations:''' This refers how the consistency of normal/expected operations of an execution does not equal to that of the host/reference (Trusted) execution, hence a violation has occurred.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tackles a problem that has long troubled computer scientists: how can you be sure that the software running on a remote machine is working correctly, or as intended? Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relationship between users and a host. When a node (user or computer) expects a result or feedback from another node, it hopes that the interaction depends only on the intended software and not on the particular node. Suppose node A interacts with node B, which runs execution exe1, and node A also interacts with node C, which should run exe1 but has been modified and instead responds with exe2. The responses of B and C can then be expected to differ. Being able to prove beyond doubt that node C has been modified is the purpose of this paper.

Previous work on preventing or detecting '''integrity violations''' falls into several categories. The first is '''cheat detection''': in many games, users run cheats to gain benefits for themselves that the original game did not intend.[[#References |[4]]] Existing detectors are not dynamic, in the sense that they do not detect whether cheating is actually taking place; instead, they check whether a cheating program they already know about is running on the user's system. For example, suppose a known cheating program named aimbot.exe can be run in the background of a game such as Counter-Strike, and the PunkBuster system installed on the user's machine already has aimbot.exe listed as a cheating program by its developers. PunkBuster might then notify the current game server, or even prevent the user from playing any games until aimbot.exe is no longer running.

'''Accountability''' is another important problem that has been the subject of much research. An accountable system provides a way to ascertain whether execution took place correctly and as expected. Such systems can also provide reliable evidence of proper execution to third parties if required, and this evidence can be used to defend a node against false accusations. Numerous systems already provide accountability, but most are tied to specific applications, where a point of reference must be available for comparison. For example, PeerReview[[#References |[7]]], a system closely related to the research team's work, must be built into the application, which makes it less portable and harder to deploy than an '''AVM'''. PeerReview verifies inbound and outbound packets to determine whether the software is running as intended.

Another problem related to the paper is '''remote fault detection''' in a distributed system, that is, having a method to verify the proper code execution and machine functionality of a node. Monitoring network activity is a common solution, since it examines the node's inbound and outbound traffic. This reveals how the software is operating or, in the case of an AVM, how the whole virtual machine is working. GridCop[[#References |[8]]] is another example; it inspects a small number of packets periodically. Another way to detect a fault remotely is to use a trusted node, which can tell immediately if a fault occurs or if a modification is made where none should have been.

Logging and auditing the execution of a specific node (computer) depends heavily on work on '''deterministic replay'''. Deterministic replay programs create a log file that can be used to replay the operations of an execution on a node. Replaying these operations shows what the node was doing, which would seem sufficient to find out whether the node caused integrity violations. The problem with deterministic replay is not the snapshotting and recording of operations; it is that the node itself may tamper with the data written to the log so that the replay produces favourable results. Faced with such faked results, the auditing computer will falsely believe that the tested computer ran all operations normally. The logging performed by these recording programs is therefore directly related to the work needed to detect integrity violations.

=Contribution=
The most useful contribution of the accountable virtual machine (AVM) proposed in this paper is the implementation of the accountable virtual machine monitor (AVMM), which allows fault checking of virtual machines in a cloud computing environment. The AVMM can be broken down into three parts: the virtual machine monitor (VMM), the tamper-evident log, and the auditing mechanisms. The VMM is based on the VMM found in VMware Workstation 6.5.1[[#References |[9]]], the tamper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are eventually received, with retransmission if needed.

2. Machines and users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.

The job of the AVMM is to record all incoming and outgoing messages in a tamper-evident log, along with enough information about the execution to enable deterministic replay.

The hash function used is a cryptographic hash function, which maps an arbitrary block of data to a fixed-size string. While such a function is not impossible to break, if it has the three properties listed in the assumptions, breaking it is considered a "hard" problem that is infeasible for a malicious attacker to use as an attack vector in the foreseeable future, and the function is therefore treated as secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts). Because these inputs are asynchronous, their exact timing must be recorded so that they can be injected at the same moment during replay. Wall-clock time is not accurate enough for this, so the AVMM uses a combination of the instruction pointer, a branch counter, and additional registers. Not all inputs have to be recorded this way: software interrupts, for example, involve requests that will be issued again during replay.
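The role of precise timing can be illustrated with a toy model. The sketch below is not the AVMM's mechanism: a plain step counter stands in for the instruction pointer, branch counter, and registers, and the function run and the arithmetic inside it are made up for illustration.

```python
def run(steps, inputs_at):
    """Toy deterministic machine.

    `inputs_at` maps a step number to the value injected at that step.
    The step counter is only a stand-in for the precise execution point
    (instruction pointer, branch counter, registers) described above.
    """
    state = 1
    trace = []
    for step in range(steps):
        if step in inputs_at:                      # asynchronous input arrives here
            state += inputs_at[step]
        state = (state * 31 + step) % 1_000_003    # deterministic computation
        trace.append(state)
    return state, trace


# Recording: the inputs happened to arrive at steps 3 and 7.
recorded_log = {3: 42, 7: 5}
original_state, original_trace = run(10, recorded_log)

# Replay: injecting the logged inputs at the same steps reproduces the run.
replayed_state, replayed_trace = run(10, recorded_log)

# Injecting the same values one step late makes the execution diverge.
shifted_state, _ = run(10, {4: 42, 7: 5})
```

In this toy model, the replay matches the original run only because each input is injected at exactly the step where it was recorded.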

Two parallel streams appear in the tamper-evident log: message exchanges and nondeterministic inputs. It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM cross-references messages and inputs during replay, which exposes any discrepancies.

The AVMM periodically takes snapshots of the AVM's current state. This facilitates fine-grained audits for the user, but it also increases overhead. The overhead is reduced somewhat by making the snapshots incremental (only the state that has changed since the last snapshot is saved). The user can authenticate a snapshot using a hash tree of the state, which the AVMM generates and updates after each snapshot.
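A minimal sketch of how a hash tree can authenticate a snapshot follows. It assumes SHA-256, a state split into a handful of hypothetical "pages", and a simple rule for odd node counts; none of these details come from the paper, and the tree is recomputed in full rather than updated in place.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash_tree_root(pages):
    """Return the root of a binary hash tree built over a list of byte pages."""
    level = [sha256(page) for page in pages]
    if not level:
        return sha256(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:          # odd count: pair the last node with itself
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Full state at the first snapshot (four hypothetical memory pages).
state = [b"page-0", b"page-1", b"page-2", b"page-3"]
root_1 = hash_tree_root(state)

# Incremental snapshot: only the pages changed since the last snapshot are saved.
incremental = {2: b"page-2-modified"}
for index, page in incremental.items():
    state[index] = page
root_2 = hash_tree_root(state)

# An auditor who downloads the state recomputes the root and compares it.
downloaded = [b"page-0", b"page-1", b"page-2-modified", b"page-3"]
snapshot_ok = hash_tree_root(downloaded) == root_2

# A state with one forged page no longer matches the recorded root.
tampered = [b"page-0", b"page-1", b"page-2-modified", b"page-3-forged"]
tamper_detected = hash_tree_root(tampered) != root_2
```

Only the root needs to be recorded in the log; any change to any page changes the root.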

'''Tamper-Evident Log'''

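The log is a hash chain of entries. Each entry has the form <math>e_i = (s_i, t_i, c_i, h_i)</math>, where <math>s_i</math> is a monotonically increasing sequence number, <math>t_i</math> is the entry type, <math>c_i</math> is the data for that type, and <math>h_i</math> is the hash value.

The hash value is calculated as <math>h_i = H(h_{i-1} \,\|\, s_i \,\|\, t_i \,\|\, H(c_i))</math>, where <math>H(\cdot)</math> is a hash function and <math>\|</math> stands for concatenation.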
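The hash-chain formula can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's code: SHA-256 plays the role of H, the first entry chains from an all-zero hash, sequence numbers are encoded as 8 bytes, and the entry types SEND, RECV, and INPUT are hypothetical labels.

```python
import hashlib

ZERO_HASH = b"\x00" * 32   # assumed value of the hash preceding the first entry


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev_hash: bytes, seq: int, entry_type: str, content: bytes) -> bytes:
    """h = H(h_prev || s || t || H(c)), using an assumed byte encoding."""
    return H(prev_hash + seq.to_bytes(8, "big") + entry_type.encode("utf-8") + H(content))


def append_entry(log: list, entry_type: str, content: bytes) -> tuple:
    """Append an entry (s, t, c, h) whose hash chains to the previous entry."""
    prev_hash = log[-1][3] if log else ZERO_HASH
    seq = len(log) + 1
    entry = (seq, entry_type, content, entry_hash(prev_hash, seq, entry_type, content))
    log.append(entry)
    return entry


log = []
append_entry(log, "SEND", b"hello")
append_entry(log, "RECV", b"world")
append_entry(log, "INPUT", b"keypress:w")
```

Because each hash covers the previous one, changing any earlier entry changes every hash that follows it.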

Each message sent is signed with the sender's private key. The AVMM logs each incoming message with its signature attached, but removes the signature before passing the message to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect when a message is dropped, each party sends an acknowledgement for each message it receives. If an acknowledgement is not received, the message is resent a few times; if the user stops receiving messages altogether, the machine is presumed to have failed.
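The acknowledgement rule can be sketched as a small retry loop. The retry limit of three and the make_channel helper, which simulates a link that drops messages, are assumptions made for this example.

```python
def send_with_retries(message, channel, max_attempts=3):
    """Send `message` until it is acknowledged; return attempts used, or None."""
    for attempt in range(1, max_attempts + 1):
        reply = channel(message)
        if reply == ("ACK", message):
            return attempt
    return None          # no acknowledgement: presume the other party has failed


def make_channel(drops):
    """Hypothetical channel that silently drops the first `drops` messages."""
    remaining = {"drops": drops}

    def channel(message):
        if remaining["drops"] > 0:
            remaining["drops"] -= 1
            return None
        return ("ACK", message)

    return channel


attempts_flaky = send_with_retries(b"move north", make_channel(drops=2))   # 3
attempts_dead = send_with_retries(b"move north", make_channel(drops=10))   # None
```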

To perform a log check, the user retrieves a pair of authenticators and then challenges the machine to produce the log segment between the two. It is computationally infeasible to edit the log without breaking the hash chain, so if the log has been tampered with, the hash chain will differ and the user will be notified of the tampering.
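The log check can be sketched as follows, under the same kind of assumptions as any toy model: SHA-256 as the hash function, an all-zero starting hash, and authenticators reduced to plain (sequence number, hash) pairs. Real authenticators are signed, and that signature check is omitted here.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev_hash, seq, entry_type, content):
    return H(prev_hash + seq.to_bytes(8, "big") + entry_type.encode("utf-8") + H(content))


def build_log(items):
    """Build a hash-chained log from (type, content) pairs."""
    log, prev = [], b"\x00" * 32
    for seq, (entry_type, content) in enumerate(items, start=1):
        prev = entry_hash(prev, seq, entry_type, content)
        log.append((seq, entry_type, content, prev))
    return log


def verify_segment(segment, start_auth, end_auth):
    """Check that `segment` links the two authenticators, each a (seq, hash) pair."""
    seq, prev = start_auth
    for s, entry_type, content, claimed in segment:
        if s != seq + 1:
            return False                     # missing or reordered entry
        prev = entry_hash(prev, s, entry_type, content)
        if prev != claimed:
            return False                     # entry does not match the chain
        seq = s
    return (seq, prev) == end_auth


log = build_log([("SEND", b"m1"), ("RECV", b"m2"), ("SEND", b"m3"),
                 ("INPUT", b"x"), ("SEND", b"m4")])
auth_a = (log[1][0], log[1][3])      # authenticator the user holds for entry 2
auth_b = (log[4][0], log[4][3])      # authenticator the user holds for entry 5

honest_ok = verify_segment(log[2:], auth_a, auth_b)

# The machine rewrites entry 4 and recomputes every later hash.
forged = build_log([("SEND", b"m1"), ("RECV", b"m2"), ("SEND", b"m3"),
                    ("INPUT", b"y"), ("SEND", b"m4")])
forged_ok = verify_segment(forged[2:], auth_a, auth_b)
```

Even though the forged log is internally consistent, its final hash no longer matches the authenticator the user already holds, so the check fails.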

'''Auditing Mechanism'''

From the VMM's perspective, everything is deterministic.

To perform an audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify that the events in the log correspond to a correct execution of the software.

The user can verify the execution of the software through three different checks: verifying the log, verifying the snapshot, and verifying the execution.

To verify a log segment, the user retrieves from the machine the authenticators whose sequence numbers fall within the range of the segment. The user then downloads the log segment from the machine, starting with the most recent snapshot before the beginning of the segment and ending with the most recent snapshot before its end, and checks it against the authenticators for tampering. If this step succeeds, the user can assume the log segment has not been tampered with. If the machine is faulty, the segment may be unavailable for download, or the machine may return a corrupted log segment; either outcome can be used to convince a third party of the fault.

To verify a snapshot, the user downloads from the machine a snapshot of the AVM's state at the beginning of the log segment, and the hash tree is recomputed. The recomputed hash tree is compared with the hash tree contained in the original log segment. If any discrepancies are detected, the user can use them to convince a third party of the machine's faults.

To verify the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and of any users of the machine. The auditing tool performs two checks on the log segment: a syntactic check, which determines whether the log is well-formed, and a semantic check, which determines whether the information in the log corresponds to a correct execution of the machine.

The syntactic check verifies that all log entries are in the proper format, that the signatures on each message and acknowledgement are valid, that each message was acknowledged, and that the sequence of sent and received messages matches the sequence of messages that enter and exit the AVM.
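A reduced version of the syntactic check might look like the sketch below. It covers only entry format, sequence numbers, and acknowledgements; signature validation is left out, and the entry layout (sequence number, type, message identifier) and the type names are assumptions made for the example.

```python
KNOWN_TYPES = {"SEND", "RECV", "ACK", "INPUT"}


def syntactic_check(entries):
    """Return a list of problems found; an empty list means well-formed."""
    problems = []
    last_seq = 0
    unacked = set()
    for entry in entries:
        if not (isinstance(entry, tuple) and len(entry) == 3):
            problems.append(f"malformed entry: {entry!r}")
            continue
        seq, entry_type, payload = entry
        if not isinstance(seq, int) or seq <= last_seq:
            problems.append(f"bad sequence number: {seq!r}")
        else:
            last_seq = seq
        if entry_type not in KNOWN_TYPES:
            problems.append(f"unknown type: {entry_type!r}")
        elif entry_type == "SEND":
            unacked.add(payload)             # wait for a matching ACK
        elif entry_type == "ACK":
            unacked.discard(payload)
    problems.extend(f"unacknowledged message: {p!r}" for p in sorted(unacked))
    return problems


good_log = [(1, "SEND", "m1"), (2, "ACK", "m1"), (3, "RECV", "m2")]
bad_log = [(1, "SEND", "m1"), (1, "RECV", "m2"), (3, "SEND", "m3")]

good_problems = syntactic_check(good_log)    # no problems
bad_problems = syntactic_check(bad_log)      # repeated sequence number, two unacknowledged sends
```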

The semantic check creates a local VM that replays the machine's log segment; the VM is initialized from a snapshot from the machine if possible. The local VM then runs the log segment and the results are recorded. The auditing tool then compares the inputs, outputs, and snapshot hashes of the replayed execution against the original log. If any discrepancies are detected, the fault is reported and can be used as evidence against the machine.
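The idea behind the semantic check can be shown with a toy "game" in place of a real VM. Everything here is hypothetical: reference_step stands for the reference software, cheating_step for a modified copy, and the snapshot is just an integer position.

```python
def reference_step(state, player_input):
    """Reference software: a toy game where each input moves the player by 1."""
    state = state + {"left": -1, "right": 1}.get(player_input, 0)
    return state, f"pos={state}"


def cheating_step(state, player_input):
    """Modified software: moves twice as fast as the reference allows."""
    state = state + {"left": -2, "right": 2}.get(player_input, 0)
    return state, f"pos={state}"


def record(step, snapshot, inputs):
    """What the audited machine logs: each input and the output it produced."""
    state, log = snapshot, []
    for player_input in inputs:
        state, output = step(state, player_input)
        log.append((player_input, output))
    return log


def semantic_check(snapshot, log):
    """Replay logged inputs on the reference; return the first mismatch index or None."""
    state = snapshot
    for index, (player_input, logged_output) in enumerate(log):
        state, output = reference_step(state, player_input)
        if output != logged_output:
            return index
    return None


inputs = ["right", "right", "left"]
honest_log = record(reference_step, 0, inputs)
cheater_log = record(cheating_step, 0, inputs)

honest_result = semantic_check(0, honest_log)     # None: replay matches the log
cheater_result = semantic_check(0, cheater_log)   # 0: first output already differs
```

The auditor never needs to know how the cheat works; it only observes that the logged outputs cannot be produced by the reference software from the logged inputs.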

=Critique=

The layout of the paper is essential to the reader's comprehension. The introduction clearly describes what the reader can expect in the following pages, especially which problems are addressed and how they are solved.

The paper gives multiple examples of the advantages and disadvantages of an AVM. A good example is cheat detection. Cheaters use programs to get around the original game code and gain a major advantage over other players. Because an AVM detects cheats generically, it covers a wider range of cheats than most other cheat detection approaches. The logs also make it possible to replay the game, so players using an AVM can see how other players played by replaying the game from those players' logs.

The downside is that the player may suffer from the AVM's overhead. Everything is logged and stored on the hard drive, which takes a large amount of space: in the paper's example, 148 MB per hour after compression. The logging also reduces the frame rate (fps). Additionally, the connection to the AVM increases the ping time to the server.

As a proof of concept, the authors used their AVM with the online game Counter-Strike and tried to detect online cheats. They used “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more powerful than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A 10-year-old game [[#References |[10]]] should use only a fraction of the resources of a Dell Precision T1500 workstation. Newer games consume far more resources than Counter-Strike, leaving less room to run the AVM. A 13% slowdown [pg 12] in a game that only reaches 30 to 40 fps is quite noticeable, and it is detrimental to gameplay because optimal performance requires more than 60 fps.
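To put numbers on the slowdown, the short calculation below assumes that the 13% figure applies directly to the frame rate; that interpretation is an assumption made for this example, not a claim from the paper.

```python
SLOWDOWN = 0.13            # the 13% figure quoted above


def with_avm(fps):
    """Frame rate after the slowdown, assuming it scales the frame rate directly."""
    return fps * (1 - SLOWDOWN)


low = round(with_avm(30), 1)                       # 26.1 fps
high = round(with_avm(40), 1)                      # 34.8 fps
needed_for_60 = round(60 / (1 - SLOWDOWN), 1)      # 69.0 fps without the AVM
```

Under this assumption, a game running at 30 to 40 fps drops to roughly 26 to 35 fps, and a machine would need about 69 fps without the AVM to stay at 60 fps with it.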

Also, for their use case, the authors do not mention that in Counter-Strike a user can record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league's website, where everyone in the league can watch it; a team without this demo loses the match immediately. Additionally, some leagues require players to run an extra program (e.g. Electronic Sports League WIRE) that checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file, and uploads it to one of the league's servers, where any player can check it.

In the paper the authors state that the AVM generates only an extra 5 ms of latency. While this does not seem like much, the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This setup does not accurately represent real-life conditions and therefore lacks external validity: many online games are played over the Internet, with participants sometimes not even on the same continent, and the latency overhead of the AVM would likely increase with the added distance. [[#References |[12]]]

While the paper does test a scenario slightly larger than one-to-one, it does not test a real-world environment in which 16, 32, or even 64 players play at the same time.

Spot checking can be used for applications that take snapshots every x seconds. Although this removes a lot of overhead and data storage, it only verifies at those intervals whether the application or user is behaving as intended. Someone could therefore learn the pattern of the snapshots and render the AVM useless.

AVMs are extremely effective against two types of cheats: those that send incorrect network messages and those that have to be loaded with the game. This is ideal for tournament-style competitive play, but in everyday use it would not be of much help. Games get patched, users download add-ons, and so on, and every patch or add-on would require a new AVM, which is unreasonable given the number of people playing the game. One solution proposed by the team is to disable the ability to install anything on the AVM. While this could work in a tournament environment, a normal user at home would not be pleased with this limitation.

An AVM will not catch a bug or exploit in the program itself that a malicious user could abuse, because the exploit would behave identically on both the user's system and the auditor's system. Such an exploit would go undetected, since both the real-time execution and the reference execution that is known to be correct would show the exploit as correct behaviour.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and
A. Warfield. Remus: High availability via asynchronous virtual
machine replication. In Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI), Apr.
2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware.
http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof
playout for centralized and peer-to-peer gaming. IEEE/ACM
Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online
games against cheating. In Proceedings of the Workshop on Network
and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical
accountability for distributed systems. In Proceedings of
the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[9] VMware Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike. http://store.steampowered.com/app/10/

[12] L. L. Peterson and B. S. Davie. Computer Networks: A Systems Approach, 2007.

COMP 3000 Essay 2 2010 Question 4

2010-12-03T09:12:24Z

Npradhan: /* Contribution */

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliates:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)]

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its executions into a file so that it can be replayed in order to see the executions and follow what was happening on the machine. Remus [[#References | [1]]] has contributed a highly efficient snap-shotting mechanism for these replays.

'''Accountability:''' Accountability in the context of this paper means that every action done on the virtual machine is recorded and will be used against the machine or user to verify the correctness of the application. The AVM is responsible of its action and will answers for its action against an auditor.

'''Remote Fault Detection:''' There are programs like GridCop[[#References | [2]]] that can be used to monitor the progress and execution of a remotely executing program by requesting a beacon packet. When the remote computer is sending the packets, the receiving/logging computer must be a trusted computer (hardware,software, OS) so that the receiving of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interrupts during the logging must be handled or the inconsistencies will result in an inaccurate outcome. The AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games or any specific modification in a program can be either scanned[[#References | [3][4]]] for or prevented[[#References | [5][6]]] by certain programs. The issue with these scanning and preventative software is the knowledge/awareness of specific cheats or situations that the software can handle. An AVM is designed to counter any kind of general cheat.

'''Integrity Violations:''' This refers how the consistency of normal/expected operations of an execution does not equal to that of the host/reference (Trusted) execution, hence a violation has occurred.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tries to tackle a problem that has haunted computer scientists for a long time. How can you be sure that the software running on a remote machine is working correctly or as intended. Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relation between users and a host. When a node (user or computer) expects some sort of result or feedback from another node, they would hope that that interaction being done would be independent of the node and only dependent on the intended software. Let's say, that node A interacts with node B with execution exe1 and node A interacts with node C also with ex1, but node C has been modified and respond with exe2. Thus, we can assume that the respond of B and C will be different. Being able to prove that the node C has been modified without any doubt is the purpose of this paper.

Previous work that has been done in efforts to prevent or detect '''integrity violations''' can be separated into different categories of operations. The first would be '''Cheat Detection''', where in many different games there are cheats that users use to usually create benefits for themselves that was not intended by the original game.[[#References |[4]]] These detectors are not dynamic, in the sense that they do not actually detect whether a cheat is being used, more so they are checking if there is a cheating operation that they have logged before, being operated on the user's system. For example, if there was a known cheating program named aimbot.exe that can be run in the background of a game such as CounterStrike, and the PunkBuster system that was implemented on the user's system had the aimbot.exe program already logged as a cheating program from the developers, the PunkBuster program might notify the current game servers of this or even prevent the user from playing any games until the aimbot.exe operation is no longer running.

'''Accountability''' is another important problem that has been the subject of much research. An accountable system provides a method to ascertain whether execution took place correctly and as expected. These systems can also provide reliable evidence of proper execution to third parties, if required. This evidence can also be used to defend a node when threatened with false accusation. Numerous systems already use accountability in their system, but they were mostly all linked to specific applications, where a point of reference must be used to compare. As example PeerReview[[#References |[7]]], which is a system closely related to what the research team have worked on, must be implemented into the application which makes it less portable and cannot be implemented as easily as an '''AVM'''. PeerReview verifies the inbound and outbound packets and can see if the software is running as intended.

Another problem that is related to the paper is '''remote fault detection''' in a distributed system. That is, having a method to verify proper code execution and machine functionality of a node. Network activity is a common solution to this problem, as it looks at the inbound and outbound of the node. This can let them know how the software is operating, or in the case of AVM how the whole virtual machine is working. Gridcop[[#References |[8]]] is another example that inspects a small number of packets periodically. Another way of determining the fault remotely is to use a trusted node, where it can tell immediately if a fault occurs or a modification is made where it should not have been made.

The problem of logging and auditing the processes of an execution of a specific node (computer) is greatly dependent on the work done for '''deterministic replay'''. Deterministic replay programs can create a log file that can be used to replay the operations done for some execution that occurs on a node. Replaying the operations done on the node can show what the node was doing, and this would seem like it is sufficient in finding out whether a node was causing integrity violations or not. The concept of snap-shoting/recording the operations is not the issue with deterministic replay, it is the fact that the data being outputted into the replay may be tampered with by the node itself so that it generates optimal results in replay. By faking the results of the operations, the auditing computer will falsely believe that the tested computer is running all operations as normal. The logging operations done by these recording programs can be directly related to the work needed to detect integrity violations.

=Contribution=
The accountable virtual machine (AVM), that was proposed in this essay, most useful contribution was the implementation of the accountable virtual machine monitor (AVMM). It is what allows for the fault checking of virtual machines in a cloud computing environment. The AVMM can be broken down into different parts: the virtual machine monitor (VMM), the temper-evident log, and auditing mechanisms. The VMM is based off the VMM found in VMWare Workstation 6.5.1[[#References |[9]]], the temper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built up from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are received, retransmitted if needed.

2. Machines and Users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair, that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.
The job of the AVMM is to record all incoming and outgoing messages to a tamper-evident log
and enough info of the execution to enable deterministic replay.

The hash function used is a cyrptographic hash function, which is a way of translating a arbituary block of data into a string. While not impossible to spoof or break, if it has the three properities specified in the assumptions it is concidered a "hard" problem that is infeasible for a malicious attacker to use as a attack vector in the foreseable future, and thus is secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts), because the input is asynchronous, and the exact timing of input must be recorded so the inputs can be injected at the same moment during the replay. Wall-clock time is not accurate enough for this recording, so the AVMM must use a combination of instruction pointer, branch counter, and additional registers. Not all inputs have to be recorded this way (software interrupts) because they send requests to the AVM, which will be issued again during replay.

Two parallels streams appear in the tamper-evident log: message exchanges and nondeterministic inputs.
It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM simply cross-references messages and inputs during replay, thus, easily detecting any discrepancies.

The AVMM periodically takes snapshots of the AVM's current state, this facilitates fine-grain audits for the user, but it also increases overhead. The overhead is lowered slightly by the snapshots being incremental (only save the state that has been changed since the last snapshot). The user can authenticate the snapshot using a hash tree of the state (generated by the AVMM) and it can update the hash tree after each snapshot.

'''Tamper-Evident Log'''

The log is made up of hash code entries.
Each log entry in form e = (s,t,c,h)
s = monotonically increasing sequence number
t = type
c = data of the type
h = hash value

The hash value is calculated by: h = H(hi-1 || s || t || H(c))
H() is a hash function.
|| stands for concatenation.

Each message sent gets signed with a private key, when the AVMM logs the messages with the signature attached but removes it before sending it to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect when a message is dropped, each party sends an acknowledgement for each message they receive. If an acknowledgement is not received the message is resent a few times, if the user stops receiving messages, then the machine is presumed to have failed.

To preform a log check, the user retrieves a pair of authenticators, then challenges the machine to produce the log segment between the two. The log is computationally infeasible to edit without breaking the hash chain, thus, if the log has been tampered with, the hash chain will be different and the user will notified of the tampering.

'''Auditing Mechanism'''

From VMM's perspective all things are deterministic.

To perform a audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify the events in the log are the correct execution of the software.

The user can verify the execution of software through three different methods: Verifying the log, snapshot, and execution.

When the user wants to verify a log segment, the user retrieves the authenticators from the machine with the sequence numbers in the range of the log segment. The user then downloads the log segment from the machine, and, starting with the most recent snapshot before the log segment and ending with the most recent snapshot before the end of the log segment. The user then checks the authenticators for tampering. If this step proceeds, the user can assume the log segment executed properly. If the machine is faulty, the segment will be unavailable to download or may return a corrupted log segment. This can be used to convince a third party of the fault.

When the user wants to verify the snapshot, the user obtains a snapshot of the AVM's state at the beginning of the log segment. The user then downloads a snapshot from the machine and the AVMM recomputes the hash tree. The new hash tree is compared to the hash tree contained in the orignal log segment. If any discrepancies are detected, the user can use this to convince a third party of the machine's faults.

In order for the user to verifying the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and any users of the machine. The auditing tool performs two checks on the log segment, a syntactic check (determines if log is well-formed), and a semantic check (determines if the information in the log shows the correct execution of the machine).

The syntactic check checks whether all log entries are in the proper format, the signatures in each message and acknowledgement, if each message was acknowledged, and the sequence of sent and received messages is correct when compared to the sequence of messages that enter and exit the AVM.

The semantic check creates a local VM that will execute the machine's log segment, the VM is initialized with a snapshot from the machine if possible. The local VM then runs the log segment and the data is recorded. The auditing tool then checks the log segments, inputs, outputs, and verification of snapshot hashes of the replayed execution against the original log. If any discrepancies are detected then the fault is reported and can be used as evidence against the machine.

=Critique=

The layout of the paper is essential to the reader's comprehension. The introduction clearly describes what the reader should expect in the following pages, in particular which problems are addressed and how they are solved.

The paper gives multiple examples of the advantages and disadvantages of an AVM. A good example is cheat detection. Cheaters use programs that work around the original game code to gain a major advantage over other players. Because an AVM detects cheats generically, it covers a wider range of cheats than most other cheat detection approaches. The logs also make it possible to replay the game. Thus, players using an AVM can see how other players played by replaying the game with those players' logs.

The downside is that the player may suffer from the AVM's overhead. Everything is logged and stored on the hard drive, which takes a large amount of space: in the example in the paper, 148 MB per hour after compression. The logging also reduces the frame rate (fps). Additionally, the AVM increases the ping time to the server.

As a proof of concept, the authors used their AVM with the online game Counter-Strike and tried to detect online cheats. They used “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more powerful than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A 10-year-old game [[#References |[10]]] should use only a small share of the resources of a Dell Precision T1500 workstation. In comparison, newer games consume far more resources than Counter-Strike, leaving less room to run the AVM. A 13% slowdown [pg 12] in a game that runs at only 30 to 40 fps is quite noticeable. This is detrimental to gameplay, because a frame rate above 60 fps is considered optimal.
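To make the slowdown concrete, the short Python calculation below applies the 13% figure to a few frame rates. It assumes the slowdown scales the frame rate proportionally, which is a simplification for illustration; the function name <code>fps_after_slowdown</code> is hypothetical.

```python
def fps_after_slowdown(fps, slowdown=0.13):
    """Frame rate remaining after a fractional slowdown (0.13 means 13%)."""
    return fps * (1.0 - slowdown)


for base in (30, 40, 60):
    print(f"{base} fps -> {fps_after_slowdown(base):.1f} fps")
```

Under this assumption, a game running at 30 to 40 fps drops to roughly 26 to 35 fps, and even a game that starts at 60 fps falls below that mark.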

In the paper, the authors state that the AVM adds only about 5 ms of latency. While this does not seem like a lot, the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This setup does not accurately represent real-life situations and therefore lacks external validity: many online games are played over the Internet, with participants sometimes not even on the same continent, and the latency overhead of the AVM would certainly increase due to the added distance. [[#References |[12]]]

While the paper does test a scenario slightly larger than one-to-one, it certainly does not test a real-world environment in which 16, 32, or even 64 players play at the same time.

Spot checking can be used for applications that require snapshots every x seconds. Although this removes a lot of overhead and data storage, it only verifies whether the application or user is behaving as intended at those x-second intervals. Someone could therefore discover the pattern of the snapshots and render the AVM useless.

AVMs are extremely effective against two types of cheating: cheats that send incorrect network messages and cheats that must be loaded along with the game. This is ideal for tournament-style competitive games, but in the real world it would not be of much use. Games get patched, users download add-ons, and so on. Every patch or add-on would require a new AVM, which is unreasonable given the number of people playing the game. One solution proposed by the authors is to disable the ability to install anything on the AVM. While this could work in a tournament environment, a normal user at home would not be pleased with this limitation.

An AVM will not catch a bug or exploit in the program itself that a malicious user could take advantage of, because the exploit would appear on both the user's and the auditor's systems and behave the same way. This type of exploit would go undetected, since both the real-time execution and the reference execution that is known to be correct would show the exploit as correct behavior.

For their use case, the authors do not mention that in Counter-Strike the user can record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league's website, where everyone in the league can watch it. Without this demo, the team forfeits the match immediately. Additionally, some leagues require the player to run an extra program (e.g. Electronic Sports League WIRE) that checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file, and uploads it to one of the league's servers, where it can be checked by any player.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and
A. Warfield. Remus: High availability via asynchronous virtual
machine replication. In Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI), Apr.
2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware.
http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof
playout for centralized and peer-to-peer gaming. IEEE/ACM
Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online
games against cheating. In Proceedings of the Workshop on Network
and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical
accountability for distributed systems. In Proceedings of
the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[9] VMWare Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike http://store.steampowered.com/app/10/

[12] Larry L. Peterson and Bruce S. Davie. Computer Networks: A Systems Approach, 2007.

COMP 3000 Essay 2 2010 Question 4

2010-12-03T09:11:54Z

Npradhan: /* Critique */ emphasis

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliates:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its execution to a file so that the execution can be replayed later to follow what was happening on the machine. Remus [[#References | [1]]] has contributed a highly efficient snapshotting mechanism for these replays.

'''Accountability:''' In the context of this paper, accountability means that every action performed on the virtual machine is recorded and can be used against the machine or user to verify the correctness of the application. The AVM is responsible for its actions and must answer for them to an auditor.

'''Remote Fault Detection:''' Programs such as GridCop[[#References | [2]]] can monitor the progress and execution of a remotely executing program by requesting beacon packets. While the remote computer sends the packets, the receiving/logging computer must be trusted (hardware, software, and OS) so that the receipt of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interruptions during logging must be handled; otherwise the inconsistencies will lead to an inaccurate result. The AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games, or any specific modification of a program, can be either scanned for[[#References | [3][4]]] or prevented[[#References | [5][6]]] by certain programs. The limitation of such scanning and prevention software is that it only handles the specific cheats or situations it knows about. An AVM is designed to counter any kind of cheat in general.

'''Integrity Violations:''' An integrity violation occurs when the behavior of an execution does not match the normal/expected operations of the host/reference (trusted) execution.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tackles a problem that has haunted computer scientists for a long time: how can you be sure that the software running on a remote machine is working correctly, or as intended? Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relationship between users and a host. When a node (user or computer) expects some result or feedback from another node, it hopes that the interaction depends only on the intended software and not on the particular node. Suppose node A interacts with node B, which runs execution exe1, and node A also interacts with node C, which should run exe1 as well but has been modified and responds with exe2. We can then expect the responses of B and C to differ. Being able to prove beyond doubt that node C has been modified is the purpose of this paper.

Previous work on preventing or detecting '''integrity violations''' can be separated into different categories. The first is '''Cheat Detection''': in many games, users employ cheats to gain benefits for themselves that the original game did not intend.[[#References |[4]]] These detectors are not dynamic, in the sense that they do not actually detect whether a cheat is being used; rather, they check whether a cheating program that they already know about is running on the user's system. For example, suppose a known cheating program named aimbot.exe can run in the background of a game such as Counter-Strike, and the PunkBuster system installed on the user's machine already has aimbot.exe listed as a cheating program by its developers. PunkBuster might then notify the current game servers, or even prevent the user from playing any games until aimbot.exe is no longer running.

'''Accountability''' is another important problem that has been the subject of much research. An accountable system provides a method to ascertain whether execution took place correctly and as expected. Such systems can also provide reliable evidence of proper execution to third parties, if required. This evidence can also be used to defend a node against false accusations. Numerous systems already provide accountability, but most are tied to specific applications, where a point of reference must be available for comparison. For example, PeerReview[[#References |[7]]], a system closely related to the research team's work, must be built into the application, which makes it less portable and harder to deploy than an '''AVM'''. PeerReview verifies the inbound and outbound packets and can determine whether the software is running as intended.

Another problem related to the paper is '''remote fault detection''' in a distributed system, that is, having a method to verify the proper code execution and machine functionality of a node. Monitoring network activity is a common solution to this problem, as it looks at the inbound and outbound traffic of the node. This reveals how the software is operating or, in the case of an AVM, how the whole virtual machine is working. GridCop[[#References |[8]]] is another example; it inspects a small number of packets periodically. Another way of detecting a fault remotely is to use a trusted node, which can tell immediately if a fault occurs or if a modification is made where it should not have been.

The problem of logging and auditing the execution of a specific node (computer) depends greatly on the work done on '''deterministic replay'''. Deterministic replay programs create a log file that can be used to replay the operations of an execution on a node. Replaying these operations shows what the node was doing, which would seem sufficient to find out whether the node was causing integrity violations. The issue with deterministic replay is not the snapshotting/recording of operations; it is that the data written to the log may be tampered with by the node itself so that the replay produces favorable results. If the node fakes the results of its operations, the auditing computer will falsely believe that the tested computer is running all operations as normal. The logging done by these recording programs is directly related to the work needed to detect integrity violations.

=Contribution=
The most useful contribution of the accountable virtual machine (AVM) proposed in this paper is the implementation of the accountable virtual machine monitor (AVMM). It is what allows for the fault checking of virtual machines in a cloud computing environment. The AVMM can be broken down into three parts: the virtual machine monitor (VMM), the tamper-evident log, and the auditing mechanisms. The VMM is based on the VMM found in VMware Workstation 6.5.1[[#References |[9]]], the tamper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are eventually received, being retransmitted if needed.

2. Machines and users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.

The job of the AVMM is to record all incoming and outgoing messages in a tamper-evident log, along with enough information about the execution to enable deterministic replay.

The hash function used is a cryptographic hash function, which maps an arbitrary block of data to a fixed-size string. While such a function is not impossible to break, if it has the three properties specified in the assumptions, breaking it is considered a "hard" problem, infeasible for a malicious attacker to use as an attack vector in the foreseeable future, and the function is thus considered secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts). Because such input is asynchronous, its exact timing must be recorded so that it can be injected at the same moment during replay. Wall-clock time is not accurate enough for this, so the AVMM uses a combination of instruction pointer, branch counter, and additional registers. Not all inputs have to be recorded this way: software interrupts, for example, result from requests that will be issued again during replay.
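The record-and-inject idea can be shown with a toy Python sketch. The "machine" below is a hypothetical stand-in whose step counter plays the role of the instruction pointer and branch counter; the functions <code>execute</code> and <code>replay</code> and the state update rule are assumptions made purely for illustration.

```python
MODULUS = 1_000_003  # arbitrary constant for the toy state update


def execute(n_steps, event_source):
    """Run a toy machine for n_steps and log every asynchronous event.

    event_source(step) returns an integer payload if an event (for example
    a hardware interrupt) arrives at that step, or None otherwise.
    The step counter stands in for the precise execution point.
    """
    state = 0
    log = []
    for step in range(n_steps):
        event = event_source(step)
        if event is not None:
            log.append((step, event))  # record exactly when it arrived
            state = (state * 31 + event) % MODULUS
        state = (state + step) % MODULUS
    return state, log


def replay(n_steps, log):
    """Re-run the machine, injecting each logged event at its recorded step."""
    recorded = dict(log)
    state, _ = execute(n_steps, recorded.get)
    return state


live_events = {3: 7, 10: 42}  # events as they happened during recording
final_state, event_log = execute(20, live_events.get)

# Injecting the events at the recorded steps reproduces the execution.
print(replay(20, event_log) == final_state)
# Injecting the first event one step late leads to a different state.
print(replay(20, [(4, 7), (10, 42)]) == final_state)
```

The second replay shows why imprecise timing is not good enough: moving a single event by one step already changes the final state.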

Two parallel streams appear in the tamper-evident log: message exchanges and nondeterministic inputs.
It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM cross-references messages and inputs during replay, which makes any discrepancies easy to detect.

The AVMM periodically takes snapshots of the AVM's current state. This facilitates fine-grained audits for the user, but it also increases overhead. The overhead is lowered somewhat by making the snapshots incremental (only the state that has changed since the last snapshot is saved). The user can authenticate a snapshot using a hash tree of the state, which the AVMM generates and updates after each snapshot.
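A hash tree over a snapshot can be sketched in Python as follows. This is a generic Merkle-tree construction using SHA-256 over a list of memory pages; the page layout, the choice of SHA-256, and the function name <code>merkle_root</code> are assumptions for the example, and the sketch recomputes the whole tree rather than updating it incrementally.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Cryptographic hash used for the tree (SHA-256 is assumed here)."""
    return hashlib.sha256(data).digest()


def merkle_root(pages):
    """Return the root hash of a hash tree built over a list of byte pages."""
    if not pages:
        return H(b"")
    level = [H(page) for page in pages]  # leaf hashes
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last node on odd levels
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


snapshot = [b"page-0", b"page-1", b"page-2", b"page-3"]
root = merkle_root(snapshot)

# Changing a single page changes the root, so the root authenticates the state.
modified = list(snapshot)
modified[2] = b"page-2-modified"
print(root.hex())
print(merkle_root(modified) == root)
```

Because each inner node depends only on its children, an incremental snapshot only needs to recompute the hashes on the paths from the changed pages to the root.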

'''Tamper-Evident Log'''

The log is a chain of hashed entries. Each log entry has the form e = (s, t, c, h), where:

- s = monotonically increasing sequence number

- t = type

- c = data of the type

- h = hash value

The hash value is calculated as h_i = H(h_{i-1} || s_i || t_i || H(c_i)), where H() is a hash function and || stands for concatenation.
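The hash chain can be sketched in a few lines of Python. SHA-256 as the hash function, the all-zero starting hash <code>GENESIS</code>, the byte encoding of the fields, and the names <code>Entry</code>, <code>entry_hash</code>, and <code>append</code> are all assumptions chosen for this illustration; the paper's actual encoding may differ.

```python
import hashlib
from dataclasses import dataclass


def H(data: bytes) -> bytes:
    """Hash function H(); SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(data).digest()


@dataclass(frozen=True)
class Entry:
    s: int    # monotonically increasing sequence number
    t: str    # entry type
    c: bytes  # data of that type
    h: bytes  # hash value linking this entry to its predecessor


GENESIS = b"\x00" * 32  # assumed value of the hash before the first entry


def entry_hash(prev_h: bytes, s: int, t: str, c: bytes) -> bytes:
    """Hash of an entry: H(previous hash || s || t || H(c))."""
    return H(prev_h + s.to_bytes(8, "big") + t.encode("utf-8") + H(c))


def append(log: list, t: str, c: bytes) -> Entry:
    """Append a new entry whose hash covers the previous entry's hash."""
    prev_h = log[-1].h if log else GENESIS
    s = log[-1].s + 1 if log else 1
    entry = Entry(s, t, c, entry_hash(prev_h, s, t, c))
    log.append(entry)
    return entry


log = []
append(log, "SEND", b"hello")
append(log, "ACK", b"ok")
for e in log:
    print(e.s, e.t, e.h.hex()[:16])
```

Since every hash includes the previous one, changing or removing an earlier entry changes all later hash values.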

Each message sent is signed with the sender's private key. The AVMM logs incoming messages with the signature attached but removes the signature before passing the message to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect dropped messages, each party sends an acknowledgement for every message it receives. If an acknowledgement is not received, the message is resent a few times; if the user stops receiving messages altogether, the machine is presumed to have failed.

To perform a log check, the user retrieves a pair of authenticators and then challenges the machine to produce the log segment between the two. It is computationally infeasible to edit the log without breaking the hash chain; thus, if the log has been tampered with, the hash chain will differ and the user will be notified of the tampering.
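The log check can be sketched in Python as below. For brevity, an authenticator is reduced here to the hash value of a log entry; real authenticators are also signed, and that part is omitted. SHA-256, the tuple layout <code>(s, t, c, h)</code>, and the function names are assumptions made for this example.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def chain_hash(prev_h: bytes, s: int, t: str, c: bytes) -> bytes:
    """Hash of an entry: H(previous hash || s || t || H(c))."""
    return H(prev_h + s.to_bytes(8, "big") + t.encode("utf-8") + H(c))


def build_log(items, h0=b"\x00" * 32):
    """Build a list of (s, t, c, h) entries from (type, data) pairs."""
    log, prev = [], h0
    for s, (t, c) in enumerate(items, start=1):
        prev = chain_hash(prev, s, t, c)
        log.append((s, t, c, prev))
    return log


def verify_segment(segment, start_hash, end_hash):
    """Check a segment that follows the entry with hash start_hash and
    should end at the entry with hash end_hash."""
    prev, expected_s = start_hash, None
    for s, t, c, h in segment:
        if expected_s is not None and s != expected_s:
            return False                  # gap or reordering in the segment
        if chain_hash(prev, s, t, c) != h:
            return False                  # entry does not extend the chain
        prev, expected_s = h, s + 1
    return prev == end_hash               # must end at the second authenticator


log = build_log([("SEND", b"a"), ("RECV", b"b"), ("ACK", b"c"), ("SEND", b"d")])
start, end = log[0][3], log[3][3]

tampered = list(log[1:])
s, t, c, h = tampered[1]
tampered[1] = (s, t, b"evil", h)          # content changed, old hash kept

print(verify_segment(log[1:], start, end))
print(verify_segment(tampered, start, end))
```

Recomputing the hashes after tampering does not help the machine either, because the recomputed chain no longer ends at the hash the user already holds.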

'''Auditing Mechanism'''

From the VMM's perspective, all things are deterministic.

To perform an audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify that the events in the log correspond to a correct execution of the software.

The user can verify the execution of software through three different methods: verifying the log, the snapshot, and the execution.

When the user wants to verify a log segment, the user retrieves from the machine the authenticators whose sequence numbers fall within the range of the log segment. The user then downloads the log segment from the machine, starting with the most recent snapshot before the segment and ending with the most recent snapshot before the end of the segment, and checks it against the authenticators for signs of tampering. If this check succeeds, the user can assume that the log segment is genuine. If the machine is faulty, it may fail to provide the segment or may return a corrupted one; either outcome can be used to convince a third party of the fault.

When the user wants to verify a snapshot, the user downloads from the machine a snapshot of the AVM's state at the beginning of the log segment, and the AVMM recomputes the hash tree over it. The new hash tree is compared to the hash tree recorded in the original log segment. If any discrepancies are detected, the user can use them to convince a third party of the machine's fault.

To verify the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and of any users of the machine. The auditing tool performs two checks on the log segment: a syntactic check, which determines whether the log is well-formed, and a semantic check, which determines whether the information in the log corresponds to a correct execution of the machine.

The syntactic check verifies that all log entries are in the proper format, that the signatures on each message and acknowledgement are valid, that each message was acknowledged, and that the sequence of sent and received messages matches the sequence of messages that enter and exit the AVM.

The semantic check creates a local VM that replays the machine's log segment; if possible, the VM is initialized with a snapshot from the machine. The local VM then runs the log segment and the resulting data is recorded. The auditing tool compares the inputs, outputs, and snapshot hashes of the replayed execution against those in the original log. If any discrepancies are detected, the fault is reported and can be used as evidence against the machine.

=Critique=

The layout of the paper is essential to the reader's comprehension. The introduction clearly describes what the reader should expect in the following pages, in particular which problems are addressed and how they are solved.

The paper gives multiple examples of the advantages and disadvantages of an AVM. A good example is cheat detection. Cheaters use programs that work around the original game code to gain a major advantage over other players. Because an AVM detects cheats generically, it covers a wider range of cheats than most other cheat detection approaches. The logs also make it possible to replay the game. Thus, players using an AVM can see how other players played by replaying the game with those players' logs.

The downside is that the player may suffer from the AVM's overhead. Everything is logged and stored on the hard drive, which takes a large amount of space: in the example in the paper, 148 MB per hour after compression. The logging also reduces the frame rate (fps). Additionally, the AVM increases the ping time to the server.

As a proof of concept, the authors used their AVM with the online game Counter-Strike and tried to detect online cheats. They used “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more powerful than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A 10-year-old game [[#References |[10]]] should use only a small share of the resources of a Dell Precision T1500 workstation. In comparison, newer games consume far more resources than Counter-Strike, leaving less room to run the AVM. A 13% slowdown [pg 12] in a game that runs at only 30 to 40 fps is quite noticeable. This is detrimental to gameplay, because a frame rate above 60 fps is considered optimal.

In the paper, the authors state that the AVM adds only about 5 ms of latency. While this does not seem like a lot, the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This setup does not accurately represent real-life situations and therefore lacks external validity: many online games are played over the Internet, with participants sometimes not even on the same continent, and the latency overhead of the AVM would certainly increase due to the added distance. [[#References |[12]]]

While the paper does test a scenario slightly larger than one-to-one, it certainly does not test a real-world environment in which 16, 32, or even 64 players play at the same time.

Spot checking can be used for applications that require snapshots every x seconds. Although this removes a lot of overhead and data storage, it only verifies whether the application or user is behaving as intended at those x-second intervals. Someone could therefore discover the pattern of the snapshots and render the AVM useless.

AVMs are extremely effective against two types of cheating: cheats that send incorrect network messages and cheats that must be loaded along with the game. This is ideal for tournament-style competitive games, but in the real world it would not be of much use. Games get patched, users download add-ons, and so on. Every patch or add-on would require a new AVM, which is unreasonable given the number of people playing the game. One solution proposed by the authors is to disable the ability to install anything on the AVM. While this could work in a tournament environment, a normal user at home would not be pleased with this limitation.

An AVM will not catch a bug or exploit in the program itself that a malicious user could take advantage of, because the exploit would appear on both the user's and the auditor's systems and behave the same way. This type of exploit would go undetected, since both the real-time execution and the reference execution that is known to be correct would show the exploit as correct behavior.

For their use case, the authors do not mention that in Counter-Strike the user can record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league's website, where everyone in the league can watch it. Without this demo, the team forfeits the match immediately. Additionally, some leagues require the player to run an extra program (e.g. Electronic Sports League WIRE) that checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file, and uploads it to one of the league's servers, where it can be checked by any player.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and
A. Warfield. Remus: High availability via asynchronous virtual
machine replication. In Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI), Apr.
2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware.
http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof
playout for centralized and peer-to-peer gaming. IEEE/ACM
Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online
games against cheating. In Proceedings of the Workshop on Network
and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical
accountability for distributed systems. In Proceedings of
the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[9] VMWare Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike http://store.steampowered.com/app/10/

[12] Larry L. Peterson and Bruce S. Davie. Computer Networks: A Systems Approach, 2007.

COMP 3000 Essay 2 2010 Question 4

2010-12-02T09:48:55Z

Npradhan: /* Research problem */ asked a question without question mark, phrasing was awkward

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliates:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its execution to a file so that the execution can be replayed later to follow what was happening on the machine. Remus [[#References | [1]]] has contributed a highly efficient snapshotting mechanism for these replays.

'''Accountability:''' In the context of this paper, accountability means that every action performed on the virtual machine is recorded and can be used against the machine or user to verify the correctness of the application. The AVM is responsible for its actions and must answer for them to an auditor.

'''Remote Fault Detection:''' Programs such as GridCop[[#References | [2]]] can monitor the progress and execution of a remotely executing program by requesting beacon packets. While the remote computer sends the packets, the receiving/logging computer must be trusted (hardware, software, and OS) so that the receipt of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interruptions during logging must be handled; otherwise the inconsistencies will lead to an inaccurate result. The AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games, or any specific modification of a program, can be either scanned for[[#References | [3][4]]] or prevented[[#References | [5][6]]] by certain programs. The limitation of such scanning and prevention software is that it only handles the specific cheats or situations it knows about. An AVM is designed to counter any kind of cheat in general.

'''Integrity Violations:''' An integrity violation occurs when the behavior of an execution does not match the normal/expected operations of the host/reference (trusted) execution.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tries to tackle a problem that has haunted computer scientists for a long time. How can you be sure that the software running on a remote machine is working correctly or as intended. Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relation between users and a host. When a node (user or computer) expects some sort of result or feedback from another node, they would hope that that interaction being done would be independent of the node and only dependent on the intended software. Let's say, that node A interacts with node B with execution exe1 and node A interacts with node C also with ex1, but node C has been modified and respond with exe2. Thus, we can assume that the respond of B and C will be different. Being able to prove that the node C has been modified without any doubt is the purpose of this paper.

Previous work that has been done in efforts to prevent or detect '''integrity violations''' can be separated into different categories of operations. The first would be '''Cheat Detection''', where in many different games there are cheats that users use to usually create benefits for themselves that was not intended by the original game.[[#References |[4]]] These detectors are not dynamic, in the sense that they do not actually detect whether a cheat is being used, more so they are checking if there is a cheating operation that they have logged before, being operated on the user's system. For example, if there was a known cheating program named aimbot.exe that can be run in the background of a game such as CounterStrike, and the PunkBuster system that was implemented on the user's system had the aimbot.exe program already logged as a cheating program from the developers, the PunkBuster program might notify the current game servers of this or even prevent the user from playing any games until the aimbot.exe operation is no longer running.

'''Accountability''' is another important problem that has been the subject of much research. An accountable system provides a method to ascertain whether execution took place correctly and as expected. These systems can also provide reliable evidence of proper execution to third parties, if required. This evidence can also be used to defend a node when threatened with false accusation. Numerous systems already use accountability in their system, but they were mostly all linked to specific applications, where a point of reference must be used to compare. As example PeerReview[[#References |[7]]], which is a system closely related to what the research team have worked on, must be implemented into the application which makes it less portable and cannot be implemented as easily as an '''AVM'''. PeerReview verifies the inbound and outbound packets and can see if the software is running as intended.

Another problem related to the paper is '''remote fault detection''' in a distributed system, that is, having a method to verify proper code execution and machine functionality on a remote node. Monitoring network activity is a common approach to this problem: by examining a node's inbound and outbound traffic, an observer can learn how the software is operating or, in the case of an AVM, how the whole virtual machine is behaving. GridCop[[#References |[8]]] is one example, which inspects a small number of packets periodically. Another way to detect a fault remotely is to use a trusted node, which can tell immediately if a fault occurs or if a modification is made where none should have been.

The problem of logging and auditing the execution of a specific node (computer) depends heavily on prior work on '''deterministic replay'''. Deterministic replay programs create a log file that can be used to replay the operations performed during some execution on a node. Replaying these operations shows what the node was doing, which would seem sufficient to determine whether the node caused integrity violations. The difficulty with deterministic replay is not the snapshotting or recording itself, but the fact that the node can tamper with the recorded data so that the replay produces favourable results. By faking the results of its operations, the node can lead the auditing computer to believe, falsely, that the tested computer is running all operations normally. The logging performed by these recording programs is nonetheless directly related to the work needed to detect integrity violations.

=Contribution=
The most useful contribution of the accountable virtual machine (AVM) proposed in this paper is the implementation of the accountable virtual machine monitor (AVMM), which is what allows virtual machines in a cloud computing environment to be checked for faults. The AVMM can be broken down into three parts: the virtual machine monitor (VMM), the tamper-evident log, and the auditing mechanisms. The VMM is based on the VMM found in VMware Workstation 6.5.1[[#References |[9]]], the tamper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are eventually received, with retransmission if needed.

2. Machines and users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.

The job of the AVMM is to record all incoming and outgoing messages in a tamper-evident log, along with enough information about the execution to enable deterministic replay.

The hash function used is a cryptographic hash function, which maps an arbitrary block of data to a fixed-size string. While such a function is not impossible to spoof or break, if it has the three properties specified in the assumptions, then breaking it is considered a "hard" problem: it is infeasible for a malicious attacker to use as an attack vector in the foreseeable future, so the function is treated as secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts). Because these inputs are asynchronous, their exact timing must be recorded so that they can be injected at the same point during replay. Wall-clock time is not precise enough for this, so the AVMM uses a combination of instruction pointer, branch counter, and additional registers. Not all inputs have to be recorded this way: software interrupts, for example, are explicitly requested by the software in the AVM, and the same requests will be issued again during replay.
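The idea of tying an asynchronous input to an execution position rather than to wall-clock time can be sketched with a toy machine. Everything below is illustrative: `ExecPoint`, `point_at`, and `run` are hypothetical names, and the "machine" is a simple arithmetic loop, not the AVMM's actual recording mechanism.

```python
from dataclasses import dataclass

MODULUS = 1_000_003  # arbitrary constant for the toy machine


@dataclass(frozen=True)
class ExecPoint:
    """Hypothetical execution position; stands in for the AVMM's real counters."""
    branch_count: int
    instruction_pointer: int


def point_at(branch_count: int) -> ExecPoint:
    # Toy mapping from a branch count to an instruction pointer value.
    return ExecPoint(branch_count, 0x1000 + 4 * (branch_count % 8))


def run(steps: int, inputs: dict) -> int:
    """Run a toy deterministic machine, injecting each recorded input
    exactly when execution reaches its recorded ExecPoint."""
    state = 0
    for branch_count in range(steps):
        point = point_at(branch_count)
        if point in inputs:
            state += inputs[point]          # asynchronous input arrives here
        state = (state * 31 + 7) % MODULUS  # deterministic computation
    return state


recorded = {point_at(3): 42}            # input logged with its execution position
original = run(10, recorded)
replayed = run(10, recorded)            # injected at the same position
mistimed = run(10, {point_at(4): 42})   # same input, injected one branch late
```

In this sketch, `replayed` equals `original`, while `mistimed` ends in a different state, which is why the position of each input has to be part of the log.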

Two parallel streams appear in the tamper-evident log: message exchanges and nondeterministic inputs.
It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM cross-references messages and inputs during replay, which makes any discrepancy easy to detect.

The AVMM periodically takes snapshots of the AVM's current state. This facilitates fine-grained audits for the user, but it also increases overhead. The overhead is reduced somewhat by making the snapshots incremental (only the state that has changed since the last snapshot is saved). The user can authenticate a snapshot using a hash tree of the state, which the AVMM generates and updates after each snapshot.
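A minimal sketch of incremental snapshots authenticated by a hash tree follows. It assumes the state is a non-empty list of byte-string "pages" and uses SHA-256; the function names are hypothetical, and for simplicity the root is recomputed from scratch rather than updated in place as a real implementation would likely do.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash_tree_root(pages: list) -> bytes:
    """Root of a binary hash tree over a non-empty list of state pages."""
    level = [H(page) for page in pages]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate the last node
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def incremental_snapshot(previous: list, current: list) -> dict:
    """Keep only the pages that changed since the previous snapshot."""
    return {i: page for i, (old, page) in enumerate(zip(previous, current)) if old != page}


def apply_snapshot(previous: list, delta: dict) -> list:
    """Rebuild the full state from the previous state plus a delta."""
    pages = list(previous)
    for index, page in delta.items():
        pages[index] = page
    return pages


base = [b"page0", b"page1", b"page2", b"page3"]
current = [b"page0", b"page1", b"CHANGED", b"page3"]
delta = incremental_snapshot(base, current)   # only page 2 is stored
claimed_root = hash_tree_root(current)        # root the machine would commit to
restored = apply_snapshot(base, delta)        # state the auditor reconstructs
```

The auditor accepts the snapshot only if `hash_tree_root(restored)` matches `claimed_root`.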

'''Tamper-Evident Log'''

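The log is a hash chain of entries. Each log entry has the form <math>e_i = (s_i, t_i, c_i, h_i)</math>, where <math>s_i</math> is a monotonically increasing sequence number, <math>t_i</math> is the entry type, <math>c_i</math> is the data for that type, and <math>h_i</math> is a hash value.

The hash value is calculated as <math>h_i = H(h_{i-1} \,\|\, s_i \,\|\, t_i \,\|\, H(c_i))</math>, where H() is a hash function and <math>\|</math> stands for concatenation.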
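The hash rule above can be sketched in a few lines. This is an illustration under stated assumptions: SHA-256 is used as H, the all-zero `GENESIS` value for the hash before the first entry is invented for the example, and the byte encoding of the sequence number and type is arbitrary.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting hash; not specified in these notes


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev_hash: bytes, seq: int, entry_type: str, content: bytes) -> bytes:
    # h_i = H(h_{i-1} || s_i || t_i || H(c_i)), with an illustrative byte encoding
    return H(prev_hash + seq.to_bytes(8, "big") + entry_type.encode() + H(content))


def append_entry(log: list, entry_type: str, content: bytes) -> None:
    prev_hash = log[-1][3] if log else GENESIS
    seq = len(log) + 1
    log.append((seq, entry_type, content, entry_hash(prev_hash, seq, entry_type, content)))


def chain_is_valid(log: list) -> bool:
    """Recompute every hash and check that sequence numbers increase."""
    prev_hash, prev_seq = GENESIS, 0
    for seq, entry_type, content, h in log:
        if seq <= prev_seq or h != entry_hash(prev_hash, seq, entry_type, content):
            return False
        prev_hash, prev_seq = h, seq
    return True


log = []
append_entry(log, "SEND", b"hello")
append_entry(log, "RECV", b"world")
append_entry(log, "INPUT", b"irq@branch=3")

tampered = list(log)
seq, entry_type, _, h = tampered[1]
tampered[1] = (seq, entry_type, b"w0rld", h)   # edit the content, keep the old hash
```

Because each hash covers the previous one, changing any entry invalidates that entry's hash and, if the hash is recomputed, every hash after it.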

Each message sent is signed with the sender's private key. The AVMM logs incoming messages with the signature attached, but removes the signature before passing the message on to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect dropped messages, each party sends an acknowledgement for every message it receives. If an acknowledgement is not received, the message is resent a few times. If the user stops receiving messages altogether, the machine is presumed to have failed.
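The acknowledge-and-retransmit behaviour can be sketched as a bounded retry loop. The limit of three attempts is an arbitrary choice for the example (the text only says "a few times"), and `lossy_channel` is a hypothetical test double, not part of any real networking API.

```python
def send_with_retries(transmit, max_attempts: int = 3):
    """Call transmit() until it reports an acknowledgement.

    Returns the attempt number that succeeded, or None if every attempt
    went unacknowledged (the peer is then presumed to have failed).
    """
    for attempt in range(1, max_attempts + 1):
        if transmit():
            return attempt
    return None


def lossy_channel(drops: int):
    """Hypothetical channel that loses the first `drops` transmissions."""
    remaining = [drops]

    def transmit() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False      # no acknowledgement came back
        return True           # acknowledgement received

    return transmit
```

For example, `send_with_retries(lossy_channel(2))` succeeds on the third attempt, while a channel that drops five transmissions exhausts the retries and yields `None`.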

To perform a log check, the user retrieves a pair of authenticators and then challenges the machine to produce the log segment between the two. It is computationally infeasible to edit the log without breaking the hash chain. Thus, if the log has been tampered with, the hash chain will not match the authenticators and the user will be notified of the tampering.
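A log check between two authenticators might look like the sketch below. Two simplifications are assumptions of this example and should not be read as the paper's design: an authenticator is modelled as a (sequence number, hash, tag) triple, and the tag is an HMAC under a shared demo key standing in for a real digital signature. An HMAC does not provide nonrepudiation; it is used here only because the Python standard library has no public-key signatures.

```python
import hashlib
import hmac

KEY = b"demo-key"  # hypothetical; a real system would sign with a private key


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev_hash, seq, entry_type, content):
    return H(prev_hash + seq.to_bytes(8, "big") + entry_type.encode() + H(content))


def build_log(contents):
    """Build a toy hash-chained log of SEND entries."""
    log, prev = [], b"\x00" * 32
    for seq, content in enumerate(contents, start=1):
        prev = entry_hash(prev, seq, "SEND", content)
        log.append((seq, "SEND", content, prev))
    return log


def make_authenticator(entry):
    seq, _, _, h = entry
    tag = hmac.new(KEY, seq.to_bytes(8, "big") + h, hashlib.sha256).digest()
    return (seq, h, tag)


def authenticator_ok(auth):
    seq, h, tag = auth
    expected = hmac.new(KEY, seq.to_bytes(8, "big") + h, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)


def check_segment(auth_start, auth_end, segment):
    """True if `segment` links auth_start to auth_end without a break."""
    if not (authenticator_ok(auth_start) and authenticator_ok(auth_end)):
        return False
    prev_seq, prev_hash = auth_start[0], auth_start[1]
    for seq, entry_type, content, h in segment:
        if seq <= prev_seq or h != entry_hash(prev_hash, seq, entry_type, content):
            return False
        prev_seq, prev_hash = seq, h
    return (prev_seq, prev_hash) == (auth_end[0], auth_end[1])


log = build_log([b"m1", b"m2", b"m3", b"m4", b"m5"])
auth_a, auth_b = make_authenticator(log[1]), make_authenticator(log[4])
segment = log[2:5]   # entries 3..5, as the machine would return them
```

An edited or omitted entry breaks the link between the two authenticators, so `check_segment` returns `False` for it.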

'''Auditing Mechanism'''

From the VMM's perspective, everything is deterministic.

To perform an audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify the events in the log are the correct execution of the software.

The user can verify the execution of software through three different checks: verifying the log, verifying the snapshot, and verifying the execution.

When the user wants to verify a log segment, the user retrieves the authenticators from the machine with sequence numbers in the range of the log segment. The user then downloads the log segment from the machine, starting at the most recent snapshot before the segment begins and ending at the most recent snapshot before the segment ends. The user then checks the authenticators for tampering. If this step succeeds, the user can assume the log segment has not been tampered with. If the machine is faulty, it may fail to provide the segment or may return a corrupted one. This can be used to convince a third party of the fault.

When the user wants to verify the snapshot, the user downloads from the machine a snapshot of the AVM's state at the beginning of the log segment, and the AVMM recomputes the hash tree over it. The new hash tree is compared to the hash tree contained in the original log segment. If any discrepancies are detected, the user can use them to convince a third party of the machine's fault.

To verify the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and any users of the machine. The auditing tool performs two checks on the log segment: a syntactic check (which determines whether the log is well-formed) and a semantic check (which determines whether the information in the log corresponds to a correct execution of the machine).

The syntactic check verifies that all log entries are in the proper format, that the signatures on each message and acknowledgement are valid, that each message was acknowledged, and that the sequence of sent and received messages matches the sequence of messages that enter and exit the AVM.
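Part of a syntactic check can be sketched as follows. The entry layout (dictionaries with `seq`, `type`, and `msg_id`) and the three entry types are assumptions made for the example, and signature verification is left out entirely.

```python
KNOWN_TYPES = {"SEND", "RECV", "ACK"}  # hypothetical entry types
REQUIRED_FIELDS = {"seq", "type", "msg_id"}


def syntactic_check(entries: list) -> list:
    """Return a list of problems found; an empty list means well-formed."""
    problems = []
    last_seq = 0
    unacknowledged = set()
    for entry in entries:
        if set(entry) != REQUIRED_FIELDS or entry["type"] not in KNOWN_TYPES:
            problems.append(f"malformed entry: {entry!r}")
            continue
        if entry["seq"] <= last_seq:
            problems.append(f"sequence number {entry['seq']} does not increase")
        last_seq = max(last_seq, entry["seq"])
        if entry["type"] == "SEND":
            unacknowledged.add(entry["msg_id"])
        elif entry["type"] == "ACK":
            unacknowledged.discard(entry["msg_id"])
    problems.extend(f"message {m} was never acknowledged" for m in sorted(unacknowledged))
    return problems


good_log = [
    {"seq": 1, "type": "SEND", "msg_id": "m1"},
    {"seq": 2, "type": "ACK", "msg_id": "m1"},
]
bad_log = [
    {"seq": 1, "type": "SEND", "msg_id": "m1"},
    {"seq": 1, "type": "SEND", "msg_id": "m2"},
    {"seq": 3, "type": "ACK", "msg_id": "m1"},
]
```

Here `good_log` produces no problems, while `bad_log` is flagged for a repeated sequence number and for message `m2` never being acknowledged.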

The semantic check creates a local VM that will execute the machine's log segment. If possible, the VM is initialized with a snapshot from the machine. The local VM then runs the log segment and the resulting data is recorded. The auditing tool then compares the inputs, outputs, and snapshot hashes of the replayed execution against the original log. If any discrepancies are detected, the fault is reported and can be used as evidence against the machine.
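The replay-and-compare idea behind the semantic check can be illustrated with a toy reference program. The `reference_vm` function below is a hypothetical stand-in for the reference copy of the software: the only property that matters is that it is deterministic given its starting state and inputs. Snapshot hash comparison is omitted.

```python
from collections import deque


def reference_vm(state: int, message: str):
    """Hypothetical reference software: deterministic given state and input."""
    new_state = state + len(message)
    return new_state, f"{message}:{new_state}"


def semantic_check(snapshot_state: int, log: list):
    """Replay the logged inputs and compare the outputs with the logged ones.

    Returns None if the log matches the reference execution, otherwise the
    index of the first diverging entry (len(log) if outputs are missing).
    """
    state = snapshot_state
    expected = deque()
    for index, (kind, payload) in enumerate(log):
        if kind == "IN":
            state, output = reference_vm(state, payload)
            expected.append(output)
        elif kind == "OUT":
            if not expected or expected.popleft() != payload:
                return index
    return len(log) if expected else None


honest = [("IN", "hi"), ("OUT", "hi:2"), ("IN", "yo"), ("OUT", "yo:4")]
cheating = [("IN", "hi"), ("OUT", "hi:2"), ("IN", "yo"), ("OUT", "yo:99")]
```

Starting from snapshot state 0, the honest log replays cleanly, while the cheating log diverges at its last entry because the logged output is not what the reference program produces.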

=Critique=

The layout of the paper greatly aids the reader's comprehension. The introduction clearly describes what the reader can expect in the following pages, especially which problems are addressed and how they are solved.

The paper gives multiple examples of the advantages and disadvantages of an AVM. A good example is cheat detection. Cheaters use programs that work around the original game code to gain a major advantage over other players. Because an AVM detects cheats generically, it covers a wider range of cheats than most other cheat detection approaches. The logs also make it possible to replay the game, so players using an AVM can see how other players played by replaying the game from those players' logs.

The downside is that the player may suffer from the overhead of the AVM. Everything is logged and stored on the hard drive, which takes a large amount of space: in the example in the paper, 148 MB per hour after compression. The logging reduces the frame rate (fps). Additionally, the connection to the AVM increases the ping time to the server.

As a proof of concept, the authors used their AVM with the online game Counter-Strike and tried to detect online cheats. They used “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more powerful than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A 10-year-old game [[#References |[10]]] uses only a small share of the resources of a Dell Precision T1500 workstation. Newer games consume far more resources than Counter-Strike, leaving less room to run the AVM. A 13% slowdown [pg 12.] in a game that only reaches 30 to 40 fps is quite noticeable. This is detrimental to game play, for which more than 60 fps is optimal.
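To put the 13% figure in concrete terms, the short calculation below applies it to the frame rates mentioned above. It assumes that "13% slowdown" means the frame rate drops by 13%, which is an interpretation made for this example.

```python
def fps_after_slowdown(fps: float, slowdown: float) -> float:
    """Frame rate after a fractional slowdown (0.13 means 13% fewer frames)."""
    return fps * (1.0 - slowdown)


def frame_time_ms(fps: float) -> float:
    """Time spent on each frame, in milliseconds."""
    return 1000.0 / fps


for baseline in (30, 40, 60):
    slowed = fps_after_slowdown(baseline, 0.13)
    print(f"{baseline} fps -> {slowed:.1f} fps "
          f"({frame_time_ms(baseline):.1f} ms -> {frame_time_ms(slowed):.1f} ms per frame)")
```

Under that assumption, a game running at 30 to 40 fps would drop to roughly 26 to 35 fps, further below the 60 fps target.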

In the paper, the authors state that the AVM generates only an extra 5 ms of latency. While this does not seem like a lot, the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This setup does not accurately represent real-life situations and therefore lacks external validity: many online games are played over the Internet, with participants sometimes not even on the same continent, and the latency overhead of the AVM would certainly increase due to the added distance. [[#References |[12]]]

While the paper does test a scenario slightly larger than one-to-one, it does not test a real-world environment in which 16, 32, or even 64 players would be playing at the same time.

Spot checking can be used for applications that take snapshots every x seconds. Although this removes a lot of overhead and data storage, it only verifies that the application or user is behaving as intended every x seconds. Someone could therefore discover the pattern of those snapshots and render the AVM useless.
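The predictability concern can be made concrete with a small calculation. The sketch below assumes the auditor chooses which segments to audit uniformly at random; this is an assumption of the example, not a description of the procedure in the paper. Under that assumption, a cheater cannot learn a pattern, and the chance of being caught depends only on how many segments are audited.

```python
import math
import random


def detection_probability(total: int, faulty: int, audited: int) -> float:
    """Chance that at least one faulty segment is among `audited` segments
    drawn uniformly at random, without replacement, from `total` segments."""
    if audited > total or faulty > total:
        raise ValueError("audited and faulty must not exceed total")
    return 1.0 - math.comb(total - faulty, audited) / math.comb(total, audited)


def pick_segments(total: int, audited: int, rng: random.Random) -> list:
    """Choose which segments to audit so the choice is not predictable."""
    return sorted(rng.sample(range(total), audited))
```

For instance, auditing 3 of 10 segments catches a single faulty segment with probability 3/10, and the probability rises as the cheater misbehaves in more segments.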

AVMs are extremely effective against two types of cheats: those that send incorrect network messages and those that have to be loaded along with the game. This is ideal for tournament-style competition, but in the real world it would not be of much use. Games get patched, users download add-ons, and so on. Every patch or add-on would require a new AVM, which is unreasonable given the number of people playing the game. One solution offered by the authors is to disable the right to install anything on the AVM. While this could work in a tournament environment, a normal user at home would not be pleased with this limitation.

An AVM will not catch any bug or exploit in a program that a malicious user could take advantage of, because the exploit would appear on both the user and monitor systems and behave the same way.

For their use case, the authors did not mention that in Counter-Strike a user can record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league's website, where every person in the league can watch it. Without this demo, the team loses the match immediately. Additionally, some leagues require players to run an extra program (e.g. Electronic Sports League WIRE) that checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file, and uploads it to one of the league's servers, where it can be checked by any player.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and
A. Warfield. Remus: High availability via asynchronous virtual
machine replication. In Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI), Apr.
2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware.
http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof
playout for centralized and peer-to-peer gaming. IEEE/ACM
Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online
games against cheating. In Proceedings of the Workshop on Network
and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical
accountability for distributed systems. In Proceedings of
the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[9] VMware Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike http://store.steampowered.com/app/10/

[12] Larry L. Peterson and Bruce S. Davie. Computer Networks a Systems Approach, 2007

COMP 3000 Essay 2 2010 Question 4

2010-12-02T09:39:08Z

Npradhan: /* Research problem */ phrasing

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliates:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its execution into a file so that the execution can later be replayed to follow what was happening on the machine. Remus [[#References | [1]]] has contributed a highly efficient snapshotting mechanism for these replays.

'''Accountability:''' In the context of this paper, accountability means that every action performed on the virtual machine is recorded and can be used against the machine or user to verify the correctness of the application. The AVM is responsible for its actions and must answer for them to an auditor.

'''Remote Fault Detection:''' Programs like GridCop[[#References | [2]]] can be used to monitor the progress and execution of a remotely executing program by requesting beacon packets. While the remote computer is sending the packets, the receiving/logging computer must be a trusted computer (hardware, software, OS) so that the reception of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interrupts during logging must be handled, or the inconsistencies will produce an inaccurate outcome. The AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games, or any specific modification to a program, can be either scanned for[[#References | [3][4]]] or prevented[[#References | [5][6]]] by certain programs. The limitation of such scanning and preventative software is that it can only handle the specific cheats or situations it knows about. An AVM is designed to counter cheats in general.

'''Integrity Violations:''' An integrity violation occurs when the behaviour of an execution does not match the normal/expected behaviour of the host/reference (trusted) execution.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tackles a long-standing problem: how can you be sure that the software running on a remote machine is working correctly, or as intended? Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relationship between users and a host. When a node (user or computer) expects some result or feedback from another node, it hopes that the interaction depends only on the intended software and not on the particular node. Suppose node A interacts with node B, which runs execution exe1, and node A also interacts with node C, which is supposed to run exe1 but has been modified and responds with execution exe2. The responses of B and C can then be expected to differ. Being able to prove beyond doubt that node C has been modified is the purpose of this paper.

Previous work on preventing or detecting '''integrity violations''' can be separated into several categories. The first is '''cheat detection''': in many games, users employ cheats to give themselves benefits that the original game did not intend.[[#References |[4]]] Existing detectors are not dynamic, in the sense that they do not detect whether a cheat is actually being used; instead, they check whether a cheating program they already know about is running on the user's system. For example, suppose a known cheating program named aimbot.exe can run in the background of a game such as Counter-Strike, and the PunkBuster system installed on the user's machine already lists aimbot.exe as a cheating program. PunkBuster might then notify the current game server, or even prevent the user from playing any games until aimbot.exe is no longer running.

'''Accountability''' is another important problem that has been the subject of much research. An accountable system provides a way to ascertain whether execution took place correctly and as expected. Such a system can also provide reliable evidence of proper execution to third parties if required, and this evidence can be used to defend a node against false accusations. Numerous systems already provide accountability, but most of them are tied to specific applications, where a point of reference must be available for comparison. For example, PeerReview[[#References |[7]]], a system closely related to the research team's work, must be built into the application, which makes it less portable and harder to deploy than an '''AVM'''. PeerReview verifies inbound and outbound packets to determine whether the software is running as intended.

Another problem related to the paper is '''remote fault detection''' in a distributed system: how can we determine whether a remote node is running the code correctly and whether the machine itself is working as intended? Monitoring network activity is a common approach to this problem: by examining a node's inbound and outbound traffic, an observer can learn how the software is operating or, in the case of an AVM, how the whole virtual machine is behaving. GridCop[[#References |[8]]] is one example, which inspects a small number of packets periodically. Another way to detect a fault remotely is to use a trusted node, which can tell immediately if a fault occurs or if a modification is made where none should have been.

The problem of logging and auditing the execution of a specific node (computer) depends heavily on prior work on '''deterministic replay'''. Deterministic replay programs create a log file that can be used to replay the operations performed during some execution on a node. Replaying these operations shows what the node was doing, which would seem sufficient to determine whether the node caused integrity violations. The difficulty with deterministic replay is not the snapshotting or recording itself, but the fact that the node can tamper with the recorded data so that the replay produces favourable results. By faking the results of its operations, the node can lead the auditing computer to believe, falsely, that the tested computer is running all operations normally. The logging performed by these recording programs is nonetheless directly related to the work needed to detect integrity violations.

=Contribution=
The most useful contribution of the accountable virtual machine (AVM) proposed in this paper is the implementation of the accountable virtual machine monitor (AVMM), which is what allows virtual machines in a cloud computing environment to be checked for faults. The AVMM can be broken down into three parts: the virtual machine monitor (VMM), the tamper-evident log, and the auditing mechanisms. The VMM is based on the VMM found in VMware Workstation 6.5.1[[#References |[9]]], the tamper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are eventually received, with retransmission if needed.

2. Machines and users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.

The job of the AVMM is to record all incoming and outgoing messages in a tamper-evident log, along with enough information about the execution to enable deterministic replay.

The hash function used is a cryptographic hash function, which maps an arbitrary block of data to a fixed-size string. While such a function is not impossible to spoof or break, if it has the three properties specified in the assumptions, then breaking it is considered a "hard" problem: it is infeasible for a malicious attacker to use as an attack vector in the foreseeable future, so the function is treated as secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts). Because these inputs are asynchronous, their exact timing must be recorded so that they can be injected at the same point during replay. Wall-clock time is not precise enough for this, so the AVMM uses a combination of instruction pointer, branch counter, and additional registers. Not all inputs have to be recorded this way: software interrupts, for example, are explicitly requested by the software in the AVM, and the same requests will be issued again during replay.

Two parallel streams appear in the tamper-evident log: message exchanges and nondeterministic inputs.
It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM cross-references messages and inputs during replay, which makes any discrepancy easy to detect.

The AVMM periodically takes snapshots of the AVM's current state. This facilitates fine-grained audits for the user, but it also increases overhead. The overhead is reduced somewhat by making the snapshots incremental (only the state that has changed since the last snapshot is saved). The user can authenticate a snapshot using a hash tree of the state, which the AVMM generates and updates after each snapshot.

'''Tamper-Evident Log'''

The log is a hash chain of entries. Each log entry has the form <math>e_i = (s_i, t_i, c_i, h_i)</math>, where <math>s_i</math> is a monotonically increasing sequence number, <math>t_i</math> is the entry type, <math>c_i</math> is the data for that type, and <math>h_i</math> is a hash value.

The hash value is calculated as <math>h_i = H(h_{i-1} \,\|\, s_i \,\|\, t_i \,\|\, H(c_i))</math>, where H() is a hash function and <math>\|</math> stands for concatenation.

Each message sent is signed with the sender's private key. The AVMM logs incoming messages with the signature attached, but removes the signature before passing the message on to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect dropped messages, each party sends an acknowledgement for every message it receives. If an acknowledgement is not received, the message is resent a few times. If the user stops receiving messages altogether, the machine is presumed to have failed.

To perform a log check, the user retrieves a pair of authenticators and then challenges the machine to produce the log segment between the two. It is computationally infeasible to edit the log without breaking the hash chain. Thus, if the log has been tampered with, the hash chain will not match the authenticators and the user will be notified of the tampering.

'''Auditing Mechanism'''

From the VMM's perspective, everything is deterministic.

To perform an audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify the events in the log are the correct execution of the software.

The user can verify the execution of software through three different checks: verifying the log, verifying the snapshot, and verifying the execution.

When the user wants to verify a log segment, the user retrieves the authenticators from the machine with sequence numbers in the range of the log segment. The user then downloads the log segment from the machine, starting at the most recent snapshot before the segment begins and ending at the most recent snapshot before the segment ends. The user then checks the authenticators for tampering. If this step succeeds, the user can assume the log segment has not been tampered with. If the machine is faulty, it may fail to provide the segment or may return a corrupted one. This can be used to convince a third party of the fault.

When the user wants to verify the snapshot, the user downloads from the machine a snapshot of the AVM's state at the beginning of the log segment, and the AVMM recomputes the hash tree over it. The new hash tree is compared to the hash tree contained in the original log segment. If any discrepancies are detected, the user can use them to convince a third party of the machine's fault.

To verify the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and any users of the machine. The auditing tool performs two checks on the log segment: a syntactic check (which determines whether the log is well-formed) and a semantic check (which determines whether the information in the log corresponds to a correct execution of the machine).

The syntactic check verifies that all log entries are in the proper format, that the signatures on each message and acknowledgement are valid, that each message was acknowledged, and that the sequence of sent and received messages matches the sequence of messages that enter and exit the AVM.

The semantic check creates a local VM that will execute the machine's log segment. If possible, the VM is initialized with a snapshot from the machine. The local VM then runs the log segment and the resulting data is recorded. The auditing tool then compares the inputs, outputs, and snapshot hashes of the replayed execution against the original log. If any discrepancies are detected, the fault is reported and can be used as evidence against the machine.

=Critique=

The layout of the paper is primordial for the comprehension of the reader. The introduction clearly describes what the reader has to expect in the following pages, especially what problems are addressed and how they are solved.

This paper gives multiple examples about advantages and disadvantages in an AVM. A good example is "Cheat Detection". Cheaters use programs to go around the original game code to gain an major advantage over other players. Since an AVM is generic in cheat detection it has a wider support for detecting cheats than most of the other cheat detection algorithms. The logs give the game the function to replay the game. Thus, players using AVM can see the way other players play by replaying the game with the player's log.

The negative side is that the player might have to suffer from the AVM. Everything is being logged and stored on the hard drive, which takes a lot amount of space. In the example in the paper it is 148mb per hour after compression. This reduces the fps. Additionally, the connection to the AVM increases the ping time to the server.

As a proof of concept, they used their AVM in the online game Counter Strike and tried to detect online cheats. They were using “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more high powered than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A 10 year old game [[#References |[10]]] should use fewer resources on a Dell Precision T1500 workstations. In comparison, newer games consume far more resources than Counter-Strike giving it less room to run the AVM. A 13% slowdown [pg 12.] in a game where you are only getting 30 to 40 fps is a pretty noticeable slowdown. This is very detrimental to the game play because having over 60fps is the optimal performance.

In the paper the authors state that the AVM will only generate an extra 5ms of latency. While this does not seem like a lot the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This sample does not accurately represent real life situations and therefore lacks external validity, since many of these online games are played over the internet with the participants sometimes not even on the same continent; the latency overhead of the AVM would certainly increase due to the added distance. [[#References |[12]]]

While the paper does test a slightly larger than one to one scenario, it certainly does not test in a real world environement where 16,32 or even 64 players would be playing in the sametime.

Spot checking can be used for applications that require snapshots every x seconds. Even if this way remove a lot of overhead and data storage, it only verify if the applications or user is working as intended every x second. Thus, someone could find the patern of those snapshots and render the AVM inutile.

AVM's are extremely effective against two types of cheating, that which gives incorrect networking messages and the one that has to be loaded with the game. This is the perfect world for tournaments competition type of game, but in a real world this wouldn't be of much use. Games get patched, users download add-ons for the game, etc. Every patch or add-ons would require a new AVM which is unreasonable for the amount of people playing the game. A solution brought from the team was to disable the right to install anything on the AVM. As this could work in a tournament environment, a normal users at home would not be pleased with this limitation.

An AVM will not catch a bug or exploit in the program itself that a malicious user could abuse, since the exploit would behave identically on both the user's system and the auditor's replay.

For their use case, the authors do not mention that in Counter-Strike the user can record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league website, where every member of the league can watch it. Without this demo the team loses the match immediately. Additionally, some leagues require the player to run an extra program (e.g. Electronic Sports League WIRE), which checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file and uploads it to one of the league's servers, where it can be checked by any player.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and
A. Warfield. Remus: High availability via asynchronous virtual
machine replication. In Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI), Apr.
2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware.
http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof
playout for centralized and peer-to-peer gaming. IEEE/ACM
Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online
games against cheating. In Proceedings of the Workshop on Network
and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical
accountability for distributed systems. In Proceedings of
the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[9] VMWare Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike http://store.steampowered.com/app/10/

[12] Larry L. Peterson and Bruce S. Davie. Computer Networks a Systems Approach, 2007

COMP 3000 Essay 2 2010 Question 4

2010-12-02T09:22:38Z

Npradhan: /* Research problem */ phrasing needed work.

==Accountable Virtual Machines ==
'''Authors:''' Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

'''Affiliations:'''
University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)

'''Link to Paper:''' [http://www.usenix.org/events/osdi10/tech/full_papers/Haeberlen.pdf Accountable Virtual Machines]

==Background Concepts==

'''Accountable Virtual Machine (AVM)'''

'''Deterministic Replay''': A machine can record its execution in a log file so that the execution can later be replayed to follow exactly what happened on the machine. Remus [[#References | [1]]] contributed a highly efficient snapshotting mechanism for these replays.

'''Accountability:''' In the context of this paper, accountability means that every action performed on the virtual machine is recorded, and that the record can be used to verify the correctness of the application and can be held against the machine or user. The AVM is responsible for its actions and must answer for them to an auditor.

'''Remote Fault Detection:''' There are programs like GridCop[[#References | [2]]] that can be used to monitor the progress and execution of a remotely executing program by requesting a beacon packet. When the remote computer is sending the packets, the receiving/logging computer must be a trusted computer (hardware, software, OS) so that the receiving of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interruptions during logging must be handled, or the inconsistencies will produce an inaccurate result. In contrast, the AVM does not require trusted hardware and can be used over wide-area networks.

'''Cheat Detection:''' Cheating in games, or any specific modification of a program, can be either scanned for[[#References | [3][4]]] or prevented[[#References | [5][6]]] by certain programs. The weakness of such scanning and prevention software is that it can only handle the specific cheats or situations it already knows about. An AVM is designed to counter cheats in general.

'''Integrity Violations:''' An integrity violation occurs when the behaviour of an execution does not match the normal/expected behaviour of the reference (trusted) execution.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

=Research problem=

The research presented in this paper tackles a problem that has troubled computer scientists for a long time: how can you be sure that the software running on a remote machine is working correctly, or as intended? Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relationship between users and a host. When a node (user or computer) expects some result or feedback from another node, it would hope that the interaction is independent of the particular node and depends only on the intended software. Suppose node A interacts with node B, which runs execution exe1, and node A also interacts with node C, which should run exe1 as well but has been modified and instead responds with exe2. We can then expect the responses of B and C to differ. Being able to prove beyond any doubt that node C has been modified is the purpose of this paper.

Previous work on preventing or detecting '''integrity violations''' can be separated into several categories. The first is '''Cheat Detection'''. In many games, users employ cheats to gain benefits for themselves that were not intended by the original game.[[#References |[4]]] Existing detectors are not dynamic, in the sense that they do not actually detect whether cheating is taking place; rather, they check whether a cheating program that they already know about is running on the user's system. For example, suppose there is a known cheating program named aimbot.exe that can run in the background of a game such as Counter-Strike, and the PunkBuster installation on the user's system already has aimbot.exe listed as a cheating program by its developers. PunkBuster might then notify the current game servers, or even prevent the user from playing any games until aimbot.exe is no longer running.

'''Accountability''' is another important problem that many have already worked on. An accountable system provides a method to ascertain whether execution took place correctly and as expected. Such systems can also provide reliable evidence of proper execution to third parties, if required. This evidence can also be used to defend a node against false accusations. Numerous systems already provide accountability, but almost all of them are tied to specific applications, where a point of reference must be available for comparison. For example, PeerReview[[#References |[7]]], a system closely related to the research team's work, must be built into the application, which makes it less portable and harder to deploy than an '''AVM'''. PeerReview verifies the inbound and outbound packets to determine whether the software is running as intended.

Another problem related to the paper is '''remote fault detection''' in a distributed system: how can we determine whether a remote node is running its code correctly, or whether the machine itself is working as intended? Inspecting network activity is a common solution to this problem, looking at the inbound and outbound traffic of the node. This reveals how the software is operating or, in the case of an AVM, how the whole virtual machine is working. GridCop[[#References |[8]]] is another example, which inspects a small number of packets periodically. Another way of detecting a fault remotely is to use a trusted node, which can tell immediately if a fault occurs or if a modification is made where it should not have been.

The problem of logging and auditing the execution of a specific node (computer) depends heavily on the work done on '''deterministic replay'''. Deterministic replay programs create a log file that can be used to replay the operations of an execution on a node. Replaying those operations shows what the node was doing, which would seem sufficient for finding out whether the node caused integrity violations. The issue with deterministic replay is not the concept of snapshotting/recording the operations; it is that the data written to the replay log may be tampered with by the node itself so that the replay produces favourable results. By faking the results of the operations, the node can make the auditing computer falsely believe that all operations ran as normal. The logging performed by these recording programs is directly related to the work needed to detect integrity violations.

=Contribution=
The most useful contribution of the accountable virtual machine (AVM) proposed in the paper is the implementation of the accountable virtual machine monitor (AVMM). It is what allows for the fault checking of virtual machines in a cloud computing environment. The AVMM can be broken down into three parts: the virtual machine monitor (VMM), the tamper-evident log, and the auditing mechanisms. The VMM is based on the VMM found in VMware Workstation 6.5.1[[#References |[9]]], the tamper-evident log was adapted from code in PeerReview[[#References |[7]]], and the audit tools were built from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are received, retransmitted if needed.

2. Machines and Users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair, that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used.

The job of the AVMM is to record all incoming and outgoing messages to a tamper-evident log, along with enough information about the execution to enable deterministic replay.

The hash function used is a cryptographic hash function, which maps an arbitrary block of data to a fixed-size string. While such a function is not impossible to break, if it has the three properties specified in the assumptions, then breaking it is considered a "hard" problem that is infeasible for a malicious attacker in the foreseeable future, and the function can therefore be treated as secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts). Because such input is asynchronous, the exact timing of each input must be recorded so that it can be injected at the same point during replay. Wall-clock time is not accurate enough for this, so the AVMM uses a combination of instruction pointer, branch counter, and additional registers. Not all inputs have to be recorded this way: software interrupts, for example, send requests to the AVM, and the same requests will be issued again during replay.
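
The idea of timing an input by its position in the instruction stream, rather than by wall-clock time, can be illustrated with a small sketch. This is only an illustration under simplifying assumptions: the names <code>ExecPoint</code>, <code>LoggedInput</code> and <code>replay</code> are hypothetical, the "additional registers" are omitted, and at most one input per execution point is supported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecPoint:
    """Hypothetical position in the instruction stream (not the paper's layout)."""
    branch_count: int  # branches executed so far
    instr_ptr: int     # current instruction pointer


@dataclass(frozen=True)
class LoggedInput:
    point: ExecPoint   # where the input arrived during the original run
    kind: str          # e.g. "interrupt"
    payload: bytes


def replay(program_points, logged_inputs):
    """Walk the execution points in order and inject each logged input
    exactly when replay reaches the point at which it was recorded."""
    pending = {e.point: e for e in logged_inputs}
    trace = []
    for point in program_points:
        event = pending.pop(point, None)
        if event is not None:
            trace.append(("inject", event.kind, point))
        trace.append(("step", point))
    if pending:
        raise ValueError("log contains inputs at points the replay never reached")
    return trace
```

The instruction pointer alone would be ambiguous here, because a loop visits the same address many times; pairing it with a branch count tells the visits apart.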

Two parallel streams appear in the tamper-evident log: message exchanges and nondeterministic inputs.
It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM cross-references messages and inputs during replay, which makes any discrepancy easy to detect.

The AVMM periodically takes snapshots of the AVM's current state. This facilitates fine-grained audits for the user, but it also increases overhead. The overhead is reduced somewhat by making the snapshots incremental (only the state that has changed since the last snapshot is saved). The user can authenticate a snapshot using a hash tree of the state, which the AVMM generates and updates after each snapshot.
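
A minimal sketch of incremental snapshots authenticated by a hash tree might look as follows. It assumes the state is a fixed-length list of memory pages and uses SHA-256; the function names are hypothetical, and unlike the AVMM described above, this sketch recomputes the whole tree instead of updating it incrementally.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(pages):
    """Root of a binary hash tree built over the given memory pages."""
    level = [h(p) for p in pages]
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def incremental_snapshot(old_pages, new_pages):
    """Keep only the pages that changed since the previous snapshot."""
    return {i: p for i, p in enumerate(new_pages) if p != old_pages[i]}


def apply_snapshot(base_pages, delta):
    """Rebuild the full state from a base state plus an incremental snapshot."""
    pages = list(base_pages)
    for i, p in delta.items():
        pages[i] = p
    return pages
```

An auditor who holds the previous state can apply the incremental snapshot and compare the resulting root with the root recorded by the machine; any modified page changes the root.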

'''Tamper-Evident Log'''

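The log is made up of hash-chained entries. Each log entry has the form <math>e_i = (s_i, t_i, c_i, h_i)</math>, where:

- <math>s_i</math> is a monotonically increasing sequence number

- <math>t_i</math> is the entry type

- <math>c_i</math> is the data for that type

- <math>h_i</math> is the hash value

The hash value is calculated as <math>h_i = H(h_{i-1} \parallel s_i \parallel t_i \parallel H(c_i))</math>, where <math>H()</math> is a hash function and <math>\parallel</math> stands for concatenation.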
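
The entry format above can be sketched in a few lines. SHA-256, the 8-byte encoding of the sequence number, and the all-zero initial hash are assumptions made for this illustration, not details taken from the paper.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev_hash: bytes, seq: int, typ: str, content: bytes) -> bytes:
    # h_i = H(h_{i-1} || s_i || t_i || H(c_i)); the byte encoding is an assumption
    return H(prev_hash + seq.to_bytes(8, "big") + typ.encode() + H(content))


def append(log, typ, content):
    """Append an entry (s, t, c, h) whose hash covers the previous entry's hash."""
    prev = log[-1][3] if log else b"\x00" * 32   # assumed initial hash value
    seq = log[-1][0] + 1 if log else 1
    log.append((seq, typ, content, entry_hash(prev, seq, typ, content)))
```

Because each hash covers the previous one, changing any earlier entry changes every hash that follows it.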

Each message sent is signed with the sender's private key. The AVMM logs each incoming message with the signature attached, but removes the signature before passing the message to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect when a message is dropped, each party sends an acknowledgement for each message it receives. If an acknowledgement is not received, the message is resent a few times. If the user stops receiving messages altogether, the machine is presumed to have failed.

To perform a log check, the user retrieves a pair of authenticators and then challenges the machine to produce the log segment between the two. It is computationally infeasible to edit the log without breaking the hash chain, so if the log has been tampered with, the hash chain will differ and the user will be notified of the tampering.
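
The log check can be sketched as follows, assuming each authenticator has been reduced to a (sequence number, hash) pair whose signature was already verified elsewhere. The hash encoding and all function names are assumptions for this illustration.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(prev_hash, seq, typ, content):
    # h_i = H(h_{i-1} || s_i || t_i || H(c_i)); the byte encoding is an assumption
    return H(prev_hash + seq.to_bytes(8, "big") + typ.encode() + H(content))


def check_segment(auth_start, auth_end, segment):
    """auth_start and auth_end are (sequence number, hash) pairs taken from
    authenticators. `segment` holds the entries (s, t, c, h) that the machine
    returned for the range after auth_start up to and including auth_end."""
    seq, prev = auth_start
    for s, t, c, h in segment:
        if s <= seq or entry_hash(prev, s, t, c) != h:
            return False          # out of order, or the chain is broken
        seq, prev = s, h
    return (seq, prev) == tuple(auth_end)
```

The segment passes only if recomputing the chain from the first authenticator arrives exactly at the second one, so an edited, reordered or truncated segment is rejected.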

'''Auditing Mechanism'''

From VMM's perspective all things are deterministic.

To perform an audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify the events in the log are the correct execution of the software.

The user verifies the execution of the software in three steps: verifying the log, verifying the snapshot, and verifying the execution.

When the user wants to verify a log segment, the user retrieves the authenticators from the machine with sequence numbers in the range of the log segment. The user then downloads the log segment from the machine, starting at the most recent snapshot before the start of the segment and ending at the most recent snapshot before its end. The user then checks the segment against the authenticators for tampering. If this check succeeds, the user knows that the log segment is genuine. If the machine is faulty, it will either be unable to provide the segment or will return a corrupted one. This can be used to convince a third party of the fault.

When the user wants to verify the snapshot, the user obtains a snapshot of the AVM's state at the beginning of the log segment. The user downloads the snapshot from the machine and the AVMM recomputes the hash tree. The new hash tree is compared to the hash tree contained in the original log segment. If any discrepancies are detected, the user can use this to convince a third party of the machine's faults.

To verify the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and of any users of the machine. The auditing tool performs two checks on the log segment: a syntactic check (which determines whether the log is well-formed) and a semantic check (which determines whether the information in the log corresponds to a correct execution of the machine).

The syntactic check verifies that all log entries are in the proper format, that the signatures on each message and acknowledgement are valid, that each message was acknowledged, and that the sequence of sent and received messages matches the sequence of messages that entered and exited the AVM.

The semantic check creates a local VM that will execute the machine's log segment. The VM is initialized with a snapshot from the machine if possible. The local VM then replays the log segment and the results are recorded. The auditing tool compares the inputs, outputs, and snapshot hashes of the replayed execution against the original log. If any discrepancies are detected, the fault is reported and can be used as evidence against the machine.
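
The core of the semantic check, replaying logged inputs on a reference copy and comparing the outputs, can be illustrated with a toy model. Here the "VM" is reduced to a deterministic step function over an integer state, and the log holds (incoming message, logged outgoing message) pairs; both are hypothetical stand-ins for a real VM image and log format.

```python
def reference_step(state: int, message: int):
    """Stand-in for the reference VM image: a deterministic step function."""
    new_state = state + message
    return new_state, new_state * 2   # (next state, outgoing message)


def semantic_check(snapshot_state, log_segment, step=reference_step):
    """Replay the logged inputs starting from the snapshot and compare every
    replayed output with the output recorded in the log. Returns the index of
    the first divergence, or None if the replay matches the log."""
    state = snapshot_state
    for i, (incoming, logged_outgoing) in enumerate(log_segment):
        state, replayed_outgoing = step(state, incoming)
        if replayed_outgoing != logged_outgoing:
            return i
    return None
```

A machine running modified software produces outputs that the reference copy cannot reproduce from the same inputs, and the first such divergence is the evidence of the fault.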

=Critique=

The layout of the paper is essential to the reader's comprehension. The introduction clearly describes what the reader can expect in the following pages, especially which problems are addressed and how they are solved.

The paper gives multiple examples of the advantages and disadvantages of an AVM. A good example is "Cheat Detection". Cheaters use programs that bypass the original game code to gain a major advantage over other players. Since an AVM is generic in its cheat detection, it can detect a wider range of cheats than most other cheat detection methods. The logs also make it possible to replay the game, so players using an AVM can see how other players play by replaying the game from those players' logs.

The negative side is that the player may suffer from the overhead of the AVM. Everything is logged and stored on the hard drive, which takes up a large amount of space. In the example in the paper it is 148 MB per hour after compression. This reduces the fps. Additionally, the connection to the AVM increases the ping time to the server.

As a proof of concept, the authors ran their AVM with the online game Counter-Strike and tried to detect cheats. They used “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more powerful than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[[#References |[10]]]. A ten-year-old game [[#References |[10]]] uses only a small fraction of the resources of a Dell Precision T1500 workstation. Newer games consume far more resources than Counter-Strike, leaving less room to run the AVM. A 13% slowdown [pg 12] is quite noticeable in a game that is only running at 30 to 40 fps. This is detrimental to game play, since frame rates above 60 fps are considered optimal.

In the paper the authors state that the AVM adds only about 5 ms of latency. While this does not seem like much, the measurement was taken over a LAN with all the computers connected to the same switch [pg 12]. This setup does not accurately represent real-life conditions and therefore lacks external validity: many online games are played over the Internet, with participants sometimes not even on the same continent, and the latency overhead of the AVM would likely increase with the added distance. [[#References |[12]]]

While the paper does test a scenario slightly larger than one-to-one, it does not test a real-world environment in which 16, 32 or even 64 players are playing at the same time.

Spot checking can be used for applications that take snapshots every x seconds. Although this removes a lot of overhead and data storage, it only verifies that the application or user is behaving as intended at those x-second intervals. Someone could therefore work out the pattern of the snapshots and render the AVM useless.

AVMs are extremely effective against two types of cheats: those that send incorrect network messages and those that have to be loaded along with the game. This is ideal for tournament-style competition, but in the real world it would be of limited use. Games get patched, users download add-ons, and so on. Every patch or add-on would require a new AVM image, which is unreasonable given the number of people playing the game. One solution proposed by the authors is to disallow installing anything on the AVM. While this could work in a tournament environment, ordinary users at home would not be pleased with the limitation.

An AVM will not catch a bug or exploit in the program itself that a malicious user could abuse, since the exploit would behave identically on both the user's system and the auditor's replay.

For their use case, the authors do not mention that in Counter-Strike the user can record a demo of the current game. Some online leagues require every player to record a demo and upload it to the league website, where every member of the league can watch it. Without this demo the team loses the match immediately. Additionally, some leagues require the player to run an extra program (e.g. Electronic Sports League WIRE), which checks the programs running in the background. It also takes random snapshots of the current player, compresses all the information into a file and uploads it to one of the league's servers, where it can be checked by any player.

=References=
[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and
A. Warfield. Remus: High availability via asynchronous virtual
machine replication. In Proceedings of the USENIX Symposium
on Networked Systems Design and Implementation (NSDI), Apr.
2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware.
http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof
playout for centralized and peer-to-peer gaming. IEEE/ACM
Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. Mönch, G. Grimen, and R. Midtstraum. Protecting online
games against cheating. In Proceedings of the Workshop on Network
and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical
accountability for distributed systems. In Proceedings of
the ACM Symposium on Operating Systems Principles (SOSP), Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but
verify: Monitoring remotely executing programs for progress
and correctness. In Proceedings of the ACM SIGPLAN Annual
Symposium on Principles and Practice of Parallel Programming
(PPoPP), June 2005.

[9] VMWare Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike http://store.steampowered.com/app/10/

[12] Larry L. Peterson and Bruce S. Davie. Computer Networks a Systems Approach, 2007

Talk:COMP 3000 Essay 2 2010 Question 4

2010-11-16T15:54:47Z

Npradhan: /* Group Essay 2 */

= Group Essay 2 =

Hello Group. Please post your information here. I assume everybody has read the email sent to your Connect account. Does anyone in particular want to send him the email listing the group members? If not, I will go ahead tomorrow at about 13:00 and send the email with the group members who have written their contact information here. - [[User:Sschnei1|Sschnei1]] 03:25, 15 November 2010 (UTC)
 

Sebastian Schneider sschnei1@connect.carleton.ca

Matthew Chou mchou2@connect.carleton.ca

Mark Walts mwalts@connect.carleton.ca

Henry Irving hirving@connect.carleton.ca

Jean-Benoit Aubin jbaubin@connect.carleton.ca

Pradhan Nishant npradhan npradhan@connect.carleton.ca

Only Paul Cox didn't answer the email I sent this morning.

Cox Paul pcox

And I just sent an email to the teacher.

--Jean-Benoit

COMP 3000 Essay 1 2010 Question 11

2010-10-15T07:55:47Z

Npradhan: /* Changing Storage Needs */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially. Businesses are increasingly choosing to archive and retain all the data they produce, and "store everything, forever" (Dell, 2010)[[#Foot1|1]] is the common mantra of storage administrators. The storage industry has been able to keep up with the increasing demand with matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has hardly changed since the 1950s. The dominant storage mechanism is still block-based storage technology.

Innovation in storage technology is especially pertinent to businesses that use network storage. The two dominant technologies of network storage, storage area network (SAN) and network-attached storage (NAS), each have their own benefits and drawbacks and would benefit greatly from improvements in storage technology. Specifically, improvements that provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions would be ideal.

Object-based storage devices (OSD) address these issues by design. They store objects that consist of both data and metadata; each object carries a unique identifier and is accessed through defined methods such as read and write. The devices also handle the underlying security, space allocation and basic storage routines.[[#Foot2|2]] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object level access, ensured integrity of data with unique hash keys and benefits in management and business intelligence with rich metadata, OSD can be seen as a viable alternative to improve the standard architectures of SAN and NAS.

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[[#Foot3|3]] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[[#Foot4|4]] This interface simply allows the operating system to read or write blocks on the disk. This means that the task of abstracting stored data into related groups, or into human-understandable constructs such as objects or files, is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into writes to specific blocks on the disk. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
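
The translation from a file write to block writes can be sketched as follows. The 512-byte block size and the per-file list of disk block numbers are assumptions for illustration; real filesystems use more elaborate structures.

```python
BLOCK_SIZE = 512  # bytes; an assumed block size for illustration


def blocks_for_write(file_block_list, offset, length):
    """Translate a write of `length` bytes at byte `offset` within a file
    into (disk block number, offset within block, byte count) triples.
    `file_block_list[i]` is the disk block holding the file's i-th block."""
    ops = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, BLOCK_SIZE)
        count = min(BLOCK_SIZE - within, end - offset)
        ops.append((file_block_list[index], within, count))
        offset += count
    return ops
```

For instance, with a file stored in disk blocks 90, 17 and 42, a 30-byte write at offset 500 touches the last 12 bytes of block 90 and the first 18 bytes of block 17. All of this bookkeeping lives in the filesystem, because the disk itself only understands block reads and writes.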

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions" (Bandulet, 2007)[[#Foot2|2]]. This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these dated commands.

== Overview of Object-Based Storage ==

Unlike block-based storage, object-based storage is comparatively recent: research on it started in the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read, write, create, or destroy. Objects can be of variable size, and the device itself handles mapping them onto physical storage. These objects also have metadata and access controls directly associated with them. This allows the filesystem to work at a higher level of abstraction, which allows for much more flexibility, which in turn gives rise to numerous capabilities not present in block-based storage. This is important because the needs placed on filesystems have changed. As we compare object-based storage with block-based storage, we will see that the design of objects is better suited than blocks to the needs of today's filesystems, especially networked filesystems.
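
A toy in-memory model of such an object interface is sketched below. The class and method names are hypothetical and do not follow any OSD standard command set; the sketch only shows variable-sized objects, attached metadata, and a per-object access check.

```python
import itertools


class ObjectStore:
    """Toy in-memory object-based storage device (hypothetical interface)."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def create(self, owner, metadata=None):
        """Create an empty object and return its unique identifier."""
        oid = next(self._ids)
        self._objects[oid] = {"data": bytearray(),
                              "meta": dict(metadata or {}),
                              "allowed": {owner}}
        return oid

    def _get(self, oid, user):
        obj = self._objects[oid]
        if user not in obj["allowed"]:   # access control is checked per object
            raise PermissionError(f"{user} may not access object {oid}")
        return obj

    def write(self, oid, user, offset, data):
        buf = self._get(oid, user)["data"]
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))  # object grows as needed
        buf[offset:offset + len(data)] = data

    def read(self, oid, user, offset, length):
        return bytes(self._get(oid, user)["data"][offset:offset + length])

    def destroy(self, oid, user):
        self._get(oid, user)
        del self._objects[oid]
```

The caller never sees block numbers: space allocation and the permission check both happen inside the store, which is the division of labour described above.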

== Changing Storage Needs ==

Storage needs have changed significantly since the first hard disks were developed in the 1950s and the interface was standardized in the 1970s. This means that the functionality of storage devices must also change to reflect these needs. Storage has become increasingly networked, and networked storage must deal with several issues. Firstly, the storage architecture must be able to scale to terabytes (10^12 bytes), petabytes (10^15 bytes) and beyond, with many servers and clients, while avoiding bottlenecks. Bottlenecks in networks are easier to avoid if data storage is distributed rather than centralized. The data stored on these networks has also become more sensitive. Personal information, such as financial records, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has increased, it has become more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues by design.

== Comparison of object and block based stores ==
=== Scalability ===
Scalability is very important for large businesses that need to manage large data centers. Managing metadata while maintaining data access speed as the system grows is paramount.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or with the remapping of blocks for archiving or duplication. Building systems that scale with the metadata becomes a major issue, while at the same time the current speed of block-based storage needs to be maintained.

NAS coordinates the interface between file-level access and clients. This is done through a single NAS head, which usually has thousands of gigabytes of storage behind it.[[#Foot5|5]] All data traffic must flow through this single access point. The benefit of NAS lies in its ability to manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, having all the data pass through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation management on the different storage system layers, and the policy and management metadata that applications add individually, are spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, offer file systems that are distributed but provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private fiber channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. The lack of security concerns is one of the main reasons that block storage was a viable option for SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can or should be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern[[#Foot6|6]]. Object stores can make user privilege management a much more manageable task, since each object is aware of who is allowed to access it.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[[#Foot1|1]] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hurts scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The affected block may not even contain any data. Such errors usually happen during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only needs to manage objects. This is done by managing metadata as well as maintaining internal copies of that metadata. Hence, OSDs have knowledge of their object layout even when groups of objects are spread across different OSDs. In this way OSDs know which space is used or unused and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover data for a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from the contents of the file. The file can therefore be checked against its key to ensure integrity and guard against data corruption. The hash key can also be used for disk management to quickly detect and flag duplicate data.[[#Foot1|1]]
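
The two uses of a content hash described above, integrity checking and duplicate detection, can be sketched as follows. This is a minimal illustration rather than how any particular OSD implements it: the choice of SHA-256 and the function names <code>content_key</code>, <code>verify</code> and <code>find_duplicates</code> are assumptions made for the example.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive a key from the object's contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, stored_key: str) -> bool:
    """Recompute the key and compare it with the one stored alongside the object."""
    return content_key(data) == stored_key


def find_duplicates(objects: dict) -> list:
    """Group object IDs whose contents hash to the same key."""
    groups = {}
    for object_id, data in objects.items():
        groups.setdefault(content_key(data), []).append(object_id)
    # Only groups with more than one member are duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


key = content_key(b"quarterly report")
intact = verify(b"quarterly report", key)      # contents unchanged
corrupted = verify(b"quarterly rep0rt", key)   # a single altered byte
dupes = find_duplicates({"a": b"x", "b": b"y", "c": b"x"})
```

Because the key depends only on the contents, corruption changes the recomputed key, and two objects with identical contents share a key without the device having to compare them byte by byte.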

=== Security ===

Security is an issue that must be confronted in all modern storage networks. Security issues come in a wide variety of types, so they can be difficult to deal with. Both SAN and NAS have a variety of ways of handling security, but an object based approach can make the implementation of security measures more effective and easier to manage.

SANs have traditionally run on Fibre Channel.[[#Foot7|7]] From a security standpoint, running a SAN on Fibre Channel helps to isolate its network, since it does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and logical unit number (LUN) masking are typical security measures used by SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning; the two differ in the type of device that enforces them. Switches utilize zoning while disk array controllers use LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical unit numbers, hence the term LUN masking.[[#Foot8|8]]

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations, much like those of traditional operating systems, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSD devices handle security requests directly. The set of protocols used by OSD gives it a fair amount of flexibility in controlling access. Clients can access an OSD device by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.[[#Foot9|9]] This can prevent a wide range of potential attacks, which gives OSD systems an advantage over block based systems.
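
The idea of a capability that the device itself can check can be sketched as below. This is a simplified illustration and not the OSD security protocol itself: it assumes a secret shared between a security manager and the device, uses HMAC-SHA-256 as the integrity check, and the names <code>issue_capability</code> and <code>osd_accepts</code> as well as the <code>rights</code> field are hypothetical.

```python
import hashlib
import hmac


def _tag(secret: bytes, fields: tuple) -> str:
    """Keyed digest over the capability fields (HMAC-SHA-256 is an assumption)."""
    return hmac.new(secret, repr(fields).encode(), hashlib.sha256).hexdigest()


def issue_capability(secret, osd_name, partition_id, object_id, rights):
    """Security manager side: name the object, list the rights, and sign both."""
    fields = (osd_name, partition_id, object_id, tuple(rights))
    return {"fields": fields, "tag": _tag(secret, fields)}


def osd_accepts(secret, capability, requested_object, requested_right):
    """Device side: check the signature, then the object tuple and the right."""
    fields = tuple(capability["fields"])
    if not hmac.compare_digest(_tag(secret, fields), capability["tag"]):
        return False  # forged or tampered capability
    osd_name, partition_id, object_id, rights = fields
    return (osd_name, partition_id, object_id) == requested_object \
        and requested_right in rights


secret = b"shared-between-manager-and-device"
cap = issue_capability(secret, "osd1", 7, 42, ["read"])
can_read = osd_accepts(secret, cap, ("osd1", 7, 42), "read")
can_write = osd_accepts(secret, cap, ("osd1", 7, 42), "write")

# A client that edits the rights without knowing the secret is rejected.
forged = dict(cap, fields=("osd1", 7, 42, ("read", "write")))
forged_ok = osd_accepts(secret, forged, ("osd1", 7, 42), "write")
```

The point of the sketch is that the device needs only the shared secret to validate a request, so clients can talk to it directly without every access passing through a central server.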

== Real World Implementation ==

Ceph is an example of a real world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions.[[#Foot10|10]] Since Ceph is based on OSDs, it takes advantage of the ability for clients to interact directly with the devices, which avoids the traditional bottlenecks to performance caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls it can allow this direct access safely, unlike other network storage architectures.

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. However, there remain challenges to its adoption in the industry. One is that OSD is only needed in high end business solutions at the moment, which prevents it from reaching smaller businesses.[[#Foot11|11]] As newer features are added and the standards mature, we expect to see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data through the rich metadata of objects, together with the security and data transfer speeds of NAS and SAN and integrity controls for backups and redundancy, will make object storage an attractive choice for storage administrators in the future.

==References==
1 Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

2 C. Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

3 IBM 350 disk storage unit, IBM Archives. [online] IBM Available at : <http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html> [Accessed 14 October 2010].

4 M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

5 TechRepublic Guest Contributor, Foundations of Network Storage, Lesson Two: NAS. [online] Available at <http://articles.techrepublic.com.com/5100-22_11-5841266.html> [Accessed 14 October 2010].

6 Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

7 J. Tate, F. Lucchese, R. Moore. Introduction to Storage Area Networks. [online] Available at <http://www.redbooks.ibm.com/redbooks/pdfs/sg245470.pdf> [Accessed 14 October 2010].

8 H. Yoshida. LUN Security Considerations for Storage Area Networks. [online] Available at <http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf> [Accessed 14 October 2010].

9 M. Factor, D. Nagle, D. Naor, E. Riedel, J.Satran, 2005. The OSD Security Protocol. [online] Available at <http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf> [Accessed 14 October 2010].

10 S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proc. OSDI, 2006. [online] Available at: <http://www.usenix.org/events/osdi06/tech/full_papers/weil/weil_html/> [Accessed 14 October 2010].

11 M. Factor, K. Meth, D. Naor, O. Rodeh, J. Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

COMP 3000 Essay 1 2010 Question 11

2010-10-15T07:50:27Z

Npradhan: /* Overview of Object-Based Storage */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially. Businesses are increasingly choosing to archive and retain all the data they produce, and "store everything, forever" (Dell, 2010)[[#Foot1|1]] is the common mantra of storage administrators. The storage industry has been able to keep up with the increasing demand with matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has hardly changed since the 1950s. The dominant storage mechanism is still block-based storage technology.

Innovation in storage technology is especially pertinent to businesses that use network storage. The two dominant technologies of network storage, storage area network (SAN) and network-attached storage (NAS), each have their own benefits and drawbacks and would benefit greatly from improvements in storage technology. Specifically, improvements that provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions would be ideal.

Object Based Storage Devices (OSD) solve these issues by design. They store objects that consist of both data and metadata; each object carries a unique identifier and is accessed with defined methods such as read and write. The devices also handle the underlying security, space allocation and basic storage routines.[[#Foot2|2]] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object level access, ensured integrity of data with unique hash keys and benefits in management and business intelligence with rich metadata, OSD can be seen as a viable alternative to improve the standard architectures of SAN and NAS.

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[[#Foot3|3]] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[[#Foot4|4]] This interface simply allows the operating system to read or write blocks on the disk. This means that the task of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into the blocks on the disk to write to. In this way, the scope of a filesystem extends from high level constructs like files to low level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
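
The translation step mentioned above can be sketched in a few lines. This is an illustrative model only: the 512-byte block size, the per-file block map, and the function names are assumptions for the example and are not taken from any particular filesystem.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed value for illustration


def file_blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return the file-relative block indices touched by a byte range."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def disk_blocks_for_write(block_map, offset, length, block_size=BLOCK_SIZE):
    """Translate a write at a file offset into the disk blocks to be written.

    block_map[i] is the (hypothetical) disk block holding the file's i-th block.
    """
    return [block_map[i] for i in file_blocks_for_range(offset, length, block_size)]


# A file whose four blocks are scattered over the disk.
block_map = [1040, 1041, 77, 78]
touched = disk_blocks_for_write(block_map, offset=1000, length=100)
```

A 100-byte write at offset 1000 straddles the file's second and third blocks, so the filesystem has to issue writes to two unrelated disk blocks; all of this bookkeeping sits above the block interface.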

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions" (Bandulet, 2007)[[#Foot2|2]]. This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these dated commands.

== Overview of Object-Based Storage ==

Unlike block-based storage, which dates back to the 1950s, research on object-based storage started in the 1990s. See for example the work of Gibson et al in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical storage. These objects also have metadata and access controls immediately associated with them. This allows the filesystem to work at a higher level of abstraction, which allows for much more flexibility and in turn gives rise to numerous capabilities not present in block-based storage. This is important because the needs placed on filesystems have changed, and we will see as we compare object based storage with block based storage that the design of objects is more suited than blocks to the needs of today's filesystems, especially networked filesystems.
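
The object interface described above can be sketched as a small in-memory model. It is a toy meant to show the shape of the interface (create, read, write, destroy, with metadata attached to each object); the class name <code>ToyObjectStore</code> and its method signatures are hypothetical and do not follow the T10 OSD command set.

```python
import itertools


class ToyObjectStore:
    """In-memory stand-in for an object-based storage device."""

    def __init__(self):
        self._objects = {}
        self._ids = itertools.count(1)

    def create(self, metadata=None):
        """Allocate a new, empty, variable-sized object and return its ID."""
        object_id = next(self._ids)
        self._objects[object_id] = {"data": bytearray(), "meta": dict(metadata or {})}
        return object_id

    def write(self, object_id, offset, payload):
        """Write bytes at an offset; the device grows the object as needed."""
        data = self._objects[object_id]["data"]
        end = offset + len(payload)
        if len(data) < end:
            data.extend(b"\x00" * (end - len(data)))
        data[offset:end] = payload

    def read(self, object_id, offset, length):
        """Read a byte range; no block numbers are visible to the caller."""
        return bytes(self._objects[object_id]["data"][offset:offset + length])

    def attributes(self, object_id):
        """Return the metadata stored with the object."""
        return dict(self._objects[object_id]["meta"])

    def destroy(self, object_id):
        """Remove the object and its metadata together."""
        del self._objects[object_id]


store = ToyObjectStore()
oid = store.create({"owner": "alice"})
store.write(oid, 0, b"hello")
store.write(oid, 5, b" world")
contents = store.read(oid, 0, 11)
```

The caller never names a block: placement and space allocation stay inside the device, and the metadata travels with the object.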

== Changing Storage Needs ==

Storage needs have changed significantly since the first hard disks were developed in the 1950s and the interface was standardized in the 1970s. This means that the functionality of storage devices must also change to reflect these needs. Storage has become increasingly networked, and networked storage must deal with several issues. Firstly, the storage architecture must be able to scale to terabytes (10^12 bytes), petabytes (10^15 bytes) and beyond with many servers and clients while avoiding bottlenecks. The data stored on these networks has also become more sensitive. Personal information, such as financial records, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has increased, it becomes more important to ensure the data's integrity and security. Block based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object based storage is more suited to address these issues by design.

== Comparison of object and block based stores ==
=== Scalability ===
Scalability is very important for large businesses that need to manage large data centers. Managing metadata while maintaining data access speed as the system grows is paramount.

Most block based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or remapping of blocks for archiving or duplication. Building systems that scale with the metadata becomes a major issue, yet the current speeds of block-based storage need to be maintained.

NAS coordinates the interface between file level access and clients. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.[[#Foot5|5]] All data traffic must flow through this single access point. The benefit of NAS lies in its ability to manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, routing all data through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed at different storage system layers, and applications add their own policy and management metadata, so the metadata is spread throughout the system and becomes very hard to manage.

SANs on the other hand offer file systems that are distributed, but provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can or should be trusted. This, in addition to the possible adoption of IP based SAN solutions, makes data security a primary concern.[[#Foot6|6]] Object stores can make user privilege management a much more manageable task, since each object is aware of who is allowed to access it.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[[#Foot1|1]] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hurts scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The affected block may not even contain any data. Such errors usually happen during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only needs to manage objects. This is done by managing metadata as well as maintaining internal copies of that metadata. Hence, OSDs have knowledge of their object layout even when groups of objects are spread across different OSDs. In this way OSDs know which space is used or unused and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover data for a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from the contents of the file. The file can therefore be checked against its key to ensure integrity and guard against data corruption. The hash key can also be used for disk management to quickly detect and flag duplicate data.[[#Foot1|1]]

=== Security ===

Security is an issue that must be confronted in all modern storage networks. Security issues come in a wide variety of types, so they can be difficult to deal with. Both SAN and NAS have a variety of ways of handling security, but an object based approach can make the implementation of security measures more effective and easier to manage.

SANs have traditionally run on Fibre Channel.[[#Foot7|7]] From a security standpoint, running a SAN on Fibre Channel helps to isolate its network, since it does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and logical unit number (LUN) masking are typical security measures used by SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning; the two differ in the type of device that enforces them. Switches utilize zoning while disk array controllers use LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical unit numbers, hence the term LUN masking.[[#Foot8|8]]

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations, much like those of traditional operating systems, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSD devices handle security requests directly. The set of protocols used by OSD gives it a fair amount of flexibility in controlling access. Clients can access an OSD device by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.[[#Foot9|9]] This can prevent a wide range of potential attacks, which gives OSD systems an advantage over block based systems.

== Real World Implementation ==

Ceph is an example of a real world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions.[[#Foot10|10]] Since Ceph is based on OSDs, it takes advantage of the ability for clients to interact directly with the devices, which avoids the traditional bottlenecks to performance caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls it can allow this direct access safely, unlike other network storage architectures.

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. However, there remain challenges to its adoption in the industry. One is that OSD is only needed in high end business solutions at the moment, which prevents it from reaching smaller businesses.[[#Foot11|11]] As newer features are added and the standards mature, we expect to see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data through the rich metadata of objects, together with the security and data transfer speeds of NAS and SAN and integrity controls for backups and redundancy, will make object storage an attractive choice for storage administrators in the future.

==References==
1 Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

2 C. Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

3 IBM 350 disk storage unit, IBM Archives. [online] IBM Available at : <http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html> [Accessed 14 October 2010].

4 M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

5 TechRepublic Guest Contributor, Foundations of Network Storage, Lesson Two: NAS. [online] Available at <http://articles.techrepublic.com.com/5100-22_11-5841266.html> [Accessed 14 October 2010].

6 Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

7 J. Tate, F. Lucchese, R. Moore. Introduction to Storage Area Networks. [online] Available at <http://www.redbooks.ibm.com/redbooks/pdfs/sg245470.pdf> [Accessed 14 October 2010].

8 H. Yoshida. LUN Security Considerations for Storage Area Networks. [online] Available at <http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf> [Accessed 14 October 2010].

9 M. Factor, D. Nagle, D. Naor, E. Riedel, J.Satran, 2005. The OSD Security Protocol. [online] Available at <http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf> [Accessed 14 October 2010].

10 S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proc. OSDI, 2006. [online] Available at: <http://www.usenix.org/events/osdi06/tech/full_papers/weil/weil_html/> [Accessed 14 October 2010].

11 M. Factor, K. Meth, D. Naor, O. Rodeh, J. Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

COMP 3000 Essay 1 2010 Question 11

2010-10-15T07:45:44Z

Npradhan: /* Integrity */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially. Businesses are increasingly choosing to archive and retain all the data they produce, and "store everything, forever" (Dell, 2010)[[#Foot1|1]] is the common mantra of storage administrators. The storage industry has been able to keep up with the increasing demand with matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has hardly changed since the 1950s. The dominant storage mechanism is still block-based storage technology.

Innovation in storage technology is especially pertinent to businesses that use network storage. The two dominant technologies of network storage, storage area network (SAN) and network-attached storage (NAS), each have their own benefits and drawbacks and would benefit greatly from improvements in storage technology. Specifically, improvements that provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions would be ideal.

Object Based Storage Devices (OSD) solve these issues by design. They store objects that consist of both data and metadata; each object carries a unique identifier and is accessed with defined methods such as read and write. The devices also handle the underlying security, space allocation and basic storage routines.[[#Foot2|2]] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object level access, ensured integrity of data with unique hash keys and benefits in management and business intelligence with rich metadata, OSD can be seen as a viable alternative to improve the standard architectures of SAN and NAS.

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[[#Foot3|3]] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[[#Foot4|4]] This interface simply allows the operating system to read or write blocks on the disk. This means that the task of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into the blocks on the disk to write to. In this way, the scope of a filesystem extends from high level constructs like files to low level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions" (Bandulet, 2007)[[#Foot2|2]]. This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these dated commands.

== Overview of Object-Based Storage ==

Unlike block-based storage, which dates back to the 1950s, research on object-based storage started in the 1990s. See for example the work of Gibson et al in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical storage. These objects also have metadata and access controls immediately associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object based storage with block based storage that the design of objects is more suited than blocks to the needs of today's filesystems, especially networked filesystems.

== Changing Storage Needs ==

Storage needs have changed significantly since the first hard disks were developed in the 1950s and the interface was standardized in the 1970s. This means that the functionality of storage devices must also change to reflect these needs. Storage has become increasingly networked, and networked storage must deal with several issues. Firstly, the storage architecture must be able to scale to terabytes (10^12 bytes), petabytes (10^15 bytes) and beyond with many servers and clients while avoiding bottlenecks. The data stored on these networks has also become more sensitive. Personal information, such as financial records, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has increased, it becomes more important to ensure the data's integrity and security. Block based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object based storage is more suited to address these issues by design.

== Comparison of object and block based stores ==
=== Scalability ===
Scalability is very important for large businesses that need to manage large data centers. Managing metadata while maintaining data access speed as the system grows is paramount.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or the remapping of blocks for archiving or duplication. Building systems that scale with this metadata becomes a major issue, while at the same time the current speeds of block-based storage need to be maintained.

NAS coordinates the interface between file-level access and clients. This is done through a single NAS head, which usually has thousands of gigabytes of storage behind it.[[#Foot5|5]] All data traffic must flow through this single access point. The benefit of NAS lies in its ability to manage security, prevent unauthorized access to files, and use metadata to map blocks into files for the client. However, having all the data pass through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed on different layers of the storage system, and applications individually add policy and management metadata that is spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, offer file systems that are distributed but provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can or should be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern.[[#Foot6|6]] Object stores can make user privilege management a much more manageable task, since each object is aware of who is allowed to access it.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[[#Foot1|1]] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and scalability. OSDs ensure data integrity with mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The block with the error may not even contain any data. Such errors usually occur during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only needs to manage objects. An OSD does this by managing metadata as well as maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way OSDs know which space is used or unused and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover the data for a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object file has an associated hash key that is generated from, and specific to, the contents of the file. The file can thus be checked against its key to ensure integrity and guard against data corruption. The hash key can also be used for disk management to quickly detect and flag duplicate data.[[#Foot1|1]]

=== Security ===

Security is an issue that must be confronted in all modern storage networks. Security issues come in a wide variety of types, so they can be difficult to deal with. Both SAN and NAS have a variety of ways of handling security, but an object-based approach can make the implementation of security measures more effective and easier to manage.

SANs have traditionally run on Fibre Channel.[[#Foot7|7]] Running a SAN on Fibre Channel helps security by isolating its network, since Fibre Channel does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host systems to handle security.

Zoning and logical unit number (LUN) masking are typical security measures for SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the type of device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical units, each identified by a logical unit number, hence the term LUN masking.[[#Foot8|8]]

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system that a NAS device runs has access control configurations, much like those of traditional operating systems, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSD devices handle security requests directly. The set of protocols used by OSD gives it a fair amount of flexibility in controlling access. Clients can access an OSD device by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.[[#Foot9|9]] This can prevent a wide range of potential attacks, which gives OSD systems an advantage over block based systems.

== Real World Implementation ==

Ceph is an example of a real world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions.[[#Foot10|10]] Since Ceph is based on OSDs, it takes advantage of the ability for clients to interact directly with the devices, which avoids the traditional bottlenecks to performance caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls it can allow this direct access safely, unlike other network storage architectures.

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. However, there remain challenges to its adoption in industry. One is that OSD is currently needed only in high-end business solutions, which keeps it from reaching smaller businesses.[[#Foot11|11]] As newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data using the rich metadata of objects, together with the security and data transfer speeds of NAS and SAN and integrity controls for backups and redundancy, will make object-based storage an attractive choice for storage administrators in the future.

==References==
1 Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

2 C. Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

3 IBM 350 disk storage unit, IBM Archives. [online] IBM Available at : <http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html> [Accessed 14 October 2010].

4 M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

5 TechRepublic Guest Contributor, Foundations of Network Storage, Lesson Two: NAS. [online] Available at <http://articles.techrepublic.com.com/5100-22_11-5841266.html> [Accessed 14 October 2010].

6 Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

7 J. Tate, F. Lucchese, R. Moore. Introduction to Storage Area Networks. [online] Available at <http://www.redbooks.ibm.com/redbooks/pdfs/sg245470.pdf> [Accessed 14 October 2010].

8 H. Yoshida. LUN Security Considerations for Storage Area Networks. [online] Available at <http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf> [Accessed 14 October 2010].

9 M. Factor, D. Nagle, D. Naor, E. Riedel, J.Satran, 2005. The OSD Security Protocol. [online] Available at <http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf> [Accessed 14 October 2010].

10 S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proc. OSDI, 2006. [online] Available at: <http://www.usenix.org/events/osdi06/tech/full_papers/weil/weil_html/> [Accessed 14 October 2010].

11 M. Factor, K. Meth, D. Naor, O. Rodeh, J. Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

COMP 3000 Essay 1 2010 Question 11

2010-10-15T07:44:58Z

Npradhan: /* Integrity */ Removal of redundant phrasing

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially. Businesses are increasingly choosing to archive and retain all the data they produce, and "store everything, forever" (Dell, 2010)[[#Foot1|1]] is the common mantra of storage administrators. The storage industry has been able to keep up with the increasing demand with matching increases in storage capacity. Unfortunately, the interfaces between clients and storage devices have hardly changed since the 1950s. The dominant storage mechanism is still block-based storage technology.

Innovation in storage technology is especially pertinent to businesses that use network storage. The two dominant network storage technologies, storage area network (SAN) and network-attached storage (NAS), each have their own benefits and drawbacks and would benefit greatly from improvements in storage technology. Specifically, improvements that provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions would be ideal.

Object-Based Storage Devices (OSDs) are designed to address these issues. They use objects that consist of both data and metadata, are accessed with defined methods such as read and write, and carry a unique identifier. OSDs also handle the underlying security, space allocation and basic storage routines.[[#Foot2|2]] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object level access, ensured integrity of data with unique hash keys and benefits in management and business intelligence with rich metadata, OSD can be seen as a viable alternative to improve the standard architectures of SAN and NAS.

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[[#Foot3|3]] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[[#Foot4|4]] This interface simply allows the operating system to read or write blocks on the disk. This means that the task of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into a block on the disk to write to. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
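
The translation from file to block described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 512-byte block size, the <code>block_map</code> list, and the <code>locate</code> function are hypothetical names chosen for the example and do not come from any real filesystem.

```python
BLOCK_SIZE = 512  # assumed block size in bytes (illustrative)


def locate(block_map, offset):
    """Translate a byte offset within a file into
    (disk block number, offset within that block).

    block_map is a hypothetical per-file list of disk block numbers,
    in file order; real filesystems use richer structures.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    index, within = divmod(offset, BLOCK_SIZE)
    if index >= len(block_map):
        raise ValueError("offset is past the end of the file")
    return block_map[index], within


# A hypothetical file stored in three non-contiguous disk blocks.
file_blocks = [907, 12, 4410]
print(locate(file_blocks, 1030))  # (4410, 6)
```

Byte 1030 falls in the third block of the file (1030 = 2 × 512 + 6), so the filesystem would have to issue a request for disk block 4410. With a block interface, all of this bookkeeping lives in the filesystem rather than in the device.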

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions" (Bandulet, 2007)[[#Foot2|2]]. This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these dated commands.

== Overview of Object-Based Storage ==

Research into object-based storage is much more recent than block-based storage, having started in the 1990s. See, for example, the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical storage. These objects also have metadata and access controls directly associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and as we compare object-based storage with block-based storage we will see that the design of objects is better suited than blocks to the needs of today's filesystems, especially networked filesystems.
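
To make the contrast with a block interface concrete, here is a minimal in-memory sketch of the kind of interface an object-based device might expose. The class and method names (<code>ToyObjectStore</code>, <code>create</code>, <code>write</code>, <code>read</code>, <code>get_attributes</code>, <code>remove</code>) are assumptions made for illustration; they are not taken from the OSD standard, and a real device would map objects onto physical blocks internally rather than hold them in a dictionary.

```python
import itertools


class ToyObjectStore:
    """In-memory stand-in for an object-based storage device (illustrative only)."""

    def __init__(self):
        # object ID -> {"data": bytearray, "meta": dict}
        self._objects = {}
        self._next_id = itertools.count(1)

    def create(self, **metadata):
        """Create an empty object carrying the given metadata; return its ID."""
        oid = next(self._next_id)
        self._objects[oid] = {"data": bytearray(), "meta": dict(metadata)}
        return oid

    def write(self, oid, offset, payload):
        """Write payload at offset; the object grows as needed (variable size)."""
        data = self._objects[oid]["data"]
        if len(data) < offset:
            data.extend(b"\x00" * (offset - len(data)))  # zero-fill any gap
        data[offset:offset + len(payload)] = payload

    def read(self, oid, offset, length):
        """Read up to length bytes starting at offset."""
        return bytes(self._objects[oid]["data"][offset:offset + length])

    def get_attributes(self, oid):
        """Return a copy of the metadata stored with the object."""
        return dict(self._objects[oid]["meta"])

    def remove(self, oid):
        """Destroy the object and its metadata."""
        del self._objects[oid]


store = ToyObjectStore()
oid = store.create(owner="alice", content_type="text/plain")
store.write(oid, 0, b"hello")
store.write(oid, 5, b" world")
print(store.read(oid, 0, 11))     # b'hello world'
print(store.get_attributes(oid))  # {'owner': 'alice', 'content_type': 'text/plain'}
```

Note that the caller never mentions a block number: allocation and layout are hidden behind the object ID, which is the shift in responsibility the paragraph above describes.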

== Changing Storage Needs ==

Storage needs have changed significantly since the first hard disks were developed in the 1950s and the interface was standardized in the 1970s. This means that the functionality of storage devices must also change to reflect these needs. Storage has become increasingly networked, and networked storage must deal with several issues. Firstly, the storage architecture must be able to scale to terabytes (10^12 bytes), petabytes (10^15 bytes) and beyond with many servers and clients while avoiding bottlenecks. The data stored on these networks has also become more sensitive. Personal information, such as financial records, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has increased, it has become more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues by design.

== Comparison of object and block based stores ==
=== Scalability ===
Scalability is very important for large businesses that need to manage large data centers. Managing metadata while maintaining data access speed as the system grows is paramount.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or the remapping of blocks for archiving or duplication. Building systems that scale with this metadata becomes a major issue, while at the same time the current speeds of block-based storage need to be maintained.

NAS coordinates the interface between file-level access and clients. This is done through a single NAS head, which usually has thousands of gigabytes of storage behind it.[[#Foot5|5]] All data traffic must flow through this single access point. The benefit of NAS lies in its ability to manage security, prevent unauthorized access to files, and use metadata to map blocks into files for the client. However, having all the data pass through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed on different layers of the storage system, and applications individually add policy and management metadata that is spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, offer file systems that are distributed but provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can or should be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern.[[#Foot6|6]] Object stores can make user privilege management a much more manageable task, since each object is aware of who is allowed to access it.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[[#Foot1|1]] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and scalability. OSDs ensure data integrity with mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The block with the error may not even contain any data. Such errors usually occur during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only needs to manage objects. An OSD does this by managing metadata as well as maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way OSDs know which space is used or unused and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover the data for a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object file has an associated hash key that is generated from, and specific to, the contents of the file. The file can thus be checked against its key to ensure integrity and guard against data corruption. The hash key can also be used in data management to flag duplicate data.[[#Foot1|1]]
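
The two uses of a content hash mentioned above, integrity checking and duplicate detection, can be sketched as follows. The choice of SHA-256 and the function names <code>content_key</code>, <code>verify</code> and <code>find_duplicates</code> are assumptions for the example; the source does not say which hash function a given OSD product uses.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive a hash key from an object's contents (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, stored_key: str) -> bool:
    """Return True if the data still matches the key recorded when it was stored."""
    return content_key(data) == stored_key


def find_duplicates(objects: dict) -> list:
    """Group object names whose contents hash to the same key."""
    by_key = {}
    for name, data in objects.items():
        by_key.setdefault(content_key(data), []).append(name)
    return [names for names in by_key.values() if len(names) > 1]


key = content_key(b"quarterly report")
print(verify(b"quarterly report", key))   # True
print(verify(b"quarterly rep0rt", key))   # False: corruption is detected

archive = {"a.txt": b"same bytes", "b.txt": b"other", "c.txt": b"same bytes"}
print(find_duplicates(archive))           # [['a.txt', 'c.txt']]
```

Because the key depends only on the contents, the same comparison serves both purposes: a mismatch against the stored key signals corruption, and a match between two different objects signals duplicate data.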

=== Security ===

Security is an issue that must be confronted in all modern storage networks. Security issues come in a wide variety of types, so they can be difficult to deal with. Both SAN and NAS have a variety of ways of handling security, but an object-based approach can make the implementation of security measures more effective and easier to manage.

SANs have traditionally run on Fibre Channel.[[#Foot7|7]] Running a SAN on Fibre Channel helps security by isolating its network, since Fibre Channel does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host systems to handle security.

Zoning and logical unit number (LUN) masking are typical security measures for SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the type of device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical units, each identified by a logical unit number, hence the term LUN masking.[[#Foot8|8]]
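
As a rough model of LUN masking, a controller can be thought of as consulting a table that says which logical units each host is allowed to see. The table contents, the host names, and the <code>visible_luns</code> function below are hypothetical; real array controllers identify hosts and configure masks in vendor-specific ways.

```python
# Hypothetical masking table kept by a disk array controller:
# host identifier -> set of LUNs that host is allowed to see.
LUN_MASKS = {
    "host-a": {0, 1},
    "host-b": {2},
}


def visible_luns(host, all_luns):
    """Return the LUNs the controller would expose to the given host.

    Hosts with no entry in the masking table see nothing.
    """
    allowed = LUN_MASKS.get(host, set())
    return sorted(lun for lun in all_luns if lun in allowed)


print(visible_luns("host-a", range(4)))   # [0, 1]
print(visible_luns("host-b", range(4)))   # [2]
print(visible_luns("unknown", range(4)))  # []
```

The check is made per host and per logical unit, which is coarse compared with the per-object access control discussed for OSDs.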

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system that a NAS device runs has access control configurations, much like those of traditional operating systems, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSD devices handle security requests directly. The set of protocols used by OSD gives it a fair amount of flexibility in controlling access. Clients can access an OSD device by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.[[#Foot9|9]] This can prevent a wide range of potential attacks, which gives OSD systems an advantage over block based systems.
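
The idea of a cryptographically protected capability can be illustrated with a keyed hash. The sketch below is a simplification under stated assumptions, not the actual OSD security protocol: it assumes a secret shared between a security manager and the device, uses HMAC-SHA256, and encodes the capability as a <code>|</code>-separated string (so the fields must not contain that character). The names <code>SHARED_SECRET</code>, <code>issue_capability</code> and <code>osd_accepts</code> are hypothetical.

```python
import hashlib
import hmac

# Assumed secret known only to the security manager and the OSD (illustrative).
SHARED_SECRET = b"demo-secret"


def issue_capability(osd_name, partition_id, object_id, rights):
    """Security manager side: build a capability and an integrity tag for it."""
    cap = f"{osd_name}|{partition_id}|{object_id}|{rights}".encode()
    tag = hmac.new(SHARED_SECRET, cap, hashlib.sha256).hexdigest()
    return cap, tag


def osd_accepts(cap, tag, wanted_right):
    """Device side: accept the request only if the tag is valid
    and the capability grants the requested right."""
    expected = hmac.new(SHARED_SECRET, cap, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False  # capability was forged or altered
    rights = cap.decode().split("|")[3]
    return wanted_right in rights


cap, tag = issue_capability("osd1", 7, 42, "r")
print(osd_accepts(cap, tag, "r"))     # True
print(osd_accepts(cap, tag, "w"))     # False: write was never granted
forged = cap.replace(b"|r", b"|rw")
print(osd_accepts(forged, tag, "w"))  # False: tag no longer matches
```

Because the device can validate the capability on its own, clients can talk to it directly without every request passing through a central server, while a client that alters the capability to grant itself more rights is rejected.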

== Real World Implementation ==

Ceph is an example of a real world networked storage system based around OSDs. The Ceph developers specifically list performance, reliability, and scalability as the benefits their system offers over current solutions.[[#Foot10|10]] Since Ceph is based on OSDs, it takes advantage of the ability for clients to interact directly with the devices, which avoids the traditional bottlenecks to performance caused by SAN controllers or NAS heads. This direct access allows Ceph to support a very large number of clients concurrently accessing data on the system. Since objects have security controls it can allow this direct access safely, unlike other network storage architectures.

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. However, there remain challenges to its adoption in industry. One is that OSD is currently needed only in high-end business solutions, which keeps it from reaching smaller businesses.[[#Foot11|11]] As newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data using the rich metadata of objects, together with the security and data transfer speeds of NAS and SAN and integrity controls for backups and redundancy, will make object-based storage an attractive choice for storage administrators in the future.

==References==
1 Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

2 C. Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

3 IBM 350 disk storage unit, IBM Archives. [online] IBM Available at : <http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html> [Accessed 14 October 2010].

4 M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

5 TechRepublic Guest Contributor, Foundations of Network Storage, Lesson Two: NAS. [online] Available at <http://articles.techrepublic.com.com/5100-22_11-5841266.html> [Accessed 14 October 2010].

6 Satran and Teperman, Object Store Based SAN File Systems. [online] IBM Labs Available at: <http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf> [Accessed 14 October 2010].

7 J. Tate, F. Lucchese, R. Moore. Introduction to Storage Area Networks. [online] Available at <http://www.redbooks.ibm.com/redbooks/pdfs/sg245470.pdf> [Accessed 14 October 2010].

8 H. Yoshida. LUN Security Considerations for Storage Area Networks. [online] Available at <http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf> [Accessed 14 October 2010].

9 M. Factor, D. Nagle, D. Naor, E. Riedel, J.Satran, 2005. The OSD Security Protocol. [online] Available at <http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf> [Accessed 14 October 2010].

10 S. A. Weil, S. A. Brandt, E. L. Miller, D. D. E. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proc. OSDI, 2006. [online] Available at: <http://www.usenix.org/events/osdi06/tech/full_papers/weil/weil_html/> [Accessed 14 October 2010].

11 M. Factor, K. Meth, D. Naor, O. Rodeh, J. Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-14T08:37:42Z

Npradhan: /* Essay Format and Assigned Tasks */

== Wikipedia Sources ==
I think we may want to replace the references to Wikipedia with something more authoritative. [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open this massive pdf] from IBM supports the idea that fiber channels are the dominant infrastructure of SANs, but I'm not sure if it mentions how that is changing.

The wikipedia page for LUN masking has [http://www.sansecurity.com/san-security-faq.shtml this] as its reference for the definitions, there's also [http://technet.microsoft.com/en-us/library/cc758640(WS.10).aspx this] microsoft article and [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf this] paper from Hitachi. I'm not sure which of these is most relevant since I just did a quick google search and haven't really read up on LUN masking or zoning, so someone else would probably be better suited to decide which one if any to use.

How does that sound to everyone?

--[[User:Mbingham|Mbingham]] 02:55, 14 October 2010 (UTC)

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple of sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that information might be better worked into the "Changing Storage Needs" section. What do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

::I apologize for the delay, this has been an easy thing to neglect during a busy week. What's the proper way to reference with this wiki? --[[User:Dagar|Dagar]] 21:29, 13 October 2010 (UTC)

:::Check out this reference guide, it explains how to reference any material you find online. [http://libweb.anglia.ac.uk/referencing/harvard.htm Harvard System of Reference] --[[User:Smcilroy|Smcilroy]] 22:46, 13 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that more people aren't tagging the document outline and volunteering responsibilities is kind of unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

I'm going to expand the scalability and integrity sections. Then once the security section is done, I think that just leaves the section on the OSD standard and future plans for the tech. Then in the conclusion we can recap.
--[[User:Smcilroy|Smcilroy]] 22:54, 13 October 2010 (UTC)

:Sounds like a plan. I'll clean up/expand what I have written and get started with some initial stuff for the object sections. Anyone else is welcome to expand and edit as well.
:--[[User:Mbingham|Mbingham]] 00:44, 14 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham (I added a useful diagram here -Npradhan)
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage -Npradhan
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage -Npradhan
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage -dagar

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security -Myagi

:*Conclusion -Smcilroy

:Also, it would probably be useful for people to read over each other's work and make suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know what space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* A computer system can access an OSD device by providing cryptographically secure credentials (capabilities) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines
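
A rough sketch of how such a credential check could work, in Python. This is only an illustration under assumptions: the shared secret, the HMAC-SHA256 construction, and the names <code>issue_capability</code> and <code>device_allows</code> are all made up here and are not taken from the OSD specification.

```python
import hashlib
import hmac

# Hypothetical key known only to the security manager and the storage device.
SECRET = b"shared-between-manager-and-device"


def issue_capability(object_id, rights, secret=SECRET):
    """Security manager side: sign the (object, rights) pair."""
    msg = f"{object_id}:{rights}".encode()
    tag = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "tag": tag}


def device_allows(cap, object_id, op, secret=SECRET):
    """Device side: recompute the tag, then check the object and the right."""
    msg = f"{cap['object_id']}:{cap['rights']}".encode()
    expected = hmac.new(secret, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, cap["tag"]):
        return False  # forged or altered credential
    return cap["object_id"] == object_id and op in cap["rights"]
```

Here a client holding a capability for object 42 with rights <code>"r"</code> could read that object, but a write, a request for a different object, or a capability whose rights field was edited after signing would all be refused by the device.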

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought I'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security; those seem to be constant themes in what I've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if I'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope I'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, each a fixed length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read from or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.
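
To make the difference concrete, here's a rough Python sketch. It's only an illustration: the class and method names (<code>BlockDevice</code>, <code>ObjectDevice</code>, <code>create</code>, and so on) are made up, and a real device would be far more involved.

```python
class BlockDevice:
    """Block interface: fixed-size blocks addressed by number."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, n):
        return self.blocks[n]

    def write_block(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("a block write must be exactly one block long")
        self.blocks[n] = bytes(data)


class ObjectDevice:
    """Object interface: variable-sized objects addressed by ID.

    The object-to-block mapping lives inside the device, not in the OS.
    """

    def __init__(self, num_blocks, block_size=512):
        self._dev = BlockDevice(num_blocks, block_size)
        self._free = list(range(num_blocks))
        self._objects = {}  # object ID -> (block numbers, length in bytes)
        self._next_id = 1

    def create(self):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = ([], 0)
        return oid

    def write(self, oid, data):
        """Replace the whole contents of an object."""
        old_blocks, _ = self._objects[oid]
        self._free.extend(old_blocks)
        size = self._dev.block_size
        blocks = []
        for i in range(0, len(data), size):
            n = self._free.pop(0)
            # Pad the last chunk so the underlying write is a full block.
            self._dev.write_block(n, data[i:i + size].ljust(size, b"\0"))
            blocks.append(n)
        self._objects[oid] = (blocks, len(data))

    def read(self, oid):
        blocks, length = self._objects[oid]
        raw = b"".join(self._dev.read_block(n) for n in blocks)
        return raw[:length]

    def delete(self, oid):
        blocks, _ = self._objects.pop(oid)
        self._free.extend(blocks)
```

With the block interface the caller has to pick block numbers and pad data itself; with the object interface it just calls <code>create</code>, <code>write</code>, <code>read</code> and <code>delete</code>, and never sees a block number.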

Here are some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on Google Scholar you can access the pdf without even being inside Carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, Thursday or Friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like: object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-14T08:37:11Z

Npradhan: /* Essay Format and Assigned Tasks */

== Wikipedia Sources ==
I think we may want to replace the references to Wikipedia with something more authoritative. [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open this massive pdf] from IBM supports the idea that Fibre Channel is the dominant infrastructure of SANs, but I'm not sure if it mentions how that is changing.

The Wikipedia page for LUN masking has [http://www.sansecurity.com/san-security-faq.shtml this] as its reference for the definitions; there's also [http://technet.microsoft.com/en-us/library/cc758640(WS.10).aspx this] Microsoft article and [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf this] paper from Hitachi. I'm not sure which of these is most relevant since I just did a quick Google search and haven't really read up on LUN masking or zoning, so someone else would probably be better suited to decide which one if any to use.

How does that sound to everyone?

--[[User:Mbingham|Mbingham]] 02:55, 14 October 2010 (UTC)

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the author's name and the date (I think) in parentheses at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that that information might be better worked into the "Changing Storage Needs" section, what do you guys think?

COMP 3000 Essay 1 2010 Question 11

2010-10-14T07:44:31Z

Npradhan: /* Overview of Object-Based Storage */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially and businesses increasingly choose to archive and retain all the data they produce. The storage industry has been able to keep up with demand through matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has remained largely unchanged since the 1950s. The dominant storage mechanism is still block-based storage technology. This has been sufficient for meeting most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little meta-data is the norm, we have to look for technology that can provide better scalability, business intelligence, and management while ensuring the security and data access speed of traditional storage solutions.

Object-based storage devices (OSDs) are designed to address these issues. Object storage uses objects that consist of data and the meta-data that describes it. Objects are accessed with defined methods such as read and write, and each carries a unique ID. The devices themselves manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, ensured data integrity through a unique hash key for each object, and benefits in management and business intelligence from rich meta-data, OSD can be seen as a viable alternative that improves on the standard architectures of storage area networks (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write blocks on the disk. This means that the job of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into which blocks on the disk to write to. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
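
As a rough illustration of the translation described above, the following sketch maps a write at a byte offset in a file onto the physical blocks it touches. The 512-byte block size, the function name <code>blocks_for_write</code>, and the example block table are all assumptions made for illustration; real filesystems use more elaborate structures.

```python
BLOCK_SIZE = 512  # bytes per block; an assumed value for illustration


def blocks_for_write(file_blocks, offset, length, block_size=BLOCK_SIZE):
    """Return the physical block numbers touched by a write to a file.

    file_blocks is the filesystem's own table mapping the file's logical
    block index (0, 1, 2, ...) to a physical block number on the disk.
    """
    if length <= 0:
        return []
    first = offset // block_size                 # logical block holding the first byte
    last = (offset + length - 1) // block_size   # logical block holding the last byte
    return [file_blocks[i] for i in range(first, last + 1)]


# A hypothetical file whose four logical blocks are scattered across the disk.
file_blocks = [120, 121, 907, 55]

# A 100-byte write starting at byte 500 straddles logical blocks 0 and 1.
touched = blocks_for_write(file_blocks, offset=500, length=100)
print(touched)
```

The point of the sketch is that the block table lives in the filesystem, not on the device: the disk only ever sees the block numbers.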

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[http://developers.sun.com/solaris/articles/osd.html](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical blocks of storage. These objects also have meta-data and access controls directly associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object-based storage with block-based storage that the design of objects is better suited to the needs of today's filesystems than that of blocks.

[[Image:Osd_figure3.jpg|thumb|center|650x405px|alt=Diagram of the components of an object store|Diagram illustrating the components of an object store.[http://dsc.sun.com/solaris/articles/osd.html]]]
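
To make the object interface concrete, here is a minimal in-memory sketch of the four operations named above (create, read, write, destroy) with meta-data attached to each object. The class <code>ToyObjectStore</code> and its method names are hypothetical and are not taken from any OSD standard; a real device would also map each object onto physical blocks, which this sketch leaves out.

```python
import itertools


class ToyObjectStore:
    """In-memory stand-in for an object storage device (illustrative only)."""

    def __init__(self):
        self._objects = {}                  # object ID -> (payload, meta-data)
        self._next_id = itertools.count(1)  # the device hands out unique IDs

    def create(self, **metadata):
        object_id = next(self._next_id)
        self._objects[object_id] = (bytearray(), dict(metadata))
        return object_id

    def write(self, object_id, offset, data):
        payload, _ = self._objects[object_id]
        end = offset + len(data)
        # Objects are variable sized: grow the payload if the write runs past the end.
        if end > len(payload):
            payload.extend(b"\0" * (end - len(payload)))
        payload[offset:end] = data

    def read(self, object_id, offset, length):
        payload, _ = self._objects[object_id]
        return bytes(payload[offset:offset + length])

    def get_metadata(self, object_id):
        return dict(self._objects[object_id][1])

    def destroy(self, object_id):
        del self._objects[object_id]


store = ToyObjectStore()
oid = store.create(owner="alice", content_type="text/plain")
store.write(oid, 0, b"hello, object")
greeting = store.read(oid, 0, 5)
print(greeting, store.get_metadata(oid))
```

Note that the caller never mentions a block number: it names an object and a byte range, and the store decides where the bytes live.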

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. This means that the functionality of storage devices must also change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data, and massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it becomes more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues because of how it has been designed.

One application where the utility of object stores has become increasingly apparent is in SAN (Storage Area Network) systems. SAN file systems are distributed; however, they provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for the SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern[http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf]. Object stores can make user privilege management a much more manageable task, since each object can 'know' who is allowed to access it.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. Both have their benefits and drawbacks. The key issues are managing metadata and maintaining data access speed as the systems grow.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or the remapping of blocks for archiving or duplication. Building systems that scale with this metadata becomes a major issue, and at the same time the current speeds of block-based storage need to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefit of NAS lies in its ability to set block access, manage security, prevent unauthorized access to files, and use metadata to map blocks into files for the client. However, having all the data pass through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed at different layers of the storage system, and applications add their own policy and management metadata, so metadata ends up spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow clients to access the storage directly over fiber cables. The storage management and file system server is connected separately to both the client and the storage, which separates the data channel from the management channel, and it acts as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefit of shared access for scalability, coordination of this shared access leads to scalability problems of its own. File systems must coordinate the allocation of blocks. For clients to share read-write access, they must coordinate their usage of data blocks through metadata. Security must also be addressed, because direct access means the clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and better scalability of metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hurts scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. It may even be the case that the block with the error does not contain any data. Such errors usually surface during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only has to manage objects. The OSD does this by managing metadata and by maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way OSDs know which space is used and which is unused, and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover the data in a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from its contents. Thus the object can be verified both for accuracy, to ensure the contents remain the same, and for integrity, to ensure the data has not been corrupted. The hash key can also be used in data management to flag duplicate data.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]
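
The two uses of a per-object hash key mentioned above, verification and duplicate detection, can be sketched as follows. SHA-256 is an assumption here, since the source does not say which hash function an OSD uses, and the helper names <code>content_hash</code>, <code>verify</code>, and <code>find_duplicates</code> are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash key derived purely from the object's contents (SHA-256 is assumed)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hash: str) -> bool:
    """True if the data still matches the hash recorded when it was stored."""
    return content_hash(data) == expected_hash


def find_duplicates(objects):
    """Group object IDs by content hash; return the groups with more than one member."""
    by_hash = {}
    for object_id, data in objects.items():
        by_hash.setdefault(content_hash(data), []).append(object_id)
    return [sorted(ids) for ids in by_hash.values() if len(ids) > 1]


# Hypothetical stored objects: "a" and "c" hold identical contents.
objects = {
    "a": b"quarterly report",
    "b": b"holiday photo",
    "c": b"quarterly report",
}
stored_hash = content_hash(objects["a"])

print(verify(b"quarterly report", stored_hash))   # unchanged contents
print(verify(b"quarterly rep0rt", stored_hash))   # a single corrupted byte
print(find_duplicates(objects))
```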

=== Security ===

Security threats can be thought of as falling into four quadrants formed by two axes: external versus internal, and accidental versus malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SAN has traditionally run on Fibre Channel, although this is a trend that is changing.[http://en.wikipedia.org/wiki/Storage_area_network]
For the sake of security, running a SAN on Fibre Channel helps isolate its network, because the traffic does not travel over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and LUN masking are typical security measures in SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the type of device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical units identified by logical unit numbers (LUNs), hence the term LUN masking.[http://en.wikipedia.org/wiki/Fibre_Channel_zoning]

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations, much like other traditional OSs, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSDs handle security requests directly. The set of protocols used by OSD enables it to cover the four quadrants of security threats outlined above. Clients can access an OSD by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.[http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, whether external or internal.
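
One way to picture a capability is as the (OSD name, partition ID, object ID) tuple plus a list of permitted operations, signed with a key that only the issuer and the device know. The sketch below uses an HMAC for the signature. This construction, the shared secret, the rights string, and the function names <code>issue_capability</code> and <code>check_capability</code> are all assumptions for illustration and should not be read as the actual OSD security protocol.

```python
import hashlib
import hmac


def _message(osd_name, partition_id, object_id, rights):
    # Canonical byte string covering every field the capability protects.
    return f"{osd_name}|{partition_id}|{object_id}|{rights}".encode()


def issue_capability(secret, osd_name, partition_id, object_id, rights):
    """Issuer side: sign the tuple naming the object plus the rights granted."""
    tag = hmac.new(secret, _message(osd_name, partition_id, object_id, rights),
                   hashlib.sha256).hexdigest()
    return {"osd": osd_name, "partition": partition_id,
            "object": object_id, "rights": rights, "tag": tag}


def check_capability(secret, cap, requested_right):
    """Device side: recompute the tag, then check the requested operation."""
    expected = hmac.new(secret, _message(cap["osd"], cap["partition"],
                                         cap["object"], cap["rights"]),
                        hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected, cap["tag"]) and requested_right in cap["rights"]


secret = b"key-shared-by-issuer-and-device"   # hypothetical shared secret
cap = issue_capability(secret, "osd0", 1, 42, "r")   # read-only capability
forged = dict(cap, rights="rw")                      # client tampers with the rights

print(check_capability(secret, cap, "r"))      # valid read
print(check_capability(secret, cap, "w"))      # write was never granted
print(check_capability(secret, forged, "w"))   # tag no longer matches
```

Because the device can check the tag on its own, it does not need to trust the client or the network that carried the request.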

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. But there remain challenges to its adoption in the industry. One is that it is only needed in high-end business solutions at the moment, which prevents it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] But as newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes do need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data through the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancy will make object-based storage an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[8] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[9] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[10] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]

[11] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]

[12] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[13] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

COMP 3000 Essay 1 2010 Question 11

2010-10-14T07:44:03Z

Npradhan: /* Changing Storage Needs */ Added section on SAN since it is representative of "changing storage needs"

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially and businesses increasingly choose to archive and retain all the data they produce. The storage industry has kept up with demand through matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has remained largely unchanged since the 1950s. The dominant storage mechanism is still block-based storage. This has been sufficient for most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little meta-data is the norm, we have to look for technology that provides better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions.

Object-based storage devices (OSDs) are designed to address these issues. Object storage uses objects that consist of data and the meta-data that describes it. Objects are accessed with defined methods such as read and write, and each carries a unique ID. The devices themselves manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, ensured data integrity through a unique hash key for each object, and benefits in management and business intelligence from rich meta-data, OSD can be seen as a viable alternative that improves on the standard architectures of storage area networks (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write blocks on the disk. This means that the job of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into which blocks on the disk to write to. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[http://developers.sun.com/solaris/articles/osd.html](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical blocks of storage. These objects also have meta-data and access controls directly associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object-based storage with block-based storage that the design of objects is better suited to the needs of today's filesystems than that of blocks.

[[Image:Osd_figure3.jpg|thumb|center|650x405px|alt=Diagram of the components of an object store|Diagram illustrating the components of an object store.[http://dsc.sun.com/solaris/articles/osd.html]]]

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. This means that the functionality of storage devices must also change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data, and massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it becomes more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is better suited to address these issues because of how it has been designed.

One application where the utility of object stores has become increasingly apparent is in SAN (Storage Area Network) systems. SAN file systems are distributed; however, they provide a single system image of the file system. This means that a local user need not be concerned with where the data is physically stored, since a level of abstraction separates the user from the physical location of the data. In the past, SANs were implemented on private Fibre Channel networks, which were designed to emulate local storage media. As long as the network remained exclusive, it could be assumed that all the clients could be trusted, so security was not a primary concern. This lack of security concern is one of the main reasons that block storage was a viable option for the SANs of the past. Modern SANs can serve a much larger set of users, not all of whom can be trusted. This, in addition to the possible adoption of IP-based SAN solutions, makes data security a primary concern[http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf]. Object stores can make user privilege management a much more manageable task, since each object can 'know' who is allowed to access it.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. Both have their benefits and drawbacks. The key issues are managing metadata and maintaining data access speed as the systems grow.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or the remapping of blocks for archiving or duplication. Building systems that scale with this metadata becomes a major issue, and at the same time the current speeds of block-based storage need to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefit of NAS lies in its ability to set block access, manage security, prevent unauthorized access to files, and use metadata to map blocks into files for the client. However, having all the data pass through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation is managed at different layers of the storage system, and applications add their own policy and management metadata, so metadata ends up spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow clients to access the storage directly over fiber cables. The storage management and file system server is connected separately to both the client and the storage, which separates the data channel from the management channel, and it acts as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefit of shared access for scalability, coordination of this shared access leads to scalability problems of its own. File systems must coordinate the allocation of blocks. For clients to share read-write access, they must coordinate their usage of data blocks through metadata. Security must also be addressed, because direct access means the clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and better scalability of metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and hurts scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. It may even be the case that the block with the error does not contain any data. Such errors usually surface during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only has to manage objects. The OSD does this by managing metadata and by maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way OSDs know which space is used and which is unused, and can scan for and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover the data in a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from its contents. Thus the object can be verified both for accuracy, to ensure the contents remain the same, and for integrity, to ensure the data has not been corrupted. The hash key can also be used in data management to flag duplicate data.[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]

=== Security ===

Security threats can be thought of as falling into four quadrants formed by two axes: external versus internal, and accidental versus malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SAN has traditionally run on Fibre Channel, although this is a trend that is changing.[http://en.wikipedia.org/wiki/Storage_area_network]
For the sake of security, running a SAN on Fibre Channel helps isolate its network, because the traffic does not travel over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and LUN masking are typical security measures in SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning, but the two differ in the type of device that enforces them: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical units identified by logical unit numbers (LUNs), hence the term LUN masking.[http://en.wikipedia.org/wiki/Fibre_Channel_zoning]

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations, much like other traditional OSs, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSDs handle security requests directly. The set of protocols used by OSD enables it to cover the four quadrants of security threats outlined above. Clients can access an OSD by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object.[http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, whether external or internal.

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. But challenges to its adoption in industry remain. One is that it is currently needed only in high-end business solutions, which keeps it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] But as newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes need to occur as storage grows and finer levels of data management are needed. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data using the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancy will make it an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[8] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[9] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[10] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]

[11] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]

[12] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[13] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-14T06:20:20Z

Npradhan: /* Essay Format and Assigned Tasks */

== Wikipedia Sources ==
I think we may want to replace the references to wikipedia with something more authoritative. [http://www.redbooks.ibm.com/abstracts/sg245470.html?Open this massive pdf] from IBM supports the idea that fiber channels are the dominant infrastructure of SANs, but i'm not sure if it mentions how that is changing.

The wikipedia page for LUN masking has [http://www.sansecurity.com/san-security-faq.shtml this] as its reference for the definitions, there's also [http://technet.microsoft.com/en-us/library/cc758640(WS.10).aspx this] microsoft article and [http://www.it.hds.com/pdf/wp91_san_lun_secur.pdf this] paper from Hitachi. I'm not sure which of these is most relevant since I just did a quick google search and haven't really read up on LUN masking or zoning, so someone else would probably be better suited to decide which one if any to use.

How does that sound to everyone?

--[[User:Mbingham|Mbingham]] 02:55, 14 October 2010 (UTC)

== Some Sourcing Issues and Other Stuff ==
Just a reminder, if we're taking direct quotes from a source they need to be in quotation marks and attributed with the authors name and the date (I think) in parenthesis at the end, not just a link or footnote reference. There was an issue with this in the first couple sentences of the scalability section. I've put it in quotes (though I didn't see any authors listed so I just put the company), but I think that that information might be better worked into the "Changing Storage Needs" section, what do you guys think?

Also, I think probably sometime today we should divide the rest of the sections up and try to get most of the content in so we have tomorrow for editing and combining the information so that it flows well. Again, any thoughts?

--[[User:Mbingham|Mbingham]] 19:32, 12 October 2010 (UTC)

: Sorry about the citation issue, you're right. I used the quote to emphasize the fact that scalability issues are evident in disk block systems. But now that I read it, it doesn't really transition well into the second paragraph. I don't mind if you move the quote to another section. Other than that, I could just finish up the section about Security. I don't really know who else is actively contributing to this essay though...or at least don't see anyone volunteering to take a topic other than Mbingham, Smcilroy and myself...
:--[[User:Myagi|Myagi]] 15:47, 12 October 2010 (UTC)

:No problem, it's just something to watch out for. I'll integrate it with the other section.
:Dagar has been making edits to the essay as well, he's cleaned up the language in some of the sections and organized the references. Maybe he would like to tackle one of the object specific sections?
:--[[User:Mbingham|Mbingham]] 20:02, 12 October 2010 (UTC)

::I apologize for the delay, this has been an easy thing to neglect during a busy week. What's the proper way to reference with this wiki? --[[User:Dagar|Dagar]] 21:29, 13 October 2010 (UTC)

:::check out this reference guide, it explains how to reference any material you find online. [http://libweb.anglia.ac.uk/referencing/harvard.htm Harvard System of Reference] --[[User:Smcilroy|Smcilroy]] 22:46, 13 October 2010 (UTC)

I'm going to finish up the Security section if nobody tags it by the end of today. I have a draft written up. The fact that more people aren't tagging the document outline and volunteering responsibilities is kind of unnerving...

--[[User:Myagi|Myagi]] 07:57, 13 October 2010 (UTC)

I'm going to expand the scalability and integrity sections. Then once the security section is done, I think that just leaves the section on the OSD standard and future plans for the tech. Then in the conclusion we can recap.
--[[User:Smcilroy|Smcilroy]] 22:54, 13 October 2010 (UTC)

:Sounds like a plan. I'll clean up/expand what I have written and get started with some initial stuff for the object sections. Anyone else is welcome to expand and edit as well.
:--[[User:Mbingham|Mbingham]] 00:44, 14 October 2010 (UTC)

== Essay Format and Assigned Tasks ==
So I added an intro and I did it like it was an essay and not a wiki article. Feel free to edit, expand and replace it as you see fit.
Also I think we should just list the topics we want to talk about and then people can put their name beside it and work on it, that way we don't have two people working on the same thing. Then we can edit it all so it fits together in the end. What do you think?
--[[User:Smcilroy|Smcilroy]] 15:16, 10 October 2010 (UTC)

:Sounds like a good idea. Here's a relatively quick list of topics to talk about, based on our discussions and the outline below. Add in any sections anyone thinks are missing and put your name beside areas you want:

:*Overview and history of block-based storage -Mbingham (I added a useful diagram here -Npradhan)
:*Block based storage standards - SCSI, SATA, ATA/IDE etc -Mbingham
:*Networked storage architectures: SAN and NAS -Smcilroy

:*How storage needs have changed since the development of block-based storage
:(maybe focus on the Internet, massive corporate/government networks, large personal storage, etc)

:*Overview and History of object-based storage
:*Object-based storage standards (ANSI OSD specification)
:*Object-based storage applied to networked storage -dagar

:Comparison of object and block based stores focusing on:
::*Scalability -Myagi
::*Integrity -Myagi
::*Security -Myagi

:*Conclusion -Smcilroy

:Also, it would probably be useful for people to read over each other's work and make suggestions, etc. I would also be cool with other people adding stuff to my sections if they have additional info or if there's something I've overlooked. There's 11 or 12 sections there, and I think there's six of us, so we can start off taking maybe 2 sections each, and then if we don't have all the sections covered we can divide them up later. How does that sound?
:--[[User:Mbingham|Mbingham]] 16:45, 10 October 2010 (UTC)

:Good plan, I took Scalability and Integrity comparisons of object and block stores.
:--[[User:Myagi|Myagi]] 13:26, 10 October 2010 (UTC)

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High latency protocols that clog LANs, using TCP/IP
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
- work has continued in ratifying OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know what space is unused in this way
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* Computer system can access OSD device by providing cryptographically secure credentials(capability) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope i'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, a fixed length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read to or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on google scholar you can access the pdf without even being inside carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems has changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks

COMP 3000 Essay 1 2010 Question 11

2010-10-14T05:53:38Z

Npradhan: /* Overview of Object-Based Storage */

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we face growing storage needs as the world's information increases exponentially and businesses increasingly choose to archive and retain all the data they produce. The storage industry has kept up with demand through matching increases in storage capacity. Unfortunately, the interface between clients and storage devices has remained largely unchanged since the 1950s. The dominant storage mechanism is still block-based storage technology. This has been sufficient for meeting most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little meta-data is the norm, we have to look for technology that can provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions.

Object Based Storage Devices (OSD) solve these issues because of how they are designed. Object storage uses objects that consist of data and meta-data describing the object. Objects are accessed with defined methods such as read and write, and each carries a unique ID. The devices manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, ensured data integrity through a unique hash key for each object, and benefits in management and business intelligence from rich meta-data, OSD can be seen as a viable alternative that improves on the standard architectures of storage area networks (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write blocks on the disk. This means that the work of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into which block on the disk to write. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
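
To make that translation concrete, here is a small sketch of the arithmetic a filesystem has to do on top of a block device. The 512-byte block size and the per-file list of block numbers are assumptions chosen for illustration; real filesystems use more elaborate structures.

```python
BLOCK_SIZE = 512  # assumed block size in bytes


def locate(file_blocks: list[int], offset: int) -> tuple[int, int]:
    """Map a byte offset within a file to (disk block number, offset inside that block).

    file_blocks lists the disk blocks that hold the file, in file order.
    """
    index, within = divmod(offset, BLOCK_SIZE)
    return file_blocks[index], within


# A hypothetical file stored in three non-contiguous disk blocks.
blocks_of_file = [907, 13, 4410]
print(locate(blocks_of_file, 700))  # byte 700 falls in the file's second block
```

The block device knows nothing about `blocks_of_file`; that bookkeeping lives entirely in the filesystem, which is the wide scope described above.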

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[http://developers.sun.com/solaris/articles/osd.html](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read, write, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical blocks of storage. These objects also have meta-data and access controls directly associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object-based storage with block-based storage that the design of objects is better suited to the needs of today's filesystems than that of blocks.
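
The object interface described above can be sketched as a toy in-memory model. The class and method names are invented for illustration and are not taken from the OSD command set; the point is that the device, not the filesystem, decides which blocks hold the data, and that it keeps meta-data next to each object.

```python
import itertools


class ToyObjectStore:
    """Toy object-based device: callers see objects, the device manages blocks."""

    BLOCK_SIZE = 4  # deliberately tiny so the block mapping is easy to inspect

    def __init__(self):
        self._blocks = {}    # block number -> bytes, private to the device
        self._objects = {}   # object id -> {"blocks": [...], "size": n, "meta": {...}}
        self._next_block = itertools.count()
        self._next_id = itertools.count(1)

    def create(self, **meta) -> int:
        """Create an empty object with the given attributes and return its ID."""
        oid = next(self._next_id)
        self._objects[oid] = {"blocks": [], "size": 0, "meta": dict(meta)}
        return oid

    def write(self, oid: int, data: bytes) -> None:
        """Replace the object's contents; the device picks the blocks itself."""
        obj = self._objects[oid]
        for block in obj["blocks"]:
            del self._blocks[block]
        obj["blocks"] = []
        for start in range(0, len(data), self.BLOCK_SIZE):
            block = next(self._next_block)
            self._blocks[block] = data[start:start + self.BLOCK_SIZE]
            obj["blocks"].append(block)
        obj["size"] = len(data)

    def read(self, oid: int) -> bytes:
        """Return the object's contents, reassembled from its blocks."""
        return b"".join(self._blocks[b] for b in self._objects[oid]["blocks"])

    def get_attributes(self, oid: int) -> dict:
        """Return the meta-data stored with the object, including its size."""
        obj = self._objects[oid]
        return {**obj["meta"], "size": obj["size"]}

    def delete(self, oid: int) -> None:
        """Destroy the object and free the blocks it occupied."""
        for block in self._objects.pop(oid)["blocks"]:
            del self._blocks[block]
```

A caller only ever handles object IDs, for example `oid = store.create(owner="alice")` followed by `store.write(oid, b"hello world")`; the block numbers never cross the interface.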

[[Image:Osd_figure3.jpg|thumb|center|650x405px|alt=White diagonal cross over blue background|Diagram illustrating the components of an object store. Source: http://dsc.sun.com/solaris/articles/osd.html]]

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. This means that the functionality of storage devices must also change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data, and massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it becomes more important to ensure the data's integrity and security. Block based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object based storage is better suited to address these issues because of how it has been designed.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. Both have their benefits and drawbacks. The key issues are managing metadata and maintaining data access speed as the systems grow.

Most block based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or remapping of blocks for archiving or duplication. Building systems to scale with the metadata becomes a major issue. At the same time, the current speeds of block-based storage need to be maintained.

NAS provides a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head, which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefit of the NAS file system lies in its ability to set block access, manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, passing all the data through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts, and the metadata for space allocation at different storage system layers, as well as the policy and management metadata that applications add individually, is spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow data access through fiber cables directly accessing the storage. The storage management and file system is connected separately to both the client and the storage, separating the data channel from the management channel, and acts as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefit of shared access for scalability, coordinating this shared access leads to scalability problems of its own. File systems must coordinate allocation of blocks. For clients to share read-write access, they must coordinate usage of data blocks through metadata. Security must also be addressed, since the clients must be trusted to access the data, which opens up a host of security issues.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server and metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block based file systems in archive solutions usually have no built in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. It may even be the case that the affected block does not contain any data. This usually happens during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only has to manage objects. This is done by managing metadata as well as maintaining internal copies of that metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way OSDs know which space is used or unused and can scan and correct errors without losing data. In the event of a failure in recovering a file or a number of files, traditional systems may have to do a complete file system restore. However, an OSD's awareness of its object layout enables it to recover data specific to a byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object has an associated hash key that is generated from the contents of the file. The key can be used to verify accuracy, ensuring the contents remain the same, and integrity, ensuring the data has not been corrupted. It can also be used in data management to flag duplicate data. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]
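As a rough illustration of how a content-derived hash key can support both integrity checks and duplicate detection, here is a minimal sketch. It assumes SHA-256 as the hash function and uses a hypothetical in-memory `HashedStore` class; the sources cited here do not specify which hash a real OSD uses or how it stores the key.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a hex digest derived only from the object's contents."""
    return hashlib.sha256(data).hexdigest()


class HashedStore:
    """Hypothetical toy store that keeps a hash key next to each object."""

    def __init__(self):
        self._objects = {}    # object_id -> (data, stored digest)
        self._by_digest = {}  # digest -> first object_id seen with that content

    def put(self, object_id, data):
        """Store data; return the id of an existing duplicate, or None."""
        digest = content_hash(data)
        duplicate_of = self._by_digest.get(digest)
        self._objects[object_id] = (data, digest)
        self._by_digest.setdefault(digest, object_id)
        return duplicate_of

    def verify(self, object_id):
        """Recompute the hash and compare it with the stored key."""
        data, digest = self._objects[object_id]
        return content_hash(data) == digest

    def simulate_corruption(self, object_id, bad_data):
        """Overwrite the data without updating the key (for demonstration)."""
        _, digest = self._objects[object_id]
        self._objects[object_id] = (bad_data, digest)


store = HashedStore()
store.put("report-v1", b"quarterly figures")
print(store.put("report-copy", b"quarterly figures"))  # flags the duplicate
store.simulate_corruption("report-v1", b"quarterly figurez")
print(store.verify("report-v1"))  # the mismatch is detected
```

Because the key depends only on the contents, two objects with the same bytes map to the same key, which is what makes duplicate flagging a by-product of the integrity mechanism.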

=== Security ===

Security threats can be thought of as falling into four quadrants: external, internal, accidental and malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SAN has traditionally run on Fibre Channel, although this is a trend that is changing. [http://en.wikipedia.org/wiki/Storage_area_network]
For the sake of security, running a SAN on Fibre Channel helps isolate its network, as it does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and LUN masking are typical security measures in SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning; however, the two differ in the type of device that enforces them. Switches utilize zoning while disk array controllers use LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical unit numbers (LUNs), hence the term LUN masking. [http://en.wikipedia.org/wiki/Fibre_Channel_zoning]

NAS has its own vulnerabilities, but as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations, much like other traditional OSs, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSDs handle security requests directly. The set of protocols used by OSD enables it to cover the four quadrants of security threats outlined above. Clients can access an OSD by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object. [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, externally or internally.
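The idea of a capability can be sketched as follows. This is only an illustration under stated assumptions, not the wire format of the OSD security protocol: it assumes a secret key shared between a security manager and the OSD, uses HMAC-SHA256 as the integrity tag, and adds a hypothetical `rights` string ("r", "w" or "rw") alongside the (OSD name, partition ID, object ID) tuple.

```python
import hashlib
import hmac


def _tag(key, osd_name, partition_id, object_id, rights):
    """Compute an HMAC over the capability fields (assumed encoding)."""
    message = f"{osd_name}|{partition_id}|{object_id}|{rights}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def issue_capability(key, osd_name, partition_id, object_id, rights):
    """Security-manager side: hand the client a signed capability."""
    return {
        "osd_name": osd_name,
        "partition_id": partition_id,
        "object_id": object_id,
        "rights": rights,
        "tag": _tag(key, osd_name, partition_id, object_id, rights),
    }


def osd_allows(key, cap, osd_name, partition_id, object_id, operation):
    """OSD side: check the tag, then check the request against the capability."""
    expected = _tag(key, cap["osd_name"], cap["partition_id"],
                    cap["object_id"], cap["rights"])
    if not hmac.compare_digest(expected, cap["tag"]):
        return False  # forged or altered capability
    names_object = (cap["osd_name"], cap["partition_id"], cap["object_id"]) == (
        osd_name, partition_id, object_id)
    return names_object and operation in ("r", "w") and operation in cap["rights"]


shared_key = b"demo-key-shared-by-manager-and-osd"  # assumption for the sketch
cap = issue_capability(shared_key, "osd1", 7, 42, "r")
print(osd_allows(shared_key, cap, "osd1", 7, 42, "r"))  # read is permitted
print(osd_allows(shared_key, cap, "osd1", 7, 42, "w"))  # write is refused
```

Since the device itself validates every request, a client that alters the tuple or the rights cannot produce a matching tag without the key, which is how the check covers both accidental and malicious access.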

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. But there remain challenges to its adoption in the industry. One is that it is only needed in high-end business solutions at the moment, preventing it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] But as newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes do need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. With better tools for managing data using the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancies, it will be an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[8] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[9] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[10] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]

[11] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]

[12] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[13] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

File:Osd figure3.jpg

2010-10-14T05:28:38Z

Npradhan: Diagram of the components of an object store. Taken from: http://dsc.sun.com/solaris/articles/osd.html

Diagram of the components of an object store.

Taken from:
http://dsc.sun.com/solaris/articles/osd.html

COMP 3000 Essay 1 2010 Question 11

2010-10-14T04:37:16Z

Npradhan: /* Introduction */ better phrasing

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially and businesses are increasingly choosing to archive and retain all the data they produce. The storage industry has been able to keep up with demand with matching increases in storage capacity. Unfortunately, the interfaces between clients and storage devices have remained largely unchanged since the 1950s. The dominant storage mechanism is still block-based storage technology. This has been sufficient for meeting most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little meta-data is the norm, we have to look for technology that can provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions.

Object Based Storage Devices (OSD) are designed to address these issues. Object storage uses objects that consist of data and meta-data that describes the object. Objects are accessed with defined methods such as read and write and carry a unique ID. The devices manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, ensured integrity of data through a unique hash key for each object, and benefits in management and business intelligence from rich meta-data, OSD can be seen as a viable alternative that improves on the standard architectures of storage area network (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write blocks on the disk. This means that the job of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into the blocks on the disk to write to. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.
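The translation step described above can be sketched in a few lines. This is a simplified illustration, not how any particular filesystem is implemented: the 512-byte block size, the `BlockDevice` class and the per-file `block_map` list are all assumptions made for the example.

```python
BLOCK_SIZE = 512  # assumed block size for the sketch


class BlockDevice:
    """Toy disk: the only operations are reading and writing whole blocks."""

    def __init__(self, num_blocks):
        self._blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def read_block(self, number):
        return self._blocks[number]

    def write_block(self, number, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("a block device only accepts whole blocks")
        self._blocks[number] = bytes(data)


def write_file_bytes(device, block_map, offset, payload):
    """Filesystem side: turn a byte-range write into whole-block operations.

    block_map lists, in file order, the device block that holds each
    BLOCK_SIZE-sized piece of the file.
    """
    position = 0
    while position < len(payload):
        index, within = divmod(offset + position, BLOCK_SIZE)
        chunk = payload[position:position + BLOCK_SIZE - within]
        block_number = block_map[index]
        # Read-modify-write, because the device cannot update part of a block.
        block = bytearray(device.read_block(block_number))
        block[within:within + len(chunk)] = chunk
        device.write_block(block_number, bytes(block))
        position += len(chunk)


disk = BlockDevice(num_blocks=8)
file_blocks = [5, 2]  # the file's first piece is in block 5, its second in block 2
write_file_bytes(disk, file_blocks, 510, b"ABCD")  # the write straddles two blocks
print(disk.read_block(5)[510:], disk.read_block(2)[:2])
```

Everything above the `BlockDevice` class is work the filesystem must do itself, which is the wide scope the paragraph refers to.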

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[http://developers.sun.com/solaris/articles/osd.html](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical blocks of storage. These objects also have meta-data and access controls immediately associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object-based storage with block-based storage that the design of objects is more suited to the needs of today's filesystems than that of blocks.
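To make the contrast with a block interface concrete, the following sketch models the four operations named above (create, read, write, destroy) on variable-sized objects that carry their own attributes. The `ObjectStore` class, its method names and the integer object IDs are hypothetical and chosen only for illustration; a real OSD exposes these operations through its command set rather than a Python API.

```python
import itertools


class ObjectStore:
    """Toy object-based device: callers never see block numbers."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._objects = {}  # object_id -> {"data": bytearray, "attrs": dict}

    def create(self, **attributes):
        """Create an empty object with meta-data attached; return its ID."""
        object_id = next(self._next_id)
        self._objects[object_id] = {"data": bytearray(), "attrs": dict(attributes)}
        return object_id

    def write(self, object_id, offset, payload):
        """Write bytes at an offset; the object grows as needed."""
        data = self._objects[object_id]["data"]
        if offset > len(data):
            data.extend(bytes(offset - len(data)))  # zero-fill any gap
        data[offset:offset + len(payload)] = payload

    def read(self, object_id, offset, length):
        return bytes(self._objects[object_id]["data"][offset:offset + length])

    def get_attributes(self, object_id):
        return dict(self._objects[object_id]["attrs"])

    def destroy(self, object_id):
        del self._objects[object_id]


store = ObjectStore()
oid = store.create(owner="alice", content_type="text/plain")
store.write(oid, 0, b"hello")
store.write(oid, 5, b" world")
print(store.read(oid, 0, 11), store.get_attributes(oid))
store.destroy(oid)
```

Placement of the bytes is hidden inside the store, so a filesystem built on this interface deals only in object IDs, offsets and attributes.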

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. This means that the functionality of storage devices must also change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data; massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it becomes more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is more suited to address these issues because of how it has been designed.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. They both have their benefits and drawbacks. The key issues are managing metadata and ensuring data access speed as the systems grow.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or remapping of blocks for archiving or duplication. Building systems to scale with the metadata becomes a major issue. But at the same time the current speeds of block-based storage need to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefits of the NAS file system come from its ability to set block access, manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, having all the data pass through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation management on different storage system layers, and applications that add policy and management metadata individually, are spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow clients to access the storage directly over fibre cables. The storage management and file system are connected separately to both the client and the storage, separating the data channel from the management channel, and act as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefits of shared access for scalability, coordination of this shared access leads to scalability problems. File systems must coordinate allocation of blocks. For clients to share read-write access, they must coordinate usage of data blocks through metadata. Security must also be addressed, since direct access means the clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds to the complexity of using file systems for archiving and scalability. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. The block containing the error may not even hold any data. Such errors usually occur during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only worries about managing objects. An OSD does this by managing metadata as well as maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way OSDs know which space is used or unused and can scan and correct errors without losing data. When recovery of a file or a number of files fails, traditional systems may have to do a complete file system restore. However, an OSD's awareness of its object layout enables it to recover data specific to a byte range and thus restore files in an efficient manner.

OSDs have another powerful feature. Each object has an associated hash key that is generated from the contents of the file. The key can be used to verify accuracy, ensuring the contents remain the same, and integrity, ensuring the data has not been corrupted. It can also be used in data management to flag duplicate data. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]

=== Security ===

Security threats can be thought of as falling into four quadrants: external, internal, accidental and malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SAN has traditionally run on Fibre Channel, although this is a trend that is changing. [http://en.wikipedia.org/wiki/Storage_area_network]
For the sake of security, running a SAN on Fibre Channel helps isolate its network, as it does not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and host system to handle security.

Zoning and LUN masking are typical security measures in SAN systems. Zoning allocates a certain amount of storage to clients. These zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning; however, the two differ in the type of device that enforces them. Switches utilize zoning while disk array controllers use LUN masking. A disk array controller is a device which manages the physical disk drives and presents them as logical unit numbers (LUNs), hence the term LUN masking. [http://en.wikipedia.org/wiki/Fibre_Channel_zoning]
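LUN masking can be pictured as a lookup table kept by the array controller that lists which logical units each host may see. The sketch below is a simplification under stated assumptions: the `ArrayController` class and the host identifiers are hypothetical, and real controllers typically identify hosts by their Fibre Channel port names rather than by plain strings.

```python
class ArrayController:
    """Toy disk array controller that hides LUNs from hosts not in the mask."""

    def __init__(self, masks):
        # masks: host identifier -> iterable of LUNs that host may access
        self._masks = {host: frozenset(luns) for host, luns in masks.items()}

    def visible_luns(self, host_id):
        """Return the LUNs presented to this host; unknown hosts see none."""
        return sorted(self._masks.get(host_id, frozenset()))

    def can_access(self, host_id, lun):
        return lun in self._masks.get(host_id, frozenset())


controller = ArrayController({
    "db-server": [0, 1],
    "backup-server": [1, 2],
})
print(controller.visible_luns("db-server"))
print(controller.can_access("backup-server", 0))
print(controller.visible_luns("unknown-host"))
```

Note that the check is made per host and per logical unit, not per file or per object, so anything finer-grained is still left to the host systems.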

NAS has its own vulnerabilities, but as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system a NAS runs on has access control configurations, much like other traditional OSs, that can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSDs handle security requests directly. The set of protocols used by OSD enables it to cover the four quadrants of security threats outlined above. Clients can access an OSD by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) to identify the object. [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, externally or internally.

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. But there remain challenges to its adoption in the industry. One is that it is only needed in high-end business solutions at the moment, preventing it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] But as newer features are added and the standards mature, we will see increased adoption.

It is clear, however, that changes do need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. With better tools for managing data using the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancies, it will be an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[8] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[9] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[10] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]

[11] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]

[12] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[13] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

COMP 3000 Essay 1 2010 Question 11

2010-10-14T04:35:10Z

Npradhan: /* Introduction */ Typo, should be "ensured" not "insured"

=Question=

Why are object stores an increasingly attractive building block for filesystems (as opposed to block-based stores)? Explain.

=Answer=

== Introduction ==

Each year we are faced with growing storage needs as the world's information increases exponentially and businesses are increasingly choosing to archive and retain all the data they produce. The storage industry has been able to keep up with demand with matching increases in storage capacity. Unfortunately, the interfaces between clients and storage devices have remained largely unchanged since the 1950s. The dominant storage mechanism is still block-based storage technology. This has been sufficient for meeting most needs of modern businesses, but as we enter an age where "store everything, forever"[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] is the common mantra of storage administrators and unstructured data with little meta-data is the norm, we have to look for technology that can provide better scalability, business intelligence, and management while preserving the security and data access speed of traditional storage solutions.

Object Based Storage Devices (OSD) are designed to address these issues. Object storage uses objects that consist of data and meta-data that describes the object. Objects are accessed with defined methods such as read and write and carry a unique ID. The devices manage all necessary low-level storage, space management, and security functions.[http://developers.sun.com/solaris/articles/osd.html] This storage technology has the potential to address some of the problems with block-based storage.

With increased scalability, better security through per-object access control, ensured integrity of data through a unique hash key for each object, and benefits in management and business intelligence from rich meta-data, OSD can be seen as a viable alternative that improves on the standard architectures of storage area network (SAN) and network-attached storage (NAS).

== Overview of Block-Based Storage ==

Hard disks as a storage medium date back to the 1950s with the introduction of the IBM 350 disk storage unit.[http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html] Hard disks store data in blocks, which are fixed-length series of bytes. Since early devices like the IBM 350, the interface that the operating system uses to communicate with the hard disk has remained mostly the same.[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722] This interface simply allows the operating system to read or write blocks on the disk. This means that the job of abstracting stored data into related groups or into human-understandable constructs such as objects or files is left entirely to the operating system's filesystem. For example, when the filesystem wants to write data to a file, it must translate that request into the blocks on the disk to write to. In this way, the scope of a filesystem extends from high-level constructs like files to low-level constructs like blocks. This wide scope is necessary because the simple interface presented to the filesystem must be abstracted up to the complex expectations of a user.

Multiple standards exist to implement this interface. The small computer system interface (SCSI) standards, which have been around in one form or another since the late 1970s, are popular with industry. Parallel ATA, another standard which was designed in the 1980s, continues today in the form of Serial ATA (SATA). However, even though these standards have been around for a long time, "the logical interface, or the command set, has seen only minor additions"[http://developers.sun.com/solaris/articles/osd.html](Bandulet). This means that the functionality that the command set allows has also remained mostly the same, since the functionality must be built on top of these commands.

== Overview of Object-Based Storage ==
'''Anyone feel free to expand on this section'''

Unlike block-based storage, whose design reaches back to the 1950s, object-based storage research goes back to the 1990s. See for example the work of Gibson et al. in "A Cost-Effective, High-Bandwidth Storage Architecture", Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, 1998. The fundamental idea of an object-based storage device is to have the storage device itself handle a layer of abstraction on top of the block. Instead of presenting the filesystem with blocks to read and write, the interface presents the filesystem with "objects" which it can read from, write to, create, or destroy. Objects can be variable sized, and the device itself handles mapping them onto physical blocks of storage. These objects also have meta-data and access controls immediately associated with them. This allows the filesystem to work at a higher level of abstraction. This is important because the needs placed on filesystems have changed, and we will see as we compare object-based storage with block-based storage that the design of objects is more suited to the needs of today's filesystems than that of blocks.

== Changing Storage Needs ==
'''Note: Just getting the ball rolling on this section. Anyone else is welcome to pick it up and expand'''

Storage needs have changed a lot since the 1950s, when the first hard disks were developed, and the 1970s, when the interface became standardized. This means that the functionality of storage devices must also change to reflect these needs. Firstly, the scale of data being stored, both personally and by organizations, has gone up by orders of magnitude. Today personal hard drives routinely store terabytes of data; massive networks store even more. In fact, "a survey of over one thousand ASNP members indicates that 20% of them manage over 100 terabytes of data" (Seagate Research, 2005).[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf] Data has also become more sensitive. Personal information, such as credit card numbers and financial information, is stored in large databases. Sensitive corporate and governmental information is stored similarly. Since the value of data has gone up, it becomes more important to ensure the data's integrity and security. Block-based storage, as we will see, has difficulty dealing with these priorities because of limitations inherent in its design. Object-based storage is more suited to address these issues because of how it has been designed.

== Comparison of object and block based stores ==
=== Scalability ===
Today's storage systems consist of two main technologies, SAN and NAS storage. They both have their benefits and drawbacks. The key issues are managing metadata and ensuring data access speed as the systems grow.

Most block-based storage systems contain many layers of metadata. There are also various types of virtualized systems that contain metadata to deal with device diversity or remapping of blocks for archiving or duplication. Building systems to scale with the metadata becomes a major issue. But at the same time the current speeds of block-based storage need to be maintained.

NAS is a file system that coordinates the interface between file blocks and the clients' access to files. This is done through a single NAS head which usually has thousands of gigabytes of storage behind it.[http://articles.techrepublic.com.com/5100-22_11-5841266.html] All data traffic must flow through this single access point. The benefits of the NAS file system come from its ability to set block access, manage security, prevent unauthorized access to files and use metadata to map blocks into files for the client. However, having all the data pass through one point creates a bottleneck. Another issue is managing the metadata. Metadata is shared among separate metadata servers remote from the hosts. Space allocation management on different storage system layers, and applications that add policy and management metadata individually, are spread throughout the system. As a result, the metadata becomes very hard to manage.

SANs, on the other hand, allow clients to access the storage directly over fibre cables. The storage management and file system are connected separately to both the client and the storage, separating the data channel from the management channel, and act as the mediator between the client and the storage blocks. This eliminates the bottleneck. Although SAN filesystems have the benefits of shared access for scalability, coordination of this shared access leads to scalability problems. File systems must coordinate allocation of blocks. For clients to share read-write access, they must coordinate usage of data blocks through metadata. Security must also be addressed, since direct access means the clients must be trusted to access the data.

Object storage provides the ability to operate a SAN setup with direct access to data while offering better security and scalability with metadata. Each object comes with a set of access rules given to it by the management server. Metadata is associated and stored directly with each data object and is automatically carried between layers and across devices. Space allocation and management metadata are the responsibility of the storage device. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf] This allows metadata layers to be folded, reducing server overhead and processing, and allows for larger clusters of storage compared with traditional block-based interfaces.

=== Integrity ===
Block-based file systems in archive solutions usually have no built-in mechanisms for assuring data integrity. A common best practice is to conduct frequent backups, which adds complexity and limits scalability when file systems are used for archiving. OSDs ensure data integrity through mechanisms that operate differently from those of block store systems.

One of the major problems with storage at the block level is that if there is an error in a block, it is almost impossible to determine what part of the file system is affected. It may even be the case that the block with the error does not contain any data. Such errors usually occur during a backup procedure or when a controller is organizing data.

OSDs provide a level of abstraction that hides the fact that a disk device has blocks. It no longer matters to the file system manager what kind of disk drive is being used; it only worries about managing objects. The OSD does this by managing metadata and maintaining internal copies of its metadata. Hence, an OSD has knowledge of its object layout even when one or more groups of objects are on different OSDs. In this way an OSD knows which space is used or unused and can scan and correct errors without losing data. In the event of a failure in recovering a file or a number of files, traditional systems may have to do a complete file system restore. An OSD's awareness of its object layout, however, enables it to recover the data in a specific byte range and thus restore files efficiently.

OSDs have another powerful feature. Each object file has an associated hash key that is generated from the contents of the file. The key can therefore be used to verify accuracy (the contents remain the same) and integrity (the data has not been corrupted). It can also be used in data management to flag duplicate data. [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf]
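
The hash-key idea can be sketched in a few lines of Python. This is only an illustration: the source does not say which hash function an OSD uses, so SHA-256 is an assumption, and the names <code>content_key</code>, <code>verify</code> and <code>find_duplicates</code> are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_key(data: bytes) -> str:
    """Return a hash key derived only from the object's contents.

    SHA-256 is an assumption; the source does not name a hash function.
    """
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, stored_key: str) -> bool:
    """True if the contents still match the key recorded at write time."""
    return content_key(data) == stored_key


def find_duplicates(objects: Dict[str, bytes]) -> List[List[str]]:
    """Group object names whose contents hash to the same key."""
    by_key = defaultdict(list)
    for name, data in objects.items():
        by_key[content_key(data)].append(name)
    return [sorted(names) for names in by_key.values() if len(names) > 1]
```

Recomputing the key and comparing it with the stored one detects any change to the contents, and two objects with equal keys can be flagged as likely duplicates without comparing them byte by byte.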

=== Security ===

Security threats can be thought of as falling into four quadrants, formed by two distinctions: external versus internal, and accidental versus malicious. Block-based stores have a variety of ways of handling security, but there are basic concepts that SAN and NAS technologies use to secure data.

SANs have traditionally run on Fibre Channel, although this trend is changing. [http://en.wikipedia.org/wiki/Storage_area_network]
From a security standpoint, running a SAN on Fibre Channel helps isolate its network, since the devices do not communicate over TCP/IP connections. However, since the SAN devices themselves do not restrict access, it is up to the network infrastructure and the host systems to handle security.

Zoning and LUN masking are typical security measures for SAN systems. Zoning allocates a certain amount of storage to clients. Zones are isolated, and devices are not allowed to communicate outside their respective zone. LUN masking is similar to zoning but differs in the type of device that enforces it: switches implement zoning, while disk array controllers implement LUN masking. A disk array controller is a device that manages the physical disk drives and presents them as logical units identified by logical unit numbers (LUNs), hence the term LUN masking. [http://en.wikipedia.org/wiki/Fibre_Channel_zoning]
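
As a rough illustration of how the two checks combine, the following Python sketch models zoning as sets of ports and LUN masking as a per-host set of visible LUNs. All zone names, host names and LUN numbers here are made up, and real switches and array controllers configure this very differently; the sketch only shows the logic.

```python
from typing import Dict, Set

# Hypothetical configuration: every identifier below is invented for illustration.
ZONES: Dict[str, Set[str]] = {
    "zone_a": {"host1", "array1"},
    "zone_b": {"host2", "array1"},
}

LUN_MASKS: Dict[str, Set[int]] = {
    "host1": {0, 1},
    "host2": {2},
}


def same_zone(a: str, b: str) -> bool:
    """Switch-level check: two ports may talk only if some zone contains both."""
    return any(a in members and b in members for members in ZONES.values())


def lun_visible(host: str, lun: int) -> bool:
    """Controller-level check: a host sees only the LUNs unmasked for it."""
    return lun in LUN_MASKS.get(host, set())


def can_access(host: str, array: str, lun: int) -> bool:
    """A request succeeds only if both the zone and the LUN mask allow it."""
    return same_zone(host, array) and lun_visible(host, lun)
```

With this configuration, <code>host1</code> can reach LUN 0 on <code>array1</code> but not LUN 2, and <code>host1</code> and <code>host2</code> cannot talk to each other because no zone contains both.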

NAS has its own vulnerabilities but, as with SAN, it is only as secure as the network it operates on. NAS security is conceptually simpler than SAN security. NAS environments can administer security tasks as well as control disk usage quotas. The proprietary operating system that a NAS device runs has access control configurations much like those of traditional OSs, which can prevent unauthorized access to data.

Unlike NAS and SAN systems, OSD devices handle security requests directly. The set of protocols used by OSD enables it to cover the four quadrants of security threats outlined above. Clients access an OSD device by providing "cryptographically secure credentials", called capabilities, which specify a tuple (OSD name, partition ID, object ID) that identifies the object. [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf] This can prevent accidental or even malicious access to an OSD, whether external or internal.
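
A much simplified sketch of capability checking is given below in Python. It assumes that a security manager and the OSD share a secret key and that a capability is signed with an HMAC; the actual OSD security protocol is more involved, and the <code>rights</code> field, the function names and the use of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import NamedTuple


class Capability(NamedTuple):
    osd_name: str
    partition_id: int
    object_id: int
    rights: str  # hypothetical field: one letter per right, e.g. "r" or "rw"


def _encode(cap: Capability) -> bytes:
    # repr() of the tuple quotes strings, so distinct capabilities encode distinctly.
    return repr(tuple(cap)).encode("utf-8")


def issue(secret: bytes, cap: Capability) -> bytes:
    """Security manager side: sign the capability with the shared secret."""
    return hmac.new(secret, _encode(cap), hashlib.sha256).digest()


def validate(secret: bytes, cap: Capability, tag: bytes, wanted: str) -> bool:
    """OSD side: accept only an untampered capability that grants the right."""
    expected = hmac.new(secret, _encode(cap), hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag) and wanted in set(cap.rights)
```

Because the tag depends on every field of the capability, a client that changes the object ID or the rights it was given can no longer present a valid tag, so the OSD can reject the request without consulting any other server.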

== Conclusion ==
Although object storage is relatively new compared to block storage, work has progressed steadily in universities and on standards such as the ANSI T10 SCSI OSD standard. There remain challenges to its adoption in industry, however. One is that it is currently needed only in high-end business solutions, which keeps it from reaching smaller businesses.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf] As newer features are added and the standards mature, adoption is likely to increase.

It is clear, however, that changes need to occur as storage grows and finer levels of management are needed for data storage. Object-based storage has evolved to fit these needs where block-based storage has stagnated. Better tools for managing data through the rich metadata of objects, the combined security and data transfer speeds of NAS and SAN, and integrity controls for backups and redundancy will make object storage an attractive choice for storage administrators in the future.

==References==

[1] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[2] Christian Bandulet, 2007. Object-Based Storage Devices. [online] Oracle Available at: <http://developers.sun.com/solaris/articles/osd.html> [Accessed 13 October 2010].

[3] [http://www-03.ibm.com/ibm/history/exhibits/storage/storage_350.html IBM 350 Disk Storage Unit]

[4] M. Mesnier, G. R. Ganger, and E. Riedel. Object-Based Storage. IEEE Communications Magazine, 41(8), August 2003.

[5] [http://developers.sun.com/solaris/articles/osd.html Object-Based Storage Devices Christian Bandulet, July 2007]

[6] [http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf Seagate]

[7] [http://articles.techrepublic.com.com/5100-22_11-5841266.html Foundations of Network Storage]

[8] [http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Dell Object Storage Overview]

[9] Dell Product Group, 2010. Object Storage A Fresh Approach to Long-Term File Storage. [online] Dell Available at: <http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf> [Accessed 13 October 2010].

[10] [http://en.wikipedia.org/wiki/Storage_area_network Storage Area Network]

[11] [http://en.wikipedia.org/wiki/Fibre_Channel_zoning Fibre Channel zoning]

[12] [http://www.research.ibm.com/haifa/projects/storage/objectstore/papers/OSDSecurityProtocol.pdf IBM OSD Security Protocol Overview]

[13] Michael Factor, Kalman Meth, Dalit Naor, Ohad Rodeh, Julian Satran, 2005. Object storage: The future building block for storage systems. In 2nd International IEEE Symposium on Mass Storage Systems and Technologies, Sardinia [online] IBM Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.122.3959&rep=rep1&type=pdf> [Accessed 13 October 2010].

Talk:COMP 3000 Essay 1 2010 Question 11

2010-10-09T23:45:39Z

Npradhan: /* Some more links */

== Initial Outline ==
'''Introduction'''
* Thesis Statement: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes.
* What will be discussed
- Current state of block based storage
- Brief overview of object store
- Scalability
- Integrity
- Security

'''Block based storage'''
* NAS is a single storage device that is shared on a LAN
- File level/Single storage device(s) that operates individually
- Clients connect to the NAS head (interface between client and NAS) rather than to the individual storage devices
- Use small, specialized and proprietary operating systems instead of general purpose OSs
- Can enforce security constraints, quotas, indexing
- Example of access: \\NAS\Sharename

Advantages
- Dedicated, feature-rich file sharing
- Network optimized
- Centralized storage
- Less administration overhead
Disadvantages
- Metadata processing has to be handled on the NAS server
- Scaling up with more storage behind the NAS head is restricted because metadata processing on the NAS device becomes a bottleneck
- Scaling by adding additional NAS devices quickly becomes a management issue because data is isolated on individual NAS islands
- High-latency protocols over TCP/IP that clog LANs
- Not suitable for data transfer intensive apps

* A SAN filesystem is a local network of multiple devices that operate on disk blocks and provide a file system abstraction
- Block level/local network of multiple devices
- Every client computer has its own file system
- A SAN alone does not provide the file abstraction but there is a file system built on top of SANs
- Example of access: D:\, E:\, etc.

Advantages
- High-performance shared disk
- Scalable
- Short I/O paths
- Lots of parallelism
Disadvantages
- Harder to maintain, lots of file systems to manage
- Harder to administer, lots of storage access rights to coordinate

* OSDs close the gap between the scalability of SAN and the file sharing capabilities of NAS
* Block storage has limitations that have become more apparent as demand for scalability and security has grown

'''Overview of OSD'''
* An OSD device deals in objects
- Handles the mapping from object to physical media locations itself
- Tracks metadata as attributes, such as creation timestamps, allowing for easier sharing of data among clients
- OSDs are directly connected to clients without the need for an intermediary to handle metadata.

* ANSI ratified version 1.0 of the OSD specification in 2004, defining a protocol for communication with object-based storage devices
* The OSD specification describes:
- a SCSI command set that provides a high-level interface to OSD devices
- how file systems and databases store and retrieve data objects
* Work has continued on ratifying the OSD-2 and OSD-3 specifications

'''Scalability'''
* Metadata is associated and stored directly with data objects and carried between layers and across devices
* Space allocation delegated to storage device
* Server has reduced overhead and processing, allowing larger clusters of storage

'''Integrity'''
* OSDs have knowledge of their object layout
* Unlike block stores, OSDs can recover data specific to a byte range
- OSDs know in this way which space is unused
- Can scan and correct errors without losing data
* OSDs maintain internal copies of metadata
- User doesn't have to do a complete file system restore for the sake of one or a few unrecoverable files
- OSDs can identify the byte range lost and restore the file efficiently

'''Security'''
* Suited for network based storage
* Associate security attributes directly with data object
* Security requests handled directly by storage device
* A computer system can access an OSD device by providing cryptographically secure credentials (capabilities) that the OSD device can validate
- This can prevent malicious access from unauthorized requests or accidental access from misconfigured machines

'''Conclusion'''
* Reiteration of thesis statement

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

Hey Myagi, I thought i'd move your outline to its own section at the top of the page so it's more visible. I hope you don't mind. If you do, feel free to revert this edit.

--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

: It's all good.
:--[[User:Myagi|Myagi]] 10:00, 8 October 2010 (UTC)

:This outline looks pretty good to me. I like the three focus points of scalability, integrity and security, those seem to be constant themes in what i've read about object stores.

:For the block storage overview, the two current standards for a block based interface seem to be SCSI and SATA. SCSI seems to be used more in enterprise storage and SATA more in personal storage (someone correct me if i'm wrong here). We might also want to take a look at SAN and NAS. I need to do some more reading, haha.

:Also, I think we might as well start putting up some stuff on the article page. Even just a few sentences per section. I can start on that tomorrow or maybe Saturday. Of course any one else is welcome to as well.

:--[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

== Quick Overview ==
So I hope i'm not the only one who was wondering "What are object stores?" when reading the question. I don't think the textbook mentions it but I didn't read through the filesystems chapter very thoroughly. Here's where some quick googling has got me:

Most storage devices divide their storage up into blocks, a fixed length sequence of bytes. The interface that storage devices provide to the rest of the system is pretty simple. It's essentially "Here, you can read to or write to blocks, have fun". This is block-based storage.

Object-based storage is different. The interface it presents to the rest of the system is more sophisticated. Instead of directly accessing blocks on the disk, the system accesses objects. Objects are like a level of abstraction on top of blocks. Objects can be variable sized, read/written to, created, and deleted. The device itself handles mapping these objects to blocks and all the issues that come with that, rather than the OS.

Here's some papers that give an overview of object-based storage:

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1612479 Object Storage: The Future Building Block for Storage Systems]

[http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1222722 Object-Based Storage]

I think if you just look those up on google scholar you can access the pdf without even being inside carleton's network.

--[[User:Mbingham|Mbingham]] 23:56, 1 October 2010 (UTC)

== Some more links ==
I haven't been reading many academic papers on the subject so those links will be very useful.

If I may add to this. I read articles on object storage here:

[http://www.dell.com/downloads/global/products/pvaul/en/object-storage-overview.pdf Object Storage Overview]

and

[http://www.snia.org/education/tutorials/2010/spring/file/PaulMassiglia_File_Systems_Object_Storage_Devices.pdf File Systems for OSD's]

I can add that metadata is much richer in an object store context. Searching for files and grouping related files together is much easier with the context information that metadata supplies for objects. I'm beginning to read:

[http://www.seagate.com/docs/pdf/whitepaper/tp_536.pdf The advantages of OSD's]

--[[User:Myagi|Myagi]] 10:39, 5 October 2010 (UTC)

I'm going to write a version of my essay out over the long weekend with headings and references and put it up on the wiki. I'd like to know who and how many people are working on this essay but dunno if that's possible. We'll see what we do from there I guess? I was thinking we just homogenize all of the information we write into one unified essay.

--[[User:Myagi|Myagi]] 10:42, 6 October 2010 (UTC)

:I think there's 6 people in our group, though there might only be 5. I'll be working on this over the long weekend too. I was thinking maybe we should try to get a rough outline up, thursday or friday. Since Prof Somayaji mentioned that this should have the format of an essay, maybe we could start with what our main argument is?

:I was thinking something like object stores are becoming more attractive because the demands on filesystems have changed, but the interface has not been updated to accommodate these changes. Then we could go into an explanation of block based storage, how it fails to meet the needs placed on modern FSs, then how object stores solve these problems. What do you think?

:--[[User:Mbingham|Mbingham]] 01:55, 7 October 2010 (UTC)

:You don't need to write your own independent essay on the wiki. Let's just add info as it comes along. I'll be completely without internet access this weekend, but I'll try to bring some background reading with me. Expect lots of edits from me starting Monday night/Tuesday morning.
:--[[User:Dagar|Dagar]] 12:59, 7 October 2010 (UTC)

:Sounds good! I think that's a good idea for a thesis statement and we should have a concrete one by Thurs/Fri. Although I'm not absolutely clear about the interface not being updated? I think the object store SCSI standard is constantly being ratified and now they have an OSD-3 draft. [http://www.t10.org/drafts.htm#OSD_Family T10 OSD Working Drafts]. But then again I'm probably misunderstanding something...
:--[[User:Myagi|Myagi]] 10:08, 7 October 2010 (UTC)

::I didn't mean that the object interface hadn't been updated, I meant that the block interface hasn't been updated to reflect the changing requirements put on storage. Since the block interface is still largely the same as it was decades ago (read/write to blocks) it is unable to handle the new requirements. Object stores look attractive because they are designed to deal with issues like scalability, integrity, security, etc. Sorry for the confusion, I hope it makes more sense now, haha.
::--[[User:Mbingham|Mbingham]] 15:44, 7 October 2010 (UTC)

:I gotcha, thanks for explaining! I'd say that would be a great thesis statement then: Object stores are becoming more attractive because the demands on filesystems have changed and the block store interface has not been updated to accommodate these changes. We can work from there. I think we can address the inadequacies of block based storage after stating our thesis and then for the body, we point out how object stores deal with issues of scalability, integrity, security as well as flexibility. And then some kind of nice tie up reiterating our thesis.
:--[[User:Myagi|Myagi]] 12:50, 7 October 2010 (UTC)

I might as well put my contribution here. I'm willing to move or change it for the sake of organizing this discussion page.

--[[User:Myagi|Myagi]] 18:15, 7 October 2010 (UTC)

:(moved Myagi's outline to top of page) --[[User:Mbingham|Mbingham]] 02:31, 8 October 2010 (UTC)

Some links that I found while doing the assignment about object storage and its application to SAN systems:
http://dsc.sun.com/solaris/articles/osd.html
http://www.research.ibm.com/haifa/projects/storage/zFS/papers/amalfi.pdf

--[[User:Npradhan|Npradhan]] 23:45, 9 October 2010 (UTC)

== Other ==
-instead of storing filesystems in terms of blocks, you store in terms of objects.

-extents, named extents

-objects fancier because they can move around.

-extra level of abstraction and indirection

-files made of objects, objects made of blocks
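
The "objects made of blocks" layer from the notes above can be sketched as follows, in Python. This is a toy model under stated assumptions, not how any real OSD is implemented: <code>BlockDevice</code>, <code>ObjectStore</code>, the 4-byte block size and the first-free allocation policy are all invented for illustration. The point is only that the object layer, not the client, keeps the map from objects to blocks.

```python
from typing import Dict, List

BLOCK_SIZE = 4  # deliberately tiny so the mapping is easy to follow


class BlockDevice:
    """Block interface: fixed-size blocks addressed by number."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, n: int, data: bytes) -> None:
        self.blocks[n] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, n: int) -> bytes:
        return self.blocks[n]


class ObjectStore:
    """Object interface: variable-sized objects; the store owns the block map."""

    def __init__(self, device: BlockDevice) -> None:
        self.device = device
        self.free: List[int] = list(range(len(device.blocks)))
        self.layout: Dict[int, List[int]] = {}  # object id -> block numbers
        self.sizes: Dict[int, int] = {}         # object id -> length in bytes

    def put(self, oid: int, data: bytes) -> None:
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        reusable = len(self.free) + len(self.layout.get(oid, []))
        if needed > reusable:
            raise IOError("out of space")
        self.delete(oid)  # overwriting an object releases its old blocks first
        blocks = [self.free.pop(0) for _ in range(needed)]
        for i, b in enumerate(blocks):
            self.device.write_block(b, data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
        self.layout[oid] = blocks
        self.sizes[oid] = len(data)

    def get(self, oid: int) -> bytes:
        raw = b"".join(self.device.read_block(b) for b in self.layout[oid])
        return raw[: self.sizes[oid]]

    def delete(self, oid: int) -> None:
        self.free.extend(self.layout.pop(oid, []))
        self.sizes.pop(oid, None)
```

A client only ever calls <code>put</code>, <code>get</code> and <code>delete</code> with an object id; which blocks hold the data, and which are free, is known only inside the store. A file system built on top would in turn map each file to one or more object ids.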