Accountable Virtual Machines

Authors: Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, Peter Druschel

Affiliates: University of Pennsylvania, Max Planck Institute for Software Systems (MPI-SWS)]

Link to Paper: Accountable Virtual Machines

Background Concepts

Accountable Virtual Machine (AVM)

Deterministic Replay: A machine can record its executions into a file so that it can be replayed in order to see the executions and follow what was happening on the machine. Remus [1] has contributed a highly efficient snap-shotting mechanism for these replays.

Accountability: Accountability in the context of this paper means that every action done on the virtual machine is recorded and will be used against the machine or user to verify the correctness of the application. The AVM is responsible of its action and will answers for its action against an auditor.

Remote Fault Detection: There are programs like GridCop [2] that can be used to monitor the progress and execution of a remotely executing program by requesting a beacon packet. When the remote computer is sending the packets, the receiving/logging computer must be a trusted computer (hardware,software, OS) so that the receiving of packets remains consistent. To detect a fault in a remote system, every packet must arrive safely, and any interrupts during the logging must be handled or the inconsistencies will result in an inaccurate outcome. The AVM does not require trusted hardware and can be used over wide-area networks.

Cheat Detection: Cheating in games or any specific modification in a program can be either scanned [3][4] for or prevented [5][6] by certain programs. The issue with these scanning and preventative software is the knowledge/awareness of specific cheats or situations that the software can handle. An AVM is designed to counter any kind of general cheat.

Integrity Violations: This refers how the consistency of normal/expected operations of an execution does not equal to that of the host/reference (Trusted) execution, hence a violation has occurred.

- The word "node" is used to refer to a computer or server in order to represent the interactions between one computer and another, or a computer and a server.

Research problem

The research presented in this paper tries to tackle a problem that has haunted computer scientists for a long time. How can you be sure that the software running on a remote machine is working correctly or as intended. Cloud computing, online multi-player games, and other online services such as auctions are only a few examples that rely on a trust relation between users and a host. When a node (user or computer) expects some sort of result or feedback from another node, they would hope that that interaction being done would be independent of the node and only dependent on the intended software. Let's say, that node A interacts with node B with execution exe1 and node A interacts with node C also with ex1, but node C has been modified and respond with exe2. Thus, we can assume that the respond of B and C will be different. Being able to prove that the node C has been modified without any doubt is the purpose of this paper.

Previous work that has been done in efforts to prevent or detect integrity violations can be separated into different categories of operations. The first would be Cheat Detection, where in many different games there are cheats that users use to usually create benefits for themselves that was not intended by the original game.[4] These detectors are not dynamic, in the sense that they do not actually detect whether a cheat is being used, more so they are checking if there is a cheating operation that they have logged before, being operated on the user's system. For example, if there was a known cheating program named aimbot.exe that can be run in the background of a game such as CounterStrike, and the PunkBuster system that was implemented on the user's system had the aimbot.exe program already logged as a cheating program from the developers, the PunkBuster program might notify the current game servers of this or even prevent the user from playing any games until the aimbot.exe operation is no longer running.

Accountability is another important problem that has been the subject of much research. An accountable system provides a method to ascertain whether execution took place correctly and as expected. These systems can also provide reliable evidence of proper execution to third parties, if required. This evidence can also be used to defend a node when threatened with false accusation. Numerous systems already use accountability in their system, but they were mostly all linked to specific applications, where a point of reference must be used to compare. As example PeerReview[7], which is a system closely related to what the research team have worked on, must be implemented into the application which makes it less portable and cannot be implemented as easily as an AVM. PeerReview verifies the inbound and outbound packets and can see if the software is running as intended.

Another problem that is related to the paper is remote fault detection in a distributed system. That is, having a method to verify proper code execution and machine functionality of a node. Network activity is a common solution to this problem, as it looks at the inbound and outbound of the node. This can let them know how the software is operating, or in the case of AVM how the whole virtual machine is working. Gridcop[8] is another example that inspects a small number of packets periodically. Another way of determining the fault remotely is to use a trusted node, where it can tell immediately if a fault occurs or a modification is made where it should not have been made.

The problem of logging and auditing the processes of an execution of a specific node (computer) is greatly dependent on the work done for deterministic replay. Deterministic replay programs can create a log file that can be used to replay the operations done for some execution that occurs on a node. Replaying the operations done on the node can show what the node was doing, and this would seem like it is sufficient in finding out whether a node was causing integrity violations or not. The concept of snap-shoting/recording the operations is not the issue with deterministic replay, it is the fact that the data being outputted into the replay may be tampered with by the node itself so that it generates optimal results in replay. By faking the results of the operations, the auditing computer will falsely believe that the tested computer is running all operations as normal. The logging operations done by these recording programs can be directly related to the work needed to detect integrity violations.

Contribution

The accountable virtual machine (AVM), that was proposed in this essay, most useful contribution was the implementation of the accountable virtual machine monitor (AVMM). It is what allows for the fault checking of virtual machines in a cloud computing environment. The AVMM can be broken down into different parts: the virtual machine monitor (VMM), the temper-evident log, and auditing mechanisms. The VMM is based off the VMM found in VMWare Workstation 6.5.1[9], the temper-evident log was adapted from code in PeerReview[7], and the audit tools were built up from scratch.

The accountable virtual machine monitor relies on four assumptions:

1. All transmitted messages are received, retransmitted if needed.

2. Machines and Users have access to a hash function that is pre-image resistant, second pre-image resistant, and collision resistant.

3. All parties have a certified keypair, that can be used to sign messages.

4. To audit a log, the user has a reference copy of the VM used. The job of the AVMM is to record all incoming and outgoing messages to a tamper-evident log and enough info of the execution to enable deterministic replay.

The hash function used is a cyrptographic hash function, which is a way of translating a arbituary block of data into a string. While not impossible to spoof or break, if it has the three properities specified in the assumptions it is concidered a "hard" problem that is infeasible for a malicious attacker to use as a attack vector in the foreseable future, and thus is secure.

The AVMM must record nondeterministic inputs (such as hardware interrupts), because the input is asynchronous, and the exact timing of input must be recorded so the inputs can be injected at the same moment during the replay. Wall-clock time is not accurate enough for this recording, so the AVMM must use a combination of instruction pointer, branch counter, and additional registers. Not all inputs have to be recorded this way (software interrupts) because they send requests to the AVM, which will be issued again during replay.

Two parallels streams appear in the tamper-evident log: message exchanges and nondeterministic inputs. It is important for the AVMM to detect inconsistencies between the user's log and the machine's log (in case of foul play), so the AVMM simply cross-references messages and inputs during replay, thus, easily detecting any discrepancies.

The AVMM periodically takes snapshots of the AVM's current state, this facilitates fine-grain audits for the user, but it also increases overhead. The overhead is lowered slightly by the snapshots being incremental (only save the state that has been changed since the last snapshot). The user can authenticate the snapshot using a hash tree of the state (generated by the AVMM) and it can update the hash tree after each snapshot.

Tamper-Evident Log

The log is made up of hash code entries. Each log entry in form e = (s,t,c,h) s = monotonically increasing sequence number t = type c = data of the type h = hash value

The hash value is calculated by: h = H(hi-1 || s || t || H(c)) H() is a hash function. || stands for concatenation.

Each message sent gets signed with a private key, when the AVMM logs the messages with the signature attached but removes it before sending it to the AVM. To ensure nonrepudiation, an authenticator is attached to each outgoing message.

To detect when a message is dropped, each party sends an acknowledgement for each message they receive. If an acknowledgement is not received the message is resent a few times, if the user stops receiving messages, then the machine is presumed to have failed.

To preform a log check, the user retrieves a pair of authenticators, then challenges the machine to produce the log segment between the two. The log is computationally infeasible to edit without breaking the hash chain, thus, if the log has been tampered with, the hash chain will be different and the user will notified of the tampering.

Auditing Mechanism

From VMM's perspective all things are deterministic.

To perform a audit, the user:

1. obtains a segment of the machine's log and the authenticators

2. downloads a snapshot of the AVM at the beginning of the segment

3. replays the entire segment, starting from the snapshot, to verify the events in the log are the correct execution of the software.

The user can verify the execution of software through three different methods: Verifying the log, snapshot, and execution.

When the user wants to verify a log segment, the user retrieves the authenticators from the machine with the sequence numbers in the range of the log segment. The user then downloads the log segment from the machine, and, starting with the most recent snapshot before the log segment and ending with the most recent snapshot before the end of the log segment. The user then checks the authenticators for tampering. If this step proceeds, the user can assume the log segment executed properly. If the machine is faulty, the segment will be unavailable to download or may return a corrupted log segment. This can be used to convince a third party of the fault.

When the user wants to verify the snapshot, the user obtains a snapshot of the AVM's state at the beginning of the log segment. The user then downloads a snapshot from the machine and the AVMM recomputes the hash tree. The new hash tree is compared to the hash tree contained in the orignal log segment. If any discrepancies are detected, the user can use this to convince a third party of the machine's faults.

In order for the user to verifying the execution of a log segment, the user needs three inputs: the log segment, the snapshot, and the public keys of the machine and any users of the machine. The auditing tool performs two checks on the log segment, a syntactic check (determines if log is well-formed), and a semantic check (determines if the information in the log shows the correct execution of the machine).

The syntactic check checks whether all log entries are in the proper format, the signatures in each message and acknowledgement, if each message was acknowledged, and the sequence of sent and received messages is correct when compared to the sequence of messages that enter and exit the AVM.

The semantic check creates a local VM that will execute the machine's log segment, the VM is initialized with a snapshot from the machine if possible. The local VM then runs the log segment and the data is recorded. The auditing tool then checks the log segments, inputs, outputs, and verification of snapshot hashes of the replayed execution against the original log. If any discrepancies are detected then the fault is reported and can be used as evidence against the machine.

Critique

The layout of the paper is primordial for the comprehension of the reader. The introduction clearly describes what the reader has to expect in the following pages, especially what problems are addressed and how they are solved.

This paper gives multiple examples about advantages and disadvantages in an AVM. A good example is "Cheat Detection". Cheaters use programs to go around the original game code to gain an major advantage over other players. Since an AVM is generic in cheat detection it casts a much wider net for detecting cheats than most of the other cheat detection algorithms. The logs give the game the function to replay the game. Thus, players using AVM can see the way other players play by replaying the game with the player's log.

The negative side is that the player might have to suffer from the AVM. Everything is being logged and stored on the hard drive, which takes a lot amount of space. In the example in the paper it is 148mb per hour after compression. This reduces the fps. Additionally, the connection to the AVM increases the ping time to the server.

As a proof of concept, they used their AVM in the online game Counter Strike and tried to detect online cheats. They were using “Dell Precision T1500 workstations, with 8 GB of memory and 2.8 GHz Intel Core i7 860 CPUs”[pg 10]. These machines are considerably more high powered than the system requirements of Counter-Strike, which are “500 MHz processor, 96 MB RAM”[10]. A 10 year old game [10] should use fewer resources on a Dell Precision T1500 workstations. In comparison, newer games consume far more resources than Counter-Strike giving it less room to run the AVM. A 13% slowdown [pg 12.] in a game where you are only getting 30 to 40 fps is a pretty noticeable slowdown. This is very detrimental to the game play because having over 60fps is the optimal performance.

While the AVM cheat detection method comes with a performance penalty, it was successful in its goal. This cheat detection method was able to detect all of the 26 downloaded cheats for Counterstrike, without prior knowledge of the nature of the cheats. This is an important step since most cheat detection is responsive. That is, cheat detection for a specific type of cheat is added when that cheat is developed, used, and discovered. The AVM cheat detection method shuts the door on a wide variety of cheats without any prior knowledge of how the cheat works.

When discussing the Counterstrike use case, the authors did not state that in counterstrike the user can record a demo of his current game. Some online playing leagues require every player to record his own demo and upload it to the website, where every person in the league can watch it. Without this demo the team lost the match immediately. Additionally, some leagues require the player to start an extra program (e.g. Electronic Sports League WIRE), which checks the programs running in the background. It also takes random snapshots of the current player and compresses all information into a file and uploads it to one of the server in the online league, where it can be checked by any player.

In the paper the authors state that the AVM will only generate an extra 5ms of latency. While this does not seem like a lot the measurement was taken over a LAN with all the computers connected to the same switch [pg. 12]. This sample does not accurately represent real life situations and therefore lacks external validity, since many of these online games are played over the internet with the participants sometimes not even on the same continent; the latency overhead of the AVM would certainly increase due to the added distance. [12]

While the paper does test a slightly larger than one to one scenario, it certainly does not test in a real world environement where 16,32 or even 64 players would be playing in the sametime.

Spot checking can be used for applications that require snapshots every x seconds. Even if this way remove a lot of overhead and data storage, it only verify if the applications or user is working as intended every x second. Thus, someone could find the patern of those snapshots and render the AVM inutile.

AVM's are extremely effective against two types of cheating, that which gives incorrect networking messages and the one that has to be loaded with the game. This is the perfect world for tournaments competition type of game, but in a real world this wouldn't be of much use. Games get patched, users download add-ons for the game, etc. Every patch or add-ons would require a new AVM which is unreasonable for the amount of people playing the game. A solution brought from the team was to disable the right to install anything on the AVM. As this could work in a tournament environment, a normal users at home would not be pleased with this limitation.

An AVM's will not in any way catch any bug or exploit in a program that a malicious user could exploit, as the exploit would appear on both user/monitor systems and perform the same. This type of exploit would not be detected, since both the real time execution and the execution log that is known to be correct would both show the exploit as being correct.

References

[1] B. Cully, G. Lefebvre, D. Meyer, M. Feeley, N. Hutchinson, and A. Warfield. Remus: High availability via asynchronous virtual machine replication. In Proceedings of the USENIX Symposium on Networked Systems Design and Implementation (NSDI), Apr. 2008.

[2] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but verify: Monitoring remotely executing programs for progress and correctness. In Proceedings of the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2005.

[3] G. Hoglund. 4.5 million copies of EULA-compliant spyware. http://www.rootkit.com/blog.php?newsid=358.

[4] PunkBuster web site. http://www.evenbalance.com/.

[5] N. E. Baughman, M. Liberatore, and B. N. Levine. Cheat-proof playout for centralized and peer-to-peer gaming. IEEE/ACM Transactions on Networking (ToN), 15(1):1–13, Feb. 2007.

[6] C. M¨onch, G. Grimen, and R. Midtstraum. Protecting online games against cheating. In Proceedings of the Workshop on Network and Systems Support for Games (NetGames), Oct. 2006.

[7] A. Haeberlen, P. Kuznetsov, and P. Druschel. PeerReview: Practical accountability for distributed systems. In Proceedings of the ACM Symposium on Operating Systems Principles (SOSP),Oct. 2007.

[8] S. Yang, A. R. Butt, Y. C. Hu, and S. P. Midkiff. Trust but verify: Monitoring remotely executing programs for progress and correctness. In Proceedings of the ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), June 2005.

[9] VMWare Workstation 6.5.1 web site. http://www.vmware.com/products/workstation/

[10] Counter-Strike http://store.steampowered.com/app/10/

[12] Larry L. Peterson and Bruce S. Davie. Computer Networks a Systems Approach, 2007