Talk:COMP 3000 Essay 2 2010 Question 5

Maybe we can all add our names below so we know who's still in this course? --Myagi 12:38, 14 November 2010 (UTC)

Group members:

Michael Yagi
Nicolas Lessard
Julie Powers
Derek Langlois
Dustin Martin

Jeffrey Francom contacted me earlier so I know he is also still in the course. ~~Now we are only waiting on Dustin Martin.~~ Everyone has been accounted for. J powers 18:07, 15 November 2010 (UTC)

Just kicking things off. Feel free to make suggestions or change anything. --Myagi 11:36, 17 November 2010 (UTC)

Edited and filled out the critique section. Edited a little bit here and there. --Afranco2 17:41, 22 November 2010 (UTC)

Moved stuff to the front page and cleaned up references. Still waiting for people to expand if possible. Also, spellcheck ;) --Myagi 10:37, 24 November 2010 (UTC)

I worked on revising our report last night. There are still sections that have to be explained more clearly. I will work on those and try to remove our direct quotations, as the Professor said that we should avoid using them. I also noticed a few sentences were taken directly from the textbook (with slight modifications). Those will need to be changed.

I have a question related to what someone wrote. Does LOOM also find race conditions (if it does where is it mentioned in the paper)? J powers 19:54, 2 December 2010 (UTC)

-I can only see references to tools that find Race Conditions, not LOOM finding actual race conditions. It seems to be focused on fixing not finding them. I don't know if that's worth mentioning or not. --Nlessard 00:37, 3 December 2010 (UTC)

I haven't done any work on the essay yet :( I'm going to proofread the essay. I will also attempt to elaborate on some of the ideas as I see fit. -Dlangloi

That is the impression I received from the paper as well Nicolas. Someone has implied that it does in our report.

Unfortunately I do not think that I can edit what I planned to tonight. I have an in-class final tomorrow morning that I need to study for instead. I have some suggestions if anyone is interested. I do not think that the definitions for evacuation, execution filters and function quiescence are clear. The sentences that are copied from the book are in the semaphore definition (Very good idea to change this before it is marked). J powers 01:26, 3 December 2010 (UTC)

Essay

Paper

The paper's title, authors, and their affiliations. Include a link to the paper and any particularly helpful supplementary information.

Title: Bypassing Races in Live Applications with Execution Filters
Authors: Jingyue Wu, Heming Cui, Junfeng Yang
Affiliations: Computer Science Department, Columbia University
Supplementary Information: Video, Slides

Background Concepts

Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.

A race condition is a system flaw that "occurs when two threads access a shared variable at the same time." Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is often not fixing it but finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help deal with races in deployed applications, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose LOOM.
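To make the definition concrete, here is a minimal Python sketch of a lost-update race. It is not taken from the paper: the function name `racy_increment` is made up for illustration, and a barrier is used only to force the bad interleaving so that the result is reproducible.

```python
import threading


def racy_increment(n_threads: int = 2) -> int:
    """Run n_threads unsynchronized read-modify-write updates and return the result."""
    shared = {"value": 0}
    # The barrier is an illustration aid: it forces every thread to finish
    # its read before any thread performs its write.
    barrier = threading.Barrier(n_threads)

    def worker() -> None:
        seen = shared["value"]      # read the shared variable
        barrier.wait()              # all threads have now read the same value
        shared["value"] = seen + 1  # write back a result based on a stale read

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]
```

Every thread reads 0 and writes 1, so the final value is 1 no matter how many threads ran. In a real program the interleaving is not forced, which is why the bug only shows up occasionally and is hard to recreate.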

LOOM is a system that works around known race conditions in running applications; it fixes races rather than finding them. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which safely installs execution filters to fix race conditions at runtime. An execution filter is a short declaration, written by the developer, of how code regions must be synchronized (for example, that two regions must be mutually exclusive), so that the thread interleavings that trigger the race are filtered out. By using execution filters to correct race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.

The authors tested LOOM on real-world race conditions found in common applications. The tests found that all tested race conditions were solved with little performance overhead, in a scalable and easy-to-implement manner.

The paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:

Race Condition: "A race condition occurs when two threads access a shared variable at the same time." Race Condition
Execution Filters: Short declarations, written by the developer, that describe how code regions must be synchronized so that racy thread interleavings are filtered out. In the context of this paper, these are mutual exclusion filters.
Hot Patches: "Hot patching provides a mechanism to update system files without rebooting or stopping services and processes." Hot Patching
Hybrid Instrumentation Engine: "Instrumentation refers to an ability to monitor or measure the level of a product's performance, to diagnose errors and writing trace information." Instrumentation Static instrumentation can have low runtime overhead, but it has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. Hybrid instrumentation combines static and dynamic instrumentation.
Lock: A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. "Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code." Mutex Locks
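Mutex: Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true (or, for code regions, both be executing) at the same time.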
Semaphore: "A semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment." Semaphore
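The lock and semaphore definitions above can be illustrated with Python's standard `threading` module. This is only a sketch for classmates, not code from the paper; the function names `locked_increment` and `peak_concurrency` are made up for this example.

```python
import threading
import time


def locked_increment(n_threads: int = 4, per_thread: int = 1000) -> int:
    """Increment a shared counter from several threads, serialized by a mutex."""
    shared = {"value": 0}
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(per_thread):
            with lock:                # lock at the start of the critical section
                shared["value"] += 1  # only one thread at a time runs this line
            # the lock is released automatically when the with-block ends

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]


def peak_concurrency(n_threads: int = 6, permits: int = 2) -> int:
    """Return the largest number of threads ever inside a semaphore-guarded region."""
    sem = threading.Semaphore(permits)  # at most `permits` threads may enter
    guard = threading.Lock()            # protects the bookkeeping below
    state = {"inside": 0, "peak": 0}

    def worker() -> None:
        with sem:
            with guard:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            time.sleep(0.01)            # pretend to use the shared resource
            with guard:
                state["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

With the lock in place no update is lost, so `locked_increment(4, 1000)` returns 4000. The semaphore generalizes the lock: `peak_concurrency(6, 2)` never exceeds 2, and a semaphore with one permit behaves like a mutex.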

Research problem

What is the research problem being addressed by the paper? How does this problem relate to past related work?

Problem being addressed

With the rise of multicore systems, multithreaded programs have become common, and they are prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are immature and proper fixes take time, this paper presents a new approach to working around races in deployed applications through the use of LOOM.

Past related work

Two common solutions for fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to the complexity of multithreaded applications. Releasing a reliable patch takes time, and developers often resort to more expedient fixes rather than placing proper locks in the application because of performance or work pressure.

Contribution

What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Current solution expressed

Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. It is compiled with a developer's application as a plugin and kept separate from the source code. The plugin injects the LOOM update into the application binary.

Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The code declaration used is easy to understand and can be applied to any code regions that need to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live.
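The idea of declaring regions mutually exclusive, instead of hand-writing lock and unlock calls, can be sketched in Python as follows. This is not LOOM's filter language or API: `MutexFilter`, `install`, `region` and the region names `"reader"` and `"writer"` are all hypothetical names invented for this illustration.

```python
import threading
import time
from contextlib import contextmanager


class MutexFilter:
    """Hypothetical filter: the named regions must never run at the same time."""

    def __init__(self, *region_names: str) -> None:
        self.regions = frozenset(region_names)
        self.lock = threading.RLock()  # one lock shared by every named region


_installed = {}  # region name -> the filter that currently covers it


def install(flt: MutexFilter) -> None:
    """Apply a filter to the running program."""
    for name in flt.regions:
        _installed[name] = flt


@contextmanager
def region(name: str):
    """Mark a code region; it is only serialized if a filter covers it."""
    flt = _installed.get(name)
    if flt is None:
        yield                # no filter installed: run unsynchronized
    else:
        with flt.lock:       # filter installed: regions exclude each other
            yield


def peak_overlap(n_threads: int = 4) -> int:
    """Return the largest number of threads ever inside the two regions at once."""
    guard = threading.Lock()
    state = {"inside": 0, "peak": 0}

    def body(name: str) -> None:
        with region(name):
            with guard:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            time.sleep(0.005)
            with guard:
                state["inside"] -= 1

    names = ("reader", "writer")
    threads = [threading.Thread(target=body, args=(names[i % 2],))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

After `install(MutexFilter("reader", "writer"))`, `peak_overlap()` returns 1, because the two regions can no longer overlap. The application code only marks its regions; the locking detail lives in the filter, which is the separation the paragraph above describes.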

LOOM is flexible in that developers can make trade-offs between performance and reliability in their application in conjunction with LOOM. These can include making two code regions mutually exclusive even when they access different objects or, as an extreme measure, making them run in single-threaded mode.

An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. Once the evacuation is complete, the execution filter is installed while the threads are paused at safe locations for the live update, and the threads are then resumed.
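The evacuate, install, resume sequence can be approximated with a small Python simulation. It is a heavily simplified sketch, not the paper's algorithm: there is no static analysis here, the only "safe location" is the entry to the region, and the class `Evacuator` and the function `demo` are hypothetical names.

```python
import threading
import time


class Evacuator:
    """Toy model: install a filter only when no thread is inside the region."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._inside = 0               # threads currently in the critical region
        self._updating = False         # True while a live update is in progress
        self.filter_installed = False
        self.inside_at_install = None  # recorded for checking the safety property

    def enter_region(self) -> None:
        with self._cond:
            while self._updating:      # safe location: pause outside the region
                self._cond.wait()
            self._inside += 1

    def exit_region(self) -> None:
        with self._cond:
            self._inside -= 1
            self._cond.notify_all()    # the installer may be waiting for zero

    def install_filter(self) -> None:
        with self._cond:
            self._updating = True      # stop new threads from entering
            while self._inside > 0:    # evacuate: wait for threads to leave
                self._cond.wait()
            self.inside_at_install = self._inside
            self.filter_installed = True
            self._updating = False
            self._cond.notify_all()    # resume the paused threads


def demo(n_workers: int = 3, rounds: int = 20) -> Evacuator:
    """Run workers through the region while a filter is installed live."""
    ev = Evacuator()

    def worker() -> None:
        for _ in range(rounds):
            ev.enter_region()
            time.sleep(0.001)          # work inside the critical region
            ev.exit_region()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    ev.install_filter()                # live update while workers are running
    for t in threads:
        t.join()
    return ev
```

The property to notice is that `inside_at_install` is always 0: the filter is never installed while a thread is partway through the region, which is the kind of new error the evacuation step is meant to avoid.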

LOOM's hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application's binary to anticipate dynamic updates.
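One way to picture "statically changing the binary to anticipate dynamic updates" is a pre-placed hook that does nothing until an update fills it in. The Python sketch below shows only that general idea; it is not how LOOM's engine is implemented, and `slot`, `update_slot` and `handle_request` are hypothetical names.

```python
from typing import Callable, Dict, List


def _noop() -> None:
    """Default hook body: does nothing, so the idle cost is one lookup and call."""


_slots: Dict[str, Callable[[], None]] = {}  # slot name -> currently installed hook


def slot(name: str) -> None:
    """Statically placed call site: runs whatever hook is installed right now."""
    _slots.get(name, _noop)()


def update_slot(name: str, hook: Callable[[], None]) -> None:
    """Dynamic update: change what an existing slot does without a restart."""
    _slots[name] = hook


def handle_request(trace: List[str]) -> None:
    """Application code; the slot was placed here ahead of time."""
    slot("handle_request:entry")
    trace.append("handled")
```

Before any update, `handle_request` behaves as if the slot were not there. After `update_slot("handle_request:entry", some_hook)`, every later call runs `some_hook` first, even though `handle_request` itself was never edited or reloaded.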

Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.

Why is it any better than what came before?

Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.

Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, this can sometimes be considered acceptable. However, server applications often do not have the luxury of being able to reboot because requests are coming from external sources and are expected to be processed.

While hot patches do not require a system reboot, they have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often, when one race condition is corrected via a hot patch, others appear. The main concern with hot patches, however, is that developing them is a time-consuming process, and until the patch is developed and deployed the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct, all the while the vulnerability was exposed to all Mozilla users.

Flaws common to both the system update and hot patch approaches are that they are very difficult to develop properly, slow to implement, and can result in potentially unsafe ad hoc solutions that are not scalable. Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.

Critique

What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad are not enough - you must explain why.

Good

The authors of this paper deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the writing back up the reliability of the paper and let the reader trace and check its sources.

The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The paper ends with a conclusion that does a good job of wrapping up the whole paper in a clear and concise manner.

Not-So-Good

One of the problems with this paper is that although many of the examples are simplified in order to expedite the reader's understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.

The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. There is a large amount of play-up for LOOM without much discussion of its possible problems. For example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.

References

You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.