COMP 3000 Essay 2 2010 Question 7

From Soma-notes

Paper

Ad Hoc Synchronization Considered Harmful

Weiwei Xiong University of California, San Diego

Authors:

Soyeon Park, Jiaqi Zhang, Yuanyuan Zhou,

University of Illinois at Urbana-Champaign.

Zhiqiang Ma,

Intel.

Background Concepts

Concurrent Programming

Concurrent programming is a style of programming where multiple threads run concurrently to perform a single task. These threads have shared data structures. This style of programming is particularly useful in multi-core systems, as the application can run multiple threads in parallel. One significant challenge of concurrent programming is coordinating the different threads, usually done through synchronization primitives.

Synchronization Primitives

Synchronization primitives act as barriers to memory that prevent threads from accessing the same shared resource concurrently8 and facilitate coordination between different threads. They come in many forms such as locks, mutexes, semaphores, and condition variables.

Locks maintain access to data and limit who has access when there are multiple threads. An example of a lock is a read/write lock, which only locks when a reader obtains access.

Mutexes are mutually exclusive locks that threads employ to lock a resource that they need. No other threads can access them at that point. Once they are finished, it frees the resource for access.

Condition variables are variables that will block the thread until a certain condition is met. This allows the thread to only execute when it is safe to perform its operation.

Synchronization primitives can be misused and lead to a host of other problems generally referred to collectively as race conditions.

Race conditions, deadlocks

Race conditions occur when two or more processes have access to a shared resource, and at least one of them has a write privilege. This leads to a process modifying the data that other processes are accessing. Because of this, it is possible for processes to read data that is either inaccurate or out of date. These errors occur during run-time, and have the potential for severe side effects, such as a system failure.

a Deadlock occurs when two or more processes are waiting for a response from each other. Since both are in wait mode, neither will either receive or send a response. This results in an infinite loop, causing the application to crash.

Both of these issues are inherent to concurrent programming. Although there are no general solutions for deadlocks9, there are suitable methods for dealing with them. As for race conditions, using mutual exclusive locks and synchronization primitives can be used for prevention. However, programmers are human (for now), and therefore these two issues are still prevalent in current software.

Ad Hoc Synchronization

Ad hoc synchronizations are synchronizations engineered for a specific task, and therefore are not standardized. All ad hoc synchronizations are implemented using loops. The exit conditions for these loops are in some way affected by sync variables, which are variables that are modified by a separate process or thread. Because of the specialization inherent to ad hoc synchronization, these loops are often hard to detect.

Research Problem

As the computer industry continues to shift towards multi-core processors, the need for concurrent programming and the use of multithreaded designs has increased. Because of this, many popular applications incorporate threads in their design. However, the concepts behind concurrent programming bring with them a host of potential dangers, such as race conditions and deadlocks. These usually result from bad programming practices, for example: mismanaging threads accessing shared memory. Fortunately, there are well known and standardized methods for dealing with these problems, i.e synchronization primitives. But in real world situations, programmers commonly want to handle synchronization in a more specific way, and so implement their own "ad hoc" synchronizations that do not meet common design standards. Ad hoc synchronizations are hard to recognize, and tend not to be well documented by the designer. Because of this, standard debugging tools have a hard time identifying ad hoc synchronizations. This is a problem, because ad hoc synchronizations are highly prone to cause bugs, bringing the application to a halt.

The paper we are discussing addresses these concerns in two regards. Firstly, it details a thorough study of ad hoc synchronizations, their nature, dangers, impact on bug detection tools and prevalence in several major open-source applications. Secondly, it introduces SyncFinder, a tool designed by the authors that detects and annotates ad hoc synchronizations in C/C++ code.

With detailed analysis of ad hoc synchronization and study of their occurrences in several applications, the research ultimately concludes that they are harmful and should be removed. At the same time, SyncFinder detects and documents ad hoc synchronizations in the source code, enabling programmers to easily track and remove them.

Contribution

With concurrent programming commonly used in modern applications, many issues arise due to data accesses occurring simultaneously. In order to maintain a concurrent system, synchronization is required to ensure that the executing tasks do not interfere with each other, without causing race conditions. However, many programmers do not use proper synchronization primitives to deal with these issues. Rather, they implement synchronizations in an ad hoc fashion. The paper we are discussing shows that ad hoc synchronizations, despite implemented as a solution to concurrency issues, are undesirable in a system. This paper details the characteristics of ad hoc synchronizations, as well as the issues associated with this programming construct. The paper then goes on to introduce a tool, entitled "SyncFinder", which they have designed to identify ad hoc synchronizations in C/C++ code.

Findings

In order to identify the characteristics of ad hoc synchronizations, 12 mainstream programs were analyzed. These programs were either of server, desktop or scientific type, including Apache, MySQL and Mozilla. Through manual inspection of the source code, these facts regarding ad hoc synchronizations were discovered:

1. In all programs studied, it was found that each contained numerous ad hoc synchronizations. The number of synchronizations found ranged from 6 to 83, with server-type programs containing the most. It is proposed that programmers tend to use ad hoc synchronization for two reasons:

  • In order to ensure processes execute in proper order, programmers will use ad hoc synchronization to ensure processes wait for their turn. The reason for using ad hoc synchronization over standardized mechanisms is to fully customize the handling of this execution order.
  • Some synchronization techniques introduce heavy-weight synchronization primitives. As such, programmers will use ad hoc synchronizations in order to improve performance.

2. It is often very hard to identify instances of ad hoc synchronization. They are hard to distinguish from other computational loops and as the implementations are diverse, there is no easy way to differentiate them from harmless code. As they are difficult to detect, they will go unnoticed by anyone modifying the code base. Since they are unaware of the ad hoc loop, they may unknowingly introduce bugs. As well, due to their stealthy nature, standard debugging tools have a hard time of identifying ad hoc synchronizations.

3. It was found that ad hoc synchronizations often introduce bugs into the system, such as deadlocks or hangs. Since these are different than those caused by locks and other synchronizations, it is hard for detection tools to recognize them.

4. As they are not easily recognizable, it is hard for bug detection tools to fix issues presented by ad hoc synchronizations. In fact, it is often the case that these tools either do not find these issues or report them as false positives, as the tool is unaware of the "work arounds" put into affect by the ad hoc implementation. Since they cannot find these problems, it severely impacts the effectiveness of such tools. This also impacts analysis of performance. Synchronization is quite costly and if a tool cannot recognize the form of synchronization, a false report is generated and the programmer will not be aware. This may cause poor decisions on the part of the programmer just from the fact that ad hoc synchronizations are hard to identify.

5. The reason ad hoc synchronizations are hard to identify stems from the fact that there is no single way of implementing them. Some typical characteristics of an ad hoc synchronization include:

  • These loops can contain one or multiple exit conditions. Some or all of these exit conditions may be satisfied by remote threads while others may be satisfied locally.
  • Often, exit conditions depend on sync variables (variables that are shared with other tasks).
  • In some cases, the synchronization does not wait idly, performing other computations while checking the sync variables periodically.

Despite the dangers of using ad hoc synchronization, programmers continue to use this method. Comments added by programmers indicate that their implementations are potentially unsafe, but proceed to use ad hoc synchronization techniques anyway, due to reasons outline above. A better practice of synchronization would be to replace ad hoc synchronizations with synchronization primitives (primitives already present in standard POSIX thread libraries). However, it is often difficult to replace ad hoc synchronizations with synchronization primitives and doing this may not fulfill the concerns presented in point 1.

SyncFinder

SyncFinder is a tool built and designed by the authors of the paper, for the purpose of identifying and annotating instances of ad hoc synchronization in concurrent programs built in C/C++. The main goal of this is to aid programmers in better handling synchronization, as well as assisting debugging tools in recognizing ad hoc synchronization. It has demonstrated itself to be very effective in this area where other similar tools have failed, as it analyzes the code in a unique way that specifically tracks down sync loops that implement ad hoc synchronizations.

How it works

There are two possibilities to consider when searching for ad hoc synchronizations. You can either analyze runtime traces via a dynamic method, or analyze the source code in a static method. Both methods carry with them a number of pros and cons. While a dynamic process is generally more accurate than a static method, there is a significant overhead, affecting performance. In addition to this, the dynamic method is somewhat limited as it can only detect ad hoc synchronizations demonstrated in the test cases. Taking these factors into consideration, the authors of the paper opted to pursue a static solution for achieving the goals set out for SyncFinder. SyncFinder uses the LLVM compiler infrastructure.

SyncFinder tracks down ad hoc synchronizations by executing the following steps:

1. Find Loops: An important characteristic of ad hoc synchronizations is that they are all implemented using loops, be they “for”, “while” or “go to” loops. These are generally referred to as "sync loops". The first step in identifying sync loops is identifying each loop and obtaining the exit conditions of the loop, using the LLVM infrastructure.

2. Identify Sync Loops: The next and most important step is to differentiate between sync loops used for ad hoc synchronization and regular computational loops. It does this by going through the following steps:

  • Exit Dependent Variable (EDV) Analysis: EDVs are variables that affect the exit conditions of a loop. A sync variable is a variable related to the synchronization of concurrent programs. Therefore, by identifying any EDVs as sync variables, it can be concluded that the loop is an implementation of ad hoc synchronization.
  • Pruning Computational Loops: If a loop has at least one sync condition, it is considered a sync loop. Otherwise, it is pruned out as a computational loop.
  • Pruning Condvar Loops: condvar loops are not considered sync loops. SyncFinder will go through all loop candidates and prune out any that calls cond_wait inside the loop.

3. Synchronization Pairing: The next step is to find the remote update that would release the sync loop. SyncFinder first finds all write instructions that would modify the sync variables. It then decides if the value that the write assigns to the sync variable would satisfy the exit condition. All those that do not are pruned. SyncFinder also prunes pairings that do not execute concurrently, as they would not affect each other. This is done conservatively due to the limitations of static analysis.

4. SyncFinder Annotation: After the initial set of loops found is reduced through the above process, the remaining loops are determined to be sync loops, and are suitably annotated. Marking the source code with LLVM’s static instrumentation framework, it allows for other tools to take advantage of SyncFinder’s findings in their own analysis.

Uses

SyncFinder is a robust tool that can be utilized in a variety of applications such as bug detection, performance profiling, and concurrency testing. Using its auto-annotation feature, it is capable of identifying sections of code that demonstrate bad programming practices, which could in turn cause issues such as deadlocks. In addition to this, the authors of the paper were able to expand upon the existing data race detector “Valgrind”, modifying it to take advantage of the annotation system introduced by SyncFinder. Through this, the authors were able to reduce the number of false positives flagged by the former, while being able to make use of the information it provides.

Accuracy

SyncFinder was tested against 25 concurrent programs that are commonly used. In testing, SyncFinder was able to positively identify 96% of ad hoc synchronizations within the tested programs. False positives were at a rate of only 6%. In further tests, they were able to utilize SyncFinder’s auto-annotation systems to locate and mark 5 deadlocks and 16 potential issues within Apache, MySQL and Mozilla, that had previously been missed by other analysis tools.

Related Work and similar tools

There has been attempts to remove synchronization issues entirely from concurrent programming, such as transactional memory1, a lock-free synchronization that does not require mutexes, and avoids having to use lock, unlock operations. Other attempts have been made to remove bugs that would otherwise be safe from data races but are are still at risk of unintended effects from thread interactions, such as Atomizer2, a dynamic atomicity checker.

There are tools that detect data races such as CHESS3, a dynamic data race checker that runs through all possible thread execution paths and CTrigger4, a tool that checks for atomicity violations. The problem with these programs is that they only look for standard synchronization methods and structures, such as lock() and cond_wait(). They are not looking for ad hoc synchronizations.

A similar tool to SyncFinder exists that can detect simple spinning, also an ad hoc synchronization5, but it only detects simple spinning and not the more complicated ad hoc variations.

Several studies on bug characteristics6 and concurrency bugs7 have been composed. This paper complements these studies to better understand the nature of ad hoc synchronizations and their occurrence in concurrent programs.

Critique

Style

There is some unnecessary repetition in two sections of the essay. In the "Contributions" section, they present the findings of their study. These findings are presented for a second time in the section that covers the characteristics of ad hoc synchronizations. Instead of essentially repeating themselves in two separate sections, it would have been better to refer to the previous section as they make their point.

Evaluation

The authors of the paper chose a mix of the leading concurrent open-source software programs10 to base their study on. They were chosen to represent different uses of applications for server, desktop and scientific applications. The number of ad hoc synchronizations were determined by two authors who reviewed the source code themselves. They were both experienced with the code base, but mistakes could have been made. General conclusions would be hard to draw from the limited data set, but the study gives strong indicators of ad hoc synchronizations characteristics and their effects based on evidence from the software tested.

SyncFinder has shown promise for the future of bug detection, as it has a high success rate in a previously unsuccessful field, on average finding 96% of ad hoc synchronizations. In addition, it can be used to improve other tools, shown when it reduced Valgrind data race checker's false positives by 43%-86%. On the downside, SynchFinder produces 6% false positives.However, this number is relatively small, and the reason for false positives is due to lack of source code on library functions and incorrect pointer aliasing. Even then, a programmer can then examine the returned ad hoc synchronizations to review them for false positives.

SyncFinder uses the static approach to finding ad hoc synchronizations by analyzing source code. A dynamic approach that uses run-time traces would be more accurate, but would carry a heavier computational load and would require a thorough run through of all possible test cases. Because a static approach naturally debugs a larger amount of the code, it is overall the better choice.

Conclusion

The paper's extensive examination of the previously unstudied ad hoc synchronizations concludes that: they are prevalent in today's concurrent software, they are problematic, and they should be avoided. The basis of their study is well supported with diverse programs, but could always have used more.

Despite the fact that it used a relatively small amount of applications as the basis for the study, the fact that it was able to detect ad hoc synchronizations in software they had not previously analyzed is very impressive. Remarkable still is that it uncovered several major bugs in popular applications that had previously gone undiscovered. The fact that ad hoc synchronizations are fairly sparse (86 being the most they found), means that the false positive rate is acceptable, as it only means the manual analysis of a few dozen loops at most. The static approach (as opposed to dynamic) allows the analysis of the entire code base. With a success rate of 96%, the possible loss of precision is acceptable.

Overall, both the study regarding the harm of ad hoc synchronizations and the way in which they are detected has been enlightening. It shows both the practices involved in specialized debugging tools, as well as the importance of standardized approaches to concurrency.

References

1 M Herlihy and J.E.B. Moss, 2NA0. Transactional Memory: Architectural Support for Lock-Free Data Structures. [online] Available at: <http://www.cs.brown.edu/~mph/HerlihyM93/herlihy93transactional.pdf> [Accessed 23 November 2010].

2 C Flanagan and S N Freund, 2NA0. Atomizer: A Dynamic Atomicity Checker For Multithreaded Programs (Summary). [online] Company(optional) Available at: <http://www.cs.williams.edu/~freund/papers/atomizer-padtad.pdf> [Accessed 23 November 2010].

3 T Ball,M Musuvathi and S Qadeer, 2NA0. CHESS: A Systematic Testing Tool for Concurrent. [online] Company(optional) Available at: <http://research.microsoft.com/pubs/70509/tr-2007-149.pdf> [Accessed 23 November 2010].

4 Park, Lu and Zhou, 2009. CTrigger: Exposing Atomicity Violation Bugs from Their Hiding Places. [online] University of Illinois at Urbana Champaign, Urbana, Available at: <http://pages.cs.wisc.edu/~shanlu/paper/asplos092-zhou.pdf> [Accessed 23 November 2010].

5 LI, T., LEBECK, A. R., AND SORIN, D. J. Spin detection hardware for improved management of multithreaded systems. IEEE Transactions on Parallel and Distributed Systems PDS-17, 6 (June 2006), 508–521.

6 Z Li, L Tan, X Wang, S Lu, Y Zhou, 2006. Have things changed now?: an empirical study of bug characteristics in modern open source software. Proc. of 1st Workshop on Architectural and System Support for Improving Software Dependability p.25-33 Available through CiteSeerX: <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.138.6982> [Accessed 23 November 2010].

7 Lu, Park, Seo and Zhou, 2010. Learning from Mistakes— A Comprehensive Study on Real World Concurrency Bug Characteristics. [online] University of Illinois at Urbana Champaign, Available at: <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.121.1203&rep=rep1&type=pdf> [Accessed 23 November 2010].

8 John H. Baldwin , 2002. Locking in the Multithreaded FreeBSD Kernel. [online] FreeBSD Available at: <http://www.usenix.org/events/bsdcon/full_papers/baldwin/baldwin_html/node5.html> [Accessed 23 November 2010].

9 Soma-notes, 2010. Basic Synchronization Principles. [online] Available at: <http://homeostasis.scs.carleton.ca/wiki/index.php/Basic_Synchronization_Principles> [Accessed 23 November 2010].

10 BuiltWith, 2010. Apache Usage Statistics. [online] Available at: <http://trends.builtwith.com/Web-Server/Apache> [Accessed 30 November 2010].