Basic Synchronization Principles: Difference between revisions
|  added part of Tackling Mutual Exclusion | m added note about cli and sti only being accessible from kernel | ||
| (11 intermediate revisions by the same user not shown) | |||
| Line 59: | Line 59: | ||
| void use_list() { | void use_list() { | ||
|      asm {        // disable interrupts |      asm {        // disable interrupts | ||
|          cli |          cli      // note: this instruction can only be executed from kernel code | ||
|      } |      } | ||
|      while (using_list) { | |||
|          asm {    // enable interrupts |          asm {    // enable interrupts, because access wasn't acquired this time | ||
|              sti |              sti  // note: this instruction can only be executed from kernel code | ||
|         } | |||
|         sleep(0);// go to sleep so that other threads will run | |||
|         asm {    // disable interrupts to test again | |||
|             cli | |||
|          } |          } | ||
|      } |      } | ||
|      using_list = 1; | |||
|     asm {        // enable interrupts, because this thread now has mutually exclusive access | |||
|         sti | |||
|      } |      } | ||
|      asm {        //  | |||
|     // use the list | |||
|     using_list = 0; | |||
| } | |||
| </pre></code> | |||
| ====Multi-Processor System==== | |||
| * For this, a special test-and-set instruction or a special swap instruction are needed. | |||
| * The operations need to be ''bus-locked'', which the processor does using its ''bus snooping'' mechanism, to make sure that ''cache coherence'' is maintained (e.g. all caches get the updated value of using_list before the operation is considered complete). | |||
| * An alternative approach to the one presented below, (which can still be implemented using test-and-set or swap), is known as ''spinlock'', which is to keep checking repeatedly until the thread gets access.  This works for cases of “low contention” i.e. when you won’t have to be checking too long, because another processor has the access, and it will only have access for a short period of time. | |||
| ''test-and-set'': sets (to 1) a bit and indicates whether it was set (1) or clear (0) before | |||
| ''swap'': one can swap a variable with the value 1, get back its previous value, and check whether it was 0 or 1 before | |||
| '''Using test-and-set:''' | |||
| <code><pre> | |||
| linked_list list; | |||
| int         using_list = 0; | |||
| void use_list() { | |||
|     asm { | |||
|     CheckForAccess: | |||
|         lock bts    using_list,0    ;bus-locked bit test-and-set on bit 0 of using_list | |||
|         jnc         GotAccess       ;got access if the bit was 0 at the beginning of the test-and-set | |||
|                                     ;previous value of bit is in carry flag, so no carry (nc) means bit was 0 before | |||
|     } | |||
|     sleep(0);                       // go to sleep so that other threads will run | |||
|     asm { | |||
|         jmp         CheckForAccess | |||
|     GotAccess: | |||
|     } | |||
|     // use the list | |||
|     using_list = 0; | |||
| } | |||
| </pre></code> | |||
| '''Using swap:''' | |||
| <code><pre> | |||
| linked_list list; | |||
| int         using_list = 0; | |||
| void use_list() { | |||
|      asm { | |||
|     CheckForAccess: | |||
|         mov         eax,1 | |||
|         lock xchg   using_list,eax  ;bus-locked exchange (swap) | |||
|         test        eax,eax         ;check if eax is 0 (i.e. check if using_list was 0) | |||
|         jz          GotAccess       ;got access if the using_list was 0 at the beginning of the exchange | |||
|     } | |||
|     sleep(0);                       // go to sleep so that other threads will run | |||
|     asm { | |||
|          jmp         CheckForAccess | |||
|     GotAccess: | |||
|      } |      } | ||
| Line 80: | Line 138: | ||
| </pre></code> | </pre></code> | ||
| ''' | Making a massively parallel system with separate memories isn’t hard, but making it work with shared memory is very difficult. | ||
| ==Abstractions and High-Level Problems== | |||
| Test-and-set and swap are quite low-level and painful to use in complex cases, so we want to wrap them in an abstraction. | |||
| ===Semaphore=== | |||
| A semaphore is the simplest abstraction.  It is a counter variable with two associated functions, ''up'' and ''down''. | |||
| The initial value of the counter variable indicates how many threads are allowed to gain access concurrently.  This initial value is 1 for a ''binary semaphore'', a.k.a. a ''mutex'', a.k.a a simple ''lock''. | |||
| ''down(sem)'': This atomically tries to decrement the counter, and if it can't (i.e. the counter was already 0), it waits until it can (i.e. until the counter is above 0 again).  down is called to gain access. | |||
| ''up(sem)'': This atomically increments the counter.  up is called to release access. | |||
| <code><pre> | |||
| linked_list list; | |||
| int         list_semaphore = 1; | |||
| void use_list() { | |||
|     down(list_semaphore);   // Gain access to the list | |||
|     // use the list | |||
|     up(list_semaphore);     // Release access to the list | |||
| } | |||
| </pre></code> | |||
| ===Producer-Consumer=== | |||
| The (Bounded-Buffer) Producer-Consumer Problem itself is more of an abstraction of problems rather than an abstraction of solutions, but its common solutions are abstractions as well.  Suppose that there is a thread (or multiple threads) producing items in a buffer, and a thread (or multiple threads) consuming items in a buffer.  An example of this is an assembly line (referring to manufacturing, not assembly language). | |||
| What needs to occur? | |||
| * The consumer needs to sleep when there is no work to be done (empty buffer), and wake up when there is once again work to be done (non-empty buffer). | |||
| * The producer needs to sleep when the buffer is full (it can't put any more in), and wake up when the buffer is no longer full (it can put more in again). | |||
| A classic bug is for either the producer or the consumer to never wake up after some point. | |||
| Modern OSs provide a simple solution to this problem, namely ''pipes''.  All of the sleeping and waking up is handled by the OS when reading/writing from/to a pipe.  All that the producer must do is write to the pipe, and all that the consumer must do is read from the pipe. | |||
| The following pipes (redirects) the output of ls to the file foo.txt: | |||
| <code>ls > foo.txt</code> | |||
| The following pipes foo.txt as the input of more: | |||
| <code>more < foo.txt</code> | |||
| The following pipes the output of ls to sort, whose output is piped to tr, whose output is piped to more: | |||
| <code>ls | sort | tr –d ‘_’ | more</code> | |||
| Why not let ls run to completion before calling sort?  What if ls never finishes?  Also, sort can be operating on a second processor while ls is still producing output. | |||
| ===Readers-Writers=== | |||
| The Readers-Writers Problem is a case in which: | |||
| * Any number of readers are allowed access concurrently, and | |||
| * Only one writer is allowed access concurrently (i.e. mutual exclusion for writers) | |||
| An example of this is airline reservation.  There is a database of seats sold, and one wants to allow as many people as possible to look at what seats are available, but only one person can buy a seat at the same time.  Then, as soon as someone buys a seat, the info needs to be updated as soon as possible to everyone. | |||
| '''A solution to problem:''' | |||
| When a writer wants to write: | |||
| * no new readers and no other writers | |||
| * wait for all readers to finish | |||
| * writer writes | |||
| * gives back access to readers and writers | |||
| '''A faster solution:''' (for cases where the data structure allows reading while writing) | |||
| When a writer wants to write: | |||
| * no other writers | |||
| * writer writes | |||
| * gives back access to writers | |||
| ===Message Passing=== | |||
| Producer-Consumer gets complicated with networking, because there are many separate Producer-Consumer relationships along the way from one process on one computer to another process on another computer.  This is a case where message passing is useful. | |||
| '''Network three-way handshake to open TCP connection:''' | |||
| {| border="1" | |||
| |- | |||
| ! Computer A | |||
| ! Messages | |||
| ! Computer B | |||
| |- | |||
| | closed | |||
| |  | |||
| | closed | |||
| |- | |||
| | half-open | |||
| | send SYN (synchronize) message → | |||
| | closed | |||
| |- | |||
| | half-open | |||
| |  | |||
| | half-open | |||
| |- | |||
| | half-open | |||
| | ← send SYN ACK (acknowledge) message | |||
| | half-open | |||
| |- | |||
| | open | |||
| |  | |||
| | half-open | |||
| |- | |||
| | open | |||
| | send ACK (acknowledge) message → | |||
| | half-open | |||
| |- | |||
| | open | |||
| |  | |||
| | open | |||
| |- | |||
| |} | |||
| This is robust (scalable to size of the internet), but slow, whereas shared memory isn’t scalable, but fast. | |||
| Still a lot of error cases to be handled: code you don’t want to mess with unless you really know what you’re doing. | |||
| ==Advanced Issues== | |||
| ===Deadlocks=== | |||
| A deadlock is when a set of threads get into a state where they all need access that other threads in the set have.  The simplest case is shown below. | |||
| <code><pre> | |||
| int semaphoreA = 1; | |||
| int semaphoreB = 1; | |||
| void thread1() { | |||
|     down(semaphoreA);   // Gain access to resource A | |||
|     down(semaphoreB);   // Gain access to resource B | |||
|     // use A and B | |||
|     up(semaphoreB);     // Release access to resource B | |||
|     up(semaphoreA);     // Release access to resource A | |||
| } | |||
| void thread2() { | |||
|     down(semaphoreB);   // Gain access to resource B | |||
|     down(semaphoreA);   // Gain access to resource A | |||
|     // use A and B | |||
|     up(semaphoreA);     // Release access to resource A | |||
|     up(semaphoreB);     // Release access to resource B | |||
| } | |||
| </pre></code> | |||
| A deadlock scenario with this example is when thread1 gets access to A, then thread2 gets access to B before thread1 gets access to B.  This means that thread1 is waiting for thread2 to release access to B, and thread2 is waiting for thread1 to release access to A, and neither will ever happen, thus the threads are said to be deadlocked. | |||
| In the case where there is a fixed set of controlled resources to be acquired by any given function, as long as all threads acquire them in the same order, a deadlock will be avoided (e.g. switch the first two lines of thread2).  However, this is not always possible in more complex situations. | |||
| It is always possible to identify deadlocks when using binary semaphores if one extends them to keep track of which thread has access and which threads are waiting for access.  Conceptually, the state of the thread interaction can be represented by a directed graph, in which the nodes are threads and there is an edge from thread1 to thread2 if thread1 wants access that thread2 has.  Then any directed cycles represent deadlocks, and the nodes in the cycle are the deadlocked threads.  Unfortunately, there is no single way to avoid deadlocks in all cases.  Also note that identifying deadlocks is not the same as predicting them, which can be as difficult as avoiding them. | |||
| ===Starvation=== | |||
| Starvation is a more general situation than deadlock, in that deadlock is a type of starvation.  Starvation refers to any situation in which one or more threads end up waiting either indefinitely long or just unreasonably long in order to perform a particular operation. | |||
| An example of starvation that is not what is normally referred to as deadlock, is when a thread gets access to a resource and then because of scheduling or other decisions (beyond its reasonable control), it can never release access to the resource (or at least it holds it for a long time). | |||
| Consider three threads (on a single-processor system): thread1 is an idle priority thread (only run when nothing else is running), and both thread2 and thread3 are medium-to-high priority threads.  Then the following could occur: | |||
| # thread1 acquires mutually exclusive access to a resource | |||
| # thread2 starts running an intense computation that could take days, so thread1 is no longer running | |||
| # thread3 tries to acquire access to the resource that thread1 has | |||
| # thread1 and thread3 don't run again until thread2 has stopped, which could be a while | |||
| This example type of starvation can (almost) always be avoided if the operating system manages the thread concurrency.  Using the same extension of a binary semaphore as mentioned in the deadlocks section, the operating system can know that thread3 is waiting for thread1, and as such, thread1 should be given the higher priority of thread3 until it releases the access, at which point it returns to its original priority.  This is still fair, because thread3 depends upon thread1 releasing access before it can continue, and the starvation issue is solved. | |||
| However, there is no general solution to starvation problems, since they can be arbitrarily complex.  (Also, the lack of a general solution for deadlocks implies that there is a lack of a general solution for starvation, since deadlocks are a type of starvation.) | |||
Latest revision as of 16:45, 21 October 2007
This article discusses approaches and issues involved with managing concurrency. The objective is to be able to do multiple things at the same time (i.e. using multiple threads), while avoiding coordination problems.
Three Concurrency Problems to Address
Synchronization
Synchronization, a.k.a. serialization, is making tasks run in a serial sequence (one at a time) instead of all at once, because of dependencies between the tasks. An example of this is that when building a house the person building basement needs to tell person building ground floor when the basement is done, so that work on the ground floor can start.
In a more complex case, it may be that parts of the ground floor are only dependent on parts of the basement, so the person building the ground floor may be able to begin work sooner. This more complex case is related to the Producer-Consumer pattern discussed below.
Mutual Exclusion
Mutual exclusion is controlling access to shared resources. Continuing with the construction example, only one person can use the same drill at the same time, or things get ugly. Likewise, things like a disk or a data structure often require a limit of one thread accessing it at a time to ensure the desired effect.
Data Sharing
Data sharing, which is closely related to mutual exclusion, is managing the duplication of data structures or devices. These duplicated resources would in turn send updates to a manager of sorts, and would be informed of updates by the manager.
An example of this is what Massively-Multiplayer Online (MMO) games do to avoid certain problems associated with network lag (the delay between sending a notification of action to the server and receiving the result of that action). If this was done with all world data processing done on the server, a player wouldn't know whether or not an opponent had been defeated until the network responds with the result. With data sharing, the client program can immediately determine, based on its copy of the local environment, that it is likely that the opponent has been defeated, and then receive either confirmation or rejection of that assumption when the server responds with its "official version" of events.
Message Passing is this way of managing duplication of state via coordinated communication between copies. Shared Memory is a way to simulate duplication of state in a way that in some circumstances, can allow concurrent access, and generally access with less overhead than message passing.
Tackling Mutual Exclusion
A Broken Attempt
linked_list list;
int         using_list = 0;
void use_list() {
    while (using_list) {
        // Just waiting here, so nothing goes in the loop
        // Could sleep, but this is a broken attempt anyway
    }
    using_list = 1;
    // use the list
    // this part of the code is known as a critical region
    using_list = 0;
}
What's the problem?
TOCTOU: Time Of Check to Time Of Use
If the thread went to sleep immediately before running the “using_list = 1;” line, another thread might also get access to the list, since using_list is still zero.
Fixing the Broken Attempt
Atomic operation: an operation that cannot be “split apart”
One machine instruction can be made atomic easily, but any more are not guaranteed to be atomic in hardware. As such, special machine instructions are needed to handle this.
Single-Processor System
- One can simply disable interrupts while acquiring access, since then no threads can jump in between the checking of using_list and the setting of using_list.
- Interrupts should only be disabled for so long, because devices depend on the system reacting to interrupts.
linked_list list;
int         using_list = 0;
void use_list() {
    asm {        // disable interrupts
        cli      // note: this instruction can only be executed from kernel code
    }
    while (using_list) {
        asm {    // enable interrupts, because access wasn't acquired this time
            sti  // note: this instruction can only be executed from kernel code
        }
        sleep(0);// go to sleep so that other threads will run
        asm {    // disable interrupts to test again
            cli
        }
    }
    using_list = 1;
    asm {        // enable interrupts, because this thread now has mutually exclusive access
        sti
    }
    // use the list
    using_list = 0;
}
Multi-Processor System
- For this, a special test-and-set instruction or a special swap instruction are needed.
- The operations need to be bus-locked, which the processor does using its bus snooping mechanism, to make sure that cache coherence is maintained (e.g. all caches get the updated value of using_list before the operation is considered complete).
- An alternative approach to the one presented below, (which can still be implemented using test-and-set or swap), is known as spinlock, which is to keep checking repeatedly until the thread gets access. This works for cases of “low contention” i.e. when you won’t have to be checking too long, because another processor has the access, and it will only have access for a short period of time.
test-and-set: sets (to 1) a bit and indicates whether it was set (1) or clear (0) before swap: one can swap a variable with the value 1, get back its previous value, and check whether it was 0 or 1 before
Using test-and-set:
linked_list list;
int         using_list = 0;
void use_list() {
    asm {
    CheckForAccess:
        lock bts    using_list,0    ;bus-locked bit test-and-set on bit 0 of using_list
        jnc         GotAccess       ;got access if the bit was 0 at the beginning of the test-and-set
                                    ;previous value of bit is in carry flag, so no carry (nc) means bit was 0 before
    }
    sleep(0);                       // go to sleep so that other threads will run
    asm {
        jmp         CheckForAccess
    GotAccess:
    }
    // use the list
    using_list = 0;
}
Using swap:
linked_list list;
int         using_list = 0;
void use_list() {
    asm {
    CheckForAccess:
        mov         eax,1
        lock xchg   using_list,eax  ;bus-locked exchange (swap)
        test        eax,eax         ;check if eax is 0 (i.e. check if using_list was 0)
        jz          GotAccess       ;got access if the using_list was 0 at the beginning of the exchange
    }
    sleep(0);                       // go to sleep so that other threads will run
    asm {
        jmp         CheckForAccess
    GotAccess:
    }
    // use the list
    using_list = 0;
}
Making a massively parallel system with separate memories isn’t hard, but making it work with shared memory is very difficult.
Abstractions and High-Level Problems
Test-and-set and swap are quite low-level and painful to use in complex cases, so we want to wrap them in an abstraction.
Semaphore
A semaphore is the simplest abstraction. It is a counter variable with two associated functions, up and down.
The initial value of the counter variable indicates how many threads are allowed to gain access concurrently. This initial value is 1 for a binary semaphore, a.k.a. a mutex, a.k.a a simple lock.
down(sem): This atomically tries to decrement the counter, and if it can't (i.e. the counter was already 0), it waits until it can (i.e. until the counter is above 0 again). down is called to gain access. up(sem): This atomically increments the counter. up is called to release access.
linked_list list;
int         list_semaphore = 1;
void use_list() {
    down(list_semaphore);   // Gain access to the list
    // use the list
    up(list_semaphore);     // Release access to the list
}
Producer-Consumer
The (Bounded-Buffer) Producer-Consumer Problem itself is more of an abstraction of problems rather than an abstraction of solutions, but its common solutions are abstractions as well. Suppose that there is a thread (or multiple threads) producing items in a buffer, and a thread (or multiple threads) consuming items in a buffer. An example of this is an assembly line (referring to manufacturing, not assembly language).
What needs to occur?
- The consumer needs to sleep when there is no work to be done (empty buffer), and wake up when there is once again work to be done (non-empty buffer).
- The producer needs to sleep when the buffer is full (it can't put any more in), and wake up when the buffer is no longer full (it can put more in again).
A classic bug is for either the producer or the consumer to never wake up after some point.
Modern OSs provide a simple solution to this problem, namely pipes. All of the sleeping and waking up is handled by the OS when reading/writing from/to a pipe. All that the producer must do is write to the pipe, and all that the consumer must do is read from the pipe.
The following pipes (redirects) the output of ls to the file foo.txt:
ls > foo.txt
The following pipes foo.txt as the input of more:
more < foo.txt
The following pipes the output of ls to sort, whose output is piped to tr, whose output is piped to more:
ls | sort | tr –d ‘_’ | more
Why not let ls run to completion before calling sort? What if ls never finishes? Also, sort can be operating on a second processor while ls is still producing output.
Readers-Writers
The Readers-Writers Problem is a case in which:
- Any number of readers are allowed access concurrently, and
- Only one writer is allowed access concurrently (i.e. mutual exclusion for writers)
An example of this is airline reservation. There is a database of seats sold, and one wants to allow as many people as possible to look at what seats are available, but only one person can buy a seat at the same time. Then, as soon as someone buys a seat, the info needs to be updated as soon as possible to everyone.
A solution to problem:
When a writer wants to write:
- no new readers and no other writers
- wait for all readers to finish
- writer writes
- gives back access to readers and writers
A faster solution: (for cases where the data structure allows reading while writing)
When a writer wants to write:
- no other writers
- writer writes
- gives back access to writers
Message Passing
Producer-Consumer gets complicated with networking, because there are many separate Producer-Consumer relationships along the way from one process on one computer to another process on another computer. This is a case where message passing is useful.
Network three-way handshake to open TCP connection:
| Computer A | Messages | Computer B | 
|---|---|---|
| closed | closed | |
| half-open | send SYN (synchronize) message → | closed | 
| half-open | half-open | |
| half-open | ← send SYN ACK (acknowledge) message | half-open | 
| open | half-open | |
| open | send ACK (acknowledge) message → | half-open | 
| open | open | 
This is robust (scalable to size of the internet), but slow, whereas shared memory isn’t scalable, but fast. Still a lot of error cases to be handled: code you don’t want to mess with unless you really know what you’re doing.
Advanced Issues
Deadlocks
A deadlock is when a set of threads get into a state where they all need access that other threads in the set have. The simplest case is shown below.
int semaphoreA = 1;
int semaphoreB = 1;
void thread1() {
    down(semaphoreA);   // Gain access to resource A
    down(semaphoreB);   // Gain access to resource B
    // use A and B
    up(semaphoreB);     // Release access to resource B
    up(semaphoreA);     // Release access to resource A
}
void thread2() {
    down(semaphoreB);   // Gain access to resource B
    down(semaphoreA);   // Gain access to resource A
    // use A and B
    up(semaphoreA);     // Release access to resource A
    up(semaphoreB);     // Release access to resource B
}
A deadlock scenario with this example is when thread1 gets access to A, then thread2 gets access to B before thread1 gets access to B. This means that thread1 is waiting for thread2 to release access to B, and thread2 is waiting for thread1 to release access to A, and neither will ever happen, thus the threads are said to be deadlocked.
In the case where there is a fixed set of controlled resources to be acquired by any given function, as long as all threads acquire them in the same order, a deadlock will be avoided (e.g. switch the first two lines of thread2). However, this is not always possible in more complex situations.
It is always possible to identify deadlocks when using binary semaphores if one extends them to keep track of which thread has access and which threads are waiting for access. Conceptually, the state of the thread interaction can be represented by a directed graph, in which the nodes are threads and there is an edge from thread1 to thread2 if thread1 wants access that thread2 has. Then any directed cycles represent deadlocks, and the nodes in the cycle are the deadlocked threads. Unfortunately, there is no single way to avoid deadlocks in all cases. Also note that identifying deadlocks is not the same as predicting them, which can be as difficult as avoiding them.
Starvation
Starvation is a more general situation than deadlock, in that deadlock is a type of starvation. Starvation refers to any situation in which one or more threads end up waiting either indefinitely long or just unreasonably long in order to perform a particular operation.
An example of starvation that is not what is normally referred to as deadlock, is when a thread gets access to a resource and then because of scheduling or other decisions (beyond its reasonable control), it can never release access to the resource (or at least it holds it for a long time).
Consider three threads (on a single-processor system): thread1 is an idle priority thread (only run when nothing else is running), and both thread2 and thread3 are medium-to-high priority threads. Then the following could occur:
- thread1 acquires mutually exclusive access to a resource
- thread2 starts running an intense computation that could take days, so thread1 is no longer running
- thread3 tries to acquire access to the resource that thread1 has
- thread1 and thread3 don't run again until thread2 has stopped, which could be a while
This example type of starvation can (almost) always be avoided if the operating system manages the thread concurrency. Using the same extension of a binary semaphore as mentioned in the deadlocks section, the operating system can know that thread3 is waiting for thread1, and as such, thread1 should be given the higher priority of thread3 until it releases the access, at which point it returns to its original priority. This is still fair, because thread3 depends upon thread1 releasing access before it can continue, and the starvation issue is solved.
However, there is no general solution to starvation problems, since they can be arbitrarily complex. (Also, the lack of a general solution for deadlocks implies that there is a lack of a general solution for starvation, since deadlocks are a type of starvation.)