Operating Systems 2022F Lecture 14

From Soma-notes

Video

Video from the lecture given on November 3, 2022 is now available:

Video is also available through Brightspace (Resources->Zoom meeting->Cloud Recordings tab)

Notes

Lecture 14
----------

Assignment 3 has not yet been released
 - hoping to post tomorrow
 - due date will be 11th at earliest, will update once assignment is posted

Tutorial 7 is *almost* ready, that will be out by tomorrow definitely
 - you can start looking at it now if you want

So tutorial due dates move with the assignment due dates
 - so you have some more time on 5 and 6

A3 is based on T5 and T6, NOT T7

Deadlock
--------
 - one class of concurrency bugs
 - see the textbook for others

What is deadlock?
 - a situation in which progress on a computation stops because everyone is waiting on a resource that will never become available, because another part of the system is holding onto it

Dining philosophers problem
 - five philosophers sitting around the table
 - have food in front of them that they need 2 utensils to eat
 - but there are only 5 utensils at the table
 - problem: how do we make sure that the philosophers eat?

What algorithm works?

Think of the basic approach:
 - grab the utensil on the left
 - grab the one on the right
 - eat
 - put utensil on the right down
 - put utensil on the left down

Why can this result in everyone starving?

If everyone tries to eat at the same time, they will all grab their left utensil
and will never put it down

What if we add a one second timeout?  (i.e., if you don't get the right utensil in one second you put down the left one)
 - but then you get everyone picking up and putting down the left utensil in unison, they all still starve

You can add a random timeout instead; that mostly works
 - but the pauses can go on for arbitrarily long

Dining philosopher's problem results in deadlock because four conditions all hold:
 - mutual exclusion (only one has a utensil at a time)
 - hold-and-wait (holding on to some resources when they don't have all that
   they need to eat)
 - no preemption (no stealing utensils)
 - circular wait (they are sitting in a circle)

All these have to hold to get deadlock, break any one of them and you can avoid it

The trick with solutions to this problem isn't to avoid deadlock, it is to avoid deadlock *efficiently*
 - minimize delays due to resource contention


Classic way to avoid deadlock issues is to just have one master lock
 - used by old versions of Linux, still in place (I think) in the standard Python runtime

But if you do this, then you minimize parallelism
 - only one lock means every time a shared resource is needed, we're down to a single thread running

In a parallel system, you want all parts of the system to keep running all the time
 - waiting on other threads/processes reduces overall performance
 - how to minimize the waiting for resources (resource contention)?

Temptation is to make LOTS of locks, one for each resource
 - but the more locks we add, the higher risk of deadlock

Simplest solution to avoid deadlock is to always allocate resources (locks) in a fixed order
 - A, B, C but never A, C, B
 - avoids circular dependence

But what in your code enforces that you grab locks always in the same order?
NOTHING

Why is there so much discussion of deadlock in concurrency theory and in OS courses?
 - easy to formalize
 - many possible algorithms

In practice, we add in timeouts and try to allocate resources in a fixed order

In Tutorial 6:
 - 3000pc-fifo uses pipes, works
 - 3000pc-rendezvous uses shared memory
    - is fast, but deadlocks
 - 3000pc-rendezvous-timeout uses shared memory plus timeouts
    - not as fast because it pauses randomly, but works

Note that the shared memory versions use functions from the pthread library
(which is specially linked in).  pthread is a standard UNIX library (part of POSIX), but it is implemented differently on different systems, because system calls related to concurrency aren't standardized
 - note the clone system call, used to make threads, is not standard, it is Linux specific

Why does 3000pc-rendezvous deadlock?  What is happening?
 - I didn't realize this would happen when I made it
 - 3000pc-rendezvous-timeout was a bugfix!

So what actually happens?
 - both the producer and consumer go to sleep at the same time

Compare 3000pc-rendezvous and 3000pc-rendezvous-timeout
 - logic in 3000pc-rendezvous is much simpler, but it deadlocks
 - all 3000pc-rendezvous-timeout does is add timeouts to waiting for all locks and condition variables, because sometimes things go wrong

spinlocks are a kind of mutex which "busy waits":
 - when waiting for the lock to be available, just do while(1)
    - so CPU is "spinning" while waiting for the lock to be available
 - not so efficient, but most efficient strategy when wait should be short
 
But if the wait is going to be longer, you don't want to consume CPU time doing nothing - that just makes your CPU hot for no good reason

Instead, we want to sleep while waiting, to get woken up when the resource is available

The mutexes we are using sleep rather than busy wait

Why not always sleep?  For that we have to understand CPU scheduling

When you look at the STAT column in ps, you'll see many letter codes
 - S: sleeping
 - R: running
 - I: idle kernel thread
 - Z: zombie (terminated, waiting for parent to call wait())

What is the difference between a process sleeping and running?
 - running: it is executing instructions on a CPU core
 - sleeping: it is waiting to execute instructions, it isn't doing anything now



process executing
 - in running mode
 - on a CPU core in user mode
 - registers hold process's state

runnable process
 - could be on a CPU or could be waiting
 - is not waiting on any events, just needs to get back on the CPU

process sleep
 - in sleep mode
 - NOT on a CPU core, instead state is stored in process data structure
   in kernel
   - thus has no state in registers
 - waiting for an event to happen


Process states are just an abstraction over how the CPU is shared between processes
 - a CPU core can be executing a process OR it can be executing kernel code
 - when executing a process, it is in user mode
 - when executing the kernel, it is in supervisor mode

How does the CPU switch between running a process in user mode and the kernel in supervisor mode?

One way: the process makes a system call
 - syscall instruction causes CPU to switch to kernel mode, jump to
   system call handler routine in the kernel and execution proceeds from there

BUT, what if a process never makes a system call?
 - say if it is just crunching some data?

Interrupts are events generated by the hardware that have to be handled by the kernel
 - hardware generates interrupt
 - CPU switches to supervisor mode, handles the interrupt event

What sort of hardware generates interrupts? Well, basically everything except RAM (and maybe even RAM on some systems, if it detects errors)
 - keyboard, mouse
 - CPU (divide by zero, syscall instructions, etc)
 - SSD
 - sound, video
 - network
 - timers  <--- very important

EVERY time you hit a key, some part of the system has to handle the keypress
 - to a first approximation, every keypress generates an interrupt

Basically when a device needs attention from the CPU, it sends the CPU an interrupt
 - the interrupt causes the CPU to stop what it is doing and run code
   registered as the designated handler for that interrupt

signals are like interrupt handlers for processes
 - but it is a metaphorical connection, a signal is an OS abstraction
   while interrupts are a feature of the hardware


Fun fact
 - on old macs (before OS X), if you held the mouse button down
   it wouldn't be able to receive any network traffic and in fact all
   running programs would pause


I've discussed here how we go from running a process to running the kernel
 - basically, interrupts

How do we get from the kernel to running a process?  that's scheduling, and
is for next time!