Operating Systems 2020W Lecture 19

From Soma-notes

Video

The video from the lecture given on March 20, 2020 is now available.

Notes

Lecture 19
----------

Topics
* questions about exam format
* update on Assignment 4
* questions on Assignment 3
* Deadlock
* Tutorial 9
* Scheduling

Final exam
 - open book/note/internet
 - 2 hours
 - will post PDF of questions, 2 hours later you should upload solutions
   to cuLearn
 - will be available via PM for questions during the exam
 - after the exam is graded, will do follow-up interviews
   (some random, some targeted) to ensure exams match with your knowledge
   5-10 minutes, if you did the work on your own nothing to worry about
   will contact via Discord and email
 - collaboration is NOT ALLOWED during the final


Critical sections
 - code accessing shared resource
 - if resource is accessed by other code while in a critical section
   resource can be corrupted
 - semaphores are used to protect critical sections
    - binary semaphores are so often used for this they are also known as
      mutexes (for enforcing mutual exclusion)
 - mutual exclusion - making sure only one thread/process is in the critical
   section at a time (accessing shared resource at the same time)
 - also exist higher-level ways to protect critical sections, e.g.
   Java synchronized blocks

The lock in each entry in 3000pc-rendezvous.c is there to enforce mutual
exclusion when modifying that entry
 - only the producer or consumer can change the word, never both
   at the same time
 - in practice, because we are doing so little work,
   very hard to catch the producer and consumer editing the word at the same
   time (possible but unlikely)
  
Deadlock: two or more processes/threads waiting on each other such that
they will wait forever (dead => no progress on computation)

mutual exclusion => prevent corruption
avoiding deadlock => preventing computation from getting stuck

Efforts to enforce mutual exclusion can result in deadlock if you aren't
careful.

Necessary conditions for deadlock
 - mutual exclusion
 - resource holding (hold and wait)
 - no preemption
 - circular wait

Note deadlock can happen even without semaphores or shared data, just needs concurrency
 - Example: web application where web server is waiting on client to send data, while web client is waiting on the server to send data
     - well, deadlock if neither ever times out

Explaining deadlock with the dining philosophers problem
 - how do you make sure nobody starves?
 - 5 mathematicians, 5 chopsticks
 - need two chopsticks to eat
 - not going to talk to the other mathematicians about food,
   just going to talk math
 - what algorithm should they use for grabbing chopsticks?

Obvious solution:
 - grab one on the left if it is there, then grab one on the right
 - this is BAD, why?
 - if everyone grabs the left chopstick, NOBODY will eat,
   because nobody can grab the right chopstick
 - really want to grab 2 or grab 0, but how?
 - ok, use a timeout...but if the timeout is fixed, you can get oscillation
   - technically this is livelock I think, but clearly it is bad
 - may be simpler to just kill one mathematician
   - or get one more chopstick
   - then you have at least two chopsticks for somebody, always
 
Approaches to making sure you don't have deadlocks
 - ignore (i.e., I'll reboot when it happens)
 - prevention
    - making sure the four conditions never hold, e.g., make sure
      no circular dependencies
        - e.g., always allocate resources in the same order
	- may stop deadlock but may lead to starvation (some
	  not progressing in computation)
    - or, being careful with resource allocation so it never results in
      deadlock
       (this last one is the focus of much deadlock research)
 - detection/recovery
    - timeouts, so when a process has been waiting too long, "restart" in
      some way to get back to a sane state
    
Watchdog timers are a general way to deal with deadlock
 - have a part of the system "press the red button" periodically
 - some part is guaranteed to be watching for the periodic button
   presses
 - if the button isn't pressed, reboot
 - "dead man's switch"

A kernel oops is the kernel's way of dealing with memory access errors
 (what would cause a segfault in userspace)

Think of a synchronized block in Java
 - grabs semaphore at start of block
 - releases at end
 - so, only one thread can be in block at the same time
 - key advantage: you don't have to remember to grab and release
   semaphores; mistakes there are a huge source of errors

Scheduling
 - problem: how do we share the CPU between processes and threads?
 - with memory, we can just split it up between processes
 - but with the CPU, they have to take turns.  How?

The scheduling problem is
 - how to do turn taking between processes & threads
 - when to do turn taking
 
Remember the CPU runs in (at least) two modes
 - user mode: restricted access to resources
 - supervisor mode: full access to resources

Processes run in user mode, the kernel runs in supervisor mode

When we make a system call, it causes the CPU to switch into supervisor mode
and run the system call dispatcher in the kernel

Scheduling happens when the kernel finishes handling a system call
or handling an interrupt
 - it happens when the kernel wants to return to userspace
 - happens on the supervisor->user mode switch
 - so, what process or thread should run on this switch?
   - it normally isn't what got us into the kernel in the first place
   - so, returning from a system call is very different from returning from
     a function

So, one big determinant of what to run is a process's priority
 - static priority, i.e. "niceness value"
 - dynamic priority

Static priorities never change, dynamic ones do all the time
 - adjusted based on recent history
 - if a process has run recently, its dynamic priority gets "lowered"
   (which means a bigger value) so it is less likely to run next
 - if a process hasn't run recently, its dynamic priority gets "increased"
   (lowered value)

We have to constantly adjust dynamic priorities because everyone deserves
some resources
 - no process should be starved of CPU time
 - should be "fair"

If you're giving out pizza to a school
 - static priority: how many slices everyone bought
 - dynamic priority: have they gotten their pizza yet?
   - even if someone bought 3 slices, and they ask for more...
     they may have already have had three slices and
     another kid only ordered one slice but hasn't gotten any

We can change static priorities from userspace;
dynamic priorities are purely controlled by the kernel

What is the algorithm for going from priorities to actual
scheduling decisions?
 - COMPLICATED, and keeps changing
 - turns out scheduling in practice is a very delicate art
 - challenge: many kinds of workloads

Contrast playing a game on your own machine with a web server handling
thousands of requests while a background indexer runs
 - what should be given priority isn't always clear from the kernel's perspective
 - sometimes "fair" isn't what the user really wants
    - better to run the game than the background AV program!
 - gets really complicated when you take into account I/O usage as well as
   CPU time

Scheduling is kernel version specific...highly variable

Some processes are CPU bound (i.e., calculating digits of pi) while
others are I/O bound (scanning files for malware).

Some are interactive (a game) versus non-interactive (AV scanner)

I/O bound processes should normally get CPU time whenever they want it
 - most of the time, they are waiting on outside activity

But interactive programs should be prioritized
 - so users get responsive systems

But we also want high throughput
 - make programs complete their work faster

There is a cost to switching which program is running, so can't just
switch things around constantly.

No clear winner because metrics change, different users have different goals

Nowadays we care about power which makes scheduling even more complicated!