Operating Systems 2020W Lecture 18
Video
The video for the lecture given on March 18, 2020 is now available.
Notes
Lecture 18
----------
Topics
* race condition deadlock in 3000pc-rendezvous.c
* hints for /dev/describe, Assignment 3 Q5
- how to allow writing
- how to store the pid
- how to get task info
* containers
* scheduling
Yes, Assignment 3 is due on March 25th, NOT March 20th!
(changed when the assignment was finalized)
Assignment 4 will be out by Friday, due last day of class
(you just have to submit multiple choice answers)
Will discuss final in a bit
So the bug in 3000pc-rendezvous.c is that communication isn't
reliable
- the consumer can send a wakeup message to the producer, and
the producer won't get it because it wasn't waiting for the message
(and vice versa)
- solution is to add a timeout so messages can be re-sent
- this is very real world
- will try to post a better solution soon! But use the posted version of
3000pc-rendezvous for your assignment
- in the meantime, your modified version should ALSO deadlock
- (notice the fifo version is much more reliable!)
- removing fprintf's in wait_for_consumer() and wait_for_producer() should
make 3000pc-rendezvous.c much more reliable (but perhaps not deadlock free)
- because the fprintf's make system calls, which introduce enough
  delay (and a context switch) before the wait begins that
  the wakeup message sent via the condition variable is lost
- in practice, never assume perfectly reliable communication
with concurrency, always plan for lost messages with timeouts
I should discuss deadlock in more detail in a later lecture!
- remind me!
Are timeouts the only way to deal with lost messages?
- basically, yes
- ok, you could just send multiple, redundant messages,
but that has its own problems
- this is how we do it normally, especially over networks
Hints for /dev/describe
- there is a kernel routine for converting strings to integers, kstrtol();
  you'll need it to turn the string written by the user into a PID
- copy the read function to make a write function
- same function signature, except buf becomes a const pointer
- add write function to the device file ops struct
- make the device file writable (change 0444 to 0666)
- note the leading 0 in C means the value is octal, not decimal
- 0x in front means hexadecimal
- to get the uid, gid, look at how getuid, getgid work
- getuid, getgid look weird
- this is because they have been generalized to work with
namespaces that are used for containers
(e.g., containers each have their
own range of process IDs, uids, gids, etc)
- will explain containers in a bit
- use task_uid(), task_euid() macros to find out the uid, gid
associated with a task
- still have to "munge" (transform) as getuid does
- kernel maintains internal pid's that are different from userspace
ones, needed to support containers
- so before sending a pid to userspace, we have to transform it
- on VMs you probably don't need it since we aren't using
containers, so just try and see what works
- e.g., try "from_kuid_munged(current_user_ns(), task_uid(task))"
to get the uid
- to get a task from a PID, use pid_task(pid, PIDTYPE_TGID) <-- not quite
- will confirm after lecture
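Putting the hints together, the write handler might look roughly like the sketch below. This is NOT a confirmed solution: describe_write is an invented name, the task-lookup call is the part flagged as "not quite" above, and strictly the lookup should be inside rcu_read_lock()/rcu_read_unlock().

```
/* Sketch only -- names are invented and the task lookup may need
 * adjusting (see the "not quite" note above). */
static ssize_t describe_write(struct file *filp, const char __user *buf,
                              size_t count, loff_t *ppos)
{
        char kbuf[16];
        long pid_nr;
        struct task_struct *task;

        if (count >= sizeof(kbuf))
                return -EINVAL;
        if (copy_from_user(kbuf, buf, count))
                return -EFAULT;
        kbuf[count] = '\0';

        if (kstrtol(kbuf, 10, &pid_nr))         /* string -> long */
                return -EINVAL;

        /* look up the task for this PID in the current namespace */
        task = pid_task(find_vpid(pid_nr), PIDTYPE_PID);
        if (!task)
                return -ESRCH;

        /* munge the kernel uid into a userspace-visible one, as
         * getuid does -- you'd stash this for the next read */
        pr_info("uid: %u\n",
                from_kuid_munged(current_user_ns(), task_uid(task)));

        return count;
}
```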
I don't expect you to understand everything with Q5
- I definitely don't!
But this gives you some experience in looking through the Linux
kernel source and trying to figure out how things work
- so many abstractions!
First, process groups, then containers
Process groups are based on a simple issue
- when a user logs out, what processes should be terminated?
- can't kill all of the processes belonging to the user,
maybe they are logged in multiple times or
they are running background processes that could be running for
days or longer
- process groups allow the system to know what processes should
be terminated - just kill everything in the process group
Similarly, processes are groups of threads
- so when you do things with a thread,
you have to keep track of the other threads
tgid in kernel: "thread group ID"
- PID of a multithreaded process
- each thread also has its own PID
- for single-threaded processes, pid == tgid
tasks/threads/processes normally greatly outnumber the available cores
- so they each get a "turn" on the CPU
- this problem is the CPU scheduling problem, which we will discuss!
Why does firefox have so many threads?
- to make it faster!
- a single thread can only run on one core at a time
- for performance, want to run on multiple cores at the same time
Note that Chrome divides itself into multiple processes, as does firefox
- look like threads to me?
- but I know they are isolated for security purposes, to limit
sharing of memory (so evil web pages don't compromise the whole
browser)
- not sure how exactly they are kept separate...
top lists processes (tgid's) by default; show individual threads with the -H option (or by pressing H interactively)
Idea of containers is I want to share the kernel between multiple userspaces
- with virtual machines, you partition the hardware and run multiple
kernels, each of which has its own set of processes (userspace)
- but why run multiple kernels at the same time? Can't one do the job?
- that's why we have containers - multiple userlands (sets of processes,
root filesystem, etc) all running on one kernel
- much more efficient than virtual machines, but containers aren't
as well separated (depends on the kernel separating things properly)
The Linux kernel doesn't directly support containers
- instead, it has multiple abstractions that can be combined to produce
containers
Most id's can be put into namespaces
- pid, uid, gid, etc
So now any kernel routine that works with these id's has to first
figure out which namespace (container) it belongs to
- PID 1112, uid 1000 can mean different things depending on the container it
is in, each has its own namespace
Container/namespace support is why credentials are crazily abstracted in the
Linux kernel now (they used to be much simpler!)
These containers are *exactly* what are used by docker, kubernetes, etc
Why aren't containers as well separated?
- the API that is multiplexed in virtual machines is the hardware interface
- devices, interrupts, CPU facilities
- the API that is multiplexed with containers is the entirety of the system
call interface
- LOTS of system calls, with complex semantics
- hardware interface is simpler to abstract, partition
Intro to scheduling (will continue next lecture)
- we only have so many cores on which to run code
- must share them, how?
One way we share: system calls
- process is paused when it makes a system call
- other processes or the kernel can use the core until the system call returns
(i.e., if a process is blocked on a read, the kernel and other
processes can use that core until the read finishes)
But what if a process isn't making many system calls?
- calculating the digits of pi?
Then, use a timer to allow a process to only use the CPU for some time
- e.g., one millisecond
- kernel sets a timer for one millisecond, interrupt happens after that time
- kernel gets called by interrupt handler, decides what should next
run on the core
Final exam
- open book, but will be similar to last term in format
(being open book won't really help you)
- timed - you download at start, upload your answers to cuLearn by end
- randomized and targeted zoom interviews
- to make sure you actually understood what you wrote
- if it becomes clear you couldn't have written your answers,
will report to dean for plagiarism
Will explain more next class, see you Friday!