Operating Systems 2020W Lecture 18
Video
The video for the lecture given on March 18, 2020 is now available.
Notes
Lecture 18
----------
Topics
* race condition deadlock in 3000pc-rendezvous.c
* hints for /dev/describe, Assignment 3 Q5
- how to allow writing
- how to store the pid
- how to get task info
* containers
* scheduling
Yes, Assignment 3 is due on March 25th, NOT March 20th!
(changed when the assignment was finalized)
Assignment 4 will be out by Friday, due last day of class
(you just have to submit multiple choice answers)
Will discuss final in a bit
So the bug in 3000pc-rendezvous.c is that communication isn't
reliable
- the consumer can send a wakeup message to the producer, and
the producer won't get it because it wasn't waiting for the message
(and vice versa)
- solution is to add a timeout so messages can be re-sent
- this is very real world
- will try to post a better solution soon! But use the posted version of
3000pc-rendezvous for your assignment
- in the meantime, your modified version should ALSO deadlock
- (notice the fifo version is much more reliable!)
- removing fprintf's in wait_for_consumer() and wait_for_producer() should
make 3000pc-rendezvous.c much more reliable (but perhaps not deadlock free)
- because the fprintf's make system calls, which introduce enough
  delay (and a context switch) before the wait begins that
  the wakeup message sent via the condition variable is lost
- in practice, never assume perfectly reliable communication
with concurrency, always plan for lost messages with timeouts
I should discuss deadlock in more detail in a later lecture!
- remind me!
Are timeouts the only way to deal with lost messages?
- basically, yes
- ok, you could just send multiple, redundant messages,
but that has its own problems
- this is how we do it normally, especially over networks
Hints for /dev/describe
- there is a kernel routine for converting strings to integers, kstrtol();
  you'll need it to turn the string written by the user into a PID
- copy the read function to make a write function
- same function signature, except buf becomes a const pointer
- add write function to the device file ops struct
- make the device file writable (change 0444 to 0666)
- note the leading 0 in C means the value is octal, not decimal
- 0x in front means hexadecimal
- to get the uid, gid, look at how getuid, getgid work
- getuid, getgid look weird
- this is because they have been generalized to work with
namespaces that are used for containers
(e.g., containers each have their
own range of process IDs, uids, gids, etc)
- will explain containers in a bit
- use task_uid(), task_euid() macros to find out the uid, gid
associated with a task
- still have to "munge" (transform) as getuid does
- kernel maintains internal pid's that are different from userspace
ones, needed to support containers
- so before sending a pid to userspace, we have to transform it
- on VMs you probably don't need it since we aren't using
containers, so just try and see what works
- e.g., try "from_kuid_munged(current_user_ns(), task_uid(task))"
to get the uid
- to get a task from a PID, use pid_task(pid, PIDTYPE_TGID) <-- not quite
- will confirm after lecture
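Putting the hints together, the write handler might look roughly like the sketch below. This is NOT a confirmed solution: describe_write is an invented name, the task-lookup call is the part flagged as "not quite" above, and strictly the lookup should be inside rcu_read_lock()/rcu_read_unlock().

```
/* Sketch only -- names are invented and the task lookup may need
 * adjusting (see the "not quite" note above). */
static ssize_t describe_write(struct file *filp, const char __user *buf,
                              size_t count, loff_t *ppos)
{
        char kbuf[16];
        long pid_nr;
        struct task_struct *task;

        if (count >= sizeof(kbuf))
                return -EINVAL;
        if (copy_from_user(kbuf, buf, count))
                return -EFAULT;
        kbuf[count] = '\0';

        if (kstrtol(kbuf, 10, &pid_nr))         /* string -> long */
                return -EINVAL;

        /* look up the task for this PID in the current namespace */
        task = pid_task(find_vpid(pid_nr), PIDTYPE_PID);
        if (!task)
                return -ESRCH;

        /* munge the kernel uid into a userspace-visible one, as
         * getuid does -- you'd stash this for the next read */
        pr_info("uid: %u\n",
                from_kuid_munged(current_user_ns(), task_uid(task)));

        return count;
}
```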
I don't expect you to understand everything with Q5
- I definitely don't!
But this gives you some experience in looking through the Linux
kernel source and trying to figure out how things work
- so many abstractions!
First, process groups, then containers
Process groups are based on a simple issue
- when a user logs out, what processes should be terminated?
- can't kill all of the processes belonging to the user,
maybe they are logged in multiple times or
they are running background processes that could be running for
days or longer
- process groups allow the system to know what processes should
be terminated - just kill everything in the process group
Similarly, processes are groups of threads
- so when you do things with a thread,
you have to keep track of the other threads
tgid in kernel: "thread group ID"
- PID of a multithreaded process
- each thread also has its own PID
- for single-threaded processes, pid == tgid
tasks/threads/processes normally greatly outnumber the available cores
- so they each get a "turn" on the CPU
- this problem is the CPU scheduling problem, which we will discuss!
Why does firefox have so many threads?
- to make it faster!
- a single thread can only run on one core at a time
- for performance, want to run on multiple cores at the same time
Note that Chrome divides itself into multiple processes, as does firefox
- look like threads to me?
- but I know they are isolated for security purposes, to limit
sharing of memory (so evil web pages don't compromise the whole
browser)
- not sure how exactly they are kept separate...
top lists processes (tgid's) by default; show individual threads with the -H option (or by pressing H interactively)
Idea of containers is I want to share the kernel between multiple userspaces
- with virtual machines, you partition the hardware and run multiple
kernels, each of which has its own set of processes (userspace)
- but why run multiple kernels at the same time? Can't one do the job?
- that's why we have containers - multiple userlands (sets of processes,
root filesystem, etc) all running on one kernel
- much more efficient than virtual machines, but containers aren't
as well separated (depends on the kernel separating things properly)
The Linux kernel doesn't directly support containers
- instead, it has multiple abstractions that can be combined to produce
containers
Most id's can be put into namespaces
- pid, uid, gid, etc
So now any kernel routine that works with these id's has to first
figure out which namespace (container) it belongs to
- PID 1112, uid 1000 can mean different things depending on the container it
is in, each has its own namespace
Container/namespace support is why credentials are crazily abstracted in the
Linux kernel now (they used to be much simpler!)
These containers are *exactly* what are used by docker, kubernetes, etc
Why aren't containers as well separated?
- the API that is multiplexed in virtual machines is the hardware interface
- devices, interrupts, CPU facilities
- the API that is multiplexed with containers is the entirety of the system
call interface
- LOTS of system calls, with complex semantics
- hardware interface is simpler to abstract, partition
Intro to scheduling (will continue next lecture)
- we only have so many cores on which to run code
- must share them, how?
One way we share: system calls
- process is paused when it makes a system call
- other processes or the kernel can use the core until the system call returns
(i.e., if a process is blocked on a read, the kernel and other
processes can use that core until the read finishes)
But what if a process isn't making many system calls?
- calculating the digits of pi?
Then, use a timer to allow a process to only use the CPU for some time
- e.g., one millisecond
- kernel sets a timer for one millisecond, interrupt happens after that time
- kernel gets called by interrupt handler, decides what should next
run on the core
Final exam
- open book, but will be similar to last term in format
(being open book won't really help you)
- timed - you download at start, upload your answers to cuLearn by end
- randomized and targeted zoom interviews
- to make sure you actually understood what you wrote
- if it becomes clear you couldn't have written your answers,
will report to dean for plagiarism
Will explain more next class, see you Friday!