Operating Systems 2020W Lecture 18
Video
The video for the lecture given on March 18, 2020 is now available.
Notes
Lecture 18
----------

Topics
 * race condition deadlock in 3000pc-rendezvous.c
 * hints for /dev/describe, Assignment 3 Q5
    - how to allow writing
    - how to store the pid
    - how to get task info
 * containers
 * scheduling

Yes, Assignment 3 is due on March 25th, NOT March 20th!
(changed when the assignment was finalized)

Assignment 4 will be out by Friday, due last day of class
(you just have to submit multiple choice answers)

Will discuss final in a bit

So the bug in 3000pc-rendezvous.c is because communication isn't reliable
 - the consumer can send a wakeup message to the producer, and
   the producer won't get it because it wasn't waiting for the message
   (and vice versa)
 - solution is to add a timeout so messages can be re-sent
 - this is very real world
 - will try to post a better solution soon!  But use the posted version
   of 3000pc-rendezvous for your assignment
    - in the meantime, your modified version should ALSO deadlock
    - (notice the fifo version is much more reliable!)
 - removing fprintf's in wait_for_consumer() and wait_for_producer()
   should make 3000pc-rendezvous.c much more reliable
   (but perhaps not deadlock free)
    - because the fprintf's cause system calls, which are a long enough
      delay (and a context switch) to cause a delay in the wait, so the
      wakeup message sent via the condition variable is lost
 - in practice, never assume perfectly reliable communication with
   concurrency, always plan for lost messages with timeouts

I should discuss deadlock in more detail in a later lecture!
 - remind me!

Are timeouts the only way to deal with lost messages?
 - basically, yes
    - ok, you could just send multiple, redundant messages,
      but that has its own problems
 - this is how we do it normally, especially over networks

Hints for /dev/describe
 - there is a routine for converting strings to integers, kstrtol(),
   need this to get the provided string into a PID
 - copy the read function to make a write function
    - same function signature, just make the buf constant
    - add write function to the device file ops struct
 - make the device file writable (change 0444 to 0666)
    - note the leading 0 in C means the value is octal, not decimal
    - 0x in front means hexadecimal
 - to get the uid, gid, look at how getuid, getgid work
 - getuid, getgid look weird
    - this is because they have been generalized to work with namespaces
      that are used for containers (e.g., containers each have their
      own range of process IDs, uids, gids, etc)
    - will explain containers in a bit
 - use task_uid(), task_euid() macros to find out the uid, gid
   associated with a task
    - still have to "munge" (transform) as getuid does
 - kernel maintains internal pid's that are different from userspace ones,
   needed to support containers
    - so before sending a pid to userspace, we have to transform it
    - on VMs you probably don't need it since we aren't using containers,
      so just try and see what works
    - e.g., try "from_kuid_munged(current_user_ns(), task_uid(task))"
      to get the uid
 - to get a task from a PID, use pid_task(pid, PIDTYPE_TGID)
   <-- not quite - will confirm after lecture

I don't expect you to understand everything with Q5 - I definitely don't!
But this gives you some experience in looking through the Linux
kernel source and trying to figure out how things work
 - so many abstractions!

First, process groups, then containers

Process groups are based on a simple issue - when a user logs out,
what processes should be terminated?
 - can't kill all of the processes belonging to the user, maybe they are
   logged in multiple times or they are running background processes
   that could be running for days or longer
 - process groups allow the system to know what processes should be terminated
 - just kill everything in the process group

Similarly, processes are groups of threads
 - so when you do things with a thread, you have to keep track of
   the other threads

tgid in kernel: "thread group ID"
 - PID of a multithreaded process
 - each thread also has its own PID
 - single threaded processes, pid=tgid

tasks/threads/processes normally greatly outnumber the cores
 - so they each get a "turn" on the CPU
 - this problem is the CPU scheduling problem, which we will discuss!

Why does firefox have so many threads?
 - to make it faster!
 - only one thread, runs only on one core
 - for performance, want to run on multiple cores at the same time

Note that Chrome divides itself into multiple processes, as does firefox
 - look like threads to me?
 - but I know they are isolated for security purposes, to limit
   sharing of memory (so evil web pages don't compromise the whole browser)
 - not sure how exactly they are kept separate...

top lists all tgid's by default, but can be changed with the -H option (or H)

Idea of containers is I want to share the kernel between multiple userspaces
 - with virtual machines, you partition the hardware and run multiple
   kernels, each of which has its own set of processes (userspace)
 - but why run multiple kernels at the same time?  Can't one do the job?
 - that's why we have containers - multiple userlands (sets of processes,
   root filesystem, etc) all running on one kernel
 - much more efficient than virtual machines, but containers aren't as
   well separated (depends on the kernel separating things properly)

The Linux kernel doesn't directly support containers
 - instead, it has multiple abstractions that can be combined to
   produce containers

Most id's can be put into namespaces
 - pid, uid, gid, etc

So now any kernel routine that works with these id's has to first figure
out which namespace (container) it belongs to
 - PID 1112, uid 1000 can mean different things depending on the
   container it is in, each has its own namespace

Container/namespace support is why credentials are crazily abstracted
in the Linux kernel now (they used to be much simpler!)

These containers are *exactly* what are used by docker, kubernetes, etc

Why aren't containers as well separated?
 - the API that is multiplexed in virtual machines is the hardware interface
    - devices, interrupts, CPU facilities
 - the API that is multiplexed with containers is the entirety of the
   system call interface
    - LOTS of system calls, with complex semantics
 - hardware interface is simpler to abstract, partition

Intro to scheduling (will continue next lecture)
 - we only have so many cores on which to run code
 - must share them, how?

One way we share: system calls
 - process is paused when it makes a system call
 - other processes or the kernel can use the core until the system call
   returns (e.g., if a process is blocked on a read, the kernel and
   other processes can use that core until the read finishes)

But what if a process isn't making many system calls?
 - calculating the digits of pi?
Then, use a timer to allow a process to only use the CPU for some time
 - e.g., one millisecond
 - kernel sets a timer for one millisecond, interrupt happens after that time
 - kernel gets called by interrupt handler, decides what should next run
   on the core

Final exam
 - open book, but will be similar to last term in format
   (being open book won't really help you)
 - timed
    - you download at start, upload your answers to cuLearn by end
 - randomized and targeted zoom interviews
    - to make sure you actually understood what you wrote
    - if it becomes clear you couldn't have written your answers,
      will report to dean for plagiarism

Will explain more next class, see you Friday!