Operating Systems 2021F Lecture 19
Video
Video from the lecture given on November 23, 2021 is now available:
Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)
Notes
Lecture 19
----------
 - plan for rest of term
 - kernel modules, T7
 - concurrency, T8

Tutorial 8 is about the producer/consumer problem
 - classic problem in concurrency (one of the simpler ones)

When you think producer/consumer, think pipes (FIFOs)

If we run something like
  cat /dev/urandom | less

Note that the cat command doesn't keep running; it is paused when less stops asking for input
 - standard out from cat is going to standard in for less, via an anonymous pipe
 - the left side of the pipe pauses when the right side isn't actively reading
   (the producer pauses when the consumer doesn't need more input)
 - similarly, the consumer will pause when there is no output from the producer
 - note there is a buffer between the producer and consumer (as part of the pipe)
   - so whether it is full or empty governs whether the producer or consumer is going to be paused

Basic structure of a producer/consumer problem

              ----------
producer <-> |  buffer  | <-> consumer
              ----------

Buffer has many slots, think of it as a circular array
 - producer writes entries to buffer
 - consumer reads and removes entries from buffer

Producer will sleep when buffer is full
 - and will wait for consumer to wake it up

Consumer will sleep when buffer is empty
 - and will wait for producer to wake it up

Note that if producer and consumer sleep at the same time, the programs deadlock and no progress will ever be made again
 - unless there are timeouts

This can happen because of a failure in communication, generally due to timing
 - producer and consumer sleep "at the same time"

In the above example, cat is the producer (it writes to standard out) and less is the consumer (it reads from standard in)

But if the buffer isn't full and isn't empty
 - both producer and consumer run at the same time

To maximize throughput, you ideally never have the producer or consumer go to sleep
 - but that's not possible if they don't work at the same rate

So the buffer and the sleep/wake mechanisms ensure that progress continues to be made at the
maximum rate possible

Recall: what is | doing?
 - a pipe system call
 - you get two file descriptors connected together, so writes to one become reads from the other
 - buffer is implemented in the kernel to coordinate reads and writes

Note in 3000pc-fifo, the buffer is in the kernel
 - we don't control it directly
 - so we don't see its size
 - the only shared resource between the processes is the pipe

But, 3000pc-rendezvous* uses a shared buffer
 - note that this is two processes sharing a portion of memory using mmap
 - could get the same thing using multiple threads in one process but this is safer and should be as performant

T8 is really an example of why multithreaded programming is no fun
 - and generally not worth the effort

Note that 3000pc-rendezvous is broken
 - 3000pc-rendezvous-timeout is the fixed version

Specifically, 3000pc-rendezvous can deadlock
 - even though it seems to use semaphores correctly
 - but on a multicore system, it isn't enough

What is a kernel module?
 - inserting code into the Linux kernel
 - lsmod - see the modules currently loaded (where do you think it gets its info?)

In general, monolithic kernels allow for code to be added to them
 - most commonly, for device drivers

Linux kernel modules are used for device drivers, but are also used for many other things (e.g., anything that might not always be needed)

Note that kernel modules run in supervisor mode on the CPU
 - same as all other kernel code
 - so, can do anything that the kernel can do
 - strictly more powerful than code running as root in a process

A process with root privileges is still executing in user mode on the CPU
 - just when it makes system calls, the kernel will likely always say "yes"
 - still has to follow the system call interface, can't just mess with kernel data structures
   - unless it decides to load a kernel module!

Currently the Linux kernel is written in C
 - but some people are trying to get parts written in Rust
 - note that C was designed to be a portable assembler

userspace - code running in user mode on the CPU, processes
kernelspace - code running in supervisor mode on the CPU, kernel code (including modules)

When you make a kernel module, note that you can't include standard C library headers, only kernel headers. Why?
 - kernel code can't make system calls directly
   - it implements system calls, so can't depend on them
   - a system call is a userspace -> kernel space switch, but in kernel code we're already in kernel space running in supervisor mode on the CPU
   - kernel *can* make function calls, but it gets weird because system call code assumes it is working on behalf of a specific process
 - virtually all regular libraries depend on system calls
 - but for anything you need the Linux kernel has something equivalent (but it may have a very different interface)

When you "print" in the kernel, there is no standard out or standard error. So where does it go?
 - kernel log, can be seen in /var/log/kern.log or dmesg
 - messages are written using printk() or a macro that turns into printk()

If /var/log/kern.log gets too big, you can just delete it
 - but you'll have to reboot to get the space back
 - because it will still be written to, and as long as a file is open its inode refcount won't be zero

Note you'll get "tainting" messages when you load the class kernel modules
 - this is because they aren't digitally signed with an authorized key
 - on many desktop Linux systems, they run in "lockdown" and won't allow any unauthorized modules to be loaded

So what about eBPF-based tools?
 - eBPF loads code into the kernel, but much more safely
   - restrictions on execution
   - verifier checks bytecode before JIT compiling bytecode and inserting it into the kernel

There are no checks on regular kernel modules beyond code signing and basic sanity checking

That's how eBPF can see so much - it is running in the kernel!

ptrace-based programs can't see so much
 - they are just using the ptrace system call to observe & manipulate one other process
 - this system call can mess with the observed process, hence it isn't suitable for production generally

Kernel modules can easily corrupt the kernel. With eBPF, it is almost impossible to mess things up
 - eBPF isn't even Turing complete (no unbounded loops)

.bt code is for bpftrace, it generates eBPF bytecodes that are inserted into the kernel
 - eBPF is a machine-code like bytecode
 - so portable across CPUs
 - highly restricted, but by formal verification, not a virtual machine