Operating Systems 2021F Lecture 21
Video
Video from the lecture given on December 2, 2021 is now available:
Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)
Notes
Lecture 21 ---------- Yes you can make your own filesystem - in fact, you can do this in userspace, see FUSE (this is how sshfs works) In the kernel, a file is a file struct - in userspace, it is a file descriptor (a small number) - note that a file struct represents an *open* file - and thus is referenced in task_struct When we look in /proc/<PID>, we are looking at task_struct's - each file/directory corresponds to parts of the task_struct System calls in the kernel are defined with special macros - SYSCALL_DEFINEX where X is a number for number of args - need this because system calls aren't invoked like regular functions on the kernel side either, need to do machine language magic - after initial entry, kernel uses regular C functions like userspace As you can see, kernel code is C code, but there's a lot and it has lots of abstractions - I get lost following it sometimes - but if you're patient you can see the patterns file descriptor is an index into an array of file structs in the kernel process IDs identify task_struct's in the kernel - the process that made a given system call is always "current" in the kernel The current position in a file is inside a file struct - that's the offset VFS = virtual file system - abstraction in the kernel for filesystems - allows for many implementations In kernel data structures, you'll see mutexes and spinlocks - logically, they are the same thing - but a spinlock busy waits, while a mutex can go to sleep/wake up A busy wait is just a loop "spinning" - typically a while loop on a condition or equivalent logically: int lock = 0; /* shared variable */ while (lock) { /* wait */ } lock = 1; /* do something assuming the condition is true */ lock = 0; Note that in the loop we're doing no useful work, just waiting - very inefficient for long period of times - but going to sleep and waking up later takes a while It is more complicated because the above code can't ever work properly on modern computers The problem is the separation between the testing (the while loop) and the setting (the assignment) - another process or thread could set lock to 1 between the end of the while loop and the assignment, - so logically two processes/threads would have the lock, but we wouldn't know it This is known as a race condition - state depends on relative speed of different concurrent executions The only way to do this safely on modern systems is using special machine language instructions - the exact ones vary from processor to processor - but all modern processors that support multiple cores have them (Yes, you can't make a mutex using standard C!) These instructions either swap two values as one atomic operation, or they do a test & set as one atomic operation - they are implemented by one core sending messages to other cores saying "hey I'm messing with this memory, nobody else touch it until I'm done" a "spinlock" is a mutex that busy waits a "mutex" is a mutex that can sleep a mutex is a semaphore that can take on a value of 1 or 0 a semaphore is just a counting variable that has exclusive access properties - will always have a coherent state when being accessed concurrently When you're doing concurrent programming, you need to guarantee mutual exclusion - "mutex" is just short for mutual exclusion How do processes and kernel threads share the CPU? When a process runs, it has exclusive access to a core - but the core is in user mode, so limited access But it only gets the core for a limited time. Its use will end when - it makes a system call - it is preempted (interrupted) The kernel sets timers (timer interrupts) to make sure it gets back the CPU - at least every .01 seconds, but can be more often What is an interrupt? - an event where a device sends the CPU a message that it must act on immediately - signals are modeled on hardware interrupts - but for processes/kernel messages, not devices When you type on a keyboard, an interrupt is send by the keyboard to the CPU - and so the CPU has to stop and acknowledge a key has been pressed before continuing its business - (in reality it is more complicated because keyboards have their own little CPU) So the kernel has to constantly switch to service interrupts from devices - especially network interfaces, that data has to be dealt with quickly or it will be lost Sometimes polling is used, depends - so then the kernel periodically asks the device for its state, rather than the device telling it it has data Nowadays we don't use polling so much because it wastes resources most of the time - polling would consume resources when you aren't typing, for example When your machine freezes, the kernel is no longer servicing interrupts - so it is REALLY messed up Keyboard death is a bad thing One quirk of linux - magic sysrq key - only works on real hardware I think? Haven't tried with VMs - you can hit alt-sysrq-X and do all kinds of things - alt-sysrq-S will sync file systems - alt-sysrq-R will instantly reboot the system (sync first otherwise some data may not have been written to disk and will be lost) So if the magic sysrq key is working, part of the kernel is still alive - if it fails, the kernel truly is dead On a linux box, it is possible for the display to freeze up but the system is still running - the X server or Wayland server has died - can still ssh in, and can restart any misbehaving processes Note that when a task loses a core it is then in another state - "blocked" - it is waiting on I/O - "sleep" - it is waiting for its next turn on a core The scheduler in the kernel is responsible for deciding what task to run next - it is a doubly linked list I think, with some additions to allow for efficient scheduling Scheduling needs to be efficient - can't spend too long deciding - but can precompute things to make it easier If you look at top, you'll see a PR and NI value - NI is "niceness", the priority the user can set, basically static priority - PR is the dynamic priority, which is constantly updated based on past behavior of the process - if it has been using a lot of CPU time, it will get a lower priority - if it was waiting on I/O, it will get a higher priority when that wait is over So the big difference with a mutex versus a spinlock is with a mutex you give up the CPU when waiting, while with a spinlock you don't. - if it is a short wait, better to hold the CPU - but otherwise better to give it up so someone else can use it When you go to sleep (give up the core you had), you have no guarantee when you'll get a core again - waits can be arbitrarily long on a busy system a core should only idle if there is nothing to do - no processes in a "ready to run" state In tutorial 8 - with 3000pc-fifo, calls to read/write can cause the process to sleep (when the kernel buffer for the pipe is full/empty) - but with 3000pc-rendezvous, we manage the buffer manually in userspace using mutexes UNIX is preemptively multitasking - so a task can always be kicked off a core Some systems are cooperatively multitasking - so a task running cannot be interrupted, it has to give up use of a core Key advantage of doing it yourself in userspace - grabbing lock successfully doesn't involve making a system call, - only need system calls when you sleep With 3000pc-fifo, putting a word in the queue always involves a system call - with rendevous, we may not need to make a system call (and hence may not need to give up the CPU) - but clearly the Linux kernel is very efficient at file I/O, including pipes, even with system call overhead