Operating Systems 2021F Lecture 21

From Soma-notes

Video

Video from the lecture given on December 2, 2021 is now available:

Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)

Notes

Lecture 21
----------
Yes you can make your own filesystem
 - in fact, you can do this in userspace, see FUSE
   (this is how sshfs works)

In the kernel, a file is a file struct
 - in userspace, it is a file descriptor (a small number)
 - note that a file struct represents an *open* file
   - and thus is referenced in task_struct

When we look in /proc/<PID>, we are looking at task_struct's
 - each file/directory corresponds to parts of the task_struct

System calls in the kernel are defined with special macros
 - SYSCALL_DEFINEX  where X is a number for number of args
 - need this because system calls aren't invoked like regular functions on the kernel side either, need to do machine language magic
   - after initial entry, kernel uses regular C functions like
     userspace

As you can see, kernel code is C code, but there's a lot and
it has lots of abstractions
 - I get lost following it sometimes
 - but if you're patient you can see the patterns

file descriptor is an index into an array of file structs in the kernel

process IDs identify task_struct's in the kernel
 - the process that made a given system call is always "current"
   in the kernel

The current position in a file is inside a file struct
 - that's the offset

VFS = virtual file system
 - abstraction in the kernel for filesystems
 - allows for many implementations

In kernel data structures, you'll see mutexes and spinlocks
 - logically, they are the same thing
 - but a spinlock busy waits, while a mutex can go to sleep/wake up


A busy wait is just a loop "spinning"
 - typically a while loop on a condition or equivalent
   logically:

   int lock = 0;  /* shared variable */

   while (lock) {
   	       /* wait */
   }
   lock = 1;
   /* do something assuming the condition is true */
   lock = 0;

   Note that in the loop we're doing no useful work,
   just waiting
      - very inefficient for long period of times
      - but going to sleep and waking up later takes a while

It is more complicated because the above code can't ever work properly on modern computers

The problem is the separation between the testing (the while loop) and the setting (the assignment)
 - another process or thread could set lock to 1 between the end
   of the while loop and the assignment,
     - so logically two processes/threads would have the lock,
       but we wouldn't know it

This is known as a race condition
  - state depends on relative speed of different concurrent executions

The only way to do this safely on modern systems is using special machine language instructions
 - the exact ones vary from processor to processor
 - but all modern processors that support multiple cores have them

(Yes, you can't make a mutex using standard C!)

These instructions either swap two values as one atomic operation,
or they do a test & set as one atomic operation
 - they are implemented by one core sending messages to other cores
   saying "hey I'm messing with this memory, nobody else touch it
   until I'm done"


a "spinlock" is a mutex that busy waits
a "mutex" is a mutex that can sleep

a mutex is a semaphore that can take on a value of 1 or 0

a semaphore is just a counting variable that has exclusive access properties
 - will always have a coherent state when being accessed concurrently

When you're doing concurrent programming, you need to guarantee mutual exclusion
 - "mutex" is just short for mutual exclusion

How do processes and kernel threads share the CPU?

When a process runs, it has exclusive access to a core
 - but the core is in user mode, so limited access

But it only gets the core for a limited time.  Its use will end when
 - it makes a system call
 - it is preempted (interrupted)

The kernel sets timers (timer interrupts) to make sure it gets back the CPU
 - at least every .01 seconds, but can be more often

What is an interrupt?
 - an event where a device sends the CPU a message that
   it must act on immediately
 - signals are modeled on hardware interrupts
    - but for processes/kernel messages, not devices

When you type on a keyboard, an interrupt is send by the keyboard to the CPU
 - and so the CPU has to stop and acknowledge a key has been pressed before continuing its business
 - (in reality it is more complicated because keyboards have their
    own little CPU)

So the kernel has to constantly switch to service interrupts from devices
 - especially network interfaces, that data has to be dealt with quickly or it will be lost

Sometimes polling is used, depends
 - so then the kernel periodically asks the device for its state,
   rather than the device telling it it has data

Nowadays we don't use polling so much because it wastes resources
most of the time
 - polling would consume resources when you aren't typing, for example

When your machine freezes, the kernel is no longer servicing interrupts
 - so it is REALLY messed up

Keyboard death is a bad thing

One quirk of linux - magic sysrq key
 - only works on real hardware I think?  Haven't tried with VMs
 - you can hit alt-sysrq-X and do all kinds of things
    - alt-sysrq-S will sync file systems
    - alt-sysrq-R will instantly reboot the system
       (sync first otherwise some data may not have been
        written to disk and will be lost)

So if the magic sysrq key is working, part of the kernel is still alive
 - if it fails, the kernel truly is dead

On a linux box, it is possible for the display to freeze up but the system is still running
 - the X server or Wayland server has died
 - can still ssh in, and can restart any misbehaving processes

Note that when a task loses a core it is then in another state
 - "blocked" - it is waiting on I/O
 - "sleep" - it is waiting for its next turn on a core


The scheduler in the kernel is responsible for deciding what task to run next
 - it is a doubly linked list I think, with some additions to allow for efficient scheduling

Scheduling needs to be efficient
 - can't spend too long deciding
 - but can precompute things to make it easier

If you look at top, you'll see a PR and NI value
 - NI is "niceness", the priority the user can set,
   basically static priority
 - PR is the dynamic priority, which is constantly updated
   based on past behavior of the process
     - if it has been using a lot of CPU time, it
       will get a lower priority
     - if it was waiting on I/O, it will get a higher
       priority when that wait is over

So the big difference with a mutex versus a spinlock is with a mutex you give up the CPU when waiting, while with a spinlock you don't.
 - if it is a short wait, better to hold the CPU
 - but otherwise better to give it up so someone else can use it

When you go to sleep (give up the core you had), you have no guarantee when you'll get a core again
 - waits can be arbitrarily long on a busy system

a core should only idle if there is nothing to do
 - no processes in a "ready to run" state

In tutorial 8
 - with 3000pc-fifo, calls to read/write can cause the process to sleep (when the kernel buffer for the pipe is full/empty)
 - but with 3000pc-rendezvous, we manage the buffer manually in
   userspace using mutexes

UNIX is preemptively multitasking
 - so a task can always be kicked off a core

Some systems are cooperatively multitasking
 - so a task running cannot be interrupted, it has to give
   up use of a core

Key advantage of doing it yourself in userspace
 - grabbing lock successfully doesn't involve making a system call,
 - only need system calls when you sleep

With 3000pc-fifo, putting a word in the queue always involves a system call
 - with rendevous, we may not need to make a system call
   (and hence may not need to give up the CPU)
 - but clearly the Linux kernel is very efficient at file I/O,
   including pipes, even with system call overhead