Operating Systems 2022F Lecture 16

From Soma-notes

Video

Video from the lecture given on November 10, 2022 is now available:

Video is also available through Brightspace (Resources->Zoom meeting->Cloud Recordings tab)

Notes

Lecture 16
----------
 - scheduling
 - eBPF

For Tutorial 7, the detailed instructions about the state of the VM are outdated; I should delete them
 - the VM you have should work fine

For Tutorial 7, you should understand what eBPF is.  So I will try to explain now!

Remember we have userspace and kernelspace
userspace: processes (e.g., ls, top, emacs, ssh, etc.)
kernelspace: the code that handles system calls, interacts with the hardware

While in userspace, the CPU is in user mode <-- limited access
While in kernelspace, the CPU is in supervisor mode <-- full access

Note that even processes running as root have limited access
 - but the kernel will do almost anything for a process running as root
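A small sketch of what "limited access" means in practice: a process never touches hardware or kernel data directly, it asks the kernel through system calls.  Python's os module exposes thin wrappers around several of them.  The function names pipe_round_trip and running_as_root are made up for this example.

```python
import os

def pipe_round_trip(message: bytes) -> bytes:
    """Send bytes through a kernel pipe and read them back.

    Each os.* call below is a thin wrapper around a system call,
    so the kernel does the actual work on the process's behalf.
    """
    read_fd, write_fd = os.pipe()               # pipe(2): kernel creates the pipe
    try:
        os.write(write_fd, message)             # write(2): copy into the kernel
        return os.read(read_fd, len(message))   # read(2): copy back out
    finally:
        os.close(read_fd)                       # close(2)
        os.close(write_fd)

def running_as_root() -> bool:
    """True if the effective user ID is 0 (root)."""
    return os.geteuid() == 0                    # geteuid(2)

if __name__ == "__main__":
    print(pipe_round_trip(b"hello kernel"))
    print("root?", running_as_root())
```

Even when running_as_root() is true, the process is still in user mode; the only difference is which requests the kernel is willing to honor.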


The code in the kernel primarily comes in two forms:
 - the kernel image loaded at boot (/boot/vmlinuz...)
 - kernel modules (/lib/modules/<kernel version>)

Kernel modules are really like dynamically linked libraries, but for the kernel
 (literally they are .ko's, kernel object files)

Note that kernel modules can depend on other kernel modules
 - when you run "lsmod" you can see all the loaded modules and, for each one, which other loaded modules use it (the "Used by" column)

Only a small fraction of available modules are loaded into any given Linux system at any time.
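lsmod gets its information from /proc/modules, so the same listing can be read from a script.  The sketch below parses that format; the SAMPLE text is made-up illustrative data (sizes and addresses are not from a real machine), and parse_modules and loaded_modules are names invented for this example.

```python
from pathlib import Path

# Illustrative sample in /proc/modules format:
#   name size refcount used-by state address
SAMPLE = """\
usb_storage 81920 1 uas, Live 0x0000000000000000
uas 32768 0 - Live 0x0000000000000000
"""

def parse_modules(text):
    """Map each loaded module name to the list of modules that use it."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue                      # skip blank or malformed lines
        name, used_by = fields[0], fields[3]
        if used_by == "-":                # "-" means no other module uses it
            modules[name] = []
        else:                             # comma-separated, with a trailing comma
            modules[name] = [m for m in used_by.split(",") if m]
    return modules

def loaded_modules(path="/proc/modules"):
    """Parse the live module list, falling back to SAMPLE off Linux."""
    p = Path(path)
    return parse_modules(p.read_text() if p.exists() else SAMPLE)

if __name__ == "__main__":
    for name, users in sorted(loaded_modules().items()):
        print(name, "used by", users or "nothing")
```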

Like a dynamically linked library, code in a kernel module is linked into the kernel, so it has full access
 - modules can do anything the kernel can, it is just more kernel code

And device drivers need low level access, so they are either compiled into the kernel or are loaded as kernel modules
 - a "statically linked" module is just built into the kernel image file

So what happens if a module (driver) is buggy?
 - what is a "segfault" for a kernel module?

Accessing memory improperly in the kernel will result in a "kernel oops" if you are lucky, or a kernel panic if you aren't (kernel crash, system freeze)
 - the Windows equivalent of a kernel panic is a "blue screen of death"
 - an oops dumps a record of what happened to the kernel log and then the kernel keeps running.  But after an oops the kernel may be unstable, because who knows what has been corrupted; best to reboot ASAP

You can run a debugger on a kernel
 - but a kernel debugger typically has to run on another computer
 - easy to do if you're running the kernel in a virtual machine,
 - but if you're on bare hardware you have to set up another computer
   to run the debugger (typically connected via a serial line)

A "safe mode" for an OS can do many things, but typically it limits the device drivers being loaded and uses a configuration that is likely to work but may not enable all features
 - "safe mode" is a goal, not a specific set of technologies

On Linux, we can generally get a kernel log
 - a common location is /var/log/kern.log (e.g., on Debian and Ubuntu); "dmesg" also prints the kernel's log buffer

Because the cost of mistakes is so high, you really don't want to add code to the kernel if you can avoid it
 - better to stick to userspace

But wouldn't it be nice to be able to add code to the kernel *safely*?

Why would you want to add code to the kernel in the first place, if you aren't writing a driver?
 - well, the kernel sees everything, so if you want to do system monitoring, the kernel is the one to do it
 - potentially also security (observe and stop bad things)

But how could you do this safely?

eBPF is a technology for safely adding code to a running kernel

Key ideas
 - NOT Turing complete, no unbounded loops
 - no unrestricted memory access (pointer arithmetic is tightly restricted,
   and every access must be provably in bounds when the code is checked)

So eBPF has a verifier that checks code before it is loaded.  Only if the code passes is the code linked into the kernel

There are lots of things you might want to do that are technically safe, but if the eBPF verifier can't prove they are safe, it will refuse to load the code
 - biggest challenge in writing eBPF code is getting past the verifier
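To make the "check before load" idea concrete, here is a toy verifier for a made-up instruction set.  It is a much-simplified sketch and NOT the real eBPF verifier or the real eBPF bytecode: the instruction names, CTX_SIZE, and MAX_INSNS are all invented for this example.  It rejects any backward jump (so every program must terminate) and any load outside a fixed-size buffer, without ever running the program.

```python
CTX_SIZE = 64    # hypothetical size of the buffer a program may read
MAX_INSNS = 32   # hypothetical cap on program length

def verify(prog):
    """Statically check a toy program; return (ok, reason).

    Instructions are tuples:
      ("load", reg, offset)       read the buffer at a constant offset
      ("add", reg, imm)           add a constant to a register
      ("jmp", target)             unconditional jump
      ("jeq", reg, imm, target)   jump if reg == imm
      ("exit",)                   stop
    """
    if not prog or len(prog) > MAX_INSNS:
        return False, "empty or too long"
    if prog[-1][0] != "exit":
        return False, "must end with exit"
    for pc, insn in enumerate(prog):
        op = insn[0]
        if op in ("jmp", "jeq"):
            target = insn[-1]
            if target <= pc:              # backward edge: could loop forever
                return False, f"backward jump at {pc} (possible loop)"
            if target >= len(prog):
                return False, f"jump out of program at {pc}"
        elif op == "load":
            if not 0 <= insn[2] < CTX_SIZE:
                return False, f"out-of-bounds load at {pc}"
        elif op not in ("add", "exit"):
            return False, f"unknown instruction at {pc}"
    return True, "ok"

good = [("load", 0, 8), ("jeq", 0, 0, 3), ("add", 0, 1), ("exit",)]
loop = [("add", 0, 1), ("jmp", 0), ("exit",)]
oob = [("load", 0, 64), ("exit",)]

if __name__ == "__main__":
    for name, prog in (("good", good), ("loop", loop), ("oob", oob)):
        print(name, verify(prog))
```

Note that the toy also rejects loops that would in fact terminate, which is the same flavor of problem as above: safe code that the checker cannot prove safe is refused.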


So there are two paths to loading code into the kernel dynamically
 - modules (basically no checks, except digital signatures sometimes)
 - eBPF (lots of checks)

modules contain processor-specific machine code (e.g., x86-64 machine code), just like regular program binaries

eBPF code is in a special byte code for the eBPF virtual machine
 - after the code is verified it is just-in-time compiled into machine code

Could you still do bad things with eBPF?  YES.  After all, you're getting
full visibility into the kernel and you can even change a few things
 - thus normally only root (or a process holding the right capabilities) can load eBPF code
 - and of course only root can load kernel modules

The Linux kernel is written in C & assembly currently
 - but there are early efforts to add Rust support

eBPF is typically also written in C
 - but there are other high-level languages that compile down into eBPF bytecode


So why the name eBPF?
 - stands for extended Berkeley Packet Filter

So eBPF is an extended version of BPF
 - BPF was created to do...packet filtering!
 - so why would we want to add code to the kernel to do packet filtering?
   (what is packet filtering)

By packets, we mean network packets (IP packets for TCP/IP)

Packets come into the kernel from network interface devices (ethernet, wifi) and are often sent out to another network or are routed to one or more processes

What if you wanted to make a program to observe network traffic, route it to different places, and maybe even change or drop it as it comes in?
 - yes, you can implement a network firewall this way

With a standard architecture, to do the above you'd have to route all network traffic to a process, it would do some work, then it would pass the packets back to the kernel.
 - so you have a kernel->userspace->kernel transition potentially for every network packet
 - this can be VERY inefficient
 - so instead, BPF put code into the kernel so the packets were processed there
    - but a userspace process gave the processing code to the kernel
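A back-of-the-envelope model of why this matters.  The sketch below only counts kernel/userspace boundary crossings; it does not touch real packets, and the packet dictionaries, the keep rule, and both function names are invented for illustration.

```python
def userspace_filter(packets, keep):
    """Model: every packet goes kernel -> process -> kernel."""
    crossings = 0
    kept = []
    for pkt in packets:
        crossings += 1          # kernel hands the packet to the process
        decision = keep(pkt)    # the process decides in userspace
        crossings += 1          # the process hands its answer back
        if decision:
            kept.append(pkt)
    return kept, crossings

def in_kernel_filter(packets, keep):
    """Model: the filter is handed to the kernel once, then runs there."""
    crossings = 1               # one-time load of the filter code
    kept = [pkt for pkt in packets if keep(pkt)]
    return kept, crossings

def drop_ssh(pkt):
    """Example rule: keep everything except traffic to port 22."""
    return pkt["dst_port"] != 22

packets = [{"dst_port": 22}, {"dst_port": 80}, {"dst_port": 22}]

if __name__ == "__main__":
    print(userspace_filter(packets, drop_ssh))
    print(in_kernel_filter(packets, drop_ssh))
```

Both versions keep the same packets; the difference is that the crossing count grows with the number of packets in the first model and stays constant in the second.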

eBPF is what happens when you take BPF technology and say, "why use this only for network traffic?  why can't I just add code to the kernel to do whatever?"
 - it started with general kernel introspection and has grown from there

So using eBPF you can make tools that gather stats on every process, observe any process, and manipulate any process
 - basically what gdb can do, but for all processes, not just one at a time
 - and you can look at and interact with kernel data structures


Note that in Tutorial 7, we're playing with two technology stacks for doing eBPF, bcc and bpftrace
 - bcc: Python scripts that embed C code that gets turned into eBPF bytecode
    - good for very powerful, flexible programs that integrate userspace and kernelspace functionality (Python in userspace, C->eBPF for kernelspace)
 - bpftrace: high-level language for generating eBPF programs

bpftrace programs are MUCH shorter than bcc programs
but bcc programs are more flexible
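For a sense of what a bcc program looks like, here is a minimal sketch in the style of the bcc "hello world" examples.  It assumes the bcc Python bindings are installed and that run() is called as root; the names BPF_PROGRAM, hello, and run are chosen for this example.  The C text is what gets compiled to eBPF bytecode and checked by the verifier; the Python around it stays in userspace.

```python
# C source for the kernel side; bcc compiles this to eBPF bytecode.
BPF_PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""

def run():
    """Load the program, attach it to execve, and print its output.

    Assumes bcc is installed and the caller is root; the import is
    done here so the module can be read without bcc present.
    """
    from bcc import BPF
    b = BPF(text=BPF_PROGRAM)                       # compile, verify, load
    b.attach_kprobe(event=b.get_syscall_fnname("execve"),
                    fn_name="hello")                # run hello() on each execve
    b.trace_print()                                 # stream the trace output

# To try it (as root, with bcc installed), call run().
```

A roughly equivalent bpftrace one-liner would be something like bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("execve called\n"); }', which shows the difference in length.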