Operating Systems 2022F Lecture 16
Revision as of 21:52, 10 November 2022
Video
Video from the lecture given on November 10, 2022 is now available:
* [https://homeostasis.scs.carleton.ca/~soma/os-2022f/lectures/comp3000-2022f-lec18-20221110.m4v video]
* [https://homeostasis.scs.carleton.ca/~soma/os-2022f/lectures/comp3000-2022f-lec18-20221110.cc.vtt auto-generated captions]
Video is also available through Brightspace (Resources->Zoom meeting->Cloud Recordings tab)
Notes
Lecture 16
----------
- scheduling
- eBPF
For Tutorial 7, the detailed instructions about the state of the VM are outdated; I should delete them
- the VM you have should work fine
For Tutorial 7, you should understand what eBPF is. So I will try to explain now!
Remember we have userspace and kernelspace
userspace: processes (e.g., ls, top, emacs, ssh)
kernelspace: the code that handles system calls, interacts with the hardware
While in userspace, the CPU is in user mode <-- limited access
While in kernelspace, the CPU is in supervisor mode <-- full access
Note that even processes running as root have limited access
- but the kernel will do almost anything for a process running as root
The code in the kernel primarily comes in two forms:
- the kernel image loaded at boot (/boot/vmlinuz...)
- kernel modules (/lib/modules/<kernel version>)
Kernel modules are really like dynamically linked libraries, but for the kernel
(literally they are .ko's, kernel object files)
Note that kernel modules can depend on other kernel modules
- when we run "lsmod" we see all the loaded modules and, in the "Used by" column, the other modules that depend on each one
Only a small fraction of available modules are loaded into any given Linux system at any time.
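As a rough sketch of where the module list comes from: lsmod gets its data from /proc/modules, where each line starts with the module name, its size, a reference count, and a comma-separated list of the modules using it ("-" if none). The helper names and the sample text below are made up for illustration; only the file path and field layout come from Linux.

```python
from pathlib import Path

# Illustrative sample in /proc/modules format (values are made up).
SAMPLE = """\
snd_hda_intel 53248 3 - Live 0x0000000000000000
snd_hda_codec 167936 2 snd_hda_intel,snd_hda_codec_hdmi, Live 0x0000000000000000
"""


def parse_proc_modules(text):
    """Map each loaded module to the list of modules that use it."""
    used_by = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip anything that is not a module line
        name, users = fields[0], fields[3]
        if users == "-":
            used_by[name] = []
        else:
            # The list ends with a trailing comma and may contain
            # markers such as "[permanent]"; drop both.
            used_by[name] = [
                u for u in users.split(",") if u and not u.startswith("[")
            ]
    return used_by


def loaded_modules(path="/proc/modules"):
    """Parse the live module list (Linux only)."""
    return parse_proc_modules(Path(path).read_text())
```

On a Linux machine, len(loaded_modules()) gives the number of modules currently loaded, which is typically far smaller than the number of .ko files under /lib/modules.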
Like a dynamically linked library, code in a kernel module is linked into the kernel, so it has full access
- modules can do anything the kernel can, it is just more kernel code
And device drivers need low level access, so they are either compiled into the kernel or are loaded as kernel modules
- a "statically linked" module is just built into the kernel image file
So what happens if a module (driver) is buggy?
- what is a "segfault" for a kernel module?
Accessing memory improperly in the kernel will result in a "kernel oops" if you are lucky, or a kernel panic if you aren't (kernel crash, system freeze)
- a kernel panic on Windows is a "blue screen of death"
- an oops dumps a record of what happened to the kernel log and then continues. But after an oops the kernel may be unstable, because who knows what has been corrupted; best to reboot ASAP
You can run a debugger on a kernel
- but a kernel debugger has to run on another computer
- easy to do if you're running the kernel in a virtual machine,
- but if you're on bare hardware you have to set up another computer
to run the debugger (typically connected via a serial line)
A "safe mode" for an OS can do many things, but typically it limits the device drivers being loaded and uses a configuration that is likely to work but may not enable all features
- "safe mode" is a goal, not a specific set of technologies
On Linux, we can generally get a kernel log
- standard location is /var/log/kern.log (e.g., on Ubuntu); running "dmesg" also prints the kernel's log buffer
Because the cost of mistakes is so high, you really don't want to add code to the kernel if you can avoid it
- better to stick to userspace
But wouldn't it be nice to be able to add code to the kernel *safely*?
Why would you want to add code to the kernel in the first place, if you aren't writing a driver?
- well, the kernel sees everything, so if you want to do system monitoring, the kernel is the one to do it
- potentially also security (observe and stop bad things)
But how could you do this safely?
eBPF is a technology for safely adding code to a running kernel
Key ideas
- NOT Turing complete, no unbounded loops
- no unrestricted memory access (no arbitrary pointer arithmetic; every access must be provably safe before the code is loaded)
So eBPF has a verifier that checks code before it is loaded. Only if the code passes is the code linked into the kernel
There are lots of things you might want to do that are technically safe, but if the eBPF verifier can't be sure they are safe, it will refuse to load them
- biggest challenge in writing eBPF code is getting past the verifier
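To make the verifier idea concrete, here is a toy model, not the real eBPF verifier. The instruction format (tuples such as ("jmp", offset)) and the rules are invented for illustration: reject backward jumps so every program must terminate, reject loads outside a fixed buffer, and reject programs that can fall off the end. The real verifier is far more sophisticated, but the "refuse anything not provably safe" flavour is the same.

```python
def verify(program, buf_size=64):
    """Return None if the toy program is accepted, else a rejection reason.

    Toy instructions (hypothetical, not real eBPF):
      ("mov", reg, imm), ("add", reg, imm), ("load", reg, offset),
      ("jmp", rel), ("jeq", reg, imm, rel), ("exit",)
    Jump offsets are relative to the next instruction.
    """
    if not program:
        return "empty program"
    for pc, insn in enumerate(program):
        op = insn[0]
        if op in ("jmp", "jeq"):
            rel = insn[-1]
            if rel < 0:
                # A backward edge could form an unbounded loop.
                return f"insn {pc}: backward jump not allowed"
            if pc + 1 + rel >= len(program):
                return f"insn {pc}: jump out of range"
        elif op == "load":
            offset = insn[2]
            if not 0 <= offset < buf_size:
                return f"insn {pc}: load out of bounds"
        elif op not in ("mov", "add", "exit"):
            return f"insn {pc}: unknown instruction {op!r}"
    if program[-1][0] != "exit":
        return "program can fall off the end"
    return None  # only forward jumps and a final exit: it must terminate


accepted = [("mov", 0, 0), ("jeq", 0, 0, 1), ("load", 1, 8), ("exit",)]
looping = [("mov", 0, 0), ("jmp", -2), ("exit",)]
```

Note that the toy rejects every backward jump, even one that would obviously stop after a few iterations. That is the same trade-off described above: safe code gets refused when the checker cannot prove it is safe.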
So there are two paths to loading code into the kernel dynamically
- modules (basically no checks, except digital signatures sometimes)
- eBPF (lots of checks)
modules contain processor-specific machine code (e.g., x86-64 machine code), just like regular program binaries
eBPF code is in a special byte code for the eBPF virtual machine
- after the code is verified it is just-in-time compiled into machine code
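For a feel of what the byte code looks like: each basic eBPF instruction is 64 bits, made of an 8-bit opcode, two 4-bit register numbers, a 16-bit offset, and a 32-bit immediate. The sketch below packs the smallest useful program, "r0 = 0; exit", by hand. The encode helper is a made-up name, and the layout assumes a little-endian machine.

```python
import struct

# Opcode values from the eBPF instruction set.
BPF_MOV64_IMM = 0xB7  # BPF_ALU64 | BPF_MOV | BPF_K: dst = imm
BPF_EXIT = 0x95       # BPF_JMP | BPF_EXIT: return r0


def encode(opcode, dst=0, src=0, off=0, imm=0):
    """Pack one 8-byte eBPF instruction (little-endian layout)."""
    # dst register is the low nibble, src register the high nibble.
    return struct.pack("<BBhi", opcode, (src << 4) | dst, off, imm)


# r0 = 0; exit  (a program that does nothing and returns 0)
program = encode(BPF_MOV64_IMM, dst=0, imm=0) + encode(BPF_EXIT)
```

In practice nobody writes these bytes by hand; a compiler such as clang emits them from C, and the kernel's verifier and JIT take it from there.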
Could you still do bad things with eBPF? YES. After all, you're getting
full visibility into the kernel and you can even change a few things
- thus only root can load eBPF code
- and of course only root can load kernel modules
The Linux kernel is written in C & assembly currently
- but there are early efforts to add Rust support
eBPF is typically also written in C
- but there are other high-level languages that compile down into eBPF bytecode
So why the name eBPF?
- stands for extended Berkeley Packet Filter
So eBPF is an extended version of BPF
- BPF was created to do...packet filtering!
- so why would we want to add code to the kernel to do packet filtering?
(what is packet filtering)
By packets, we mean network packets (IP packets for TCP/IP)
Packets come into the kernel from network interface devices (ethernet, wifi) and are often sent out to another network or are routed to one or more processes
What if you wanted to make a program to observe network traffic, route it to different places, and maybe even change or drop it as it comes in?
- yes, you can implement a network firewall this way
With a standard architecture, to do the above you'd have to route all network traffic to a process, it would do some work, then it would pass the packets back to the kernel.
- so you have a kernel->userspace->kernel transition potentially for every network packet
- this can be VERY inefficient
- so instead, BPF put code into the kernel so the packets were processed there
- but a userspace process gave the processing code to the kernel
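A toy model of the cost argument above, counting only user/kernel boundary crossings. Nothing here touches a real network; the function names and the packet dictionaries are invented for illustration, and real costs also include copying and context switches.

```python
def filter_in_userspace(packets, keep):
    """Standard architecture: every packet visits a userspace process."""
    crossings = 0
    forwarded = []
    for pkt in packets:
        crossings += 1       # kernel hands the packet up to the process
        verdict = keep(pkt)  # the process decides what to do with it
        crossings += 1       # the process hands the result back to the kernel
        if verdict:
            forwarded.append(pkt)
    return forwarded, crossings


def filter_in_kernel(packets, keep):
    """BPF-style: userspace hands the filter to the kernel once."""
    crossings = 1            # loading the filter code
    forwarded = [pkt for pkt in packets if keep(pkt)]
    return forwarded, crossings


def allow_ssh(pkt):
    """Example filter: keep only packets addressed to port 22."""
    return pkt["dst_port"] == 22


PACKETS = [{"dst_port": 22}, {"dst_port": 80}, {"dst_port": 22}]
```

Both functions forward the same packets, but the userspace version pays two crossings per packet while the in-kernel version pays one crossing in total, no matter how many packets arrive.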
eBPF is what happens when you take BPF technology and say, "why use this only for network traffic? why can't I just add code to the kernel to do whatever?"
- it started with general kernel introspection and has grown from there
So using eBPF you can make tools that gather stats on every process, observe any process, and manipulate any process
- basically what gdb can do, but for all processes, not just one at a time
- and you can look/interact with kernel data structures
Note that in Tutorial 7, we're playing with two technology stacks for doing eBPF, bcc and bpftrace
- bcc: python scripts that embed C code that gets turned into eBPF bytecode
- good for very powerful, flexible programs that integrate userspace and kernelspace functionality (python in userspace, C->eBPF for kernel space)
- bpftrace: high-level language for generating eBPF programs
bpftrace programs are MUCH shorter than bcc programs
but bcc programs are more flexible
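A minimal sketch of the bcc style, assuming bcc is installed and the script runs as root (neither is checked here). The Python part runs in userspace; the C in PROGRAM is compiled to eBPF bytecode, verified, and attached to the execve system call. The names PROGRAM, hello, and main are my own choices; BPF, attach_kprobe, get_syscall_fnname, and trace_print are bcc calls.

```python
# C source that bcc compiles into eBPF bytecode for the kernel side.
PROGRAM = r"""
int hello(void *ctx) {
    bpf_trace_printk("execve called\n");
    return 0;
}
"""


def main():
    # Imported here so the file can be read without bcc installed.
    from bcc import BPF

    b = BPF(text=PROGRAM)  # compile, verify, and load the program
    # Run hello() every time any process enters the execve system call.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="hello")
    b.trace_print()        # print kernel trace output until interrupted
```

Calling main() as root prints a line each time a program is executed anywhere on the system. For comparison, roughly the same thing in bpftrace is a one-liner: bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("execve called\n"); }'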