Operating Systems 2020W Lecture 15
Video
Video for the lecture given on March 6, 2020 is now available.
Notes
Lecture 15
----------

 - midterms mostly graded, all should be done by end of weekend
 - midterms will be returned through your TA
   - if you go over your midterm with your TA, you get 2 bonus marks
 - solutions are posted, will go through them at a later date

Kernel modules & eBPF
---------------------

What do you do if system calls aren't enough?
 - not fast enough
 - insufficient functionality
 - insufficient visibility

Consider web browsers
 - web pages can do some things
 - browser extensions can do other things

Browser extensions can be very dangerous
 - see every page you visit
 - change contents of any page
 - change interface elements
   -> change how your banking website works
 - send arbitrary data to other systems
 - spyware!

Price for added functionality is more risk

How can we extend the operating system, specifically the Linux kernel?

Firefox long had very powerful extensions, but more recently adopted a much more restrictive extension interface
 - one reason: mostly compatible with Chrome
 - but big reason: much safer
 - also, the old extensions made it hard to change the browser, because Firefox had to preserve internal interfaces for external consumers

Of course, any program running on a system "extends" it
 - but extensions make use of privileged APIs to allow for tighter integration
 - also, extensions tend to run in the same address space as the main program, so can manipulate main program state

With extending the Linux kernel, we want code running in the kernel
 - so, in the address space of the kernel
 - but this means the code can mess with arbitrary parts of the kernel
   - so you can see and modify any process, act as any user
   - interact with any device in arbitrary ways
   - allocate any resources you want

Classic bad thing to do with a kernel module: a rootkit
 - hide processes
 - hide files
 - add backdoors that allow you to bypass mainline authentication
   - think "joshua"

How do we control what code gets loaded?
 - normally requires root privileges
 - on some systems, code must be signed

Code running in the kernel IS NOT running as root!
 - root is the maximally privileged user
 - but root is just a label for processes
 - the kernel is what implements the process abstraction

Kernel modules have maximum flexibility and maximum pain
 - one coding error can lead to system crashes and corrupted devices

But what about adding code to the kernel in a safer way?

When you run code in a web browser, where does it run?
 - in the address space of the browser
   - (compiled JavaScript)
 - this is safe because code is "sandboxed"
   - (sandboxing is not a technical term, it is an aspiration)

We don't sandbox code loaded into a kernel
 - we already have processes

But we can verify/check code to make sure it is safe (for some approximation of safe)

Standard kernel modules have no verification

eBPF does!

eBPF is based on the Berkeley Packet Filter (BPF), but extended
 - idea was to run code in the kernel to filter packets

trace uses eBPF
strace and gdb use ptrace (a system call)
 - designed for debugging one process at a time

eBPF is very fussy about the code it accepts
 - all loops must clearly terminate!
 - no arbitrary memory access

 - eBPF code runs in kernel space, but is verified to make sure it is safe (supervisor mode on the CPU, in the kernel address space)
 - kernel modules run in kernel space, and ARE NOT verified
   - but they may be signed (so inauthentic modules will be rejected)
   - supervisor mode on the CPU, in the kernel address space
 - processes run in userspace, not verified but are "sandboxed" to a degree
   - user mode on CPU
   - own address space

 - eBPF is a new thing, separate from regular functionality we've covered
   - NOT used to implement system calls (at this time)
   - NOT used for device drivers

eBPF is a safer way to add code to kernel space

API exposed to userspace is stable
 - mainly system calls, but also device files
but internal kernel APIs are not stable
 - kernel modules and eBPF programs have to be compiled anew for each new kernel
 - easier with eBPF, because it is designed to be compiled at runtime

Monolithic vs microkernels
 - difference is in what runs in kernel vs user space
 - monolithic kernels run most networking, filesystems, device drivers in kernel space
 - microkernels try to put these all in processes

Advantage of microkernel
 - potentially more stable
 - easier to debug
   - most OS code is in processes, so can use conventional tools to debug

Key disadvantage
 - performance

(Security benefit of microkernels is quite arguable)

Linux does have userspace drivers
 - NTFS for example (main Windows file system)

eBPF reduces disadvantages of monolithic kernels
 - safe mechanism for extensions that is faster than processes
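
What "safe mechanism" can mean in practice: a toy sketch
 - illustrates the two rules mentioned earlier (loops must clearly terminate, no arbitrary memory access)
 - this is NOT real eBPF: the instruction set is made up, and the names verify() and run() are hypothetical
 - the rule used here (jumps may only go forward) is a deliberately crude way to guarantee termination; the real eBPF verifier is far more elaborate
 - key idea: check the code once before loading it, then run it without further checks

```python
# Toy verifier for a made-up instruction set (NOT real eBPF bytecode).
# Each instruction is a tuple:
#   ("load", offset)        read one byte of the input buffer at a fixed offset
#   ("jeq", value, target)  if the loaded byte == value, jump to index `target`
#   ("jmp", target)         unconditional jump to index `target`
#   ("ret", value)          stop and return `value`

def verify(program, buf_size):
    """Return (ok, reason). Accept only programs that obviously terminate
    and only read memory inside the input buffer."""
    if not program or program[-1][0] != "ret":
        return False, "program must end with ret"
    for i, insn in enumerate(program):
        op = insn[0]
        if op == "load":
            # No arbitrary memory access: offset must be inside the buffer.
            if not 0 <= insn[1] < buf_size:
                return False, f"insn {i}: load outside buffer"
        elif op in ("jmp", "jeq"):
            target = insn[-1]
            # Forward-only jumps: no backward edge means no loops at all.
            if not i < target < len(program):
                return False, f"insn {i}: jump must go forward, inside program"
        elif op != "ret":
            return False, f"insn {i}: unknown opcode"
    return True, "ok"


def run(program, buf):
    """Interpret a program that has already passed verify()."""
    pc, acc = 0, 0
    while True:
        insn = program[pc]
        op = insn[0]
        if op == "load":
            acc = buf[insn[1]]
            pc += 1
        elif op == "jeq":
            pc = insn[2] if acc == insn[1] else pc + 1
        elif op == "jmp":
            pc = insn[1]
        else:  # "ret"
            return insn[1]


# Hypothetical filter: return 1 if the first byte is 0x45, else 0.
good = [("load", 0), ("jeq", 0x45, 3), ("ret", 0), ("ret", 1)]
# Rejected: jumps backward, so it could loop forever.
loops = [("load", 0), ("jmp", 0), ("ret", 0)]
# Rejected: reads far past the end of a 4-byte buffer.
wild = [("load", 4096), ("ret", 0)]
```

 - verify(good, 4) accepts; verify(loops, 4) and verify(wild, 4) reject
 - run() has no bounds checks or loop counters of its own; it relies entirely on verify() having been called first, which is the same trade that makes verified in-kernel code fast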