Operating Systems 2019F Lecture 15
Video
Video from the lecture given on November 1, 2019 is now available.
Notes
The midterm solutions are now posted. If you have questions, please ask in a later lecture or on Discord.
Lecture 15
----------
 * sshfs
 * ebpf

Mounting filesystems
 - adding a filesystem to the existing file hierarchy, on top of an empty directory (the mountpoint)
 - mountpoint directories don't have to be empty, but anything in them will be hidden as long as another filesystem is mounted on top of them
 - when you start, the file hierarchy is just the "root" filesystem

eBPF

First, in general we want to add code to the kernel to extend its functionality
 - but the kernel is a dangerous place to work
 - there is an entire branch of OS practice that believes you shouldn't add code to the kernel; better to just do everything in processes, including device drivers, the networking stack, filesystems...everything except basic scheduling and process memory management
   => microkernels
 - the most common way to add code to the Linux kernel is kernel modules
   - modern Linux distributions compile their kernels so that as much functionality as possible is in modules
   - code built into the kernel and loaded at boot time can (mostly) never be removed, so it always takes up RAM on every system running that kernel, and that RAM is always physical RAM (kernel memory isn't swapped out)
   - modules have the advantage that they only have to be loaded if they are needed
 - in Tutorial 7 we'll play with kernel modules, but for now we're doing something else...why?

 * why are modules so bad?
   - they are DANGEROUS
   - they run in the same address space as the rest of the kernel
   - consider all the problems you have with crashing your own C programs...now you're modifying a gigantic one that you don't understand, injecting code into it at runtime!
   - you have access to EVERYTHING in the kernel's memory address space, including all processes and all users; you've got it all
   - module code is fundamentally more powerful than code running as root
     - root has to ask the kernel to do things; module code can just do it
     - the kernel can say no to root, e.g. "filesystem busy"

eBPF is great because...
 - you can access almost anything in the kernel
 - but it is safe and much, much easier!
 - it is very difficult to crash the kernel or corrupt memory from eBPF, but you can still do lots of things

eBPF is meant for kernel introspection
 - what is happening when you make a system call or call a function?
 - eBPF can look at kernel space and user space, and can move freely between them

eBPF has lots of "ease of use" features that make this all surprisingly easy
 - it can still be a bit complicated; we are dealing with an OS kernel!

BPF stands for "Berkeley Packet Filter"
 - whenever you see "Berkeley" in the name of something related to UNIX, pay attention; it is probably important
 - while Bell Labs (AT&T) built UNIX initially, Berkeley (BSD UNIX) is where UNIX came onto the Internet (and basically built the Internet)

BPF was designed to do efficient network packet filtering
 - problem: we want to specify rules for which packets to capture
 - initial packet processing happens in the kernel
 - but we normally add functionality in processes
 - so, if we implemented the rules in processes, we'd have the slowdown of constantly switching between user mode and supervisor mode, potentially on every packet
 - BPF solves this problem by allowing code to be loaded into the kernel for processing packets

The Internet is a network of networks
 - it connects networks like your home network to your ISP, or a departmental lab to the Carleton network
 - LOTS of technologies implement these networks
   - Ethernet, Wi-Fi, ATM, SONET, carrier pigeons...
 - so we need a way to send data that doesn't depend on the underlying technology

The Internet is built on a couple of insights
 - communication can be divided into "packets" or "datagrams" with a maximum size
   - classically 1500 bytes, but this varies
 - each packet is like a postcard: you send it from somewhere to get to another place
 - each packet has a source address and a destination address, which are numbers (32 bits for IPv4, 128 bits for IPv6)
   - ideally, every device connected to the Internet has a unique address
   - we messed this up in IPv4 with technologies such as NAT
 - so, all your computer does is send out packets addressed to the desired destination IP addresses, and receive packets whose destination IP address is its own
 - the other key insight: best-effort delivery
   - anyone can corrupt or throw away (drop) a packet at any time, for any reason
   - we have to layer protocols on top of these packets to get reliable communication

TCP/IP: IP is the base protocol; TCP is the most common protocol layered on top of it, producing continuous, reliable streams of bytes

Networking is really why we all run systems that are UNIX-like or not too different (Windows).
A system on the Internet needs a design that is good at handling network traffic arriving at any time
 - older systems without a privileged kernel used to crash all the time when put on the Internet (think Windows 3.1, MacOS 9 and before)

BPF
 - needs a way to load code into the kernel
 - should be safe
 - just needs to specify rules for handling network packets

BPF solution: build a bytecode virtual machine with a verifier
 - bytecode: no direct access to memory; it is compiled when loaded
 - verifier: checks for funny business, makes sure the code always terminates

eBPF is a Linux technology that extends BPF to allow for arbitrary kernel introspection

Exploiting the Linux kernel
 - modules are the way you normally do it; if you can load a module, it's game over

Rootkits
 - modifications that allow an attacker to hide and maintain control
 - if they modify the kernel, they can hide in a way that no process can find them

If you really want to protect a Linux system, disable modules and eBPF
 - but that makes things very inconvenient
 - signed modules come close

CPU bugs (Meltdown and Spectre)
 - major security implications for any system that runs untrusted code (i.e., anything running a web browser)
 - the underlying problem is speculative execution; hyperthreading (allowing multiple processes/threads to share a CPU core at the same time) makes some related attacks worse

Next lecture will be on *Monday*; no lecture on Wednesday.
 - details will be announced on Discord
 - Friday's lecture will be in the lecture hall live (but also livestreamed)