Operating Systems 2020W Lecture 21
Video
Video from the lecture given on March 27, 2020 is now available.
Resources
- Turing Award talk by John Hennessy and David Patterson
- Wikipedia page on Memory Hierarchy
- Wikipedia page on CPU caches
- Wikichip's page on AMD's Zen microarchitecture
Notes
Lecture 21
----------

- tutorial 8
- tutorial 9

Standard address space ordering:

- command line args, env vars
- stack (grows down)
- heap (grows up)
- globals
- code

Virtual memory systems rely on page replacement algorithms

- decide when to kick out a page in order to make room for another in RAM
- we use bits in the page table entries to decide what to do
  - valid bit: is this a valid PTE?
  - accessed bit: recently accessed (approximation of a timestamp)
  - dirty bit: has it been changed (i.e., do we have to save it before evicting it?)
- dirty pages have to be "cleaned" (i.e., saved to disk) before they can be freed
- we don't want to kick out recently accessed pages, as they are likely to be used again in the future
  - we'd just have to load them back somewhere else, wasting time
- when a program accesses parts of its virtual address space, the kernel has to make sure that memory is valid
  - if it is code, the code has to be loaded from disk
  - if it is data, the kernel has to make sure a data page is allocated
- the kernel can try to predict what a program is going to do
  - only makes sense in limited cases (e.g., load chunks of the program binary from disk when the program is first execve'd)
- ideally, the kernel would know exactly what each process would want to do next
  - it would make sure resources were available just in time
  - of course, we can't do this in practice
- note that the block size on disk for current filesystems equals the page size (both are 4K)
  - not essential, but useful at times
  - we talk about blocks on disk, not pages (but that's just terminology)

The memory hierarchy (fastest storage -> slowest, smallest -> biggest):

    registers   <--- managed by the compiler
    TLB         <--- managed by hardware, partially by the OS
    L1 cache    <--- (per core)
    L2 cache    <--- managed by the CPU itself, no software involvement
    L3 cache    <--- (per CPU (chip))
    DRAM        <--- managed by the OS
    ----------------  below this line is persistent and managed by software; above, volatile
    ssd
    hard disks
    tapes

L1, L2, and L3 are all caches of main memory (DRAM) and are SRAM

- they vary in speed (latency/bandwidth when data is transferred to registers)
- they vary in size (smaller vs. larger)

If you want to learn about architecture in more detail, read Hennessy and Patterson, "Computer Architecture: A Quantitative Approach"

- great book, very readable

Remember how we discussed that concurrency was hard and required hardware support?

- think about the work required to make sure the copies in L1, L2, L3, registers, and DRAM are all in sync
- you can have 3, 4, or 5 copies of one variable that logically should always be the same, but in practice can get out of sync

You don't have to make sure DRAM is in sync with the caches; that happens automatically

- it just isn't perfect: when you access DRAM in parallel from multiple cores, you can see (slightly) stale data

The real secret of modern systems is that while we program mostly sequentially, hardware runs things in parallel and *pretends* it all happened sequentially.
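The standard address space ordering from the notes can be observed from a running process by printing the addresses of objects in each region. A minimal sketch (variable names are illustrative; the exact addresses vary run to run with ASLR, but on a typical Linux layout the relative ordering holds):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int a_global = 42;                      /* globals segment */

/* On a typical Linux layout: stack above heap, heap above globals. */
bool stack_above_heap(void) {
    int a_local;                        /* lives on the stack */
    int *p = malloc(sizeof *p);         /* lives on the heap */
    bool r = (uintptr_t)&a_local > (uintptr_t)p;
    free(p);
    return r;
}

bool heap_above_globals(void) {
    int *p = malloc(sizeof *p);
    bool r = (uintptr_t)p > (uintptr_t)&a_global;
    free(p);
    return r;
}

int main(void) {
    int a_local = 0;
    int *a_heap = malloc(sizeof *a_heap);

    /* Print one address from each region, highest to lowest. */
    printf("stack:  %p\n", (void *)&a_local);
    printf("heap:   %p\n", (void *)a_heap);
    printf("global: %p\n", (void *)&a_global);
    printf("code:   %p\n", (void *)main);

    free(a_heap);
    return 0;
}
```

Running this a few times shows the addresses changing (ASLR) while the ordering stays the same.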
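The valid, accessed, and dirty bits described above can be sketched as flag tests on a page table entry. The bit positions below follow x86's low PTE flag bits, but treat the exact layout as an assumption for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE flag layout for illustration (positions follow x86,
 * but real layouts are architecture-specific). */
#define PTE_VALID    (1u << 0)  /* entry maps a real page */
#define PTE_ACCESSED (1u << 5)  /* page touched recently */
#define PTE_DIRTY    (1u << 6)  /* page modified since it was loaded */

/* Must this page be "cleaned" (written back to disk) before eviction? */
static bool needs_cleaning(uint32_t pte) {
    return (pte & PTE_VALID) && (pte & PTE_DIRTY);
}

/* A recently accessed page is a poor eviction candidate: we'd likely
 * just have to load it back, wasting time. */
static bool good_eviction_candidate(uint32_t pte) {
    return (pte & PTE_VALID) && !(pte & PTE_ACCESSED);
}
```

The hardware sets the accessed and dirty bits on each access/write; the OS clears and inspects them to approximate recency.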
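One classic page replacement algorithm that uses the accessed bit this way is the clock (second-chance) algorithm. The notes don't name a specific algorithm, so this is a generic sketch: sweep a "hand" over the frames, give recently accessed frames a second chance by clearing their bit, and evict the first frame whose bit is already clear:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct frame {
    int  page;      /* which page occupies this frame */
    bool accessed;  /* mirrors the PTE's accessed bit */
};

/* Clock (second-chance) eviction sketch: advance the hand until we find
 * a frame with accessed == false, clearing bits as we pass. Terminates
 * because every cleared frame becomes evictable on the next pass.
 * Returns the index of the frame to evict. */
static int clock_evict(struct frame frames[], size_t n, size_t *hand) {
    for (;;) {
        size_t idx = *hand;
        *hand = (*hand + 1) % n;        /* advance the hand */
        if (!frames[idx].accessed)
            return (int)idx;            /* not recently used: evict */
        frames[idx].accessed = false;   /* second chance: clear and move on */
    }
}
```

A real kernel would also check the dirty bit on the chosen victim and schedule a write-back before reusing the frame.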