Operating Systems 2020W Lecture 21

Video

Video from the lecture given on March 27, 2020 is now available.

Resources

Notes

Lecture 21
----------
 - tutorial 8
 - tutorial 9

Standard address space ordering (highest addresses first, lowest last)

command line args, env vars
stack (grows down)

heap (grows up)
globals
code


Virtual memory systems rely on page replacement algorithms
 - decide when to kick out a page in order to make room for another in RAM
 - we use bits in the page table entries to decide what to do
   - valid bit: is this a valid PTE?
   - accessed bit: recently accessed (approximation of a timestamp)
   - dirty bit: has it been changed (i.e., do we have to save it before
     evicting it?)
 - dirty pages have to be "cleaned" (i.e., saved to disk) before they can be
   freed
 - we don't want to kick out recently accessed pages as they are likely
   to be used again in the near future
    - we'd just have to load them back into some other frame, wasting time
 - when a program accesses parts of its virtual address space, the kernel
   has to make sure that memory is valid
     - if it is code, the code has to be loaded from disk
     - if data, have to make sure data page is allocated
 - the kernel can try to predict what a program is going to do
     - only makes sense in limited cases (e.g., load chunks of
       program binary from disk when program is first execve'd)
 - ideally, kernel would know exactly what each process would want to do next
    - would make sure resources were available just in time
    - of course, we can't do this in practice
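
To make the use of the accessed and dirty bits concrete, here is a minimal
sketch of one well-known replacement policy, the clock (second-chance)
algorithm. It is a toy simulation, not how any particular kernel does it. The
names ClockPager, Frame, faults, and writebacks are made up for illustration,
and the valid bit is modeled only implicitly (a page is "valid" if it is
resident in a frame).

```python
from dataclasses import dataclass


@dataclass
class Frame:
    page: int               # virtual page number held in this frame
    accessed: bool = True   # set on every access, cleared by the clock hand
    dirty: bool = False     # set on write; page must be saved before eviction


class ClockPager:
    """Toy clock (second-chance) page replacement over a fixed set of frames."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.frames = []      # resident pages, in frame order
        self.hand = 0         # index of the next eviction candidate
        self.faults = 0       # accesses to pages that were not resident
        self.writebacks = 0   # dirty pages "cleaned" (saved) before being freed

    def _find(self, page):
        for frame in self.frames:
            if frame.page == page:
                return frame
        return None

    def _pick_victim(self):
        # Sweep until we find a frame whose accessed bit is clear.
        # Recently accessed pages get a second chance: clear the bit, move on.
        while True:
            frame = self.frames[self.hand]
            if frame.accessed:
                frame.accessed = False
                self.hand = (self.hand + 1) % self.num_frames
            else:
                if frame.dirty:
                    self.writebacks += 1   # stand-in for writing to disk
                return self.hand

    def access(self, page, write=False):
        frame = self._find(page)
        if frame is None:                  # page fault: page is not in RAM
            self.faults += 1
            frame = Frame(page)
            if len(self.frames) < self.num_frames:
                self.frames.append(frame)  # a free frame is still available
            else:
                victim = self._pick_victim()
                self.frames[victim] = frame
                self.hand = (victim + 1) % self.num_frames
        frame.accessed = True
        if write:
            frame.dirty = True

    def resident(self):
        return [frame.page for frame in self.frames]


pager = ClockPager(num_frames=3)
trace = [(1, True), (2, False), (3, False), (4, False), (2, False), (5, False)]
for page, write in trace:
    pager.access(page, write)

print(pager.resident())    # [4, 2, 5]
print(pager.faults)        # 5
print(pager.writebacks)    # 1 (page 1 was dirty when it was evicted)
```

In this trace page 2 survives the last eviction because it was accessed just
before the fault, while page 3 (not touched since it was loaded) is kicked out.
Page 1 is the only page that costs a write to disk, since it is the only one
that was modified.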

 - note that the block size on disk for current filesystems
   typically equals the page size (both are 4K)
    - not essential but useful at times
    - we talk about blocks on disk, not pages (but that's just terminology)
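
One reason matching sizes are handy: the same arithmetic splits a memory
address into (page, offset) and a file offset into (block, offset). A small
sketch, assuming a 4K size for both as above. The function names are made up
for illustration, and real filesystems may be configured with other block
sizes.

```python
PAGE_SIZE = 4096   # 4K, assumed here for both pages and disk blocks


def split_address(addr):
    """Return (page number, offset within the page) for an address."""
    return divmod(addr, PAGE_SIZE)


def blocks_for_range(offset, length):
    """Return the block numbers touched by `length` bytes (length > 0)
    starting at byte `offset` in a file."""
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return list(range(first, last + 1))


print(split_address(0x12345))        # (18, 837), i.e. page 0x12, offset 0x345
print(blocks_for_range(4000, 200))   # [0, 1]: the read straddles two blocks
print(blocks_for_range(4096, 4096))  # [1]: exactly one aligned block
```

With equal sizes, an aligned page of a file corresponds to exactly one block,
so loading or saving a page is a single block transfer.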




The memory hierarchy  (fastest storage->slowest, smallest->biggest)

registers     <---  managed by the compiler
TLB           <---  managed by hardware, partially by OS
L1 cache      <---     (Per core)
L2 cache      <---  managed by CPU itself, no software involvement 
L3 cache      <---     (Per CPU (chip))

DRAM          <---  managed by OS
----------------  above is volatile; below is persistent, managed by software
SSD
hard disks
tapes


L1, L2, and L3 are all caches of main memory (DRAM); they are built from SRAM
 - vary in speed (latency and bandwidth when transferring data to registers)
 - vary in size (faster levels are smaller, slower levels are larger)

If you want to learn about architecture in more detail, read
Hennessy and Patterson, "Computer Architecture: A Quantitative Approach"
 - great book, very readable


Remember how we discussed that concurrency is hard and requires hardware support?
 - think about the work required to make sure copies in L1, L2, L3, registers,
   and DRAM are all in sync
     - can have 3, 4, or 5 copies of one variable, that logically should
       always be the same but in practice can get out of sync
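
A toy model of how copies get out of sync, and of what the hardware has to do
about it. This is only a sketch: two "cores" with private caches in front of a
shared dict standing in for DRAM. The Core class and the run function are made
up for illustration. Real hardware uses cache coherence protocols (MESI and
its relatives) that are far more involved than the simple invalidation shown
here.

```python
class Core:
    """A core with a private cache in front of shared memory (toy model)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for DRAM
        self.cache = {}        # this core's private copies
        self.peers = []        # other cores whose caches may hold copies

    def read(self, var):
        if var not in self.cache:              # miss: fetch from DRAM
            self.cache[var] = self.memory[var]
        return self.cache[var]                 # hit: use the private copy

    def write(self, var, value, coherent):
        self.cache[var] = value
        self.memory[var] = value               # write-through, to keep it short
        if coherent:
            for peer in self.peers:
                peer.cache.pop(var, None)      # invalidate the other copies


def run(coherent):
    memory = {"x": 0}
    a, b = Core(memory), Core(memory)
    a.peers, b.peers = [b], [a]
    a.read("x")                  # core A now holds its own copy of x (0)
    b.write("x", 1, coherent)    # core B changes x
    return a.read("x")           # what does core A see?


print(run(coherent=False))   # 0: core A still sees its stale copy
print(run(coherent=True))    # 1: A's copy was invalidated, so it refetches
```

Without the invalidation step there are three copies of x (A's cache, B's
cache, DRAM) and one of them is wrong. Keeping them consistent is work the
hardware does on every write.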



You don't have to make sure DRAM is in sync with the caches yourself; that
happens automatically
  - it just isn't perfect: when you access DRAM in parallel from multiple cores,
    you can see (slightly) stale data

The real secret of modern systems is that while we program mostly sequentially,
hardware runs things in parallel and *pretends* it all happened sequentially