Operating Systems 2020W Lecture 16

Video

Video from the lecture given on March 11, 2020 is now available.
Notes

Lecture 16
----------

Virtual memory
 - virtual memory vs physical memory
 - page tables
 - page table entries (PTEs)
 - TLB
 - mmap and virtual memory
 - swapping & paging

What is virtual memory?
 - abstraction over physical memory that allows for
   memory to be shared with running programs not having to do
   anything special
 - each program sees its own memory map

To implement virtual memory, have to map between virtual addresses
and physical addresses
 - processes use virtual addresses
 - CPU maps virtual addresses to physical addresess to find actual memory
   (to read or write)

In other words, when, in assembly/machine code, there is an address
in a register or an instruction refers to an address, the CPU must translate this virtual address to a physical address before going to RAM and getting the value (or changing it)


So this translation should seem really really slow, and it is...if you
did it purely in software.  But there is hardware to help.

But how do we store this mapping between virtual and physical addresses?
 - every process has a "page table" associated with it. 
 - this is the data structure that defines the mapping between virtual
   and physical memory

Basic question: what units does the kernel use to manage memory?
 - on disk, kernel uses blocks (1k or 4k, fixed size)
 - in RAM, memory is addressable in bytes, so shouldn't we allocate it
   in bytes?  NO
 - RAM is allocated in fixed-sized units, typically 4k or 8K
   - known as pages, NOT blocks

 - often times, the page size and the block size of a system are equal
   (both 4K), but this can vary

We use fixed-sized units of allocation to avoid memory fragmentation
  - keep them small to minimize internal fragmentation (space lost
    to small allocations, e.g. need one byte but allocate 4k)
  - fixed sized prevents external allocation (any memory allocation
    can be accomplished if there are a sufficient number of pages/blocks)

Physical RAM
************
Program 1
...
...
------------
Program 2
...
...
...
------------
Free Space
...
...             <--- put Program 3 here
...
------------
Kernel
...
...
...
************

Want to load program 3, could put it in free space
BUT normally, memory map is much messier



Physical RAM
************
Program 1
...
...
------------
Free Space
...
------------
Program 2
...
...
...
------------
Free Space
...             <--- put Program 3 here, but not enough room!
------------
Kernel
...
...
...
************

Fragmentation also takes place in filesystems
 - so we try to allocate "extents" which are ranges of contiguous blocks,
   faster to access

But in RAM fragmentation is much more painful
 - want to assemble a 1M array out of scattered 4k pieces?

So really virtual memory is an abstraction over RAM much like a file
is an abstraction over disk blocks


The purposes of virtual memory are
 - give programmers the illusion that their program occupies a contiguous
   range of bytes in memory
 - allow programs to be bigger than physical memory

(Look up segmentation in the textbook to see another scheme for
abstrating memory)

The limits to physical memory is determined by cost, space, and power constraints (4G versus 16G versus 256M versus 16K)

Virtual memory is limited by the size of pointers/addresses for the CPU
 - 32 bit processors normally have 32 bit addresses (max 4G)
 - 64 bit processors normally have 64 bit addresses
 - 8 bit processors normally had 16 bit adresses (max 64k)

2^64 is a very big number, don't expect to ever see a system with that much RAM!
 - so on 64 bit systems, virtual addresses are 64 bit but physical addresses are
   smaller (e.g. 48 bits)

Address spaces is all about limits of naming
 - need a unique address for every byte in memory
 - there are ways around this, but they are very painful (e.g, bank switching)

Virtual memory is made up
 - process sees that memory, but it doesn't all have to be allocated
 - e.g., process is using 64-bit pointers, so can address 2^64 bytes of
   memory...but if it accesses more than the kernel has allocated it
   (e.g., 1M), access to unallocated memory ranges results in segfaults

A segfault happens when a process accesses a virtual address that has
no corresponding physical address

So what is a page table?
 - data structure for mapping virtual->physical addresses
 - does not have mappings on an address-by-address basis,
   that would be crazy inefficient
 - instead, mappings are virtual page->physical page
   - can copy page offset from virtual to physical address

Page tables are basically arrays stored in a memory-effient way
 - use virtual page number as index into array
 - value at that index is the physical page number

But if you made this a real array, it would be crazy inefficient
 - would need to be huge, e.g. 2^52 entries for a 64-bit address space
   with 4k pages (4k = 2^12)
 - so we need something like a sparse array

Is data in swap persistent?
 - it isn't supposed to be, is deallocated when procesess terminate
 - but...data isn't normally overwritten on disk
 - solution: look up encrypted swap

Why not make a page table a hash table?
 - want something that can be implemented quickly in hardware
 - calculating values on each memory access would be slower generally

How do we do this "sparse array"?

Big idea: split virtual address up into portions that each becomes an index
into a page
 - need to make page tables out of pages!

e.g., divide a 32 bit address as follows:

 ----------- ---------- ------------
   10 bits     10 bits   12 bits      = 32 bits total

   lev 1       lev 2     offset


level 1 = offset into top page of page table
level 2 = offset into level two of page table
offset  = offset into data page

2^10 = 1024, thus allows for 1024 4-byte values in 4k
 4 bytes = 32 bits = enough to store a pointer/address value
   (actually stores a page table entry as we'll discuss below)

(These offsets are a lot like how we find locations of inodes in ext4)
(Computer scientists have a limited number of ideas)

So a page table is just a bunch of pages which define virtual to physical mappings.  When you get a virtual address (on 32 biut)
 - use first 10 bits to find PTE in level 1 of the page table
     - use found PTE to find location of level 2 of page table page
 - use next 10 bits to find PTE in level 2 page table page
     - use found PTE to find location of desired data page
 - use last 12 bits to index into desired data page
     - this is the memory you want!

PTE = page table entry

page table entry is
 - a physical page number (for 32 bit sytems, 20 bits)
 - metadata about the page

If you had 4G of RAM, that is 2^32
  2^32 = 2^20 * 2^12
         num    size
         pages  of page
                (4k)

For 64-bit systems it is the same, but just gets messier
 (because virtual addresses are larger than physical addresses)
 can have 3 or 4 levels of pages

We split page tables into levels because we want entries for a level to
fit into one page

There is            1 level 1 page, for 32 bit systems, per process
           up to 2^20 level 2 pages, for 32-bit systems, per process
	    (but most procesess will only allocate some of these
	     unless it uses 4G of RAM on its own)

Remember every process needs its own page table
 - start with one level 1 page and 1 level 2 page
 - as process uses more RAM, add level 2 pages
  
Why "only" 64 bits?

This is 4G^4G
  2^64 = (2^32)^32

It is very big

I haven't talked about performance of virtual memory
 - note that each level of the page table is a memory access
 - so to look up one virtual address, we have to look at
    - the level 1 page table page
    - a level 2 page table page
    - the data page
 - 3 physical memory accesses for every virtual memory access
 - NOT A GOOD DEAL

Solution: have a very fast cache of virtual->physical memory mappings
 - this cache is called the TLB
 - TLB=translation lookaside buffer, nobody knows why this name
   is used nowadays (ok, go to the IBM design literature and you can
   see the rationale but it makes no sense today)
 - just remember the TLB is a cache of recently used PTEs
   (page table entries)

CPU looks in the TLB for a virtual addresss,
 - if it finds it, we're done, and we have lost almost no time
 - if we don't find it in the TLB, go look at the page table - lost time,
   but access will succeed
 - if we don't find it in the page table, tell the OS
    - because the process tried to access an invalid range of its
      virtual address space

On Intel and AMD systems, you cannot directly control the TLB entries, CPU
does its own thing (it looks at the page table directly)

But on other processers (I believe SPARC), the hardware doesn't look at
page tables, the OS does.  So the kernel can directly control TLB entries


What is the metadata associated with each PTE?
 - there's the physical page number (that's the data)
 - read, write, execute bits (like a file, but for each page)
 - valid bit
 - dirty bit
 - accessed bit
 - others?

We have per-page rwx bits so the CPU can alert the kernel when
a process does an improper memory access
 - write to read-only memory
 - execute to no-execute memory
Big help for security!

Remember the kernel isn't running most of the time
 - it is only called in when required
 - most of the time the CPU is running userspace processes
 - so need to make sure kernel is woken up when key decisions have
   to be made

Valid bit is used to say whether the PTE refers to a real physical page
 - you can have a physical page number of zero!
 - so need another bit to say whether it is valid or not
 - needed for when that particular page isn't allocated,
   OR if it refers to data that is on disk (and so needs
   to be read in)


(His name is Roshi, he's a boxer)

Dirty bit says whether the page has been modified since allocated
or read from disk
 - clean pages can be tossed out of RAM without consequence,
   can be read from disk later
 - dirty pages have to be written to disk ("cleaned"), until then
   the MUST stay in RAM

Kernel will try to kick out clean pages when RAM is needed
 - changing dirty to clean pages happens, but is a lower
   priority unless RAM is getting scarce

Accessed bit says whether the page has been recently accessed
 - 1 bit to represent time
 - not efficient to store a whole timestamp
 - but how is 1 bit enough?  clock algorithms (next time)

The kernel has its own page table for its code and data
 - thus it mostly uses virtual addresses itself
 - no fun to program with fragmented memory!
 - but it deals with physical addresses when necessary
   (userspace can't see them)

What if your system doesn't have swap?
 - you either tell processes to free memory or you kill them
 - this is what happens on Android, iOS (no swap),
   they kill procesess and restart them all the time
 - not completely sure why they don't have swap...but it is probably
   primarily for power consumption reasons (which is mostly how
   phones get weird)

Pages appear sequential in virtual memory but are fragmented in physical memory
 - just like how data in files is sequential but blocks of a file can
   be scattered across a disk


Next week, tutorial will show you virtual versus physical addresses
using a kernel module and special userspace program (developed again by William!)