Operating Systems 2019W Lecture 14

From Soma-notes


The lecture given on March 6, 2019 is now available.


Lecture 14
Virtual memory!

every process has its own virtual address space

on every memory access, each virtual address is translated into a physical address

The kernel must keep track of these per-process virtual->physical address mappings

Theoretically, you could have a table mapping each virtual address to each physical address

Mappings on a per-byte basis would be *way* too inefficient

process memory is divided into fixed-sized pages
 - normally 4K
 - but sometimes also really large (2M+) for special purposes

Before pages, there were segments

Segments have a base address and a bound (size)
 - variable length
 - typically had a semantic purpose (code, data, stack, etc)

Segment terminology still used when discussing parts of an executable
 (e.g. parts of an ELF file)

You could relocate segments
 - all memory accesses could be relative to a "base" register

But the world moved to a "flat" memory model (i.e. no segments)
 - segments can be confusing when they overlap
 - but the real problem is external fragmentation

internal fragmentation
 - space lost when allocating using fixed-sized chunks (e.g., 4K at a time)

external fragmentation
 - space divided into discontiguous chunks
 - cannot make larger contiguous allocations
 - happens when using variable-sized memory allocations


Example: 7 units of free memory are available,
but the largest contiguous piece is only 4 units

what if an allocation for 6 comes in?

Only way would be to compact memory - move things around until you got
a large enough contiguous block

Virtual memory is a solution for the external fragmentation problem
 - virtual addresses can be contiguous even when physical addresses aren't

To make virtual memory work, we need a mapping of virtual to physical
addresses at a page-level resolution

 4K page => 4K frame


Sidebar: the memory hierarchy
 - fastest: small & volatile
 - slowest: large & persistent

 CPU registers

 CPU cache (L1) <- smallest, fastest
 CPU cache (L2)
 CPU cache (L3) <- often shared between cores

OS MANAGED - Virtual memory

OS MANAGED - filesystems
 SSD/flash memory
 Spinning Hard drives



 4K Page (virtual memory) -> 4K frame (physical memory)

This mapping will be used by the CPU, but managed by the OS

*Page tables* do this mapping
 - but it is more a (very wide) tree, not a table

851F1 521   <- 32-bit virtual address (hex)
  ^    ^
page#  page offset

I just need to translate the page # to a frame #

use the frame # plus the page offset to get the physical address

upper 20 bits: page number
lower 12 bits: page offset

need a way to translate 20 bits (page #) to 20 bits (frame #)

Could just use an array with 2^20 entries

But most processes only need a small fraction of 2^20 entries
 - so we want a sparse data structure

Remember we want to do all memory allocation in 4K chunks - even for the page table!

How many mappings can I store in 4K?

I can store 1024 (1K) 32-bit entries in 4K

1024 = 2^10

1st-level page table (the page directory): 1024 entries, each pointing to a 2nd-level page table

each 2nd-level page table: 1024 entries (PTEs), each mapping a page to a frame

* We can have up to 1024 2nd level page tables, giving us 2^20 entries

But we're missing something
 - how many memory accesses do we need to resolve one address?!
 - with a 2-level table, every load or store needs 2 extra accesses just to walk the table

TLB: "translation lookaside buffer"
 - caches virtual->physical mappings (PTEs)

Page table entries have frame #'s AND metadata
 - valid?
 - modified? (dirty)
 - accessed? (recently)  <--- NOT a time stamp
 - permission bits: rwx

Does the CPU or OS manage TLB entries?
 - depends on the architecture
 - most nowadays have the CPU walk the page table

How does mmap work in this?

Lazy allocation
 - pages are allocated on demand
 - disk is read on demand

Lazy allocation allows for aggressive file caching and memory overcommitment
 - improves performance in general, but can lead to bad situations
   (e.g., the kernel killing processes when overcommitted memory is actually used)