Operating Systems 2020W Lecture 11

Video

Video from the lecture given on February 12, 2020 is now available.
Notes

Lecture 11
----------
* Assignment 2
* busy mounts
* node.js
* mmap

why do the functions we call in node have 'Sync' at the end?
 - because by default I/O is asynchronous in JavaScript/node
   (a callback function is called when the I/O completes, and until then
    node does other things)
 - don't worry about it for now, just use the Sync versions of functions
   (e.g., fs.readFileSync rather than fs.readFile)

To exit node, type Ctrl-D
 - this makes the terminal report end of file to the process (its next read
   returns 0 bytes), so node's input ends and it exits


To solve #17 on the assignment, you want to erase an inode for a directory
 - the contents of the directory will be orphaned, thus landing in
   lost+found after filesystem repair
 - note you may have to force a filesystem check after erasing part of the
   "disk"
 - you can figure out exactly where an inode is on disk using dumpe2fs,
   the inode number, and the info here:

   https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Finding_an_Inode

   if you are fuzzy on concepts, look in the textbook (under filesystem
   implementation)!

 - can filesystems be corrupted if an unmount is forced?  YES
    - because not all changes may have been written to disk
    - a forced unmount is like abruptly removing the filesystem (e.g.,
      pulling out a USB stick)
    - so the filesystem may be left in an inconsistent state, with data
      missing or only some changes made (which may make no sense on their own)
    - USB disks are normally flushed frequently, so just pulling one out
      usually doesn't cause corruption.  *Usually*, not always (better to
      unmount manually before removing a device)


 - a corrupted filesystem is one whose metadata has been messed up, which
   endangers all the data stored in that filesystem
     - what if a block is marked as free but is actually being used to store
       file data?  This is one kind of inconsistency that filesystem
       corruption can cause
 - in other words, filesystems are data structures.  Data structures have
   invariants.  A corrupted filesystem is one where its invariants no longer
   hold.

 - messing with the root filesystem is very dangerous; messing with other
   filesystems can be too, depending on how they are being used.

 - RAID is a separate concept from filesystems (RAID is at the block level)

 - If you are curious what happens, mess with your VM!  (Or make a new
   one to destroy.)  Discovery requires experimentation, after all.

mmap
 - not on the midterm, but it will be covered later; it's in tutorial 5


The key idea of mmap is that we can associate files with the memory of a process
 - interacting with a range of addresses in memory <-> read/write to file on
   disk
 - "memory-mapped I/O"

 - in 3000test, reading from data[] is actually reading from the file fn
   on disk
     - the kernel automatically takes care of reading the file and filling
       in memory as needed

 - mostly, when you mmap a file into memory, you could have just done
   read/writes to the file
     - but it can be more convenient to directly interact with
       data in an mmap'd buffer rather than interacting with a temporary
       buffer that you then have to read and write (to/from the file)
 - but there are some other cool tricks
   - dynamic libraries are mmap'd read-only from disk at program start
     - key benefit - RAM allocated to these libraries can be shared
       across processes
     - The "SHR" column in top is mostly due to libraries that have been
       mmap'd by multiple processes (mapped multiple times but only
       consuming RAM once)
     - how this is implemented will be discussed later
 - dynamic linking means library code is only loaded at runtime (generally
   via mmap).  Static linking means library code is added to the binary at
   compile/link time
     - thus the code for printf resides in the dynamic C library that is
       mmap'd in

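mmap can take a file descriptor (to specify the file to be mapped into memory), but it doesn't have to
 - mmaps that aren't backed by a file are known as "anonymous mmaps"
 - you request one with the MAP_ANONYMOUS flag and pass a file descriptor of -1
   (Linux ignores the descriptor for anonymous maps, but -1 is the portable choice)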

anonymous maps are used to allocate memory from the kernel
 - anonymous maps start out zero-filled, so a read-only one makes little sense
   (there's nothing useful in it); normally anonymous maps are read/write

anonymous mmap is normally used for larger memory allocations, while sbrk is used for smaller ones
(glibc's malloc, for example, switches to mmap above a size threshold, 128 KiB by default)
 - the "heap" really isn't one thing
 - how this is all implemented depends on how malloc is written
   - there are actually many different kinds of malloc, and
     C apps often have custom memory allocators that may use sbrk and mmap
     themselves

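mmaps of files follow the same permissions as always (note you do it to
a file descriptor, so you have to get past the checks in open)
 - the requested memory protection must also fit how the file was opened:
   e.g., a writable MAP_SHARED mapping needs a descriptor opened O_RDWR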

When you look at /proc/<PID>/maps, the entries have permissions
 - each entry is a range of memory for that process
 - memory has permissions (readable, writable, executable)
   - permissions are for security and performance
   - read-only memory is easier to share
   - we don't want arbitrary bytes in memory (such as data) to be executed
     as code; that's a common form of attack
     (machine code injection, e.g. buffer overflow attacks)
   - if you try writing to read-only memory, your process will be
     sent a signal (SIGSEGV or something similar)
   - the memory map is how the kernel decides whether a memory access
     (pointer access) is valid or not

mmap is not necessarily faster than read
 - the main performance cost is normally disk access, which both approaches
   have to pay
 - it all depends on data access patterns
    - with read/write you have more explicit control over what is transferred
      and when
 
To really understand mmap, you have to understand how virtual memory is implemented

Maps can be shared or private
 - shared maps mean changes to the memory -> change file on disk
 - with private maps, changes to memory don't change files on disk
   (the process gets its own copy of any page it modifies)

Also, fork and mmap interact
 - with shared mmaps (files or anonymous), memory is shared
   between parent and child
 - with private mmaps, memory is copied from parent to child

So you can actually control what parts of memory are copied and which
are shared based on flags to mmap
 - so you can get the shared-memory benefits of multithreaded processes
   without threads (instead, have processes that share some ranges of memory)

midterm review is on Feb. 26th, midterm is on Feb. 28th (in class)
 - I will be there for the midterm, review will be online

Remember that memory permissions and file permissions are different
 - same kinds of bits, but applying to completely different data
 - file permissions are for inodes
 - memory permissions are for ranges of memory, parts of a process's address
   space
    - DON'T CONFUSE THEM!