Operating Systems 2020W Lecture 11
Video
Video from the lecture given on February 12, 2020 is now available.
Notes
Lecture 11
----------

* Assignment 2
* busy mounts
* node.js
* mmap

why do the functions we call in node have a 'Sync' at the end?
 - because by default I/O is asynchronous in JavaScript/node (upon completion a callback function is called, and until then node does other things)
 - don't worry about it for now, just use the Sync versions of functions

To exit node, type Ctrl-D
 - this makes the terminal send an end of file to the process

To solve #17 on the assignment, you want to erase an inode for a directory
 - the contents of the directory will be orphaned, thus landing in lost+found after filesystem repair
 - note you may have to force a filesystem check after erasing part of the "disk"
 - you can figure out precisely where an inode is using dumpe2fs and the info here (and the inode number):
     https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Finding_an_Inode

if you are fuzzy on concepts, look in the textbook (in filesystem implementation)!

can filesystems be corrupted if an unmount is forced?  YES
 - because not all changes may have been written to disk
 - forced unmount -> removing the filesystem out from under the system (e.g., pulling a USB stick)
 - so the filesystem may be in an inconsistent state, with data missing or only partial changes made (which may make no sense on their own)
 - normally with USB disks data is flushed to them frequently, so just pulling them normally doesn't result in corruption.  *normally*, not always (better to manually unmount before removing a device)
 - a corrupted filesystem means the metadata of the filesystem has been messed up, thus endangering all the data stored on the filesystem
 - what if a block is marked as free but is actually being used to store file data?  This is a type of inconsistency that can result from filesystem corruption
 - in other words, filesystems are data structures.  Data structures have invariants.  A corrupted filesystem is one whose invariants no longer hold.
 - messing with the root filesystem is very dangerous; messing with others can be too, but it depends on how they are being used
 - RAID is a separate concept from filesystems (RAID is at the block level)
 - If you are curious what happens, mess with your VM!  (Or make a new one to destroy.)  Discovery requires experimentation, after all.

mmap
 - not on midterm, but will be covered later; is in tutorial 5

Key idea of mmap is that we can associate files with the memory of a process
 - interacting with a range of addresses in memory <-> read/write to file on disk
 - "memory-mapped I/O"
 - in 3000test, reading from data[] is actually reading from the file fn on disk
 - the kernel automatically takes care of reading the file and filling in memory as needed
 - mostly, when you mmap a file into memory, you could have just done read/writes to the file
 - but it can be more convenient to directly interact with data in an mmap'd buffer rather than interacting with a temporary buffer that you then have to read and write (to/from the file)
 - but there are some other cool tricks
   - dynamic libraries are mmap'd read-only from disk at program start
   - key benefit: RAM allocated to these libraries can be shared across processes
   - The "SHR" column in top is mostly due to libraries that have been mmap'd by multiple processes (mapped multiple times but only consuming RAM once)
   - how this is implemented will be discussed later
   - dynamic linking means library code is only loaded at runtime (generally via mmap).
   - static linking means library code is added to the binary at compile/link time
   - thus, with dynamic linking, the code for printf resides in the dynamic C library that is mmap'd in

mmap can take a file descriptor (to specify the file to be mapped into memory), but it doesn't have to
 - mmaps with a file descriptor of -1 are known as "anonymous mmaps"
   - (you should also set the MAP_ANONYMOUS flag; the flag is what makes the map anonymous, and -1 is the conventional descriptor to pass with it)

anonymous maps are used to allocate memory from the kernel
 - anonymous maps make no sense for read-only memory (there's nothing there), so normally anonymous maps are read/write

anonymous mmap is normally used for larger memory allocations, while sbrk is used for smaller ones
 - the "heap" really isn't one thing
 - how this is all implemented depends on how malloc is written
 - there are actually many different kinds of malloc, and C apps often have custom memory allocators that may use sbrk and mmap themselves

mmaps of files follow the same permissions as always (note you do it to a file descriptor, so you have to get past the checks in open)

When you look at /proc/<PID>/maps, the entries have permissions
 - each entry is a range of memory for that process
 - memory has permissions (readable, writable, executable)
 - permissions are for security and performance
   - read-only memory is easier to share
   - don't want bytes in memory to be interpreted as code if we don't want them to be; that's a common form of attack (machine code injection, e.g. buffer overflow attacks)
 - if you try writing to read-only memory, your process will be sent a signal (SIGSEGV or something similar)
 - the memory map is how the kernel decides whether a memory access (pointer access) is valid or not

mmap is not necessarily faster than read
 - the main performance hit is normally accessing the disk
 - all depends on data access patterns
 - with read/write you have more explicit control

To really understand mmap, you have to understand how virtual memory is implemented

Maps can be shared or private
 - shared maps mean changes to the memory -> change file on disk
 - with private maps, changes to memory don't change files on disk

Also, fork and mmap interact
 - with shared mmaps (files or anonymous), memory is shared between parent and child
 - with private mmaps, memory is copied from parent to child

So you can actually control which parts of memory are copied and which are shared based on flags to mmap
 - so can get shared memory benefits of multi-threaded processes without threads (can instead have processes that share some ranges of memory)

midterm review is on Feb. 26th, midterm is on Feb. 28th (in class)
 - I will be there for the midterm, review will be online

Remember that memory permissions and file permissions are different
 - same kinds of bits, but applying to completely different data
 - file permissions are for inodes
 - memory permissions are for ranges of memory, parts of a process's address space
 - DON'T CONFUSE THEM!