Operating Systems 2020W Lecture 11
Video
Video from the lecture given on February 12, 2020 is now available.
Notes
Lecture 11
----------
* Assignment 2
* busy mounts
* node.js
* mmap
why do the functions we call in node have a 'Sync' at the end?
- because by default I/O is async in JavaScript/node
(upon completion a function is called, and until then node does
other things)
- don't worry about it for now, just use the Sync versions of functions
To exit node, type Ctrl-D
- this makes the terminal send an end of file to the process
To solve #17 on the assignment, you want to erase an inode for a directory
- the contents of the directory will be orphaned, thus landing in
lost+found after filesystem repair
- note you may have to force a filesystem check after erasing part of the
"disk"
- you can figure out where precisely an inode is using dumpe2fs and the info
here (and the inode number):
https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Finding_an_Inode
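The arithmetic described on that wiki page can be sketched as follows. This is a minimal illustration in Python, not part of the assignment: the function names are made up here, and the numbers in the example (8192 inodes per group, 256-byte inodes, an inode table starting at block 1000, 4096-byte blocks) are hypothetical. Take the real values from the dumpe2fs output for your own filesystem image.

```python
def locate_inode(inode_number, inodes_per_group, inode_size):
    """Return (block_group, index_in_group, byte_offset_in_inode_table).

    inodes_per_group and inode_size are reported by dumpe2fs.
    Inode numbers start at 1, hence the "- 1".
    """
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    block_group = (inode_number - 1) // inodes_per_group
    index = (inode_number - 1) % inodes_per_group
    offset = index * inode_size
    return block_group, index, offset


def inode_disk_offset(inode_number, inodes_per_group, inode_size,
                      inode_table_block, block_size):
    """Byte offset of the inode from the start of the "disk".

    inode_table_block is the first block of the inode table of the
    block group that holds the inode (dumpe2fs lists it per group).
    """
    _, _, offset = locate_inode(inode_number, inodes_per_group, inode_size)
    return inode_table_block * block_size + offset


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(locate_inode(8195, 8192, 256))
    print(inode_disk_offset(8195, 8192, 256, 1000, 4096))
```

Once the byte offset is known, overwriting inode_size bytes at that offset in the image file erases the inode.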
if you are fuzzy on concepts, look in the textbook (in filesystem
implementation)!
- can filesystems be corrupted if an unmount is forced? YES
- because not all changes may have been written to disk
- a forced unmount is like abruptly removing the filesystem (e.g., pulling out a USB stick)
- so filesystem may be in an inconsistent state, with data missing
or only partial changes made (which may make no sense on their own)
- with USB disks, data is normally flushed to them frequently,
so just pulling them out usually doesn't result in corruption.
*usually*, not always (better to manually unmount before removing
a device)
- a corrupted filesystem means the metadata of the filesystem has been messed
up, thus endangering all the data stored on the system
- what if a block is marked as free but is actually being used to store
file data? This is one kind of inconsistency that filesystem
corruption can produce
- in other words, filesystems are data structures. data structures have
invariants. A corrupted filesystem is one where its invariants no longer
hold.
- messing with the root filesystem is very dangerous; messing with others
can be too, but it depends on how they are being used.
- RAID is a separate concept from filesystems (RAID is at the block level)
- If you are curious what happens, mess with your VM! (Or make a new
one to destroy.) Discovery requires experimentation, after all.
mmap
- not on the midterm, but it will be covered later and is in tutorial 5
Key idea of mmap is that we can associate files with the memory of a process
- interacting with a range of addresses in memory <-> read/write to file on
disk
- "memory-mapped I/O"
- in 3000test, reading from data[] is actually reading from the file fn
on disk
- the kernel automatically takes care of reading the file and filling
in memory as needed
- mostly, when you mmap a file into memory, you could have just done
read/writes to the file
- but it can be more convenient to directly interact with
data in an mmap'd buffer rather than interacting with a temporary
buffer that you then have to read and write (to/from the file)
- but there are some other cool tricks
- dynamic libraries are mmap'd read only from disk at program start
- key benefit - RAM allocated to these libraries can be shared
across processes
- The "SHR" column in top is mostly due to libraries that have been
mmap'd by multiple processes (mapped multiple times but only
consuming RAM once)
- how this is implemented will be discussed later
- dynamic linking means library code is only loaded at runtime (generally
via mmap). static linking means library code is added to the binary at
compile/link time
- thus code for printf resides in the dynamic C library that is mmap'd in
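To make the "memory <-> file" idea concrete, here is a small sketch in Python, whose mmap module wraps the mmap system call. It is not the 3000test code from the tutorial; uppercase_in_place and demo are names invented for this example, and the temporary file stands in for a real one. The point is that the file is changed purely by assigning to memory, with no explicit write call on the file.

```python
import mmap
import os
import tempfile


def uppercase_in_place(path):
    """Upper-case a file's contents by modifying its memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file (this fails on an empty file).
        # On Unix the default is a shared, read/write mapping.
        with mmap.mmap(f.fileno(), 0) as data:
            data[:] = data[:].upper()  # ordinary memory access
            data.flush()               # ask the kernel to write back now


def demo():
    """Create a scratch file, modify it via mmap, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello mmap\n")
        os.close(fd)
        uppercase_in_place(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

The same effect could be had with read, modify, write on a temporary buffer; the mapping just removes the buffer management.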
mmap can take a file descriptor (to specify the file to be mapped into memory), but it doesn't have to
- mmap's with a file descriptor of -1 are known as "anonymous mmaps"
- (you should also set the MAP_ANONYMOUS flag)
anonymous maps are used to allocate memory from the kernel
- anonymous maps make no sense for read-only memory (there's nothing there),
so normally anonymous maps are read/write
anonymous mmap is normally used for larger memory allocations, while sbrk is used for smaller ones
- the "heap" really isn't one thing
- how this is all implemented depends on how malloc is written
- there are actually many different kinds of malloc, and
C apps often have custom memory allocators that may use sbrk and mmap
themselves
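A rough sketch of an anonymous mapping, again in Python rather than C. The allocate helper is a name invented for this example; it is only meant to show the flags involved, not how any particular malloc is written.

```python
import mmap


def allocate(size):
    """Ask the kernel for `size` bytes of zero-filled, read/write memory.

    File descriptor -1 plus MAP_ANONYMOUS means there is no file behind
    the mapping. MAP_PRIVATE keeps the memory private to this process.
    """
    return mmap.mmap(
        -1,
        size,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


if __name__ == "__main__":
    buf = allocate(4 * mmap.PAGESIZE)
    print(len(buf), buf[:4])   # fresh anonymous memory starts out zeroed
    buf[:5] = b"hello"
    print(buf[:5])
    buf.close()                # munmap: hand the memory back to the kernel
```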
mmaps of files follow the same permissions as always (note you do it to
a file descriptor, so you have to get past the checks in open)
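One way to see that the descriptor's open mode carries over to the mapping is the sketch below (Python; try_mappings and demo are made-up names, and the scratch file is only for illustration). With a descriptor opened read-only, a read-only mapping and a private writable mapping should both succeed, while a shared writable mapping should be refused, since it would let the process change a file it never opened for writing.

```python
import mmap
import os
import tempfile


def try_mappings(path):
    """Open `path` read-only and report which kinds of mapping succeed."""
    kinds = (
        ("read", mmap.ACCESS_READ),            # read-only mapping
        ("shared_write", mmap.ACCESS_WRITE),   # writes go to the file
        ("private_write", mmap.ACCESS_COPY),   # writes stay in memory
    )
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        for name, access in kinds:
            try:
                mmap.mmap(fd, 0, access=access).close()
                results[name] = True
            except OSError:
                results[name] = False
    finally:
        os.close(fd)
    return results


def demo():
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"some data\n")
        os.close(fd)
        return try_mappings(path)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```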
When you look at /proc/<PID>/maps, the entries have permissions
- each entry is a range of memory for that process
- memory has permissions (readable, writable, executable)
- permissions are for security and performance
- read-only memory is easier to share
- we don't want bytes in memory to be interpreted as code
unless we intend them to be; tricking a process into doing so
is a common form of attack
(machine code injection, e.g. buffer overflow attacks)
- if you try writing to read-only memory, your process will be
sent a signal (SIGSEGV or something similar)
- the memory map is how the kernel decides whether a memory access
(pointer access) is valid or not
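The entries can also be read programmatically. The following is a minimal, Linux-only sketch in Python; read_maps is a name invented here, and it keeps only the address range, the permission string, and the pathname from each line.

```python
def read_maps(pid="self"):
    """Return a list of (start, end, perms, name) for a process's memory map.

    Each line of /proc/<PID>/maps looks like:
        address-range perms offset dev inode pathname
    perms is four characters: r, w, x, then p (private) or s (shared).
    """
    entries = []
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            fields = line.split(maxsplit=5)
            start, end = (int(x, 16) for x in fields[0].split("-"))
            perms = fields[1]
            name = fields[5].strip() if len(fields) > 5 else ""
            entries.append((start, end, perms, name))
    return entries


if __name__ == "__main__":
    for start, end, perms, name in read_maps():
        print(f"{start:016x}-{end:016x} {perms} {name}")
```

Run against its own process, this typically shows library files mapped several times with different permissions (for example r-xp for code and rw-p for data).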
mmap is not necessarily faster than read
- normally the main performance hit is accessing the disk, and that
happens either way
- all depends on data access patterns
- with read/write you have more explicit control
To really understand mmap, you have to understand how virtual memory is implemented
Maps can be shared or private
- shared maps mean changes to the memory -> change file on disk
- with private maps, changes to memory don't change files on disk
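The difference can be checked with a short experiment. This is a sketch in Python, where ACCESS_WRITE requests a shared mapping and ACCESS_COPY a private one; modify_first_byte and demo are hypothetical helper names and the scratch file is created only for the test.

```python
import mmap
import os
import tempfile


def modify_first_byte(path, access):
    """Overwrite the first byte through a mapping, then re-read the file."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            m[0:1] = b"X"
    with open(path, "rb") as f:
        return f.read()


def demo():
    """Return the file contents after a shared and after a private write."""
    results = {}
    for name, access in (("shared", mmap.ACCESS_WRITE),
                         ("private", mmap.ACCESS_COPY)):
        fd, path = tempfile.mkstemp()
        try:
            os.write(fd, b"hello")
            os.close(fd)
            results[name] = modify_first_byte(path, access)
        finally:
            os.unlink(path)
    return results


if __name__ == "__main__":
    print(demo())
```

With the shared mapping the file itself changes; with the private mapping only the process's copy of the memory does.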
Also, fork and mmap interact
- with shared mmaps (files or anonymous), memory is shared
between parent and child
- with private mmaps, memory is copied from parent to child
(logically; the kernel actually copies pages lazily, on write)
So you can actually control what parts of memory are copied and which
are shared based on flags to mmap
- so can get shared memory benefits of multi-threaded processes
without threads (can instead have processes that share some ranges of memory)
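A sketch of this interaction, in Python on a Unix system (fork_demo is a name made up for the example). The parent creates one shared and one private anonymous mapping, forks, and the child writes to both. Only the write to the shared mapping should be visible to the parent afterwards.

```python
import mmap
import os


def fork_demo():
    """Return what the parent sees in (shared, private) after a child writes."""
    prot = mmap.PROT_READ | mmap.PROT_WRITE
    shared = mmap.mmap(-1, mmap.PAGESIZE, prot=prot,
                       flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    private = mmap.mmap(-1, mmap.PAGESIZE, prot=prot,
                        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    pid = os.fork()
    if pid == 0:
        # Child: write to both mappings, then exit immediately.
        try:
            shared[:5] = b"child"
            private[:5] = b"child"
        finally:
            os._exit(0)

    os.waitpid(pid, 0)  # Parent: wait until the child has finished writing.
    seen = (shared[:5], private[:5])
    shared.close()
    private.close()
    return seen


if __name__ == "__main__":
    print(fork_demo())
```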
midterm review is on Feb. 26th, midterm is on Feb. 28th (in class)
- I will be there for the midterm, review will be online
Remember that memory permissions and file permissions are different
- same kinds of bits, but applying to completely different data
- file permissions are for inodes
- memory permissions are for ranges of memory, parts of a process's address
space
- DON'T CONFUSE THEM!