Operating Systems 2021F Lecture 16
Video
Video from the lecture given on November 11, 2021 is now available:
Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)
Notes
Lecture 16
----------

 - interviews
   - there will be more, just figuring out my schedule for next week
 - yes, final will replace midterm if you do better
 - A3 will be posted by tomorrow, will be going over it a bit today

Today: filesystems

Filesystem
 - collection of files & directories
 - namespace for inode numbers
 - way to transform a block device into a place you can store files
 - data structure that is accessed using the file API

From the kernel's perspective
 - has the file system call interface (open, read, write, etc)
   (also for directories)
 - when it gets a pathname, where does it look to get the data?
   - it figures out which filesystem is responsible for the containing directory
   - it then asks that filesystem to do the file operations
 - this abstraction is known as the "VFS" (virtual filesystem) layer

/proc/filesystems lists all the different kinds of filesystems your Linux system knows about currently
 - the ones with "nodev" beside them are ones that have no corresponding storage device
 - means that while you access them with file-related system calls, what you're getting back isn't from a storage device, it is something else
 - these are known as pseudo filesystems

df: disk free
 - shows currently mounted filesystems and how much space is used on each

mount: add a filesystem to our current file hierarchy
 - mountpoint: where those files should go, normally an empty directory
   (if not empty, the files already there will be hidden)

df .
 - tells me the filesystem responsible for the current directory

df -a
 - show ALL the filesystems
 - including ones with no corresponding device (pseudo filesystems)

Pseudo filesystems like /proc tend to have some weird info that gives them away
 - inode numbers are weird
 - file sizes make no sense (are often zero)
 - timestamps aren't consistent

The above is all true because those fields are just made up when you access the files, they aren't "stored" anywhere
 - file metadata for pseudo filesystems isn't that significant

mount
 - get full information on mounted filesystems (with no args)
 - or you can use it to add filesystems to the file hierarchy

mounting a real filesystem
 - have access to data on a block device

mounting a pseudo filesystem
 - have access to some new capability, typically kernel data structures or runtime data that doesn't need to be stored on disk
 - essentially all data is in RAM or generated algorithmically
 - just depends on the filesystem type

/run
 - it is a "tmpfs" - temporary filesystem
   - data is temporary
   - no corresponding block device
   - it is a "RAM disk"
   - full filesystem, but *data is lost when system reboots*
 - used for PID files and lock files mainly
   - PIDs will change when system is rebooted
   - locks shouldn't be held across reboot
 - note here "tmpfs" is the filesystem type, /run is the mountpoint

Notice that /tmp is NOT in tmpfs
 - it is part of the root filesystem, normally ext4
 - BUT files in /tmp are erased on every reboot
   - but that happens due to a boot time script

If you boot with a live CD
 - data in /run would already be lost
 - data in /tmp should still be there

/var/tmp is like /tmp, except it is NOT erased when rebooted

In your VM, the root filesystem is mounted as follows:

  /dev/mapper/vg0-lv--0 on / type ext4 (rw,relatime)

the "ext4" means that it is of type ext4.

So, why isn't /tmp a tmpfs?
 - historic/distribution reasons
   (I know some distributions that do put /tmp on tmpfs)
 - can store a lot more in /tmp normally than you could in a tmpfs
   (as its storage is limited by RAM/virtual memory)

(I'll discuss lock files later)

Different operating systems have their own native filesystems
 - MSDOS: fat, vfat, fat32
 - Windows: NTFS
 - MacOS: HFS, HFS+, APFS
 - FreeBSD: UFS, zfs
 - IRIX: xfs
 - Linux: ext2, ext3, *ext4*, btrfs, squashfs

LOTS of filesystem types

These are all regular filesystems used for regular disks, originally developed for magnetic hard drives, not SSDs
 - except APFS?

Why so many?
 - some support different file sizes
 - different performance characteristics
 - reliability/durability
 - licensing
 - stubbornness/Not Invented Here

Key difference
 - some are designed for UNIX-like systems (POSIX compliant)
 - others are not!
 - POSIX-compliant ones use inodes, others generally do not

Remember a filesystem is just a data structure
 - so the filesystem type is the kind of data structure

Note that some filesystems have specialized purposes
 - squashfs is designed to be compressed and read only

Why would you want a read only filesystem?
 - storage medium is read only (e.g. optical media)
   iso9660, etc
 - for starting up the system <--- we'll get to this

Look up the youtube channel "technology connections", whole series on optical media

So now let's make and use a filesystem

What do we need?
 - a block device

What will we get?
 - files stored on the block device

Challenge
 - we don't have any devices we can physically connect to our VM

Workaround
 - we'll make a file that will behave like a block device

To make an "empty" (zero-filled) file of a given size, use dd, e.g.
  dd if=/dev/zero of=fakeblks bs=4096 count=100000

  if: input file
  of: output file
  bs: block size
  count: number of blocks to copy

So dd copies count blocks of size bs from the input file to the output file
 - note it does exactly one read system call and one write system call for each block transferred
 - we read count*blocksize bytes from if
 - we write count*blocksize bytes to of

(Note we are reading from /dev/zero, so we are reading from an infinite source of zero bytes)

Note that you can't just use touch
 - it will just make a file of size zero

You could use truncate
 - but we'll get to that

To look at the contents of a binary file, you can use od
 - with the -a option, translates each byte to its corresponding named character

If we run od on the file before and after running mkfs.ext4, we can see what bytes were modified in the file

The kernel has several ways to service a given file operation request
 - if it is for a file on a regular filesystem, it uses that filesystem's code to interpret data read from or written to the mounted block device
 - if it is for a pseudo filesystem, it runs the code in the kernel that implements the file operations
   - they can do essentially anything
 - if it is for a character device
   - runs the code for the character device

/dev/zero is a character device that always returns null bytes
 - specifically, when you do a "read" system call, it fills the buffer given to it with all zero bytes
 - will do this as many times as asked

There's also /dev/urandom and /dev/random
 - infinite random bytes
 - but /dev/random can be VERY slow because it tries to return "real" random bytes (it may block waiting for entropy)

A filesystem is a data structure. So, we need ways to
 - create/initialize the data structure: mkfs
 - validate and ideally repair the data structure: fsck

Why bother validating and repairing filesystem data structures?
 - we don't normally do that for hash tables or binary trees?
   - but they only stick around for short periods of time
   - if they get messed up, we just restart the program

But filesystems are meant to last for years, and hardware & software can fail over that time
 - cosmic rays
 - manufacturing defects
 - code bugs...

So filesystems are designed to be repairable
 - should be able to recover overall structure
 - ideally, preserve what data you can when things are damaged

fsck tries its best to repair filesystems
 - but it relies on how the filesystem is structured in order to do its work

Do pseudo filesystems have fsck?
 - no, because it wouldn't make sense, no persistent state to fix

Note that fsck is specific to each filesystem
 - every data structure needs its own specialized repair mechanisms

What allows filesystems to be repaired?
 - lots of redundancy!

"pointers" in filesystems are always bidirectional
 - so if one is missing we can recover it
 - (like a doubly linked list)

Some data tells us how the rest of the data is organized
 - this is the "superblock" (think of it as the root node of a tree)
 - store multiple copies of the superblock because if this is lost we lose everything

Remember filesystems are data structures for organizing blocks
 - a block has a fixed size, nowadays generally 4K, but always some small power of 2
 - locations in filesystems are in terms of block numbers, not addresses
   e.g., block 2000, not address 265101561

We always access the data structure by reading or writing entire ranges of blocks, not ranges of bytes
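
The block-number idea above can be sketched in a few lines of Python. This is only an illustration, not how any real filesystem is implemented: the helper names (make_fake_device, read_block, write_block, demo) are made up for this example, the "device" is an ordinary zero-filled file like the fakeblks file made with dd, and the 4096-byte block size is assumed to match bs=4096. The point is that a block number is turned into a byte offset by multiplying by the block size, and that every read or write covers a whole block.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size, same as bs=4096 in the dd example


def make_fake_device(path, count, block_size=BLOCK_SIZE):
    """Create a zero-filled file of `count` blocks, written one block at a time."""
    zero_block = bytes(block_size)  # a block of all zero bytes, like /dev/zero gives
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(zero_block)


def read_block(path, block_no, block_size=BLOCK_SIZE):
    """Return the entire block numbered `block_no`."""
    with open(path, "rb") as f:
        f.seek(block_no * block_size)  # block number -> byte offset
        return f.read(block_size)


def write_block(path, block_no, data, block_size=BLOCK_SIZE):
    """Overwrite one entire block; partial blocks are refused."""
    if len(data) != block_size:
        raise ValueError("must write exactly one full block")
    with open(path, "r+b") as f:
        f.seek(block_no * block_size)
        f.write(data)


def demo():
    """Change 5 bytes in block 20 by reading, modifying, and rewriting the whole block."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "fakeblks")
        make_fake_device(path, count=100)
        block = bytearray(read_block(path, 20))
        block[:5] = b"hello"
        write_block(path, 20, bytes(block))
        return os.path.getsize(path), read_block(path, 20)


if __name__ == "__main__":
    size, block = demo()
    print(size, block[:5])
```

Note that even a five-byte change goes through a read-modify-write of the full block, which is the pattern a filesystem has to follow on a real block device.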