Operating Systems 2022F Lecture 11

Video

Video from the lecture given on October 18, 2022 is now available:
Video is also available through Brightspace (Resources->Zoom meeting->Cloud Recordings tab)
Notes

Lecture 11
----------

Grading
 - working on A2 and the midterm
 - should be mostly graded once we're back from the break
 - I'll be posting solutions to the midterm and opening up interview
   slots *after* the midterms are graded (so will be after the break)
   - will ask specific people to interview
   - you may also volunteer by just signing up for a slot
 - please don't discuss the midterm on teams yet, some students still have
   to take it

Chapters 35-51 in the textbook will be relevant
 - WAY more material than I will cover
 - focus is on 39 & 40, but others have bits that are relevant
 - but it is all very interesting and worth learning

So for this week we're discussing files and filesystems
 - data structure for mapping hierarchical names to data
   (file paths to file contents)

Two obvious data structures: directories and files
 - directory is a list of files
    - a directory is a kind of file
    - regular files contain data

BUT in UNIX-like systems, that isn't quite how things are organized

The central data structure is the inode

A filesystem is tree-like, but with a bit more complex structure

Remember that filesystems are data structures for storing directories and files in *block-addressable storage*
 - as opposed to byte-addressable storage (RAM)

When you access RAM, you specify an address, and then you can get one or more bytes (with each byte having its own address)

Persistent storage (disks, SSDs) is not generally organized like this.  Instead,
we have a block number that, when requested, returns the contents of a block
 - block sizes are multiples of 512 bytes; today they are typically 4K or 8K bytes

So if you wanted to refer to a specific byte on disk, you'd specify the block it is in and then the offset (location) within the block.
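The block-plus-offset idea is just integer division. Here is a minimal sketch, assuming a 4K block size (the name locate_byte and the constant are only for illustration):

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (4K)

def locate_byte(byte_position, block_size=BLOCK_SIZE):
    """Return (block number, offset within that block) for a byte position."""
    # The quotient is the block, the remainder is the offset inside it.
    return divmod(byte_position, block_size)

if __name__ == "__main__":
    block, offset = locate_byte(10000)
    print(f"byte 10000 is in block {block} at offset {offset}")
```

With 4K blocks, byte 10000 lands in block 2 at offset 1808, since 2 * 4096 = 8192 and 10000 - 8192 = 1808.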

In any given block device, all blocks are the same size.
 - So in practice, when we are referring to a file, its data is stored in a set of blocks on disk.
 - but as a programmer you just see the file as an arbitrary-sized sequence of bytes; you don't notice the block structure, but it is there.  You don't have to worry about it because the filesystem handles it for you.

That's the job of a filesystem - to turn blocks into files.

So a file with one byte of data will take up one block, wasting the rest of the block.  (So small files are inherently wasteful, which is why we generally put lots of small values inside of a file rather than having separate files for each.)

When you get the number of blocks with stat, it doesn't *actually* tell you the number of blocks.  It tells you the number of bytes used on disk, divided by a default block size (512 or 1024, not consistent).
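You can see the gap between a file's size and its space on disk by calling stat yourself. This is a rough sketch, assuming a Linux-like system where the st_blocks field counts 512-byte units; the helper names are made up for this example, and the exact allocation you see depends on the filesystem.

```python
import os
import tempfile

def size_and_allocation(path):
    """Return (logical size in bytes, bytes allocated on disk) for path."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units, as it does on Linux.
    return st.st_size, st.st_blocks * 512

def one_byte_file_usage():
    """Create a one-byte temporary file and report its size and allocation."""
    with tempfile.NamedTemporaryFile() as f:
        f.write(b"x")
        f.flush()
        os.fsync(f.fileno())  # push the data out so the allocation is settled
        return size_and_allocation(f.name)

if __name__ == "__main__":
    size, allocated = one_byte_file_usage()
    print(f"size: {size} byte(s), allocated on disk: {allocated} bytes")
```

On many filesystems the allocation for the one-byte file will be a full filesystem block (for example 4096 bytes), but some filesystems store tiny files differently, so treat the number as something to observe rather than assume.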

Note that block size is a feature of the hardware
 - the interface for the device determines this

So, why doesn't persistent storage allow byte-addressable access?
 - because it would be inefficient; that isn't how the hardware works
 - for hard drives, the drive head cannot read one byte at a time, it
   always reads many bytes at once (as they spin by).  So better
   to read/write in these chunks, which we abstract as blocks
 - SSDs have different constraints, but they also do read/write operations
   on many bytes at once, so again we have blocks

The real problem is that the kernel is the only thing that *really* knows about blocks, so userspace has to guess, and its default guess is generally wrong.
 - you can't ever do operations on a per-block level from userspace,
   so it doesn't generally matter

Normally filesystems have their own block size, which must be a multiple (generally a power-of-2 multiple) of the physical block size.

So if you have a physical block size of 1K, your filesystem block size could be
1K, 2K, 4K, 8K, 16K, and so on.

If you have a physical block size of 4K, your filesystem block size could be 4K, 8K, 16K, and so on...but NOT 1K

 - getting multiple physical blocks to get a logical block for the filesystem is no big deal, you just read multiple physical blocks and mash them together
 - but accessing fractions of a physical block...you'd be throwing away most
   of the data read, and that would be bad
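The rule above can be written as a small check. This is only a sketch of the constraint as stated in these notes (a power-of-2 multiple of the physical block size); the function name is hypothetical and real filesystems impose their own additional limits.

```python
def valid_fs_block_size(fs_block, physical_block):
    """Check whether fs_block is a power-of-2 multiple of physical_block."""
    # A filesystem block can never be a fraction of a physical block.
    if fs_block < physical_block or fs_block % physical_block != 0:
        return False
    ratio = fs_block // physical_block
    # A power of two has exactly one bit set, so ratio & (ratio - 1) is 0.
    return ratio & (ratio - 1) == 0

if __name__ == "__main__":
    for fs_block in (1024, 2048, 4096, 8192):
        print(fs_block, valid_fs_block_size(fs_block, 4096))
```

With a 4K physical block, 1K and 2K are rejected while 4K and 8K are accepted, matching the examples above.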

So the filesystem data structure has to determine how to use blocks most efficiently to store directories and files.
 - remember that directories too must fit into blocks

Another key problem we have to solve in UNIX-like filesystems is supporting hard links
 - ALL REGULAR FILES are hard links
 - a hard link is just a filename that refers to an inode
   - the inode has the file metadata and points to file data (in other blocks)

directory contains name->inode mappings
inodes contain pointers to blocks with file data (and file metadata)
(a symbolic link is a name->name mapping)

So when you add a hard link, you are making a new directory entry
that refers to an existing inode (the one that the other name referred to)

Remember that inode numbers are filesystem specific
 - so a hard link can only refer to an inode on the same filesystem as the link itself
 - symbolic links can span filesystems
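To see the name->inode and name->name mappings in action, here is a small sketch, assuming a UNIX-like system and a filesystem that supports both kinds of link. The file names and the link_demo function are made up for this example.

```python
import os
import tempfile

def link_demo(directory):
    """Create a file, a hard link, and a symbolic link, then compare them."""
    original = os.path.join(directory, "original.txt")
    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")
    with open(original, "w") as f:
        f.write("hello\n")
    os.link(original, hard)      # new directory entry for the same inode
    os.symlink(original, soft)   # new inode whose content is a path name
    st_original = os.stat(original)
    st_hard = os.stat(hard)
    st_soft = os.lstat(soft)     # lstat inspects the link itself, not its target
    return {
        "same_inode": st_original.st_ino == st_hard.st_ino,
        "link_count": st_original.st_nlink,
        "symlink_has_own_inode": st_soft.st_ino != st_original.st_ino,
        "symlink_target": os.readlink(soft),
    }

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        for key, value in link_demo(directory).items():
            print(f"{key}: {value}")
```

Both names report the same inode number and a link count of 2, while the symbolic link has its own inode that just stores the path of the original name.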

A given Linux system has many filesystems
 - remember that a filesystem can, but doesn't have to, be based on
   physical storage provided by a hard drive/SSD
 - really, a filesystem is just a data structure for mapping files to file contents, the contents can come from anywhere

Pay attention to the filesystems that have no blocks
when you run df -a
 - how can a filesystem have no blocks?  If it is a virtual filesystem,
   one that isn't backed by storage (persistent or ephemeral)
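The block counts that df shows can also be read from a program. This is a minimal sketch, assuming a Linux system where /proc and /sys are mounted; total_blocks is a hypothetical helper name. On a typical Linux machine the virtual filesystems are expected to report zero blocks, but check what your own system prints.

```python
import os

def total_blocks(mount_point):
    """Return the total number of blocks the filesystem at mount_point reports."""
    return os.statvfs(mount_point).f_blocks

if __name__ == "__main__":
    # /proc and /sys are assumed to exist only on Linux-like systems.
    for mount_point in ("/", "/proc", "/sys"):
        if os.path.isdir(mount_point):
            print(mount_point, total_blocks(mount_point))
```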

Types of persistent filesystems you may encounter
----------------
ext4, btrfs, xfs, zfs - filesystems for Linux
ntfs, fat32, vfat - filesystems for Windows
hfs, apfs - filesystems for macOS

But on Linux we also have pseudo filesystems
 - proc, sys, dev

No real storage, instead they are interfaces to the kernel

In T5 Part A, you're playing with files, inodes, and mmap
 - mmap is a system call that maps the contents of a file to an area of
   a process's address space
 - so the process can access that memory and the kernel will read/write
   to the file as needed
     - it will only load into memory what's accessed, rest can
       stay on disk
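Here is a short sketch of the same idea using Python's mmap module rather than the C system call directly. The function name is made up for this example, and it assumes the file is non-empty (an empty file cannot be mapped this way).

```python
import mmap
import os
import tempfile

def uppercase_first_bytes(path, count):
    """Uppercase the first count bytes of a file by writing to mapped memory."""
    with open(path, "r+b") as f:
        # Length 0 means "map the whole file".
        with mmap.mmap(f.fileno(), 0) as mapped:
            # Plain memory writes; no write() call is made on the file.
            mapped[:count] = mapped[:count].upper()
            mapped.flush()  # ask for the changes to be written back

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "demo.txt")
        with open(path, "wb") as f:
            f.write(b"hello world")
        uppercase_first_bytes(path, 5)
        with open(path, "rb") as f:
            print(f.read())
```

The program only ever assigns to a slice of memory, yet the file's contents change, because the kernel carries modifications to the mapped pages back to the file.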

On a UNIX system, there are many filesystems, but we access them through one hierarchy
 - we mount a filesystem on a directory to make its contents appear there
 - we can also create a filesystem in a file, and that filesystem can be mounted
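Because everything lives in one hierarchy, a path alone doesn't tell you which filesystem it is on. One rough way to find out, sketched here for a UNIX-like system (find_mount_point is a hypothetical helper), is to walk up the directory tree until you reach a mount point.

```python
import os

def find_mount_point(path):
    """Walk up from path until reaching the directory where a filesystem is mounted."""
    path = os.path.realpath(path)  # resolve symbolic links first
    # The root directory is always a mount point, so this loop terminates.
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path

if __name__ == "__main__":
    for path in (os.getcwd(), "/proc/self", "/tmp"):
        if os.path.exists(path):
            print(path, "->", find_mount_point(path))
```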