Operating Systems 2021F Lecture 16
Latest revision as of 17:02, 11 November 2021
Video
Video from the lecture given on November 11, 2021 is now available:
Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)
Notes
Lecture 16
----------
- interviews
- there will be more, just figuring out my schedule
for next week
- yes, final will replace midterm if you do better
- A3 will be posted by tomorrow, will be going over it a bit today
Today: filesystems
Filesystem
- collection of files & directories
- namespace for inode numbers
- way to transform a block device into a place
you can store files
- data structure that is accessed using the file API
From the kernel's perspective
- has the file system call interface
(open, read, write, etc)
(also for directories)
- when it gets a pathname, where does it look to get the data?
- it figures out which filesystem is responsible for the
containing directory
- it then asks that filesystem to do the file operations
- this abstraction is known as the "VFS" (virtual filesystem) layer
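A rough way to watch the "which filesystem is responsible?" step from user space. This is only a sketch, assuming GNU coreutils stat (with -f it reports on the filesystem containing the path rather than on the file itself); the paths are just examples.

```shell
# For each path, ask which filesystem type is responsible for it.
# stat -f queries the containing filesystem; %T is its type name.
for p in / /proc /dev/null .; do
    printf '%-12s %s\n' "$p" "$(stat -f -c %T "$p")"
done
```

The same system call (statfs) is answered by different filesystem code depending on where the path lives, which is the VFS dispatch in action.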
/proc/filesystems lists all the different kinds of filesystems your Linux system knows about currently
- the ones with "nodev" beside them are ones that have no corresponding storage device
- means that while you access them with file-related system calls,
what you're getting back isn't from a storage device, it
is something else
- these are known as pseudo filesystems
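To separate the two kinds of entries in /proc/filesystems, something like the following should work (a sketch, assuming a Linux system with /proc mounted and any POSIX awk):

```shell
# Field 1 is "nodev" for types with no backing storage device;
# for device-backed types the line holds only the type name.
echo "pseudo (nodev) filesystem types:"
awk '$1 == "nodev" { print "  " $2 }' /proc/filesystems
echo "device-backed filesystem types:"
awk '$1 != "nodev" { print "  " $1 }' /proc/filesystems
```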
df: disk free
- show currently mounted filesystems
mount: add a filesystem to our current file hierarchy
- mountpoint: where those files should go,
normally an empty directory
(if not empty, those files will be hidden)
df .
- tells me the filesystem responsible for the current directory
df -a
- show ALL the filesystems
- including ones with no corresponding device
(pseudo filesystems)
Pseudo filesystems like /proc tend to have some weird metadata that gives them away
- inode numbers are weird
- file sizes make no sense (are often zero)
- timestamps aren't consistent
The above is all true because those fields are just made up when you access the files, they aren't "stored" anywhere
- file metadata for pseudo filesystems isn't that significant
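The made-up metadata is easy to see for yourself. A minimal sketch, assuming GNU coreutils stat and using /proc/version as an arbitrary example file:

```shell
f=/proc/version
# The metadata claims the file is empty...
echo "size per stat: $(stat -c %s "$f") bytes"
# ...but reading it returns data the kernel generates on the spot.
echo "bytes actually read: $(wc -c < "$f")"
# Inode number and timestamps are likewise synthesized on access.
stat -c 'inode=%i mtime=%y' "$f"
```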
mount
- get full information on mounted filesystems (with no args)
- or you can use it to add filesystems to file hierarchy
mounting a real filesystem
- have access to data on a block device
mounting a pseudo filesystem
- have access to some new capability, typically kernel data structures
or runtime data that doesn't need to be stored on disk
- essentially all data is in RAM or generated algorithmically
- just depends on the filesystem type
/run
- it is a "tmpfs" - temporary filesystem
- data is temporary
- no corresponding block device
- it is a "RAM disk" - full filesystem, but
*data is lost when system reboots*
- used for PID files and lock files mainly
- PIDs will change when system is rebooted
- locks shouldn't be held across reboot
- note here "tmpfs" is the filesystem type,
/run is the mountpoint
Notice that /tmp is NOT on a tmpfs
- it is part of the root filesystem, normally ext4
- BUT files in /tmp are erased on every reboot
- that happens due to a boot-time script, not because of the filesystem type
If you boot with a live CD and look at the same disk
- data in /run would already be lost (it only ever lived in RAM)
- data in /tmp should still be there (the boot script that erases it never ran)
/var/tmp is like /tmp, except it is NOT erased when rebooted
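To check where these directories actually live on a given machine, a sketch along these lines may help. It assumes Linux with /proc mounted and GNU df for the -T option; results differ between distributions, so no particular output is promised.

```shell
# /proc/mounts fields: device, mountpoint, filesystem type, options, ...
# Print the mountpoint of every mounted tmpfs.
awk '$3 == "tmpfs" { print $2 }' /proc/mounts

# Show the filesystem type behind each directory (GNU df -T).
df -T /run /tmp /var/tmp 2>/dev/null
```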
In your VM, the root filesystem is mounted as follows:
/dev/mapper/vg0-lv--0 on / type ext4 (rw,relatime)
the "ext4" means that it is of type ext4.
So, why isn't /tmp a tmpfs?
- historic/distribution reasons (I know some distributions
that do put /tmp/ on tmpfs)
- can store a lot more in /tmp normally than you could
in a tmpfs (as its storage is limited by RAM/virtual memory)
(I'll discuss lock files later)
Different operating systems have their own native filesystems
- MSDOS: fat, vfat, fat32
- Windows: NTFS
- MacOS: HFS, HFS+, APFS
- FreeBSD: UFS, zfs
- IRIX: xfs
- Linux: ext2, ext3, *ext4*, btrfs, squashfs
LOTS of filesystem types
These are all regular filesystems used for regular disks,
originally developed for magnetic hard drives, not SSDs
- except APFS?
Why so many?
- some support different file sizes
- different performance characteristics
- reliability/durability
- licensing
- stubbornness/Not Invented Here
Key difference
- some are designed for UNIX-like systems (POSIX compliant)
- others are not!
- POSIX-compliant ones use inodes, others generally do not
Remember a filesystem is just a data structure
- so the filesystem type is the kind of data structure
Note that some filesystems have specialized purposes
- squashfs is designed to be compressed and read only
Why would you want a read only filesystem?
- storage medium is read only (e.g. optical media)
iso9660, etc
- for starting up the system <--- we'll get to this
Look up the youtube channel "technology connections", whole
series on optical media
So now let's make and use a filesystem
What do we need?
- a block device
What will we get?
- files stored on the block device
Challenge
- we don't have any devices we can physically connect to our VM
Workaround
- we'll make a file that will behave like a block device
To make an empty file, use dd, e.g.
dd if=/dev/zero of=fakeblks bs=4096 count=100000
if: input file
of: output file
bs: block size
count: count
So dd copies count blocks of size bs from the input file to the output file
- note it does exactly one read system call and one write system call
for each block transferred
- we read count*blocksize from if
- we write count*blocksize to of
(Note we are reading from /dev/zero, so we are reading from an infinite source of zero bytes)
Note that you can't just use touch
- it will just make a file of size zero
You could use truncate
- but we'll get to that
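A small side-by-side of the three approaches, as a sketch. The file names are made up for illustration, the image is much smaller than the one above, and stat -c assumes GNU coreutils.

```shell
touch empty.img                                              # size 0: nothing to store blocks in
dd if=/dev/zero of=zeros.img bs=4096 count=256 2>/dev/null   # 256 blocks of real zero bytes
truncate -s 1M sparse.img                                    # same length, but no data written

# %s is the file length in bytes, %b the number of blocks actually allocated
stat -c '%n: %s bytes, %b blocks allocated' empty.img zeros.img sparse.img
```

On most filesystems the truncate version reports the same length as the dd version but far fewer allocated blocks, since it is a sparse file.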
To look at the contents of a binary file, you can use od
- with the -a option, translates each byte to its corresponding
named character
If we run od on the file before and after running mkfs.ext4,
we can see what bytes were modified in the file
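The before/after comparison could look roughly like this. It is a sketch only: the file names are hypothetical, the image is 8 MiB to keep things quick, and the mkfs step is skipped if e2fsprogs is not installed.

```shell
# 8 MiB of zeros to act as the fake block device
dd if=/dev/zero of=od-demo.img bs=4096 count=2048 2>/dev/null

# Before mkfs: od collapses repeated identical lines into a single "*"
od -a od-demo.img

# After mkfs (only if e2fsprogs is installed): compare against a pristine copy
if command -v mkfs.ext4 >/dev/null 2>&1; then
    cp od-demo.img od-before.img
    mkfs.ext4 -q -F od-demo.img
    # cmp -l prints one line per differing byte
    echo "bytes changed by mkfs.ext4: $(cmp -l od-before.img od-demo.img | wc -l)"
    od -a od-demo.img | head -n 20
fi
```

Running mkfs.ext4 on a regular file needs no root access; only mounting the result does.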
The kernel has many possibilities in determining how to service a given file operation request
- if it is for a file on a regular filesystem,
it uses that filesystem's code to interpret data
read from or written to the mounted block device
- if it is for a pseudo filesystem,
it runs the code in the kernel that implements the file
operations
- they can do essentially anything
- if it is for a character device
- runs the code for the character device
/dev/zero is a character device that always returns null bytes
- specifically, when you do a "read" system call, it
fills the buffer given to it with all zero bytes
- will do this as many times as asked
There's also /dev/urandom and /dev/random
- infinite random bytes
- but /dev/random has historically been VERY slow
because it tries to return "real" random bytes and blocks
when the kernel is short on entropy (newer kernels block far less)
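A quick way to poke at these character devices, as a sketch (head -c and od are assumed to be the usual coreutils versions):

```shell
# 16 bytes from /dev/zero: every byte is 00
head -c 16 /dev/zero | od -An -tx1
# 16 bytes from /dev/urandom: different on every run
head -c 16 /dev/urandom | od -An -tx1
# The source never runs dry; we only stop because head stops asking
head -c 1000000 /dev/zero | wc -c
```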
A filesystem is a data structure. So, we need ways to
- create/initialize the data structure: mkfs
- validate and ideally repair the data structure: fsck
Why bother validating and repairing filesystem data structures?
- we don't normally do that for hash tables or binary trees?
- but they only stick around for short periods of time
- if they get messed up, we just restart the program
But filesystems are meant to last for years, and hardware & software can fail over that time
- cosmic rays
- manufacturing defects
- code bugs...
So filesystems are designed to be repairable
- should be able to recover overall structure
- ideally, preserve what data you can when things are damaged
fsck tries its best to repair filesystems
- but it relies on how the filesystem is structured
in order to do its work
Do pseudo filesystems have fsck?
- no, because it wouldn't make sense,
no persistent state to fix
note that fsck is specific to each filesystem
- every data structure needs its own specialized repair mechanisms
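The create-then-validate pairing can be tried on a file-backed image without root. A sketch, assuming e2fsprogs is installed (e2fsck is the ext2/3/4-specific checker that fsck hands off to); the image name is hypothetical.

```shell
if command -v mkfs.ext4 >/dev/null 2>&1 && command -v e2fsck >/dev/null 2>&1; then
    dd if=/dev/zero of=fsck-demo.img bs=4096 count=2048 2>/dev/null
    mkfs.ext4 -q -F fsck-demo.img   # create/initialize the data structure
    e2fsck -f -n fsck-demo.img      # validate it: -f forces a full check, -n makes no changes
    echo "e2fsck exit status: $?"
else
    echo "e2fsprogs not installed; skipping"
fi
```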
What allows filesystems to be repaired?
- lots of redundancy!
"pointers" in filesystems are always bidirectional
- so if one is missing we can recover it
- (like a doubly linked list)
Some data tells us how the rest of the data is organized
- this is the "superblock"
(think of it as the root node of a tree)
- store multiple copies of the superblock because
if this is lost we lose everything
Remember filesystems are data structures for organizing blocks
- a block has a fixed size, nowadays generally 4K, but always some small
power of 2
- locations in filesystems are in terms of block numbers,
not addresses
e.g., block 2000, not address 265101561
We always access the data structure by reading or writing entire
ranges of blocks, not ranges of bytes
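Block addressing can be sketched with dd, which already works in whole blocks. The image name and the choice of block 5 are just for illustration. By the same arithmetic, block 2000 with 4K blocks starts at byte 8192000.

```shell
bs=4096
blk=5
# The byte offset of a block is just block number times block size
echo "block $blk starts at byte $((blk * bs))"

# 16-block image with a marker written at the start of block 5
dd if=/dev/zero of=blocks.img bs=$bs count=16 2>/dev/null
printf 'hello' | dd of=blocks.img bs=$bs seek=$blk conv=notrunc 2>/dev/null

# Read back exactly one whole block, addressed by block number
dd if=blocks.img bs=$bs skip=$blk count=1 2>/dev/null | od -a | head -n 3
```

Here seek and skip count in units of bs, so the tool never deals in raw byte addresses; conv=notrunc keeps dd from truncating the image when writing into the middle of it.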