Operating Systems 2020W Lecture 10

Video

The video from the lecture given on February 7, 2020 is now available.

Notes

Lecture 10
----------
Topics:
 - VM issues
 - signals and sleep
 - filesystems
 - mount
 - dd
 - superblocks, inode blocks, data blocks
 - fsck
 - sshfs

On the VMs, the root filesystem is on /dev/mapper/COMPbase--vg-root,
which is an LVM volume.  LVM is the Logical Volume Manager
 - allows you to combine multiple disks and partitions into one
   virtual device
 - really not needed for a simple VM, but Ubuntu defaults to using one

We want to play with filesystems
 - but we don't have external devices we can add easily
   (at least on openstack currently)
 - so we want to make filesystems in files
   - treat a file like a block device
 - normally you don't do this, but it can be useful
   - this is how virtual machine disks are implemented

To use a new filesystem:
 - prepare the block device (plug in USB stick, or make an empty file)
 - create a filesystem on the block device
 - mount the new filesystem on a chosen mount point (an empty directory)
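
The three steps above can be sketched as a shell function.  This is a
sketch under assumptions, not the exact commands from lecture: the name
try_new_fs and its two arguments are made up, the image is about 40 MB,
and the mount and umount steps need root.

```shell
# Sketch: create a file-backed ext4 filesystem and mount it.
# try_new_fs and its arguments are hypothetical names.
try_new_fs() {
    img=$1    # file that will act as the block device
    dir=$2    # mount point (an empty directory)

    # 1. Prepare the "device": 10000 blocks of 4096 zero bytes
    dd if=/dev/zero of="$img" bs=4096 count=10000

    # 2. Create a filesystem on it
    #    (-F: go ahead even though this is not a block special device)
    mkfs.ext4 -F "$img"

    # 3. Mount it on the directory, look at it, then unmount it
    mkdir -p "$dir"
    mount -o loop "$img" "$dir"
    df -h "$dir"
    umount "$dir"
}
```

As root you could then try something like "try_new_fs fs.img /mnt/scratch".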

The dd command:
 - allows you to copy from an input file to an output file
 - key feature is it allows you to specify the number of "blocks"
   to copy and how big they should be
 - a block to dd is just the size of the buffer to use when reading
   and writing
 - so if bs=4096 and count=10, then dd will issue 10 reads and 10 writes,
   each reading or writing with a 4096 byte buffer (asking to read 4096 bytes
   and writing up to 4096 bytes depending on how many were read)
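
A quick way to check the bs/count arithmetic is to look at the size of
the output file.  A minimal sketch, using a scratch file from mktemp
(the variable names are just for illustration):

```shell
# Copy 10 blocks of 4096 bytes each from /dev/zero into a scratch file.
out=$(mktemp)
dd if=/dev/zero of="$out" bs=4096 count=10 2>/dev/null

# 10 full reads of 4096 bytes should leave bs * count bytes in the file.
expected=$((4096 * 10))
actual=$(($(wc -c < "$out")))
echo "expected $expected bytes, got $actual"
rm -f "$out"
```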

Why can't we just do "cp /dev/zero newfile" rather than the following?
  dd if=/dev/zero of=newfile bs=4096 count=10000

We can't use cp because cp copies until it reaches end-of-file, and
/dev/zero is a character device that never reports end-of-file: it will
provide an endless stream of zeros.  cp would keep writing until the disk
filled up.  So we use dd to control how much we read and how those reads
are done.

Yes there are other ways to make a file with zeros, but dd is generally useful
  - can fill a file with random bytes by reading from /dev/urandom
  - can manipulate arbitrary portions of large files (copy, erase)
  - very useful for directly reading from and writing to actual devices
    (hard disks, etc)
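
For example, here is a sketch of filling a file with random bytes and
then erasing just one block in the middle of it.  The scratch file and
the block numbers are arbitrary choices for illustration.

```shell
f=$(mktemp)    # scratch file

# Fill the file with 4 blocks of 1024 random bytes (4096 bytes total).
dd if=/dev/urandom of="$f" bs=1024 count=4 2>/dev/null

# Erase only the third block: seek=2 skips two output blocks first,
# and conv=notrunc keeps dd from truncating the rest of the file.
dd if=/dev/zero of="$f" bs=1024 seek=2 count=1 conv=notrunc 2>/dev/null

# Check: the file is still 4096 bytes, and block 2 has no non-zero bytes.
size=$(($(wc -c < "$f")))
nonzero=$(($(dd if="$f" bs=1024 skip=2 count=1 2>/dev/null | tr -d '\000' | wc -c)))
echo "size=$size nonzero_bytes_in_block_2=$nonzero"
rm -f "$f"
```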

(We will be making our own character devices later in this term)

Remember that device files have their own implementation of read,
write, etc.  They can do anything, including return random bytes or
zeros (or digits of pi).

mount: add filesystem to filesystem hierarchy
"mount fs dir" means make the files in filesystem fs appear under directory dir

dir was an empty directory before, but after this command all the files in fs
will appear in it

Example:
 - if you insert a USB stick on an Ubuntu system, you'll see its files
   appear in /media/<user>/<device name>

Mass storage devices have "less" storage than advertised because they
advertise storage in base-10 units but we use base-2 in practice,
e.g., 1000 (10^3) versus 1024 (2^10).  The difference gets really big
when we talk about gigabytes and up: about 7% for gigabytes and about
10% for terabytes.
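
To make the gap concrete, here is the arithmetic for a drive sold as
"500 GB" (the 500 is just an example figure):

```shell
# A drive sold as "500 GB" holds 500 * 10^9 bytes.
bytes=$((500 * 1000 * 1000 * 1000))

# The same capacity in base-2 units (1 GiB = 2^30 bytes), rounded down.
gib=$((bytes / (1024 * 1024 * 1024)))

echo "500 GB advertised = $gib GiB"    # prints: 500 GB advertised = 465 GiB
```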

(But we also lose some space due to filesystem overhead.  By default ext4
reserves 5% of the space so it always has some extra space to play with.
The root user can use this reserved space, and you can change how much is
reserved.)

When we treat a file as a block device, we do this by associating it with
a "loopback" block device, e.g. /dev/loop0.  You don't have to play with this
device directly, normally it is allocated for us automatically (but
shows up when we do df).

One key reason we make filesystems is to provide isolation
 - when you fill up a USB stick you don't fill up your main filesystem
 - similarly, when you fill up a virtual disk, it won't fill up the rest
   of the system; it can only use up the space allocated to the virtual disk

Imagine running a program so it only had access to the files in a special
virtual filesystem
 - it literally couldn't see anything else, let alone mess with anything else
 - snaps, containers are built around this idea

You can manipulate the parameters of a filesystem with utilities
 - for ext4, you can use e2label to give it a name or change its name,
   and tune2fs to change other parameters (e.g., how much space is
   reserved for the root user)
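
A sketch of what that might look like, wrapped in a function.  The name
tweak_fs, the label "scratch", and the 1% figure are arbitrary choices
for illustration; the argument can be an ext4 image file or a device.

```shell
# Sketch: label an ext4 filesystem and shrink the root-reserved space.
# tweak_fs is a hypothetical name.
tweak_fs() {
    img=$1
    e2label "$img" scratch      # set the volume label to "scratch"
    tune2fs -m 1 "$img"         # reserve 1% of blocks for root instead of 5%
    # Show the two settings we just changed
    tune2fs -l "$img" | grep -E 'volume name|Reserved block count'
}
```

For example, "tweak_fs fs.img" on an image made with mkfs.ext4.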

In a UNIX filesystem, the blocks are divided among three types
 - data blocks (contents of files)
 - inode blocks (file metadata)
    - directory blocks are normally a kind of inode block
 - superblocks

Superblocks have metadata about whole filesystems (rather than individual files)
 - what type of filesystem (ext4? vfat?)
 - what are the parameters of the fs (how big?  block size?  how many inodes?)
 - if you lose the superblock you lose the filesystem
   - which is why you normally have backup superblocks

Normally a filesystem will take up an entire device (so mkfs.ext4 asks
the kernel how big the device is and adjusts accordingly).  We control
it for virtual filesystems by controlling how big of a file we
allocate (with dd or similar).

"Losing" a superblock means it was erased or corrupted.  It isn't quite
deleted, because you can't add or remove blocks from a device once it is
created (how do you "grow" a USB stick?)

Note that filesystems are complex data structures.  If you look up a
filesystem's superblock and low-level format, expect to find way more
info than you might otherwise expect!

Note that a filesystem's block size might not match the block size of
the underlying device
 - but normally it will be a multiple of it, e.g.
   the device has a block size of 1024 but the filesystem's block size is
   4096
 - need to be able to easily translate between device blocks and
   filesystem blocks
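
The translation is just multiplication.  A small sketch using the sizes
from the example above, assuming the filesystem starts at device block 0
(the filesystem block number 10 is an arbitrary choice):

```shell
dev_bs=1024     # device block size in bytes
fs_bs=4096      # filesystem block size in bytes
fs_block=10     # filesystem block number to look up

ratio=$((fs_bs / dev_bs))       # device blocks per filesystem block
first=$((fs_block * ratio))     # first device block it covers
last=$((first + ratio - 1))     # last device block it covers

echo "fs block $fs_block = device blocks $first-$last"   # prints: fs block 10 = device blocks 40-43
```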

If you erase the primary superblock (which is either block 0 or 1 of most
filesystems), you'll make the filesystem unmountable.
 - but you can recover by repairing the filesystem using a backup superblock
 - fsck can normally do it, but you may have to tell it where to find
   the backup superblock
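
Here is a sketch of that experiment for an ext4 image.  The function
name is made up, and it assumes the e2fsprogs tools (dumpe2fs, e2fsck)
are installed and that the image is a scratch filesystem that is NOT
mounted.  Only try this on a filesystem you can afford to lose.

```shell
# Sketch: wipe the primary superblock of a scratch ext4 image, then
# repair it from a backup.  superblock_demo is a hypothetical name.
superblock_demo() {
    img=$1

    # Note where a backup lives while the primary is still readable.
    backup=$(dumpe2fs "$img" 2>/dev/null |
             awk '/Backup superblock at/ { sub(",", "", $4); print $4; exit }')
    [ -n "$backup" ] || { echo "no backup superblock found" >&2; return 1; }

    # Zero the primary superblock (1024 bytes at offset 1024 in ext2/3/4).
    dd if=/dev/zero of="$img" bs=1024 seek=1 count=1 conv=notrunc 2>/dev/null

    # Tell e2fsck to start from the backup copy instead.
    e2fsck -f -y -b "$backup" "$img"
}
```

Note that e2fsck exits with a non-zero status when it had to fix
something, so a non-zero exit here does not by itself mean the repair
failed.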

Superblocks don't change often (and what changes is not too important), so
the backup copies don't have to be updated all the time.  They are there
for disasters.

Filesystems should be recoverable, which is a feature that isn't required
of most data structures
 - the data structures you've studied, how well do they deal with
   corrupted or deleted pointers?  generally not well!
 - filesystems are tolerant of such problems because storage can go bad

Note that tools like fsck *are not* data recovery tools
 - they try to save the filesystem, not files
 - they can delete files in order to restore integrity to a filesystem

If a filesystem is corrupted, *never* write to it.  Instead,
 - get your data off the filesystem
 - THEN try to repair it

A great tool for copying data off a failing disk is dd
 - can get a raw filesystem image that can later be examined using
   advanced tools
 - there are fancy versions of dd that can get data even when
   reads fail on parts of the disk
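
A minimal sketch of imaging with plain dd.  A scratch file stands in for
the failing disk here; on a real system the input would be a device such
as /dev/sdX (a placeholder name).  Dedicated tools such as GNU ddrescue
handle bad regions more carefully than this.

```shell
# Stand-in for the disk: a 128 KiB file of random bytes.
src=$(mktemp)
dd if=/dev/urandom of="$src" bs=4096 count=32 2>/dev/null

# Image it.  conv=noerror keeps going after a read error; sync pads each
# short or failed read with zeros so later data stays at the right offset.
image=$(mktemp)
dd if="$src" of="$image" bs=4096 conv=noerror,sync 2>/dev/null

# With no read errors the image is byte-for-byte identical to the source.
if cmp -s "$src" "$image"; then same=yes; else same=no; fi
echo "identical: $same"
rm -f "$src" "$image"
```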

Most of the time nowadays, if a device has errors, the device should
be thrown away (recycled)
 - there are low-level errors all the time on modern storage devices
 - these errors are hidden by the device controllers
 - when they can no longer hide the errors, that means there are too many
   to hide, so the device is going bad fast -
   GET YOUR DATA OUT ASAP and replace!


Professional data recovery people can recover data when drives have been
damaged in all kinds of ways
 - they charge for this
 - and they have real limits
 - HAVE BACKUPS

When you run fsck, it may find inodes that are allocated but have no
hard links to them
 - file contents without a filename

fsck will give these inodes a name, e.g. for inode 5200 it will create
/lost+found/#5200

If you find files in lost+found, it means fsck put them there
 - which means your filesystem was corrupted, but maybe is okay now?
 - hope you had backups!

A filesystem is a set of files that are accessed using common code, with
all the files under a mountpoint
 - most commonly on a device (e.g. hard drive)
 - but can be virtual (/proc)
 - and can be remote (nfs, samba/cifs, ceph, sshfs)