COMP3000 Operating Systems F23: Tutorial 5

In this tutorial you will learn about files and filesystems by experimenting with and extending 3000test.c, and by creating and manipulating local filesystems. WARNING: Several of the commands here can lead to system corruption and data loss if not used properly. You have been warned. Please use a VM and make backups when necessary.

Files and inodes

In UNIX/Linux filesystems, a filename does not directly refer to the contents of a file. Instead, a filename refers to an inode (as specified in a directory entry, or dentry, of its parent directory), and the inode then refers to the data. An inode is where file metadata is stored: file ownership, timestamps, and permissions are all kept in a file's inode. The name of a regular file is just a hard link connecting that name to an inode.
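The relationship between names and inodes can be observed from userspace. The following is a minimal sketch, not part of 3000test.c: the function name hard_link_demo and the file names passed to it are hypothetical. It creates a file, adds a second hard link with link(2), and checks that both names report the same inode number and a link count of 2.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `name`, add a second hard link `alias`, and report whether both
 * names resolve to the same inode. Returns 1 if they do, 0 if they do not,
 * and -1 on error. Both names are hypothetical scratch files: `name` is
 * truncated if it already exists, and both are removed before returning. */
int hard_link_demo(const char *name, const char *alias)
{
    struct stat a, b;
    int result = -1;

    FILE *f = fopen(name, "w");
    if (f == NULL)
        return -1;
    fclose(f);

    if (link(name, alias) == 0) {
        if (stat(name, &a) == 0 && stat(alias, &b) == 0) {
            printf("%s: inode %lu, links %lu\n", name,
                   (unsigned long)a.st_ino, (unsigned long)a.st_nlink);
            printf("%s: inode %lu, links %lu\n", alias,
                   (unsigned long)b.st_ino, (unsigned long)b.st_nlink);
            result = (a.st_ino == b.st_ino && a.st_nlink == 2);
        }
        unlink(alias);
    }
    unlink(name);
    return result;
}
```

Calling hard_link_demo("a.tmp", "b.tmp") from main() in a writable directory should print the same inode number twice, assuming the underlying filesystem supports hard links.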

In addition to regular files, we also have symbolic links, directories, block devices, character devices, and pipes. Each is its own type of inode.
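The inode type is encoded in the st_mode field that stat() fills in. As a rough illustration (the helper name type_char is an assumption, and this is not taken from 3000test.c), the standard S_IS* macros can be used to map a mode to the type letter that ls -l prints first. Sockets, which the list above does not mention, are included as one more type.

```c
#define _DEFAULT_SOURCE   /* expose S_ISLNK and S_ISSOCK under strict -std modes */
#include <sys/stat.h>

/* Map the file-type bits of an st_mode value to the letter `ls -l` shows
 * in the first column. Returns '?' for a type this sketch does not know. */
char type_char(mode_t mode)
{
    if (S_ISREG(mode))  return '-';   /* regular file */
    if (S_ISDIR(mode))  return 'd';   /* directory */
    if (S_ISLNK(mode))  return 'l';   /* symbolic link */
    if (S_ISBLK(mode))  return 'b';   /* block device */
    if (S_ISCHR(mode))  return 'c';   /* character device */
    if (S_ISFIFO(mode)) return 'p';   /* pipe (FIFO) */
    if (S_ISSOCK(mode)) return 's';   /* socket */
    return '?';
}
```

The function takes only the mode, so it works the same way regardless of which call produced the struct stat.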

Note that while you can find out a file's inode, you cannot go from an inode to a pathname or otherwise manipulate an inode directly from userspace - you always have to go through the pathname. The kernel, however, can access individual inodes directly (indeed it has to in order to get to the contents of a file when given a filename).

3000test.c uses stat() to give information on a given inode.

Tasks/Questions (Part A)

  1. Compile and run 3000test.c. It takes a filename as an argument and reports information on the file. Try giving it the following and see what it reports:
    • a regular text file that exists
    • a directory
    • a symbolic link
    • a device file (character or block)
  2. Change 3000test to use lstat() rather than stat. How does its behavior change?
  3. Modify 3000test so when it is given a symbolic link it reports the name of the target. Use readlink(2) (this notation describes how to access the man page, e.g., man 2 readlink).
  4. Are there files or directories that you cannot run 3000test on? Can you configure file/directory permissions so as to make something inaccessible to 3000test? Note that there are two cases to consider: 1) the file is completely inaccessible to 3000test (nothing can be displayed except the error message); 2) the file can still be reached by a search (its metadata is visible) even though its contents cannot be accessed.

Creating, Mounting, and Unmounting Filesystems

In UNIX/Linux systems we always have a single hierarchical namespace for files and directories. Pathnames that start with a / are absolute, starting at the top of the namespace, while ones that do not start with a / are relative to the current directory.

The files in this single namespace come from many sources, each of which can have very different semantics. These sources are known as filesystems.

When first started, a system must have at least one filesystem. This first filesystem is known as the root filesystem and has the contents of /. We can then add other filesystems to this initial set of files by mounting each filesystem on an existing directory.

On Ubuntu 22.04, the root filesystem is ext4 by default and many others are mounted on top of it. For example, a proc filesystem is mounted on /proc, a sysfs is mounted on /sys. If you run "mount" with no arguments, it lists all the currently mounted filesystems. Note that most of these are virtual filesystems, in that they do not correspond to any persistent storage (like a disk).
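The same list can be read programmatically. The sketch below is one possible approach, assuming a Linux system with glibc's getmntent(3) interface and a mount table exposed at /proc/self/mounts; the function name list_mounts is hypothetical.

```c
#include <mntent.h>
#include <stdio.h>

/* Print every entry of the mount table at `table` and return how many
 * entries were read, or -1 if the table could not be opened. */
int list_mounts(const char *table)
{
    FILE *fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    int count = 0;
    struct mntent *m;
    while ((m = getmntent(fp)) != NULL) {
        /* source (device or pseudo name), mountpoint, type, options */
        printf("%s on %s type %s (%s)\n",
               m->mnt_fsname, m->mnt_dir, m->mnt_type, m->mnt_opts);
        count++;
    }
    endmntent(fp);
    return count;
}
```

Calling list_mounts("/proc/self/mounts") from main() should print one line per mounted filesystem, in a format similar to the output of mount.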

If you insert a USB stick on a laptop or desktop running Linux, it must be mounted before you can access its contents. The current convention is for it to be mounted in /media/<mounting user>/<name of USB stick filesystem>. This usually happens automatically nowadays when the USB stick is plugged in.

To create an ext4 filesystem, you just run

mkfs.ext4 <writable block device>  #Do not try it with any /dev/*

If you were to run this on the device holding the root filesystem of your current system, it would wipe that filesystem if it were allowed to complete. In all likelihood the system will prevent you from doing this, as the root filesystem (like any mounted filesystem) gets special protections.

To mount a filesystem, you specify the filesystem to be mounted (by its type or by the device) and the mountpoint:

mount /dev/sdb1 /mnt  #Do not try it unless you have that block device and want to do it

This would mount the first partition of the second disk on /mnt.

Filesystems can be stored on block devices: devices whose contents are accessed by specifying a block index/address. The other main type of device file, the character device, is used for devices where the data is accessed a single byte (character) at a time. Terminals, modems, and printers, for example, are generally represented as character devices.

Note that mount here can identify the filesystem on /dev/sdb1 by looking at its superblock. The superblock is usually in either block 0 or block 1 of a device. The superblock holds the metadata of a filesystem, allowing it to be identified and described. Without the superblock you cannot do anything with a filesystem. Because of its importance, there are backup superblocks in addition to the primary one. The blocks of a filesystem can be classified as superblocks, inode blocks, or data blocks.
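To make the identification step concrete, here is a minimal sketch of how a program could recognize an ext2/3/4 filesystem. It relies on the ext on-disk layout: the primary superblock starts 1024 bytes into the device, and its 16-bit magic field (value 0xEF53, stored little-endian) sits 56 bytes into the superblock. The function name has_ext_magic is hypothetical, and real tools check far more than this one field.

```c
#include <stdint.h>
#include <stdio.h>

#define EXT_SB_OFFSET    1024    /* primary superblock starts 1024 bytes in */
#define EXT_MAGIC_OFFSET 56      /* position of the magic field in the superblock */
#define EXT_MAGIC        0xEF53  /* magic number shared by ext2, ext3, and ext4 */

/* Return 1 if the device or image at `path` carries the ext magic number,
 * 0 if it does not, and -1 if it cannot be read that far. */
int has_ext_magic(const char *path)
{
    unsigned char raw[2];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int ok = fseek(f, EXT_SB_OFFSET + EXT_MAGIC_OFFSET, SEEK_SET) == 0 &&
             fread(raw, 1, sizeof raw, f) == sizeof raw;
    fclose(f);
    if (!ok)
        return -1;

    /* Assemble the little-endian on-disk value independent of host byte order. */
    uint16_t magic = (uint16_t)(raw[0] | (raw[1] << 8));
    return magic == EXT_MAGIC;
}
```

The function only reads, so it is safe to call on a filesystem image stored in a regular file.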

Sometimes we want to play with a new filesystem but we don't have a physical device to format. If we have free space for a file in our current filesystem, however, we can put a filesystem in a file by using a loopback block device. A loopback block device is a block device whose data is stored in a regular file on another filesystem (rather than on a separate device). If you run a filesystem command such as mount on a regular file, the command will transparently associate the file with an available loopback block device (e.g., /dev/loop1).

Note that inode numbers are filesystem-specific. Thus, the contents of a file are uniquely specified by its filesystem and inode. Only the kernel has to worry about this level of detail; in userspace, a file's full path contains all the necessary information.
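In struct stat, the filesystem is identified by the st_dev field and the inode by st_ino, so the pair of the two is what identifies a file. A small sketch of a comparison built on that idea follows; the function name same_file is hypothetical.

```c
#include <sys/stat.h>

/* Return 1 if both paths name the same file, 0 if they do not,
 * and -1 if either path cannot be examined with stat(). */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    /* Inode numbers alone are not enough: two different filesystems
     * may both use the same inode number for unrelated files. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Comparing only st_ino would wrongly report a match whenever two files on different filesystems happen to share an inode number.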

Filesystems on persistent storage are always at risk of corruption. Filesystem checker programs (fsck) can detect and repair filesystem errors.

Tasks/Questions (Part B)

  1. Run dd if=/dev/zero of=foo bs=8192 count=32K and use the ls command to check it. What is the logical size of the file foo? How much space does it consume on disk? (Hint: Look at the size option to ls). In comparison, create another file with dd if=/dev/zero of=foo2 bs=8192 seek=31K count=1K. Any observations?
  2. Run mkfs.ext4 foo. Does foo consume any more space or less? Do the same (mkfs.ext4 foo2) to the other file and answer the same question.
  3. Create any file in /mnt (e.g., sudo touch /mnt/test.txt) and then run sudo mount foo /mnt. Do you still see the file you just created?
  4. Run df. What device is mounted on /mnt? What is this device?
  5. Run sudo umount /mnt. What has gone away and what has come back?
  6. Run dd if=/dev/zero of=foo conv=notrunc count=10 bs=512. How does the "conv=notrunc" change dd's behavior (versus the command in question 1)?
  7. Run sudo mount foo /mnt. What error do you get?
  8. What command can you run to make foo mountable again? What characteristic of the file system enables this command to work? (Hint: revisit the instructions of this tutorial if you have no idea)
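The two sizes asked about in question 1 are also visible through stat(). As a hedged sketch (the function name file_sizes is hypothetical), the logical size is st_size, and the allocated size can be estimated from st_blocks, which Linux reports in 512-byte units regardless of the filesystem's own block size.

```c
#include <sys/stat.h>

/* Store the logical size (st_size) and the allocated size (st_blocks,
 * counted in 512-byte units as on Linux) of `path` in the two output
 * parameters. Returns 0 on success and -1 if stat() fails. */
int file_sizes(const char *path, long long *logical, long long *allocated)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    *logical = (long long)st.st_size;
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

Running this on foo and foo2 from main() and printing both values is one way to cross-check what ls reports.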