Operating Systems 2021F Lecture 13


Video

Video from the lecture given on November 2, 2021 is now available:

Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)

Notes

Lecture 13
----------
* Grading
  - A2 and the midterm are basically graded
  - will combine and post by Thursday
  - Thursday we'll also go over midterm solutions
  - if your Tutorials 1-4 haven't been graded, please talk to your TA,
    it is probably an oversight
  - if you do better on the final, the final grade will replace
    the midterm (as the final is cumulative)
* Interviews
  - will post schedule by Thursday
  - I will ask some of you to sign up, otherwise you may volunteer
  - if you want me to check questions that you think have been misgraded,
    please sign up for an interview (unless it is pretty brief)
  - I will also be doing interviews for the final
  - you can't lose marks, but I could report you to the Dean
* Tutorial 5

By the way I'm very behind on messages & email, but will be catching up, will update on Thursday

Generally the hardest thing in this tutorial is getting passwordless login to work for SSH
 - either you have a conceptual issue or you missed something small
 - working => you got both right
 - if you have to type in a passphrase to unlock your key, it is still "passwordless"
    - the passphrase only unlocks your private key locally, the remote
      machine never sees it
    - you can get rid of the passphrase if you wish

To restart the system, you can either do "sudo reboot" or
  "sudo shutdown -r now"
  - but you shouldn't do that for chsh
  - just log out and log back in, or log in separately
  - if I'm running a backup that I want to finish before shutting down, I'll
    do "sudo shutdown -h +15", so the system will "halt" in 15 minutes.

The only time when you must reboot is when the kernel gets upgraded
 - other reboots for updates can be bypassed, but sometimes it is a pain
   to restart all the services that need to be restarted
 - I should note there are ways to patch the kernel without rebooting
   but it only works for some updates
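 - if you can't log in with the new user, try "sudo service sshd restart"
    - that will restart sshd
    - the service name varies: on Debian/Ubuntu it is usually "ssh",
      on many other distributions it is "sshd", so check which one you have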

When you do setuid before setgid, the setgid should fail because you no longer have higher privileges
 - if it doesn't fail, you haven't dropped privileges yet

Windows has to reboot all the time because processes by default lock files that are in use
 - and services are using files all the time

UNIX systems do not lock files that are in use.
  - we'll discuss this more very shortly

process uid's
 - there's actually a uid and an euid
 - uid: who "owns" the process
   - they can kill it
 - euid: what privileges the process has
 - normally uid=euid
    - then euid isn't shown
 - but when a setuid binary is executed, the process's euid is set to
   the uid of the file's owner
    - the uid of the process remains unchanged
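
A process can ask for both ids itself. A small sketch (process_ids is just an illustrative name):

```python
import os

def process_ids():
    """Return the real and effective user/group ids of this process."""
    return {
        "uid": os.getuid(),    # who owns the process
        "euid": os.geteuid(),  # whose privileges it runs with
        "gid": os.getgid(),
        "egid": os.getegid(),
    }

if __name__ == "__main__":
    ids = process_ids()
    print(ids)
    if ids["uid"] != ids["euid"]:
        print("running with another user's privileges")
```

Note that Linux ignores the setuid bit on interpreted scripts, so you will normally see uid == euid here; the difference shows up in compiled setuid binaries such as passwd.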

Setuid binaries are used to give controlled access to privileged resources
 - e.g., passwd & chsh allow changes to /etc/passwd, /etc/shadow
   that are owned by root
 - execve checks for the setuid bit on the binary being loaded,
   if it is set it changes the euid of the process to the uid of the file
   (otherwise it leaves it unchanged)
 - if there is a vulnerability in a setuid root binary, then an attacker
   can get full root access through it
     - that's why ls highlights setuid binaries in red, they are dangerous
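
The setuid bit is just one bit in the file's mode, so it is easy to check. A minimal sketch; the helper names are made up for illustration, and the path in the usage note is an assumption about where passwd lives on your system.

```python
import os
import stat

def has_setuid_bit(mode):
    """True if the setuid bit is set in a st_mode value."""
    return bool(mode & stat.S_ISUID)

def describe(path):
    """Report the owner and setuid status of a file."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,               # euid a setuid process would get
        "setuid": has_setuid_bit(st.st_mode),
        "mode": stat.filemode(st.st_mode),    # same style as ls -l
    }
```

For example, describe("/usr/bin/passwd") would typically report owner_uid 0 and a mode string with an "s" in place of the owner's "x".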
     
Same thing for gid and egid
 - when you use groups to manage access

It is time to talk more about files & filesystems

In UNIX-like systems, there are the following kinds of files
 - regular files
 - directories (d)
 - symbolic links (l)
 - character devices (c)
    - unbuffered character devices (u), not often used
 - block devices (b)
 - pipes (p)  <--- mkfifo
 - sockets (s)

If it isn't a regular file, directory, or symbolic link, it is a "special file"
 - we'll cover special files later
 - can make them with mknod
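
The kind of a file is recorded in its mode, and the stat module has a test for each kind. A short sketch (file_kind and demo are illustrative names):

```python
import os
import stat
import tempfile

def file_kind(path):
    """Name the kind of file at path, without following symbolic links."""
    mode = os.lstat(path).st_mode
    checks = [
        (stat.S_ISREG, "regular file"),
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISCHR, "character device"),
        (stat.S_ISBLK, "block device"),
        (stat.S_ISFIFO, "pipe"),
        (stat.S_ISSOCK, "socket"),
    ]
    for test, name in checks:
        if test(mode):
            return name
    return "unknown"

def demo():
    """Create a directory, a file, and a symlink, then classify them."""
    with tempfile.TemporaryDirectory() as d:
        f = os.path.join(d, "data.txt")
        open(f, "w").close()
        link = os.path.join(d, "link")
        os.symlink(f, link)
        return [file_kind(d), file_kind(f), file_kind(link)]
```

On a typical Linux system, file_kind("/dev/null") should report a character device.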

Note that these aren't really file "types"
 - regular files just have extensions, e.g., .txt, but they are optional
 - applications can use them or ignore them, the OS doesn't care

What is a filename?
 - it is a key in a hierarchical key/value store
   - the filename is the key, the file is the value associated with it
 - but it is *not* just for storing data
 - counterexample: /proc
   - if you do cat /proc/cpuinfo, you aren't getting data from disk,
     you're asking the kernel what CPUs it has
 - "linear array of bytes", yes...but those bytes don't have to
   come from fixed storage, they can be generated on the fly
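
You can see the "generated on the fly" part by comparing what stat reports with what a read returns. A sketch, assuming a Linux system with /proc mounted (read_generated is an illustrative name):

```python
import os

def read_generated(path="/proc/cpuinfo"):
    """Return (size reported by stat, bytes actually read) for path."""
    size = os.stat(path).st_size
    with open(path, "rb") as f:   # ordinary open/read, nothing special
        data = f.read()
    return size, data

if __name__ == "__main__":
    size, data = read_generated()
    print("stat says", size, "bytes; read returned", len(data), "bytes")
```

For files in /proc the reported size is typically 0 even though the read returns plenty of data, because the kernel produces the bytes when you ask for them.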

In UNIX-like systems, files are kind of universal
 - used for all kinds of things other than storing data
 - uniform API (open, read, write, etc.) that can be applied to
   almost anything

The cool thing about a universal API is that it allows programs to be connected and
to work together without agreeing a priori on a way to interact
 - they are just interacting with "files"

Not everything is a file in UNIX, but lots of things are
 - the designers of UNIX created Plan 9 later, and there everything
   really is a file (never caught on, but that's where /proc came from)
 - network stuff is mostly not in files (folks at Berkeley made that choice)
   - has a special API, can't just use open/read/write
   
a priori => what you know in advance

In UNIX there is one namespace for files
 - all files must exist in /...
 - no drive letters, etc

namespace
 - all values in it "mean" the same thing

in a namespace, two things can't have the same name
 - each name is unique

if you have multiple namespaces, you can have the "same" name in different namespaces
 - can differentiate by noting the namespace each is in

But we can put whatever we want into that file hierarchy
 - normally we put different kinds of things in different directories
   - /proc, /dev, and /usr are very different
     (talk to the kernel, special files, and system files)
   - if you plug in a USB stick, it will probably show up in /media/<username>
   
When I say hierarchy, I mean tree
 - think family tree, or trees from 2402

How do we combine different kinds of files all together?

Our file hierarchy is actually showing files from many different sources
 - each source is called a "filesystem"
 - we add filesystems to the file hierarchy by "mounting" the filesystem

We can see all the filesystems by just running "mount"
 - filesystems with storage can be seen with "df" (disk free)

Inside a filesystem, all the files are "the same" (will explain this in a bit)
 - look at the devices associated with each filesystem, they vary
 - some correspond to actual storage devices, but many don't
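
On Linux the same information that mount prints is available as a file, which makes it easy to inspect from a program. A sketch assuming /proc/self/mounts exists; the function names are illustrative, and only the first three fields of each line are used.

```python
def parse_mounts(text):
    """Turn mount-table text into (device, mount point, fs type) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue               # skip blank or malformed lines
        device, mount_point, fstype = fields[:3]
        mounts.append((device, mount_point, fstype))
    return mounts

def current_mounts(path="/proc/self/mounts"):
    """Read and parse the mount table of the current process (Linux)."""
    with open(path) as f:
        return parse_mounts(f.read())

if __name__ == "__main__":
    for device, mount_point, fstype in current_mounts():
        print(f"{fstype:10} {device:20} {mount_point}")
```

Entries whose device is a name like proc or tmpfs rather than something in /dev are the filesystems that do not correspond to an actual storage device.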

What actually happens when you open and access a file depends upon the filesystem it is part of
 - in fact, the kernel virtualizes the file interface, allowing
   each filesystem to implement its own versions of file operations (open,
   read, etc.)
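
A toy model of that dispatch, to make the idea concrete. This is not how the kernel is written; all the class names are invented, and only a read operation is modelled. Each "filesystem" supplies its own read, and the top layer just picks the filesystem from the path.

```python
class StoredFS:
    """Toy filesystem whose contents are stored bytes."""
    def __init__(self, files):
        self.files = dict(files)

    def read(self, name):
        return self.files[name]

class GeneratedFS:
    """Toy filesystem whose contents are computed on each read."""
    def __init__(self, generators):
        self.generators = dict(generators)

    def read(self, name):
        return self.generators[name]()

class ToyVFS:
    """Routes each path to the filesystem mounted closest above it."""
    def __init__(self):
        self.mounts = {}

    def mount(self, mount_point, fs):
        self.mounts[mount_point] = fs

    def read(self, path):
        # Try the longest mount point first so /proc wins over /.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                name = path[len(prefix):].lstrip("/")
                return self.mounts[prefix].read(name)
        raise FileNotFoundError(path)
```

Usage: mount a StoredFS at "/" and a GeneratedFS at "/proc"; the caller uses the same read call for both and never sees which implementation answered.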

When you add a user, they don't get a filesystem
 - they just get a home directory
 - by default in /home nowadays, but can be just about anywhere

When filesystems are mounted, the files in the filesystem go under a
mount point
 - a mount point is just an empty directory
 - mounting logically replaces the empty directory with files that
   are in the new filesystem
 - any directory can be a mount point for a new filesystem
   - if it isn't empty, its files will be inaccessible until the new filesystem is unmounted
      - you've taken away their names
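
One way to spot a mount point from user space is to compare a directory's device number with its parent's. A sketch of that heuristic (the standard library's os.path.ismount does something similar); is_mount_point and demo are illustrative names, and bind mounts within one filesystem are not detected this way.

```python
import os
import tempfile

def is_mount_point(path):
    """Guess whether path is where another filesystem is mounted."""
    path = os.path.realpath(path)          # resolve symbolic links first
    here = os.lstat(path)
    parent = os.lstat(os.path.join(path, ".."))
    if here.st_dev != parent.st_dev:
        return True                        # crossed into another filesystem
    return here.st_ino == parent.st_ino    # only the root is its own parent

def demo():
    """Check the root directory and a freshly made ordinary directory."""
    with tempfile.TemporaryDirectory() as d:
        sub = os.path.join(d, "sub")
        os.mkdir(sub)
        return is_mount_point("/"), is_mount_point(sub)
```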

In Tutorial 6 we'll be playing with filesystems

If you want to read the textbook's treatment of filesystems, it will help
 - the textbook authors are experts in filesystems

a key concept in UNIX filesystems: inodes

filenames refer to inodes
inodes refer to data

inodes are a level of indirection between filenames and file data
 - we want it because in UNIX, an inode can have multiple file names,
   and this is core to how UNIX works

(indirection - like a pointer to a pointer)
virtual functions in C++, that's indirection
pointers are a kind of indirection

When you stat a file, you're actually getting back info on the file's inode
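
A small experiment shows one inode with two names. This sketch creates a hard link in a temporary directory (the file names are arbitrary) and uses stat to look at the inode number and link count.

```python
import os
import tempfile

def demo_hard_link():
    """Give one inode two names, then remove the first name."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        with open(a, "w") as f:
            f.write("hello")
        os.link(a, b)                      # second name for the same inode
        sa, sb = os.stat(a), os.stat(b)
        same_inode = sa.st_ino == sb.st_ino
        links_before = sa.st_nlink
        os.unlink(a)                       # removes a name, not the data
        with open(b) as f:
            data = f.read()
        return same_inode, links_before, data, os.stat(b).st_nlink
```

Both names report the same inode number, the link count goes from 2 back to 1 after the unlink, and the data is still readable through the remaining name.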