Operating Systems 2020W Lecture 8


Video

The video from the lecture given on January 31, 2020 is now available.

In-Class Notes

Lecture 8
---------
Topics for today
 - ssh
 - usernames, groups, uid, gid
 - user IDs of a process: uid, euid, fsuid
 - file permissions: read, write, execute
 - directory permissions
 - setuid, setgid
 - /etc/passwd, /etc/group, /etc/shadow
 - login process

different kinds of user ID's associated with a process:
 - uid: the real user ID (the user who started the process)
 - euid: "effective" uid, one used for determining privileges
         (e.g., what processes you can terminate)
 - fsuid: the uid used for accessing files

fsuid is mostly used by file server programs, e.g. NFS, Samba (SMB/CIFS on Linux)
 - we'll ignore it from here on

Normally euid=uid, but when a setuid executable is run, euid becomes the uid
of the owner of the executable file, while the uid stays the same.
 - so euid generally becomes 0 (root) while the uid stays that of the regular
   user who ran the program
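
To see both IDs for a running process you can just ask the kernel. A minimal Python sketch (the helper name describe_ids is made up for illustration):

```python
import os

def describe_ids():
    """Return the real and effective user IDs of the current process."""
    uid = os.getuid()    # real uid: the user who started the process
    euid = os.geteuid()  # effective uid: the one used for privilege checks
    return {"uid": uid, "euid": euid, "setuid_in_effect": uid != euid}

if __name__ == "__main__":
    print(describe_ids())
```

Run normally, the two values should match. Note that Linux ignores the setuid bit on interpreted scripts, so you would only see them differ from inside a setuid binary.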

When a process does a system call, the kernel has to decide whether that system
call is authorized
 - normally it checks the process's euid (and normally euid=uid) to see if this
   is allowed
 - if euid=0, almost everything is allowed (as this is root's user ID)
 - for regular users, the kernel has to check the file permissions (for file
   access) or the uid of the target process (for sending signals)
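
The signal case can be written down as a small model. This is only a sketch of the rule described in the kill(2) man page, not kernel code. The function name is made up, and the "saved set-user-ID" of the target is an extra detail from that man page that these notes don't cover:

```python
def may_signal(sender_uid, sender_euid, target_uid, target_suid):
    """Simplified model of the permission check for sending a signal."""
    if sender_euid == 0:
        # root can signal anything (real kernels check the CAP_KILL capability)
        return True
    # Otherwise the sender's real or effective uid must equal the
    # target's real uid or saved set-user-ID.
    target_ids = (target_uid, target_suid)
    return sender_uid in target_ids or sender_euid in target_ids

print(may_signal(1000, 1000, 1000, 1000))  # True: same user
print(may_signal(1000, 1000, 1001, 1001))  # False: someone else's process
print(may_signal(1000, 0, 1001, 1001))     # True: euid is root
```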

to the operating system, there are only uid's and gid's, no usernames or groups
 - we have files defining the mapping of uid<->usernames, gid<->groups

User accounts are in /etc/passwd
Groups are in /etc/group
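
Each line of /etc/passwd is seven colon-separated fields. Here is a rough sketch of the uid<->username mapping done by hand. The sample account is invented, and real programs would normally use the standard pwd and grp modules (e.g. pwd.getpwnam) instead:

```python
from collections import namedtuple

PasswdEntry = namedtuple(
    "PasswdEntry", "name password uid gid gecos home shell")

def parse_passwd_line(line):
    """Split one /etc/passwd line into its seven fields."""
    name, password, uid, gid, gecos, home, shell = line.rstrip("\n").split(":")
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)

# Hypothetical entry; the "x" means the password hash lives in /etc/shadow.
sample = "student:x:1000:1000:Student User:/home/student:/bin/bash"
entry = parse_passwd_line(sample)
print(entry.name, "->", entry.uid)  # student -> 1000
```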

On most modern systems, there are NO PASSWORDS in /etc/passwd
 - there used to be
 - but having passwords in a file that is readable by everyone
   on the system is a bit of a security risk
 - nowadays, passwords are stored in /etc/shadow (a file that
   has restricted access)

It is a pain to manually edit /etc/shadow and /etc/passwd and keep things
in sync.  So, if you want to do edits...
 - "shadowconfig off"
 - edit /etc/passwd, /etc/group
 - "shadowconfig on"

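No practical limit to the number of groups a user can be in (modern Linux
kernels allow up to 65536 supplementary groups per process), not sure about
the limit for how many users a group can have
 - membership is looked up in userspace (from /etc/group) at login, so that
   limit depends on the utilities, not the kernel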

How are passwords stored in /etc/shadow?
 - the $6$ prefix means they are hashed using SHA-512 (a member of the SHA-2
   family)
 - the DES variant of the original crypt is horribly insecure
 - and actually SHA-512 isn't really that great for passwords, web apps
   normally use other functions (e.g., bcrypt, scrypt, Argon2)
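
The hash field in /etc/shadow has the shape $id$salt$hash, optionally with a rounds=N part for the SHA-based schemes. Below is a small sketch that pulls such a field apart. The function name and the sample string are made up, and it only handles ids 1, 5 and 6:

```python
SCHEMES = {"1": "MD5", "5": "SHA-256", "6": "SHA-512"}

def parse_crypt_field(field):
    """Split a '$id$salt$hash' password field into its parts."""
    parts = field.split("$")
    # parts[0] is empty because the field starts with "$"
    if len(parts) < 4 or parts[0] != "" or parts[1] not in SCHEMES:
        raise ValueError("unsupported password field")
    rest = parts[2:]
    rounds = None
    if rest[0].startswith("rounds="):
        rounds = int(rest[0][len("rounds="):])
        rest = rest[1:]
    if len(rest) != 2:
        raise ValueError("unsupported password field")
    return {"scheme": SCHEMES[parts[1]], "rounds": rounds,
            "salt": rest[0], "digest": rest[1]}

# Not a real hash, just the right shape.
print(parse_crypt_field("$6$tacos$abc"))
```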

You should have access to the "crypt(3)" man page, but it wasn't
installed on my desktop for some reason

The point of a secure hash:
 - easy to compute the hash
 - hard (infeasible) to find something that hashes to a given value

So technically, you don't have to guess someone's password, you just
have to guess a string that has the same SHA-512 hash as their password
 - in practice this is the same thing


If you stole password hashes (stole /etc/shadow or /etc/passwd with
password hashes stored in it), you can do an offline attack to guess
passwords
 - you just guess possible passwords, compute their hashes, and compare
   with the hashes in the password database
 - "salt" is added (a known string prefix) to each password so that
   two people with the same password don't have the same hash, e.g.

     alice has "banana" as password with salt "tacos"
        hash is of "tacosbanana"
     bob   has "banana" as password with salt "pizza"
        hash is of "pizzabanana"

   alice and bob don't have to remember their salt, it is generated
   automatically and stuck in the password file (in the same field,
   just before the hash), so it isn't any more secret than the hash
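
The alice/bob example can be tried directly. This is a toy: it hashes salt+password once with plain SHA-512, whereas the real $6$ scheme does considerably more work than that. The function name is made up for illustration:

```python
import hashlib
import secrets

def toy_salted_hash(password, salt):
    """Toy scheme: SHA-512 of the salt followed by the password."""
    return hashlib.sha512((salt + password).encode("utf-8")).hexdigest()

alice = toy_salted_hash("banana", "tacos")
bob = toy_salted_hash("banana", "pizza")
print(alice == bob)  # False: same password, different salts

# In practice the salt is random, not chosen by the user.
new_salt = secrets.token_hex(8)
print(new_salt, toy_salted_hash("banana", new_salt)[:16] + "...")
```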

NOTE THAT PASSWORD GUESSING PROGRAMS ARE VERY GOOD NOW
 - look up "john the ripper"
 - good at guessing common substitutions, e.g. numbers for letters
 - knows many dictionaries
 - and modern computers, with GPUs, can do a ridiculous number of guesses
 - and there are rainbow tables (precomputed tables that map hashes back to
   passwords, for passwords up to roughly 12 characters), though these only
   help against unsalted hashes
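
To get a feel for how an offline guessing attack works, here is a tiny sketch: a word list plus a few common letter substitutions, hashed and compared against a stolen hash. The substitution table, the word list, and the toy hash (one round of SHA-512 over salt+password) are all assumptions made for illustration. Real tools like John the Ripper are far more sophisticated:

```python
import hashlib
from itertools import product

# A few common "clever" substitutions (illustrative only).
SUBSTITUTIONS = {"a": "a4@", "e": "e3", "o": "o0", "s": "s5$"}

def toy_hash(password, salt):
    return hashlib.sha512((salt + password).encode("utf-8")).hexdigest()

def variants(word):
    """Yield the word with every combination of substitutions applied."""
    choices = [SUBSTITUTIONS.get(ch, ch) for ch in word]
    for combo in product(*choices):
        yield "".join(combo)

def crack(target_hash, salt, wordlist):
    """Guess, hash, compare; return the matching guess or None."""
    for word in wordlist:
        for guess in variants(word):
            if toy_hash(guess, salt) == target_hash:
                return guess
    return None

stolen = toy_hash("b4n4n4", "tacos")  # pretend this came from a stolen file
print(crack(stolen, "tacos", ["apple", "banana", "cherry"]))  # b4n4n4
```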

This means that if a password hash database is compromised, assume the attackers will crack all the passwords
 - so you hash, but try to keep it private

note that no hash function has been proved to be hard to reverse,
we just think they are...until they aren't
 - cryptographers keep breaking old hash functions
 - so always design your applications so they can switch hash functions
 
note that secure hash functions are designed so a one bit change in input changes half the bits of the output on average
 - so small changes should lead to big hash differences
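
You can check this yourself by flipping one bit of the input and counting how many output bits change. A small sketch (the helper name is made up); 'a' and 'c' differ in exactly one bit, so the two inputs below differ by one bit:

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between SHA-512(a) and SHA-512(b)."""
    da = int.from_bytes(hashlib.sha512(a).digest(), "big")
    db = int.from_bytes(hashlib.sha512(b).digest(), "big")
    return bin(da ^ db).count("1")

# 'a' is 0x61 and 'c' is 0x63: a one-bit change in the last byte.
print(bit_difference(b"banana", b"bananc"), "of 512 bits differ")
```

The count should land somewhere around 256, i.e. about half of the 512 output bits.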


So once you type your password, the login process (or ssh or however you got in) has to:
 - create a child process
 - have it set up the environment for the new user
 - change the uid, gid to the new user
 - execve the user's shell/startup program
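
Those four steps map fairly directly onto system calls. Below is a rough sketch, not how login or sshd is actually written. The function name spawn_as is made up, and a real login would also set the supplementary groups (initgroups) and needs to be root to switch to another user:

```python
import os

def spawn_as(uid, gid, argv, env):
    """Fork, switch to uid/gid in the child, exec argv; return exit status."""
    pid = os.fork()                  # create a child process
    if pid == 0:
        try:
            # Change gid before uid: once we give up root's uid we
            # would no longer be allowed to change the gid.
            os.setgid(gid)
            os.setuid(uid)
            os.execve(argv[0], argv, env)  # replace child with the shell
        finally:
            os._exit(127)            # only reached if something above failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    # As a regular user we can only "switch" to ourselves.
    env = {"PATH": "/usr/bin:/bin", "HOME": os.path.expanduser("~")}
    print(spawn_as(os.getuid(), os.getgid(), ["/bin/sh", "-c", "id"], env))
```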

For a regular user to perform administrative tasks that have to modify
files owned by someone else (generally root)
 - the program binary should be setuid root (or setuid the user who
   has access to the file, or setgid to the group who has access)
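
The setuid and setgid bits are just part of the file's mode, so you can check for them with stat. A minimal sketch (the helper names are made up; /usr/bin/passwd is only an example of a binary that is setuid root on many Linux systems, and it may not be on yours):

```python
import os
import stat

def has_setuid(mode):
    return bool(mode & stat.S_ISUID)

def has_setgid(mode):
    return bool(mode & stat.S_ISGID)

def describe(path):
    """Report the owner and the setuid/setgid bits of a file."""
    st = os.stat(path)
    return {"owner_uid": st.st_uid,
            "setuid": has_setuid(st.st_mode),
            "setgid": has_setgid(st.st_mode)}

if __name__ == "__main__":
    path = "/usr/bin/passwd"  # assumed path, adjust for your system
    if os.path.exists(path):
        print(describe(path))
```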

Note that secure hashes are often used as unique identifiers
 - even though technically they aren't unique
 - e.g., certificates, change sets in git
 - when you look, secure hashes are everywhere
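
As one concrete case, git names a file's contents (a "blob") by hashing a short header plus the contents. A small sketch, assuming a repository using git's default SHA-1 object format:

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object ID git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty file has the same ID in every SHA-1 git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

You can compare against "git hash-object <file>" to see that the IDs match.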


But now let's talk about connecting to your VMs without passwords

To do this, we identify ourselves using public key cryptography
 - instead of a password, we generate a public/private key pair
 - we are identified by the public key
 - we prove we are who we are by using the private key to answer
   challenges generated using the public key
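
Here is a toy version of the challenge/response idea using textbook RSA with tiny numbers. This is NOT how ssh does it in detail (real ssh signs session data, with real key sizes and padding), and the key below is a classroom example that offers no security. All names are made up for illustration:

```python
import secrets

# Toy RSA key: n = 61 * 53. Never use numbers this small for real.
N, E = 3233, 17   # public key (the remote system has this)
D = 2753          # private key (never leaves the local machine)

def make_challenge():
    """Remote side: pick a random number for the user to sign."""
    return secrets.randbelow(N - 2) + 2

def answer_challenge(challenge, d=D, n=N):
    """Local side: only the private key holder can compute this."""
    return pow(challenge, d, n)

def verify(challenge, response, e=E, n=N):
    """Remote side: check the answer using only the public key."""
    return pow(response, e, n) == challenge

c = make_challenge()
print(verify(c, answer_challenge(c)))  # True
```

The point to notice is that verify() never needs D, so the remote system has nothing secret to leak.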

Better than passwords because if you tell anyone your password they know
your password (and thus can impersonate you)
 - note that the process on a system that accepts a password always
   sees its plaintext.  You don't store it on disk (you just store the hash)
   but if the process is compromised the attacker can get the actual password
 - with public key crypto, the remote system NEVER has the private key,
   so there is nothing to steal that is confidential


Whenever someone talks about digital signatures, code signing, certificates,
TLS/SSL - they are talking about technologies built on public key cryptography

For example, let's talk about SSH
 - can identify users using a password, but better
   to identify using a public-private key pair
 - generate keypair on local machine using ssh-keygen
 - copy public key to remote system
    - locally it is id_rsa.pub or similar
    - on remote system, add it to authorized_keys (can have multiple
      keys, one per line)
    - all files are in .ssh (note this directory should only be
      accessible by the user)
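
The remote-side step (adding the key to authorized_keys with sane permissions) can be sketched as follows. This is roughly the job the ssh-copy-id tool does for you. The function name is made up and the key in the example is a placeholder, not a real key:

```python
import os

def install_public_key(pubkey_line, ssh_dir):
    """Append one public key to ssh_dir/authorized_keys (one key per line)."""
    os.makedirs(ssh_dir, exist_ok=True)
    os.chmod(ssh_dir, 0o700)              # directory: owner only
    path = os.path.join(ssh_dir, "authorized_keys")
    key = pubkey_line.strip()

    existing = ""
    if os.path.exists(path):
        with open(path) as f:
            existing = f.read()

    if key not in existing.splitlines():  # don't add the same key twice
        with open(path, "a") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")             # keep keys on separate lines
            f.write(key + "\n")
    os.chmod(path, 0o600)                 # file: owner read/write only
    return path

if __name__ == "__main__":
    import tempfile
    demo_dir = os.path.join(tempfile.mkdtemp(), ".ssh")
    placeholder = "ssh-ed25519 AAAAplaceholder student@laptop"
    print(install_public_key(placeholder, demo_dir))
```

On a real system you would pass the contents of your local id_rsa.pub (or similar) and the remote user's ~/.ssh directory.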

authorized_keys stores public keys associated with a user
  (on remote system)

known_hosts stores public keys associated with remote hosts
  (on local system)

With ssh, remote hosts are identified by public/private key pairs
   OPTIONALLY, users can be identified by public/private key pairs

this is the same with SSL/TLS
   - but almost nobody identifies users with public/private key pairs
     (i.e., certificates, not passwords)
   - if you use a secure token, though, this may be possible

I expect you to know how to set up passwordless connections using ssh
   - because that is a super common use case in the cloud
the rest I'm not going to ask about

And I expect you to understand accounts, uid, gid, etc
  - just not most of the crypto