COMP 3000 Fall 2019 Assignment 3 solutions

1. [1] How can you change the inode associated with a regular file? Specifically, what commands can you run that will result in a file that has exactly the same name as before but has a different inode? Please give the commands and explain what each does.

A: To change the inode for file A, you have to copy it and make the copy have the same name as the original. One way to do this:

   cp -a A temp
   rm A
   ln temp A
   rm temp

The cp makes a copy of A with a new inode, the rm deletes the original A, the ln gives the copy the original name by hard-linking A to temp's inode, and the final rm removes the temporary name. The -a flag preserves permissions and timestamps, but it is not needed just to make the file's contents byte-for-byte the same.

2. [1] Does a "hole" in a UNIX file affect its logical size? What about its physical size? Explain briefly.

A: A hole in a UNIX file means that a part of the file that consists of all zeros is not allocated any data blocks. So if you add holes to a file, you don't change its logical size, but you do reduce its physical size (the number of blocks it takes up on disk).

3. [1] To set up key-based login using ssh, what file do you have to create or change on the remote system? What do you put in it?

A: You add your public key to the authorized_keys file in the .ssh directory of your home directory on the remote system.

4. [2] What is the difference in behavior between "dd bs=1024 count=10" and "dd bs=2048 count=5" when run in a terminal? Why is there this difference? (You may want to use strace.)

A: You can get the first to terminate after hitting enter 10 times (typing arbitrary things for each line), while the second terminates after hitting enter 5 times. This difference happens because the first issues 10 read system calls while the second makes 5. In a terminal, each read is terminated by hitting enter (end of line), because that is how buffered input works in a terminal: input is delivered a line at a time so that basic line editing (e.g., deleting characters) is possible.

5. [2] How do you recover an ext4 filesystem using a backup superblock? Show your answer is correct by creating a filesystem, erasing its primary superblock (only), and then recovering the filesystem using a backup superblock.

A: You can erase the primary superblock by using dd to overwrite the first block of the filesystem. You can then use fsck to recover, giving it a backup superblock as an argument. Worked example below.

Make a 1G filesystem as follows:

   truncate -s 1G myfs
   mkfs.ext4 myfs

Find the primary and backup superblocks (they were also reported by mkfs):

   dumpe2fs myfs | grep superblock

Erase the primary superblock:

   dd if=/dev/zero of=myfs conv=notrunc count=1 bs=4096

(Note that bs should be the same as the block size reported by mkfs.)

Try mounting it:

   sudo mount myfs /mnt

This should fail. Then repair it with a backup superblock:

   fsck.ext4 -b 32768 myfs

You may need to say yes to some prompts. Try mounting it again and it should be fine!
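The steps above can also be collected into one script so the demonstration is repeatable. The following is a minimal sketch only, assuming the same scratch file name (myfs), the 4096-byte block size, and the backup superblock at block 32768 reported by mkfs in this example; substitute whatever your own mkfs/dumpe2fs output shows.

   #!/bin/sh
   # Sketch of the question 5 demo: build, break, and repair an ext4 image.
   # Assumes myfs, 4096-byte blocks, and a backup superblock at block 32768.

   truncate -s 1G myfs        # create a 1 GiB sparse file
   mkfs.ext4 myfs             # make an ext4 filesystem in it
                              # (mkfs may prompt if myfs already held a filesystem)

   # wipe only the primary superblock (the first filesystem block)
   dd if=/dev/zero of=myfs conv=notrunc bs=4096 count=1

   # this mount should now fail
   sudo mount myfs /mnt || echo "mount failed, as expected"

   # repair using a backup superblock; -y answers yes to fsck's prompts
   fsck.ext4 -y -b 32768 myfs

   # now the mount should succeed
   sudo mount myfs /mnt && echo "recovered!" && sudo umount /mnt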
6. [2] How could you erase all superblocks (including backups) in an ext4 filesystem? Specifically, how could you find them all, and what command could you use to erase each of them (and nothing else)?

A: Find the superblocks:

   dumpe2fs myfs | grep superblock

You'll get output like this:

   Primary superblock at 0, Group descriptors at 1-1
   Backup superblock at 32768, Group descriptors at 32769-32769
   Backup superblock at 98304, Group descriptors at 98305-98305
   Backup superblock at 163840, Group descriptors at 163841-163841
   Backup superblock at 229376, Group descriptors at 229377-229377

For each of these blocks, use dd to erase them:

   dd if=/dev/zero of=myfs conv=notrunc bs=4096 count=1 seek=0
   dd if=/dev/zero of=myfs conv=notrunc bs=4096 count=1 seek=32768
   dd if=/dev/zero of=myfs conv=notrunc bs=4096 count=1 seek=98304
   dd if=/dev/zero of=myfs conv=notrunc bs=4096 count=1 seek=163840
   dd if=/dev/zero of=myfs conv=notrunc bs=4096 count=1 seek=229376

You can do this as a one-line command:

   for x in `dumpe2fs myfs | grep 'superblock' | awk '{print $4}' | \
   tr -d ','`; do dd if=/dev/zero of=myfs conv=notrunc bs=4096 count=1 \
   seek=$x; done

(The backslashes at the ends of the lines mean the command continues on the next line; this lets you cut and paste a command that has been broken over multiple lines.)

7. [2] When you specify the -a (archive) flag to rsync, it is equivalent to the flags -rlptgoD. Why are each of these flags important to "archiving"? What are two ways that a copy produced with just "-a" won't be an exact copy?

A: The arguments mean the following:

   r   recursive: copy entire directories, not just a specified file
   l   preserve symbolic links
   p   preserve permissions
   t   preserve timestamps
   o   preserve owner
   g   preserve group
   D   preserve device & special files

Note that this does not preserve hard links (i.e., two files that are hard links to the same inode will get separate inodes on the destination), ACLs, or extended attributes, and it does not delete files or directories present in the destination that are not in the source.

8. [2] What is one advantage of using key-based logins with ssh? What is one disadvantage?

A: Advantages: no need to type in your password every time you log in (if you use an SSH agent), and no danger of someone guessing or compromising your password because you used it on another website.

Disadvantages: you can only log in from a system that has your private key, and if the system holding the private key is compromised, the remote system is compromised (assuming the attacker can break any protecting passphrase, which is a reasonable assumption). With a password, you would only be compromised if you used the compromised system to log in (and thus revealed your password).

If you think of other advantages or disadvantages, you may want to post them to Discord!

9. [2] Can you open a file if you only know its inode number? How does this help explain why inode numbers shown in sshfs don't correspond to the inode numbers on the remote system?

A: You can't open a file's contents if you only know the corresponding inode number. UNIX and Linux provide no direct way of manipulating inodes; everything has to go through a pathname. With sshfs, the program interacting with the actual files on the remote system is just an ordinary ssh/sftp process. Thus, when it gets requests for operations, it has to make those requests in terms of filenames on the remote system---but it will get those requests in terms of inodes. (To see why, look up the FUSE API.) Thus there's no point in figuring out the inode numbers on the remote system; sshfs just keeps track of the filenames and associates them with made-up inode numbers.
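To see the mismatch described in question 9 for yourself, you can compare the inode number a file has through an sshfs mount with its inode number on the remote system. This is a rough sketch only; the user name, host (example.org), file (notes.txt), and mount point (~/remote) are placeholders for illustration:

   mkdir -p ~/remote                                 # local mount point
   sshfs user@example.org: ~/remote                  # mount the remote home directory

   stat --format=%i ~/remote/notes.txt               # inode number seen through sshfs/FUSE
   ssh user@example.org stat --format=%i notes.txt   # inode number on the remote system

   fusermount -u ~/remote                            # unmount when done

The two numbers will generally differ, because sshfs assigns its own inode numbers locally rather than passing through the remote ones.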
10. [5] What trace command could you use to do each of the following? Explain each part of the command you give.

Hint: you may want to look at the System.map for your kernel in /boot; it has the symbols defined by the current kernel.

NOTE that the solutions below are for Ubuntu 18.04 running a 4.15.0 Linux kernel. Other systems may require other parameters.

10.1 Report every time the chdir system call was made along with the directory that was changed to.

A: trace 'sys_chdir "%s", arg1'

   sys_chdir: the chdir system call entry point
   "%s", arg1: print argument 1 as a string

10.2 Report every time a specific bash process does a write system call.

A: trace -p <pid> sys_write

e.g., if the process is 26631:

   trace -p 26631 sys_write

   -p <pid>: only monitor the given process
   sys_write: the entry point for the write system call

Note this solution works for any specific process, not just bash.

10.3 Report what userspace function called the write system call in a specific bash process.

A: trace -U -p <pid> sys_write

   -p <pid>: only monitor the given process
   -U: print the userspace call stack
   sys_write: the entry point for the write system call

10.4 Report every call to the clone system call.

A: trace sys_clone

   sys_clone: the entry point for the clone system call

(On newer kernels such as 5.3.0, there is no sys_clone symbol; instead, there is a __x64_sys_clone symbol.)

10.5 Report every 64-bit call to the execve system call (or every call to execve), reporting the program that was executed.

A: trace 'sys_execve "%s", arg1'

   sys_execve: the entry point for the execve system call
   "%s", arg1: print the first argument to execve as a string

(On newer kernels such as 5.3.0, there is no sys_execve symbol; instead there is a __x64_sys_execve symbol.)
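Because the entry-point symbol names differ between kernel versions (sys_execve on 4.15 versus __x64_sys_execve on 5.3, as noted above), it is worth checking what your own kernel exports before running trace. A small sketch of one way to do that, following the System.map hint above (System.map in /boot is often readable only by root, hence the sudo; /proc/kallsyms is an alternative any user can read):

   # which execve entry point does this kernel define?
   sudo grep -wE 'sys_execve|__x64_sys_execve' /boot/System.map-$(uname -r)

   # same idea for clone, using the live kernel's symbol list instead
   grep -wE 'sys_clone|__x64_sys_clone' /proc/kallsyms

Whichever name appears is the one to use as the probe name in the trace commands above.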