COMP 3000 2022F Assignment 3 Solutions

1. [2] Download and inspect [https://homeostasis.scs.carleton.ca/~soma/os-2022f/code/3000contain.sh 3000contain.sh]. Is there a risk of data loss from running this script? Specifically, how much of a risk is there from each rm command? Be specific.

A: As written, there is no real risk of data loss from the rm commands. There are three rm commands; we examine each below.

The first, on line 24, deletes the image file that the script itself created. Data could only be lost if something important was stored in the image file, or if the user had created another file called 3000fsimage and placed it in the current directory.

The second, on line 31, only runs if 3000fs exists and is a directory. If the user of the script placed anything in 3000fs before the image was mounted there, it could be deleted by this command; otherwise, this rm just removes an empty directory that was created by a previous run of the script.

The third, on line 64, deletes 3000setupfs.sh, a file that is created by this script. Again, unless the user created a file with the same name, there's no risk of data loss.

2. [1] Run 3000contain.sh. After 3000contain.sh runs, you're put in a new shell where / is now the contents of 3000fs, and you can't see anything that wasn't in 3000fs. Exiting the shell gets you back to where you were. After exiting, how do you get back to the contained environment?

A: In the 3000fs directory, run the command

    unshare --root=. -f -p --mount-proc

This will get you back into the contained environment. (Well, technically, this makes a new confined environment using the same filesystem as before. To get back into the "same" environment we'd need to have run a different command that saved the sharing context so it could be given to this command as an argument.)

3. [2] How does the output of ps differ when run inside the contained environment versus outside? What part of 3000contain.sh caused this difference?
A: Inside the contained environment, ps shows only the processes of that environment, nothing else. So if you run ps you'll see just bash and ps, with bash being PID 1! This is caused by the unshare command: the -p option unshares the PID namespace and --mount-proc mounts /proc with the PID restrictions.

4. [2] What does line 58 of 3000contain.sh do? When does it run? Be sure to explain all of its effects.

A: Line 58 is the following:

    echo '/usr/bin/busybox --install' >> $SETUP

On its own, it just appends this line to the end of $SETUP, which is 3000setupfs.sh. This line and the lines around it generate a shell script. This script is then run on line 63 via a chroot command. By running it under chroot, it will run with / being the 3000fs directory, the directory that contains the mounted filesystem in 3000fsimage.

This line causes busybox to run and create the hard links to all the commands it supports. Before this command, the new filesystem has just a few files. After this command, it has all the basic files needed for a little Linux environment, including ls, ps, vi, and many more. (They are all minimal versions, but they work!)

5. [2] What is the largest file we can create in the confined environment (once initialized by 3000contain.sh)? What determines this limit?

A: The largest file that can be created immediately after running the script is 425783296 bytes (approximately 406 MiB), assuming we are running as root. This size is determined first by the dd command, which creates a file of 491520000 bytes (8192 block size * 60000 blocks, 468.75 MiB). We then lose space to filesystem overhead, ending up with only 422.7 MiB total. Of this we lose 7.3 MiB to files: 422.7 - 7.3 = 415.4. We have about 9 MiB unaccounted for, but remember that file metadata is significant for larger files, so this is likely due to how ext4 is implemented.

(For full credit, you just have to get the space roughly right by doing an experiment and then 1.
note how space is initially reserved with dd and 2. how we lose space to filesystem overhead.)

6. [2] If you fill up the disk in the host system, how will it change the amount of data that can be stored in the confined environment? Does this depend on what has been previously stored in the confined environment?

A: Filling up space in the host filesystem can reduce the space in the confined environment...if it is done before the confined environment's filesystem is fully allocated. When the filesystem is created in 3000fsimage, mkfs.ext4 discards blocks that should contain all zero bytes. These blocks will need to be allocated as the filesystem fills up, but if the host's filesystem is full these blocks can't be allocated.

Interestingly enough, you can sometimes still create files in the confined system. But instead of storing the contents, you'll just get null blocks (because they can't be allocated). This failure is in effect a "storage medium" failure and so doesn't show up as a normal error; instead, you'll see errors in the kernel log about I/O errors when writing to /dev/loop3 or similar. In effect we have a virtual disk with bad blocks!

But if we had previously created a large file in the confined environment, all the necessary blocks will have been allocated to 3000fsimage, and so we can then fill up the host disk without any consequences for the confined environment.

(And yes, you can avoid this issue by adding the -E nodiscard option to mkfs.ext4, as then it will fully allocate the filesystem at filesystem creation and won't throw away null blocks.)

7. [2] Many files in our confined environment refer to the same inode. What was the original name of this inode? How do you know?

A: The original name is /usr/bin/busybox, as this is the program that created all of the other hard links to this file by running it with the --install option (added to the script on line 58, run on line 63).
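As a quick illustration of the mechanism, here is a minimal sketch (the file names here are hypothetical, not from the assignment) showing that hard links made with ln share a single inode, which is exactly what busybox --install does for each applet name:

```shell
# Two names, one inode: the second name is a hard link, not a copy.
touch original           # create a file; this allocates an inode
ln original alias        # add a second name for the same inode
stat -c '%i' original    # print the inode number of the first name
stat -c '%i' alias       # prints the identical inode number
stat -c '%h' original    # hard-link count is now 2
rm original alias
```

Inside the confined environment you can see the same thing at scale: ls -li /bin should show the same inode number next to every busybox applet name.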
We can tell this by removing lines selectively from the SETUP script (lines 57-60) and noticing that when line 58 is removed, the confined filesystem has hardly any files but busybox is there, while when it is included, we have lots of hard links (all to busybox).

8. [1] Copy and make nano work in the new environment. What files did you have to copy to get it to work? How did you know to copy them?

A: You have to copy /lib/x86_64-linux-gnu/libncursesw.so.6 to /lib in the confined environment. You can see this by copying nano into the environment and trying to run it; it reports that it can't find this file. You could also use the command

    sudo cp `ldd /usr/bin/nano | awk '{print $3}'` 3000fs

but this will copy all of the dependencies, and only libncursesw.so.6 is new. (Note this answer is from Fall 2021.)

9. [2] How can you add a user "contain" to 3000fs using useradd (and nothing else)? Make sure the user also is in a new group "contain" and has a home directory /home/contain (in 3000fs). This user should only be visible when you're in the confined environment. How did you confirm that your answer works?

A: The command

    useradd -m -U -R `pwd` contain

added to 3000contain.sh and run before the end (tested on line 52, after copying the passwd and related files) does what is required. The -m option makes the home directory, -U creates a group named the same as the user, -R `pwd` does a chroot into the current directory, and contain is the user & group to be created.

I confirmed it by adding this line to the script, running the script to get into the confined environment, and then checking that I could run "su - contain" to become the user; I also checked that the user had a home directory with the right permissions. (If you set a password for contain you could also use login to log in as the user contain.)

10. [2] How can you mount the main root filesystem inside of the confined environment? What part of 3000contain.sh made this possible?
A: First, in the main root filesystem run df to find out its device. You'll get output something like this:

    student@comp3000:~$ df .
    Filesystem            1K-blocks    Used Available Use% Mounted on
    /dev/mapper/vg0-lv--0   8187320 4471888   3279824  58% /

In the chroot'd environment we can then mount it as follows:

    bash-5.1# mkdir /mainfs
    bash-5.1# mount /dev/mapper/vg0-lv--0 /mainfs

After this, /mainfs will contain all of the system's files.

(A weird consequence of this is how Linux avoids duplicate views of the confined files. For example, if we ran 3000makefs.sh in /home/student/Documents/A3, we'd see the following when just logging in:

    student@comp3000:~$ ls /home/student/Documents/A3
    3000fs  3000fsimage  3000makefs.sh
    student@comp3000:~$ ls /home/student/Documents/A3/3000fs
    bin  etc   lib    linuxrc     mainfs  root  sbin  tmp  var
    dev  home  lib64  lost+found  proc    run   sys   usr

However, if we look at the same paths after running chroot, things look a bit different than we might expect:

    bash-5.1# ls /
    bin  home   linuxrc     proc  sbin  usr
    dev  lib    lost+found  root  sys   var
    etc  lib64  mainfs      run   tmp
    bash-5.1# ls /mainfs/home/student/Documents/A3/3000fs
    bash-5.1# ls /mainfs/home/student/Documents/A3
    3000fs  3000fsimage  3000makefs.sh

Note how 3000fs appears to be empty while A3 shows the files we would expect.)

This is all made possible by line 60, the mounting of /dev in the setup script. Note that this command doesn't have to be run in the script; it also works after the unshare command. In fact, there isn't a straightforward way to prevent this with just the unshare command; we need other security mechanisms to prevent this. (Much of this answer, but not the last part, comes from last year's solutions.)

11. [2] How can you change the hostname in the confined environment to "mycontainer" without changing the hostname of the host system? (Note that the "hostname" command can be used to check and set a system's hostname.)
Is this change persistent, i.e., will the hostname stay the same when you exit and re-enter the confined environment?

A: We just have to add the -u option to the unshare command to unshare the UTS namespace, which is what holds the hostname. So we change the last line of the script to be

    unshare --root=. -f -p --mount-proc -u

After this, running "hostname mycontainer" inside the confined environment won't change the hostname of the host.

This change is not persistent, as it just changes some kernel state. Hostnames are only persistent because startup scripts set the hostname on boot from a file (generally /etc/hostname). To make it persistent we would have to change our script to set the hostname every time we create the container.
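You can see the UTS-namespace isolation without touching the script at all. A minimal sketch follows, assuming unprivileged user namespaces are enabled on the host (the extra -r maps us to root in a new user namespace so no sudo is needed; inside the actual confined environment you'd already be root and only -u would be required):

```shell
# Set the hostname inside a new UTS namespace, print it, and then
# show that the host's own hostname is untouched.
# -r: new user namespace with us mapped to root (for sethostname)
# -u: new UTS namespace, as in the modified unshare line above
unshare -r -u sh -c 'hostname mycontainer; hostname'
hostname   # still prints the host's original hostname
```

The first command prints mycontainer; the second prints the unchanged host name, since the hostname lives in the UTS namespace that disappeared when the inner shell exited.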