COMP 3000 2020W
Assignment 4 Solutions

Tutorial 8 Questions
--------------------

1. Can a process change where data and code are stored in virtual
memory?  What about in physical memory?

   A: A process has complete control over where things are stored in
   its virtual memory.  It has no control over where its code and data
   are stored in physical memory.  Physical memory is managed by the
   kernel, which can allocate and deallocate physical pages at will.  A
   process can even find that parts of its address space are not
   currently backed by physical memory; in that case, the process is
   paused when it accesses that memory until the kernel makes it
   available (a page fault).
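
   A minimal sketch of demand paging, assuming Linux (where the
   ru_minflt field of getrusage() counts minor page faults): mapping
   anonymous memory only reserves virtual addresses, and the kernel
   supplies a physical frame the first time each page is touched.  The
   function name is made up for this example.

```python
import mmap
import resource


def minor_faults_when_touching(npages: int) -> int:
    """Map npages of fresh anonymous memory, write one byte to each page,
    and return how many minor page faults the process took meanwhile."""
    size = npages * mmap.PAGESIZE
    # Private anonymous mapping: virtual addresses only, no frames yet.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        before = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        for offset in range(0, size, mmap.PAGESIZE):
            # First touch of each page traps into the kernel, which
            # allocates a physical frame before the write completes.
            region[offset] = 1
        after = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
    finally:
        region.close()
    return after - before


if __name__ == "__main__":
    print(minor_faults_when_touching(256))
```

   Calling minor_faults_when_touching(256) should report roughly one
   fault per page, though the exact count depends on the kernel and on
   what else the interpreter does in the meantime.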

2. If two processes mmap the same library, will that library
(necessarily) have the same virtual addresses for both processes?
What about the same physical addresses?

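   A: The library's code will have the same physical addresses, as only
   one copy is loaded into memory and shared.  (Writable data pages are
   the exception: they are mapped privately, so a process that modifies
   one gets its own copy.)  However, its location in virtual memory can
   vary from one process to the next, and with address space layout
   randomization it usually does: each process can map the library
   (which is just a file) at a different virtual address.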
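
   Comparing two processes takes more than a short example, so this
   sketch maps one file twice within a single process, which exercises
   the same mechanism: each mapping gets its own virtual address, yet
   both are backed by the same page-cache pages, so a write through one
   view is visible through the other.  (The dynamic loader maps
   libraries privately rather than with MAP_SHARED, but unmodified
   pages are shared the same way.)  Reading addresses through ctypes is
   just a convenience here, and the helper names are made up.

```python
import ctypes
import mmap
import tempfile


def mapping_address(m: mmap.mmap) -> int:
    """Return the virtual address at which the mmap object m starts."""
    view = ctypes.c_char.from_buffer(m)  # needs a writable mapping
    addr = ctypes.addressof(view)
    del view  # drop the buffer export so m can be closed later
    return addr


def map_file_twice(data: bytes):
    """Map one file twice with MAP_SHARED; return both start addresses and
    what the second mapping sees after writing through the first."""
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        # mmap duplicates the descriptor, so closing f afterwards is fine.
        first = mmap.mmap(f.fileno(), len(data), flags=mmap.MAP_SHARED)
        second = mmap.mmap(f.fileno(), len(data), flags=mmap.MAP_SHARED)
    addr_first = mapping_address(first)
    addr_second = mapping_address(second)
    first[0:5] = b"HELLO"  # modify the shared page through the first view
    seen = second[0:5]     # ...and read it back through the second view
    first.close()
    second.close()
    return addr_first, addr_second, seen
```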

3. What system calls does 3000memview2 use to get physical addresses?
Are any of these new (ones we haven't previously seen in class)?  Why?

   A: 3000memview2 uses ioctl system calls to interact with the
   /dev/physicalview character device.  We haven't seen this system
   call before because it is mainly used to interact with device files;
   regular files, directories, and symbolic links support few or no
   ioctl requests.
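
   The request codes and argument structure that 3000memview2 passes to
   /dev/physicalview are not reproduced here, so this sketch shows the
   same pattern with a different character device, a pseudo-terminal
   from os.openpty(): open the device, then use requests defined by its
   driver (the standard TIOCSWINSZ and TIOCGWINSZ terminal requests) to
   set and read device state that read() and write() cannot reach.  The
   function name is made up for this example.

```python
import fcntl
import os
import struct
import termios


def set_and_get_winsize(rows: int, cols: int):
    """Set a pseudo-terminal's window size with one ioctl request and
    read it back with another."""
    controller, terminal = os.openpty()
    try:
        # struct winsize is four unsigned shorts: rows, cols, xpixel, ypixel
        fcntl.ioctl(terminal, termios.TIOCSWINSZ,
                    struct.pack("HHHH", rows, cols, 0, 0))
        raw = fcntl.ioctl(terminal, termios.TIOCGWINSZ, bytes(8))
        got_rows, got_cols, _, _ = struct.unpack("HHHH", raw)
        return got_rows, got_cols
    finally:
        os.close(controller)
        os.close(terminal)
```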

4. Who has access to /dev/physicalview?  How do you know (from the code)?

   A: Everyone has access, because in physicalview_devnode() on line
   161 the mode for the device is set to 0666 (rw-rw-rw-), which means
   that the file's owner, its group, and everyone else have read and
   write access to the file.
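
   Python's stat module can decode a 0666 mode the same way ls -l
   displays it.  This is a small illustration; the helper names are
   made up for it.

```python
import stat


def describe_mode(mode: int) -> str:
    """Render a character-device permission mode the way ls -l would."""
    return stat.filemode(stat.S_IFCHR | mode)


def everyone_can_read_write(mode: int) -> bool:
    """True if the owner, group, and others all have read and write bits."""
    needed = (stat.S_IRUSR | stat.S_IWUSR |
              stat.S_IRGRP | stat.S_IWGRP |
              stat.S_IROTH | stat.S_IWOTH)
    return (mode & needed) == needed
```

   For example, describe_mode(0o666) gives "crw-rw-rw-".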

5. List all of the page table lookups that get_physical() does in
3000physicalview.c.  Why are there so many lookups?

   A: pgd_offset(), p4d_offset(), pud_offset(), pmd_offset(), and
   pte_offset_map(): five lookups, one per level of the page table
   tree.  Some x86-64 processors implement 5-level page tables, so the
   Linux kernel's generic code always performs 5-level lookups;
   architectures (and CPUs) with fewer levels "fold" the missing ones
   into no-ops.
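
   A sketch of the index arithmetic behind those five calls, assuming
   x86-64 with 4 KiB pages: each level consumes 9 bits of the virtual
   address (512 entries per table) and the low 12 bits are the offset
   within the page.  With 4-level paging the p4d level is folded and the
   pgd index comes from bits 39-47 instead; this sketch shows only the
   5-level layout, and its names are illustrative, not kernel code.

```python
# Bit position where each level's 9-bit index starts (5-level paging).
LEVEL_SHIFTS = {"pgd": 48, "p4d": 39, "pud": 30, "pmd": 21, "pte": 12}
INDEX_MASK = (1 << 9) - 1         # 512 entries per page table
PAGE_OFFSET_MASK = (1 << 12) - 1  # 4 KiB pages


def page_table_indices(vaddr: int) -> dict:
    """Split a virtual address into one table index per level plus the
    byte offset within the final page."""
    indices = {level: (vaddr >> shift) & INDEX_MASK
               for level, shift in LEVEL_SHIFTS.items()}
    indices["offset"] = vaddr & PAGE_OFFSET_MASK
    return indices
```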

6. Can you do an ioctl call on regular files?  Why or why not?

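   A: Not in the way ioctl is intended to be used.  The whole point of
   ioctl calls is to allow access to device-specific functionality
   defined by a device driver, and regular files have no such
   functionality to expose, so device requests on a regular file fail
   with ENOTTY ("Inappropriate ioctl for device").  Linux does accept a
   few generic requests on regular files (for example FIONREAD, or the
   FS_IOC_GETFLAGS request that lsattr uses on filesystems that support
   it), but these are filesystem features, not device functionality.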
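
   A small sketch of the distinction, assuming Linux: a terminal
   request (TIOCGWINSZ) on a regular file fails with ENOTTY, while
   FIONREAD, one of the few generic requests the kernel's VFS layer
   handles for regular files itself, reports how many bytes remain to
   be read.  Nothing here reaches a device driver.  The function name
   is made up for this example.

```python
import errno
import fcntl
import os
import struct
import tempfile
import termios


def ioctl_on_regular_file():
    """Return (error name from TIOCGWINSZ, bytes reported by FIONREAD)
    for a small temporary regular file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"hello")
        os.lseek(fd, 0, os.SEEK_SET)
        try:
            fcntl.ioctl(fd, termios.TIOCGWINSZ, bytes(8))
            tty_error = None
        except OSError as e:
            tty_error = errno.errorcode[e.errno]  # expected: "ENOTTY"
        raw = fcntl.ioctl(fd, termios.FIONREAD, bytes(4))
        readable = struct.unpack("i", raw)[0]  # file size minus position
        return tty_error, readable
    finally:
        os.close(fd)
        os.unlink(path)
```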
   
7. What are the values of PAGE_SHIFT and PAGE_SIZE?  Where are they
defined?  What do they represent?

   A: They are defined in the arch/x86/include/asm/page_types.h file
   (and in other architecture-specific directories).  PAGE_SHIFT
   represents the number of bits required to represent an offset in a
   page, 12 in this case; it is also how far an address is shifted
   right to get its page number.  PAGE_SIZE is the number of bytes in a
   page, which is 2^12 or 4096 bytes.  (In the source this is encoded
   as _AC(1,UL) << PAGE_SHIFT, which is a 1 shifted left by 12 bits,
   so 2^12.  The _AC() wrapper is there so the constant works in both
   assembly language and C.)
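
   The same arithmetic in Python, assuming the x86-64 values above;
   PAGE_MASK mirrors the kernel's ~(PAGE_SIZE-1) definition, and
   split_address() and page_start() are illustrative helpers.

```python
import mmap

PAGE_SHIFT = 12                # x86-64 value
PAGE_SIZE = 1 << PAGE_SHIFT    # 4096, like _AC(1,UL) << PAGE_SHIFT
PAGE_MASK = ~(PAGE_SIZE - 1)   # clears the in-page offset bits


def split_address(addr: int):
    """Return (page number, offset within the page) for addr."""
    return addr >> PAGE_SHIFT, addr & (PAGE_SIZE - 1)


def page_start(addr: int) -> int:
    """Return the address of the first byte of addr's page."""
    return addr & PAGE_MASK


if __name__ == "__main__":
    # Page size of the machine running this script, which may differ
    # from 4096 on other architectures.
    print(mmap.PAGESIZE)
```

   For example, split_address(0x12345) gives (0x12, 0x345): page 0x12,
   offset 0x345.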


Tutorial 9 Questions
--------------------

1. Where is FILTER_PID defined?  Where is it used?

   A: It is defined on line 20 of 3000shellwatch.py, as part of a
   string that incorporates the PID passed in as an argument.  It is
   then used on line 58 of bpfprogram.c to determine whether the filter
   function returns 0 or 1.  Note that in bpfprogram.c FILTER_PID is a
   compile-time constant that is passed in on the compilation command
   line using a -D flag (when called by 3000shellwatch.py).
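
   A sketch of how a Python script can hand such a constant to the
   compiler, assuming bcc's BPF class (whose constructor accepts
   src_file and cflags arguments).  The actual string on line 20 of
   3000shellwatch.py may be built differently, and filter_cflags() and
   load_program() are hypothetical names.

```python
def filter_cflags(pid: int) -> list:
    """Build the compiler flag that defines FILTER_PID for bpfprogram.c."""
    return [f"-DFILTER_PID={pid}"]


def load_program(pid: int):
    """Hypothetical loader: compile bpfprogram.c with FILTER_PID baked in."""
    from bcc import BPF  # requires bcc and root privileges
    return BPF(src_file="bpfprogram.c", cflags=filter_cflags(pid))
```

   Because the PID becomes part of the compiled program, changing it
   means recompiling and reloading the eBPF code.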

2. How could you make 3000shellwatch.py watch for events in any
process, not just a specific one?  What events would it then report?

   A: To watch all processes, change the filter() function in
   bpfprogram.c to always return 0.  When you do this, you'll get the
   events for every process on the system that would otherwise only be
   reported for 3000shell: all raw system calls, the input read by
   calls to fgets(), received signals, and the distribution of read
   lengths.  The number of events is quite large even on the class VMs,
   as a number of programs are constantly running in the background.
   (You still have to pass in a PID on the command line, but it will be
   ignored.  If you don't want to have to specify the PID argument, you
   could set the required flag on line 13 of 3000shellwatch.py to zero
   (false) and add a default value.)
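
   A sketch of the optional-PID change with argparse; the option names
   and the default of -1 are assumptions, since the actual argument
   definition on line 13 is not reproduced here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Trace 3000shell events")
    # Hypothetical option names.  Once filter() always returns 0 the PID
    # is ignored, so it no longer needs to be required.
    parser.add_argument("-p", "--pid", type=int, required=False, default=-1,
                        help="PID to trace (ignored by the modified filter)")
    return parser
```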

3. Make 3000shellwatch.py monitor all instances of 3000shell by checking a process's comm property.  Be sure to remove the PID argument.  (Hint: see bashreadline)

   A: Replace filter() in bpfprogram.c with the following:

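static int filter(void)
{
  char comm[TASK_COMM_LEN] = {};

  /* Name (comm) of the current task, NUL-terminated and at most
     TASK_COMM_LEN - 1 characters long. */
  bpf_get_current_comm(&comm, sizeof(comm));

  /* BPF programs can't call the C library's strcmp(), so compare byte
     by byte.  Checking comm[9] for the terminator rejects longer names
     that merely start with "3000shell". */
  if (comm[0] == '3' &&
      comm[1] == '0' &&
      comm[2] == '0' &&
      comm[3] == '0' &&
      comm[4] == 's' &&
      comm[5] == 'h' &&
      comm[6] == 'e' &&
      comm[7] == 'l' &&
      comm[8] == 'l' &&
      comm[9] == '\0') {
    return 0;   /* keep events from 3000shell */
  }

  return 1;     /* filter out every other process */
}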

  In 3000shellwatch.py, delete lines 12-15 and 20, and change line 45
  to something like:

    print('Tracing 3000shell events, ctrl-c to exit...', file=sys.stderr)

  (You could also change how probes are attached to processes; there
  are potentially multiple ways to solve this problem, but this is the
  most straightforward.)
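
   For comparison, the same comm-based match can be done from userspace
   by reading /proc/<pid>/comm, which is roughly what pgrep -x does.
   This sketch assumes Linux and is not part of the solution; it only
   illustrates what the comm value is (the task name, truncated to
   TASK_COMM_LEN - 1 = 15 characters).  pids_with_comm() is a
   hypothetical helper.

```python
import os

TASK_COMM_LEN = 16  # kernel buffer size, including the terminating NUL


def pids_with_comm(name: str) -> list:
    """Return the PIDs whose /proc/<pid>/comm equals name exactly."""
    wanted = name[:TASK_COMM_LEN - 1]  # the kernel truncates longer names
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().rstrip("\n")
        except OSError:
            continue  # process exited or is not accessible
        if comm == wanted:
            matches.append(int(entry))
    return sorted(matches)
```

   For example, pids_with_comm("3000shell") lists every running
   3000shell instance.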


4. What code of 3000shellwatch runs in userspace?  What runs in kernel
space?

   A: 3000shellwatch.py and utils.py run in userspace (in user mode on
   the CPU).  The code in bpfprogram.c (after being compiled and loaded
   into the kernel) runs in kernel space (in supervisor mode on the
   CPU).

5. Why does 3000shellwatch require root privileges to run?  Give an
example of a small change you could make to 3000shellwatch that would
give an unprivileged user the ability to see or do something that they
normally can't.

   A: 3000shellwatch requires root privileges because it allows
   observation of any process on the system.  As the previous question
   shows, we can monitor every process running the 3000shell
   executable, and it is trivial to make the filter match any other
   executable, or indeed any process.  For example, pointing filter()
   at a program that reads a password with fgets() would reveal that
   password to whoever runs the tool.  In general you could observe any
   data being written or read, including passwords or confidential
   information, which is far beyond the capabilities of a regular
   unprivileged user.


6. How are a uprobe and a uretprobe similar?  How are they different?

   A: uprobes and uretprobes both attach probes (code that will be run
   on a specific event) to userspace functions.  A uprobe fires when
   the function is called (so it can see the function's arguments),
   while a uretprobe fires when the function returns (so it can see the
   return value).
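
   A plain-Python analogy, not eBPF: a decorator that runs one hook on
   entry, with the arguments, and another on return, with the return
   value.  Real uprobes and uretprobes are attached by the kernel to the
   compiled function without changing the program's source, but the
   split between what is visible at entry and at return is the same.
   probe() is a made-up name.

```python
import functools


def probe(on_entry, on_return):
    """Wrap a function so on_entry sees its arguments (like a uprobe) and
    on_return sees its return value (like a uretprobe)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            on_entry(func.__name__, args)
            result = func(*args, **kwargs)
            on_return(func.__name__, result)
            return result
        return wrapper
    return decorate
```

   Wrapping a function that returns a line of input, the entry hook sees
   only the arguments, while the return hook sees the line itself; this
   is why a tool such as bashreadline uses a return probe on readline().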

7. What is the signals dictionary used for?

   A: The signals dictionary (in utils.py) is used to translate the
   signal numbers reported by signal events into signal names; see line
   34 of 3000shellwatch.py.  The kernel only works with signal numbers,
   so userspace has to associate them with names.
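
   The dictionary in utils.py may well be written out by hand; as an
   alternative sketch, Python's signal module can build the same
   number-to-name table from the platform's own definitions, which
   matters because some numbers differ between systems (SIGUSR1 is 10
   on x86 Linux but 30 on macOS).  SIGNAL_NAMES and signal_name() are
   illustrative names.

```python
import signal

# Number -> name table built from this platform's signal definitions.
SIGNAL_NAMES = {sig.value: sig.name for sig in signal.Signals}


def signal_name(signum: int) -> str:
    """Translate a signal number from the kernel into a readable name."""
    return SIGNAL_NAMES.get(signum, f"signal {signum}")
```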

8. As presented in the tutorial, does 3000shellwatch have to use eBPF
to achieve its goals?  Could it instead have used ptrace?  Argue for
or against, based on the level of access you've seen gdb and strace
have to processes using the ptrace system call.

   A: The original version of 3000shellwatch could have been
   implemented with ptrace.  gdb can access individual functions (you
   can set breakpoints on them), and both gdb and strace track system
   calls.  So long as we are only interested in one process,
   3000shellwatch can be implemented using ptrace; however, we can't
   (easily) use ptrace if we want to monitor more than one process.
   (It turns out that ptrace can change program behavior, since it
   stops the traced process at every event, and it is not good to use
   in production.  eBPF is specifically designed to allow monitoring in
   production environments.)

9. On line 69 of bpfprogram.c, does sys_exit refer to the exit system
call?  Explain.

   A: No.  This event refers to a system call exiting (finishing), not
   to the exit system call.  This probe runs every time any system call
   exits, passing that information to userspace if the system call
   isn't excluded by the filter function.  It also records statistics
   on every read system call (again, only those that aren't filtered
   out).