COMP 3000 2022F Assignment 4 Solutions

1. [1] Download, build, and install the c3000procreport module.  What
   is reported in the kernel log when the module is installed?

A: You should see something like this:

Nov 25 00:28:38 comp3000 kernel: [128237.983052] c3000procreport:
  loading out-of-tree module taints kernel.
Nov 25 00:28:38 comp3000 kernel: [128237.983264] c3000procreport:
  module verification failed: signature and/or required key missing -
  tainting kernel
Nov 25 00:28:38 comp3000 kernel: [128237.990627] comp3000: procreport
  device registered using major 236.

The first two messages report that we are loading an out-of-tree,
unsigned kernel module.  (Out of tree technically means that the module
isn't part of the main Linux kernel source tree; in practice, what
matters here is that it isn't signed with a key the kernel trusts.)

The third line actually comes from the module, specifically from the
info() call at the end of procreport_init().

(You just need to report the log messages; you don't need an
explanation.)


2. [2] When you run <tt>cat /dev/procreport</tt>, what does it report?
   Give sample output and explain what each part means briefly.

A: You should get an output like this:

Your PID is 5580!
Buffer at 0x00007f252854e000 virtual is at 0x00000001165d6000 physical.

The first line gives the process ID of the process that accessed the
device.  In this case the process was running cat.

The second line gives the virtual and physical address of the buffer
passed to procreport_read().  In other words, the first value is buf
as it was passed to procreport_read, and the second is this value
translated to a physical address by procreport_get_physical().
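
The last step of that translation is plain arithmetic: the page table
yields a page frame number, and the offset within the page carries over
unchanged.  Here is a minimal user-space sketch, assuming 4 KiB pages
(a page shift of 12, typical for x86-64).  compose_phys() and the EX_
constants are illustrative names, not part of the module:

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits are the offset in the page. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (UINT64_C(1) << EX_PAGE_SHIFT)

/* Combine a page frame number with the in-page offset of a virtual address. */
uint64_t compose_phys(uint64_t pfn, uint64_t vaddr)
{
        return (pfn << EX_PAGE_SHIFT) + (vaddr % EX_PAGE_SIZE);
}
```

In the sample output the buffer is page-aligned (offset 0), so a frame
number of 0x1165d6 corresponds to the physical address
0x00000001165d6000.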


3. [2] Run <tt>bpftrace watch_procreport.bt</tt> in one terminal and
   then run <tt>dd if=/dev/procreport bs=1024 count=1</tt>.  What does
   the bpftrace script report?  Why are we getting this output?
   Explain what you see.

A:  You should get output like this:

Attaching 6 probes...
open called.
read called.
get_physical called.
read returned 90.
release called.

The first line is from bpftrace, with the 6 referring to the six
probes defined in watch_procreport.bt.  Then:

  open called -> procreport_open() was called
    - dd opened /dev/procreport
  read called -> procreport_read() was called
    - dd read from /dev/procreport
  get_physical called -> procreport_get_physical() was called
    - because this is the first read, offset is 0, so we
      do the main logic of procreport_read() which calls this function
  read returned 90 -> procreport_read() returns 90
    - procreport_read returns the number of characters in the message
      string that was copied to the buffer
  release called -> procreport_release() was called
    - we only told dd to read once, so /dev/procreport was closed after
      the first read

(Note that running cat /dev/procreport will be similar except read
will be called a second time, returning 0.)
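
The offset logic behind this can be modelled in user space.  This is
only a sketch of the pattern, not the module's code: model_read() is a
hypothetical name, and the real procreport_read() also builds the
message and copies it with put_user().

```c
#include <string.h>
#include <sys/types.h>

/*
 * User-space model of an offset-aware read handler (not the module's code).
 * On the first call (*offset == 0) it copies the message and advances the
 * offset; on any later call it returns 0, which readers treat as end of file.
 */
ssize_t model_read(char *buf, size_t len, off_t *offset, const char *message)
{
        size_t msglen = strlen(message);

        if (*offset > 0)
                return 0;               /* already reported: signal EOF */
        if (msglen > len)
                msglen = len;           /* never write past the caller's buffer */

        memcpy(buf, message, msglen);
        *offset = (off_t) msglen;
        return (ssize_t) msglen;
}
```

A reader such as cat keeps calling read until it gets 0, which is why it
produces two read calls, while dd with count=1 stops after the first.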


4. [2] If you run the command <tt>dd if=/dev/procreport seek=1 bs=1024
   count=1</tt>, what happens?  Does <tt>bpftrace
   watch_procreport.bt</tt> give you any insight as to what is
   happening?

A: When we run this, the command hangs and never returns (until
interrupted).  When we run the bpftrace script we see the following
output:

open called.

But that's it.  We don't see any read, which makes it clear that
procreport_read() isn't being called.  A first guess is that dd
attempts an llseek after the open and hangs because our driver doesn't
support it.

If we look at strace, however, we see that the situation is more
complex.  We get the following system calls:

openat(AT_FDCWD, "/dev/procreport", O_RDONLY) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
... (calls accessing locale files)
ioctl(1, MTIOCGET, 0x7ffd9dffdaa0)      = -1 ENOTTY (Inappropriate ioctl for device)
lseek(1, 1024, SEEK_CUR)                = -1 ESPIPE (Illegal seek)
ioctl(1, MTIOCGET, 0x7ffd9dffdaa0)      = -1 ENOTTY (Inappropriate ioctl for device)
lseek(1, 0, SEEK_END)                   = -1 ESPIPE (Illegal seek)
read(1,

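So dd tries lseek's and ioctl's, gets errors, and then finally tries to
do a read.  Look at the file descriptors, though.  The device is
descriptor 0 (after the dup2), and only the first lseek touches it;
that call fails with ESPIPE because the driver doesn't support seeking.
Every later call, including the read, is on descriptor 1, standard
output.  This is because dd's seek= operand skips blocks in the
*output* file (skip= is the one that applies to the input).  When dd
can't lseek on standard output it falls back to reading from it and,
assuming standard output is a terminal, that read blocks waiting for
input.  That is why procreport_read() is never called and why the
command appears to hang.

(Personally I had expected read to be called with a non-zero offset and
then immediately return.  That would require the offset to apply to the
input, which is not what seek= does.  Operating systems are
complicated!)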
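
The ESPIPE errors in the strace output can be reproduced in user space
on any descriptor that doesn't support seeking, such as a pipe.  This is
a small sketch; seek_error() is a hypothetical helper name:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 0 if fd is seekable, otherwise the errno that lseek() reported. */
int seek_error(int fd)
{
        if (lseek(fd, 0, SEEK_CUR) == (off_t) -1)
                return errno;
        return 0;
}
```

Calling seek_error() on the read end of a pipe yields ESPIPE, the same
error dd received from /dev/procreport and from standard output.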


5. [2] What is the purpose of line 127, a call to put_user()?  Why
   can't this line be replaced with a simple array assignment?

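A: This code copies the message to buf safely, one character at a
time.  We have to use put_user() rather than a simple array assignment
because buf is a userspace pointer, and the kernel can't write to
userspace without taking special actions: put_user() checks that the
address really is in userspace, temporarily enables kernel access to
user pages on CPUs that otherwise forbid it, and returns an error
instead of crashing if the write faults.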

In fact, if we replace this line with the following two lines:

                buf[i] = message[i];
                i++;

any process doing a read on the device is instantly killed and we get
a big kernel error report that starts with lines like the following:

Nov 25 22:41:34 comp3000 kernel: [208214.553244] BUG: unable to handle
  page fault for address: 00007f9ebe267000
Nov 25 22:41:34 comp3000 kernel: [208214.553268] #PF: supervisor write
  access in kernel mode
Nov 25 22:41:34 comp3000 kernel: [208214.553277] #PF: error_code(0x0003) -
  permissions violation
Nov 25 22:41:34 comp3000 kernel: [208214.553285] PGD 80000000208e6067
  P4D 80000000208e6067 PUD 2243e067 PMD 225ff067 PTE 8000000027fb2867
Nov 25 22:41:34 comp3000 kernel: [208214.553324] Oops: 0003 [#1] SMP PTI
Nov 25 22:41:34 comp3000 kernel: [208214.553338] CPU: 1 PID: 6450 Comm: cat
  Tainted: G           OE     5.15.0-53-generic #59-Ubuntu
Nov 25 22:41:34 comp3000 kernel: [208214.553350] Hardware name: OpenStack
  Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1 04/01/2014
Nov 25 22:41:34 comp3000 kernel: [208214.553356] RIP:
  0010:procreport_read+0xd3/0x110 [c3000procreport]

Notice this refers to our procreport_read and subsequently even
includes the machine code of the function, backtrace, and the state of
CPU registers.

So yeah, we need to call put_user() or similar before accessing a
userspace buffer from kernel space.
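
The error_code(0x0003) in the log can be decoded by hand.  The low three
bits of an x86 page-fault error code are defined by the hardware; the
struct and function names below are illustrative only.

```c
#include <stdbool.h>

/* x86 page-fault error code bits (hardware-defined). */
#define PF_PROT  0x1u   /* 1: protection violation, 0: page not present */
#define PF_WRITE 0x2u   /* 1: write access, 0: read access */
#define PF_USER  0x4u   /* 1: fault raised in user mode, 0: supervisor mode */

struct pf_info {
        bool protection_violation;
        bool write;
        bool user_mode;
};

/* Split the low bits of a page-fault error code into named flags. */
struct pf_info decode_pf(unsigned int error_code)
{
        struct pf_info info = {
                .protection_violation = (error_code & PF_PROT) != 0,
                .write = (error_code & PF_WRITE) != 0,
                .user_mode = (error_code & PF_USER) != 0,
        };
        return info;
}
```

For 0x0003 this gives a protection violation on a write in supervisor
mode, matching the "supervisor write access in kernel mode" and
"permissions violation" lines: the page was mapped, but the kernel was
not allowed to write to it directly.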


6. [1] What is the purpose of line 111, a call to put_user()?  Remove
   this line and report how the behavior of the module changes.

A: On its own, this line seems to do nothing, as it is writing to the
buffer before we have any valid data, and the data written, the
character '0', is later overwritten.  However, if we omit this line,
the physical address is always reported as 0, so this line is
apparently essential.  We also get an error in the kernel logs saying
"Invalid pte for address 0x00007f0c15a16000".

(I hypothesize that this line is necessary to update the kernel's
virtual to physical mappings for the process.  Before the buffer has
been accessed, this virtual address corresponds to no physical address
at all, at least in the kernel's current page table.  But once it is
actually accessed, then the virtual to physical mappings are updated
and so procreport_get_physical() can return something useful.)
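
A related effect can be observed from user space.  This is only a
sketch in support of the hypothesis above, not a test of the module: it
uses the Linux-specific mincore() call to ask whether a page is
resident, and page_resident() is a hypothetical helper name.  On Linux
one would typically expect a freshly mmap'ed anonymous page to be
reported as not resident until it is first written.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return 1 if the page containing addr is resident in memory, 0 if not,
 * and -1 on error.  Linux-specific: relies on mincore(2).
 */
int page_resident(void *addr)
{
        unsigned char vec = 0;
        long page_size = sysconf(_SC_PAGESIZE);
        uintptr_t start;

        if (page_size <= 0)
                return -1;
        /* mincore() requires a page-aligned start address. */
        start = (uintptr_t) addr & ~((uintptr_t) page_size - 1);
        if (mincore((void *) start, (size_t) page_size, &vec) != 0)
                return -1;
        return vec & 1;
}
```

Writing a single byte to the page, as the put_user() call on line 111
does to buf, is enough for page_resident() to report 1 afterwards.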


7. [2] What is the purpose of the calls to get_zeroed_page() (line
   104) and free_page() (line 132)?  What happens if either is
   missing?

A: The purpose of these functions is to allocate and then free kernel
memory for message.  Without the allocation we don't know what message
will be pointing to; without free'ing, we have a memory leak.

Note that these routines allocate exactly one page of memory.  Single
page allocations are efficient in the kernel because that's the unit
by which most memory is managed.  (There are other allocators for
smaller amounts of memory, but they are specialized to repeated
allocations of specific types of data structures.)

If we just remove the call to get_zeroed_page(), we'll get an error
from cat saying "/dev/procreport: Cannot allocate memory".

If we remove the call to get_zeroed_page() and the subsequent check for
NULL, then when we read from /dev/procreport the process is killed and
we get a kernel Oops message saying "BUG: kernel NULL pointer
dereference, address: 0000000000000000" along with a whole bunch of
debugging info.

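(So it appears that message ends up NULL even though it is a local
variable on the stack.  C doesn't guarantee this for an uninitialized
local, so either the declaration initializes it or the kernel was built
with an option that zeroes stack variables.)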

If we omit the free_page() call we don't see anything obvious change,
but that is to be expected as we've introduced a small memory leak.
However, this can add up.  If you look at /proc/meminfo it reports
"MemFree".  This value keeps decreasing if you keep accessing
/dev/procreport:

mem1:MemFree:         2182788 kB
mem2:MemFree:         2175684 kB
mem3:MemFree:         2171716 kB
mem4:MemFree:         2158392 kB
mem5:MemFree:         2163604 kB

From the first to the last, we've lost almost 19 megabytes of memory
(2182788 kB - 2163604 kB = 19184 kB).  Between each of these checks
there were 1000 "cat /dev/procreport"'s run.  A page here or there
isn't much but it can add up!  And we'll only get this memory back
after we reboot.
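
To turn the MemFree samples into a page count, the lines can be parsed
and subtracted.  A small sketch, assuming 4 kB pages; both function
names are made up for illustration:

```c
#include <stdio.h>

/* Parse a "MemFree:   N kB" line; return N, or -1 if the line doesn't match. */
long parse_memfree_kb(const char *line)
{
        long kb;

        if (sscanf(line, "MemFree: %ld kB", &kb) != 1)
                return -1;
        return kb;
}

/* Pages lost between two samples, assuming 4 kB pages. */
long leaked_pages(long before_kb, long after_kb)
{
        return (before_kb - after_kb) / 4;
}
```

For the first and last samples this gives 4796 pages, the same order of
magnitude as the 4000 accesses made between them at one leaked page
each.  The remainder is ordinary fluctuation in free memory, which is
also why mem4 is lower than mem5.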


8. [4] Change c3000procreport so that it reports not just the calling
   process but also lists all of the process's ancestors until there
   are no more to report.  You should support up to 10 levels of
   ancestor processes.  Each report should be on a line saying "PID's
   parent is PID", where the PID's are process IDs.

A: First, add the following declarations to the top of
procreport_read() (around line 99) so we have a variable to hold the
parent's pid and both the current and parent task_struct's:

        pid_t parent_pid;
        struct task_struct *this_process, *parent_process;

We then add the following code between lines 119 and 120 (just after
the call to strlen):

        /* Walk up the process tree, at most 10 levels. */
        this_process = current;
        for (i = 0; i < 10; i++) {
                /* real_parent is RCU-protected, as in getppid(). */
                rcu_read_lock();
                parent_process = rcu_dereference(this_process->real_parent);
                parent_pid = task_tgid_vnr(parent_process);
                rcu_read_unlock();

                msglen += snprintf(message + msglen, PAGE_SIZE - msglen,
                                   "%d's parent is %d.\n",
                                   thepid, parent_pid);

                thepid = parent_pid;
                this_process = parent_process;

                /* PID 1 (init) is the root of the tree; stop there. */
                if (thepid == 1) {
                        break;
                }
        }

The code for getting the parent PID is from the code for the getppid
system call in kernel/sys.c.

Note this code should be inserted before the msglen++ as that makes
sure the terminating null character gets copied.
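
The same walk can be tried out in user space with a simplified stand-in
for task_struct.  This is a model of the loop, not kernel code:
struct model_task and report_ancestors() are invented for illustration.
Unlike the kernel snippet, it checks snprintf()'s return value, which is
the length it wanted to write rather than the length it wrote.

```c
#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's task_struct (illustration only). */
struct model_task {
        int pid;
        const struct model_task *parent;
};

/*
 * Append one "X's parent is Y." line per ancestor, up to max_levels,
 * stopping at PID 1.  Returns the number of bytes written (excluding NUL).
 */
size_t report_ancestors(const struct model_task *task, int max_levels,
                        char *buf, size_t size)
{
        size_t used = 0;
        int i;

        if (size == 0)
                return 0;
        buf[0] = '\0';

        for (i = 0; i < max_levels && task->parent != NULL; i++) {
                int n = snprintf(buf + used, size - used, "%d's parent is %d.\n",
                                 task->pid, task->parent->pid);

                /* On truncation, drop the partial line and stop. */
                if (n < 0 || (size_t) n >= size - used) {
                        buf[used] = '\0';
                        break;
                }
                used += (size_t) n;

                task = task->parent;
                if (task->pid == 1)
                        break;
        }
        return used;
}
```

For a chain 5580 -> 100 -> 1 this produces two lines, "5580's parent is
100." and "100's parent is 1.".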


9. [4] Change c3000procreport so that it reports the addresses of all
   the intermediate lookups made when determining the physical address
   corresponding to the read buffer's virtual address.
   * Make sure you do so by changing how procreport_get_physical()
     works (rather than just expanding procreport_read()).  How did
     you change the interface to procreport_get_physical()?
   * Are the addresses being reported virtual or physical addresses?
     How do you know?

A: First, we change the declaration of procreport_get_physical() so it
takes a pointer to an array of unsigned long values.  This is where
we'll return the lookup path:

  unsigned long procreport_get_physical(unsigned long addr,
                                        unsigned long *address_lookup_path)

Next we define a counter:

  int i = 0;

And then after each _offset call, we add the value to the path and
increment i.  We terminate the array with a zero value so the caller
can walk it without knowing how many levels of lookup there are.  (We
assume the maximum is 10.)  For example, after assigning and checking
pgd on lines 49-53, we'd add this:

  address_lookup_path[i++] = (unsigned long) pgd;

Adding this throughout the function, we get the following modified
version of procreport_get_physical:

unsigned long procreport_get_physical(unsigned long addr,
                                      unsigned long *address_lookup_path)
{
        pgd_t *pgd;
        p4d_t *p4d;
        pud_t *pud;
        pmd_t *pmd;
        pte_t *pte;

        unsigned long pfn = 0;
        unsigned long phys = 0;

        int i = 0;

        pgd = pgd_offset(current->mm, addr);
        if (!pgd || pgd_none(*pgd) || pgd_bad(*pgd)) {
                err("Invalid pgd for address 0x%016lx\n", addr);
                return phys;
        }
        address_lookup_path[i++] = (unsigned long) pgd;

        p4d = p4d_offset(pgd, addr);
        if (!p4d || p4d_none(*p4d) || p4d_bad(*p4d)) {
                err("Invalid p4d for address 0x%016lx\n", addr);
                return phys;
        }
        address_lookup_path[i++] = (unsigned long) p4d;

        pud = pud_offset(p4d, addr);
        if (!pud || pud_none(*pud) || pud_bad(*pud)) {
                err("Invalid pud for address 0x%016lx\n", addr);
                return phys;
        }
        address_lookup_path[i++] = (unsigned long) pud;

        pmd = pmd_offset(pud, addr);
        if (!pmd || pmd_none(*pmd) || pmd_bad(*pmd)) {
                err("Invalid pmd for address 0x%016lx\n", addr);
                return phys;
        }
        address_lookup_path[i++] = (unsigned long) pmd;

        pte = pte_offset_map(pmd, addr);
        if (!pte || pte_none(*pte)) {
                err("Invalid pte for address 0x%016lx\n", addr);
                return phys;
        }
        address_lookup_path[i++] = (unsigned long) pte;
        pfn = pte_pfn(*pte);
        phys = (pfn << PAGE_SHIFT) + (addr % PAGE_SIZE);

        address_lookup_path[i] = 0;

        return phys;
}
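
Each _offset call above selects one entry in a table using a slice of
the virtual address.  The slicing itself can be sketched in user space,
assuming x86-64 4-level paging with 4 KiB pages and 9 index bits per
level (with 4 levels the p4d is folded into the pgd).  The names here
are illustrative, not kernel API:

```c
#include <stdint.h>

/* Assumption: x86-64 4-level paging, 4 KiB pages, 9 index bits per level. */
enum {
        EX_PTE_SHIFT = 12,
        EX_PMD_SHIFT = 21,
        EX_PUD_SHIFT = 30,
        EX_PGD_SHIFT = 39
};

struct walk_indices {
        unsigned int pgd, pud, pmd, pte;  /* index into each table level */
        unsigned int offset;              /* byte offset within the page */
};

/* Split a virtual address into its per-level table indices. */
struct walk_indices split_vaddr(uint64_t vaddr)
{
        struct walk_indices w = {
                .pgd = (unsigned int) ((vaddr >> EX_PGD_SHIFT) & 0x1ff),
                .pud = (unsigned int) ((vaddr >> EX_PUD_SHIFT) & 0x1ff),
                .pmd = (unsigned int) ((vaddr >> EX_PMD_SHIFT) & 0x1ff),
                .pte = (unsigned int) ((vaddr >> EX_PTE_SHIFT) & 0x1ff),
                .offset = (unsigned int) (vaddr & 0xfff),
        };
        return w;
}
```

For the buffer address from question 2, 0x00007f252854e000, the indices
are 254, 148, 322 and 334 with an offset of 0.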

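Now, to use this in procreport_read(), we first define the array at the
top.  We zero it so that the path is still terminated if
procreport_get_physical() returns early on an error:

        unsigned long address_lookup_path[10] = { 0 };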

We add this array to the call to procreport_get_physical:

        buf_phys = procreport_get_physical((unsigned long) buf,
                                           address_lookup_path);

And then, after the call to strlen, we add the following:

        msglen += snprintf(message + msglen, PAGE_SIZE - msglen,
                           "Address Lookup Path: ");
        for (i = 0; i < 10; i++) {
                if (address_lookup_path[i] == 0) {
                        break;
                }
                msglen += snprintf(message + msglen, PAGE_SIZE - msglen,
                                   "0x%016lx ", address_lookup_path[i]);
        }

        msglen += snprintf(message + msglen, PAGE_SIZE - msglen, "\n");

This will print out a line that looks something like the following:

Address Lookup Path: 0xffff9df68ad707f0 0xffff9df68ad707f0
0xffff9df681287620 0xffff9df6844d06c0 0xffff9df684408690

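These addresses must be virtual.  Our virtual machine only has 4G of
RAM, so its physical addresses need little more than 32 bits (compare
the physical address reported in question 2), whereas these values use
a full 64 bits with the high bits mostly being set.  They are also the
pointers that the kernel itself dereferences while walking the page
table, and kernel code can only dereference virtual addresses.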
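
The "high bits set" test can be written down directly.  This is a
sketch assuming x86-64 with 48-bit virtual addresses, where canonical
addresses with bits 47 through 63 all set belong to the upper (kernel)
half of the address space; in_kernel_half() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: x86-64 with 48-bit virtual addresses.  Canonical addresses
 * with bits 47..63 all set lie in the upper (kernel) half.
 */
bool in_kernel_half(uint64_t vaddr)
{
        return (vaddr >> 47) == 0x1ffff;
}
```

The first lookup address above, 0xffff9df68ad707f0, passes this check,
while the user buffer address from question 2, 0x00007f252854e000, does
not.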