COMP 3000 2022F Assignment 4 Solutions

1. [1] Download, build, and install the c3000procreport module. What is reported in the kernel log when the module is installed?

A: You should see something like this:

    Nov 25 00:28:38 comp3000 kernel: [128237.983052] c3000procreport: loading out-of-tree module taints kernel.
    Nov 25 00:28:38 comp3000 kernel: [128237.983264] c3000procreport: module verification failed: signature and/or required key missing - tainting kernel
    Nov 25 00:28:38 comp3000 kernel: [128237.990627] comp3000: procreport device registered using major 236.

The first two messages report that we are loading a module the kernel doesn't fully trust, and each one marks ("taints") the kernel. "Out-of-tree" means the module wasn't built as part of the kernel's own source tree; the verification failure means the module isn't signed with a key the kernel trusts. These are two separate conditions with separate taint flags (O and E), which is why the oops report in question 5 shows "Tainted: G OE". The third line comes from the module itself, specifically from the info() call at the end of procreport_init(). (You just need to report on the log messages; you don't need an explanation.)

2. [2] When you run cat /dev/procreport, what does it report? Give sample output and explain what each part means briefly.

A: You should get output like this:

    Your PID is 5580!
    Buffer at 0x00007f252854e000 virtual is at 0x00000001165d6000 physical.

The first line gives the process ID of the process that accessed the device; in this case, the process running cat. The second line gives the virtual and physical addresses of the buffer passed to procreport_read(). In other words, the first value is buf as it was passed to procreport_read(), and the second is that value translated to a physical address by procreport_get_physical().

3. [2] Run bpftrace watch_procreport.bt in one terminal and then run dd if=/dev/procreport bs=1024 count=1. What does the bpftrace script report? Why are we getting this output? Explain what you see.

A: You should get output like this:

    Attaching 6 probes...
    open called.
    read called.
    get_physical called.
    read returned 90.
    release called.

The first line is from bpftrace, with the 6 referring to the six probes defined in watch_procreport.bt. Then:

  * open called -> procreport_open() was called: dd opened /dev/procreport.
  * read called -> procreport_read() was called: dd read from /dev/procreport.
  * get_physical called -> procreport_get_physical() was called: because this is the first read, the offset is 0, so we do the main logic of procreport_read(), which calls this function.
  * read returned 90 -> procreport_read() returned 90: procreport_read() returns the number of characters in the message string that was copied to the buffer.
  * release called -> procreport_release() was called: we only told dd to read once, so /dev/procreport was closed after the first read.

(Note that running cat /dev/procreport will be similar, except read will be called a second time, returning 0.)

4. [2] If you run the command dd if=/dev/procreport seek=1 bs=1024 count=1, what happens? Does bpftrace watch_procreport.bt give you any insight as to what is happening?

A: When we run this, the command hangs and never returns (until interrupted). When we run the bpftrace script, we see the following output:

    open called.

But that's it. We don't see any read, which makes it clear that procreport_read() isn't being called. If we look at strace, we see that the situation is more complex. We get the following system calls:

    openat(AT_FDCWD, "/dev/procreport", O_RDONLY) = 3
    dup2(3, 0) = 0
    close(3) = 0
    lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
    ...
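    (calls accessing locale files)
    ioctl(1, MTIOCGET, 0x7ffd9dffdaa0) = -1 ENOTTY (Inappropriate ioctl for device)
    lseek(1, 1024, SEEK_CUR) = -1 ESPIPE (Illegal seek)
    ioctl(1, MTIOCGET, 0x7ffd9dffdaa0) = -1 ENOTTY (Inappropriate ioctl for device)
    lseek(1, 0, SEEK_END) = -1 ESPIPE (Illegal seek)
    read(1,

So dd first tries lseek() on its input (file descriptor 0, /dev/procreport), which fails with ESPIPE because our driver doesn't support seeking. The remaining lseek's and ioctl's, however, are on file descriptor 1, dd's output, which here is the terminal. seek= applies to the output file, so dd tries to skip 1024 bytes of output: it checks whether the output is a tape drive (MTIOCGET), tries to seek, and when both fail it falls back to skipping the block by reading it from the output. That final read(1, ... is therefore a read from the terminal, and it blocks waiting for keyboard input. dd never gets as far as reading from /dev/procreport, which is why procreport_read() is never called. (I had initially expected the read to reach the driver with a non-zero offset and return immediately; that doesn't happen because the seek applies to dd's output, not its input. Operating systems are complicated!)

5. [2] What is the purpose of line 127, a call to put_user()? Why can't this line be replaced with a simple array assignment?

A: This code copies the message to buf safely, one character at a time. We have to use put_user() rather than a simple array assignment because the kernel can't safely write to userspace memory directly: the user address might be invalid, unmapped, or read-only, and on CPUs with SMAP (Supervisor Mode Access Prevention) the kernel faults on any access to user pages unless it explicitly enables such access. put_user() checks that the address is a user address, enables user access for the duration of the write, and turns a fault into an -EFAULT return rather than a kernel oops.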
In fact, if we replace line 127 with the following two lines:

    buf[i] = message[i];
    i++;

any process doing a read on the device is instantly killed and we get a big kernel error report that starts with lines like the following:

    Nov 25 22:41:34 comp3000 kernel: [208214.553244] BUG: unable to handle page fault for address: 00007f9ebe267000
    Nov 25 22:41:34 comp3000 kernel: [208214.553268] #PF: supervisor write access in kernel mode
    Nov 25 22:41:34 comp3000 kernel: [208214.553277] #PF: error_code(0x0003) - permissions violation
    Nov 25 22:41:34 comp3000 kernel: [208214.553285] PGD 80000000208e6067 P4D 80000000208e6067 PUD 2243e067 PMD 225ff067 PTE 8000000027fb2867
    Nov 25 22:41:34 comp3000 kernel: [208214.553324] Oops: 0003 [#1] SMP PTI
    Nov 25 22:41:34 comp3000 kernel: [208214.553338] CPU: 1 PID: 6450 Comm: cat Tainted: G OE 5.15.0-53-generic #59-Ubuntu
    Nov 25 22:41:34 comp3000 kernel: [208214.553350] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1 04/01/2014
    Nov 25 22:41:34 comp3000 kernel: [208214.553356] RIP: 0010:procreport_read+0xd3/0x110 [c3000procreport]

Notice that this refers to our procreport_read(), and the full report goes on to include the machine code of the function, a backtrace, and the state of the CPU registers. Also notice the error code: the page was present and writable (the PTE is valid), yet the write was still a "permissions violation", because the kernel tried to write to a user page directly. So we need to use put_user() (or a similar helper such as copy_to_user()) whenever we access a userspace buffer from kernel space.

6. [1] What is the purpose of line 111, a call to put_user()? Remove this line and report how the behavior of the module changes.

A: On its own, this line seems to do nothing: it writes to the buffer before we have any valid data, and the character it writes, '0', is later overwritten. However, if we omit this line, the physical address is always reported as 0, so the line is apparently essential. We also get an error in the kernel logs saying "Invalid pte for address 0x00007f0c15a16000". (I hypothesize that this line is necessary to update the process's virtual to physical mappings.
Before the buffer has been accessed, this virtual address corresponds to no physical address at all in the process's page table: Linux allocates pages on demand, so the page table entry isn't filled in until the page is first touched. Once put_user() writes to it, the resulting page fault makes the kernel allocate a page and fill in the mapping, so procreport_get_physical() can return something useful.)

7. [2] What is the purpose of the calls to get_zeroed_page() (line 104) and free_page() (line 132)? What happens if either is missing?

A: The purpose of these functions is to allocate and then free kernel memory for message. Without the allocation, we don't know what message will be pointing to; without freeing, we have a memory leak.

Note that these routines allocate exactly one page of memory. Single-page allocations are efficient in the kernel because the page is the unit by which most memory is managed. (For smaller amounts there is kmalloc(), and there are slab caches specialized for repeated allocations of specific types of data structures.)

If we just remove the call to get_zeroed_page(), we'll get an error from cat saying "/dev/procreport: Cannot allocate memory". If we remove the call to get_zeroed_page() and the subsequent check for NULL, then when we read from /dev/procreport the process is killed and we get a kernel oops message saying "BUG: kernel NULL pointer dereference, address: 0000000000000000" along with a whole bunch of debugging info. (So message happens to be NULL here even though it is allocated on the stack. C doesn't guarantee that local variables start out as NULL, so this comes from an explicit initialization in the code, a kernel hardening option that zeroes stack variables (such as CONFIG_INIT_STACK_ALL_ZERO), or plain luck; it isn't something to rely on.)

If we omit the free_page() call, nothing obvious changes, but that is to be expected, as we've only introduced a small memory leak. However, this can add up. /proc/meminfo reports "MemFree", and this value trends downward as you keep accessing /dev/procreport:

    mem1:MemFree: 2182788 kB
    mem2:MemFree: 2175684 kB
    mem3:MemFree: 2171716 kB
    mem4:MemFree: 2158392 kB
    mem5:MemFree: 2163604 kB

From the first sample to the last, we've lost 19,184 kB, about 18.7 MiB of memory.
Between each of these checks, we ran "cat /dev/procreport" 1000 times. A page here or there isn't much, but it can add up! And we'll only get this memory back after we reboot.

8. [4] Change c3000procreport so that it reports not just the calling process but also lists all of the process's ancestors until there are no more to report. You should support up to 10 levels of ancestor processes. Each report should be on a line saying "PID's parent is PID", where the PIDs are process IDs.

A: First, add the following declarations to the top of procreport_read() (around line 99), so we have a variable to hold the parent's PID and variables for both the current and parent task_structs:

    pid_t parent_pid;
    struct task_struct *this_process, *parent_process;

We then add the following code between lines 119 and 120 (just after the call to strlen):

    this_process = current;
    /* Hold the RCU read lock for the whole walk, since we keep using
       each parent task_struct after looking it up. */
    rcu_read_lock();
    for (i = 0; i < 10; i++) {
            parent_process = rcu_dereference(this_process->real_parent);
            parent_pid = task_tgid_vnr(parent_process);
            msglen += snprintf(message + msglen, PAGE_SIZE - msglen,
                               "%d's parent is %d.\n", thepid, parent_pid);
            thepid = parent_pid;
            this_process = parent_process;
            /* Stop at init (PID 1); 0 means the parent isn't visible
               in our PID namespace. */
            if (thepid <= 1) {
                    break;
            }
    }
    rcu_read_unlock();

The code for getting the parent PID is adapted from the getppid system call in kernel/sys.c. Note that this code should be inserted before the msglen++, as that makes sure the terminating null character gets copied.

9. [4] Change c3000procreport so that it reports the addresses of all the intermediate lookups made when determining the physical address corresponding to the read buffer's virtual address.
  * Make sure you do so by changing how procreport_get_physical() works (rather than just expanding procreport_read()). How did you change the interface to procreport_get_physical()?
  * Are the addresses being reported virtual or physical addresses? How do you know?

A: First, we change the declaration of procreport_get_physical() so it takes a pointer to an array of unsigned long values.
This is where we'll return the lookup path:

    unsigned long procreport_get_physical(unsigned long addr, unsigned long *address_lookup_path)

Next we define a counter:

    int i = 0;

Then, after each _offset call, we add the returned pointer to the path and increment i. We terminate the array with a zero value, which makes it easy to access and means the caller doesn't need to know how many levels of lookup there are. (We assume the maximum is 10.) For example, after assigning and checking pgd on lines 49-53, we'd add this:

    address_lookup_path[i++] = (unsigned long) pgd;

Adding this throughout the function, we get the following modified version of procreport_get_physical():

    unsigned long procreport_get_physical(unsigned long addr, unsigned long *address_lookup_path)
    {
            pgd_t *pgd;
            p4d_t *p4d;
            pud_t *pud;
            pmd_t *pmd;
            pte_t *pte;
            unsigned long pfn = 0;
            unsigned long phys = 0;
            int i = 0;

            pgd = pgd_offset(current->mm, addr);
            if (!pgd || pgd_none(*pgd) || pgd_bad(*pgd)) {
                    err("Invalid pgd for address 0x%016lx\n", addr);
                    return phys;
            }
            address_lookup_path[i++] = (unsigned long) pgd;

            p4d = p4d_offset(pgd, addr);
            if (!p4d || p4d_none(*p4d) || p4d_bad(*p4d)) {
                    err("Invalid p4d for address 0x%016lx\n", addr);
                    return phys;
            }
            address_lookup_path[i++] = (unsigned long) p4d;

            pud = pud_offset(p4d, addr);
            if (!pud || pud_none(*pud) || pud_bad(*pud)) {
                    err("Invalid pud for address 0x%016lx\n", addr);
                    return phys;
            }
            address_lookup_path[i++] = (unsigned long) pud;

            pmd = pmd_offset(pud, addr);
            if (!pmd || pmd_none(*pmd) || pmd_bad(*pmd)) {
                    err("Invalid pmd for address 0x%016lx\n", addr);
                    return phys;
            }
            address_lookup_path[i++] = (unsigned long) pmd;

            pte = pte_offset_map(pmd, addr);
            if (!pte || pte_none(*pte)) {
                    err("Invalid pte for address 0x%016lx\n", addr);
                    return phys;
            }
            address_lookup_path[i++] = (unsigned long) pte;

            pfn = pte_pfn(*pte);
            phys = (pfn << PAGE_SHIFT) + (addr % PAGE_SIZE);
            address_lookup_path[i] = 0;

            return phys;
    }

Note that if one of the lookups fails, the function returns early without writing the terminating zero, so the caller should zero the array before calling it.

Now, to use this in procreport_read(), we first define the array at the top, zero-initialized so that it is terminated even if a lookup fails partway:

    unsigned long address_lookup_path[10] = {0};

We add this array to the call to procreport_get_physical():

    buf_phys = procreport_get_physical((unsigned long) buf, address_lookup_path);

And then, after the call to strlen, we add the following:

    msglen += snprintf(message + msglen, PAGE_SIZE - msglen, "Address Lookup Path: ");
    for (i = 0; i < 10; i++) {
            if (address_lookup_path[i] == 0) {
                    break;
            }
            msglen += snprintf(message + msglen, PAGE_SIZE - msglen,
                               "0x%016lx ", address_lookup_path[i]);
    }
    msglen += snprintf(message + msglen, PAGE_SIZE - msglen, "\n");

This will print out a line that looks something like the following:

    Address Lookup Path: 0xffff9df68ad707f0 0xffff9df68ad707f0 0xffff9df681287620 0xffff9df6844d06c0 0xffff9df684408690

These addresses must be virtual. Our virtual machine only has 4G of RAM, so its physical addresses are small: the buffer in question 2, for example, was at physical address 0x00000001165d6000, a 33-bit value. These values, in contrast, use the full 64 bits with the high bits set; 0xffff... lies in the upper half of the x86-64 virtual address space, which is reserved for the kernel. They also have to be kernel virtual addresses because the kernel dereferences each of these pointers to read the page table entries. (The first two are identical because this machine uses 4-level paging, so the p4d level is folded into the pgd and p4d_offset() simply returns the pgd pointer.)