Operating Systems 2020W: Tutorial 8
Introduction
WARNING: The commands and programs in this tutorial are potentially extremely dangerous and may result in crashes or loss of data. Additionally, questions may not work as expected on a machine other than the course VM. For that reason, you are strongly encouraged to do this tutorial on the provided OpenStack virtual machine.
In this tutorial we will be examining the physical memory mapping of processes with the help of a kernel module that performs a 5-level page table walk for userspace addresses. You may wish to read about 5-level paging.
To get started, we will first examine the source code for 3000physicalview.c and 3000memview2.c, both of which are in 3000physicalview.tar.gz.
Getting Started
- Compile 3000physicalview and 3000memview2 using the provided Makefile (i.e. by running make).
- Insert 3000physicalview by running make insert. Confirm that the module is inserted using lsmod.
- Examine the calls to copy_from_user and copy_to_user on lines 120 and 132 of 3000physicalview.c. Consider the following:
- How do these functions differ from put_user, which we saw in the previous tutorial?
- Why are these functions necessary? Couldn't we just access the userspace address directly? What would happen if we did?
- 3000physicalview exposes its API to userspace in the form of an ioctl(2) call. Consider the following:
- What is an ioctl? How is it different from a read or write system call? Hint: check man 2 ioctl.
- How does 3000physicalview implement its ioctl? What arguments does it take?
- How does 3000memview2 call the ioctl? What arguments does it pass to the ioctl?
Examining Physical Memory Mappings
- With 3000physicalview inserted, run 3000memview2 and examine the output. Note that it presents virtual memory addresses on the left, and physical addresses on the right. Are these mappings consistent with what you expected?
- Compare 3000memview2 with 3000memview from Tutorial 2. What is similar about their code, and what is different? How similar is their output?
- Do you notice a pattern in the virtual addresses of buf[i]? Is this same pattern present in the physical addresses? Why or why not?
- Run 3000memview2 a few more times and consider the following:
- Are the virtual addresses the same or different between runs? How about physical addresses?
- Some physical addresses don't seem to be changing between runs. Which ones? Why do you think this might be the case?
- Spawn a root shell with sudo su and force the kernel to drop its page cache (along with cached dentries and inodes) using sync && echo 3 > /proc/sys/vm/drop_caches. Run 3000memview2 one more time and note that the physical addresses that previously stayed the same have now changed. What do you think just happened?
Revisiting Trace
- Modify 3000memview2.c by adding a call to sleep(10); at the beginning of main. This will give you a chance to run trace attached to its pid. Compile with the Makefile as before. For the following questions, run your new 3000memview2 in one terminal, and the trace command in another.
- Run trace -p `pidof 3000memview2` -K 'p::_copy_to_user' to get the kernel stack trace every time the module invokes copy_to_user. How does this differ from the stack trace for put_user from last tutorial?
- Run trace -p `pidof 3000memview2` -K 'p::_copy_from_user' to get the kernel stack trace every time the module invokes copy_from_user. Compare this stack trace with that of the previous question.
- Now let's trace various kernel memory allocations outside of our module. Run the following trace commands:
- trace -M 100 -K 't:kmem:kmalloc printf "allocated %d bytes at address 0x%llx" args->bytes_alloc, args->ptr' to trace the next 100 slab allocations and print the kernel stack. You may wish to pipe this output into less to read it more easily. What do you notice about the kernel's virtual addresses compared to what you have seen in userspace? Hint: Check the most significant hex digits.
- trace -M 100 -K 't:kmem:mm_page_alloc printf "allocated 2^%d pages at page frame number %lu" args->order, args->pfn' to trace the next 100 page allocations and print the kernel stack. You may wish to pipe this output into less to read it more easily. Based on what you can see, does page allocation seem to differ from slab allocation? How so?
Code
3000memview2.c
/* 3000memview2.c Userland demonstration of 3000physicalview ioctl interface
* Copyright (C) 2020 William Findlay
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define USERSPACE
#include "3000physicalview.h"
char *gmsg = "Global Message";
const int buffer_size = 30;
void report_memory(char *prefix, int fd, unsigned long virt)
{
struct physicalview_memory mem = {};
mem.virt = virt;
if (ioctl(fd, PHYSICALVIEW_WALK, (unsigned long)&mem))
{
fprintf(stderr, "Error making ioctl call\n");
return;
}
if (!mem.phys)
{
printf("%s 0x%016lx -> UNKNOWN\n", prefix, mem.virt);
return;
}
printf("%s 0x%016lx -> 0x%016lx\n", prefix, mem.virt, mem.phys);
}
int main(int argc, char *argv[], char *envp[])
{
char format[16];
char *lmsg = "Local Message";
char *buf[buffer_size];
int i;
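int fd = open("/dev/3000physicalview", O_RDONLY);
/* Bail out early if the module is not inserted or the device is missing */
if (fd < 0)
{
perror("open /dev/3000physicalview");
return 1;
}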
printf("Memory map report (virtual -> physical)\n");
report_memory("argv: ", fd, (unsigned long)argv);
report_memory("argv[0]: ", fd, (unsigned long)argv[0]);
report_memory("envp: ", fd, (unsigned long)envp);
report_memory("envp[0]: ", fd, (unsigned long)envp[0]);
report_memory("lmsg: ", fd, (unsigned long)lmsg);
report_memory("&lmsg: ", fd, (unsigned long)&lmsg);
report_memory("gmsg: ", fd, (unsigned long)gmsg);
report_memory("&gmsg: ", fd, (unsigned long)&gmsg);
report_memory("main: ", fd, (unsigned long)&main);
report_memory("&buf: ", fd, (unsigned long)&buf);
for (i = 0; i<buffer_size; i++) {
buf[i] = (char *) malloc(4096);
snprintf(format, 16, "buf[%02d]: ", i);
report_memory(format, fd, (unsigned long)buf[i]);
}
return 0;
}
3000physicalview.c
/* 3000physicalview.c A kernel module to expose virtual->physical memory mappings in userspace as an ioctl
* Copyright (C) 2020 William Findlay
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Warning: This module is extremely insecure.
* It is designed purely for teaching purposes.
* Using it is stupid unless you are in COMP3000. */
#include "3000physicalview.h"
static struct device *device = NULL;
static struct class *class = NULL;
static int major_number;
/* Helper functions below this line ---------------- */
/* Walk the page table of current task for virtual address addr */
static unsigned long get_physical(unsigned long addr)
{
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
unsigned long pfn = 0;
unsigned long phys = 0;
/* Find pgd */
pgd = pgd_offset(current->mm, addr);
if (!pgd || pgd_none(*pgd) || pgd_bad(*pgd))
{
printk(KERN_ERR "Invalid pgd for address 0x%016lx\n", addr);
return phys;
}
/* Find p4d */
p4d = p4d_offset(pgd, addr);
if (!p4d || p4d_none(*p4d) || p4d_bad(*p4d))
{
printk(KERN_ERR "Invalid p4d for address 0x%016lx\n", addr);
return phys;
}
/* Find pud */
pud = pud_offset(p4d, addr);
if (!pud || pud_none(*pud) || pud_bad(*pud))
{
printk(KERN_ERR "Invalid pud for address 0x%016lx\n", addr);
return phys;
}
/* Find pmd */
pmd = pmd_offset(pud, addr);
if (!pmd || pmd_none(*pmd) || pmd_bad(*pmd))
{
printk(KERN_ERR "Invalid pmd for address 0x%016lx\n", addr);
return phys;
}
/* Find pte */
pte = pte_offset_map(pmd, addr);
if (!pte || pte_none(*pte))
{
printk(KERN_ERR "Invalid pte for address 0x%016lx\n", addr);
return phys;
}
/* Get the page frame number from the pte; pte_pfn() masks and shifts for us */
pfn = pte_pfn(*pte);
phys = (pfn << PAGE_SHIFT) + (addr % PAGE_SIZE);
return phys;
}
/* Define file operations below this line ------------------- */
/* Callback to device open */
static int physicalview_open(struct inode * inode, struct file * file)
{
printk(KERN_INFO "3000physicalview open\n");
return 0;
}
/* Callback to device close */
static int physicalview_release(struct inode * inode, struct file * file)
{
printk(KERN_INFO "3000physicalview closed\n");
return 0;
}
/* Callback to device ioctl */
static long physicalview_ioctl(struct file *file, unsigned int cmd, unsigned long addr)
{
struct physicalview_memory *mem;
switch (cmd)
{
case PHYSICALVIEW_WALK:
/* Allocate kernel memory for our struct */
mem = kmalloc(sizeof(struct physicalview_memory), GFP_KERNEL);
if (!mem)
{
printk(KERN_ERR "Unable to allocate space for struct\n");
return -ENOMEM;
}
/* Get virt from userspace */
if (copy_from_user(mem, (struct physicalview_memory *)addr,
sizeof(struct physicalview_memory)))
{
printk(KERN_ERR "Unable to copy struct from user\n");
kfree(mem);
return -EFAULT;
}
/* Call helper to get physical mapping for virtual address */
mem->phys = get_physical(mem->virt);
/* Give phys back to userspace */
if (copy_to_user((struct physicalview_memory *)addr, mem,
sizeof(struct physicalview_memory)))
{
printk(KERN_ERR "Unable to copy struct to user\n");
kfree(mem);
return -EFAULT;
}
/* Cleanup, cleanup, everybody do their share */
kfree(mem);
break;
default:
return -EINVAL;
}
return 0;
}
/* Register file operations */
static struct file_operations fops = {
.open = physicalview_open,
.release = physicalview_release,
.unlocked_ioctl = physicalview_ioctl,
};
/* World readable and writable because... security? */
static char *physicalview_devnode(struct device *device, umode_t *mode)
{
if (mode)
*mode = 0666;
return NULL;
}
/* Entry and exit points below this line ------------------------------------ */
/* Module initialization */
int init_module(void)
{
printk(KERN_INFO "3000physicalview initializing\n");
/* Register character device */
major_number = register_chrdev(0, DEVICE_NAME, &fops);
if (major_number < 0)
{
goto failed_chrdevreg;
}
/* Create device class */
class = class_create(THIS_MODULE, CLASS_NAME);
if (IS_ERR(class))
{
goto failed_classreg;
}
/* Set devnode to set our "super secure" permissions from above */
class->devnode = physicalview_devnode;
/* Create device */
device = device_create(class, NULL, MKDEV(major_number, 0), NULL, DEVICE_NAME);
if (IS_ERR(device))
{
goto failed_devreg;
}
printk(KERN_INFO "3000physicalview is loaded and registered with major number %d!\n", major_number);
return 0;
/* NOTREACHED... */
/* Errors here */
failed_devreg:
printk(KERN_ERR "Failed to register 3000physicalview device!\n");
/* class_destroy() already unregisters the class, so call it alone */
class_destroy(class);
failed_classreg:
printk(KERN_ERR "Failed to register 3000physicalview class!\n");
unregister_chrdev(major_number, DEVICE_NAME);
failed_chrdevreg:
printk(KERN_ERR "Failed to register 3000physicalview as a character device!\n");
return -1;
}
/* Module destructor callback */
void cleanup_module(void)
{
/* Cleanup, cleanup, everybody do their share */
device_destroy(class, MKDEV(major_number, 0));
/* class_destroy() already unregisters the class, so call it alone */
class_destroy(class);
unregister_chrdev(major_number, DEVICE_NAME);
printk(KERN_INFO "3000physicalview cleanup complete\n");
}
3000physicalview.h
/* 3000physicalview.h A kernel module to expose virtual->physical memory mappings in userspace as an ioctl
* Copyright (C) 2020 William Findlay
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>. */
/* Warning: This module is extremely insecure.
* It is designed purely for teaching purposes.
* Using it is stupid unless you are in COMP3000. */
#ifndef PHYSICALVIEW_H
#define PHYSICALVIEW_H
#ifndef USERSPACE /* Kernelspace only */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/device.h>
#include <linux/string.h>
#include <linux/sched.h>
#include <linux/gfp.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <linux/uaccess.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
#include <asm/pgtable_types.h>
#define DEVICE_NAME "3000physicalview"
#define CLASS_NAME "COMP3000"
MODULE_DESCRIPTION("Walk page table for userspace virtual address.");
MODULE_AUTHOR("William Findlay");
MODULE_LICENSE("GPL");
MODULE_VERSION("0.0.1");
#else /* Userspace only */
#include <sys/ioctl.h>
#endif
/* Both userspace and kernelspace */
struct physicalview_memory
{
unsigned long virt;
unsigned long phys;
};
/* Ioctl stuff */
#define PHYSICALVIEW_BASE 'k'
#define PHYSICALVIEW_WALK _IOWR(PHYSICALVIEW_BASE, 1, struct physicalview_memory)
#endif /* PHYSICALVIEW_H */
Makefile
obj-m += 3000physicalview.o
.PHONY: all clean insert remove
all: 3000memview2 3000physicalview.ko
3000physicalview.ko: 3000physicalview.c 3000physicalview.h
	make ARCH=x86_64 -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
3000memview2: 3000memview2.c
	gcc -O2 -o 3000memview2 3000memview2.c
insert:
	sudo insmod 3000physicalview.ko
remove:
	sudo rmmod 3000physicalview
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
	rm -f 3000memview2