Operating Systems 2017F Lecture 16: Difference between revisions

From Soma-notes
==Video==

The video from the lecture given on Nov. 7, 2017 [http://homeostasis.scs.carleton.ca/~soma/os-2017f/lectures/comp3000-2017f-lec16-09Nov2017.mp4 is now available].

==In Class==

COMP 3000, Lecture 16

Important notes: Tutorial 5

'''File system:'''
* A persistent data structure organized around blocks (fixed-size allocation units)
* Maps hierarchical names (keys) to values
* Provides a file-like API: open, read, write, close, etc.

'''What does it mean to "make" a file system?'''
* Initializing the data structure
* "Formatting" a disk

'''Physical vs. logical size:'''
* Logical: the size your program sees when accessing the file (the number of bytes in the file)
* Physical: how much space the file takes up on disk, measured in blocks (fixed units of storage allocation)
** By default, or when listing multiple files, sizes are given in 1K blocks
** Example: ext4 uses 4K blocks
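
The logical vs. physical distinction can be observed from user space with stat(). The following is a minimal sketch, not code from the lecture: the function name file_sizes is made up for illustration, and it assumes Linux, where st_blocks counts 512-byte units regardless of the file system's own block size.

```c
#include <sys/stat.h>

/* Report a file's logical size (the bytes a program sees) and its
 * physical size (the bytes allocated on disk).
 * Returns 0 on success, -1 if stat() fails. */
int file_sizes(const char *path, long long *logical, long long *physical)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    *logical = (long long)st.st_size;
    /* Assumption: st_blocks is in 512-byte units, as on Linux. */
    *physical = (long long)st.st_blocks * 512;
    return 0;
}
```

On a file system with 4K blocks, a one-byte file would typically report a logical size of 1 and a physical size of 4096, although the exact physical figure depends on the file system.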


==Code==

[http://homeostasis.scs.carleton.ca/~soma/os-2017f/code/lec16/newgetpid.zip Code for download]

'''Kernel programming:'''
* Warning:
** If you work on your base Linux install, you may crash your whole system, so back it up first (for example with rsync).
* OpenStack: log in from a terminal using your instance's IP address. A plain ssh to the address failed; you must specify the user: ssh ADDRESS -l ubuntu
* Use sudo to add a user, so that you can experiment with root privileges.

'''What is a kernel module?'''
* A way of splitting up kernel functionality so that not everything has to be loaded at boot
* Modifies kernel functionality
* Runs in kernel space, which is the key thing to keep in mind
** It is more powerful than root and can do anything
** It has access to all kernel memory
** It can modify everything
* If you make a mistake in kernel development, your system will crash
* The machine provides a floppy by default, which explains why it still shows up in Anil's terminal
* Once you install a module, the module is unrestrained

'''Why do we use modules? Why don't we load processes instead?'''
* No new mechanisms are needed
* Increased security (restricted access)
* Makes the kernel smaller: the microkernel design
** Functions that would otherwise be in the kernel are put into processes
** Processes do IPC rather than code talking to code in supervisor mode
* Examples of what can be moved out:
** Filesystems
** Drivers
** Networking
* Examples of microkernels: Minix, QNX, GNU Hurd

'''Why is Linux a "monolithic" kernel?'''
* Switching between contexts is expensive (context switches)
* The techniques that make microkernels fast can be adopted by monolithic kernels to make them even faster
* The security benefits are not real:
** If you control the file system process, you can control everything


'''Rebuilding and changing the kernel:'''
* 1) make (a more complicated build than anything in COMP 2401)
** This builds the kernel
* 2) make modules
* 3) sudo make install
* 4) sudo make modules_install
* 5) sudo shutdown -r now (reboots the VM)
* Which configuration should you use to build your own kernel?
** Don't create a configuration from scratch
** Copy an existing configuration and use it
** make localmodconfig: takes the output of lsmod and uses it to configure your kernel
** Configuring takes time and effort
* Why doesn't /dev/ones exist anymore?
** Because a reboot occurred
** You must load the module again
** Then head -c 100 /dev/ones works again
* Implementing the device file /dev/ones:
** Implement the required file API
** Teach the kernel what it means to do operations like read, etc.

'''Code from the tutorial, ones.c:'''
* ones_read takes the file, a buffer, the number of bytes to read, and an offset
** Offset: the position in the file
** It fills the buffer with ones
** Why don't we just assign 1 into the buffer instead of calling put_user?
*** char *buf is a pointer into a user space process; put_user lets the kernel write to user space safely
** Line 46: why use printk and not printf? printf is not defined because the C library is not available in the kernel. How could the kernel use the C library when the C library depends on the kernel? The kernel is independent and does not depend on any libraries.

'''Commands:'''
* man ls: shows the different ls options
* ls -las: lists files with their sizes in blocks
* cat /dev/ones | less: like /dev/urandom, but instead of random numbers it generates an endless stream of ones
* lsmod: displays all the modules that are currently loaded on the virtual machine
* IBM PS/2: a series of computers IBM created to control the PC; it introduced the mouse and keyboard interfaces
* less README: check the instructions for how to do a make
* make menuconfig: shows the kernel configuration options
* cat /pro
* less .config: it is a bad idea to edit it directly; use "make menuconfig"
* /boot: where the kernel gets installed
** ls -lah: to see the sizes
* less /etc/modules
* module_init and module_exit: which functions should be called when the module is loaded and when it is unloaded
* Creating a device file: defining a file that has special semantics. Define a struct and the functions that should be called for each file operation: open, read, release (like closing, but not exactly)
* What happens if you try writing to the file? The permissions are read-only, so you cannot write
** Override that? Permission is still denied: you can only read, since we didn't put a write function in the struct

'''Additional notes:'''
* Warning: it's possible to destroy your entire system with one command. Solution: have good backups
* Core kernel functionality is implemented via modules
* Use lsmod to see the modules that are loaded
* In practice, you load modules all at once

'''Why do we need to load code into the kernel anyways?'''
* More secure, ability to restrict access
* Examples of microkernels: Minix (predecessor to Linux), QNX, GNU Hurd
* Once you install a module, it's unrestrained

'''What is a monolithic kernel?'''
* A type of OS architecture where the entire OS runs in kernel space
* Can dynamically load and unload modules at runtime

'''make localmodconfig:'''
* Takes the output of lsmod and configures your kernel accordingly

'''ones.c program:'''

'''/dev/ones:'''
* Permissions are read-only

'''file_operations ones_fops:'''
* Defines what happens when you open a file and read from it; release says what happens when you're done with it (not the same thing as close)

'''ones_read():'''
* len = number of bytes to read
* offset tells you where you are in the file
* put_user(): takes care of whatever needs to be done to write into that process properly

'''ones_release:'''

'''Why are we using printk instead of printf?'''
* printf is not defined (i.e., the C library is not available in the kernel)
* The kernel doesn't depend on any libraries; all of its code belongs to the kernel itself
* printk is the kernel's own implementation of printf (it outputs to the kernel log, /var/log/kern.log)

vfs = virtual filesystem layer

'''How do we limit access to user space processes?'''
* Do a permission check
* Kernels need to be updated regularly to correct bugs that make the kernel vulnerable to programs trying to gain access to important user space processes
* unlikely(): hints that this branch is not likely to be taken, so the other path is optimized

'''vfs_read:'''

'''file->f_op->read:'''
* This is how our read function will be called
 

Latest revision as of 08:38, 24 November 2017

Video

The video from the lecture given on Nov. 7, 2017 is now available.

Code

Code for download

newgetpid.c

/* Code derived from:
  https://appusajeev.wordpress.com/2011/06/18/writing-a-linux-character-device-driver/
  and
  http://pete.akeo.ie/2011/08/writing-linux-device-driver-for-kernels.html
*/

#include <linux/module.h>
#include <linux/string.h>
#include <linux/fs.h>
#include <linux/device.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/sched.h>
#include <linux/uaccess.h>

#define dbg(format, arg...) do { if (debug) pr_info(CLASS_NAME ": %s: " format, __FUNCTION__, ## arg); } while (0)
#define err(format, arg...) pr_err(CLASS_NAME ": " format, ## arg)
#define info(format, arg...) pr_info(CLASS_NAME ": " format, ## arg)
#define warn(format, arg...) pr_warn(CLASS_NAME ": " format, ## arg)


#define DEVICE_NAME "newgetpid"
#define CLASS_NAME "comp3000"

static struct class* newgetpid_class = NULL;
static struct device* newgetpid_device = NULL;
static int newgetpid_major;

static int newgetpid_open(struct inode *the_inode, struct file *f)
{
        return 0;
}

static ssize_t newgetpid_read(struct file *f, char __user *buf, size_t len, loff_t *offset)
{
        size_t i, msglen;
        pid_t thepid;

        char message[100];
        
        if (*offset > 0) {
                return 0;
        }
        
        thepid = task_tgid_vnr(current);

        snprintf(message, 100, "Your PID is %d!\n", thepid);
        
        msglen = strlen(message);

        if (len < msglen) {
                msglen = len;
        }

        /* put_user copies one byte to user space; nonzero means a bad address */
        for (i = 0; i < msglen; i++) {
                if (put_user(message[i], buf++))
                        return -EFAULT;
        }

        *offset = i;

        return i;
}

static int newgetpid_release(struct inode *the_inode, struct file *f)
{
        printk(KERN_ALERT "Newgetpid device closed\n");
        return 0;
}


static struct file_operations newgetpid_fops = {
        .open = newgetpid_open,
        .read = newgetpid_read,
        .release = newgetpid_release,
};


static char *newgetpid_devnode(struct device *dev, umode_t *mode)
{
        if (mode)
	        *mode = 0444;
        return NULL;
}

static int __init newgetpid_init(void)
{
        int retval;
  
        newgetpid_major = register_chrdev(0, DEVICE_NAME, &newgetpid_fops);
        if (newgetpid_major < 0) {
                err("failed to register device: error %d\n", newgetpid_major);
                retval = newgetpid_major;
                goto failed_chrdevreg;
        }
 
        newgetpid_class = class_create(THIS_MODULE, CLASS_NAME);
        if (IS_ERR(newgetpid_class)) {
                err("failed to register device class '%s'\n", CLASS_NAME);
                retval = PTR_ERR(newgetpid_class);
                goto failed_classreg;
        }
 
	newgetpid_class->devnode = newgetpid_devnode;

        newgetpid_device = device_create(newgetpid_class, NULL, MKDEV(newgetpid_major, 0),
                                    NULL, DEVICE_NAME);

        if (IS_ERR(newgetpid_device)) {
                err("failed to create device '%s'\n", DEVICE_NAME);
                retval = PTR_ERR(newgetpid_device);
                goto failed_devreg;
        }
        
        info("Newgetpid device registered using major %d.\n", newgetpid_major);
        
        return 0;
        
 failed_devreg:
        /* class_destroy() unregisters the class itself, so one call suffices */
        class_destroy(newgetpid_class);
 failed_classreg:
        unregister_chrdev(newgetpid_major, DEVICE_NAME);
 failed_chrdevreg:
        /* propagate the specific error code recorded above */
        return retval;
}

static void __exit newgetpid_exit(void)
{
        device_destroy(newgetpid_class, MKDEV(newgetpid_major, 0));
        /* class_destroy() also unregisters the class */
        class_destroy(newgetpid_class);
        unregister_chrdev(newgetpid_major, "newgetpid");
        info("Unloading Newgetpid module.\n");
        return;
}

module_init(newgetpid_init);
module_exit(newgetpid_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Anil Somayaji <soma@scs.carleton.ca>");
MODULE_DESCRIPTION("A write newgetpid character device module");

Makefile

obj-m := newgetpid.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

Additional Notes

What determines what files you can and cannot create?

  • SSH privileges


  • Anything that you can do as an SSH user you can do in the VM; it is just doing file operations
  • With sshfs, the read and write system calls that programs make on files in the mounted directory are carried out on the remote machine
  • sshfs is a good tool for accessing remote files locally; we will use it to edit modules


Ones/newgetpid program:

  • We want to extend its functionality in a specific way
  • We want to access info about the process that made the system call
  • Let's get the current process's ID
    • Normally we would use getpid
    • But we can't make system calls in kernel space
    • But we can call the function that the system call uses or just copy the functionality
  • Code for this is in "kernel/sys.c"
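
To check from user space that the device reports the PID of the process doing the read, a small helper can open the device and parse the message. This is a sketch under assumptions: the function name read_reported_pid is hypothetical, and it relies on the "Your PID is %d!" message format used by newgetpid.c above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Read the message from `path` (for example /dev/newgetpid) and extract
 * the PID from text of the form "Your PID is 1234!".
 * Returns the PID, or -1 if the file cannot be read or parsed. */
long read_reported_pid(const char *path)
{
    char buf[100];
    long pid = -1;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';

    if (sscanf(buf, "Your PID is %ld!", &pid) != 1)
        return -1;
    return pid;
}
```

With the module loaded, read_reported_pid("/dev/newgetpid") should equal (long)getpid() in the calling program, since that program is the process making the read system call.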


SYSCALL_DEFINE0:

  • A macro that expands into the definition of a system call's function
  • SYSCALL_DEFINE0(getpid) defines the "getpid" system call, which takes no arguments (hence the 0)


  • Can use the code inside the function but not the function itself in the kernel
  • Getpid returns a pid_t
  • Instead of get_ones returning all those ones we want it to return the pid
  • Let's try to get it to output a basic string with the pid
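
For comparison, here is how the same system call looks from the user space side. This sketch (the name raw_getpid is made up) invokes the getpid system call by number through the C library's generic syscall() function, bypassing the getpid() wrapper; it assumes Linux with glibc, where SYS_getpid comes from <sys/syscall.h>.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke the getpid system call directly by number, without going
 * through the C library's getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both routes end up in the kernel function defined with SYSCALL_DEFINE0(getpid), so raw_getpid() and getpid() should return the same value. Inside a module neither route is available, which is why newgetpid.c calls task_tgid_vnr(current) itself.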


How does printk work?

  • Printk sends its output to the kernel log
  • We changed the name to "newgetpid"
  • How do we convert int to string to print the pid?
    • Make a buffer, let's call it "message"


Why does the pid keep incrementing each time we run "cat /dev/newgetpid"?

  • Each run of "cat" is a new process created with fork, so every time we get a new pid
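
The effect can be reproduced without the module. The sketch below (spawn_child is a hypothetical name) forks a child that exits immediately and returns the PID the kernel assigned to it, much as the shell does each time it starts cat.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately. Returns the child's PID as seen
 * by the parent, or -1 if fork() or waitpid() fails. */
pid_t spawn_child(void)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);               /* child: nothing to do */

    /* parent: reap the child so it does not linger as a zombie */
    if (waitpid(child, NULL, 0) < 0)
        return -1;
    return child;
}
```

Calling spawn_child() several times in a row usually yields increasing PIDs on Linux, because the kernel hands them out sequentially, but that ordering is an implementation detail and the numbers eventually wrap around.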


Why can we use snprintf but not printf?

  • We include "linux/kernel.h", which declares snprintf but not printf
  • printf assumes we have a standard output to print to
  • snprintf only needs a character array to work
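
The user space snprintf behaves the same way as the one used in the module, so the formatting step can be tried outside the kernel. In this sketch, format_pid_message is a hypothetical helper that mirrors the snprintf call in newgetpid_read.

```c
#include <stdio.h>
#include <sys/types.h>

/* Format the PID message into buf, which has room for n bytes.
 * Returns the length the full message would have; a return value
 * of n or more means the output was truncated to fit. */
int format_pid_message(char *buf, size_t n, pid_t pid)
{
    return snprintf(buf, n, "Your PID is %d!\n", (int)pid);
}
```

Because snprintf never writes more than n bytes (including the terminating NUL), the 100-byte message buffer in the module cannot overflow regardless of the PID value.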


  • Read functionality uses an API
  • Adding new functionality like "write" is easy, just look at the standard API and original kernel source
  • All device files have their own custom read and write functions
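
The way newgetpid_read uses its offset argument can be modelled in user space. The sketch below is an illustration only: model_read is a made-up name, and memcpy stands in for the put_user loop, which exists only in the kernel.

```c
#include <string.h>
#include <sys/types.h>

/* User space model of newgetpid_read(): on the first call, copy at most
 * len bytes of msg into buf and advance *offset; once *offset is
 * nonzero, return 0 to signal end-of-file. */
ssize_t model_read(const char *msg, char *buf, size_t len, off_t *offset)
{
    size_t msglen = strlen(msg);

    if (*offset > 0)
        return 0;               /* message already delivered: EOF */
    if (len < msglen)
        msglen = len;           /* caller's buffer is smaller */

    memcpy(buf, msg, msglen);
    *offset = (off_t)msglen;
    return (ssize_t)msglen;
}
```

The second call returning 0 is what lets cat terminate: cat keeps calling read until it gets 0. A device like /dev/ones never returns 0, so cat on it runs until interrupted.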


Why do we use gotos?

  • C has no exception handling, so we implement our own
  • Each goto jumps to an error path: failed_devreg, failed_classreg, failed_chrdevreg (very important in the kernel)
  • The kernel needs to be able to handle its own errors
  • It needs to free up allocated resources and "undo" everything done so far
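
The same goto pattern works in ordinary C. This sketch uses made-up names (acquire, release, setup, live_resources) and malloc as a stand-in for kernel resources; the fail_at parameter simulates a failure at a chosen step so the unwinding can be observed.

```c
#include <stdlib.h>

static int live = 0;            /* number of resources currently held */

static void *acquire(int should_fail)
{
    void *p;

    if (should_fail)
        return NULL;
    p = malloc(16);
    if (p)
        live++;
    return p;
}

static void release(void *p)
{
    free(p);
    live--;
}

int live_resources(void)
{
    return live;
}

/* Acquire three resources in order. fail_at (1, 2 or 3) simulates a
 * failure at that step; 0 means no simulated failure. Returns 0 with
 * all three held in res[], or a negative value after releasing
 * whatever had been acquired, in reverse order. */
int setup(int fail_at, void *res[3])
{
    int retval;

    res[0] = acquire(fail_at == 1);
    if (!res[0]) { retval = -1; goto failed_first; }

    res[1] = acquire(fail_at == 2);
    if (!res[1]) { retval = -2; goto failed_second; }

    res[2] = acquire(fail_at == 3);
    if (!res[2]) { retval = -3; goto failed_third; }

    return 0;

failed_third:
    release(res[1]);
failed_second:
    release(res[0]);
failed_first:
    return retval;
}
```

The labels are listed in the reverse of the acquisition order, so jumping to one label falls through all the cleanup steps for everything acquired before the failure. newgetpid_init follows the same layout.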


  • "." is the current directory
  • ".." is the parent directory
    • Each ".." entry is another hard link to the parent directory
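
The extra hard links show up in a directory's link count, which stat() reports. A minimal sketch follows; link_count is a hypothetical helper name.

```c
#include <sys/stat.h>

/* Return the hard-link count of path, or -1 if stat() fails. */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

On traditional Unix file systems such as ext4, an empty directory typically has a count of 2 (its name in the parent plus its own "." entry), and each subdirectory adds one more through its ".." entry. Some file systems do not maintain directory link counts this way, so treat the numbers as file system dependent.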


  • In order to build kernel modules, you need to have the headers associated with the current kernel you're running
  • Modules are specific to a particular version of the kernel
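
The Makefile above finds the matching headers through $(shell uname -r). The same lookup can be sketched in C with uname(); the helper name kernel_build_dir is made up, and the /lib/modules/RELEASE/build layout is the one the Makefile assumes.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Write "/lib/modules/<running kernel release>/build" into buf.
 * Returns 0 on success, -1 if uname() fails or buf is too small. */
int kernel_build_dir(char *buf, size_t n)
{
    struct utsname u;
    int len;

    if (uname(&u) != 0)
        return -1;

    len = snprintf(buf, n, "/lib/modules/%s/build", u.release);
    if (len < 0 || (size_t)len >= n)
        return -1;
    return 0;
}
```

If the resulting directory does not exist, the headers for the running kernel are probably not installed, and the module build will fail before compiling anything.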