Operating Systems 2017F Lecture 17


Video

The video from the lecture given on Nov. 14, 2017 is now available.

Notes

In Class

Lecture 17
----------

How do you do kernel hacking?

* Be humble
  - you don't know how everything works, and nobody else does

* Verify your assumptions as you go
  - perform lots of experiments
  - incremental development, compile and run very often

* Check for errors ALWAYS
  - will save you from later errors
  - kernel has to "live cleanly"

* Find another part of the kernel that is close to what you want to do.
  Read that code, use their approach where possible
  - You don't understand all the abstractions and assumptions, so
    "pattern match" to minimize trouble
  - Realize that the assumptions behind code you find don't necessarily match
    your own.


Additional Notes:
Lecture 17:
COMP 3000
Important notes:

Install the kernel module on the remote VM
Edit the files locally by mounting the VM's files on your own machine
sshfs: a useful tool for editing modules. What determines which files I can or can't create? The file operations.
cd . : what does the . do? Nothing
. = the current directory
.. = the parent directory
A directory can link to itself
Compiling and loading all happen in the VM: in order to build your own modules, you must use the headers associated with a specific kernel.
Something is wrong with the metadata? The files are in the soma not anil class?

Module the professor is walking through: ones
1) Extend its functionality so it can access info about the process that made the system call.
2) Output your process ID is blocked
How to make these changes:
1) Change the device name to newgetpid and change the Makefile
2) #define dev name = newgetpid (renaming the device)
3) Get the PID of the user:
   Where is getpid? kernel/sys.c. The SYSCALL_DEFINE0 line there is not ordinary C code; it is a macro that defines a system call (getpid) with 0 arguments.
   - getpid returns?
Line 37) newgetpid_read: these changes were close, but the output kept repeating itself
   - Type char *message = "your PID is unknown!\n";
       msglen = 21;
   - Change the for loop
       for (i = 0; i < msglen; i++) {
           put_user(message[i], buf++);
       }
*offset: a long int. It was still equal to 0 after the read, so we had to update the offset
Return values of read:
-1: error
0: no more bytes to read
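The repeating output and the offset fix can be tried outside the kernel. The following is a minimal userspace sketch, not the tutorial's module code: sim_read is a hypothetical stand-in for newgetpid_read, and a plain array assignment stands in for put_user. It assumes the handler should hand out each byte of the message once and then return 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for a device read handler (hypothetical name).
 * Copies up to len bytes of msg, starting at *offset, into buf,
 * advances *offset, and returns the number of bytes copied.
 * Returns 0 once the whole message has been delivered. */
static long sim_read(const char *msg, char *buf, size_t len, long *offset)
{
    size_t msglen = strlen(msg);
    size_t i;

    assert(*offset >= 0);
    if ((size_t)*offset >= msglen)
        return 0;                      /* no more bytes to read */

    size_t remaining = msglen - (size_t)*offset;
    if (len > remaining)
        len = remaining;

    for (i = 0; i < len; i++)
        buf[i] = msg[*offset + i];     /* put_user(...) in the kernel */

    *offset += (long)len;              /* without this, every read starts at 0 again */
    return (long)len;
}
```

A reader such as cat keeps calling read until it gets 0. If the offset is never advanced, the handler never reaches the "return 0" branch, which would explain the repeated message.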
   - To figure out whether we return the right PID, add a printk
   - Convert the int to a string representation in order to output it. How?
       Line 39: create a buffer called message: char message[100];
       snprintf: like printf, but it writes into a buffer
   - Why does the PID number keep increasing? Because each cat spawns a new process
   - Why not use printf instead of snprintf? We don't have stdout in the kernel; snprintf prints to a buffer.
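The int-to-string step can be sketched in userspace with the standard snprintf. The helper name format_pid_message and the exact message text are assumptions for illustration; in the module the PID would come from the kernel's notion of the calling process rather than a function argument.

```c
#include <assert.h>
#include <stdio.h>

/* Format a PID into buf (hypothetical helper).
 * Returns the length of the message, or -1 if it did not fit in size bytes. */
static int format_pid_message(char *buf, size_t size, long pid)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "Your PID is %ld!\n", pid);
    if (n < 0 || (size_t)n >= size)
        return -1;                 /* output error or truncated message */
    return n;                      /* this is the new msglen */
}
```

The return value matters: the message length is no longer a fixed 21, so the copy loop should use what snprintf reports instead of a hard-coded constant.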

printk: goes to the kernel log
   - Reading from a file is just an API
   - A REST API is similar to the OS: we can read and write anything
   - With a device file: ?? //ask the prof
   - To implement a new function, get the major and minor numbers and the device name, then create the function by registering a device file
   - Every device has its own read and write functions; they have to be custom defined
   - They are special, and you can do this to any file
   - FUSE: Filesystem in Userspace, to connect
   - Why do we use goto? Normally we should not, and it can only jump within the same function. Here it works like raising an exception, which is essential in kernel code: the kernel has to manage resources itself, so on an error condition it must release whatever it has already allocated.
   - failed_devreg: the label where deallocation happens
   - printf: not defined in kernel code; it is defined in the C library, and we don't have stdout
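The goto cleanup pattern can be shown in userspace with malloc standing in for kernel allocations. This is a sketch under assumptions: device_state, device_setup, device_teardown and the fail_at parameter are hypothetical names, and fail_at exists only so each error path can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

struct device_state {
    char *name_buf;
    char *data_buf;
};

/* Acquire two resources in order. On failure, jump to the label that
 * undoes exactly the steps that already succeeded. fail_at forces a
 * failure at step 1 or 2 (0 means no forced failure). */
static int device_setup(struct device_state *dev, int fail_at)
{
    assert(dev != NULL);
    dev->name_buf = NULL;
    dev->data_buf = NULL;

    dev->name_buf = (fail_at == 1) ? NULL : malloc(32);
    if (dev->name_buf == NULL)
        goto failed_name;

    dev->data_buf = (fail_at == 2) ? NULL : malloc(4096);
    if (dev->data_buf == NULL)
        goto failed_data;

    return 0;                     /* success: caller now owns both buffers */

failed_data:
    free(dev->name_buf);          /* undo step 1 */
    dev->name_buf = NULL;
failed_name:
    return -1;
}

/* Release everything device_setup acquired. */
static void device_teardown(struct device_state *dev)
{
    free(dev->data_buf);
    free(dev->name_buf);
    dev->data_buf = NULL;
    dev->name_buf = NULL;
}
```

The labels are listed in reverse order of acquisition, so a jump to any of them falls through the remaining cleanup steps. That is the role a label like failed_devreg plays in the module.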

Textbook: no more readings for further lectures
   - Active learning rather than passive




Important Commands:
ls vm: display the directories in the vm
ls -la vm ?:
sshfs:
getpid: process ID
rm ones*: remove the ones files
sudo rmmod newgetpid: unload the module
sudo insmod newgetpid.ko: load the module
lsmod | grep new: check whether the module is loaded
man getpid: information on getting the PID of a process
ls -la: for directories and links
mkdir: make a directory
uname -r: prints the release of the running kernel
First number on the left in ls -la output: the hard link count (21 here). Every subdirectory adds a hard link to its parent directory through its .. entry.
Additional Notes: snprintf is essentially a function that redirects the output of printf to a buffer. This is particularly useful for avoiding repetition of a formatted string.



* Understand the "flow of control" inside the program
  - architecture
  - division of responsibilities
  - purpose of data structures/objects


Scheduling
----------
How does the kernel maintain control?
 - CPU is given entirely to userspace processes to run most of the time

Kernel needs to run when it needs to run
 - and no process should be able to stop it

*The interrupt table*
 - pointers to code that the CPU runs when it receives different hardware
   or software interrupts
 - since the kernel runs first, it gets to set the interrupt table
 - only supervisor code can change the table, not user code

So when the ethernet card receives data...
...the ethernet card sends an interrupt to the kernel
...the CPU calls the kernel code for handling ethernet data

When the clock generates a timer interrupt...
...the CPU calls the kernel code for handling timer interrupts
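An interrupt table is, in essence, an array of function pointers indexed by interrupt number. The sketch below simulates that idea in plain C; the vector numbers, table size, and handler names are all made up for illustration and do not correspond to any real CPU or to Linux's actual tables.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 8                        /* illustrative size only */
enum { IRQ_TIMER = 0, IRQ_ETHERNET = 1 };    /* hypothetical vector numbers */

typedef void (*handler_fn)(void);

static handler_fn interrupt_table[NUM_VECTORS];
static int timer_ticks;
static int packets_seen;

static void handle_timer(void)    { timer_ticks++; }
static void handle_ethernet(void) { packets_seen++; }

/* "Kernel" boot code: it runs first, so it decides what each vector does. */
static void install_handlers(void)
{
    assert(IRQ_TIMER < NUM_VECTORS && IRQ_ETHERNET < NUM_VECTORS);
    interrupt_table[IRQ_TIMER] = handle_timer;
    interrupt_table[IRQ_ETHERNET] = handle_ethernet;
}

/* What the "CPU" does on an interrupt: look up the vector and jump.
 * Returns 0 if a handler ran, -1 if the vector is invalid or empty. */
static int raise_interrupt(int vector)
{
    if (vector < 0 || vector >= NUM_VECTORS || interrupt_table[vector] == NULL)
        return -1;
    interrupt_table[vector]();
    return 0;
}
```

On real hardware the table lives in memory that only supervisor code may modify, which is what keeps user processes from redirecting interrupts.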

In order to process an interrupt, a CPU core has to be taken over
 - that core was probably running a userspace process

Scheduling is all about what to do after having kicked a userspace process
off of a core

Normally on a core
* userspace process is running
* interrupt happens
* core switches to supervisor mode, runs kernel code
* last part of kernel code is scheduler, chooses which userspace code to run
* goto top :-)

Kernel is entered via interrupts, exited via the scheduler

Entry to and exit from the kernel involve low-level tasks like changing CPU modes
and managing CPU registers
 - must be written in assembly

Most OS kernels minimize this low-level code

What criteria should the scheduler use in deciding what to run?
 - "fairness"
   - prevent starvation
   - absent other conditions, equal allocation of resources
 - bias resources towards "foreground" tasks in interactive systems
   - never biased enough
   - always hacks, heuristics

Memory overcommitment
 - it is possible to allocate way more memory than can ever be used
 - goes into "memory debt"

Midterm Review

1. Execve and no it does not create a new process

2. Dynamically linked programs map in dynamically linked libraries

3. Hard links map filenames to inodes

4. Sleep until the consumer wakes it

5. Using type conversion: read returns the number of bytes read, so we must convert that back to a number of unsigned longs, then subtract one because the maximum index of an array is one less than its length

6. uses of signals:

  • kill process
  • find out when child process is terminated
  • start/stop a process
  • detect when a program has made a pointer error

7. Pointers in C contain virtual memory addresses because the memory of a process is a virtual address space. Each process thinks it has its own stretch of memory.

8. Processes make a system call to allocate memory. After execve, they use mmap and/or sbrk to allocate memory

9. shell opens the file output.txt.

10. mmap maps the contents of a file into the process's memory (RAM).

  • comparing 1 mb of both files at a time will be less expensive
  • only makes sense to mmap if you will access files multiple times

11. producer and consumer now have no way of talking to each other because they do not share the same memory.

12. no. there is a race condition. while loop loops while sem < 0. someone else could access and modify the variable between the while and the sem--;.

  • you need special instructions to test and set the variable at the same time.
  • also, int sem parameter isn't a ptr so it's a local variable.
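Both problems from answer 12 can be addressed with C11 atomics. The sketch below uses compare-and-swap, one kind of "special instruction" that checks and updates a variable in a single step. It assumes a convention where a positive count means the semaphore is available, which may differ from the midterm's code, and sem_try_wait and sem_post are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Try to take one unit from the semaphore without blocking.
 * Takes a pointer, so the caller's variable is the one modified.
 * Returns 1 on success, 0 if the count was already zero. */
static int sem_try_wait(atomic_int *sem)
{
    int old;

    assert(sem != NULL);
    old = atomic_load(sem);
    while (old > 0) {
        /* Succeeds only if *sem still equals old. Otherwise old is
         * refreshed with the current value and the loop re-checks it. */
        if (atomic_compare_exchange_weak(sem, &old, old - 1))
            return 1;
    }
    return 0;
}

/* Give one unit back to the semaphore. */
static void sem_post(atomic_int *sem)
{
    assert(sem != NULL);
    atomic_fetch_add(sem, 1);
}
```

Because the test and the decrement happen in one atomic operation, no other thread can change the count between them.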

Additional Notes

How do you do kernel hacking? Some tips for low-level kernel code:
Be humble; you cannot know how everything works
Verify your assumptions as you go
Perform lots of experiments
Incremental Development, compile and run often

Check for errors ALWAYS or they will cascade and hurt you later

  • This will save you from later errors
  • Kernel has to live cleanly
  • This principle applies to any large scale coding


Find another part of the kernel that is close to what you want to do (Search online! This is a good link: http://elixir.free-electrons.com/linux/v4.4.83/source)
You do not understand all abstractions and assumptions, so "pattern match" to minimize trouble

  • Functionality that you want to implement might already exist in the kernel, look at how they do things and follow the pattern.
  • Refine understanding as you go, the other implementation might be for something else.


Realize that the assumptions behind code you find don't necessarily match your own. Spend the time reading other people's code.
Understand the "flow of control" inside the program

  • Architecture
  • Division of responsibilities
  • Purpose of data structures/objects

For Tutorial 6 he gave us a link for a reason. If you click on it you can search for getpid and then dig deeper! For example, cut and paste the line inside the getuid function.

Scheduling
At a high level, the scheduler decides which process should run and when
Questions: How does the scheduler work? How does the kernel control when a program runs? How can you make sure one program does not take the CPU away from everyone else?
CPU is given entirely to userspace processes to run most of the time
Kernel needs to run when it needs to run and no process should be able to stop it

  • The interrupt table: pointers to code that the CPU runs when it receives different hardware or software interrupts. Since the kernel runs first it gets to set the interrupt table. Only supervisor code can change the table not user code.


When the Ethernet card receives data:
1) The Ethernet card sends an interrupt to the kernel
2) The CPU calls the kernel code for handling Ethernet data

When the clock generates a timer interrupt:
1) The CPU calls the kernel code for handling timer interrupts

In order to process an interrupt, a CPU core has to be taken over
That core was probably running a userspace process
Scheduling is all about what to do after having kicked a userspace process off of a core

Normally on a core:

  • Userspace process is running
  • Interrupt happens
  • core switches to supervisor mode, runs kernel code
  • last part of kernel code is scheduler, chooses which userspace code to run. Repeat the process

Kernel is entered via interrupts, exited via the scheduler
Scheduler needs to be efficient because interrupts are frequent and we do not want to waste resources that could be put towards actual computing
Entry to and exit from the kernel involve low-level tasks like changing CPU modes and managing CPU registers. This is assembly code.
Go to http://elixir.free-electrons.com/linux/v4.4.83/source
Go into arch: code specific to each CPU architecture. The two most common are arm and x86
Go up one level
The drivers directory has drivers for all kinds of things
The Linux kernel was designed for x86, but it is now abstracted to support different architectures. Go into arch -> x86 -> entry
Open entry.s: a large assembly language file. This is the code that runs before system-call-specific code.
sched.h -> Contains the definition of task_struct, the data structure that keeps track of a process.

What criteria should the scheduler use in deciding what to run?

  • Fairness: Starvation is when a program does not get the CPU
    • Everyone should get a turn
    • CPU is passed around
    • Nobody should be left never getting the CPU
    • Prevent Starvation
    • Absent other conditions, equal allocation of resources
  • Bias resources towards "foreground" tasks in interactive systems
    • never biased enough
    • always hacks, heuristics


Memory overcommitment
Memory does not get allocated when you mmap. It is loaded "lazily" in the Linux kernel. It is possible to allocate way more memory than can ever be used.
Goes into "memory debt"
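Lazy allocation can be observed from userspace. The sketch below reserves a large anonymous region with mmap and writes to only its first byte. It assumes a Linux system with the default overcommit settings, and map_and_touch_one_page is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `size` bytes of anonymous memory, touch only the first byte,
 * and release the region. Returns 0 on success, -1 on failure. */
static int map_and_touch_one_page(size_t size)
{
    unsigned char *region;

    assert(size > 0);
    region = mmap(NULL, size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* The mmap call only set up address space. The kernel supplies
     * physical memory when a page is first touched, as it is here. */
    region[0] = 42;

    return munmap(region, size);
}
```

Calling this with a size far larger than what the program ever writes to still succeeds quickly, because only the touched page needs real memory. The "debt" comes due only if a process later touches more pages than the system can actually back.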