Operating Systems 2017F Lecture 17
Video
The video from the lecture given on Nov. 14, 2017 is now available.
Notes
In Class
Lecture 17
----------
How do you do kernel hacking?
* Be humble
  - you don't know how everything works, and nobody else does
* Verify your assumptions as you go
  - perform lots of experiments
  - incremental development, compile and run very often
* Check for errors ALWAYS
  - will save you from later errors
  - kernel has to "live cleanly"
* Find another part of the kernel that is close to what you want to do.
  Read that code, use their approach where possible
  - You don't understand all the abstractions and assumptions, so
    "pattern match" to minimize trouble
  - Realize that the assumptions behind code you find don't necessarily match
    your own.
* Understand the "flow of control" inside the program
  - architecture
  - division of responsibilities
  - purpose of data structures/objects
Scheduling
----------
How does the kernel maintain control?
 - CPU is given entirely to userspace processes to run most of the time
Kernel needs to run when it needs to run
 - and no process should be able to stop it
*The interrupt table*
 - pointers to code that the CPU runs when it receives different hardware
   or software interrupts
 - since the kernel runs first, it gets to set the interrupt table
 - only supervisor code can change the table, not user code
So when the ethernet card receives data...
...the ethernet card sends an interrupt to the kernel
...the CPU calls the kernel code for handling ethernet data
When the clock generates a timer interrupt...
...the CPU calls the kernel code for handling timer interrupts
In order to process an interrupt, a CPU core has to be taken over
 - that core was probably running a userspace process
Scheduling is all about what to do after having kicked a userspace process
off of a core
Normally on a core
* userspace process is running
* interrupt happens
* core switches to supervisor mode, runs kernel code
* last part of kernel code is scheduler, chooses which userspace code to run
* goto top :-)
Kernel is entered via interrupts, exited via the scheduler
Entry to and exit from the kernel have to do low-level tasks like changing CPU modes
and managing CPU registers
 - must be written in assembly
Most OS kernels minimize this low-level code
What criteria should the scheduler use in deciding what to run?
 - "fairness"
   - prevent starvation
   - absent other conditions, equal allocation of resources
 - bias resources towards "foreground" tasks in interactive systems
   - never biased enough
   - always hacks, heuristics
Memory overcommitment
 - it is possible to allocate way more memory than can be ever used
 - goes into "memory debt"
Midterm Review
1. Execve and no it does not create a new process
2. Dynamically linked programs map in dynamically linked libraries
3. Hard links map filenames to inodes
4. Sleep until the consumer wakes it
5. Using type conversion: read returns the number of bytes read. We must convert that back to a number of unsigned longs, minus one because the maximum index of an array is one less than its length
6. uses of signals:
- kill process
- find out when child process is terminated
- start/stop a process
- detect when a program has made a ptr error
7. Pointers in C contain virtual memory addresses because the memory of a process is a virtual address space. Each process thinks it has its own stretch of memory.
8. Processes make a system call to allocate memory: execve, then mmap and/or sbrk to allocate memory
9. shell opens the file output.txt.
10. mmap maps the contents of the file into memory, so they are loaded into RAM.
- comparing 1 mb of both files at a time will be less expensive
- only makes sense to mmap if you will access files multiple times
11. producer and consumer now have no way of talking to each other because they do not share the same memory.
12. No, there is a race condition. The while loop spins while sem < 0; someone else could access and modify the variable between the while and the sem--;.
- you need special instructions to test and set the variable at the same time.
- also, int sem parameter isn't a ptr so it's a local variable.
Additional Notes
How do you do kernel hacking? Some tips for low level kernel code:
Be humble; You cannot know how everything works
Verify your assumptions as you go
Perform lots of experiments
Incremental Development, compile and run often
Check for errors ALWAYS or they will cascade and hurt you later
- This will save you from later errors
- Kernel has to live cleanly
- This principle applies to any large scale coding
Find another part of the kernel that is close to what you want to do. (Search online! This is a good link: http://elixir.free-electrons.com/linux/v4.4.83/source)
You do not understand all abstractions and assumptions, so "pattern match" to minimize trouble 
- Functionality that you want to implement might already exist in the kernel, look at how they do things and follow the pattern.
- Refine understanding as you go, the other implementation might be for something else. 
 
Realize that the assumptions behind code you find don't necessarily match your own. Spend the time reading other people's code.
Understand the "flow of control" inside the program
- Architecture
- Division of responsibilities
- Purpose of data structures/objects
For tutorial 6 he gave us a link for a reason. If you click on it you can search for getpid and then dig deeper! For example, cut and paste the line inside the getuid function.
Scheduling
At a high level, the scheduler decides which process should run and when
Questions: How does the scheduler work? How does the kernel control when a program runs? How can you make sure one program does not take the CPU away from everyone else?
CPU is given entirely to userspace processes to run most of the time
Kernel needs to run when it needs to run and no process should be able to stop it
- The interrupt table: pointers to code that the CPU runs when it receives different hardware or software interrupts. Since the kernel runs first, it gets to set the interrupt table. Only supervisor code can change the table, not user code.
When the Ethernet card receives data: 
1) The Ethernet card sends an interrupt to the kernel 
2) The CPU calls the kernel code for handling Ethernet data  
When the clock generates a timer interrupt: 
1) The CPU calls the kernel code for handling timer interrupts 
In order to process an interrupt, a CPU core has to be taken over
- That core was probably running a userspace process
Scheduling is all about what to do after having kicked a userspace process off of a core
Normally on a core:
- Userspace process is running
- Interrupt happens 
- core switches to supervisor mode, runs kernel code
- last part of kernel code is scheduler, chooses which userspace code to run. Repeat the process
Kernel is entered via interrupts, exited via the scheduler
Scheduler needs to be efficient because interrupts are frequent and we do not want to waste resources that could be put towards actual computing
Entry to and exit from the kernel have to do low-level tasks like changing CPU modes and managing CPU registers. This is assembly code.
Go to http://elixir.free-electrons.com/linux/v4.4.83/source
Go into arch: code specific to a CPU architecture. The two most common are arm and x86
Go up one level
The drivers directory has drivers for all kinds of things
The Linux kernel was designed for x86 but is now abstracted to support different architectures.
Go into architecture -> x86 -> entry 
Open entry.s: a large assembly language file. This is the code that runs before system-call-specific code.
sched.h -> Contains the definition of task_struct, the data structure that keeps track of a process.
What criteria should the scheduler use in deciding what to run?
- Fairness
 - Starvation is when a program never gets the CPU
 - Everyone should get a turn; the CPU is passed around
 - Nobody should be left waiting forever for the CPU
 - Prevent starvation
 - Absent other conditions, equal allocation of resources
 
- Bias resources towards "foreground" tasks in interactive systems
 - never biased enough
 - always hacks, heuristics
 
Memory overcommitment
Memory does not get allocated when you mmap. It is loaded "lazily" in the Linux kernel. It is possible to allocate way more memory than can ever be used.
Goes into "memory debt"