Operating Systems 2017F Lecture 17
Latest revision as of 20:39, 16 November 2017
Video
The video from the lecture given on Nov. 14, 2017 is now available: http://homeostasis.scs.carleton.ca/~soma/os-2017f/lectures/comp3000-2017f-lec17-14Nov2017.mp4
Notes
In Class
Lecture 17
----------
How do you do kernel hacking?
* Be humble
  - you don't know how everything works, and nobody else does
* Verify your assumptions as you go
  - perform lots of experiments
  - incremental development, compile and run very often
* Check for errors ALWAYS
  - will save you from later errors
  - kernel has to "live cleanly"
* Find another part of the kernel that is close to what you want to do.
  Read that code, use their approach where possible
  - You don't understand all the abstractions and assumptions, so
    "pattern match" to minimize trouble
  - Realize that the assumptions behind code you find don't necessarily match
    your own.
Additional Notes:
Lecture 17, COMP 3000
Important notes:
Install the kernel module on the remote VM.
Edit the module source locally by mounting the VM's files on your local machine.
sshfs: mounts the remote files locally, so you can use your usual tools to edit modules. What determines which files you can or can't create is the set of file operations that are supported.
cd . : what does the . do? Nothing.
. = the current directory
.. = the parent directory
A directory can link to itself (that is what . is).
Compiling and loading all happen in the VM: in order to build your own modules, you must have the headers that match the specific kernel you are running.
Something is wrong with the metadata? The files are in the soma not anil class?
Module the professor is walking through: ones
1) Extend its functionality so it can access info about the process that made the system call.
2) Output your process ID is blocked
-> How to make these changes:
-> 1) Change the device name to newgetpid and change the Makefile
-> 2) #define the device name as newgetpid (renaming the device)
-> 3) Get the PID of the user process:
Where is getpid? In kernel/sys.c. SYSCALL_DEFINE0(getpid) is not ordinary C code: it is a macro that defines a system call, getpid, with 0 arguments.
What does getpid return?
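To see from the userspace side that getpid really is a system call with no arguments, one option is to invoke it directly instead of going through the libc wrapper. This is a minimal sketch assuming Linux and glibc; raw_getpid is a name made up for this example.

```c
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Invoke the getpid system call directly, bypassing the libc wrapper.
 * No arguments follow the system call number, which matches the
 * zero-argument definition on the kernel side. */
static pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(), since both end up in the same kernel code.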
Line 37) newgetpid_read: these changes are close, but the output kept repeating itself
    Type: char *message = "your PID is unknown!\n";
    msglen = 21;
    Change the for loop:
    for (i = 0; i < msglen; i++) {
        put_user(message[i], buf++);
    }
*offset: a long int; it was equal to 0 and never changed, so we had to update the offset
Return values of read:
-1: error
0: no more bytes to read
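The read behaviour above can be tried out in userspace. This is only a sketch of the idea, not the module's actual code: demo_read is a hypothetical stand-in for newgetpid_read, and a plain array store stands in for put_user. The point is that the handler advances *offset and returns 0 once everything has been delivered; if the offset stayed at 0, every read would return the full message again and cat would print it forever.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

static const char message[] = "your PID is unknown!\n";

/* Userspace stand-in for a device read handler (hypothetical name).
 * Copies up to len bytes of message into buf, starting at *offset.
 * Returns the number of bytes copied, 0 when nothing is left, -1 on error. */
static ssize_t demo_read(char *buf, size_t len, long *offset)
{
    size_t msglen = sizeof(message) - 1;   /* 21, not counting the NUL */
    size_t i, n;

    if (*offset < 0)
        return -1;                         /* error */
    if ((size_t)*offset >= msglen)
        return 0;                          /* no more bytes to read */

    n = msglen - (size_t)*offset;
    if (n > len)
        n = len;
    for (i = 0; i < n; i++)
        buf[i] = message[(size_t)*offset + i];  /* a kernel module would use put_user here */

    *offset += (long)n;                    /* without this, the reader never sees 0 */
    return (ssize_t)n;
}
```

A reader such as cat keeps calling read until it gets 0, so the second call with the updated offset is what ends the output.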
To figure out whether we return the right PID, add a printk
How do we convert an int to its string representation in order to output it?
Line 39: create a buffer called message: char message[100];
snprintf: like printf, but it writes into a buffer
Why does the PID keep increasing? Because each cat spawns a new process
Why not use printf instead of snprintf? Since we don't have stdout in the kernel; snprintf prints to a buffer.
printk: goes to the kernel log
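As an illustration of the int-to-string step, here is a userspace sketch using the standard C snprintf (the kernel provides its own function of the same name). format_pid_message and the exact message text are assumptions made for this example.

```c
#include <stdio.h>

/* Format a PID into a caller-supplied buffer (hypothetical helper).
 * Returns the number of characters stored, not counting the NUL,
 * or -1 if the message did not fit in size bytes. */
static int format_pid_message(char *buf, size_t size, int pid)
{
    int n = snprintf(buf, size, "Your PID is %d!\n", pid);

    if (n < 0 || (size_t)n >= size)
        return -1;          /* output was truncated or an error occurred */
    return n;
}
```

The returned length is what a read handler would use as msglen, instead of a hard-coded 21, because the length now depends on how many digits the PID has.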
Reading from a file is just an API
A REST API is similar to the OS in this respect: we can read and write anything
With a device file: ?? //ask the prof
To implement a new device, get the major and minor numbers and the device name, then register a device file with its functions
Every device has its own read and write functions; they have to be custom defined
They are special, and you can do this to any file
FUSE (Filesystem in Userspace): lets a userspace program implement a filesystem
Why do we use goto? Normally we should not use it, and it can only jump within the same function. In kernel code it plays the role of raising an exception, which is essential since the kernel has to manage resources: when an error occurs, everything allocated so far has to be released.
failed_devreg: the label where deallocation happens
printf: it's not defined in kernel code; it is defined in the C library, and we don't have stdout
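The goto pattern can be sketched in ordinary C. This is a userspace illustration, not the module's code: struct demo_device, demo_setup, demo_teardown and the fail_step knob are all hypothetical, with malloc standing in for kernel resource allocation. Each label undoes the steps that had already succeeded when the error happened.

```c
#include <stdlib.h>

struct demo_device {
    char *name_buf;
    char *data_buf;
};

/* fail_step is a knob for this example only:
 * 0 = everything succeeds, 1 = first allocation fails, 2 = second fails. */
static int demo_setup(struct demo_device *dev, int fail_step)
{
    dev->name_buf = NULL;
    dev->data_buf = NULL;

    dev->name_buf = (fail_step == 1) ? NULL : malloc(32);
    if (!dev->name_buf)
        goto failed_name;

    dev->data_buf = (fail_step == 2) ? NULL : malloc(4096);
    if (!dev->data_buf)
        goto failed_data;

    return 0;

failed_data:
    /* The first allocation succeeded, so it has to be released. */
    free(dev->name_buf);
    dev->name_buf = NULL;
failed_name:
    return -1;
}

/* Release everything a successful demo_setup acquired. */
static void demo_teardown(struct demo_device *dev)
{
    free(dev->data_buf);
    free(dev->name_buf);
    dev->data_buf = NULL;
    dev->name_buf = NULL;
}
```

The labels are listed in reverse order of acquisition, so jumping to a later failure label falls through the earlier cleanups.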
	
Textbook: no more readings for further lectures
Active learning rather than passive
Important Commands:
ls vm: display the directories in the VM
ls -la vm ?:
sshfs:
getpid: process ID
rm ones*: remove the ones files
sudo rmmod newgetpid: unload the module
sudo insmod newgetpid.ko: load the module
lsmod | grep new: check that the module is loaded
man getpid: information on getting the PID of a process
ls -la: for directories and links
mkdir: make a directory
uname -r: print the release of the running kernel
First number on the left (in ls -la output): the number of hard links, e.g. 21. Every subdirectory holds a link back to its parent directory (..), so the count grows with the number of subdirectories.
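The link count that ls -la prints can also be read from a program with stat. This is a small sketch; link_count is a name made up for this example. On traditional Unix filesystems a directory's count is 2 plus its number of subdirectories, though some filesystems report directory link counts differently.

```c
#include <sys/stat.h>

/* Return the hard link count of path, or -1 if stat fails. */
static long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

For example, link_count(".") reports the same number that ls -la shows for the current directory.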
Additional Notes: snprintf is essentially printf with its output redirected to a buffer of bounded size. This is particularly useful for avoiding repetition of a formatted string.
* Understand the "flow of control" inside the program
  - architecture
  - division of responsibilities
  - purpose of data structures/objects
Scheduling
----------
How does the kernel maintain control?
 - CPU is given entirely to userspace processes to run most of the time
Kernel needs to run when it needs to run
 - and no process should be able to stop it
*The interrupt table*
 - pointers to code that the CPU runs when it receives different hardware
   or software interrupts
 - since the kernel runs first, it gets to set the interrupt table
 - only supervisor code can change the table, not user code
So when the ethernet card receives data...
...the ethernet card sends an interrupt to the kernel
...the CPU calls the kernel code for handling ethernet data
When the clock generates a timer interrupt...
...the CPU calls the kernel code for handling timer interrupts
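One way to picture the interrupt table is as an array of function pointers indexed by interrupt number. The following is a userspace simulation only: real tables are set up through architecture-specific mechanisms, and every name here (interrupt_table, VEC_TIMER, raise_interrupt, and so on) is made up for illustration.

```c
#include <stddef.h>

#define NUM_VECTORS 4
enum { VEC_TIMER = 0, VEC_ETHERNET = 1 };

typedef void (*handler_fn)(void);

/* The "interrupt table": one code pointer per interrupt number. */
static handler_fn interrupt_table[NUM_VECTORS];

static int timer_ticks;
static int packets_received;

static void timer_handler(void)    { timer_ticks++; }
static void ethernet_handler(void) { packets_received++; }

/* The kernel runs first, so it gets to fill in the table. */
static void install_handlers(void)
{
    interrupt_table[VEC_TIMER] = timer_handler;
    interrupt_table[VEC_ETHERNET] = ethernet_handler;
}

/* What the CPU does on an interrupt: look up the vector and jump to it.
 * Returns 0 if a handler ran, -1 if the vector is invalid or empty. */
static int raise_interrupt(int vector)
{
    if (vector < 0 || vector >= NUM_VECTORS || interrupt_table[vector] == NULL)
        return -1;
    interrupt_table[vector]();
    return 0;
}
```

In this model the device or clock only supplies the vector number; which code runs is decided entirely by whoever filled in the table.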
In order to process an interrupt, a CPU core has to be taken over
 - that core was probably running a userspace process
Scheduling is all about what to do after having kicked a userspace process
off of a core
Normally on a core
* userspace process is running
* interrupt happens
* core switches to supervisor mode, runs kernel code
* last part of kernel code is scheduler, chooses which userspace code to run
* goto top :-)
Kernel is entered via interrupts, exited via the scheduler
Entry and exit to the kernel have to do low-level tasks like changing CPU modes
and managing CPU registers
 - must be written in assembly
Most OS kernels minimize this low-level code
What criteria should the scheduler use in deciding what to run?
 - "fairness"
   - prevent starvation
   - absent other conditions, equal allocation of resources
 - bias resources towards "foreground" tasks in interactive systems
   - never biased enough
   - always hacks, heuristics
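The simplest policy that meets the "fairness" criterion is round-robin: pass the CPU around in a fixed order and skip tasks that cannot run. This sketch shows that idea only; it is not how the Linux scheduler works, and struct task and pick_next are hypothetical names.

```c
#include <stddef.h>

struct task {
    int pid;
    int runnable;   /* nonzero if the task is ready to use the CPU */
};

/* Round-robin choice: starting just after the task that ran last,
 * return the index of the next runnable task, or -1 if none is runnable.
 * Pass last = -1 when nothing has run yet. */
static int pick_next(const struct task *tasks, size_t n, int last)
{
    size_t step;

    for (step = 1; step <= n; step++) {
        size_t idx = (size_t)(last + (int)step) % n;
        if (tasks[idx].runnable)
            return (int)idx;
    }
    return -1;
}
```

Because the search always starts after the task that ran last, every runnable task gets a turn before any task gets a second one, which is what prevents starvation. Biasing towards foreground tasks would mean departing from this equal rotation.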
Memory overcommitment
 - it is possible to allocate way more memory than can ever be used
 - goes into "memory debt"
Midterm Review
1. execve, and no, it does not create a new process
2. Dynamically linked programs map in dynamically linked libraries
3. Hard links map filenames to inodes
4. Sleep until the consumer wakes it
5. Using type conversion: read returns the number of bytes it wrote into the buffer. We must convert that back to a number of unsigned longs, minus one because the maximum index of an array is one less than its length
6. uses of signals:
- kill process
- find out when child process is terminated
- start/stop a process
- detect when a program has made a pointer error
7. Pointers in C contain virtual memory addresses because the memory of a process is a virtual address space. Each process thinks it has its own stretch of memory.
8. Processes make a system call to allocate memory: execve, then mmap and/or sbrk to allocate memory
9. The shell opens the file output.txt.
10. mmap maps the contents of a file into RAM.
- comparing 1 MB of both files at a time will be less expensive
- only makes sense to mmap if you will access files multiple times
11. producer and consumer now have no way of talking to each other because they do not share the same memory.
12. No, there is a race condition. The while loop loops while sem < 0; someone else could access and modify the variable between the while test and the sem--;.
- You need special instructions to test and set the variable at the same time.
- Also, the int sem parameter isn't a pointer, so it's a local variable.
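Both fixes in answer 12 can be sketched with C11 atomics. This is not the code from the midterm, only an illustration under assumed names (sem_try_down, sem_up): the semaphore is passed by pointer so the caller's variable is the one modified, and the test and the decrement happen in a single atomic compare-and-exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take the semaphore: decrement *sem only if it is positive.
 * The check and the update are one atomic step, so no other thread can
 * change the value in between. Returns true if the semaphore was taken. */
static bool sem_try_down(atomic_int *sem)
{
    int old = atomic_load(sem);

    while (old > 0) {
        if (atomic_compare_exchange_weak(sem, &old, old - 1))
            return true;
        /* On failure, old now holds the current value; test it again. */
    }
    return false;
}

/* Release the semaphore. */
static void sem_up(atomic_int *sem)
{
    atomic_fetch_add(sem, 1);
}
```

A blocking wait would loop on sem_try_down, or sleep until woken; the key point is that the decision and the decrement can no longer be separated by another thread.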
Additional Notes
How do you do kernel hacking? Some tips for low level kernel code:
Be humble; You cannot know how everything works
Verify your assumptions as you go
Perform lots of experiments
Incremental Development, compile and run often
Check for errors ALWAYS or they will cascade and hurt you later
- This will save you from later errors
- Kernel has to live cleanly
- This principle applies to any large scale coding
Find another part of the kernel that is close to what you want to do. (Search online! This is a good link: http://elixir.free-electrons.com/linux/v4.4.83/source)
You do not understand all abstractions and assumptions, so "pattern match" to minimize trouble 
- Functionality that you want to implement might already exist in the kernel, look at how they do things and follow the pattern.
- Refine understanding as you go, the other implementation might be for something else. 
 
Realize that the assumptions behind code you don't necessarily match to your own. Spend the time reading other people's code. 
Understand the "flow of control" inside the program
- Architecture
- Division of responsibilities
- Purpose of data structures/objects
For Tutorial 6, the professor gave us a link for a reason. If you click on it you can search for getpid and then dig deeper! For example, cut and paste the line inside the getuid function.
Scheduling
At a high level, the scheduler decides which process should run and when
Questions: How does the scheduler work? How does the kernel control when a program runs? How can you make sure one program does not take the CPU away from everyone else?
CPU is given entirely to userspace processes to run most of the time
Kernel needs to run when it needs to run and no process should be able to stop it
- The interrupt table: pointers to code that the CPU runs when it receives different hardware or software interrupts. Since the kernel runs first it gets to set the interrupt table. Only supervisor code can change the table not user code.
When the Ethernet card receives data: 
1) The Ethernet card sends an interrupt to the kernel 
2) The CPU calls the kernel code for handling Ethernet data  
When the clock generates a timer interrupt: 
1) The CPU calls the kernel code for handling timer interrupts 
In order to process an interrupt, a CPU core has to be taken over
- That core was probably running a userspace process
Scheduling is all about what to do after having kicked a userspace process off of a core
Normally on a core:
- Userspace process is running
- Interrupt happens 
- core switches to supervisor mode, runs kernel code
- last part of kernel code is scheduler, chooses which userspace code to run. Repeat the process
Kernel is entered via interrupts, exited via the scheduler
Scheduler needs to be efficient because interrupts are frequent and we do not want to waste resources that could be put towards actual computing
Entry and exit to the kernel have to do low-level tasks like changing CPU modes and managing CPU registers. This is assembly code.
Go to http://elixir.free-electrons.com/linux/v4.4.83/source
Go into arch: code specific to each CPU architecture. The two most common are arm and x86
Go up one level
The drivers directory has drivers for all kinds of things
The Linux kernel was designed for x86, but it is now abstracted to support different architectures.
Go into arch -> x86 -> entry
Open the entry .S file (e.g. entry_64.S): a large assembly language file. This is the code that runs before system call specific code.
sched.h -> Contains the definition of task_struct, the data structure that keeps track of a process.
What criteria should the scheduler use in deciding what to run?
- Fairness
  - Starvation is when a program does not get the CPU
  - Everyone should get a turn
  - CPU is passed around
  - Nobody should be left so that they never get the CPU
  - Prevent starvation
  - Absent other conditions, equal allocation of resources
- Bias resources towards "foreground" tasks in interactive systems
  - never biased enough
  - always hacks, heuristics
Memory overcommitment
Memory does not get allocated when you mmap. It is loaded "lazily" in the Linux kernel. It is possible to allocate way more memory than can ever be used. 
Goes into "memory debt"