Operating Systems 2017F Lecture 23: Difference between revisions
Latest revision as of 04:20, 12 December 2017
Video
[http://homeostasis.scs.carleton.ca/~soma/os-2017f/lectures/comp3000-2017f-lec23-07Dec2017.mp4 Lecture 23 Video]
Notes
In Class
Lecture 23
----------
How can you tell when a process has been compromised?
 - from outside the process
Classic: signatures
 - is it running "bad code"
 - is it doing "bad things"
   - bad system calls
For a process to do damage, it has to make "bad" system calls
How can I tell if a process is making bad system calls?
I want to be lazy
 - complex rules are a pain
 - and, they don't work well either
Make the computer solve the problem for me of determining what is good and bad
 - use machine learning
But I can't teach good versus bad because I don't know bad very well
But...I do know how systems "normally" behave
How about teaching the system to differentiate normal from abnormal?
 - normal is "good"
 - abnormal may be bad
abnormal but not bad => false positive
How do we detect abnormal system calls?
Learn normal patterns of system calls over time
once you've learned enough, watch for abnormal system calls
Since I'm lazy, I want it to learn as it runs
 - and automatically decide when it has learned enough
Could I do the learning in a process (or set of processes)?
 - you could, but all data would have to come from the kernel
Want something fast and simple
 - implement in the kernel
How simple could it be?
First assumption: ignore arguments
Second assumption: look at ordering of system calls on a per-thread, per-process basis
Third assumption: characterize processes based on the executable they are running
 - model per executable, each trained on multiple processes
How to model the trace of system calls coming from a process?
* frequency analysis?
  - on a system call basis
  - high variance
* what system calls are made (and not made)?
* short sequences of system calls?
  6-10
Additional notes:
Assignment 4 Review:
11) dd writes in blocks, so each block written is one write system call
If it were a read, then both bs and ibs would be correct
You can also run dd under strace and see how many bytes each write transfers
12) The kernel doesn't make system calls; processes do. A system call is a transition from userspace into the kernel. You can do function system
Someone has tampered with a process; how can you tell, from outside the process, that it has been compromised?
- Classic approach: signatures and pattern matching (is it running bad code, does it contain bad strings, etc.)
  o A process doing bad things, such as making bad system calls, may damage the system
  o Ex: the password program starts modifying files other than the password file
- Refine the question: how can I identify if a process is making bad system calls?
  o Inspecting this while the program is running is inefficient and expensive
  o Be lazy: don't write rules by hand, since that doesn't scale
  o Make the computer solve this problem and determine whether a call is bad or good by using machine learning
    - However, I can't teach it the difference between good and bad because I don't know bad very well
    - But I do know how the computer normally behaves
- Refine the question again: how about teaching the system to differentiate normal from abnormal?
  o Normal is good
  o Abnormal may be bad
- False positives are bad (especially in security)
- Refine the question: how do we detect abnormal system calls?
  o By learning normal patterns of system calls over time, then watching for abnormal system calls
- System calls are complicated: some take arguments, like execve, and some don't, like fork. Learning all of that complexity is hard
- Learn as it runs:
  o Automatically decide when it has learned enough
- You can do the learning in a process, but you want it to be fast and simple, so implement it in the kernel
- You want something really simple that runs as you go, but how simple could it be?
  o Do strace xclock
  1) Ignore arguments
  2) Analyze the ordering of system calls on a per-thread, per-process basis
  3) Characterize system call behavior according to the executable being run
    a. Example: the model for xclock can't be based on ls
    b. Model per executable (can do frequency analysis)
- Refine: how to model the trace of system calls coming from a process?
  -> Frequency analysis
    - ex: is xclock behaving weirdly?
    - run strace on different programs and watch the variation in the pattern of sequences; this can detect whether a program has been compromised
  - What system calls are made and not made?
  - Short sequences
    - 6 to 10
Additional Notes
Written solutions for midterm exam are on the course webpage 
Assignment 4 
Q2: ssh-keygen generates both the private (secret) key and the public key file. The private key is stored in the private key file: .ssh/id_rsa
Q4: Both lines, because first you start from 1 and then increment from there.
Q11: Only bs, because it is a write. It would be ibs and bs if it were a read.
Q12: Local kernel forwards the write system call but doesn't actually make the system call. Kernels don't make system calls.
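The Q11 point, that dd writes in blocks and each block is one write system call, can be sketched in Python. This is only an illustration and not how dd itself is implemented: `copy_in_blocks` and `count_writes` are hypothetical helpers, and `os.write` is used because each invocation issues one write call.

```python
import os
import tempfile

def copy_in_blocks(data, fd, block_size):
    """Write data to fd in block_size chunks; return the number of os.write calls."""
    calls = 0
    for start in range(0, len(data), block_size):
        chunk = data[start:start + block_size]
        # os.write may write fewer bytes than requested, so loop until the chunk is done.
        while chunk:
            written = os.write(fd, chunk)
            calls += 1
            chunk = chunk[written:]
    return calls

def count_writes(total_bytes, block_size):
    """Write total_bytes of zeros to a temporary file and count the write calls."""
    data = b"\0" * total_bytes
    fd, path = tempfile.mkstemp()
    try:
        return copy_in_blocks(data, fd, block_size)
    finally:
        os.close(fd)
        os.remove(path)
```

The same amount of data takes fewer write calls as the block size grows, which is what the size argument of each write shows when the real dd is run under strace.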
Lecture 23 Prof Notes 
How can you tell when a process has been compromised?
- from outside the process 
Use signatures 
- is it running "bad code" 
- is it doing "bad things" 
-- For example: the password program starts modifying files other than /etc/passwd. You could specify rules to prevent this.
-- bad system calls
For a process to do damage, it has to make "bad" system calls 
I want to be lazy 
- complex rules are a pain
- and they don't work well either
Make the computer solve this problem 
- Use Machine Learning 
I can't teach good versus bad if I don't know what is bad 
But I know how systems behave normally 
How about teaching the system to differentiate normal from abnormal 
- normal is good 
- abnormal may be bad 
abnormal but not bad => false positive 
False positives can be a big issue because they may cause people to stop trusting the machine's detection capabilities
How can we detect abnormal system calls? 
Learn normal patterns of system calls over time 
Once you've learned enough, watch for abnormal system calls 
Since I'm lazy, I want it to learn as it runs
- and automatically decide when it has learned enough 
Could I do the learning in a process (or set of processes)? 
you could, but all data would have to come from the kernel 
Want something fast and simple 
- implement in the kernel 
 
How simple could it be? 
First assumption: ignore arguments 
Second assumption: look at ordering of system calls on a per-thread, per-process basis 
Third assumption: characterize processes based on the executable they are running
- model per executable, each trained on multiple processes
How to model the trace of system calls coming from a process?
- frequency analysis?
  - on a system call basis
  - high variance
- what system calls are made (and not made)?
- short sequences of system calls? 6-10 calls
Lecture 23
How can you tell a process has been compromised (i.e. from outside the process)?
- The process is working on behalf of an attacker
Classic way to do this:
- Pattern matching -> signatures
  - is the process running bad code?
  - is the process doing bad things?
    - e.g. /etc/passwd -> only the password program should be able to modify it
    - if a process is going to do bad things, it's going to make "bad" system calls
 
 
 
So, how can we tell if a process is making bad system calls?
Don't want to sit and write complex rules to determine:
- Which programs should make which system calls, etc.
  - i.e. policy-based systems and sandboxing of processes
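To see why hand-written rules are a pain, here is a minimal sketch of a policy table in Python. Everything in it is an assumption for illustration: the `POLICY` table, the executable paths, and the allowed call lists are invented, not taken from any real sandboxing system.

```python
# Hypothetical hand-written policy: executable path -> system calls it may make.
POLICY = {
    "/bin/ls": {"openat", "getdents64", "fstat", "write", "close", "exit_group"},
    "/usr/bin/passwd": {"openat", "read", "write", "rename", "close", "exit_group"},
}

def violations(executable, calls):
    """Return the calls in the trace that the policy does not allow."""
    allowed = POLICY.get(executable)
    if allowed is None:
        # No rule was written for this program, so every call counts as a violation.
        return list(calls)
    return [call for call in calls if call not in allowed]
```

A real ls typically also makes calls such as mmap and brk during startup, so a table like this would flag a healthy ls. Keeping such tables complete for every program is the work the lecture wants to avoid.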
 
 
Therefore, we want the computer to determine what call is good/bad.
- i.e. use machine learning
The issue is that we would have to demonstrate not just "good", but also "bad"
We have lots of examples of "bad", but they are not necessarily representative of all "bad" behaviour
It is difficult to enumerate all possible occurrences of "bad"
However, we know how systems "normally" behave
How about teaching the system to differentiate normal from abnormal?
- Assume:
  - normal is "good"
  - abnormal may be bad
    - there is no guarantee that abnormal is bad; however, if something is bad but not abnormal... we're in trouble
    - false positives are bad (i.e. abnormal but not bad)
 
 
 
How do we detect abnormal system calls?
- a machine learning problem
- the system should learn as it runs and decide when it has learned "enough"
  - learn normal patterns of system calls over time
    - once it has learned enough, watch for abnormal system calls
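One way to picture "learn as it runs and decide when it has learned enough" is the sketch below. The notes do not say how "enough" is decided; the stopping rule here (stop training after `quiet_needed` consecutive observations with nothing new) is an assumption made for illustration, and `OnlineProfile` is a hypothetical name.

```python
class OnlineProfile:
    """Learns a set of patterns while running and stops learning once novelty dries up."""

    def __init__(self, quiet_needed=1000):
        self.known = set()                # patterns seen during training
        self.quiet = 0                    # consecutive observations with nothing new
        self.quiet_needed = quiet_needed  # assumed threshold for "learned enough"
        self.trained = False

    def observe(self, pattern):
        """Return True if the pattern is flagged abnormal; during training, learn it instead."""
        if pattern in self.known:
            if not self.trained:
                self.quiet += 1
                if self.quiet >= self.quiet_needed:
                    self.trained = True
            return False
        if self.trained:
            return True
        # Still training: remember the new pattern and restart the quiet count.
        self.known.add(pattern)
        self.quiet = 0
        return False
```

A pattern can be any hashable value, such as a single call name or a tuple of consecutive calls. The threshold trades training time against false positives: a small value ends training early and leaves normal behaviour unlearned.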
 
 
 
Could we do the learning within processes?
- possible, but all data would have to come from the kernel
Want something fast and simple, so it can be implemented in the kernel
- you're right at ground level, where decisions are being made
  - i.e. if a bad system call is being made -> can stop it immediately
- don't want to be training a neural network to do this -> too complicated, too much overhead
Thu 7 Dec 2017 13:53:01 EST -> Video of observing system calls, ls vs. xclock
First assumption:
ignore the arguments passed to system calls -> look at the calls themselves
- but, different processes invoke different calls -> how to compare them?
- even a multi-threaded process will mirror the structure of its code in the calls it makes
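Ignoring arguments can be tried from userspace on strace output. The sketch below assumes the typical strace line layout, where each line starts with the call name followed by an opening parenthesis; `call_names` and `CALL_RE` are hypothetical names, and lines that do not fit the pattern (exit and signal notices, for example) are skipped.

```python
import re

# Matches the call name at the start of a typical strace line, e.g.
#   openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
CALL_RE = re.compile(r"^([a-z_][a-z0-9_]*)\(")

def call_names(strace_lines):
    """Yield only the system call names, dropping arguments and return values."""
    for line in strace_lines:
        match = CALL_RE.match(line.strip())
        if match:
            yield match.group(1)
```

What remains is a plain sequence of names such as execve, brk, openat, which is the only input the rest of the approach needs.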
Second assumption:
look at the ordering of system calls on a per-thread, per-process basis
- doesn't make sense to think of 'ls' system calls in the context of 'xclock' system calls
Therefore, any profiling will be based on the code being executed
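A small sketch of the per-thread view, assuming the calls arrive as `(thread_id, call_name)` pairs in the order the kernel saw them. The pair format and the `split_by_thread` helper are assumptions for illustration. Calls from different threads are interleaved by scheduling, so ordering is only meaningful inside one thread's stream.

```python
from collections import defaultdict

def split_by_thread(events):
    """Group (thread_id, call_name) events into one ordered stream per thread."""
    streams = defaultdict(list)
    for thread_id, call in events:
        streams[thread_id].append(call)
    return dict(streams)
```

For example, an interleaved stream of open, socket, read, connect, close from threads 101 and 102 separates into open, read, close for one thread and socket, connect for the other.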
Third assumption:
characterize processes based on the executable they are running
- model per executable, with each one trained on multiple processes
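The third assumption might be sketched as a store with one profile per executable path, where every process running that executable contributes to the same profile. `ProfileStore` is a hypothetical name, and the profile used here is just the set of call names seen, chosen as the simplest placeholder; any other model could sit in its place.

```python
from collections import defaultdict

class ProfileStore:
    """One profile per executable path, shared by every process running it."""

    def __init__(self):
        self.profiles = defaultdict(set)

    def train(self, executable, calls):
        """Fold one process's trace into the profile of its executable."""
        self.profiles[executable].update(calls)

    def unseen(self, executable, calls):
        """Return the calls in this trace that the executable's profile has never seen."""
        known = self.profiles.get(executable, set())
        return [call for call in calls if call not in known]
```

Training ls on two different runs widens its profile, while a socket call that is normal for xclock still shows up as unseen for ls because the two profiles are kept apart.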
How do we model the trace of system calls coming from a process?
- How often do different system calls happen? -> frequency analysis
  - high variance -> the counts change a lot from run to run
    - e.g. ls of a large dir vs. small dir
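The variance problem with frequency analysis can be seen in a few lines. The two traces below are made up to stand in for ls on a small and on a large directory; they are not real strace output, and `call_frequencies` is a hypothetical helper.

```python
from collections import Counter

def call_frequencies(calls):
    """Return each call's share of the trace as a fraction between 0 and 1."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}

# Made-up traces standing in for ls on a small and on a large directory.
small_dir = ["openat", "getdents64", "write", "close"]
large_dir = ["openat"] + ["getdents64"] * 50 + ["write"] * 40 + ["close"]
```

Both traces are normal behaviour, yet the share of getdents64 moves from 0.25 in the small trace to more than half in the large one. A frequency threshold tight enough to catch an attack would also misfire on runs like these.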
 
 
- What system calls does a process make or not make?
- Rather than examining whether a process does or doesn't make a particular system call, look at the short sequences of system calls being made.
  - What is the variation in the pattern of sequences of calls being made? A compromised program should be detectable by the unfamiliar sequences it produces.
  - Table lookup: record the sequences made by a program and compare new sequences against them
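The table lookup of short sequences can be sketched as a sliding window over the call names. This is a minimal userspace illustration under assumptions, not the in-kernel implementation discussed in lecture: the helper names are hypothetical, and the default window length of 6 is taken from the low end of the 6 to 10 range in the notes.

```python
def windows(calls, length=6):
    """Return every run of `length` consecutive calls as a tuple."""
    return [tuple(calls[i:i + length]) for i in range(len(calls) - length + 1)]

def learn(traces, length=6):
    """Build the lookup table: the set of all windows seen in normal traces."""
    table = set()
    for calls in traces:
        table.update(windows(calls, length))
    return table

def mismatches(table, calls, length=6):
    """Return the windows of a new trace that are missing from the table."""
    return [window for window in windows(calls, length) if window not in table]
```

Because the windows overlap, a single unexpected call appears in up to `length` of them, so one odd call produces a burst of mismatches rather than a single one.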
 
 
 
How short is a short sequence of system calls? -> 6 to 10 calls
When a program is running, the short sequences define the control flow path of the program
The short sequences together represent the control flow
When a program is exploited, an abnormal control flow, an uncommon path, is being used
Try the simple hack first, rather than designing/engineering a complex solution
- the simple hack will often present valuable insights