Operating Systems 2017F Lecture 23: Difference between revisions
Revision as of 16:16, 9 December 2017
Video
Notes
In Class
Lecture 23
----------

How can you tell when a process has been compromised?
 - from outside the process

Classic: signatures
 - is it running "bad code"
 - is it doing "bad things"
   - bad system calls

For a process to do damage, it has to make "bad" system calls

How can I tell if a process is making bad system calls?

I want to be lazy
 - complex rules are a pain
 - and, they don't work well either

Make the computer solve the problem for me of determining what is good and bad
 - use machine learning

But I can't teach good versus bad because I don't know bad very well

But...I do know how systems "normally" behave

How about teaching the system to differentiate normal from abnormal?
 - normal is "good"
 - abnormal may be bad

abnormal but not bad => false positive

How do we detect abnormal system calls?

Learn normal patterns of system calls over time
once you've learned enough, watch for abnormal system calls

Since I'm lazy, I want it to learn as it runs
 - and automatically decide when it has learned enough

Could I do the learning in a process (or set of processes)?
 - you could, but all data would have to come from the kernel

Want something fast and simple
 - implement in the kernel

How simple could it be?

First assumption: ignore arguments

Second assumption: look at ordering of system calls on a per-thread, per-process basis

Third assumption: characterize processes based on the executable they are running
 - model per executable, each trained on multiple processes

How to model the trace of system calls coming from a process?
 * frequency analysis?
   - on a system call basis
   - high variance
 * what system calls are made (and not made)?
 * short sequences of system calls? 6-10
Additional Notes
Written solutions for midterm exam are on the course webpage 
Assignment 4 
Q2: ssh-keygen generates both the private (secret) key and the public key file. The private key is stored in the private key file: .ssh/id_rsa
Q4: Both lines because first you start from 1 and then increment from there. 
Q11: Only bs because it is write. Would be ibs and bs if it was read. 
Q12: Local kernel forwards the write system call but doesn't actually make the system call. Kernels don't make system calls.
Lecture 23 Prof Notes 
How can you tell when a process has been compromised?
- from outside the process 
Use signatures 
- is it running "bad code" 
- is it doing "bad things" 
-- For example: the password program starts modifying files other than /etc/passwd. You could specify rules to prevent this.
-- bad system calls
For a process to do damage, it has to make "bad" system calls 
I want to be lazy 
- complex rules are a pain 
 
- and they don't work well either 
 
Make the computer solve this problem 
- Use Machine Learning 
I can't teach good versus bad if I don't know what is bad 
But I know how systems behave normally 
How about teaching the system to differentiate normal from abnormal 
- normal is good 
- abnormal may be bad 
abnormal but not bad => false positive 
False positives can be a big issue because they may cause people not to trust the machine's detection capabilities
How can we detect abnormal system calls? 
Learn normal patterns of system calls over time 
Once you've learned enough, watch for abnormal system calls 
Since I'm lazy, I want to learn it as it runs 
- and automatically decide when it has learned enough 
Could I do the learning in a process (or set of processes)? 
you could, but all data would have to come from the kernel 
Want something fast and simple 
- implement in the kernel 
 
How simple could it be? 
First assumption: ignore arguments 
Second assumption: look at ordering of system calls on a per-thread, per-process basis 
Third Assumption: characterize processes based on the executable they are running 
model per executable, each trained on multiple processes 
How to model the trace of system calls coming from a process?
- frequency analysis? 
 
- on a system call basis 
- high variance 
- what system calls are made (and not made)? 
 - short sequence of system calls? 6-10 calls 
 
Lecture 23
How can you tell a process has been compromised (i.e. from outside the process)?
- The process is working on behalf of an attacker
 
Classic way to do this:
- Pattern matching -> signatures
  - is the process running bad code?
  - is the process doing bad things?
    - e.g. /etc/passwd -> only the password program should be able to modify it
    - if a process is going to do bad things, it's going to make "bad" system calls
 
So, how can we tell if a process is making bad system calls?
Don't want to sit and write complex rules to determine:
- Which programs should make which system calls, etc.
  - e.g. policy-based systems and sandboxing of processes

Therefore, we want the computer to determine which calls are good or bad.
- i.e. use machine learning
 
The issue is that we would have to demonstrate not just "good" but also "bad"
We have lots of examples of "bad", but they are not necessarily representative of "bad"
It is difficult to enumerate all possible occurrences of "bad"
However, we know how systems "normally" behave
How about teaching the system to differentiate normal from abnormal?
- Assume:
  - normal is "good"
  - abnormal may be bad
    - there is no guarantee that abnormal is bad; however, if something is bad but not abnormal, we're in trouble
    - false positives (i.e. abnormal but not bad) are a problem
 
How do we detect abnormal system calls?
- a machine learning problem
- the system should learn as it runs and decide when it has learned "enough"
  - learn normal patterns of system calls over time
    - once it has learned enough, watch for abnormal system calls
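
The notes do not say how the system decides it has learned "enough". As a minimal sketch of one plausible rule (an assumption, not the lecture's method), the learner below stops training once it has seen a run of observations with nothing new in it. The class name `Learner`, the parameter `quiet_needed`, and the idea of a generic hashable "pattern" are all hypothetical.

```python
class Learner:
    """Learns 'normal' patterns, then flags unseen ones (hypothetical sketch)."""

    def __init__(self, quiet_needed=1000):
        self.known = set()                # patterns accepted as normal
        self.quiet = 0                    # observations since the last new pattern
        self.quiet_needed = quiet_needed  # assumed threshold for "learned enough"

    @property
    def trained(self):
        return self.quiet >= self.quiet_needed

    def feed(self, pattern):
        """Return True if the pattern is flagged as abnormal."""
        if pattern in self.known:
            self.quiet += 1
            return False
        if self.trained:
            return True                   # unseen pattern after training: abnormal
        self.known.add(pattern)           # still training: accept it as normal
        self.quiet = 0
        return False


learner = Learner(quiet_needed=3)
for p in ["a", "b", "a", "a", "a"]:
    learner.feed(p)
print(learner.trained, learner.feed("c"), learner.feed("b"))
```

A small `quiet_needed` ends training sooner but risks more false positives, since legitimate but rare behaviour may not have been seen yet.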
 
Could we do the learning within processes?
- possible, but all data would have to come from the kernel
 
Want something fast and simple, so it can be implemented in the kernel
- you're right at ground level, where decisions are being made
  - i.e. if a bad system call is being made -> can stop it immediately
- don't want to be training a neural network to do this -> too complicated, too much overhead
 
Thu 7 Dec 2017 13:53:01 EST -> Video of observing system calls, ls vs. xclock
First assumption:
ignore the arguments system calls are making -> look at the calls themselves
- but different processes invoke different calls -> how to compare them?
- even multi-threaded processes will mirror the structure of the code in the calls they make
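
To make the first assumption concrete, here is a small sketch that keeps only the call names from a trace and throws away arguments and return values. The input format is an assumption modelled on typical strace-style output; the sample lines and the helper name `call_names` are made up for illustration.

```python
import re

# Matches the call name at the start of a line such as
#   openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
# optionally preceded by a "[pid 1234]" prefix. The format is an assumption.
CALL_NAME = re.compile(r"^\s*(?:\[pid\s+\d+\]\s+)?([a-z_][a-z0-9_]*)\(")


def call_names(trace_lines):
    """Return only the system call names, ignoring arguments and results."""
    names = []
    for line in trace_lines:
        match = CALL_NAME.match(line)
        if match:  # lines that are not calls (signals, exit notices) are skipped
            names.append(match.group(1))
    return names


sample = [
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    'read(3, "root:x:0:0"..., 4096) = 1024',
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
    "close(3) = 0",
    "+++ exited with 0 +++",
]
print(call_names(sample))
```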
 
Second assumption:
look at the ordering of system calls on a per-thread, per-process basis
- doesn't make sense to think of 'ls' system calls in the context of 'xclock' system calls
 
Therefore, any profiling will be based on the code being executed
Third assumption:
characterize processes based on the executable they are running
- model per executable, with each one trained on multiple processes
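
The second and third assumptions together suggest a simple bookkeeping scheme: keep each thread's calls in their own ordered stream, and file every stream under the executable that produced it. The sketch below illustrates that idea only; `TraceCollector`, its methods, and the example path and call names are hypothetical.

```python
from collections import defaultdict


class TraceCollector:
    """Groups system calls per executable, keeping each thread's order intact."""

    def __init__(self):
        # executable path -> (pid, tid) -> ordered list of call names
        self.by_exe = defaultdict(lambda: defaultdict(list))

    def record(self, exe, pid, tid, call):
        self.by_exe[exe][(pid, tid)].append(call)

    def traces(self, exe):
        """All per-thread traces seen so far for one executable."""
        return list(self.by_exe.get(exe, {}).values())


collector = TraceCollector()
# Two processes running the same executable, with their calls interleaved in time.
events = [
    ("/bin/ls", 1, 1, "openat"),
    ("/bin/ls", 2, 2, "mmap"),
    ("/bin/ls", 1, 1, "close"),
    ("/bin/ls", 2, 2, "write"),
]
for exe, pid, tid, call in events:
    collector.record(exe, pid, tid, call)
print(collector.traces("/bin/ls"))
```

Because the streams are keyed by (pid, tid), interleaving between processes never produces an ordering that no single thread actually executed, while all of them still train the same per-executable model.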
 
How do we model the trace of system calls coming from a process?
- How often do different system calls happen? -> frequency analysis
  - high variance -> the call counts change from run to run
    - e.g. ls of a large dir vs. small dir
- Which system calls does a process make or not make?
- Rather than examining whether a process does or doesn't make a particular system call, look at the short sequences of system calls being made.
  - What is the variation in the pattern of sequences of calls being made? A compromised program will be detectable because its sequences differ from those seen before.
  - Table lookup: record the sequences made by a program and compare new sequences against them
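
A toy comparison of the first and last options, under loudly stated assumptions: the traces are synthetic stand-ins for ls (not real recordings), the call names are illustrative, and the window of 6 is simply the low end of the 6-10 range. The per-call counts differ greatly between a small and a large directory, while the set of short sequences is the same.

```python
from collections import Counter

WINDOW = 6  # low end of the 6-10 range


def windows(trace, size=WINDOW):
    """Set of all length-`size` runs of consecutive calls in the trace."""
    return {tuple(trace[i:i + size]) for i in range(len(trace) - size + 1)}


def fake_ls(entries):
    # Synthetic trace: one read-directory/write pair per directory entry.
    return ["openat"] + ["getdents64", "write"] * entries + ["getdents64", "close"]


small, large = fake_ls(4), fake_ls(200)

# Frequency analysis: the counts vary with the size of the directory.
print(Counter(small)["write"], Counter(large)["write"])

# Short sequences: both runs produce the same table of windows.
print(windows(small) == windows(large), len(windows(small)))
```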
 
How short is a short sequence of system calls? -> 6 to 10
When a program is running, the short sequences define the control flow path of the program
The short sequences together represent the control flow
When a program is exploited, an abnormal control flow, an uncommon path, is being used
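
As a rough sketch of the table lookup idea applied to detection: build a table of windows from normal traces, then count how many windows of a new trace are missing from it. Both traces below are invented for illustration, and the function names `train` and `mismatches` are hypothetical.

```python
def windows(trace, size=6):
    """All length-`size` runs of consecutive calls, in order."""
    return [tuple(trace[i:i + size]) for i in range(len(trace) - size + 1)]


def train(traces, size=6):
    """Table of every short sequence seen in the normal traces."""
    table = set()
    for trace in traces:
        table.update(windows(trace, size))
    return table


def mismatches(table, trace, size=6):
    """Number of windows in the trace that the table has never seen."""
    return sum(1 for w in windows(trace, size) if w not in table)


# Invented traces: the second one takes an unusual path partway through.
normal = ["openat", "fstat", "read", "read", "close",
          "write", "write", "exit_group"]
exploited = ["openat", "fstat", "read", "execve", "socket",
             "connect", "write", "exit_group"]

table = train([normal])
print(mismatches(table, normal), mismatches(table, exploited))
```

A single unusual call contaminates every window that overlaps it, so one deviation in control flow shows up as several mismatches in a row.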
Try the simple hack first, rather than designing/engineering a complex solution
- the simple hack will often present valuable insights