Operating Systems 2017F Lecture 23: Difference between revisions

From Soma-notes
Aleksp (talk | contribs)

Revision as of 16:16, 9 December 2017

Video

Lecture 23 Video

Notes

In Class

Lecture 23
----------

How can you tell when a process has been compromised?
 - from outside the process

Classic: signatures
 - is it running "bad code"
 - is it doing "bad things"
   - bad system calls

For a process to do damage, it has to make "bad" system calls

How can I tell if a process is making bad system calls?

I want to be lazy
 - complex rules are a pain
 - and, they don't work well either

Make the computer solve the problem for me of determining what is good and bad
 - use machine learning

But I can't teach good versus bad because I don't know bad very well

But...I do know how systems "normally" behave

How about teaching the system to differentiate normal from abnormal?
 - normal is "good"
 - abnormal may be bad

abnormal but not bad => false positive


How do we detect abnormal system calls?

Learn normal patterns of system calls over time
once you've learned enough, watch for abnormal system calls

Since I'm lazy, I want it to learn as it runs
 - and automatically decide when it has learned enough

Could I do the learning in a process (or set of processes)?
 - you could, but all data would have to come from the kernel

Want something fast and simple
 - implement in the kernel

How simple could it be?

First assumption: ignore arguments


Second assumption: look at ordering of system calls on a per-thread, per-process basis

Third assumption: characterize processes based on the executable they are running

 - model per executable, each trained on multiple processes

How to model the trace of system calls coming from a process?

* frequency analysis?
  - on a system call basis
  - high variance

* what system calls are made (and not made)?

* short sequences of system calls?
  6-10
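
As a rough illustration of the second option above ("what system calls are made (and not made)"), here is a minimal Python sketch. The function names and the example traces are invented for illustration; they are not from the lecture.

```python
def learn_call_set(traces):
    """Collect every system call name seen in the normal traces."""
    seen = set()
    for trace in traces:
        seen.update(trace)
    return seen


def unseen_calls(normal_calls, trace):
    """Return calls in trace that never appeared in training, in first-seen order."""
    flagged = []
    for call in trace:
        if call not in normal_calls and call not in flagged:
            flagged.append(call)
    return flagged


# Hypothetical traces: lists of system call names, arguments already dropped.
normal = learn_call_set([
    ["open", "read", "write", "close"],
    ["open", "read", "close"],
])
print(unseen_calls(normal, ["open", "read", "execve", "close"]))
```

A model like this ignores ordering, so a trace that reuses only known calls in an unusual order would not be flagged. That gap is one reason to look at short sequences instead.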


Additional Notes

Written solutions for midterm exam are on the course webpage
Assignment 4
Q2: ssh-keygen generates the private (secret) key file and the public key file. The private key is stored in the private key file: .ssh/id_rsa
Q4: Both lines because first you start from 1 and then increment from there.
Q11: Only bs, because it is a write. It would be ibs and bs if it were a read.
Q12: Local kernel forwards the write system call but doesn't actually make the system call. Kernels don't make system calls.
Lecture 23 Prof Notes


How can you tell when a process has been compromised?
- from outside the process
Use signatures
- is it running "bad code"
- is it doing "bad things"
-- For example: the password program starts modifying files other than /etc/passwd. You could specify rules to prevent this.
-- bad system calls
For a process to do damage, it has to make "bad" system calls
I want to be lazy
- complex rules are a pain
- and they don't work well either
Make the computer solve this problem
- Use Machine Learning
I can't teach good versus bad if I don't know what is bad
But I know how systems behave normally
How about teaching the system to differentiate normal from abnormal?
- normal is good
- abnormal may be bad
abnormal but not bad => false positive
False positives can be a big issue because they may cause people to stop trusting the machine's detection capabilities

How can we detect abnormal system calls?
Learn normal patterns of system calls over time
Once you've learned enough, watch for abnormal system calls
Since I'm lazy, I want it to learn as it runs
- and automatically decide when it has learned enough

Could I do the learning in a process (or set of processes)?
you could, but all data would have to come from the kernel
Want something fast and simple
- implement in the kernel
How simple could it be?

First assumption: ignore arguments
Second assumption: look at ordering of system calls on a per-thread, per-process basis
Third assumption: characterize processes based on the executable they are running
- model per executable, each trained on multiple processes
How to model the trace of system calls coming from a process?

  • frequency analysis?

- on a system call basis
- high variance

  • what system calls are made (and not made)?
  • short sequence of system calls? 6-10 calls
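
The "high variance" objection to frequency analysis can be seen in a small Python sketch. The two traces below are made up to stand in for ls on a small and a large directory; real traces would differ.

```python
from collections import Counter


def relative_frequencies(trace):
    """Map each system call name to its share of the whole trace."""
    counts = Counter(trace)
    total = len(trace)
    return {call: n / total for call, n in counts.items()}


# Invented traces: the same program doing the same job on different inputs.
small_dir = ["openat", "getdents64", "write", "close"]
large_dir = ["openat"] + ["getdents64", "write"] * 50 + ["close"]

for name, trace in (("small", small_dir), ("large", large_dir)):
    freqs = relative_frequencies(trace)
    print(name, {call: round(f, 3) for call, f in freqs.items()})
```

Both runs are normal, yet the frequency profiles look very different, so a frequency threshold tuned on one would likely misjudge the other.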
Lecture 23

How can you tell a process has been compromised (i.e. from outside the process)?

  • The process is working on behalf of an attacker


Classic way to do this:

  • Pattern matching -> signatures
      • is the process running bad code?
      • is the process doing bad things?
          • e.g. /etc/passwd -> only the password program should be able to modify it
          • if a process is going to do bad things, it's going to make "bad" system calls


So, how can we tell if a process is making bad system calls?
Don't want to sit and write complex rules to determine:

  • Which programs should make which system calls, etc.
      • i.e. policy-based systems and sandboxing of processes


Therefore, we want the computer to determine what call is good/bad.

  • i.e. use machine learning


The issue is that we have to demonstrate not just "good", but also "bad"
We have lots of examples of "bad", but they are not necessarily representative of all "bad" behaviour
It is difficult to enumerate all possible occurrences of "bad"

However, we know how systems "normally" behave
How about teaching the system to differentiate normal from abnormal?

  • Assume:
      • normal is "good"
      • abnormal may be bad
          • there is no guarantee that abnormal is bad; however, if something is bad but not abnormal... we're in trouble
          • false positives are a problem (i.e. abnormal but not bad)


How do we detect abnormal system calls?

  • a machine learning problem
  • the system should learn as it runs and decide when it has learned "enough"
      • learn normal patterns of system calls over time
          • once it has learned enough, watch for abnormal system calls


Could we do the learning within processes?

  • possible, but all data would have to come from the kernel


Want something fast and simple, so it can be implemented in the kernel

  • you're right at ground level, where decisions are being made
      • i.e. if a bad system call is being made -> it can be stopped immediately
  • don't want to be training a neural network to do this -> too complicated, too much overhead


Thu 7 Dec 2017 13:53:01 EST -> Video of observing system calls, ls vs. xclock

First assumption:

ignore the arguments system calls are making -> look at the calls themselves

  • but, different processes invoke different calls -> how to compare them?
  • even multi-threaded processes will mirror the structure of the code in the calls they make

Second assumption:

look at the ordering of system calls on a per-thread, per-process basis

  • doesn't make sense to think of 'ls' system calls in the context of 'xclock' system calls


Therefore, any profiling will be based on the code being executed

Third assumption:

characterize processes based on the executable they are running

  • model per executable, with each one trained on multiple processes


How do we model the trace of system calls coming from a process?

  • How often do different system calls happen? -> frequency analysis
      • high variance -> the call counts vary widely from run to run
          • e.g. ls of a large dir vs. small dir
  • What system calls does a process make or not make?
  • Rather than examining whether a process does or doesn't make a particular system call, look at short sequences of system calls being made.
      • What is the variation in the pattern of sequences of calls being made? A compromised program should produce sequences that differ from the learned ones, which makes it detectable.
      • Store the sequences made by a program in a table and look up new sequences against it


How short is a short sequence of system calls? -> 6 to 10

When a program is running, the short sequences define the control flow path of the program
The short sequences together represent the control flow
When a program is exploited, an abnormal control flow, an uncommon path, is being used

Try the simple hack first, rather than designing/engineering a complex solution

  • the simple hack will often present valuable insights
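
One possible way to "learn as it runs and decide when it has learned enough" is sketched below in Python. The stopping rule (stop training after `quiet_period` observations in a row with nothing new) is an assumption made for this sketch, not something stated in the lecture; the class and parameter names are likewise hypothetical.

```python
class OnlineLearner:
    """Learns patterns while running, then switches to detection."""

    def __init__(self, quiet_period=1000):
        self.known = set()
        self.quiet_period = quiet_period  # assumed stopping threshold
        self.since_new = 0                # observations since the last new pattern
        self.training = True

    def observe(self, pattern):
        """Return True if the pattern is flagged as abnormal."""
        if self.training:
            if pattern in self.known:
                self.since_new += 1
            else:
                self.known.add(pattern)
                self.since_new = 0
            if self.since_new >= self.quiet_period:
                self.training = False     # nothing new for a while: stop learning
            return False                  # never flag anything during training
        return pattern not in self.known


learner = OnlineLearner(quiet_period=3)
for p in ["a", "b", "a", "b", "a"]:
    learner.observe(p)
print(learner.training, learner.observe("c"), learner.observe("a"))
```

The pattern can be any hashable value, for example a tuple of system call names.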
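
A minimal Python sketch of the first assumption: keep the call names and throw away the arguments. It assumes trace lines shaped like strace's default output (`name(args) = result`); the sample lines are invented, and lines that do not start with a call name are skipped.

```python
import re

# Matches the call name at the start of a line such as: close(3) = 0
CALL_NAME = re.compile(r"^(\w+)\(")


def call_names(lines):
    """Extract system call names, ignoring arguments and return values."""
    names = []
    for line in lines:
        match = CALL_NAME.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


sample = [
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    'read(3, "root:x:0:0", 4096) = 11',
    "close(3) = 0",
    "+++ exited with 0 +++",
]
print(call_names(sample))
```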
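
The second and third assumptions together might look like the Python sketch below: recent calls are tracked per (pid, tid), while learned sequences are pooled per executable. `ProfileStore` and its fields are hypothetical names, and the window of 3 in the example is only to keep it short.

```python
from collections import defaultdict, deque


class ProfileStore:
    def __init__(self, window=6):
        self.window = window
        self.profiles = defaultdict(set)  # executable path -> set of sequences
        self.recent = {}                  # (pid, tid) -> last `window` calls

    def record(self, exe, pid, tid, call):
        """Add one call from one thread to the profile of its executable."""
        key = (pid, tid)
        if key not in self.recent:
            self.recent[key] = deque(maxlen=self.window)
        calls = self.recent[key]
        calls.append(call)
        if len(calls) == self.window:
            self.profiles[exe].add(tuple(calls))


store = ProfileStore(window=3)
# Two separate ls processes, interleaved in time, train the same model.
for call in ["open", "read", "close"]:
    store.record("/bin/ls", pid=100, tid=100, call=call)
    store.record("/bin/ls", pid=200, tid=200, call=call)
print(store.profiles["/bin/ls"])
```

Because windows are kept per thread, interleaved calls from different processes never get mixed into one sequence.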
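
The table lookup of short sequences can be sketched in Python as below. Function names and traces are invented for illustration; the default window length of 6 follows the 6 to 10 range from the lecture, and the example uses 3 only to stay short.

```python
def windows(trace, k=6):
    """All length-k runs of consecutive calls in the trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_table(normal_traces, k=6):
    """Table of every short sequence seen in normal behaviour."""
    table = set()
    for trace in normal_traces:
        table.update(windows(trace, k))
    return table


def mismatches(table, trace, k=6):
    """Sequences in a new trace that are missing from the table."""
    return [w for w in windows(trace, k) if w not in table]


normal_trace = ["open", "read", "read", "write", "close"]
table = build_table([normal_trace], k=3)

# Same calls as the normal trace, but in a different order.
reordered = ["open", "write", "read", "read", "close"]
print(len(mismatches(table, normal_trace, k=3)))
print(len(mismatches(table, reordered, k=3)))
```

The reordered trace uses only calls that appear in the normal trace, so a check on which calls are made would pass it, but none of its short sequences are in the table.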