Operating Systems 2017F Lecture 23

Video

Notes

In Class

Lecture 23
----------

How can you tell when a process has been compromised?
 - from outside the process

Classic: signatures
 - is it running "bad code"
 - is it doing "bad things"
   - bad system calls

For a process to do damage, it has to make "bad" system calls

How can I tell if a process is making bad system calls?

I want to be lazy
 - complex rules are a pain
 - and, they don't work well either

Make the computer solve the problem for me of determining what is good and bad
 - use machine learning

But I can't teach good versus bad because I don't know bad very well

But...I do know how systems "normally" behave

How about teaching the system to differentiate normal from abnormal?
 - normal is "good"
 - abnormal may be bad

abnormal but not bad => false positive


How do we detect abnormal system calls?

Learn normal patterns of system calls over time
once you've learned enough, watch for abnormal system calls

Since I'm lazy, I want it to learn as it runs
 - and automatically decide when it has learned enough

Could I do the learning in a process (or set of processes)?
 - you could, but all data would have to come from the kernel

Want something fast and simple
 - implement in the kernel

How simple could it be?

First assumption: ignore arguments


Second assumption: look at ordering of system calls on a per-thread, per-process basis

Third assumption: characterize processes based on the executable they are running

 - model per executable, each trained on multiple processes

How to model the trace of system calls coming from a process?

* frequency analysis?
  - on a system call basis
  - high variance

* what system calls are made (and not made)?

* short sequences of system calls?
  6-10

Additional notes:

Assignment 4 Review:

11) dd writes data in blocks, so each write is one system call. If it were a read, then both bs and ibs would be correct. You can also run dd under strace and see how many bytes each write transfers.

12) The kernel doesn't make system calls; processes do. A system call is a transition from userspace into the kernel. You can do function system

Someone has compromised a process. How can you identify, from outside the process, that it has been compromised?
 - classic approach: signatures and pattern matching (is it running bad code, does it contain bad strings, etc.)
 - is the process doing bad things, such as making system calls that may damage the system?
   - example: a password program starts modifying files other than the password file

Refine the question: how can I identify whether a process is making bad system calls?
 - inspecting this while the program is running is inefficient and expensive
 - be lazy: don't write rules at scale
 - make the computer solve the problem and determine what is good or bad by using machine learning
   - however, I can't teach it the difference between good and bad, because I don't know bad very well
   - I do know how the computer normally behaves

Refine the question again: how about teaching the system to differentiate normal from abnormal?
 - normal is good
 - abnormal may be bad
 - false positives are bad (especially in security)

Refine the question: how do we detect abnormal system calls?
 - learn normal patterns of system calls over time, then watch for abnormal system calls
 - system calls are complicated: some take arguments, like execve, and some don't, like fork, so learning all of their complexity is hard
 - learn as it runs
   - and automatically decide when it has learned enough
 - you can do the learning in a process, but you want it to be fast and simple, so implement it in the kernel
 - you want something really simple that runs as you go, but how simple could it be?
   - try: strace xclock
   1) ignore arguments
   2) analyze the ordering of system calls on a per-thread, per-process basis
   3) characterize system call behavior according to the executable being run
      a. example: xclock can't be modeled based on ls
      b. model per executable (can do frequency analysis)

Refine: how to model the trace of system calls coming from a process?
 - frequency analysis
   - example: is xclock behaving weirdly?
 - run strace on different programs and watch the variation in the pattern of sequences; this can detect whether a program has been compromised
 - what system calls are made and not made?
 - short sequences of 6-10 system calls

Additional Notes

Written solutions for midterm exam are on the course webpage
Assignment 4
Q2: ssh-keygen generates the private key and the public key files. The private key is stored in the private key file: .ssh/id_rsa
Q4: Both lines because first you start from 1 and then increment from there.
Q11: Only bs because it is write. Would be ibs and bs if it was read.
Q12: Local kernel forwards the write system call but doesn't actually make the system call. Kernels don't make system calls.
Lecture 23 Prof Notes

How can you tell when a process has been compromised?
- from outside the process
Use signatures
- is it running "bad code"
- is it doing "bad things"
-- For example: the password program starts modifying files other than /etc/passwd. You could specify rules to prevent this.
-- bad system calls
For a process to do damage, it has to make "bad" system calls
I want to be lazy
- complex rules are a pain
- and they don't work well either
Make the computer solve this problem
- Use Machine Learning
I can't teach good versus bad if I don't know what is bad
But I know how systems behave normally
How about teaching the system to differentiate normal from abnormal
- normal is good
- abnormal may be bad
abnormal but not bad => false positive
False positives can be a big issue because they may cause people to stop trusting the machine's detection capabilities

How can we detect abnormal system calls?
Learn normal patterns of system calls over time
Once you've learned enough, watch for abnormal system calls
Since I'm lazy, I want it to learn as it runs
- and automatically decide when it has learned enough

Could I do the learning in a process (or set of processes)?
you could, but all data would have to come from the kernel
Want something fast and simple
- implement in the kernel
How simple could it be?

First assumption: ignore arguments
Second assumption: look at ordering of system calls on a per-thread, per-process basis
Third Assumption: characterize processes based on the executable they are running
model per executable, each trained on multiple processes
How to model the trace of system calls coming from a process?

frequency analysis?

- on a system call basis
- high variance

what system calls are made (and not made)?
short sequence of system calls? 6-10 calls

Lecture 23

How can you tell a process has been compromised (i.e. from outside the process)?

The process is working on behalf of an attacker

Classic way to do this:

Pattern matching -> signatures

is the process running bad code?
is the process doing bad things?

e.g. the password program should only be able to access /etc/passwd
if a process is going to do bad things, it's going to make "bad" system calls

So, how can we tell if a process is making bad system calls?
Don't want to sit and write complex rules to determine:

Which programs should make which system calls, etc.

i.e. policy based systems and sandboxing of processes

Therefore, we want the computer to determine what call is good/bad.

i.e. use machine learning

The issue is, we have to demonstrate, not just "good", but also "bad"
We have lots of examples of "bad", but they are not necessarily representative of everything that is "bad"
Difficult to enumerate all possible occurrences of "bad"

However, we know how systems "normally" behave
How about teaching the system to differentiate normal from abnormal?

Assume:

normal is "good"
abnormal may be bad

there is no guarantee that abnormal is bad, however, if it's bad, but not abnormal... we're in trouble
false positives are bad (i.e. abnormal but not bad)

How do we detect abnormal system calls?

a machine learning problem
the system should learn as it runs and decide when it has learned "enough"

learn normal patterns of system calls over time

once learned enough, watch for abnormal system calls
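A minimal sketch of "learn as it runs, then watch". The notes do not say how the system decides it has learned enough, so the stopping rule here is an assumption: training ends after `quiet_calls` consecutive system calls have produced no new sequence. The class name, the threshold, and the toy trace are all hypothetical.

```python
from collections import deque


class OnlineProfile:
    """Learns short system call sequences online, then flags unseen ones."""

    def __init__(self, k=6, quiet_calls=1000):
        self.k = k                      # sequence length
        self.quiet_calls = quiet_calls  # assumed threshold, not from the lecture
        self.table = set()              # sequences seen during training
        self.recent = deque(maxlen=k)   # the last k system calls
        self.calls_since_new = 0

    @property
    def trained(self):
        # Assumed rule: "enough" means no new sequence for quiet_calls calls.
        return self.calls_since_new >= self.quiet_calls

    def observe(self, name):
        """Feed one system call name; return True if it looks abnormal."""
        self.recent.append(name)
        if len(self.recent) < self.k:
            return False
        window = tuple(self.recent)
        if window in self.table:
            self.calls_since_new += 1
            return False
        if self.trained:
            return True                 # unseen sequence after training
        self.table.add(window)          # still learning: remember it
        self.calls_since_new = 0
        return False


# Toy run with tiny parameters so the behaviour is visible.
profile = OnlineProfile(k=2, quiet_calls=3)
normal = ["read", "write", "read", "write", "read", "write"]
flags = [profile.observe(call) for call in normal]
print(flags, profile.trained)     # all False, then True
print(profile.observe("execve"))  # True: (write, execve) was never seen
```

A real implementation would need to cope with rare but legitimate behaviour that shows up only after training ends, which is where false positives come from.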

Could we do the learning within processes?

possible, but all data would have to come from the kernel

Want something fast and simple, so it can be implemented in the kernel

you're right at ground-level, where decisions are being made

i.e. if a bad system call is being made -> can stop it immediately

don't want to be training a neural network to do this -> too complicated, too much overhead

Thu 7 Dec 2017 13:53:01 EST -> Video of observing system calls, ls vs. xclock

First assumption:

ignore the arguments system calls are making -> look at the calls themselves

but, different processes invoke different calls -> how to compare them?
even multi-threaded processes will mirror the structure of the code in the calls they make
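The first assumption can be illustrated by reducing strace-style output to bare call names. This is only a sketch: the regular expression covers the common `name(args) = result` line shape and the `[pid N]` prefix that `strace -f` adds, and the sample lines are made up for illustration.

```python
import re

# System call name at the start of a line, optionally after "[pid N] ".
SYSCALL_NAME = re.compile(r"^\s*(?:\[pid\s+\d+\]\s+)?([a-z_][a-z0-9_]*)\(")


def syscall_names(lines):
    """Return only the call names from strace-style lines, dropping arguments."""
    names = []
    for line in lines:
        match = SYSCALL_NAME.match(line)
        if match:  # skip signal and exit lines such as "+++ exited with 0 +++"
            names.append(match.group(1))
    return names


# Hypothetical trace lines.
sample = [
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    'read(3, "root:x:0:0:...", 4096) = 1024',
    'close(3) = 0',
    '+++ exited with 0 +++',
]
print(syscall_names(sample))  # ['openat', 'read', 'close']
```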

Second assumption:

look at the ordering of system calls on a per-thread, per-process basis

doesn't make sense to think of 'ls' system calls in the context of 'xclock' system calls

Therefore, any profiling will be based on the code being executed

Third assumption:

characterize processes based on the executable they are running

model per executable, with each one trained on multiple processes
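The second and third assumptions together might look like the sketch below: an interleaved stream of events is split into one ordered trace per (process, thread), and the traces are grouped under the executable that produced them, so a single model per executable can be trained on many processes. The event tuple layout and the sample events are assumptions for illustration.

```python
from collections import defaultdict


def group_traces(events):
    """Split interleaved events into per-thread traces, grouped by executable.

    Each event is (executable, pid, tid, syscall_name).
    Returns {executable: {(pid, tid): [syscall_name, ...]}}.
    """
    by_exe = defaultdict(lambda: defaultdict(list))
    for exe, pid, tid, name in events:
        by_exe[exe][(pid, tid)].append(name)
    return {exe: dict(threads) for exe, threads in by_exe.items()}


# Hypothetical interleaved events from two ls processes and one xclock.
events = [
    ("/bin/ls", 100, 100, "openat"),
    ("/usr/bin/xclock", 200, 200, "poll"),
    ("/bin/ls", 100, 100, "getdents64"),
    ("/bin/ls", 101, 101, "openat"),
    ("/usr/bin/xclock", 200, 200, "read"),
    ("/bin/ls", 100, 100, "close"),
]
traces = group_traces(events)
print(traces["/bin/ls"][(100, 100)])  # ['openat', 'getdents64', 'close']
print(len(traces["/bin/ls"]))         # 2 processes feed the one ls model
```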

How do we model the trace of system calls coming from a process?

How often do different system calls happen? -> frequency analysis
high variance -> the counts vary widely from one run to the next

i.e. ls of a large dir vs. small dir
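A small sketch of why frequency analysis has high variance. The two traces are invented stand-ins for the same program listing a small and a large directory; real traces would be much longer.

```python
from collections import Counter


def call_frequencies(trace):
    """Return the fraction of the trace taken up by each system call."""
    counts = Counter(trace)
    total = len(trace)
    return {name: count / total for name, count in counts.items()}


# Hypothetical traces of one program on two different inputs.
small_dir = ["openat", "getdents64", "write", "close"]
large_dir = ["openat"] + ["getdents64"] * 20 + ["write"] * 10 + ["close"]

print(call_frequencies(small_dir)["getdents64"])  # 0.25
print(call_frequencies(large_dir)["getdents64"])  # 0.625
```

Both runs are normal, yet the frequency profile shifts a lot with the input, so a frequency threshold would either miss attacks or raise false positives.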

What system calls does a process make or not make?

Rather than examining if a process does or doesn't make a particular system call, instead look at short sequences of system calls being made.

What is the variation in the pattern of sequences of calls being made? A compromised program will be detectable.

Keep a lookup table of the sequences a program has made and compare new sequences against it

How short is a short sequence of system calls? -> 6 to 10

When a program is running, the short sequences define the control flow path of the program
The short sequences together represent the control flow
When a program is exploited, an abnormal control flow, an uncommon path, is being used
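The table lookup over short sequences can be sketched as a sliding window over the trace. This is an illustration rather than the lecture's actual implementation: the function names are hypothetical, and the toy example uses a window of 3 so the output stays readable (the notes suggest 6 to 10).

```python
def windows(trace, k=6):
    """Return every run of k consecutive system calls in the trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def build_table(traces, k=6):
    """Collect all length-k sequences seen across normal traces."""
    table = set()
    for trace in traces:
        table.update(windows(trace, k))
    return table


def anomalies(table, trace, k=6):
    """Return the sequences in a new trace that are not in the table."""
    return [w for w in windows(trace, k) if w not in table]


# Toy data: one normal trace, and one where an unexpected execve appears.
normal = [["open", "read", "write", "close"]]
table = build_table(normal, k=3)

suspect = ["open", "read", "execve", "close"]
print(anomalies(table, normal[0], k=3))  # []
print(anomalies(table, suspect, k=3))
# [('open', 'read', 'execve'), ('read', 'execve', 'close')]
```

A single unexpected call changes every window that overlaps it, which is why an uncommon control flow path stands out even without looking at arguments.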

Try the simple hack first, rather than designing/engineering a complex solution

the simple hack will often present valuable insights