Operating Systems 2017F Lecture 23
Additional Notes
Written solutions for midterm exam are on the course webpage 
Assignment 4 
Q2: ssh-keygen generates both the private (secret) key and the public key. The private key is stored in the private key file: .ssh/id_rsa
Q4: Both lines because first you start from 1 and then increment from there. 
Q11: Only bs because it is write. Would be ibs and bs if it was read. 
Q12: Local kernel forwards the write system call but doesn't actually make the system call. Kernels don't make system calls.
Lecture 23 Prof Notes 
How can you tell when a process has been compromised?
- from outside the process 
Use signatures 
- is it running "bad code" 
- is it doing "bad things" 
-- For example: the passwd program starts modifying files other than /etc/passwd. You could specify rules to prevent this.
-- bad system calls
For a process to do damage, it has to make "bad" system calls 
I want to be lazy 
- complex rules are a pain
- and they don't work well either
Make the computer solve this problem 
- Use Machine Learning 
I can't teach good versus bad if I don't know what is bad 
But I know how systems behave normally 
How about teaching the system to differentiate normal from abnormal 
- normal is good 
- abnormal may be bad 
abnormal but not bad => false positive 
False positives can be a big issue because they may cause people to stop trusting the machine's detection capabilities
How can we detect abnormal system calls? 
Learn normal patterns of system calls over time 
Once you've learned enough, watch for abnormal system calls 
Since I'm lazy, I want to learn it as it runs 
- and automatically decide when it has learned enough 
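One way to picture "learn as it runs, then decide when it has learned enough" is the minimal sketch below. It is only an illustration, not the method from the lecture: the class name OnlineLearner, the quiet_limit parameter, and the stopping rule (stop training after a run of observations that add nothing new) are all assumptions made for this example.

```python
class OnlineLearner:
    """Learns a set of 'normal' observations, then flags anything unseen."""

    def __init__(self, quiet_limit=1000):
        self.normal = set()             # everything seen during training
        self.quiet_limit = quiet_limit  # hypothetical "learned enough" threshold
        self.quiet_run = 0              # consecutive observations that taught us nothing
        self.training = True

    def observe(self, item):
        """Feed one observation; return True if it is flagged as abnormal."""
        if self.training:
            if item in self.normal:
                self.quiet_run += 1
                if self.quiet_run >= self.quiet_limit:
                    self.training = False  # nothing new for a while: start detecting
            else:
                self.normal.add(item)
                self.quiet_run = 0         # something new: keep learning
            return False                   # never raise alarms while training
        return item not in self.normal


learner = OnlineLearner(quiet_limit=3)
for call in ["open", "read", "close", "open", "read", "close"]:
    learner.observe(call)
print(learner.training)
print(learner.observe("execve"))
print(learner.observe("read"))
```

A small quiet_limit ends training early and risks more false positives later; a large one delays detection. Picking that trade-off is exactly the part you would want the system to handle on its own.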
Could I do the learning in a process (or set of processes)? 
You could, but all of the data would have to come from the kernel
Want something fast and simple 
- implement in the kernel 
 
How simple could it be? 
First assumption: ignore arguments 
Second assumption: look at ordering of system calls on a per-thread, per-process basis 
Third assumption: characterize processes based on the executable they are running
- one model per executable, each trained on multiple processes
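The three assumptions above mostly determine how the trace data gets organized. A minimal sketch, assuming events arrive as hypothetical (exe, pid, tid, syscall, args) tuples (this format and the function name group_traces are made up for illustration, not a real kernel interface):

```python
from collections import defaultdict


def group_traces(events):
    """Organize raw events into per-executable, per-thread call sequences.

    events: iterable of (exe, pid, tid, syscall, args) tuples in arrival order.
    Returns {exe: {(pid, tid): [syscall, ...]}}.
    """
    per_exe = defaultdict(lambda: defaultdict(list))
    for exe, pid, tid, syscall, _args in events:
        # Assumption 1: arguments are ignored (dropped here).
        # Assumption 2: ordering is kept separately for each (pid, tid).
        # Assumption 3: all processes running the same executable share one entry.
        per_exe[exe][(pid, tid)].append(syscall)
    return per_exe


# Interleaved events from two ls processes and one passwd process.
events = [
    ("/bin/ls", 10, 10, "open", ("/etc/ld.so.cache",)),
    ("/usr/bin/passwd", 20, 20, "open", ("/etc/passwd",)),
    ("/bin/ls", 11, 11, "open", ("/etc/ld.so.cache",)),
    ("/bin/ls", 10, 10, "read", (3,)),
    ("/bin/ls", 11, 11, "close", (3,)),
]
for exe, threads in group_traces(events).items():
    print(exe, dict(threads))
```

Each executable's model would then be trained on all of the sequences stored under its entry, so one model sees many processes.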
How to model the trace of system calls coming from a process?
- frequency analysis?
-- on a system call basis
-- high variance
- what system calls are made (and not made)?
- short sequences of system calls? (6-10 calls)
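The short-sequence option can be sketched as a sliding window over each trace: remember every window seen during normal runs, then count windows never seen before. This is a minimal illustration under assumptions; the function names (windows, train, mismatches) and the sample traces are invented here, and a window length of 6 is simply the low end of the 6-10 range mentioned above.

```python
def windows(trace, n=6):
    """All length-n runs of consecutive calls in trace (empty if trace is shorter than n)."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def train(traces, n=6):
    """Build the set of 'normal' windows from a list of normal traces."""
    normal = set()
    for trace in traces:
        normal.update(windows(trace, n))
    return normal


def mismatches(trace, normal, n=6):
    """Windows in trace that never appeared during training."""
    return [w for w in windows(trace, n) if w not in normal]


normal_trace = ["open", "read", "mmap", "mmap", "open", "read", "close", "exit"]
odd_trace = ["open", "read", "mmap", "mmap", "open", "execve", "close", "exit"]

normal = train([normal_trace])
print(len(mismatches(normal_trace, normal)))
print(len(mismatches(odd_trace, normal)))
```

Note that a single unexpected call shows up in every window that overlaps it, so one odd call produces a burst of mismatches rather than a single one.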