SystemsSec 2018W Lecture 17
==Audio==

[https://homeostasis.scs.carleton.ca/~soma/systemssec-2018w/lectures/comp4108-2018w-lec17-14Mar2018.m4a Lecture 17 Audio]

==Notes==
What is Normal?
It's intuitive from our interactions with the world.
Anil's best definition: anomalies are a time-based set of rare circumstances or events.
In a computer system:
- What is our state after these events occur? Normal vs. abnormal state?
- Do we see these circumstances commonly?
Weirdness = badness, in the eyes of a sysadmin.
The role of an admin:
- Humans are supposed to be in complete control of their systems, so the human aspect of intrusion detection is always there in practice.
What is rare? What does the distribution of these "rare" events look like?
You get a distribution that follows Zipf's Law (a 1/x distribution).
Zipf's Law describes the frequency of words in human language in general, but the same shape shows up in computer systems when you rank events by frequency. Why? Certain things happen all the time, other things happen often but less frequently, and then there are events that happen very rarely.
The tail of the distribution is where the anomalies lie in computer systems.
This implies a degree of difficulty, since the tail never goes to 0.
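As a rough illustration (not from the lecture, and the event counts are made up), the sketch below draws a synthetic event log from a 1/x distribution: a handful of event types dominates, while the long tail of rare types never quite empties out, and that tail is where anomalies live.

<pre>
import random
from collections import Counter

# Hypothetical event alphabet: event i is drawn with probability proportional
# to 1/i, i.e. a Zipf-like (1/x) distribution over event types.
events = list(range(1, 1001))
weights = [1.0 / i for i in events]

random.seed(0)
sample = random.choices(events, weights=weights, k=100_000)
counts = Counter(sample)

# The most common events dominate; the tail is long and never reaches zero.
print("top 5 event types:", counts.most_common(5))
rare = [e for e in events if counts.get(e, 0) <= 2]
print(f"{len(rare)} of {len(events)} event types were seen at most twice")
</pre>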
Based on this, the definition of "rare" for an admin depends on how much they want to pay attention to details.
It's the human perspective that determines how many false positives you're going to deal with (how many alerts a human will handle on a daily basis). Ideally, you want to deal with at most 1 or 2 anomalies a day.
Investigating anomalies takes time, which is generally an expensive resource. Some organizations, mostly military, are fine with dedicating large amounts of time to anomaly detection.
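To make the "1 or 2 anomalies a day" budget concrete, here is a back-of-the-envelope calculation (the numbers are invented for illustration, not taken from the lecture):

<pre>
# Hypothetical numbers, chosen only to illustrate the alert-budget problem.
events_per_day = 1_000_000     # events the detector scores each day
false_positive_rate = 0.001    # it flags 0.1% of benign events

false_alarms_per_day = events_per_day * false_positive_rate
print(false_alarms_per_day)    # 1000 alerts/day, far beyond a 1-2 alert budget

# To stay within ~2 alerts/day, the false positive rate would have to be about:
print(2 / events_per_day)      # 2e-06, i.e. 0.0002%
</pre>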
With the increase in the scale of cloud-based systems, hardware failures are seen as more common.
Another example: warning prompts. Users have been habituated into thinking warning prompts are useless (they are way too frequent). Habituation matters when you are trying to convey importance.
If you want to involve humans, make sure the time scale is something they will work well with.
Signature-based: the reason these have low false alarm rates is that a lot of the false positives have been tuned out.
Specification-based: tune the rules to the point where they cover what normally happens in the system; at that point the human has already figured out the set of normal and abnormal behaviors (e.g. a firewall).
Firewall: watch and block traffic.
Spec-based Intrusion Detection System (IDS): generate alerts based on traffic surveillance.
Anomaly-detection-based IDS: the system learns the behavior, then models it in an attempt to determine what "normal" behavior is.
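A minimal sketch of the spec-based idea (the protocols, ports, and packet format are invented for illustration): a human writes down up front what normal traffic looks like, and anything outside that spec is blocked or alerted on. An anomaly-based IDS would instead learn the equivalent of this table from observed behavior.

<pre>
# Minimal sketch of a specification-based filter (hypothetical rules, not a
# real firewall). The spec is written by a human ahead of time and enumerates
# the allowed behavior.
ALLOWED = {
    ("tcp", 22),    # ssh
    ("tcp", 443),   # https
    ("udp", 53),    # dns
}

def check(packet):
    """Return 'allow' if the packet matches the spec, otherwise 'alert'."""
    return "allow" if (packet["proto"], packet["dport"]) in ALLOWED else "alert"

print(check({"proto": "tcp", "dport": 443}))    # allow
print(check({"proto": "tcp", "dport": 31337}))  # alert: outside the spec
</pre>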
Why can't you make it so that only safe things are allowed?
- Newspeak (from 1984 by George Orwell): a redefined language in which it was impossible to say things against the state.
- At the end of the day, security violations are just people doing bad things with computers... we can't build a system in which only safe things are possible.
- This is why anomaly detection is the most promising method: it essentially automates what humans already do.
What do you look at? What do you model?
- Obvious approach: take as much data as possible, throw it into a Machine Learning (ML) program, and see what comes out.
- This is bad.
- ML is partitioning based on examples.
- Throwing in all the data you can find increases the dimension of the partitioning, making the problem exponentially harder (see the sketch below).
- Feature selection (normally the first part of solving an ML problem):
- If you use ML without domain knowledge, it just introduces biases.
- You not only need samples of good behavior, you also need samples of bad behavior... which is not feasible.
Start with a smaller but more accurate feature set, then expand as needed.
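A quick illustration of why piling on features hurts (illustrative numbers only): even if each feature is split into just 10 bins, the number of cells the learner has to cover grows exponentially with the number of features, so a fixed set of examples spreads ever thinner.

<pre>
# Illustrative only: partition each feature into 10 bins and count the cells a
# learner would have to populate with examples.
bins_per_feature = 10
examples = 1_000_000

for n_features in (2, 4, 8, 16):
    cells = bins_per_feature ** n_features
    print(f"{n_features:2d} features -> {cells:.0e} cells, "
          f"{examples / cells:.2e} examples per cell")
</pre>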
Programs exhibit high degrees of locality: if they access a byte, they tend to stay near that location.
Optimization: certain parts of your code are going to take up most of the execution time.
This follows Zipf's Law again, and the memory hierarchy takes advantage of it.
Programs act weird when their security is violated... does that make sense?
Example: code injection. Yes, that's weird: strange things will happen, and control flow will go crazy.
A backdoor can circumvent safety measures, thus changing behavior.
Yes, this hypothesis makes sense.
Although there is a tail of weirdness, we can still discover the weird behavior that we care about.
Most attacks will change the externally visible aspects of a program.
Programs are parsed into abstract syntax trees before they're turned into machine code and executed. Running a program is basically walking a path on that tree, and system calls are markers along the path.
If you can model system calls, you can possibly see abnormal behavior, but only if you do it quickly, with little overhead.
You can model system calls with a table lookup.
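A minimal sketch of that idea (the window size and the call traces are placeholders, in the spirit of sequence-based system-call modelling rather than any specific tool): record every short window of system calls seen during normal runs in a lookup table, then flag windows that never appeared during training.

<pre>
# Minimal sketch of modelling system calls with a table lookup. The traces and
# window size are invented placeholders, not real program data.
WINDOW = 3

def windows(trace, n=WINDOW):
    """Yield every length-n sliding window of a system-call trace."""
    for i in range(len(trace) - n + 1):
        yield tuple(trace[i:i + n])

# "Training": build the lookup table from traces of normal behavior.
normal_traces = [
    ["open", "read", "mmap", "read", "close"],
    ["open", "read", "read", "write", "close"],
]
normal = {w for trace in normal_traces for w in windows(trace)}

# "Detection": any window missing from the table is treated as anomalous.
def anomalies(trace):
    return [w for w in windows(trace) if w not in normal]

print(anomalies(["open", "read", "read", "write", "close"]))     # [] -- looks normal
print(anomalies(["open", "read", "execve", "socket", "write"]))  # several unseen windows
</pre>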