ToDo

Gather data from different IDS observables to show they aren't Gaussian
- system calls (Luc)
- network traffic
- log files
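
A minimal sketch of one way to check Gaussianity for any of the observables above, assuming each one has been reduced to a list of numbers (for example, events per time window). Skewness and excess kurtosis are both near 0 for Gaussian data. The two data sets below are synthetic stand-ins, not real IDS measurements, and the function name is hypothetical.

```python
import random
import statistics

def skewness_and_excess_kurtosis(samples):
    """Return (skewness, excess kurtosis); both are near 0 for Gaussian data."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Central moments of order 2, 3 and 4 (population form).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic stand-ins (assumptions), NOT real system call, network or log data.
gaussian_like = [rng.gauss(100.0, 15.0) for _ in range(20000)]
heavy_tailed = [rng.paretovariate(1.5) for _ in range(20000)]

for name, data in [("gaussian_like", gaussian_like), ("heavy_tailed", heavy_tailed)]:
    skew, kurt = skewness_and_excess_kurtosis(data)
    print(f"{name}: skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

Large positive values for a real observable would suggest a heavy right tail. A formal normality test would still be needed before making the claim in the paper.
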
Machine learning
- standard machine learning methods approximate distributions
- approximation works best if the data are Gaussian, but even then it has limits (show mathematically)
- non-Gaussian distributions place much harsher limits on error rates: do errors fail to decrease in proportion to sample size? (more math)
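
A rough sketch of the sample-size point, under assumptions that are not in these notes: samples are i.i.d. with a power-law tail of index alpha, and the error of interest is that of a sample mean. With finite variance (alpha > 2) the error shrinks like n^(-1/2). For 1 < alpha < 2 the generalized central limit theorem gives the slower rate n^(-(1 - 1/alpha)), ignoring slowly varying factors. The function names are hypothetical.

```python
def mean_error_exponent(alpha):
    """Exponent e such that the sample-mean error scales like n ** (-e).

    Assumes i.i.d. samples with power-law tail index alpha (an assumption).
    """
    if alpha <= 1:
        raise ValueError("the mean is undefined for tail index <= 1")
    # Finite variance: classical 1/sqrt(n) rate. Otherwise: slower stable-law rate.
    return 0.5 if alpha >= 2 else 1.0 - 1.0 / alpha

def sample_growth_to_halve_error(alpha):
    """Factor by which n must grow to cut the sample-mean error in half."""
    return 2.0 ** (1.0 / mean_error_exponent(alpha))

for alpha in (3.0, 1.5, 1.2):
    factor = sample_growth_to_halve_error(alpha)
    print(f"alpha={alpha}: need about {factor:.0f}x more samples to halve the error")
```

Under these assumptions, halving the error costs 4x the data in the finite-variance case, but about 8x at alpha = 1.5 and about 64x at alpha = 1.2.
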
Survey of results in IDS literature

Title

The Enemy of the Good: Re-evaluating Research Directions in Intrusion Detection

Poor results
- datasets do not represent real-world usage or scenarios accurately
- insufficient or misleading tests of false positive rates
- even when rates are accurate, they are misinterpreted: high FP rates are not recognized as high (wrong time scale, lack of attention to scalability)
- misleading integration of attacks into legitimate behavior
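
The time-scale point can be made concrete with a short calculation. This is a sketch only: the traffic volume, detection rate and attack fraction are hypothetical numbers chosen for illustration, not measurements, and the function names are made up here.

```python
def false_alarms_per_day(fp_rate, benign_events_per_day):
    """Expected number of false alarms per day at a given per-event FP rate."""
    return fp_rate * benign_events_per_day

def alarm_precision(tp_rate, fp_rate, attack_fraction):
    """Fraction of raised alarms that are real attacks (Bayes' rule)."""
    true_alarms = tp_rate * attack_fraction
    false_alarms = fp_rate * (1.0 - attack_fraction)
    return true_alarms / (true_alarms + false_alarms)

# Hypothetical scenario: a "low" 0.1% FP rate on one million benign events a day.
per_day = false_alarms_per_day(fp_rate=0.001, benign_events_per_day=1_000_000)

# Hypothetical scenario: 99% detection, 0.1% FP rate, 1 event in 100,000 is an attack.
precision = alarm_precision(tp_rate=0.99, fp_rate=0.001, attack_fraction=1e-5)

print(f"false alarms per day: {per_day:.0f}")
print(f"fraction of alarms that are real attacks: {precision:.4f}")
```

With these assumed numbers, a 0.1% FP rate means about 1000 false alarms a day, and roughly one alarm in a hundred corresponds to a real attack.
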
Administrative overhead
- rules can only be created by experts, yet the system requires end users to write custom rules
- experts required to interpret output
- insufficient context for even experts to interpret output
- assumes security personnel who will not exist in many important contexts
Computational overhead
- can system keep up with normal workloads?
- can system keep up with attacker-generated workloads?
Anomalies versus attacks
- why is one a good proxy for the other?
- why are the chosen features particularly good at detecting attacks?
Out-of-the-box algorithms applied without understanding the security problem
Attacker evasion: how can an attacker manipulate the system? Can the system create an environment that is easier to attack?