CCS2011: Enemy of the Good
Title
How to Evaluate Intrusion Detection Systems
Abstract
Introduction
- Evaluating non-adaptive IDSs (signature-based, specification-based) is like evaluating a programming language
- quality of individual solutions does not indicate quality of framework
- quality across all solutions might say something, but that is very hard to measure
- Adaptive intrusion detection has been seen as a machine learning problem, but it really isn't.
- Multiple papers criticizing specific ML approaches to IDS [CITES]
- Problem is general
- Why?
Sections:
- Problem
What Goes Wrong
- Poor results
- datasets do not represent real-world usage or scenarios accurately
- insufficient or misleading tests of false positive rates
- even when rates are measured accurately, they are misinterpreted: high FP rates are not recognized as high (wrong time scale, lack of attention to scalability); see the sketch below
- misleading integration of attacks into legitimate behavior
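A minimal sketch of the time-scale problem, in Python: a per-event false positive rate that sounds low becomes an unmanageable alert volume at realistic event rates. All numbers here (event rate, analyst alert budget) are hypothetical, chosen only for illustration.

    # Hypothetical numbers: a "low" per-event false positive rate at a realistic event rate.
    fp_rate = 0.001             # 0.1% false positives per event, often reported as "low"
    events_per_sec = 1_000      # modest event rate for one monitored site (assumed)
    analyst_budget = 100        # alerts per day one analyst can plausibly handle (assumed)

    events_per_day = events_per_sec * 60 * 60 * 24
    false_alarms_per_day = fp_rate * events_per_day

    print(f"events per day:       {events_per_day:,}")
    print(f"false alarms per day: {false_alarms_per_day:,.0f}")
    print(f"analysts needed just to triage false alarms: "
          f"{false_alarms_per_day / analyst_budget:.0f}")

At these assumed rates, 0.1% per event is roughly 86,000 false alarms per day. Reporting only the per-event rate, without converting it to alarms per unit time at deployment scale, invites exactly this misinterpretation.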
- Administrative overhead
- rules that only experts can create, yet the system expects end users to write custom rules
- experts required to interpret output
- insufficient context for even experts to interpret output
- assumption of dedicated security personnel who won't exist in many important contexts
- Computational overhead
- can system keep up with normal workloads?
- can system keep up with attacker-generated workloads? (see the sketch below)
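A sketch of how the two workload questions above could be checked, assuming Python. The ToyDetector class, its process() method, the event sizes, and the arrival rate are all stand-ins invented for illustration; the point is to compare sustained processing rate against the expected event arrival rate for both benign traffic and traffic an attacker crafts to be expensive to process.

    import time

    class ToyDetector:
        """Hypothetical stand-in for an IDS under evaluation."""
        def process(self, event: bytes) -> bool:
            # Per-event cost grows with event size (e.g., naive pattern matching).
            return b"attack" in event

    def throughput(detector, events):
        """Events processed per second over the given workload."""
        start = time.perf_counter()
        for ev in events:
            detector.process(ev)
        return len(events) / (time.perf_counter() - start)

    detector = ToyDetector()
    arrival_rate = 10_000                               # assumed events/sec at the site

    benign  = [b"x" * 200    for _ in range(50_000)]    # typical small events (assumed)
    hostile = [b"x" * 20_000 for _ in range(50_000)]    # attacker-inflated events (assumed)

    for name, load in [("benign", benign), ("attacker-generated", hostile)]:
        rate = throughput(detector, load)
        verdict = "keeps up" if rate >= arrival_rate else "falls behind"
        print(f"{name:>18}: {rate:,.0f} events/sec -> {verdict} at {arrival_rate:,}/sec")

An evaluation that benchmarks only on benign or replayed traffic answers the first question but says nothing about the second; the attacker chooses the workload shape.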
- Anomalies versus attacks
- why is one a good proxy for the other?
- why are the chosen features particularly good at detecting attacks?
- Out-of-the-box algorithms applied without understanding the security problem
- Attacker evasion: how can an attacker manipulate the system? Can the system itself create an environment that is easier to attack?