Difference between revisions of "Talk:CCS2011: Enemy of the Good"

From Soma-notes
Revision as of 08:42, 21 March 2011

  • For IDS to work, we need very accurate detectors
    • base rate fallacy
    • specifically, very low false alarm rates
  • To date, nobody has achieved sufficiently low false alarm rates to be universally applicable
    • signature and specification methods can be tuned ad hoc to be good enough, but then have poor coverage of new attacks
    • adaptive methods cannot be sufficiently tuned
  • We argue that we cannot get low enough false alarm rates: there are fundamental limits on IDS performance due to the underlying distributions of legitimate and attacker behavior.
  • Reasons:
    • legitimate behavior is non-Gaussian, largely power-law like, meaning its distributions have fat tails
    • attacker behavior cannot be sampled sufficiently to learn distribution
    • and besides, attacker behavior keeps changing, both to follow new attack innovations (more like the spread of a disease than a Gaussian process; fundamentally not stationary) and to mimic legitimate behavior so as to evade defenders
    • if we could get good samples of both classes, we might be able to separate them; but instead we must do one-class learning, and one-class learning cannot deal well with very long tails.
    • "adaptive concept drift"

IDS Requirements

  • scalability in false alarms
  • detect wide range of attacks
    • realistically won't catch all attacks, but should go significantly beyond "just what I've seen" (otherwise cannot address attacker innovation)
  • low resource usage (network, CPU, storage/IO, user, administrator)
  • Stated this way, it looks like an ML problem

Machine Learning

  • many, many techniques
  • basic idea: combine a priori knowledge built into the learning method with observations to create a classification model
  • IDS is a binary classification problem
  • the most accurate methods require a representative set of each class
  • if not both, at least one representative set is needed
  • to do this, data should have certain characteristics
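A minimal sketch (toy numbers, not a real IDS) of what the two paradigms demand of the data — discrimination needs representative samples of both classes, recognition needs only the legitimate class:

```python
from statistics import mean, stdev

def binary_classify(x, benign, attacks):
    """Discrimination-based: requires representative samples of BOTH
    classes. Assign x to the class whose sample mean is closer."""
    return "attack" if abs(x - mean(attacks)) < abs(x - mean(benign)) else "benign"

def one_class_classify(x, benign, k=3.0):
    """Recognition-based: requires only the benign class. Flag anything
    more than k standard deviations from the benign mean as anomalous."""
    m, s = mean(benign), stdev(benign)
    return "anomalous" if abs(x - m) > k * s else "benign"

benign = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]   # toy feature values
attacks = [20.5, 19.8, 21.0]

print(binary_classify(20.0, benign, attacks))  # attack
print(one_class_classify(20.0, benign))        # anomalous
```

The binary model gives sharper decisions, but only if the attack sample is representative — exactly the assumption the rest of this outline argues is unrealistic for IDS.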


Legitimate behavior


    • Classifier technology and the illusion of progress[1]

Sections:

  • Problem


Best case scenario: credit card fraud detection

  • Two class learning is possible
  • Relatively low rate of data
  • Still has persistent false positives _and_ false negatives
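The persistence of both error types, even in this best case, follows from class overlap; a toy illustration with synthetic transaction amounts (purely illustrative distributions) shows that every threshold trades false positives against false negatives, and neither reaches zero:

```python
import random

random.seed(0)

# Synthetic, overlapping classes: legitimate and fraudulent amounts.
legit = [random.gauss(50, 30) for _ in range(10000)]
fraud = [random.gauss(150, 60) for _ in range(100)]

def errors(threshold):
    """Count both error types for a 'flag if amount > threshold' rule."""
    fp = sum(a > threshold for a in legit)   # legitimate flagged as fraud
    fn = sum(a <= threshold for a in fraud)  # fraud missed
    return fp, fn

for t in (80, 120, 160):
    fp, fn = errors(t)
    print(f"threshold={t}: {fp} false positives, {fn} false negatives")
```

Raising the threshold only converts false positives into false negatives; because the distributions overlap, no operating point eliminates both.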

Limits of ML

Objective: Argue that, while there are still improvements to be made in ML algorithm development and refinement, both the "law of diminishing returns" and the challenging data realities implicit in IDS indicate that our efforts should be refocused.

Question: Do researchers still attempt to apply binary classifiers to IDS?

  • If this is the case (and perhaps regardless), the ML section can be structured as follows:

ML Section

  • Overview of binary and one-class classification
    • Discrimination-based versus recognition-based approaches
    • The type of results produced
      • One-class: Helicopter gearbox, breast cancer mammogram, continuous typist recognition, etc. Conceptually these all seem approximately Gaussian, and good results were achieved.
    • What data requirements must be met in order to achieve these results?
      • For binary: assume a representative set of data has been drawn from each of class ω1 and class ω2. The distributions are generally assumed to be stationary; if not, extra consideration is required. With these assumptions, we theoretically expect the degree to which classes ω1 and ω2 overlap to define the minimum achievable error (the Bayes error rate).
      • Given that neither user behaviour nor attacker behaviour is stationary, and that acquiring a representative set of even historic attacker data is extremely challenging (to put it mildly), building a sufficiently accurate model under the binary classification paradigm is a formidable task.
      • For one-class: assume a representative set of data has been drawn from class ω1 and that, as a result, the distribution can be generalized to the level of accuracy needed for the model to achieve an acceptably low error rate in future application. Assuming the classes are generally separable, this can be done for parametric distributions, such as the Gaussian, and for non-parametric distributions, provided they are devoid of a "fat tail".
    • Law of Diminishing Returns
      • discussion based on Hand's "Classifier technology and the illusion of progress" [1]
  • Conclusion
    • For these reasons, and those articulated in the remainder of this paper, we believe the prudent next step in IDS research is a thorough examination of novel techniques for dealing with the current degree of false positives (FP). More specifically, this thesis arises from two observations: the added benefit of increasingly sophisticated ML algorithms is diminishing, and, to some degree, FP are, and will remain, a fact of life in IDS.
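The "fat tail" point can be sketched numerically (synthetic data; the Pareto-style sampler and 3-sigma rule are illustrative assumptions, not the method of any cited work). A one-class model calibrated on Gaussian-like behaviour keeps its false alarms rare; the same calibration on power-law behaviour keeps flagging rare-but-legitimate tail events:

```python
import random
from statistics import mean, stdev

random.seed(1)

def pareto(alpha=2.5):
    """Heavy-tailed, power-law-like sample via inverse-CDF sampling."""
    return (1.0 - random.random()) ** (-1.0 / alpha)

def false_alarms(train, test):
    """Fit a naive '3 sigma' one-class threshold on legitimate training
    data, then count legitimate test events flagged as anomalous."""
    threshold = mean(train) + 3.0 * stdev(train)
    return sum(x > threshold for x in test)

# Gaussian "legitimate" behaviour: the 3-sigma model holds up well.
g_train = [random.gauss(0, 1) for _ in range(10000)]
g_test = [random.gauss(0, 1) for _ in range(10000)]

# Power-law legitimate behaviour: the fat tail keeps producing
# legitimate events beyond any Gaussian-derived threshold.
p_train = [pareto() for _ in range(10000)]
p_test = [pareto() for _ in range(10000)]

print("Gaussian false alarms: ", false_alarms(g_train, g_test))
print("Power-law false alarms:", false_alarms(p_train, p_test))
```

Both test sets are entirely legitimate by construction, yet the power-law data produces an order of magnitude more false alarms: the persistent-FP problem the conclusion above argues we must learn to live with.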