SystemsSec 2018W Lecture 19: Difference between revisions

From Soma-notes
Carter2 (talk | contribs)
Pasted notes with no formatting
Carter2 (talk | contribs)
Formatted the notes
 
Latest revision as of 01:37, 4 April 2018

Audio

Lecture 19 Audio

Notes

Every security technology has strengths and weaknesses

Types of data input for security software include:

  • IP Packets
  • System Calls
  • Log files
  • Emails
  • OS statistics (resource usage)
  • HTTP traffic
  • etc.

The representation of the data is more important than the data itself.

  • Data representation is a machine learning concept.
  • Certain types of operations are easier on some data representations than on others.
  • Data can be converted from one representation to another.
  • For example, if your input data is IP packets, the operations you can perform are different from those available once email messages have been extracted from the packets.
    • Converting those IP packets to email messages is called a representation switch.
    • Data represented as emails would have the fields From, To, message, date sent, etc.
  • Machine learning algorithms
    • Pattern recognition is a classification problem
    • The inputs are the things you want to label
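
As a rough illustration of a representation switch, the sketch below turns raw bytes into a record with named email fields. It assumes the packets have already been reassembled into the bytes of a single message; `RAW_PAYLOAD`, `to_email_representation`, and the example addresses are hypothetical names and data, not something from the lecture.

```python
from email import message_from_bytes
from email.utils import parsedate_to_datetime

# Hypothetical payload: the bytes of one message, as they might look after
# reassembling the TCP stream of a mail session from captured IP packets.
RAW_PAYLOAD = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Lunch\r\n"
    b"Date: Tue, 03 Apr 2018 12:00:00 +0000\r\n"
    b"\r\n"
    b"Are you free at noon?\r\n"
)


def to_email_representation(payload: bytes) -> dict:
    """Switch representation: raw bytes -> a record with named email fields."""
    msg = message_from_bytes(payload)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "date_sent": parsedate_to_datetime(msg["Date"]),
        "message": msg.get_payload().strip(),
    }


record = to_email_representation(RAW_PAYLOAD)
print(record["from"], record["date_sent"].isoformat())
```

Once the data is in this form, a question such as "which senders wrote to bob?" is a simple field lookup, whereas on raw packets it would require parsing first.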


Deep learning

  • Deep learning can learn its own representation.
  • But it requires a very large amount of training data.
  • Training takes a long time because so much data has to be processed.
  • Most problems don’t have enough data for deep learning.


Adversarial machine learning

  • Deep learning might not learn the representation that you expect.
  • We might expect it to learn that a stop sign is red, octagonal, and contains the word STOP in white text, but it could actually be learning something trivial, such as what the tops of the letters look like. Someone could then deface a stop sign in some minor way, such as putting dots above the letters, so that a self-driving car can no longer recognise the sign even though humans still can.
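
The stop sign scenario can be mimicked with a toy model. This is only a sketch under invented assumptions: the three features, the weights, and the bias are all hypothetical and chosen so that the model leans almost entirely on the "letter tops" cue, which is the failure mode described above. It is not how a real vision model is built.

```python
# Toy linear "stop sign" detector over three hypothetical features in [0, 1].
# The weights are invented to mimic a model that latched onto a trivial cue.
WEIGHTS = {"redness": 0.2, "octagon_shape": 0.3, "letter_tops": 2.0}
BIAS = -1.5


def is_stop_sign(features: dict) -> bool:
    """Return True when the weighted feature sum crosses the decision threshold."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return score > 0


clean_sign = {"redness": 0.9, "octagon_shape": 0.95, "letter_tops": 0.9}

# Small dots above the letters: colour and shape are untouched, but the one
# feature the model actually depends on changes.
defaced_sign = dict(clean_sign, letter_tops=0.4)

# Something that is neither red nor octagonal but has the right letter tops.
not_a_sign = {"redness": 0.0, "octagon_shape": 0.0, "letter_tops": 0.9}

print(is_stop_sign(clean_sign), is_stop_sign(defaced_sign), is_stop_sign(not_a_sign))
```

A human would still read the defaced sign as a stop sign, and would never accept the third input as one, but this toy model gets both wrong because of what it happened to learn.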


How the data is represented determines what tasks a security technology can perform with it.

Security technologies can apply static policies or learn patterns. Complicated systems will use both.

A spam filter, for example, can have static policies for email addresses or keywords to ignore, and it can also learn from the vast amount of data available. Spam filtering can be quite sophisticated because there is a clear indication of success and failure, and both the host and the user have an incentive to get it right.
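
A minimal sketch of combining the two approaches follows. The blocked sender, the four training messages, and the function names are hypothetical; the learned part is a small naive Bayes word model with add-one smoothing, picked here only because it is compact, and it assumes spam and non-spam are equally likely beforehand.

```python
import math
from collections import Counter

BLOCKED_SENDERS = {"promo@spam.example"}  # static policy (hypothetical entry)

# Tiny hypothetical training set of (text, is_spam) pairs.
TRAINING = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting notes attached", False),
    ("lunch at noon tomorrow", False),
]


def train(examples):
    """Count how often each word appears in spam and in non-spam messages."""
    counts = {True: Counter(), False: Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def spam_log_odds(counts, text):
    """Sum per-word log-likelihood ratios (naive Bayes, add-one smoothing)."""
    vocab = set(counts[True]) | set(counts[False])
    totals = {label: sum(counts[label].values()) for label in counts}
    score = 0.0
    for word in text.split():
        p_spam = (counts[True][word] + 1) / (totals[True] + len(vocab))
        p_ham = (counts[False][word] + 1) / (totals[False] + len(vocab))
        score += math.log(p_spam / p_ham)
    return score


def is_spam(counts, sender, text):
    if sender in BLOCKED_SENDERS:  # static policy is checked first
        return True
    return spam_log_odds(counts, text) > 0  # learned pattern decides the rest


model = train(TRAINING)
print(is_spam(model, "friend@example.com", "free money"))
print(is_spam(model, "friend@example.com", "meeting at noon"))
print(is_spam(model, "promo@spam.example", "meeting at noon"))
```

The clear success and failure signal mentioned above corresponds to the labels in `TRAINING`: every time a user marks a message as spam or not spam, another labelled example becomes available.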


Log files:

  • SIEMs (security information and event management)
    • Manage logs from multiple systems
    • Convert all logs to a common representation for use with analytic tools
    • Not good in practice, because you need domain-specific knowledge to understand the logs
  • CASB (cloud access security broker)
    • Roughly a SIEM for cloud applications
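
The idea of a common log representation can be sketched as follows. The two log lines, their formats, and the field names (`time`, `source`, `remote_ip`, `event`) are assumptions made up for this example and do not reflect any particular SIEM product.

```python
import re
from datetime import datetime, timezone

# Two hypothetical source formats; real systems emit many more.
SSH_LINE = "2018-04-03T12:00:00Z host1 sshd: Failed password for root from 203.0.113.5"
WEB_LINE = '203.0.113.5 - - [03/Apr/2018:12:00:05 +0000] "GET /admin HTTP/1.1" 403'


def parse_ssh(line):
    m = re.match(r"(\S+) \S+ sshd: (.*) from (\S+)$", line)
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return {"time": ts, "source": "sshd", "remote_ip": m.group(3), "event": m.group(2)}


def parse_web(line):
    m = re.match(r'(\S+) - - \[(.+?)\] "(.+?)" (\d{3})$', line)
    ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
    return {"time": ts, "source": "web", "remote_ip": m.group(1),
            "event": f"{m.group(3)} {m.group(4)}"}


# Every record now has the same fields, so one query can span both sources.
events = sorted([parse_web(WEB_LINE), parse_ssh(SSH_LINE)], key=lambda e: e["time"])


def events_from(ip):
    return [e for e in events if e["remote_ip"] == ip]


for e in events_from("203.0.113.5"):
    print(e["time"].isoformat(), e["source"], e["event"])
```

Note that the `event` field is still free text. Deciding whether "Failed password for root" or a 403 on `/admin` matters is exactly the domain-specific knowledge that the common representation does not supply.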


Graylisting

When a client first attempts an action, the server responds with “try again later” instead of a failure and records the attempt. If the same action is attempted again after a set amount of time, the server accepts it and adds it to a whitelist. This works because a spam bot typically will not try things twice, but a legitimate user will.
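
The graylisting flow can be sketched as a small class. The class name, the 300 second delay, and the choice of key (here a sender and recipient pair) are assumptions for illustration only; timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class Graylist:
    """Defer the first attempt for a key; accept retries after a minimum delay."""

    def __init__(self, min_delay_seconds=300):
        self.min_delay = min_delay_seconds
        self.first_seen = {}    # key -> time of the first attempt
        self.whitelist = set()  # keys that have retried successfully

    def check(self, key, now):
        if key in self.whitelist:
            return "accept"
        # Remember the first attempt; later attempts keep the original time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            self.whitelist.add(key)
            del self.first_seen[key]
            return "accept"
        return "try again later"


graylist = Graylist(min_delay_seconds=300)
key = ("sender@example.com", "user@example.org")  # hypothetical key

print(graylist.check(key, now=0))    # first attempt is deferred
print(graylist.check(key, now=10))   # retried too soon, still deferred
print(graylist.check(key, now=400))  # retried after the delay, accepted
print(graylist.check(key, now=401))  # whitelisted from now on
```

A client that never retries simply stays in `first_seen` and is never accepted, which is the property the technique relies on.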