Difference between revisions of "SystemsSec 2018W Lecture 19"

(Created page with "==Audio== [https://homeostasis.scs.carleton.ca/~soma/systemssec-2018w/lectures/comp4108-2018w-lec19-21Mar2018.m4a Lecture 19 Audio] ==Notes==")
 
(Pasted notes with no formatting)


==Notes==
Every security technology has strengths and weaknesses
Security software can take many types of data as input, including:
IP packets
System calls
Log files
Emails
OS statistics (resource usage)
HTTP traffic
The representation of the data is more important than the data itself
Data representation is a machine learning concept.
Certain types of operations are easier on some data representations than on others
Data can be converted into different types of representations
For example, if your input data is IP packets, the operations you can perform are different from those available once email messages have been extracted from those packets.
Converting those IP packets to email messages is called a representation switch
Data represented as emails would have fields such as From, To, message, and date sent
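The representation switch described above can be sketched in Python. This is only an illustration: it assumes the message bytes have already been reassembled from the packet stream, the sample message and the function name `to_email_representation` are made up for this example, and the parsing relies on the standard library `email` package.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical payload: one message, assumed already reassembled from IP packets.
RAW = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Date: Wed, 21 Mar 2018 10:00:00 -0400\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"See you in class.\r\n"
)

def to_email_representation(raw: bytes) -> dict:
    """Switch from a byte-stream representation to an email representation."""
    msg = message_from_bytes(raw, policy=default)
    return {
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        "date_sent": str(msg["Date"]),
        "message": msg.get_content().strip(),
    }

record = to_email_representation(RAW)
print(record["from"], "->", record["to"])
```

Once the data is in this form, operations such as "match on sender" become a dictionary lookup instead of a search through raw bytes.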
Machine learning algorithms
Pattern recognition is a classification problem
You have inputs that you want to label
Deep learning
Deep learning can learn its own representation
But it requires a very large amount of data to train
Training takes a long time because so much data has to be processed
Most problems don’t have enough data for deep learning
Adversarial machine learning
Deep learning might not learn the representation that you expect.
We might expect it to learn that a stop sign is red, octagonal, and contains the word STOP in white text, but it could actually be learning something trivial, such as what the tops of the letters look like. Someone could then deface a stop sign in some minor way, such as putting dots above the letters, so that self-driving cars can’t recognise the sign even though humans still could.
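A toy sketch of the stop sign problem, in Python. Everything here is hypothetical: real systems learn from pixels, not from hand-made feature flags, and `train_stump` is a deliberately naive learner invented for this example. It shows how a model can fit the training data perfectly while relying on a feature nobody intended.

```python
# Hypothetical features; "letter_tops_flat" stands in for the trivial cue.
FEATURES = ["letter_tops_flat", "is_red", "is_octagon", "says_stop"]

TRAINING = [
    ({"letter_tops_flat": 1, "is_red": 1, "is_octagon": 1, "says_stop": 1}, "stop"),
    ({"letter_tops_flat": 1, "is_red": 1, "is_octagon": 1, "says_stop": 1}, "stop"),
    ({"letter_tops_flat": 0, "is_red": 0, "is_octagon": 0, "says_stop": 0}, "other"),
    ({"letter_tops_flat": 0, "is_red": 1, "is_octagon": 0, "says_stop": 0}, "other"),
]

def train_stump(data):
    """Return the first single feature that perfectly separates the labels."""
    for name in FEATURES:
        if all((x[name] == 1) == (label == "stop") for x, label in data):
            return name
    raise ValueError("no single feature separates the data")

def classify(feature, sign):
    """Label a sign using only the one feature the model learned."""
    return "stop" if sign[feature] == 1 else "other"

learned = train_stump(TRAINING)

clean = {"letter_tops_flat": 1, "is_red": 1, "is_octagon": 1, "says_stop": 1}
# Dots above the letters: still red, octagonal, and readable to a human.
defaced = dict(clean, letter_tops_flat=0)

print(learned, classify(learned, clean), classify(learned, defaced))
```

The model scores perfectly on its training data, yet the minor defacement flips its answer while every feature a human relies on is unchanged.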
How the data is represented determines what tasks security technology can perform with it
Security technologies can apply static policies or learn patterns. Complicated systems will use both.
A spam filter, for example, can have static policies for email addresses or keywords to ignore, and it can also learn from the vast amount of data available. Spam filtering can be quite sophisticated because there is a clear indication of success and failure, and both the host and the user have an incentive to get it right.
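A minimal sketch of a filter that combines both approaches, in Python. The blocked sender, the blocked keyword, the training messages, and all class and function names are assumptions made for illustration. The learned part is a simple word-count score with add-one smoothing, not the method of any particular product.

```python
import math
from collections import Counter

# Static policy (hypothetical values).
BLOCKED_SENDERS = {"spammer@example.com"}
BLOCKED_KEYWORDS = {"lottery"}

class LearnedFilter:
    """Learns word frequencies from messages labelled 'spam' or 'ham'."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, text, label):
        for word in text.lower().split():
            self.counts[label][word] += 1
            self.totals[label] += 1

    def spam_score(self, text):
        """Sum of log-likelihood ratios; positive means more spam-like."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        score = 0.0
        for word in text.lower().split():
            p_spam = (self.counts["spam"][word] + 1) / (self.totals["spam"] + vocab)
            p_ham = (self.counts["ham"][word] + 1) / (self.totals["ham"] + vocab)
            score += math.log(p_spam / p_ham)
        return score

def is_spam(sender, text, learned):
    # Static policies are checked first, then the learned pattern.
    if sender in BLOCKED_SENDERS:
        return True
    if any(word in BLOCKED_KEYWORDS for word in text.lower().split()):
        return True
    return learned.spam_score(text) > 0

learned = LearnedFilter()
learned.train("cheap pills buy now", "spam")
learned.train("buy cheap watches", "spam")
learned.train("lecture notes attached", "ham")
learned.train("see you in class", "ham")

print(is_spam("someone@example.com", "buy cheap pills", learned))
```

The labels passed to `train` are the "clear indication of success and failure" mentioned above: every time a user marks a message as spam or not spam, the filter gets another training example.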
Log files:
SIEMs (security information and event management)
Manage logs from multiple systems
Convert all logs to a common representation for use with analytic tools
Not as good in practice because you need domain-specific knowledge to understand the logs
CASB (cloud access security broker)
Like a SIEM, but for cloud applications
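The "common representation" idea can be sketched in Python as follows. The two log formats, the regular expressions, and the field names of the common event are assumptions for this example and are not taken from any real SIEM product.

```python
import re

# Hypothetical patterns for two sources with very different formats.
SSHD = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]+) (?P<host>\S+) sshd\[\d+\]: "
    r"Failed password for (?P<user>\S+) from (?P<ip>\S+)"
)
APACHE = re.compile(
    r'^(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<req>[^"]*)" (?P<status>\d{3})'
)

def normalize(line):
    """Convert one raw log line to a common event dict, or None if unknown."""
    m = SSHD.match(line)
    if m:
        return {"source": "sshd", "time": m["ts"], "src_ip": m["ip"],
                "user": m["user"], "action": "login", "outcome": "failure"}
    m = APACHE.match(line)
    if m:
        outcome = "failure" if int(m["status"]) >= 400 else "success"
        return {"source": "apache", "time": m["ts"], "src_ip": m["ip"],
                "user": m["user"], "action": m["req"], "outcome": outcome}
    return None

LINES = [
    "Mar 21 10:00:01 web1 sshd[1234]: Failed password for root from 203.0.113.5 port 22 ssh2",
    '203.0.113.5 - - [21/Mar/2018:10:00:02 -0400] "GET /admin HTTP/1.1" 403 199',
]
events = [normalize(line) for line in LINES]

# One query now works across both sources.
failures = sum(1 for e in events if e and e["src_ip"] == "203.0.113.5"
               and e["outcome"] == "failure")
print(failures)
```

Writing the patterns and deciding what counts as a "failure" for each source is exactly the domain-specific knowledge the notes refer to.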
Graylisting
When a user first attempts to do something, the server responds with “try again later” instead of a failure and records the attempt. When the same thing is attempted again after a set amount of time, the server accepts it and adds that action to a whitelist, because a spam bot will typically not try things twice, but a legitimate user will.
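A minimal graylisting sketch in Python, under stated assumptions: the class name, the 300-second delay, and the use of a (client IP, sender, recipient) tuple as the key are choices made for this example, and time is passed in explicitly so the behaviour is easy to follow.

```python
class Graylist:
    """Temporarily reject the first attempt; accept a retry after a delay."""

    def __init__(self, min_delay=300.0):
        self.min_delay = min_delay  # seconds a client must wait before retrying
        self.first_seen = {}        # key -> time of the first attempt
        self.whitelist = set()      # keys that have retried successfully

    def check(self, key, now):
        if key in self.whitelist:
            return "accept"
        # Record the first attempt, or fetch the time it was recorded.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            self.whitelist.add(key)
            del self.first_seen[key]
            return "accept"
        return "try again later"

g = Graylist(min_delay=300.0)
key = ("203.0.113.5", "alice@example.com", "bob@example.com")

first = g.check(key, now=0.0)       # first attempt
too_soon = g.check(key, now=60.0)   # retry before the delay has passed
retry = g.check(key, now=400.0)     # retry after the delay
later = g.check(key, now=401.0)     # already whitelisted
print(first, too_soon, retry, later)
```

A real deployment would take `now` from a clock and expire old `first_seen` entries; both are left out here to keep the sketch short.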

Revision as of 21:31, 3 April 2018

Audio

Lecture 19 Audio
