EvoSec 2025W Lecture 17
==Readings==
* [https://homeostasis.scs.carleton.ca/~soma/pubs/amatrawy-acns-05.pdf Matrawy, "Mitigating Network Denial-of-Service Through Diversity-Based Traffic Management." (ACNS 2005)]
* [https://homeostasis.scs.carleton.ca/~soma/pubs/inoue-lisa2007.pdf Inoue, "NetADHICT: A Tool for Understanding Network Traffic." (LISA 2007)]
==Discussion Questions==
* What does it mean for an attacker to "defeat" (p,n)-gram based traffic clustering?
* What do high-frequency (p,n)-grams reveal about network traffic? Does this include anything that might compromise user privacy?
* Is ADHIC an anomaly detection algorithm? Can it be used to detect anomalies?
* How fast is ADHIC compared to other standard clustering algorithms?
* Is diversity-based traffic management feasible today given that so much traffic is encrypted?
==Notes==
<pre>
Lecture 17
----------
* Internet protocols
* clustering vs classification
* p,n-grams

best-effort packet delivery
 - rather than guaranteed delivery

IP - best effort
TCP - reliable delivery, built on top of best-effort IP

best effort allows for denial of service
 - can always eliminate DoS with reservations, but only for the chosen few

Unless you make deliberate choices about who gets service, EVERYONE gets poor service when there is too much demand
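
A rough numeric sketch of the point above. This is a toy model, not something from the lecture or the readings: the function names, the traffic numbers, and the "everyone is squeezed by the same fraction" rule are all assumptions made for illustration.

```python
def shared_fifo(capacity, demands):
    """One shared best-effort queue: when demand exceeds capacity, every
    sender is squeezed by the same fraction, so a flood hurts everyone."""
    total = sum(demands.values())
    scale = min(1.0, capacity / total) if total else 1.0
    return {name: d * scale for name, d in demands.items()}


def reserved(capacity, demands, reservations):
    """Deliberate choice: senders with a reservation are served first (up to
    their reservation); whatever capacity is left is shared by the rest."""
    served = {n: min(demands[n], r) for n, r in reservations.items()}
    leftover = max(0, capacity - sum(served.values()))
    others = {n: d for n, d in demands.items() if n not in reservations}
    served.update(shared_fifo(leftover, others))
    return served


demands = {"legit": 50, "flood": 950}           # hypothetical units of traffic
print(shared_fifo(100, demands))                # legit gets about 5 of its 50
print(reserved(100, demands, {"legit": 50}))    # legit keeps all 50
```

The reservation protects "legit" only because someone decided in advance that "legit" deserves it, which is the "chosen few" problem.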

So how do we deal with denial of service on the Internet today?

Today we mostly manage DoS through content distribution networks (CDNs) of some kind.

A CDN is its own network of servers (an "overlay network" or entirely separate) that distributes & serves data

How do CDNs route traffic?
 - a form of load balancing, but also prioritization
   (how much did you pay?)
 - tends to be on a per-server basis, not per-client

What is normal for the network?
 - constant level of weirdness!

Internet telescope
 - reserve a large block of IP addresses that aren't being used
 - watch what traffic arrives there anyway
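
A minimal sketch of the telescope idea, assuming packets have already been reduced to (source, destination) address pairs. The block 198.51.100.0/24 and the sample addresses are documentation ranges picked for illustration, and telescope_hits is a made-up name.

```python
import ipaddress
from collections import Counter

# Hypothetical unused ("dark") block; 198.51.100.0/24 is a documentation range.
TELESCOPE = ipaddress.ip_network("198.51.100.0/24")


def telescope_hits(packets):
    """Count, per source, the packets addressed to the dark block.

    `packets` is an iterable of (src, dst) address strings. Nothing in the
    block is in use, so every hit is unsolicited traffic.
    """
    hits = Counter()
    for src, dst in packets:
        if ipaddress.ip_address(dst) in TELESCOPE:
            hits[src] += 1
    return hits


sample = [
    ("203.0.113.7", "198.51.100.20"),   # lands in the telescope
    ("203.0.113.7", "198.51.100.21"),   # same source, next address
    ("192.0.2.1", "192.0.2.99"),        # outside the block, ignored
]
print(telescope_hits(sample).most_common())
```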

naive anomaly detection on network traffic will have a huge number of false positives
 - or, your model will be way too general

Today we mostly do the exact opposite
 - deep packet inspection systems
 - in the cloud, will analyze decrypted packets
 - really try to understand traffic using lots of rules, reconstructing flows

Normally traffic is managed using the 5-tuple: source IP address, source port, destination IP address, destination port, protocol
 - but is that all we can look at?
 - can we use this data in a more generic way, without parsing out flows?
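
For reference, a sketch of what "parsing out" the 5-tuple involves for a raw IPv4 packet carrying TCP or UDP. The field offsets are the standard IPv4 header layout. The function name and the hand-built sample packet are assumptions for illustration.

```python
import socket
import struct


def five_tuple(packet: bytes):
    """Return (src IP, src port, dst IP, dst port, protocol) for a raw IPv4
    packet whose payload is TCP or UDP (ports are the first 4 payload bytes)."""
    ihl = (packet[0] & 0x0F) * 4              # IPv4 header length in bytes
    proto = packet[9]                         # 6 = TCP, 17 = UDP
    src = socket.inet_ntoa(packet[12:16])
    dst = socket.inet_ntoa(packet[16:20])
    sport, dport = struct.unpack("!HH", packet[ihl:ihl + 4])
    return src, sport, dst, dport, proto


# Hand-built sample: 20-byte IPv4 header (checksum left as 0) + 8-byte UDP header.
header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 17, 0,
                     socket.inet_aton("192.0.2.1"),
                     socket.inet_aton("198.51.100.5"))
udp = struct.pack("!HHHH", 5353, 53, 8, 0)
print(five_tuple(header + udp))
```

Even this small parser has to know the header layout and which protocols carry ports. That protocol knowledge is what the "more generic way" question is trying to avoid.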

So why p,n-grams?

n-grams are a common way to analyze large amounts of data
 - n in n-gram is just a length, so the n-grams of some data are a set of fixed-length strings
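
As a concrete illustration (a minimal sketch; ngrams is a made-up helper name), the n-grams of a byte string are just its length-n windows:

```python
def ngrams(data: bytes, n: int):
    """Return every length-n substring of data, in order (a sliding window)."""
    return [data[i:i + n] for i in range(len(data) - n + 1)]


print(ngrams(b"GET /", 3))   # [b'GET', b'ET ', b'T /']
```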

One idea is to do n-gram analysis on packets (whole packets or just packet headers)
 - n-gram analysis is relatively slow: you have to search the entire packet for a match

network routers go to a lot of effort to avoid looking at every byte in a packet

What do routers look at?
 - source and destination IP addresses

Notice that these are 4-byte (IPv4) or 16-byte (IPv6) patterns at fixed offsets in a packet header
 - p,n-grams are a generalization of source and destination IP addresses: an n-byte pattern at a fixed position p
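
A small sketch of the difference, with made-up helper names. A p,n-gram test is one comparison at a known offset; a plain n-gram test has to search. In an IPv4 header the source address sits at offset 12 and the destination at offset 16, so "source IP is 192.0.2.1" is the (12,4)-gram test below.

```python
import socket


def has_pn_gram(packet: bytes, p: int, gram: bytes) -> bool:
    """True if the bytes at offset p equal gram: one slice compare, no search."""
    return packet[p:p + len(gram)] == gram


def has_ngram(packet: bytes, gram: bytes) -> bool:
    """Plain n-gram match: the pattern may sit anywhere, so the packet is searched."""
    return gram in packet


# Dummy 20-byte IPv4 header: 12 bytes of fixed fields, then src and dst addresses.
addr = socket.inet_aton("192.0.2.1")
header = (bytes([0x45, 0, 0, 20, 0, 0, 0, 0, 64, 6, 0, 0])
          + addr + socket.inet_aton("198.51.100.5"))

print(has_pn_gram(header, 12, addr))   # True: it is the source address
print(has_pn_gram(header, 16, addr))   # False: it is not the destination
print(has_ngram(header, addr))         # True, but position information is lost
```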

What is the frequency distribution of p,n-grams?
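
One way to start answering that question empirically is simply to count. The sketch below assumes packets are available as byte strings. The function name, the choice of n=2, the max_offset limit, and the three sample packets are all illustrative assumptions, not values from the readings.

```python
from collections import Counter


def pn_gram_counts(packets, n=2, max_offset=8):
    """Count every (p, n-gram) pair with 0 <= p < max_offset across packets."""
    counts = Counter()
    for pkt in packets:
        for p in range(min(max_offset, len(pkt) - n + 1)):
            counts[(p, pkt[p:p + n])] += 1
    return counts


packets = [
    b"\x45\x00\x00\x28AAAA",
    b"\x45\x00\x00\x3cBBBB",
    b"\x45\x00\x05\xdcCCCC",
]
counts = pn_gram_counts(packets)
print(counts.most_common(3))   # the (0, b'E\x00') pair appears in all three
```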

What does it mean for an attacker to "defeat" (p,n)-gram based traffic clustering?
 - attacker wants to get maximum bandwidth
 - so, they have to get their packets into all the queues, or as many as possible
 - in order to do that, they have to create packets that have the p,n-grams used by every queue (every leaf node in ADHIC)
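
A toy sketch of why the attacker needs to reach every leaf. It assumes a simplified model that is not taken verbatim from the papers: a binary tree whose internal nodes each test one p,n-gram and whose leaves are queues. The Node class, the tree, and the queue names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Internal node: tests one (p, n)-gram. Leaf: has a queue name and no test."""
    queue: Optional[str] = None
    p: int = 0
    gram: bytes = b""
    match: Optional["Node"] = None
    miss: Optional["Node"] = None


def classify(node: Node, packet: bytes) -> str:
    """Walk the tree to a leaf; each step is one fixed-offset compare."""
    while node.queue is None:
        hit = packet[node.p:node.p + len(node.gram)] == node.gram
        node = node.match if hit else node.miss
    return node.queue


# Hypothetical three-leaf tree over byte 9 of an IPv4 header (the protocol field).
tree = Node(p=9, gram=b"\x06",                       # TCP?
            match=Node(queue="q_tcp"),
            miss=Node(p=9, gram=b"\x11",             # UDP?
                      match=Node(queue="q_udp"),
                      miss=Node(queue="q_other")))


def packet_with_proto(proto: int) -> bytes:
    """20-byte dummy header that is all zeros except the protocol byte."""
    return bytes(9) + bytes([proto]) + bytes(10)


flood = [packet_with_proto(17)] * 1000               # one kind of attack packet
queues_hit = {classify(tree, pkt) for pkt in flood}
print(queues_hit)                                    # only the UDP queue
```

If the queues share bandwidth roughly equally (an assumption of this sketch), a flood confined to one leaf can take at most that leaf's share. To take more, the attacker has to craft packets that match the tests leading to the other leaves as well.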

What do high-frequency (p,n)-grams reveal about network traffic? Does this include anything that might compromise user privacy?
 - inherently privacy preserving, except for bad actors

Is ADHIC an anomaly detection algorithm? Can it be used to detect anomalies?

How fast is ADHIC compared to other standard clustering algorithms?

Is diversity-based traffic management feasible today given that so much traffic is encrypted?

How does this relate to trust?
</pre>
Latest revision as of 18:45, 13 March 2025