SystemsSec 2018W Lecture 23
Network monitoring is a very general concept and is performed for many reasons. Your ISP might monitor the network for congestion or bad traffic. An organization may monitor the network for security reasons.
Monitoring networks has become harder recently due to encryption. For example:
- Web traffic is all on TLS
- Emails are being encrypted
- DNS: used to be wide open; there have been attempts to secure it (DNSSEC)
  - If left in the clear, it is open to monitoring and manipulation by anyone, which is bad
  - It gives you more information than just a src/dst IP would. Why? "The Cloud"
    - Usually multiple IPs for one service, or many services on one IP
  - DNS will get you what you want, so there are now efforts to tunnel DNS over TLS
Even if we assume encryption isn't a problem, how do we "secure" (find out if bad things are occurring on) a network? How do you understand what the network is doing? Sysadmins care about security, but ultimately, they care about what is going on inside their network.
Enterprises will usually forgo encryption entirely, or hold the keys to decrypt, sometimes due to legal obligations. They may set up a man in the middle by terminating TLS connections at a certain point and re-initiating their own; they can accomplish this by installing their own root certs on the machines they manage.
Even with everything in plaintext, understanding a network still isn't easy. What does it even mean to understand a network? It's not so much a protocol complexity issue, but more of an "everything you connect to does crazy things and it's hard to know what" issue. What we need is a model, so that when things deviate from the model we can tell.
How do we learn a model?
- Sit and sift through traffic (tedious and usually ineffective)
- Aggregate things: classify and look at relationships
  - But how? Not always easy
  - Use src/dst IPs/ports? Great when there are few computers, but not for modern networks
    - Modern networks are too dynamic: always changing, with too many IPs
- Solution: block classes of traffic via a proxy?
  - This doesn't solve the problem; there is still too much traffic
  - No whitelists: too many sites/people
  - No blacklists: might partly work, but they leave too many large holes
  - No anomaly detection: too much stuff, too random
    - The anomaly rate will be high (the tail of the curve is too fat)
    - If we cut off the tail, people get mad and circumvent us
The gist of clustering is: I don't know the right aggregates, so let's automatically aggregate it instead. If we combine this with anomaly detection after, maybe we can have some success?
- Clustering -> Aggregates -> Anomaly Detection -> Profit???
Our usual clustering algorithms are O(n^2), which is infeasible at high traffic volumes. Even O(n log n) is too much. Maybe use a sliding window on streaming data?
Remember, we are trying to build a model. What if we could quickly build this model and check it (using machine learning)? If our model building is fast, and we can check in linear time, it could work. But how can we limit our data size?
- Limit sample by using small time window of data
- Build model on time window, adjust over time
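The two bullets above can be sketched in Python. This is a minimal illustration, not anything from the lecture: the `WindowedSampler` class, the 60-second window length, and the byte-string payloads are all illustrative assumptions.

```python
from collections import deque

class WindowedSampler:
    """Keep only packets from the last `window` seconds, so the model is
    always (re)built on a bounded sample rather than all traffic ever seen."""

    def __init__(self, window=60):  # window length is an assumed parameter
        self.window = window
        self.packets = deque()  # (timestamp, payload) pairs, oldest first

    def add(self, timestamp, payload):
        self.packets.append((timestamp, payload))
        # Drop everything that has fallen out of the time window.
        while self.packets and self.packets[0][0] < timestamp - self.window:
            self.packets.popleft()

    def sample(self):
        """The current bounded sample to build/adjust the model on."""
        return [payload for _, payload in self.packets]

sampler = WindowedSampler(window=60)
sampler.add(0, b"GET / HTTP/1.1")
sampler.add(30, b"\x16\x03\x01...")
sampler.add(80, b"POST /login")     # the t=0 packet is now outside the window
print(len(sampler.sample()))        # -> 2
```

Because old packets are evicted as new ones arrive, the sample size is bounded by the window, which keeps model building fast regardless of how long the stream runs.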
Now that we have our sample data, how do we build our model? The first thought is to model the weirdness, but this is wrong. Instead, we should model what is normal, then check for weirdness. Alright, so how do we do this?
N-grams are substrings of content, but we need more than this: we also need the position. So we look at pn-grams (positional n-grams). These let us forgo inspecting whole packets and instead just look at a few bytes at a specific position. But how do we find these pn-grams?
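To make the definition concrete, here is a small Python sketch of extracting pn-grams and computing their frequencies. The function names and the presence-based counting (a pn-gram counts once per packet that contains it) are my assumptions:

```python
from collections import Counter

def pngrams(payload, n=2):
    """All (position, n-byte substring) pairs in one packet payload."""
    return [(i, payload[i:i + n]) for i in range(len(payload) - n + 1)]

def pngram_frequencies(packets, n=2):
    """Fraction of packets containing each pn-gram (presence, not count)."""
    counts = Counter()
    for p in packets:
        for g in set(pngrams(p, n)):  # dedupe within a packet
            counts[g] += 1
    return {g: c / len(packets) for g, c in counts.items()}

packets = [b"GET /index", b"GET /about", b"PUT /login"]
freqs = pngram_frequencies(packets)
print(freqs[(0, b"GE")])  # -> 0.6666666666666666 (2 of 3 packets start with "GE")
```

Note how a pn-gram like `(0, b"GE")` captures "these bytes at this offset", which is exactly what lets us classify a packet from a few bytes instead of its whole contents.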
- Start with all traffic, then take a sample
- In sample, compute frequency of all pn-grams
- Using those frequencies, randomly pick a pn-gram with a frequency of about 50% (40-60%)
- We now have two classes, one with the pn-gram in it, one without
- Do this recursively until we get down to aggregates with about 2% frequency
- The result is a decision tree of pn-grams, where the leaves are our aggregates
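The steps above can be sketched in Python roughly as follows. The thresholds (a 40-60% split range, ~2% leaf size) come from the notes; the function names, data representation, and leaf handling are illustrative assumptions:

```python
import random
from collections import Counter

def pngram_freqs(packets, n=2):
    """Fraction of packets containing each (position, n-byte substring) pair."""
    counts = Counter()
    for p in packets:
        for g in {(i, p[i:i + n]) for i in range(len(p) - n + 1)}:
            counts[g] += 1
    return {g: c / len(packets) for g, c in counts.items()}

def build_tree(packets, total, lo=0.4, hi=0.6, leaf_frac=0.02):
    """Recursively split on a ~50%-frequency pn-gram until a class holds
    roughly leaf_frac of all traffic (or no such pn-gram remains)."""
    if len(packets) <= leaf_frac * total:
        return packets  # leaf: an aggregate
    candidates = [g for g, f in pngram_freqs(packets).items() if lo <= f <= hi]
    if not candidates:
        return packets  # no ~50% pn-gram left to split on
    pos, gram = random.choice(candidates)
    has = [p for p in packets if p[pos:pos + len(gram)] == gram]
    lacks = [p for p in packets if p[pos:pos + len(gram)] != gram]
    return {"pngram": (pos, gram),
            "yes": build_tree(has, total, lo, hi, leaf_frac),
            "no": build_tree(lacks, total, lo, hi, leaf_frac)}

packets = [b"GET /a", b"GET /b", b"PUT /c", b"PUT /d", b"DEL /e"]
tree = build_tree(packets, total=len(packets))
```

Because the chosen pn-gram is present in 40-60% of the packets, both branches are strictly smaller than their parent, so the recursion always terminates; the leaves partition the traffic into the aggregates.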
It turns out that this method classifies traffic in extremely meaningful ways (TCP vs. non-TCP, etc.). It works even better if we remove packet headers, since they are redundant. Finally, the closer two aggregates are in the tree, the more similar they are. If you don't understand why this occurs, don't worry: some guy did a PhD on this.
For more info, search online for: