EvoSec 2025W Lecture 16


Readings

Discussion Questions

Feel free to address only a subset (or none) of the following questions in your discussion!

  • What does it take to define "normal"? In what contexts is it easier to define normal, and where is it harder?
  • To what extent does improved technology make it easier to distinguish between normal and abnormal behavior in an adversarial context?
  • When are false alarms okay, and when are they bad? (How often do you get alerts today from security systems and how often are these irrelevant?)
  • In general, is it better to look at data or metadata when doing anomaly detection?
  • How does the metadata for modern communication platforms differ from email? How is it similar?

Notes

Lecture 16
----------

G1
 - defining normal: requires long-term, consistent data collection
   - hard to define normal if there isn't enough data (not enough
     interaction) or enough consistency
   - perfect consistency isn't needed; just lots of data, so that the
     patterns can be extracted
 - false alarms
   - notifications from Google about logins are mostly false alarms,
     but they are still useful for maintaining security
   - too many notifications for regular activity will lead users to
     ignore them, so frequency matters (see the sketch after this list)
   - severity also matters: overly alarming alerts can stress out users
     for no reason
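
A quick back-of-the-envelope calculation (all rates below are made up
for illustration) shows why frequency dominates: when attacks are rare,
even an accurate detector produces mostly false alarms.

    # Hypothetical numbers: event volume, attack prevalence, and
    # detector accuracy are assumptions, not measurements.
    events_per_day = 10_000
    attack_rate = 1 / 100_000   # assumed: 1 in 100,000 events is an attack
    fp_rate = 0.01              # assumed: 1% false positive rate
    tp_rate = 0.99              # assumed: 99% detection rate

    false_alarms = events_per_day * (1 - attack_rate) * fp_rate
    true_alarms = events_per_day * attack_rate * tp_rate
    print(f"~{false_alarms:.0f} false vs ~{true_alarms:.3f} true alarms/day")

With these numbers the detector raises about 100 false alarms for every
0.1 real ones per day, exactly the regime where users learn to ignore
everything.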

G2
 - "window length" idea, how to apply generally?
   - harder to define normal with smaller & smaller window lengths
   - larger behavior space, more possible actions makes it harder to define normal
 - false alarms: intensity of alarms matters, how easy to ignore/how concerning
 - data vs metadata: generally metadata is the way to go
   - context matters, hard to get context from data
     (but not always, e.g., topics)
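
A minimal sketch of the window idea, in the spirit of system-call
sequence profiling; the event stream and window length are made up for
illustration. "Normal" is the set of short event sequences seen in
training, and anything unseen gets flagged.

    # Sliding-window profile: events and window length are hypothetical.
    def profile(events, window=3):
        # Collect every length-`window` subsequence seen in training.
        return {tuple(events[i:i + window])
                for i in range(len(events) - window + 1)}

    def flag_anomalies(events, normal, window=3):
        # Return subsequences of a new trace never seen in training.
        return [tuple(events[i:i + window])
                for i in range(len(events) - window + 1)
                if tuple(events[i:i + window]) not in normal]

    normal = profile(["open", "read", "write", "close"] * 20)
    print(flag_anomalies(["open", "read", "exec", "write", "close"], normal))

With window=1 nearly every event looks normal (no context); as the
window grows, the profile captures more context but needs far more data
to fill in, which is the trade-off the group raised.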

G3
 - LLMs/AI could help with analyzing the data to determine what is
   normal and to do classification
 - is it good to collect all of this data? it could be used for
   impersonation
 - false alarms
    - geographic change-based alerts can be reasonable, for example
    - but they could dissuade users from trying new things
    - 2FA on Carleton email: how useful is it when done on the same device?
    - alerts are bad when they make it hard to access important
      information quickly or interfere with normal tasks
 - modern platforms track more: typing, geographic info
    - more invasive
    - location inference gets confused: the IP address may geolocate to
      Toronto even though the user is still in Ottawa
 - assigning tasks/roles: using email alone can be too limited
    - but modern platforms are controlled by large companies, which can
      see information across apps

G4
 - does better tech make defining normal easier?
    - not really!
    - newer tech, newer kinds of abnormalities
 - still an open problem in machine learning 20 years later!
    - still not getting great accuracy
 - machine learning black boxes don't help so much for anomaly detection
    - people need to go deeper to derive mathematical relationships
 - have to look at the metadata to determine attacks: most of the time
   there is no ground truth, i.e., we don't know what really is an
   attack (see the scoring sketch below)
 - if attackers have information (e.g., past emails), they can mask
   their attacks and hide from detection
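
One hedged sketch of detection without ground truth: score each message
by how rare its metadata is relative to history, with no attack labels
at all. The field names (sender, hour, client_net) are invented for
illustration.

    # Unsupervised metadata scoring: rare field values raise the score.
    # Field names are hypothetical, not from any real mail system.
    import math
    from collections import Counter

    FIELDS = ("sender", "hour", "client_net")

    def train(history):
        # Count how often each metadata value appeared in past messages.
        return {f: Counter(msg[f] for msg in history) for f in FIELDS}, len(history)

    def rarity(msg, counts, total):
        # Sum of -log smoothed frequencies across the metadata fields.
        return sum(-math.log((counts[f][msg[f]] + 1) / (total + 1))
                   for f in FIELDS)

    history = [{"sender": "alice", "hour": 9, "client_net": "134.117"}] * 50
    counts, total = train(history)
    odd = {"sender": "alice", "hour": 3, "client_net": "203.0"}
    print(rarity(history[0], counts, total), rarity(odd, counts, total))

The score only says "unusual", never "attack"; a human still has to
make that call, which is the ground-truth problem above.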


Easy to do anomaly detection wrong
 - focus on modeling everything, rather than what you must model
 - no clear idea of what "normal" will be

machine learning is best used first as a tool for data exploration
(see the clustering sketch below)
 - it can be used in production, but ONLY after you really understand
   what it does
 - machine learning isn't always the best at identifying features!
    - because it lacks context
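
A minimal sketch of "exploration first", assuming synthetic login
records and scikit-learn; the features (hour of day, bytes sent) and
the two-routine structure are made up. The point is to inspect what
each cluster means before trusting any model.

    # Exploration, not production: cluster synthetic "login" records
    # and look at what each cluster corresponds to.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    daytime = rng.normal(loc=[9, 200], scale=[1, 50], size=(100, 2))
    evening = rng.normal(loc=[22, 180], scale=[1, 40], size=(100, 2))
    X = np.vstack([daytime, evening])

    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    for k in range(2):
        print(f"cluster {k}: mean hour/bytes =", X[labels == k].mean(axis=0))

Only if the clusters line up with domain knowledge (here, two daily
routines) does it make sense to build anything on top of them.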

to do security well, you need "normal" to be very consistent
 - which means humans should be able to do the classification relatively easily

So the art of this is to figure out what will be consistent
 - use domain knowledge & machine learning exploration of data

email archive detection
 - attacker evasion is noticeable either to the automated system or to
   the user


Work backwards from attacks!
 - why are they weird?