BioSec: DNMar23
Design Exercise: Phishing
How can we design an anti-phishing strategy that uses a biological approach?
Our theoretical phishing scenario involves:
- banking
- attacker wants the user's credentials
- using email
- attacker sends an email that looks like it comes from the bank
- link goes to a malicious site that looks arbitrarily similar to the bank's site
  - what does it mean to look like the bank?
 
- user types in credentials, and may then be transparently redirected to the real bank site
Some of the problems that arise in phishing are related to:
- faked email
- link to site that looks like the bank but isn't the bank
- URL that looks like the bank's URL, but isn't the bank's URL
- credentials being entered at the wrong domain or on the wrong page
- misappropriated text and images (both in email, and on the faked website)
- bad/missing/suspect certificate
- certificate/credential combination is suspect
 
Human algorithm:
- is the domain the same as the one where credentials are normally sent?
- credentials are not normally entered in response to an email request
- is the certificate the same?
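The human checks above could be approximated in code. This is only a minimal sketch: the `CredentialHistory` class, the warning labels, and the use of a certificate fingerprint string are all assumptions made for illustration, not part of the exercise.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CredentialHistory:
    """Hypothetical record of where a credential has been submitted before."""
    # Maps a credential label (never the secret itself) to
    # {domain: certificate fingerprint last seen at that domain}.
    seen: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def record(self, label: str, domain: str, fingerprint: str) -> None:
        """Remember that this credential was used at this domain."""
        self.seen.setdefault(label, {})[domain.lower()] = fingerprint

    def check(self, label: str, domain: str, fingerprint: str,
              arrived_via_email: bool) -> List[str]:
        """Return warnings corresponding to the three human checks."""
        warnings = []
        known = self.seen.get(label, {})
        domain = domain.lower()
        if domain not in known:
            # Credentials have never been sent to this domain.
            warnings.append("new-domain")
        elif known[domain] != fingerprint:
            # Same domain, but the certificate differs from last time.
            warnings.append("certificate-changed")
        if arrived_via_email:
            # Credentials are not normally entered after following an email link.
            warnings.append("reached-via-email-link")
        return warnings
```

For example, after `record("bank-login", "www.bank.example", "AA:BB")`, a later `check` against an unseen domain reached from an email link would return both `new-domain` and `reached-via-email-link`.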
Think of individual detectors as autonomous:
- how would they be useful?
- how would they work, and what would they detect?
- how should they change system state in the normal case?
Possible anti-phishing system characteristics:
- language checks
  - phishing attacks often have poor grammar and spelling
  - system could check spelling and grammar to look for changes from the bank's normal messages
- URLs
  - phishing URLs are often designed to look like those of the legitimate site (e.g., www.paypa1.com)
  - system could check for unusual URL characteristics, such as digits, non-printing characters, and characters like "|"
- past behaviour
  - has the user entered this username/password at this domain before?
  - does the user normally follow a link from an email before entering these credentials?
  - does the certificate match that of the site where the user normally enters these credentials?
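The URL characteristics check could look something like the sketch below. The function name `url_warnings`, the warning labels, and the exact set of "odd" characters are assumptions for illustration; a real detector would need a more careful notion of what is unusual (for instance, many legitimate hostnames contain digits).

```python
from urllib.parse import urlsplit

# Assumed set of characters treated as unusual in a URL; purely illustrative.
ODD_CHARACTERS = set("|\\^`<>{}")


def url_warnings(url: str) -> list:
    """Return a list of warning labels for unusual URL characteristics."""
    warnings = []
    # Check the raw string, since URL parsers may silently strip control characters.
    if any(not ch.isprintable() for ch in url):
        warnings.append("non-printing-character")
    if any(ch in ODD_CHARACTERS for ch in url):
        warnings.append("odd-character")
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # urlsplit rejects some malformed network locations.
        host = ""
        warnings.append("unparseable")
    # Digits in the hostname, as in the www.paypa1.com example.
    if any(ch.isdigit() for ch in host):
        warnings.append("digit-in-hostname")
    return warnings
```

Each warning is a small indicator on its own; none of them is meant to be conclusive.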
 
How would the system react to information gathered?
- the system should holistically assess all kinds of information gathered
- gather a rich picture of the email's characteristics, the website characteristics, and the user's behaviour
- there should be a saturation point: once enough characteristics point to phishing, the system reacts in a way that prevents loss of information
  - what should the system do?
 
- should some system characteristics have more weight than others?
  - should elements like certificate validity be considered more important and have more effect on the decision?
 
- the system should base this decision on many small indicators
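One possible way to combine many small indicators is a weighted sum compared against a threshold, which stands in for the saturation point. This is a sketch only: the indicator names, the weights, and the threshold below are made-up values, and whether weights should differ at all is still an open question in these notes.

```python
# Hypothetical weights; certificate problems are assumed to count the most.
WEIGHTS = {
    "spelling-errors": 1.0,
    "suspicious-url": 2.0,
    "new-domain-for-credentials": 3.0,
    "certificate-mismatch": 4.0,
}
DEFAULT_WEIGHT = 1.0   # assumed weight for indicators not listed above
THRESHOLD = 5.0        # assumed saturation point


def phishing_score(indicators) -> float:
    """Sum the weights of the distinct indicators that fired."""
    return sum(WEIGHTS.get(name, DEFAULT_WEIGHT) for name in set(indicators))


def should_block(indicators) -> bool:
    """React (e.g., stop the credentials being sent) once the score saturates."""
    return phishing_score(indicators) >= THRESHOLD
```

With these values, poor spelling alone does not trigger a reaction, but a suspicious URL combined with a new domain for the credentials does.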
List of individual detectors
- image filename check
- context / semantic word descriptions --> semantic integrity: verifying message and content integrity based on the content itself, even if the message is digitally signed
- spellcheck
- domain / ip address check
- certificate check - issuer name, domain name, client name, date of issue, date of expiry
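As an illustration of one individual detector, the certificate check might compare the listed fields against what was seen previously. This is a sketch under assumptions: `CertificateInfo` is a hypothetical, already-parsed record (no real certificate parsing is shown), the warning labels are invented, and the client name field is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CertificateInfo:
    """Hypothetical parsed certificate fields used by the detector."""
    issuer_name: str
    domain_name: str
    issued: datetime
    expires: datetime


def certificate_warnings(seen: CertificateInfo,
                         expected: Optional[CertificateInfo],
                         visited_domain: str,
                         now: datetime) -> List[str]:
    """Return warning labels for a certificate presented by a site."""
    warnings = []
    if seen.domain_name.lower() != visited_domain.lower():
        warnings.append("domain-mismatch")
    if not (seen.issued <= now <= seen.expires):
        warnings.append("outside-validity-period")
    # Compare with the certificate previously seen for this credential, if any.
    if expected is not None and seen.issuer_name != expected.issuer_name:
        warnings.append("issuer-changed")
    return warnings
```

The detector only reports what it observed; deciding how much each warning matters is left to whatever combines the detectors.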