Design Exercise: Phishing

How can we design an anti-phishing strategy that uses a biological approach?

Our theoretical phishing scenario involves:

banking
want credentials
using email
send an email that looks like it comes from the bank
link goes to a malicious site that can look arbitrarily similar to the bank's site
- what does it mean to look like the bank?
user types in credentials, potentially gets transparently redirected to real bank site

Some of the problems that arise in phishing are related to:

faked email
link to site that looks like the bank but isn't the bank
URL that looks like the bank's URL, but isn't the bank's URL
credentials being entered at the wrong domain or on the wrong page
misappropriated text and images (both in email, and on the faked website)
bad/missing/suspect certificate
- certificate/credential combination is suspect

Human algorithm:

is the domain the same as the one where credentials are normally sent?
are credentials being entered in response to an email request? (not normally the case)
is the certificate the same as the one normally presented?

Think of individual detectors as autonomous:

how would they be useful?
how would they work? what would they detect?
how should they change system state in the normal case?

Possible anti-phishing system characteristics:

language checks
- phishing attacks often have poor grammar and spelling
- system could check spelling and grammar to look for deviations from the bank's usual messages
URLs
- phishing URLs are often designed to look like those of the legitimate site (e.g., www.paypa1.com)
- system could check for unusual URL characteristics, such as digits substituted for letters, non-printing characters, or characters like "|"
past behaviour
- has the user entered this username/password at this domain before?
- does the user normally follow a link from an email before entering these credentials?
- does the certificate match the one where the user normally enters these credentials?
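
The past-behaviour questions above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name CredentialHistory, its methods, the fixed salt, and the use of a certificate fingerprint string are all hypothetical and not part of the exercise. Credentials are stored only as a salted hash so the history itself does not hold passwords.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CredentialHistory:
    """Hypothetical store of where each credential was entered before."""
    # Maps a hashed credential to a set of
    # (domain, certificate fingerprint, reached via email link) records.
    seen: dict = field(default_factory=dict)
    salt: bytes = b"example-salt"  # assumption: random per installation

    def _key(self, username: str, password: str) -> str:
        # Hash username and password together; never store them in clear.
        secret = username.encode() + b"\0" + password.encode()
        return hashlib.pbkdf2_hmac("sha256", secret, self.salt, 100_000).hex()

    def record(self, username, password, domain, cert_fp, from_email):
        key = self._key(username, password)
        self.seen.setdefault(key, set()).add((domain.lower(), cert_fp, from_email))

    def check(self, username, password, domain, cert_fp, from_email):
        """Return a list of warnings; an empty list means nothing unusual."""
        known = self.seen.get(self._key(username, password))
        if known is None:
            return []  # credential never seen: nothing to compare against
        warnings = []
        domain = domain.lower()
        if domain not in {d for d, _, _ in known}:
            warnings.append("credentials never entered at this domain before")
        elif (domain, cert_fp) not in {(d, c) for d, c, _ in known}:
            warnings.append("certificate differs from the one seen previously")
        if from_email and not any(e for _, _, e in known):
            warnings.append("user does not normally log in via an email link")
        return warnings


history = CredentialHistory()
history.record("alice", "s3cret", "bank.example", "AA:BB", from_email=False)
print(history.check("alice", "s3cret", "bank-login.example", "CC:DD", from_email=True))
```

Each warning is meant as one small indicator among many, not as a verdict on its own.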

How would the system react to information gathered?

the system should holistically assess all kinds of information gathered
gather a rich picture of the email's characteristics, the website characteristics, and the user's behaviour
there should be a saturation point: once enough characteristics point to phishing, the system reacts so as to prevent loss of information
- what should the system do?
should some system characteristics have more weight than others?
- should elements like certificate validity be considered more important and have more effect on the decision?
the system should base this decision on many small indicators
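
One way to picture the saturation point and the weighting question above is a weighted sum over many small indicators. The sketch below is a minimal illustration, not a proposed design: the indicator names, the weights, the 0.5 threshold, and the "intervene"/"allow" outcomes are all assumed values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str
    weight: float  # relative importance (assumed, would need tuning)
    fired: bool    # did this detector see something suspicious?


def phishing_score(indicators):
    """Weight of fired indicators as a fraction of total weight, in [0, 1]."""
    total = sum(i.weight for i in indicators)
    if total == 0:
        return 0.0
    return sum(i.weight for i in indicators if i.fired) / total


def decide(indicators, threshold=0.5):
    """Return ("intervene" or "allow", score); threshold is the saturation point."""
    score = phishing_score(indicators)
    return ("intervene" if score >= threshold else "allow"), score


observed = [
    Indicator("certificate mismatch", 3.0, True),   # weighted more heavily
    Indicator("unknown domain", 2.0, True),
    Indicator("lookalike URL", 2.0, False),
    Indicator("poor spelling", 0.5, False),
    Indicator("reached via email link", 0.5, False),
]
print(decide(observed))  # ('intervene', 0.625)
```

Giving the certificate check a larger weight is one possible answer to the weighting question; with equal weights, the same two indicators firing would stay below the threshold.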

List of individual detectors

image filename check

context / semantic word descriptions --> semantic integrity: verifying the integrity of a message from the content itself, even if it is digitally signed

spellcheck

domain / ip address check

certificate check - issuer name, domain name, client name, date of issue, date of expiry
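
As an illustration of how one of the detectors listed above might work, the domain check could be sketched as below. It assumes a hypothetical set of trusted host names and a small, hand-picked table of look-alike characters; a real detector would need a far larger table and proper handling of internationalised names.

```python
import re

# Assumption: a tiny table of visual substitutions (e.g. "1" for "l").
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "|": "l", "5": "s"})

# Anything outside letters, digits, dot, and hyphen is treated as unusual,
# which also catches non-printing characters.
UNUSUAL = re.compile(r"[^a-z0-9.\-]")


def normalise(host: str) -> str:
    """Lower-case the host and map look-alike characters to letters."""
    return host.lower().translate(LOOKALIKES)


def check_domain(host: str, trusted: set) -> list:
    """Return a list of findings for a host name; empty means nothing found."""
    findings = []
    host = host.lower()
    if UNUSUAL.search(host):
        findings.append("unusual character in host name")
    trusted_lower = {t.lower() for t in trusted}
    if host not in trusted_lower and normalise(host) in {normalise(t) for t in trusted}:
        findings.append("host imitates a trusted domain")
    return findings


trusted_hosts = {"www.paypal.com"}
print(check_domain("www.paypa1.com", trusted_hosts))
# ['host imitates a trusted domain']
```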