BioSec: DNMar23: Difference between revisions
Latest revision as of 13:50, 28 March 2012
Design Exercise: Phishing
How can we design an anti-phishing strategy that uses a biological approach?
Our theoretical phishing scenario involves:
- banking
- attacker wants the user's credentials
- attack is delivered by email
- attacker sends an email that looks like it comes from the bank
- link goes to a malicious site that looks arbitrarily like the bank
  - what does it mean to look like the bank?
- user types in credentials, potentially gets transparently redirected to the real bank site
Some of the problems that arise in phishing are related to:
- faked email
- link to site that looks like the bank but isn't the bank
- URL that looks like the bank's URL, but isn't the bank's URL
- credentials being entered in the wrong domain or on the wrong page
- misappropriated text and images (both in the email and on the faked website)
- bad/missing/suspect certificate
  - certificate/credential combination is suspect
 
Human algorithm:
- is the domain the same as the one where credentials are normally sent?
- are credentials being requested in response to an email? (normally they are not)
- is the certificate the same as the one seen on previous visits?
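The three checks above can be sketched in code. This is only an illustration, not part of the original exercise: the `CredentialHistory` record, its field names, and the example domains and fingerprints are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CredentialHistory:
    """What past logins with these credentials looked like (hypothetical record)."""
    domains: set = field(default_factory=set)              # domains the credentials were sent to
    cert_fingerprints: dict = field(default_factory=dict)  # domain -> fingerprint seen before
    usually_from_email: bool = False                       # does the user normally arrive via email?


def human_algorithm(history, domain, cert_fingerprint, arrived_from_email):
    """Return a warning for each of the three checks that fails."""
    warnings = []
    if domain not in history.domains:
        warnings.append("credentials never sent to this domain before")
    if arrived_from_email and not history.usually_from_email:
        warnings.append("login not normally reached from an email link")
    if history.cert_fingerprints.get(domain) != cert_fingerprint:
        warnings.append("certificate differs from the one seen before")
    return warnings


history = CredentialHistory(
    domains={"bank.example"},
    cert_fingerprints={"bank.example": "ab:cd:ef"},
)
print(human_algorithm(history, "bank.example", "ab:cd:ef", False))
print(human_algorithm(history, "bank-login.example", "12:34:56", True))
```

Each warning is meant as one small indicator; none of them is a verdict on its own.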
Think of individual detectors as autonomous:
- how would they be useful?
- how would they work, and what would they detect?
- how should they change system state in the normal case?
Possible anti-phishing system characteristics:
- language checks
  - phishing attacks often have poor grammar and spelling
  - system could check the spelling and grammar to look for changes
- URLs
  - phishing URLs are often designed to look like those of the legitimate site (e.g., www.paypa1.com)
  - system could check for unusual URL characteristics, such as numbers, non-printing characters, and characters like "|"
- past behaviour
  - has the user entered this username/password at this domain before?
  - does the user normally follow a link from an email before entering these credentials?
  - does the certificate match the one where the user normally enters these credentials?
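A minimal sketch of the URL characteristics check, assuming the three signals named above (numbers, non-printing characters, characters like "|"). The function name and the exact set of suspect characters are assumptions made for illustration. Digits in a host name are a weak signal, since plenty of legitimate sites use them.

```python
from urllib.parse import urlsplit

# Assumed set of punctuation that rarely appears in ordinary bank URLs.
SUSPECT_CHARS = set('|<>"\\^`{}')


def url_warnings(url):
    """Return the unusual characteristics found in a URL."""
    host = urlsplit(url).hostname or ""
    warnings = []
    if any(ch.isdigit() for ch in host):
        warnings.append("digits in host name")
    if any(not ch.isprintable() for ch in url):
        warnings.append("non-printing characters")
    if any(ch in SUSPECT_CHARS for ch in url):
        warnings.append("unusual punctuation")
    return warnings


print(url_warnings("http://www.paypa1.com/"))
print(url_warnings("https://www.paypal.com/signin"))
```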
 
How would the system react to information gathered?
- the system should holistically assess all kinds of information gathered
- gather a rich picture of the email's characteristics, the website's characteristics, and the user's behaviour
- there should be a sort of saturation point where enough characteristics point to phishing that the system reacts in such a way as to prevent loss of information
  - what should the system do?
- should some system characteristics have more weight than others?
  - should elements like certificate validity be considered more important and have more effect on the decision?
- the system should base this decision on many small indicators
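One way to picture the saturation point is a weighted sum of indicators compared against a threshold. The sketch below assumes this simple model; the indicator names, the weights, and the threshold value are invented for illustration, and the open questions above (what to do, how to weight) remain open.

```python
# Hypothetical weights: how strongly each indicator suggests phishing.
WEIGHTS = {
    "certificate_invalid": 5.0,
    "new_domain_for_credentials": 3.0,
    "lookalike_url": 2.0,
    "arrived_from_email": 1.0,
    "spelling_errors": 0.5,
}
THRESHOLD = 5.0  # assumed saturation point


def phishing_score(indicators):
    """Sum the weights of every indicator that fired."""
    return sum(WEIGHTS.get(name, 0.0) for name, fired in indicators.items() if fired)


def should_block(indicators, threshold=THRESHOLD):
    """React (e.g. stop the credentials being sent) once the score saturates."""
    return phishing_score(indicators) >= threshold


observed = {"new_domain_for_credentials": True, "lookalike_url": True}
print(phishing_score(observed), should_block(observed))
```

With these assumed weights a bad certificate alone reaches the threshold, while spelling errors only matter in combination with other indicators.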
List of individual detectors
- Image filename/content sensor
  - A fuzzy hash/fingerprinting technique applied to the images would be another idea.
    - Could hook into something like TinEye (http://www.tineye.com/commercial_api)
- Cascading Stylesheet sensor -- a sort of visual appearance sensor.
  - Might give an indication that a page is visually masquerading as another page.
  - Are the elements of this page styled identically to the elements on my banking website?
  - Is the CSS file a hash-identical version of the CSS on my banking website?
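A rough sketch of the hash-identical CSS check. Collapsing whitespace before hashing is an added assumption (so trivial reformatting does not hide a copy), and the function names and example domains are hypothetical.

```python
import hashlib


def css_fingerprint(css_text):
    """SHA-256 of the stylesheet with runs of whitespace collapsed."""
    normalized = " ".join(css_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def css_reused_elsewhere(known_domain, known_css, domain, css):
    """True if another domain serves CSS identical to the known site's CSS."""
    if domain == known_domain:
        return False  # the bank using its own stylesheet is expected
    return css_fingerprint(known_css) == css_fingerprint(css)


bank_css = "body { color: navy; }\n.login { width: 300px; }"
copied = "body { color: navy; }  .login { width: 300px; }"
print(css_reused_elsewhere("bank.example", bank_css, "evil.example", copied))
```

An exact hash only catches verbatim copies; checking whether elements are merely styled identically would need a fuzzier comparison.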
 
- context / semantic word descriptions --> semantic integrity - verifying message / content integrity based on the content itself - even if it is digitally signed. (huh? Could the original author of this fragment add more?)
- Content depth sensor
  - Is the page a facade with no content except that which is visible and the login form meant to capture credentials?
  - Many phishing pages will copy the front/login page of a bank and then link all other content back to the original bank.
    - A detector could score a page based on the structure/depth of the content it offers, stopping at any cross-server boundaries (i.e. not following links back to the 'real' bank if the phisher has emulated depth of content that way).
- Spellcheck sensor
- Domain / IP address sensor
  - Could use more advanced metrics. Is the domain name within a certain Levenshtein distance (http://en.wikipedia.org/wiki/Levenshtein_distance) of a known financial institution?
    - Of one of the financial institutions that I frequent?
  - Is the whois lookup of the domain I'm connecting to sensible?
    - I.e. is it associated with the company I expect? Does it have proper contact information? Do e-mails to this contact information bounce?
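The Levenshtein idea from the domain sensor can be sketched as follows. The edit-distance function is the standard dynamic-programming version; the `lookalike_of` helper, its `max_distance` default of 2, and the list of known domains are assumptions for illustration.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


def lookalike_of(domain, known_domains, max_distance=2):
    """Return the known domain that `domain` closely imitates, or None."""
    for known in known_domains:
        distance = levenshtein(domain.lower(), known.lower())
        if 0 < distance <= max_distance:  # distance 0 is the real domain
            return known
    return None


print(lookalike_of("paypa1.com", ["paypal.com"]))
```

Restricting `known_domains` to the institutions the user actually frequents keeps the comparison cheap and reduces false alarms.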
- GeoIP lookup sensor
  - Is the IP address I'm connecting to in the same country as my financial institution?
- Certificate sensor
  - issuer name, domain name, client name, date of issue, date of expiry
- HTTP Header sensor
  - Does the server reply with the same HTTP headers as were returned in previous visits to my bank?
  - Does it employ any of the security-related headers (some of them X- prefixed) for things such as content security policy, HttpOnly cookies, etc.? Such security features are likely uncommon on fake websites.
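A small sketch of the header comparison, assuming headers are available as a name-to-value mapping. The function name and the particular list of security headers are choices made for illustration, not a complete inventory.

```python
# Assumed shortlist of security-related response headers (lower-cased).
SECURITY_HEADERS = {
    "content-security-policy",
    "strict-transport-security",
    "x-frame-options",
    "x-content-type-options",
}


def missing_security_headers(previous_headers, current_headers):
    """Security headers sent on earlier visits that are absent now."""
    before = {name.lower() for name in previous_headers}
    now = {name.lower() for name in current_headers}
    return sorted((before & SECURITY_HEADERS) - now)


earlier = {
    "Content-Security-Policy": "default-src 'self'",
    "Strict-Transport-Security": "max-age=31536000",
    "Server": "nginx",
}
today = {"Server": "Apache", "content-security-policy": "default-src 'self'"}
print(missing_security_headers(earlier, today))
```

Header names are compared case-insensitively because HTTP header field names are case-insensitive.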
 
- Web Search sensor
  - If I do a Google, Yahoo, Bing and DuckDuckGo search for the name of the company I'm connecting to, does the URL I'm visiting appear in the results?
  - Does it appear in the top 10 results?
 
- Load time sensor
  - How long does it take the website to load?
  - Does it match the ballpark of how long it took me to load the website on prior visits?
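One possible reading of "ballpark" is a band of a few standard deviations around the mean of prior load times. That interpretation, the function name, and the `tolerance` and `min_spread` defaults are all assumptions in this sketch.

```python
from statistics import mean, stdev


def load_time_unusual(history_seconds, observed_seconds, tolerance=3.0, min_spread=0.1):
    """True if the observed load time falls outside the ballpark of prior visits."""
    if len(history_seconds) < 2:
        return False  # not enough history to judge
    average = mean(history_seconds)
    # Floor the spread so a very steady history does not flag tiny differences.
    spread = max(stdev(history_seconds), min_spread)
    return abs(observed_seconds - average) > tolerance * spread


prior_visits = [1.1, 0.9, 1.0, 1.2, 0.8]
print(load_time_unusual(prior_visits, 1.3))
print(load_time_unusual(prior_visits, 4.0))
```

Load times vary with network conditions, so this would be a weak indicator at best.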
 
- Traceroute sensor
  - What hops do my packets take along the way to the site I'm connecting to?
  - Is the route absurdly different from usual?
  - Probably a more hare-brained idea. Prone to drift/uncertainty in normal cases...
- Safe browsing sensor
  - Does the URL get flagged when submitted to the Google Safebrowsing API (http://code.google.com/apis/safebrowsing/)?
- Retrieve the page naturally, and through a proxy
  - See if the information retrieved differs between network "perspectives"