BioSec: DNMar23

Design Exercise: Phishing

How can we design an anti-phishing strategy that uses a biological approach?

Our theoretical phishing scenario involves:

  • a banking context
  • the attacker wants the user's credentials
  • the attack is delivered by email
  • the attacker sends an email that looks like it comes from the bank
  • the link goes to a malicious site that can look arbitrarily like the bank
    • what does it mean to "look like the bank"?
  • the user types in credentials and is potentially redirected transparently to the real bank site

Some of the problems that arise in phishing are related to:

  • faked email
  • link to site that looks like the bank but isn't the bank
  • a URL that looks like the bank's URL, but isn't
  • credentials being entered at the wrong domain, on the wrong page
  • misappropriated text and images (both in email, and on the faked website)
  • bad/missing/suspect certificate
    • certificate/credential combination is suspect

Human algorithm:

  • is the domain the same as the one to which credentials are normally sent?
  • credentials are not normally entered in response to an email request
  • is the certificate the same as usual?

Think of individual detectors as autonomous:

  • how would they be useful?
  • how would they work? What would they detect?
  • how should they change system state in the normal case?

Possible anti-phishing system characteristics:

  • language checks
    • phishing attacks often have poor grammar and spelling
    • the system could check spelling and grammar for anomalies
  • URLs
    • phishing URLs are often designed to look like those of the legitimate site (e.g., www.paypa1.com)
    • the system could check for unusual URL characteristics, such as digits, non-printing characters, or characters like "|" (a rough sketch follows this list)
  • past behaviour
    • has the user entered this username/password at this domain before?
    • does the user normally follow a link from an email before entering these credentials?
    • does the certificate match the one where the user normally enters these credentials?
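
As a rough illustration of the URL check above, here is a minimal sketch in Python; the specific heuristics, the scoring, and the example URLs are assumptions made for illustration, not something from the notes:

  import re
  from urllib.parse import urlparse

  def suspicious_url_score(url):
      """Count simple heuristics that suggest a look-alike URL."""
      host = urlparse(url).hostname or ""
      score = 0
      if re.search(r"\d", host):            # digits in the hostname, e.g. paypa1.com
          score += 1
      if host.count(".") > 3:               # unusually deep subdomain nesting
          score += 1
      if re.search(r"[^a-z0-9.-]", host):   # characters that do not belong in a normal hostname
          score += 1
      if "@" in url or "|" in url:          # userinfo tricks or odd separators in the full URL
          score += 1
      return score

  suspicious_url_score("http://www.paypa1.com/login")   # -> 1
  suspicious_url_score("https://www.paypal.com/login")  # -> 0

A real detector would also have to handle internationalized domain names and percent-encoding; the point is just that each small heuristic contributes to a score rather than giving a yes/no verdict on its own.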

How would the system react to information gathered?

  • the system should holistically assess all kinds of information gathered
  • gather a rich picture of the email's characteristics, the website characteristics, and the user's behaviour
  • there should be a sort of saturation point where enough characteristics point to phishing that the system reacts in such a way as to prevent loss of information
    • what should the system do?
  • should some system characteristics have more weight than others?
    • should elements like certificate validity be considered more important and have more effect on the decision?
  • the system should base this decision on many small indicators (a toy weighted-sum sketch follows this list)
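
A toy sketch of the weighted, saturating decision described above; the detector names, weights, and saturation threshold are illustrative assumptions, not values from the notes:

  # Each detector reports a reading in [0, 1]; weights reflect how much each
  # indicator should influence the overall decision.
  DETECTOR_WEIGHTS = {
      "certificate_mismatch": 3.0,   # stronger indicators get higher weight
      "url_lookalike":        2.0,
      "followed_email_link":  1.0,
      "spelling_errors":      0.5,
  }
  SATURATION = 3.0  # past this point the system acts to prevent credential loss

  def phishing_score(readings):
      """Combine many small indicators into a single score."""
      return sum(DETECTOR_WEIGHTS[name] * value for name, value in readings.items())

  readings = {"certificate_mismatch": 1.0, "url_lookalike": 0.8,
              "followed_email_link": 1.0, "spelling_errors": 0.0}
  if phishing_score(readings) >= SATURATION:
      print("warn the user / block credential submission")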

List of individual detectors

  • Image filename/content sensor
    • Fuzzy hashing/fingerprinting of the images would be another approach.
      • Could hook into something like TinEye
  • Cascading Stylesheet sensor -- a sort of visual appearance sensor.
    • Might give an indication that a page is visually masquerading as another page.
    • Are the elements of this page styled identically to the elements on my banking website?
    • Is the CSS file a hash-identical version of the CSS on my banking website?
  • Context / semantic word-description sensor --> semantic integrity: verifying message/content integrity based on the content itself, even if it is digitally signed. (Unclear as written; the original author of this fragment could add more.)
  • Content depth sensor
    • Is the page a facade with no content except that which is visible and the login form meant to capture credentials?
    • Many phishing pages copy the front/login page of a bank and then link all other content back to the original bank.
      • A detector could score a page based on the structure/depth of the content it offers, stopping at any cross-server boundary (i.e., not following links back to the 'real' bank if the phisher has emulated depth of content that way).
  • Spellcheck sensor
  • Domain / IP address sensor
    • Could use more advanced metrics. Is the domain name within a certain Levenshtein distance of a known financial institution? (a sketch follows this list)
      • Of one of the financial institutions that I frequent?
    • Is the whois lookup of the domain I'm connecting to sensible?
      • I.e., is it associated with the company I expect? Does it have proper contact information? Do e-mails to that address bounce?
  • GeoIP lookup sensor
    • Is the IP address I'm connecting to in the same country as my financial institution?
  • Certificate sensor
    • issuer name, domain name, client name, date of issue, date of expiry (a fingerprint-comparison sketch follows this list)
  • HTTP Header sensor
    • Does the server reply with the same HTTP headers as were returned in previous visits to my bank?
    • Does it employ any of the X-headers for things such as Content Security Policy, HttpOnly cookies, etc.? Such security features are unlikely to be present on fake websites. (a header-comparison sketch follows this list)
  • Web Search sensor
    • If I do a Google, Yahoo, Bing, and DuckDuckGo search for the name of the company I'm connecting to, does the URL I'm visiting appear in the results?
    • Does it appear in the top 10 results?
  • Load time sensor
    • How long does it take the website to load?
    • Does it match the ballpark of how long it took me to load the website on prior visits?
  • Traceroute sensor
    • What hops do my packets take along the way to the site I'm connecting to?
    • Is it absurdly different than usual?
    • Probably a more harebrained idea. Prone to drift/uncertainty in normal cases...
  • Retrieve the page naturally, and through a proxy
    • See if the information retrieved differs across network "perspectives"
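
A minimal sketch of the Levenshtein-distance check from the domain/IP address sensor above; the list of known bank domains and the distance threshold are assumptions, and the edit-distance function is written out because the Python standard library does not provide one:

  def levenshtein(a, b):
      """Classic dynamic-programming edit distance."""
      prev = list(range(len(b) + 1))
      for i, ca in enumerate(a, 1):
          cur = [i]
          for j, cb in enumerate(b, 1):
              cur.append(min(prev[j] + 1,                # deletion
                             cur[j - 1] + 1,             # insertion
                             prev[j - 1] + (ca != cb)))  # substitution
          prev = cur
      return prev[-1]

  MY_BANKS = ["paypal.com", "scotiabank.com"]  # domains the user actually frequents (assumed)

  def near_miss_domain(host, threshold=2):
      """Flag domains that are close to, but not equal to, a known bank domain."""
      return any(0 < levenshtein(host, bank) <= threshold for bank in MY_BANKS)

  near_miss_domain("paypa1.com")   # True  -- one substitution away from paypal.com
  near_miss_domain("paypal.com")   # False -- exact match, distance 0

A distance of exactly zero is an exact match and is not flagged; small non-zero distances are precisely the near-misses a phisher relies on.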
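
The certificate sensor could, as a first approximation, compare a fingerprint of the certificate presented now against one recorded on earlier visits. This sketch checks only the overall fingerprint rather than the individual fields listed above, and the stored-fingerprint table is a placeholder:

  import hashlib, socket, ssl

  def cert_fingerprint(host, port=443):
      """SHA-256 fingerprint of the certificate the server presents."""
      ctx = ssl.create_default_context()
      with socket.create_connection((host, port), timeout=5) as sock:
          with ctx.wrap_socket(sock, server_hostname=host) as tls:
              der = tls.getpeercert(binary_form=True)
      return hashlib.sha256(der).hexdigest()

  # Fingerprints recorded on earlier, trusted visits (placeholder values).
  KNOWN = {"www.paypal.com": "<fingerprint recorded on a previous visit>"}

  def certificate_changed(host):
      return cert_fingerprint(host) != KNOWN.get(host)

Comparing fingerprints is stricter than comparing fields such as issuer or expiry date, so in practice the two checks would probably be separate indicators feeding the overall score.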
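
Similarly, the HTTP header sensor could record which security-related headers a site sends and compare that profile against earlier visits; which headers are worth tracking is an assumption made here:

  import urllib.request

  SECURITY_HEADERS = ["Strict-Transport-Security", "Content-Security-Policy",
                      "X-Frame-Options", "X-Content-Type-Options"]

  def header_profile(url):
      """Record the Server header and which security headers are present."""
      with urllib.request.urlopen(url, timeout=5) as resp:
          headers = resp.headers
      return {
          "server": headers.get("Server"),
          "security": [h for h in SECURITY_HEADERS if headers.get(h) is not None],
      }

  # A profile recorded on earlier visits to the bank could then be compared:
  #   baseline = header_profile("https://www.examplebank.com/")
  #   if header_profile(current_url) != baseline:  flag as suspicious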