WebFund 2024F Lecture 16

Video

Video from the lecture for November 12, 2024 is now available:

Notes

Lecture 16
----------

 - midterm is being graded, should hopefully be done by end of the week
 - Assignment 2 should finally be uploaded later today (sorry!)

 - Q7 on A3: your analysis page should have 5 or 6 extra numbers, one per question
   (5 or 6 depending on whether you modified your code to add the extra question), e.g.
   Q3 has 5 blank submissions
   Q5 has 20 blank submissions

Since you also report on the total number of submissions, with these stats you can see what fraction of students left a question blank

--------------

Today: cookies & TLS

We previously discussed how http is a stateless protocol

But sometimes we need state
 - most commonly, when we "log in" to a website
 - but it is also used for site preferences (theme) and other tasks
 - and of course, tracking for ad targeting

Most common mechanism for adding state to http is the "cookie" mechanism

But what is a cookie?
 - data set by the server with a "Set-Cookie:" header
 - later requests to the server get the added header "Cookie: " with the value
   of the cookies that were previously set
 - allows the server to "save" data on the client (browser) that will be given back to it
    - remember, servers talk to many clients, so things like cookies
      help the server distinguish them from each other

 - But remember, cookies are just data included in an HTTP header
    - so any client can send any cookie value it likes
    - only way to be secure is to 1) make sure it isn't sent to the wrong
      sites and 2) make sure it isn't guessable



Web Server                    Browser
                 <---         GET /index.html

contents of index.html  ---->   renders index.html, saves user=bob
 + Content-Type: text/html      cookie for this server
 + Set-Cookie: user=bob


                 <---         GET /index.html
                               + Cookie: user=bob

contents of index.html  --->   
  for the user bob


Any browser can set any cookie
 - if you can guess a cookie that allows you to get confidential information,
   that cookie is insecure (as is the web app)
 - if you can steal the cookie, you have some sort of hijacking attack
    - and cookies can potentially be stolen just by impersonating the right site


So why is it called a cookie?
 - this is actually an old term
 - used by the X Window system, which predates the web
 - used as a general term for data stored that is opaque to the storer &
   used for authentication & session management

Note that a browser has NO OBLIGATION to store a cookie for a website
 - it can forget it immediately
 - it can forget it later

Browsers generally shouldn't modify cookies, but they can easily forget them


Why is it a cookie?
 - I think someone was hungry

What does it mean for a cookie to be secure?
 - that is separate from secure cookie handling, which needs other technology
 - essentially, the cookie should not be guessable by unauthorized parties
    - so if you have a cookie that represents that a user is logged in,
      unauthorized users shouldn't be able to guess it, otherwise
      they can be logged in as well, even if they don't know the password
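
One common way to get an unguessable cookie is to make its value a long random token and keep the real data on the server. A minimal sketch, assuming an in-memory table; the names sessions, log_in, and user_for_cookie are hypothetical:

```python
import secrets

sessions = {}  # hypothetical in-memory table: token -> username

def log_in(username):
    """Called after the password check succeeds; returns the cookie value."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    sessions[token] = username
    return token

def user_for_cookie(token):
    """Returns the logged-in user for this cookie value, or None."""
    return sessions.get(token)

token = log_in("bob")
```

Here user_for_cookie(token) gives back "bob", while a guess such as user_for_cookie("bob") gives None, because the cookie value itself carries no meaning an attacker could predict.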

This is why you have to keep logging in on the web
 - everyone is paranoid about cookies being stolen, so they
   make them valid only for a limited period of time
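
A minimal sketch of time-limited sessions, assuming the server stores an expiry time next to each token. start_session, check_session, SESSION_LIFETIME, and the 30 minute value are all hypothetical; the now parameter is there only so the sketch can be exercised without waiting.

```python
import secrets
import time

SESSION_LIFETIME = 30 * 60  # seconds; an arbitrary illustrative value

sessions = {}  # token -> (username, time at which the session expires)

def start_session(username, now=None):
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)
    sessions[token] = (username, now + SESSION_LIFETIME)
    return token

def check_session(token, now=None):
    """Returns the username if the token is known and not expired, else None."""
    now = time.time() if now is None else now
    entry = sessions.get(token)
    if entry is None:
        return None
    username, expires_at = entry
    if now >= expires_at:
        del sessions[token]  # forget expired sessions; user must log in again
        return None
    return username
```

The expiry is enforced on the server because the client cannot be trusted to forget a cookie on schedule; a stolen cookie stops working once its time is up no matter who holds it.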

But remember, http is sent in the clear over the Internet
 - anyone could be snooping
 - and if they are listening in, they can grab any important cookies
   and use them for bad purposes
 - doesn't matter if the cookies are secure (unguessable) or not!

This is why almost all web traffic today is encrypted, using the protocol https
 - https is HTTP over TLS
 - TLS = Transport Layer Security
 - used to be SSL = Secure Sockets Layer
   - renamed to give a more general name to a more general mechanism

TLS can be used to secure any TCP/IP data stream
 - email (POP3, IMAP, SMTP) uses TLS today
 - basically any regular protocol can be put over TLS to make it "secure"
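
To illustrate "put a protocol over TLS": with Python's standard ssl module, a plain TCP socket is wrapped and then used like any other stream. A minimal sketch; open_tls_stream is a hypothetical helper name, and error handling is kept to the bare minimum.

```python
import socket
import ssl

def open_tls_stream(host, port):
    """Opens a TCP connection and wraps it in TLS; returns the wrapped socket."""
    # The default context verifies the server's certificate and hostname.
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port))
    try:
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # don't leak the TCP socket if the handshake fails
        raise
```

The protocol spoken over the returned socket is unchanged: on port 443 you would send ordinary HTTP requests, on port 993 ordinary IMAP commands. TLS sits underneath either one.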

But the security guarantees of TLS are very specific and have very strict requirements

So if the assumptions of TLS hold, then you get
 - no eavesdropping (confidentiality)
    - even if traffic was recorded, it can't be decoded later, even if the parties' long-term private keys are compromised later (perfect forward secrecy, when an ephemeral key exchange is used)
 - no undetectable tampering (modifications just end communication with an error) (integrity)
 - one-sided or two-sided authentication
    - you either know who the server is, or you know who both the server & client are
    - on the web today we almost always do one-sided authentication
      (username/password/2FA are used for the other direction, not TLS)
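
The difference between one-sided and two-sided authentication shows up as a single setting in Python's ssl module. A sketch of the three configurations, using only contexts (no connections are made); the variable names are made up, and a real server would also need to load its own certificate and key with load_cert_chain().

```python
import ssl

# Client side: by default it verifies the server's certificate and hostname.
client_ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)

# Server side, one-sided authentication (the usual web setup):
# the server does not ask the client for a certificate.
one_sided = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)

# Server side, two-sided authentication: the handshake fails unless
# the client also presents a certificate that the server trusts.
two_sided = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
two_sided.verify_mode = ssl.CERT_REQUIRED
```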

In TLS, entities are identified with certificates. A certificate has
 - the public key of the entity
 - associated metadata (name of organization/server, how long valid, etc.)

In public key cryptography, you have public keys and private keys
 - public keys are given away
 - private keys must be private!

A private key is used to decrypt or sign data
A public key is used to encrypt or verify signatures

If you publish your public key
 - anyone can send you a secret message, but only you can read it
 - only you can sign a document as you, but anyone can verify that signature

Digital signatures are cool because they say who signed it and that the document has not been modified at all
 - much better than signatures on paper!
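
The four operations (encrypt, decrypt, sign, verify) can be tried out with a toy version of RSA. This is a sketch for intuition only: the primes are two well-known Mersenne primes, so the "private" key is trivially recoverable, there is no padding, and nothing here should be used for real security. All names are made up for the example.

```python
import hashlib

# Toy key pair. Far too small and too predictable for real use;
# chosen only so the arithmetic runs instantly.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                           # public: part of the public key
E = 65537                           # public: part of the public key
D = pow(E, -1, (P - 1) * (Q - 1))   # PRIVATE: must never be given away

def encrypt(message):
    """Anyone can do this: it needs only the public key (N, E)."""
    m = int.from_bytes(message, "big")
    assert m < N, "toy example only handles short messages"
    return pow(m, E, N)

def decrypt(ciphertext):
    """Only the key owner can do this: it needs the private exponent D."""
    m = pow(ciphertext, D, N)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")

def digest(document):
    """Hash of the document, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N

def sign(document):
    """Only the key owner can do this: it needs the private exponent D."""
    return pow(digest(document), D, N)

def verify(document, signature):
    """Anyone can do this: it needs only the public key (N, E)."""
    return pow(signature, E, N) == digest(document)

secret = encrypt(b"hi bob")
signature = sign(b"pay bob $5")
```

decrypt(secret) recovers the original message, verify(b"pay bob $5", signature) succeeds, and verifying the same signature against a changed document such as b"pay bob $500" fails. That is the "has not been modified at all" property.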