WebFund 2024F Lecture 2

From Soma-notes
Revision as of 19:01, 10 September 2024 by Soma

Video

Video from the lecture given on September 10, 2024 is now available:

Notes

Lecture 2
---------

* Tutorials are released on Thursdays; they may be available earlier, but may not be finalized
  (each tutorial will say whether it is finalized)

* Same for assignments

* Tutorials are due when their associated assignment is due... but remember, you have to get checked off by a TA in their office hours or in a tutorial. You can't just email or message them your answers, so don't put it off: the TAs won't hold extra office hours. (You can talk to a TA virtually, but it has to at least be a call.)

(Will be posting office hours soon.)

* Lecture quiz for today will be posted this afternoon
  - you'll have a week to complete
  - make sure you don't get too far behind
  - will be pretty simple


* Assignments will be released at least a week before they are due
  - they are based on the tutorials, so if you understand the tutorials, the assignments shouldn't be too hard

My wiki is at https://homeostasis.scs.carleton.ca/wiki

For software, we're going to be using OpenStack, so you don't need to install anything
 - but basically we're using Deno; you can install it on your own machine if you want to work locally.

You can get checked off for tutorials by going to tutorials and attending for 1-1.5 hours. However, this isn't necessarily enough time to understand the material. You'll need to spend the time you need to learn, just as you do in any class.

Can you use a VM on your own machine?
 - yes, but we will be supporting and recommending OpenStack

Why no final project?
 - a project can be completed without understanding the material, thanks to modern tools


The Internet
------------
 - because the web runs on the Internet
 - can't understand its design choices without understanding the Internet
 - and I don't think you all have learned about it at a technical level


When we talk about the Internet, we're talking about a network of networks
 - what is a network then? in CS terms, just computers talking to each other

The original Ethernet was pretty simple
 - a wire connecting lots of computers
 - all computers could "listen" to the wire always
 - any computer could start "speaking" over the wire at any time
 - but if two computers "talked" at the same time, gibberish
 - so they had to take turns
    - if someone else was speaking, they would wait until the wire was quiet
    - if two started at the same moment anyway (a "collision"), each would stop and wait a random amount of time before trying again
    - if they collided again, the range of that random wait would grow exponentially
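The turn-taking rule above is known as (truncated) binary exponential backoff. Below is a minimal TypeScript sketch; the function names `backoffSlots` and `backoffMicros` are hypothetical, while the constants (a 51.2 µs slot time at 10 Mbit/s, the random range capped after 10 collisions, and giving up after 16 failed attempts) are those of classic 10 Mbit/s Ethernet.

```typescript
// Truncated binary exponential backoff, as in classic shared-wire Ethernet.
// After the n-th collision in a row, a station waits a random number of
// "slot times" chosen uniformly from 0 .. 2^min(n, 10) - 1.

const SLOT_TIME_MICROS = 51.2; // 512 bit times at 10 Mbit/s
const ATTEMPT_LIMIT = 16;      // a frame is abandoned after 16 failed attempts
const MAX_EXPONENT = 10;       // the random range stops growing after 10 collisions

// `random` must return a number in [0, 1); it is a parameter so it can be fixed in tests.
function backoffSlots(collisions: number, random: () => number = Math.random): number {
  if (collisions < 1) throw new RangeError("collisions must be at least 1");
  if (collisions >= ATTEMPT_LIMIT) throw new Error("too many collisions: the frame is abandoned");
  const range = 2 ** Math.min(collisions, MAX_EXPONENT); // how many slot counts are possible
  return Math.floor(random() * range);
}

function backoffMicros(collisions: number, random: () => number = Math.random): number {
  return backoffSlots(collisions, random) * SLOT_TIME_MICROS;
}

// Usage: the possible wait grows quickly with repeated collisions.
for (let n = 1; n <= 4; n++) {
  const maxSlots = 2 ** Math.min(n, MAX_EXPONENT) - 1;
  console.log(`after collision ${n}: wait 0..${maxSlots} slots, e.g. ${backoffMicros(n).toFixed(1)} µs`);
}
```

The randomness is the point: two stations that collided are unlikely to pick the same wait again, and doubling the range spreads them out further when the wire is busy.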

Xerox PARC created Ethernet, and also pioneered object-oriented programming (Smalltalk), GUIs, and laser printers

Now, when we say computers "speak" to each other, what is the method of that conversation (the "protocol")?
 - ethernet is really a protocol for how computers can speak to each other
 - a "low level" protocol, in that it is designed to work one layer up from a physical communication medium

Originally, ethernet was just for wires. But now the ethernet protocol is used for fiber optic cables and for wireless communication
 - Wi-Fi uses ethernet-type messages, just over microwaves

Ethernet is a protocol for how to send and receive ethernet packets
 - a packet is a self-contained unit of communication with a maximum size (Ethernet calls its packets "frames")
 - has a header and payload
    - like a "postcard"
    - header has "metadata": where the information is going to, where it is from, its length or type, etc.
    - payload is the contents to be communicated
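To make the header/payload split concrete, here is a minimal sketch that reads the header of an Ethernet II frame: 6 bytes of destination address, 6 bytes of source address, and a 2-byte "EtherType" saying what the payload is (0x0800 means IPv4). It assumes the 4-byte checksum at the end of the frame has already been stripped, as is typical by the time software sees a frame; `parseEthernetFrame` and `formatMac` are hypothetical names.

```typescript
// Reading the header of an Ethernet II frame (trailing checksum already stripped).
interface EthernetHeader {
  destination: string; // hardware (MAC) address, e.g. "ff:ff:ff:ff:ff:ff"
  source: string;
  etherType: number;   // what the payload is, e.g. 0x0800 = IPv4
}

const ETHERNET_HEADER_LENGTH = 14; // 6 (destination) + 6 (source) + 2 (EtherType)

// Format bytes as the usual colon-separated hex, e.g. "02:00:00:00:00:01".
function formatMac(bytes: Uint8Array): string {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i++) {
    parts.push(("0" + bytes[i].toString(16)).slice(-2));
  }
  return parts.join(":");
}

function parseEthernetFrame(frame: Uint8Array): { header: EthernetHeader; payload: Uint8Array } {
  if (frame.length < ETHERNET_HEADER_LENGTH) {
    throw new Error("frame too short to hold an Ethernet header");
  }
  const header: EthernetHeader = {
    destination: formatMac(frame.subarray(0, 6)),
    source: formatMac(frame.subarray(6, 12)),
    etherType: (frame[12] << 8) | frame[13], // multi-byte fields are big-endian
  };
  // Everything after the header is payload: the "contents of the postcard".
  return { header, payload: frame.subarray(ETHERNET_HEADER_LENGTH) };
}

// Usage: a broadcast frame whose payload would be (the start of) an IPv4 packet.
const frame = new Uint8Array([
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, // destination: broadcast
  0x02, 0x00, 0x00, 0x00, 0x00, 0x01, // source
  0x08, 0x00,                         // EtherType: IPv4
  0x45, 0x00,                         // first payload bytes
]);
const { header } = parseEthernetFrame(frame);
console.log(header.destination, header.etherType.toString(16)); // ff:ff:ff:ff:ff:ff 800
```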

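Standard Ethernet packets carry at most 1500 bytes of payload (the "MTU", maximum transmission unit); counting the 14-byte header and 4-byte checksum, a full frame is 1518 bytes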

So to send data to another computer with ethernet, you have to
 - figure out the ethernet address of the destination computer
 - convert data you want to send into a series of packets
 - send the packets
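The "convert data into a series of packets" step is mostly arithmetic: chop the data into pieces no larger than the 1500-byte payload limit. A minimal sketch with a hypothetical `splitIntoPayloads` helper; in a real system this splitting is done by the higher layers (TCP/IP) rather than by hand.

```typescript
// Chop data into pieces that each fit in one standard Ethernet payload.
const MTU = 1500; // maximum payload bytes per standard Ethernet frame

function splitIntoPayloads(data: Uint8Array, mtu: number = MTU): Uint8Array[] {
  if (mtu <= 0) throw new RangeError("mtu must be positive");
  const payloads: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += mtu) {
    // subarray clamps the end index, so the last piece may be shorter
    payloads.push(data.subarray(offset, offset + mtu));
  }
  return payloads;
}

// Usage: 4000 bytes need three frames.
const message = new Uint8Array(4000);
console.log(splitIntoPayloads(message).map((p) => p.length).join(", ")); // 1500, 1500, 1000
```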

To receive, you just listen in for packets with your address

Traditionally, ethernet was a broadcast protocol
 - all computers saw all packets being transmitted
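Because every computer on a traditional Ethernet sees every packet, "listening for your address" amounts to a filter applied to everything on the wire. A minimal sketch, assuming a simplified, hypothetical `Frame` type with addresses as strings; frames sent to the broadcast address ff:ff:ff:ff:ff:ff are accepted by everyone.

```typescript
// On a shared wire every network card sees every frame; each card keeps only
// the frames addressed to it (or to everyone) and ignores the rest.
const BROADCAST = "ff:ff:ff:ff:ff:ff";

interface Frame {
  destination: string; // MAC address, colon-separated hex
  source: string;
  payload: string;     // text instead of bytes, to keep the example readable
}

function acceptFrame(myAddress: string, frame: Frame): boolean {
  const dest = frame.destination.toLowerCase();
  return dest === myAddress.toLowerCase() || dest === BROADCAST;
}

// `wire` is everything transmitted on the shared medium, as seen by one computer.
function receive(myAddress: string, wire: Frame[]): Frame[] {
  return wire.filter((frame) => acceptFrame(myAddress, frame));
}

// Usage: computer A hears three frames but keeps only two.
const wire: Frame[] = [
  { destination: "02:00:00:00:00:0a", source: "02:00:00:00:00:0b", payload: "hi A" },
  { destination: "02:00:00:00:00:0c", source: "02:00:00:00:00:0b", payload: "hi C" },
  { destination: BROADCAST, source: "02:00:00:00:00:0b", payload: "hi everyone" },
];
console.log(receive("02:00:00:00:00:0a", wire).map((f) => f.payload).join(" | ")); // hi A | hi everyone
```

Nothing stops a card from skipping this filter: in "promiscuous mode" it keeps every frame, which is how packet sniffers work on a shared network.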

Ethernet is great for connecting local computers, i.e., making a local area network (LAN)
But it doesn't scale
 - networks quickly get too big (every computer hears every packet, and collisions become more common)
 - so need a way to connect networks

For more: https://en.wikipedia.org/wiki/Local_area_network

The Internet is, in effect, a way of connecting multiple "ethernet" networks together
 (actually, can be any technology, and there are lots, but mostly now they are ethernet-like)

(There are lots of privacy and security issues that come up, we aren't going to discuss now.)

How do we connect networks to each other? Need an interchange language, a protocol that everyone understands

That's the Internet Protocol (IP)

At its core, IP is a specification for packets: IP packets

IP packets have metadata and data
 - metadata: source, destination, length, checksum (a simple sum used to detect corrupted headers)
 - plus data
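The IPv4 header checksum really is simple: add up the header as 16-bit words using ones'-complement arithmetic (carries wrap around) and invert the result, as described in RFC 1071. A minimal sketch; `ipv4Checksum` is a hypothetical name, and the usage runs it on a sample 20-byte header for a packet from 192.168.0.1 to 192.168.0.199.

```typescript
// IPv4 header checksum: sum 16-bit big-endian words with end-around carry, then invert.
// Sender: compute over the header with the checksum field (bytes 10-11) set to zero.
// Receiver: compute over the header as received; a result of 0 means no error was detected.
function ipv4Checksum(header: Uint8Array): number {
  let sum = 0;
  for (let i = 0; i < header.length; i += 2) {
    const high = header[i];
    const low = i + 1 < header.length ? header[i + 1] : 0; // pad an odd length with a zero byte
    sum += (high << 8) | low;
  }
  // Fold carries out of the top 16 bits back into the low 16 bits.
  while (sum > 0xffff) {
    sum = (sum & 0xffff) + (sum >>> 16);
  }
  return ~sum & 0xffff;
}

// Usage: a sample header with the checksum field zeroed.
const sample = new Uint8Array([
  0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
  0x00, 0x00, // checksum field, zero while computing
  0xc0, 0xa8, 0x00, 0x01, // source 192.168.0.1
  0xc0, 0xa8, 0x00, 0xc7, // destination 192.168.0.199
]);
console.log(ipv4Checksum(sample).toString(16)); // b861
```

This only catches accidental damage; anyone who can modify a packet can also recompute the checksum. (IPv6 dropped the header checksum entirely.)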

IP packets sound similar to Ethernet packets, right?
 - but they use different namespaces for sources and destinations (IP addresses vs. Ethernet hardware, or MAC, addresses)
 
So what is actually transmitted are ethernet packets that contain IP packets

Isn't that wasteful? Why not just use one or the other?
 - low-level network hardware doesn't understand IP packets
 - and ethernet packets only make sense in a LAN

Your computer wants to send data to Google
 - figure out Google's IP address
 - generate IP packets for data to be sent to Google
 - encapsulate IP packets into Ethernet packets addressed to the local router
 - send packets to router
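The encapsulation step can be sketched as putting a 14-byte Ethernet header in front of an IP packet that has already been built. The key point is that the Ethernet destination is the local router's hardware address, while Google's IP address stays inside the IP packet. The MAC addresses and the truncated IP packet in the usage are made up, and `encapsulateInEthernet` / `parseMac` are hypothetical names.

```typescript
// Wrap an already-built IP packet in an Ethernet II frame addressed to the local router.
const ETHERTYPE_IPV4 = 0x0800;

function parseMac(mac: string): Uint8Array {
  const parts = mac.split(":");
  if (parts.length !== 6) throw new Error(`bad MAC address: ${mac}`);
  const bytes = new Uint8Array(6);
  for (let i = 0; i < 6; i++) {
    const value = parseInt(parts[i], 16);
    if (isNaN(value) || value < 0 || value > 0xff) throw new Error(`bad MAC address: ${mac}`);
    bytes[i] = value;
  }
  return bytes;
}

function encapsulateInEthernet(ipPacket: Uint8Array, routerMac: string, myMac: string): Uint8Array {
  const frame = new Uint8Array(14 + ipPacket.length);
  frame.set(parseMac(routerMac), 0); // destination: the next hop (router), not Google
  frame.set(parseMac(myMac), 6);     // source: this computer's network card
  frame[12] = ETHERTYPE_IPV4 >> 8;   // EtherType, big-endian: "payload is IPv4"
  frame[13] = ETHERTYPE_IPV4 & 0xff;
  frame.set(ipPacket, 14);           // payload: the whole IP packet, unchanged
  // The network card appends the 4-byte frame checksum when it transmits.
  return frame;
}

// Usage: the first bytes of an IP packet, for illustration only.
const ipPacket = new Uint8Array([0x45, 0x00, 0x00, 0x14]);
const frame = encapsulateInEthernet(ipPacket, "02:00:00:00:00:fe", "02:00:00:00:00:01");
console.log(frame.length); // 18
```

How the computer learns the router's Ethernet address is a separate step; for IPv4 that is the job of ARP (Address Resolution Protocol).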

So, what is this router?
 - a computer with a connection to multiple networks (at least 2)

So your computer would set the local router as the ethernet destination,
 and the router would send the packets along to another router, and so on,
 until it gets to a Google computer

routers are like sorting centers for shipping
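Like a sorting center, each router looks at a packet's destination IP address and picks the next hop from a table. IP routers use "longest-prefix match": among all table entries whose prefix matches the destination, the most specific one wins. A minimal sketch with a made-up three-entry table (the route names and prefixes are hypothetical); real routing tables can be very large and use much faster data structures than this linear scan.

```typescript
// Pick the next hop for a destination using longest-prefix match over a routing table.
interface Route {
  prefix: string;  // network address, e.g. "10.1.0.0"
  length: number;  // prefix length in bits, e.g. 16 for "10.1.0.0/16"
  nextHop: string; // where matching packets are sent next
}

function ipToNumber(ip: string): number {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => p % 1 !== 0 || p < 0 || p > 255)) {
    throw new Error(`bad IPv4 address: ${ip}`);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// A /length mask as an unsigned 32-bit number, e.g. /8 -> 0xff000000.
function prefixMask(length: number): number {
  return length === 0 ? 0 : (0xffffffff << (32 - length)) >>> 0;
}

function lookupRoute(table: Route[], destination: string): Route | undefined {
  const dest = ipToNumber(destination);
  let best: Route | undefined;
  for (const route of table) {
    const mask = prefixMask(route.length);
    const matches = ((dest & mask) >>> 0) === ((ipToNumber(route.prefix) & mask) >>> 0);
    if (matches && (best === undefined || route.length > best.length)) {
      best = route; // more specific than anything found so far
    }
  }
  return best;
}

// Usage: a hypothetical router table with a default route.
const table: Route[] = [
  { prefix: "0.0.0.0", length: 0, nextHop: "upstream ISP" }, // default route: matches everything
  { prefix: "10.0.0.0", length: 8, nextHop: "campus core" },
  { prefix: "10.1.0.0", length: 16, nextHop: "building router" },
];
console.log(lookupRoute(table, "10.1.2.3")?.nextHop); // building router
```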

By the way, this tech was all originally developed in one form or another for phone networks
 - and they used to charge for long-distance messages (outside of the "LAN" of a city)

You'll notice a few issues
 - how long do packets take to get somewhere?
 - what if they get lost or damaged in transit?
 - what if they arrive in a different order?

order is easy, just number the packets

lost or damaged, need to be resent by origin
 - hopefully it still has a copy
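Numbering packets is enough both to restore the order and to notice what never arrived. A minimal sketch with a hypothetical `reassemble` function that assumes the receiver already knows how many packets to expect; it returns the data only when nothing is missing, and otherwise the sequence numbers the origin would need to resend.

```typescript
// Restore order with sequence numbers and find which packets never arrived.
interface NumberedPacket {
  seq: number; // 0, 1, 2, ... assigned by the sender
  data: string;
}

function reassemble(received: NumberedPacket[], total: number): { data: string | null; missing: number[] } {
  const slots: (string | undefined)[] = new Array(total);
  for (const packet of received) {
    if (packet.seq >= 0 && packet.seq < total) {
      slots[packet.seq] = packet.data; // a duplicate just overwrites the same slot
    }
  }
  const missing: number[] = [];
  for (let seq = 0; seq < total; seq++) {
    if (slots[seq] === undefined) missing.push(seq);
  }
  // Only hand data to the program once every piece is present, in order.
  const data = missing.length === 0 ? slots.join("") : null;
  return { data, missing };
}

// Usage: packets arrive out of order; then one goes missing.
const arrived = [{ seq: 2, data: "lo!" }, { seq: 0, data: "He" }, { seq: 1, data: "l" }];
console.log(reassemble(arrived, 3).data); // Hello!
console.log(reassemble([arrived[0], arrived[1]], 3).missing.join(",")); // 1
```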

how long?
 - telecom networks had time arrival guarantees
 - Internet is mostly "best effort"
    - we'll try, but no guarantees

There is this notion of "QoS" that gets applied
 - quality of service

But that is always a preference, almost never a guarantee

This is why sometimes your Internet connection can be amazing and then other times it is crap
 - mostly because other people are using the network too much!


Note the Internet was designed for texting, email, sending files, NOT for interactive video! Amazing that it works! (But same tech underneath!)


What if I just want to send a continuous stream of data
 - need multiple packets
 - in a fixed order
 - with dropped/damaged packets being retransmitted

TCP (Transmission Control Protocol) is the protocol that provides this, ON TOP of IP
 - so you have TCP inside IP inside Ethernet
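The retransmission part can be illustrated with the simplest possible scheme, "stop-and-wait": send one numbered segment, wait for an acknowledgement, and resend if none comes. This is a deliberately simplified stand-in, not TCP itself (real TCP keeps many segments in flight, uses retransmission timers, and numbers bytes rather than segments); the `Link` type is a hypothetical simulation that just reports whether a segment and its acknowledgement both got through.

```typescript
// A toy stop-and-wait sender: one numbered segment at a time, resent until acknowledged.

// Simulated unreliable link: true if the segment and its acknowledgement both got through.
type Link = (seq: number, data: string) => boolean;

function sendReliably(segments: string[], link: Link, maxTries: number = 5): { delivered: string[]; transmissions: number } {
  const delivered: string[] = []; // what the receiving program ends up with
  let transmissions = 0;
  for (let seq = 0; seq < segments.length; seq++) {
    let acked = false;
    for (let attempt = 0; attempt < maxTries && !acked; attempt++) {
      transmissions++;
      acked = link(seq, segments[seq]); // no ack: loop around and retransmit
    }
    if (!acked) throw new Error(`segment ${seq} failed ${maxTries} times; giving up`);
    delivered.push(segments[seq]); // the next segment goes out only now, so order is kept
  }
  return { delivered, transmissions };
}

// Usage: a link that loses segment 1 the first time it is sent.
let droppedOnce = false;
const flakyLink: Link = (seq) => {
  if (seq === 1 && !droppedOnce) {
    droppedOnce = true;
    return false;
  }
  return true;
};
const result = sendReliably(["GET ", "/ ", "HTTP/1.1"], flakyLink);
console.log(result.delivered.join(""), result.transmissions); // GET / HTTP/1.1 4
```

Waiting for every acknowledgement keeps the logic obvious but wastes most of the time on a long link, which is why TCP sends ahead and acknowledges in bulk.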

UDP (User Datagram Protocol) is also used: it is basically just IP packets plus ports (a port denotes which program gets the packet), with no provision for retransmission or ordering; that is left up to the program receiving the data
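A UDP header is only 8 bytes: source port, destination port, length (header plus data), and a checksum, each a 16-bit big-endian number. A minimal sketch of parsing one and handing the payload to whichever program is listening on the destination port; `parseUdp`, `demultiplex`, and the listener table are hypothetical stand-ins for what the operating system does, and checksum verification is skipped.

```typescript
// Parse a UDP header and deliver the payload by destination port (checksum not verified).
interface UdpDatagram {
  sourcePort: number;
  destinationPort: number;
  payload: Uint8Array;
}

function readUint16(bytes: Uint8Array, offset: number): number {
  return (bytes[offset] << 8) | bytes[offset + 1];
}

// `bytes` is the payload of an IP packet whose protocol field says UDP.
function parseUdp(bytes: Uint8Array): UdpDatagram {
  if (bytes.length < 8) throw new Error("too short for a UDP header");
  const length = readUint16(bytes, 4); // header + data, in bytes
  if (length < 8 || length > bytes.length) throw new Error("bad UDP length field");
  return {
    sourcePort: readUint16(bytes, 0),
    destinationPort: readUint16(bytes, 2),
    payload: bytes.subarray(8, length),
  };
}

// Hypothetical table of programs, keyed by the port each one listens on.
type Handler = (payload: Uint8Array) => void;

function demultiplex(bytes: Uint8Array, listeners: { [port: number]: Handler }): boolean {
  const datagram = parseUdp(bytes);
  const handler = listeners[datagram.destinationPort];
  if (handler === undefined) return false; // nobody listening: the datagram is dropped
  handler(datagram.payload);
  return true;
}

// Usage: a datagram from port 4660 to port 53 (DNS) carrying the two bytes "hi".
const datagram = new Uint8Array([0x12, 0x34, 0x00, 0x35, 0x00, 0x0a, 0x00, 0x00, 0x68, 0x69]);
const got: string[] = [];
const delivered = demultiplex(datagram, { 53: (p) => { got.push(new TextDecoder().decode(p)); } });
console.log(delivered, got.join(",")); // true hi
```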

What is the web in all of this?
 - another protocol, HTTP

So we send

 - HTTP inside a TCP stream
 - TCP stream inside IP packets
 - IP packets inside ethernet packets
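To see what the layering costs, here is a rough byte count for one small HTTP request. It assumes the minimum header sizes (20-byte TCP and IPv4 headers with no options, plus the 14-byte Ethernet header and 4-byte Ethernet checksum) and a request small enough to fit in one packet; the helper names are hypothetical and the request is a minimal HTTP/1.1 example.

```typescript
// Rough bytes-on-the-wire count for a small HTTP request, layer by layer.
const ETHERNET_OVERHEAD = 14 + 4; // header + frame checksum
const IPV4_HEADER = 20;           // minimum, no options
const TCP_HEADER = 20;            // minimum, no options

function httpGetRequest(host: string, path: string): string {
  // HTTP/1.1 is plain text: lines end in CRLF and a blank line ends the headers.
  return `GET ${path} HTTP/1.1\r\nHost: ${host}\r\nConnection: close\r\n\r\n`;
}

function bytesOnWire(httpText: string): { http: number; tcp: number; ip: number; ethernet: number } {
  const http = new TextEncoder().encode(httpText).length; // HTTP bytes (UTF-8)
  const tcp = http + TCP_HEADER;           // HTTP inside a TCP segment
  const ip = tcp + IPV4_HEADER;            // TCP segment inside an IP packet
  const ethernet = ip + ETHERNET_OVERHEAD; // IP packet inside an Ethernet frame
  return { http, tcp, ip, ethernet };
}

// Usage
const request = httpGetRequest("example.com", "/");
const sizes = bytesOnWire(request);
console.log(sizes.http, sizes.ethernet); // 56 114
```

So for this tiny request, about half of the bytes on the wire are headers. With HTTPS, the HTTP text would additionally be wrapped in TLS records before reaching TCP, adding a little more overhead.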

Actually for the modern web

 - HTTP inside TLS stream
 - TLS stream inside a TCP stream
 - TCP stream inside IP packets
 - IP packets inside ethernet packets

TLS (was SSL) adds encryption and authentication ("secure" web pages)
 - Transport Layer Security

QUIC provides the functionality of TLS + TCP in a single protocol that runs over UDP; HTTP/3 is HTTP running on top of QUIC
 - developed by Google, and now used by some other folks too, to make the web (and ads) faster

HTTPS is just HTTP over TLS

Cryptography != Security
 - lots to say, ask me later


TLS "guarantees"
 - data not modified in transit by unauthorized parties
 - you know the identity of the source of the data (in practice, usually just the server's identity)
 - data is secret, not known to outside parties

But that's it


HTTPS was created to allow credit card numbers to be sent safely over the Internet (web)
Now HTTPS is being used for everything

Remember that HTTP was a protocol designed to send and receive HTML documents
 - HyperText Transfer Protocol versus HyperText Markup Language

That's all the web was originally, HTTP and HTML (running on top of TCP, IP, and Ethernet). But now it is that plus much more.

IP, TCP, and Ethernet date from the early to mid 1970s. HTTP and HTML are from the early 1990s.
  - the Internet was much more than the Web
  - but the Web has consumed everything

Why is security such a big issue on the web?
 - any computer talking to any computer, about everything
 - so, how do you know bad guys aren't getting in the middle? and how do you stop them from doing damage?

Resources

Wikipedia has excellent articles on the basics of networking that also link to great resources:

This material is all beyond the scope of this class, but understanding these technologies will help you understand the technical context around the web.