WebFund 2024F Lecture 2
Revision as of 19:07, 10 September 2024
Video
Video from the lecture given on September 10, 2024 is now available:
Notes
Lecture 2
---------

* Tutorials are released on Thursdays; they may be available earlier but may not be finalized
  (each tutorial will say whether it is finalized)
* Same for assignments
* Tutorials are due when the assignment associated with them is due...but remember,
  you have to get checked off by a TA in their office hours or in a tutorial.
  You can't just email or message them with your answers. So don't put it off.
  They won't make extra office hours.
  (You can talk to a TA virtually, but it has to at least be a call.)
  (Will be posting office hours soon.)
* Lecture quiz for today will be posted this afternoon
  - you'll have a week to complete
  - make sure you don't get too far behind
  - will be pretty simple
* Assignments will be released at least a week before they are due
  - they are based on the tutorials, so if you understand them
    the assignment shouldn't be too hard

My wiki is at https://homeostasis.scs.carleton.ca/wiki

For software, we're going to be using openstack, so you don't need anything
 - but basically we're using Deno, you can install that on your machine to use it.

You can get checked off for tutorials by going to tutorials and attending
for 1-1.5 hours. However, this isn't necessarily enough time to understand
the material. You'll need to spend the time you need to learn, just as you
do in any class.

Can you use a VM on your own machine?
 - yes, but we will be supporting and recommending openstack

Why no final project?
 - can be done without understanding material thanks to modern tools

The Internet
------------
 - because the web runs on the Internet
 - can't understand its design choices without understanding the Internet
 - and I don't think you all have learned about it at a technical level

When we talk about the Internet, we're talking about a network of networks
 - what is a network then?
   in CS terms, just computers talking to each other

The original Ethernet was pretty simple
 - a wire connecting lots of computers
 - all computers could "listen" to the wire always
 - any computer could start "speaking" over the wire at any time
 - but if two computers "talked" at the same time, gibberish
 - so they had to take turns
   - if someone else was speaking, they would wait a random amount of time
     before speaking again (if there was a "collision")
   - if problems again, would wait exponentially longer

Xerox PARC made ethernet, and object oriented programming (smalltalk),
and GUIs, and laser printers

Now, when we say computers "speak" to each other, what is the method
of that conversation (the "protocol")?
 - ethernet is really a protocol for how computers can speak to each other
 - a "low level" protocol, in that it is designed to work one layer up
   from a physical communication medium

Originally, ethernet was just for wires. But now the ethernet protocol
is used for fiber optic cables and for wireless communication
 - Wi-Fi uses ethernet-type messages, just over microwaves

Ethernet is a protocol for how to send and receive ethernet packets
 - a packet is a small, size-limited unit of communication
 - has a header and payload
 - like a "postcard"
 - header has "metadata" - where information is going to, where it is from, length, etc
   CORRECTION: no length in ethernet packets
 - payload is the contents to be communicated

Standard ethernet packets carry at most 1500 bytes of payload

So to send data to another computer with ethernet, you have to
 - figure out the ethernet address of the destination computer
 - convert data you want to send into a series of packets
 - send the packets

To receive, you just listen in for packets with your address

Traditionally, ethernet was a broadcast protocol
 - all computers saw all packets being transmitted

Ethernet is great for connecting local computers, i.e., making a
local area network (LAN)

But it doesn't scale
 - quickly networks get too big
 - so need a way to connect networks

For more: https://en.wikipedia.org/wiki/Local_area_network

The Internet is, in effect, a way of connecting multiple "ethernet" networks together
(actually, can be any technology, and there are lots, but mostly now they are ethernet-like)

(There are lots of privacy and security issues that come up, we aren't going to discuss now.)

How do we connect networks to each other?

Need an interchange language, a protocol that everyone understands

That's the Internet Protocol (IP)

All it is, is a specification for packets: IP packets

IP packets have metadata and data
 - metadata: source, destination, length, checksum (simple hash)
 - plus data

IP packets sound similar to ethernet packets right?
 - but have different namespaces for sources and destinations

So what is actually transmitted are ethernet packets that contain IP packets

Isn't that wasteful? Why not just use one or the other?
 - low-level network hardware doesn't understand IP packets
 - and ethernet packets only make sense in a LAN

Your computer wants to send data to Google
 - figure out Google's IP address
 - generate IP packets for data to be sent to Google
 - encapsulate IP packets into ethernet packets addressed to local router
 - send packets to router

So, what is this router?
 - a computer with a connection to multiple networks (at least 2)

So your computer would set the local router as the ethernet
destination, and the router would send the packets along to another
router, and so on, until it gets to a Google computer

routers are like sorting centers for shipping

By the way, this tech was all originally developed in one form or another
for phone networks
 - and they used to charge for long-distance messages
   (outside of the "LAN" of a city)

You'll notice a few issues
 - how long do packets take to get somewhere?
 - what if they get lost or damaged in transit?
 - what if they arrive in a different order?
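
The ordering and loss questions above can be made concrete with a small sketch. This is only an illustration, not how any real protocol is implemented: the function names, the 4-byte packet size, and the choice of which packet gets lost are all assumptions made up for the example.

```python
import random

def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut data into numbered chunks: (sequence number, payload)."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def receive(packets, total):
    """Reorder by sequence number and report which numbers never arrived."""
    by_seq = dict(packets)
    missing = [seq for seq in range(total) if seq not in by_seq]
    in_order = b"".join(by_seq[seq] for seq in sorted(by_seq))
    return in_order, missing

message = b"hello, best-effort network"
sent = split_into_packets(message, 4)  # hypothetical 4-byte packets

# Simulate the network: lose packet 2 and shuffle the arrival order.
arrived = [p for p in sent if p[0] != 2]
random.shuffle(arrived)

partial, missing = receive(arrived, len(sent))

# The gap in the numbering tells the receiver what to ask for again;
# the origin can only resend because it kept a copy.
resent = [p for p in sent if p[0] in missing]
complete, still_missing = receive(arrived + resent, len(sent))
```

Because every packet carries its number, arrival order does not matter, and a missing number is enough to tell the receiver which packet needs to be sent again.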
order is easy, just number the packets

lost or damaged, need to be resent by origin
 - hopefully it still has a copy

how long?
 - telecom networks had time arrival guarantees
 - Internet is mostly "best effort" - we'll try, but no guarantees

There is this notion of "QoS" that gets applied
 - quality of service

But that is always a preference, almost never a guarantee

This is why sometimes your Internet connection can be amazing and then
other times it is crap
 - mostly because other people are using the network too much!

Note the Internet was designed for texting, email, sending files,
NOT for interactive video! Amazing that it works!
(But same tech underneath!)

What if I just want to send a continuous stream of data
 - need multiple packets
 - in a fixed order
 - with dropped/damaged packets being retransmitted

TCP (transmission control protocol), that's the protocol ON TOP of IP
 - so you have TCP inside IP inside Ethernet

UDP is also used, just IP packets with a port
(port denotes which program gets the packet), no provision for
retransmission or order, up to the program receiving the data

What is the web in all of this?
 - another protocol, HTTP

So we send
 - HTTP inside a TCP stream
 - TCP stream inside IP packets
 - IP packets inside ethernet packets

Actually for the modern web
 - HTTP inside TLS stream
 - TLS stream inside a TCP stream
 - TCP stream inside IP packets
 - IP packets inside ethernet packets

TLS (was SSL) adds encryption and authentication ("secure" web pages)
 - Transport Layer Security

QUIC covers TLS + TCP in functionality, running over UDP (HTTP/3 is HTTP over QUIC)
 - used by Google and some other folks to make web (and ads) faster

HTTPS is just HTTP over TLS

Cryptography != Security
 - lots to say, ask me later

TLS "guarantees"
 - data not modified in transit by unauthorized parties
 - know the identity of source of data
 - data is secret, not known to outside parties

But that's it

HTTPS was created to allow credit cards to be sent over the Internet (web)

Now HTTPS is being used for everything

Remember that HTTP was a protocol designed to send and receive HTML documents
 - HyperText Transfer Protocol versus HyperText Markup Language

That's all the web was originally, HTTP and HTML (running on top of
TCP, IP, and Ethernet). But now it is that plus much more.

IP, TCP, Ethernet date from the early to mid 1970s.
HTTP and HTML are from the early 1990s.
 - the Internet was much more than the Web
 - but the Web has consumed everything

Why is security such a big issue on the web?
 - any computer talking to any computer, about everything
 - so, how do you know bad guys aren't getting in the middle?
   and how do you stop them from doing damage?
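
The nesting described in these notes (HTTP inside TLS inside TCP inside IP inside ethernet) can be pictured as packets wrapped in packets. The sketch below is a toy model only: `wrap` and `layers` are made-up helper names, the header fields are simplified, and the addresses are placeholder values. Real TLS would encrypt the HTTP bytes, and a real TCP stream would be split across many IP packets.

```python
def wrap(layer: str, header: dict, payload):
    """Put a payload inside a packet belonging to the named layer."""
    return {"layer": layer, "header": header, "payload": payload}

def layers(packet) -> list[str]:
    """List layer names from the outermost packet inward."""
    names = []
    while isinstance(packet, dict):
        names.append(packet["layer"])
        packet = packet["payload"]
    return names

# Build from the inside out, as the sender does.
http = wrap("HTTP", {"method": "GET", "path": "/"}, "")
tls = wrap("TLS", {}, http)                       # encryption omitted in this toy
tcp = wrap("TCP", {"dst_port": 443, "seq": 0}, tls)
ip = wrap("IP", {"src": "192.0.2.10", "dst": "198.51.100.7"}, tcp)
frame = wrap("Ethernet",
             {"src": "02:00:00:00:00:01", "dst": "02:00:00:00:00:02"},  # dst = local router
             ip)

print(" > ".join(layers(frame)))
```

Note that the ethernet destination is the local router while the IP destination is the far-away server: the two layers use different namespaces, which is why both headers are needed.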
Resources
Wikipedia has excellent articles on the basics of networking that also link to great resources:
This material is all beyond the scope of this class, but understanding these technologies will help you understand the technical context around the web.