WebFund 2016W Lecture 3
Video
The video from the lecture given on January 14, 2015 is now available.
Notes
In-class Notes
Lecture 3
---------

What is the web?
Web != Internet

So, what is the Internet?!
Network of networks

What is a network?
Networks allow computers to talk to each other

WiFi, Ethernet, LTE - different standards for computer networks

Layers of networking (OSI)
* Physical layer
* Data link layer
* Network layer <----
* Application layer (and such)

Internet Protocol (IP)
- packet-based protocol (NOT a stream or continuous protocol)

Old POTS
- wire from your house to a switching station
- wires between switching stations A and B
  ...
  ...
- wires between switching station Y to Z
- wire from switching station to house

Packet switching networks multiplex physical wires over time.
(Multiple communication connections over one wire by them taking turns.)
The "turn" is the packet

IP protocol is a "best effort" protocol. NO error correction or retransmission.

TCP means Transmission Control Protocol
turns packets into a reliable data stream

Network Firewalls block "unwanted traffic"
- really, block all those protocols that aren't safe on the open Internet

TCP adds a "port" to the IP address
- the port identifies which program to talk to
- some are ephemeral (temporary)
- others are "well known", meaning protocol on that port is standardized

Port 25 is for SMTP (email)
80: HTTP
443: HTTPS
22: ssh

Web is transmitted over HTTP

Major HTTP commands
* GET: get documents from server
  - can be CACHED
* POST: send form contents to server
  - are not cached

You can GET almost anything
- MS Word .doc
- tiff
- PDF

But the real web is
- JPEG, GIF, PNG for images
- CSS for style sheets
- HTML for content
- JavaScript for code

.swf is flash. Flash is not a web standard. AVOID
Student Notes
The Web
Web != Internet
We have to talk about networking...
- The internet is a network of networks (inter networks)
- What is a network?
- Networks allow computers to talk to each other
- There are many different technologies and standards used to connect computers. They work in different ways. This is partially due to frequency rights and legacy issues. Some well-known ones include:
- Ethernet
- WiFi
- LTE (Long Term Evolution)
- Layers of networking (OSI model)
- Physical Layer (e.g. wire, or frequency)
- Data Link Layer
- Network Layer
- Interconnection standard
- one standard = INTERNET PROTOCOL!!
- TCP/IP
- all layers have to transport IP (internet protocol)
- Transport Layer
- Session Layer
- Presentation Layer
- Application Layer
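The layering above can be pictured as wrapping: each layer puts its own header around whatever the layer above handed it. The sketch below is only a toy model of that idea. The dictionary "headers" and the function names are made up for illustration and are not real packet formats.

```python
def encapsulate(payload, layers):
    """Wrap the payload in one toy header per layer, innermost layer first."""
    unit = payload
    for layer in layers:
        unit = {"layer": layer, "payload": unit}
    return unit


def decapsulate(unit):
    """Peel the toy headers off again, outermost layer first."""
    seen = []
    while isinstance(unit, dict):
        seen.append(unit["layer"])
        unit = unit["payload"]
    return seen, unit


# An application message carried by TCP, inside IP, inside an Ethernet frame.
frame = encapsulate("GET / HTTP/1.1", ["TCP", "IP", "Ethernet"])
layers_seen, message = decapsulate(frame)
print(layers_seen)
print(message)
```

Swapping "Ethernet" for "WiFi" or "LTE" changes only the outermost wrapper; the IP part and everything inside it stay the same, which is the point of having one interconnection standard.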
Internet Protocol (IP)
- Packet-based protocol (it defines how to interpret the bits going in and out)
- NOT a stream or continuous protocol like POTS (plain old telephone system)
- Old POTS:
- Wire from origin home to switching station
- Wires between switching stations
- ...
- Wire from switch station to destination home
- Problem: there's a 1:1 ratio of wires to connections. This is inefficient.
- Goal is to share 1 wire for 10 phone calls!
- Packet switching networks multiplex physical wires over time (Multiple connections by them taking turns)
- People have to take turns
- The "turn" is the packet
- It works well as long as not everyone is talking at once; when they are, everything slows down
- https://en.wikipedia.org/wiki/Network_packet
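To make "taking turns" concrete, here is a minimal sketch of two senders sharing one wire. It assumes a simple round-robin schedule and fixed-size chunks; the function names and the tuple layout are invented for this example, and real networks schedule packets in more complicated ways.

```python
from itertools import zip_longest


def packetize(sender, message, size):
    """Split a message into (sender, sequence number, chunk) packets."""
    return [
        (sender, seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def multiplex(*streams):
    """Interleave packets from several senders, one packet per turn."""
    wire = []
    for turn in zip_longest(*streams):
        # A sender that has run out of packets simply skips its turn.
        wire.extend(packet for packet in turn if packet is not None)
    return wire


a = packetize("A", "hello world", 4)
b = packetize("B", "goodbye", 4)
wire = multiplex(a, b)
for packet in wire:
    print(packet)
```

Both conversations travel over the same list (the "wire"), and each receiver can pick out its own packets by looking at the sender field.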
IP Addresses
- Any computer that's on a network has an IP address
- We currently use IPv4 addresses but we are transitioning towards IPv6
- Packets have an IP address source and destination
- The address for localhost is 127.0.0.1
- That's the computer itself
- The Domain Name System (DNS) translates between IP addresses and human-readable names (e.g. www.google.ca)
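A quick way to poke at these ideas is Python's standard `ipaddress` and `socket` modules. This is just a sketch: the `resolve` helper is a name made up here, and what it returns depends on the network and resolver of the machine it runs on.

```python
import ipaddress
import socket

# 127.0.0.1 is an IPv4 address that refers to the computer itself.
loopback = ipaddress.ip_address("127.0.0.1")
print(loopback.version)      # 4
print(loopback.is_loopback)  # True

# IPv6 has its own loopback address, written ::1.
v6_loopback = ipaddress.ip_address("::1")
print(v6_loopback.version, v6_loopback.is_loopback)


def resolve(hostname):
    """Ask the system resolver (DNS, hosts file) for an IPv4 address."""
    return socket.gethostbyname(hostname)
```

Calling `resolve("www.google.ca")` on a machine with Internet access returns whichever address DNS hands out at that moment, so the result can differ between runs and locations.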
TCP/IP
- What happens when packets get lost or corrupted? (electrical interference, microwaves, etc.)
- The IP protocol is a "best effort" protocol. NO error correction or re-transmission
- For this, we use TCP/IP, not just IP
- TCP means Transmission Control Protocol
- Turns packets into a RELIABLE data stream
- TCP is the layer that takes care of re-transmitting as necessary. The data comes through reliably and in the same order it was sent.
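The idea of building a reliable, ordered stream on top of unreliable packets can be sketched in a few lines. This is a toy model only: the "network" below loses a fixed set of packets and reverses the order, and the receiver just asks again for whatever is missing. Real TCP uses acknowledgements, timers, and byte-based sequence numbers, none of which are modelled here, and the function names are invented for illustration.

```python
def send_over_lossy_network(packets, drop):
    """Toy 'best effort' delivery: lose some packets, deliver the rest reversed."""
    delivered = [p for p in packets if p[0] not in drop]
    return list(reversed(delivered))


def reliable_transfer(message, size=3):
    """Number the chunks, resend whatever is missing, reassemble in order."""
    packets = [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]
    received = {}
    # First attempt: packets 1 and 2 are lost in transit.
    for seq, chunk in send_over_lossy_network(packets, drop={1, 2}):
        received[seq] = chunk
    # Retransmit until every sequence number has arrived.
    while len(received) < len(packets):
        missing = [p for p in packets if p[0] not in received]
        for seq, chunk in send_over_lossy_network(missing, drop=set()):
            received[seq] = chunk
    # Sequence numbers put the chunks back in the original order.
    return "".join(received[seq] for seq in sorted(received))


print(reliable_transfer("packets can get lost"))
```

The application only ever sees the final string, not the loss or the reordering underneath.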
Firewalls and Ports
- A Network Firewall blocks "unwanted traffic"
- Really blocks all those protocols that aren't safe on the open internet
- Firewalls ended up blocking everything except "The Web"
- Not that the Web is so great; it's just the only thing that wasn't blocked on the open internet.
- TCP adds a port to the IP addresses on both sides
- The port identifies which program to talk to
- Some ports are temporary (ephemeral)
- Others are "well known", meaning the protocol on that port is standardized
- Port 25: SMTP (email)
- Port 80: HTTP
- Port 443: HTTPS
- Port 22: SSH (secure shell)
- Modern web browsers don't show the HTTP (protocol) or port number (:80 for HTTP or :443 for HTTPS)
- Try localhost:631 (on Linux; this is the port used by the CUPS printing service)
- "Standard" ports are below 1024
- www.iana.org (IANA is the organization that assigns port numbers to protocols)
- Port 80 is for the Hypertext Transfer Protocol (HTTP); the web is transmitted over HTTP
- HTTP over TCP over IP over datalink layers, over physical layers
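The hidden default ports can be made visible with the standard `urllib.parse` module. In this small sketch, the `WELL_KNOWN` table and the `effective_port` helper are names made up for the example; the port numbers themselves come from the list above.

```python
from urllib.parse import urlsplit

# Default ports for the two web protocols.
WELL_KNOWN = {"http": 80, "https": 443}


def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # An explicit ":port" in the URL wins; otherwise fall back to the default.
    return parts.port if parts.port is not None else WELL_KNOWN[parts.scheme]


print(effective_port("http://localhost:631/"))
print(effective_port("https://www.iana.org/"))
print(effective_port("http://localhost/") < 1024)
```

A URL without a port is not "portless"; the browser just fills in the standardized number for the protocol.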
HTTP
- Browsers (clients) communicate to servers through HTTP requests (typically GET requests)
- GET requests have request headers which consist of key-value pairs
- Responses also have headers of key-value pairs
- Over the history of the web, different companies implemented different standards
- For example, with IE6, web pages had to send different content to different browsers, or the pages would break
- Normally, in Web dev, you're working a level above HTTP (you use the existing standard)
- The major HTTP request methods:
- GET
- Client GETs resource from the server
- Can be CACHED (people in the middle can hold copies)
- POST
- POST (send) information (such as form contents) to the web server
- Are NOT CACHED
- GET
- Doing a reload on a GET is safe
- The resource is simply requested again
- This usually results in a cached copy being used
- Doing a reload on a POST isn't safe (you typically do not want to perform the action twice)
- POST requests typically alter the server state, and the change is often permanent
- You can GET almost anything!
- .doc
- .tiff
- .anything!
- Typical file types that are requested include:
- HTML for content
- CSS (style sheets)
- JavaScript for code
- JPEG, GIF, PNG for images
- .swf is flash
- We don't like flash (bad security, performance hog, new browsers don't have support for plugins)
- NOT STANDARD, AVOID
- Plugins inject arbitrary code into browsers, but "real web content" (HTML, CSS, JS, media) is ENOUGH.
- Modern JS has fully-fledged socket support (WebSockets)
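The difference between reloading a GET and reloading a POST can be tried out locally. The sketch below uses Python's standard `http.server` and `http.client` modules; the handler, the `visits` counter, and the `request` helper are all invented for this example, and the server only listens on the local machine. GET just reads the counter, while POST changes it.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

visits = {"count": 0}  # server state that only POST changes


class Handler(BaseHTTPRequestHandler):
    def _reply(self):
        body = str(visits["count"]).encode()
        self.send_response(200)
        # Response headers are key-value pairs.
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        self._reply()  # read-only: repeating it changes nothing

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the submitted form contents
        visits["count"] += 1     # every POST alters the server state
        self._reply()

    def log_message(self, *args):
        pass  # keep the demo output quiet


def request(port, method, body=None):
    """Send one request and return (status, Content-Type, body text)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/", body=body, headers={"Accept": "text/plain"})
    response = conn.getresponse()
    result = (response.status, response.getheader("Content-Type"),
              response.read().decode())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

results = [
    request(port, "GET"),
    request(port, "GET"),                      # "reload" of a GET
    request(port, "POST", body="name=value"),
    request(port, "POST", body="name=value"),  # "reload" of a POST
]
server.shutdown()
server.server_close()

for result in results:
    print(result)
```

The two GETs return the same value, while the repeated POST bumps the counter a second time, which is why browsers ask for confirmation before resubmitting a form.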
HTML Page Example
In a new .html file:
<!DOCTYPE html> <!-- this isn't there -->
<html>
  <head> <!-- header section -->
    <title>A simple web page</title>
  </head>
  <body>
    <h1>A SIMPLE WEB PAGE</h1>
    <p>It isn't so bad.</p>
  </body>
</html>