WebFund 2016W Lecture 3

Video

The video from the lecture given on January 14, 2015 is now available.

Notes

In-class Notes

Lecture 3
---------
What is the web?

Web != Internet

So, what is the Internet?!

Network of networks

What is a network?

Networks allow computers to talk to each other

WiFi, Ethernet, LTE - different standards for computer networks

Layers of networking (OSI)
* Physical layer
* Data link layer
* Network layer <---- IP lives here
* Application layer (and such)

Internet Protocol (IP)
 - packet-based protocol (NOT a stream or continuous protocol)


Old POTS
 - wire from your house to a switching station
 - wires between switching stations A and B
 ...
 ...
 - wires between switching stations Y and Z
 - wire from switching station to house

Packet switching networks multiplex physical wires over time. (Multiple connections share one wire by taking turns.)

The "turn" is the packet
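
To make the "taking turns" idea concrete, here is a minimal sketch of two connections sharing one wire. It is a toy model, not how real network hardware schedules traffic: the function names, the 4-character packet size, and the strict round-robin order are all assumptions made for illustration.

```python
from itertools import zip_longest

def packetize(conn_id, message, size=4):
    """Split one connection's message into (conn_id, seq, chunk) packets."""
    return [(conn_id, seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]

def multiplex(*streams):
    """Interleave packets from several connections onto one shared wire,
    one packet per connection per turn (round-robin)."""
    wire = []
    for turn in zip_longest(*streams):
        wire.extend(p for p in turn if p is not None)
    return wire

def demultiplex(wire):
    """Regroup packets by connection and rebuild each message."""
    messages = {}
    for conn_id, _seq, chunk in wire:
        messages[conn_id] = messages.get(conn_id, "") + chunk
    return messages

call_a = packetize("A", "hello from A")
call_b = packetize("B", "hi, B here")
wire = multiplex(call_a, call_b)
print([conn for conn, _, _ in wire])   # whose turn each packet was
print(demultiplex(wire))
```

Each entry on the wire is one "turn". The receiver can separate the two conversations again because every packet carries a label saying which connection it belongs to.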

IP is a "best effort" protocol. NO error correction or retransmission.

TCP means Transmission Control Protocol
turns packets into a reliable data stream

Network Firewalls block "unwanted traffic"
 - really, block all those protocols that aren't safe on the open Internet

TCP adds a "port" to the IP address
 - the port identifies which program to talk to
 - some are ephemeral (temporary)
 - others are "well known", meaning protocol on that port is standardized

Port 25 is for SMTP (email)
80: HTTP
443: HTTPS
22: ssh
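
The sketch below shows ports on both ends of a single TCP connection, using only the local machine. It assumes the loopback interface is available. The server here binds to port 0 (which asks the operating system for any free port) instead of a well-known port, because binding to ports below 1024 usually needs administrator rights. The name `echo_once` is made up for this example.

```python
import socket
import threading

# Port 0 asks the operating system to pick any free (ephemeral) port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_ip, server_port = server.getsockname()

def echo_once():
    """Accept one connection and send back whatever arrives."""
    conn, _client_addr = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))

worker = threading.Thread(target=echo_once)
worker.start()

with socket.create_connection((server_ip, server_port)) as client:
    client_ip, client_port = client.getsockname()   # client's ephemeral port
    peer_ip, peer_port = client.getpeername()       # the server's port
    client.sendall(b"ping")
    reply = client.recv(1024)

worker.join()
server.close()
print(f"client {client_ip}:{client_port} <-> server {peer_ip}:{peer_port}")
print(reply)
```

The client never chose its own port; the operating system assigned a temporary one. The server's port is the one the client had to know in advance, which is why standardized services sit on well-known numbers.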


Web is transmitted over HTTP

Major HTTP commands
 * GET: get documents from server
  - can be CACHED
 * POST: send form contents to server
  - are not cached
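
Why the caching difference matters can be sketched with a toy intermediary. `CachingProxy` and `origin_server` are hypothetical names, and the sketch ignores expiry entirely; real HTTP caches decide what to keep based on response headers such as Cache-Control.

```python
class CachingProxy:
    """Hypothetical intermediary that caches GET responses only."""

    def __init__(self, origin):
        self.origin = origin      # callable: (method, url, body) -> response
        self.cache = {}           # url -> cached GET response

    def request(self, method, url, body=None):
        if method == "GET":
            if url not in self.cache:
                self.cache[url] = self.origin(method, url, body)
            return self.cache[url]
        # POST (and anything else) always goes to the origin server.
        return self.origin(method, url, body)

origin_calls = []

def origin_server(method, url, body):
    """Stand-in for a real web server; records every request it sees."""
    origin_calls.append((method, url))
    return f"{method} {url} handled"

proxy = CachingProxy(origin_server)
proxy.request("GET", "/index.html")
proxy.request("GET", "/index.html")        # served from the cache
proxy.request("POST", "/form", "name=a")
proxy.request("POST", "/form", "name=a")   # reaches the origin again
print(origin_calls)
```

The second GET never reaches the origin server, while both POSTs do.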

You can GET almost anything
 - MS Word .doc
 - tiff
 - PDF

But the real web is
 - JPEG, GIF, PNG for images
 - CSS for style sheets
 - HTML for content
 - JavaScript for code

.swf is Flash.  Flash is not a web standard.  AVOID


Student Notes

The Web

Web != Internet

We have to talk about networking...

  • The internet is a network of networks (inter networks)
  • What is a network?
    • Networks allow computers to talk to each other
  • There are many different technologies and standards used to connect computers. They work in different ways. This is partially due to frequency rights and legacy issues. Some well-known ones include:
    • Ethernet
    • WiFi
    • LTE (Long Term Evolution)
  • Layers of networking (OSI model)
    • Physical Layer (e.g. wire, or frequency)
    • Data Link Layer
    • Network Layer
      • Interconnection standard
        • one standard = INTERNET PROTOCOL!!
        • TCP/IP
        • every lower layer has to be able to carry IP (Internet Protocol) packets
    • Transport Layer
    • Session Layer
    • Presentation Layer
    • Application Layer
Internet Protocol (IP)
  • Packet-based protocol (how do you interpret bits in and out)
  • NOT a stream or continuous protocol like POTS (plain old telephone service)
  • Old POTS:
    • Wire from origin home to switching station
    • Wires between switching stations
    • ...
    • Wire from switch station to destination home
  • Problem: there's a 1:1 ratio of wires to connections. This is inefficient.
  • Goal is to share 1 wire for 10 phone calls!
  • Packet switching networks multiplex physical wires over time (multiple connections share one wire by taking turns)
IP Addresses
  • Any computer that's on a network has an IP address
  • We currently use IPv4 addresses but we are transitioning towards IPv6
  • Packets have an IP address source and destination
  • The address for localhost is 127.0.0.1
    • That's the computer itself
  • The Domain Name System (DNS) translates between IP addresses and human-readable names (e.g. www.google.ca)
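
A few of these points can be checked with Python's standard `ipaddress` module. This is a small sketch; the source and destination addresses in the packet are arbitrary example values taken from the ranges reserved for documentation, and the dictionary is only a stand-in for a real packet header.

```python
import ipaddress

localhost_v4 = ipaddress.ip_address("127.0.0.1")
localhost_v6 = ipaddress.ip_address("::1")

# Both are "this computer itself", in IPv4 and IPv6 form.
print(localhost_v4.version, localhost_v4.is_loopback)
print(localhost_v6.version, localhost_v6.is_loopback)

# An IPv4 address is 32 bits long; an IPv6 address is 128 bits long.
print(localhost_v4.max_prefixlen, localhost_v6.max_prefixlen)

# Every packet carries a source and a destination address.
packet = {
    "src": ipaddress.ip_address("192.0.2.10"),     # example sender
    "dst": ipaddress.ip_address("198.51.100.7"),   # example receiver
    "payload": b"hello",
}
print(packet["src"], "->", packet["dst"])
```

The jump from 32 to 128 bits is the main reason for the IPv6 transition: it makes room for vastly more addresses.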
TCP/IP
  • What happens when packets get lost or corrupted? (electrical interference/microwave/etc)
  • IP is a "best effort" protocol. NO error correction or re-transmission
  • For this, we use TCP/IP, not just IP
  • TCP means Transmission Control Protocol
  • Turns packets into a RELIABLE data stream
  • TCP is the layer that takes care of re-transmitting as necessary. Data arrives reliably and in the same order it was sent.
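
The following toy simulation shows how retransmission and sequence numbers can turn an unreliable packet service into an ordered stream. It is a heavily simplified sketch, not real TCP: the loss rate, the segment size, and the function names are assumptions, acknowledgements are never lost, and there are no timers or sliding windows.

```python
import random

def lossy_channel(packets, rng, loss_rate=0.3):
    """Best-effort delivery: drop some packets and shuffle the rest."""
    delivered = [p for p in packets if rng.random() > loss_rate]
    rng.shuffle(delivered)
    return delivered

def reliable_transfer(data, size=3, seed=1):
    """Deliver `data` over the lossy channel, retransmitting as needed."""
    rng = random.Random(seed)
    segments = {seq: data[i:i + size]
                for seq, i in enumerate(range(0, len(data), size))}
    received = {}                      # seq -> chunk, filled in any order
    rounds = 0
    while len(received) < len(segments):
        rounds += 1
        # Send (or resend) every segment that has not been acknowledged.
        unacked = [(seq, chunk) for seq, chunk in segments.items()
                   if seq not in received]
        for seq, chunk in lossy_channel(unacked, rng):
            received[seq] = chunk      # arrival doubles as the acknowledgement
    # Sequence numbers put the chunks back in the order they were sent.
    stream = "".join(received[seq] for seq in sorted(received))
    return stream, rounds

stream, rounds = reliable_transfer("packets become a reliable stream")
print(stream)
print("sending rounds needed:", rounds)
```

Even though the channel drops and reorders packets, the receiver ends up with the original text, at the cost of extra sending rounds.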
Firewalls and Ports
  • A Network Firewall blocks "unwanted traffic"
    • Really blocks all those protocols that aren't safe on the open internet
  • Firewalls blocked everything except "The Web"
    • Not because the web is so great; it was just the only thing that wasn't blocked on the open internet.
  • TCP adds a port to the IP addresses on both sides
  • The port identifies which program to talk to
    • Some ports are temporary (ephemeral)
    • Others are "well known", meaning the protocol on that port is standardized
      • Port 25: SMTP (email)
      • Port 80: HTTP
      • Port 443: HTTPS
      • Port 22: SSH (secure shell)
  • Modern web browsers don't show the protocol (http://) or the default port number (:80 for HTTP or :443 for HTTPS)
  • Try localhost:631 on Linux (port 631 is the CUPS printing service's web interface, if it is running)
  • "Standard" ports are below 1024
  • www.iana.org (IANA is the organization that assigns protocols to port numbers)
  • Port 80 is for the Hypertext Transfer Protocol (HTTP); the web is transmitted over HTTP
    • HTTP over TCP over IP over datalink layers, over physical layers
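
The relationship between URLs and ports can be sketched with the standard `urllib.parse` module. The `WELL_KNOWN` table and both helper functions are made up for this example; the table only covers the ports listed in these notes.

```python
from urllib.parse import urlsplit

# Well-known ports listed in the notes above.
WELL_KNOWN = {"http": 80, "https": 443, "ssh": 22, "smtp": 25}

def effective_port(url):
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    # .port is None when the URL does not spell the port out.
    return parts.port if parts.port is not None else WELL_KNOWN[parts.scheme]

def is_standard_port(port):
    """Ports below 1024 are the "standard" (well-known) range."""
    return 0 <= port < 1024

for url in ("http://www.google.ca/", "https://www.google.ca/",
            "http://localhost:631/", "http://localhost:8000/"):
    port = effective_port(url)
    print(url, port, is_standard_port(port))
```

This is the lookup a browser effectively does when the address bar hides the port: no explicit port means "use the default for this protocol".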
HTTP
  • Browsers (clients) communicate with servers through HTTP requests (typically GET requests)
    • GET requests have request headers which consist of key-value pairs
    • Responses also have headers of key-value pairs
  • Over the history of the web, different companies implemented different standards
    • For example, in the IE6 era, web pages had to send different content to different browsers, or the pages would break
  • Normally, in Web dev, you're working a level above HTTP (you use the existing standard)
  • The major HTTP request methods are:
    • GET
      • Client GETs resource from the server
      • Can be CACHED (people in the middle can hold copies)
    • POST
      • POST (send) information (such as form contents) to the web server
      • Are NOT CACHED
  • Doing a reload on a GET is safe
    • The resource is simply requested again
    • This usually results in a cached copy being used
  • Doing a reload on a POST isn't safe (you typically do not want to perform the action twice)
    • A POST can permanently alter the server state
  • You can GET almost anything!
    • .doc
    • .tiff
    • .anything!
  • Typical file types that are requested include:
    • HTML for content
    • CSS (style sheets)
    • JavaScript for code
    • JPEG, GIF, PNG for images
  • .swf is Flash
    • We don't like Flash (bad security, performance hog, new browsers don't have support for plugins)
    • NOT STANDARD, AVOID
  • Plugins inject arbitrary code into browsers, but "real web content" (HTML, CSS, JS, media) is ENOUGH.
    • Modern JS has fully-fledged Socket Support
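
The GET/POST difference and the key-value headers described above can be tried out against a throwaway server on the local machine. This is a minimal sketch built on Python's standard `http.server` and `http.client` modules, assuming the loopback interface is available. The handler, the `submissions` list, the `send` helper, and the paths are all invented for this example, and the server keeps its state in memory only.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

submissions = []   # server state that only POST is allowed to change

class Handler(BaseHTTPRequestHandler):
    def _reply(self, text):
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Reading a resource: repeating it leaves the server unchanged.
        self._reply(f"{len(submissions)} submissions so far")

    def do_POST(self):
        # Submitting a form: every repeat adds another entry.
        length = int(self.headers.get("Content-Length", 0))
        submissions.append(self.rfile.read(length).decode("utf-8"))
        self._reply("saved")

    def log_message(self, *args):
        pass   # keep the example's output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

def send(method, path, body=None):
    """Make one request and return (status, headers dict, body text)."""
    conn = HTTPConnection("127.0.0.1", server.server_port)
    conn.request(method, path, body=body, headers={"Accept": "text/plain"})
    response = conn.getresponse()
    result = (response.status, dict(response.getheaders()),
              response.read().decode("utf-8"))
    conn.close()
    return result

first_get = send("GET", "/")
second_get = send("GET", "/")               # a "reload": same answer
send("POST", "/form", body="name=alice")
send("POST", "/form", body="name=alice")    # a reloaded POST repeats the action
after_posts = send("GET", "/")
server.shutdown()
server.server_close()

print(first_get[2], "|", after_posts[2])
print(first_get[1]["Content-Type"])
```

Reloading the GET gives the same answer both times, while the repeated POST leaves two identical entries on the server. That duplicate is the reason browsers warn before resubmitting a form.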
HTML Page Example

In a new .html file:


<!DOCTYPE html>	<!-- this isn't there -->
<html>
	<head> <!-- header section -->
		<title>A simple web page</title>
	</head>

	<body>
		<h1>A SIMPLE WEB PAGE</h1>
		<p>It isn't so bad.</p>
	</body>
</html>