WebFund 2016W Lecture 21

From Soma-notes

Video

The video for the lecture given on March 29, 2016 is now available.

Notes

In Class

Lecture 21
----------

* interactive-graphs-sol will be posted after lecture
* study session on April 11th (final is on April 12th); time TBD, but it
  will be in the morning (earliest is 10 AM).

What did I leave out?

First, what did we cover?
 - JavaScript
 - Some HTML, CSS
 - clients versus servers
 - HTTP
 - synchronous versus async code
   - callbacks
 - Node, Jade, Express
 - browser developer tools
 - closures, lexical scoping
 - a little networking
 - databases
   - noSQL, MongoDB
   - queries
 - a little crypto (SSL, certificates, password hashing)
 - a little security
   (cross-site scripting, input validation)
 - client side versus server side JavaScript
   - jQuery
 - DOM basics
 - basic AJAX


Left out...
 - much more to HTML, CSS
 - much more to the DOM
 - more to AJAX, client-server communication
   (e.g., WebSockets)
 - SQL
   - transactions
   - indexing *
 - PHP
   - language + HTML templates
   - instead we learned Node + Jade
 - Meteor, AngularJS, ...
   client-side JS libraries and frameworks
 - other server-side languages & frameworks
   - ASP.NET, .NET, Java-based frameworks,
     Ruby on Rails
   - every programming language has multiple
     web frameworks
 
So, is this how the "big players" do the web?
 - Facebook, Google, etc.

They have
 - more servers
 - challenge is how to *use* them

HTTP is inherently stateless.  How do you scale stateless?
 - replicate, load balance
 - to store state, make state store bigger
   - need bigger database!
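
A minimal sketch of "replicate, load balance" with state kept outside the replicas. The replica names, the request shape, and the in-memory Map standing in for the database are all assumptions made for illustration; a real deployment would use a separate load balancer and a real database.

```javascript
// Hypothetical sketch: round-robin load balancing over stateless replicas.

// Shared state store standing in for the database; replicas keep no state.
const sharedStore = new Map();

// Each replica is just a function of (request, store). Any replica can
// serve any request because nothing is remembered between calls.
function makeReplica(name) {
  return function handle(request, store) {
    const count = (store.get(request.user) || 0) + 1;
    store.set(request.user, count);
    return { servedBy: name, user: request.user, visits: count };
  };
}

// The balancer hands each incoming request to the next replica in turn.
function makeRoundRobinBalancer(replicas) {
  let next = 0;
  return function dispatch(request) {
    const replica = replicas[next];
    next = (next + 1) % replicas.length;
    return replica(request, sharedStore);
  };
}

const dispatch = makeRoundRobinBalancer([
  makeReplica('app-1'),
  makeReplica('app-2'),
  makeReplica('app-3'),
]);

const first = dispatch({ user: 'alice' });
const second = dispatch({ user: 'alice' });
console.log(first, second);
```

The two requests land on different replicas, yet the visit count carries over because it lives in the shared store. That is why adding replicas is easy and growing the state store is the hard part.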

Big websites have some mix of:
 - scalable databases
 - replicate application servers
 - caching for static content

E.g., ecommerce
 - while shopping, users get static content
    - easy to scale
 - but purchases have to hit the database
   - and have to generate writes

To scale you have to engineer the system
 - but you don't need to change your basic application


Security?
 - input validation/sanitization
   - don't let the user talk to the browser, server,
     or database directly - no code injection
 - use reasonable crypto
   - communications security (SSL/TLS)
   - authentication (passwords, etc)
 - security policy
   - what is allowed, what is not
   - how to enforce
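
One way input validation can look on the server side, sketched under assumptions: the server parses JSON request bodies, the login form has `username` and `password` fields, and the result feeds a MongoDB-style query. The function name and the length limit are hypothetical.

```javascript
// Hypothetical login-input check; field names are illustrative assumptions.
// With a JSON body parser, an attacker can send {"username": {"$ne": null}}
// so that the value becomes a MongoDB query operator instead of a string.
function validateCredentials(body) {
  const errors = [];
  if (!body || typeof body !== 'object') {
    return { ok: false, errors: ['body must be an object'] };
  }
  for (const field of ['username', 'password']) {
    const value = body[field];
    if (typeof value !== 'string') {
      errors.push(field + ' must be a string');
    } else if (value.length === 0 || value.length > 128) {
      errors.push(field + ' must be 1 to 128 characters long');
    }
  }
  if (errors.length > 0) {
    return { ok: false, errors: errors };
  }
  // Only plain strings reach the query, so they are matched as data.
  return { ok: true, query: { username: body.username } };
}

const good = validateCredentials({ username: 'alice', password: 'hunter2' });
const bad = validateCredentials({ username: { $ne: null }, password: 'x' });
console.log(good.ok, bad.ok, bad.errors);
```

The check rejects anything that is not a plain string before a query is built, which keeps user input from being interpreted as part of the query itself.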

Networking?
 - latency versus bandwidth*
 - IPv6
 - firewalls
 - NAT


Cloud computing
 - remote servers where there are LOTS of remote servers
 - challenge is in coordinating applications across
   all of those servers
 - basic tool is virtualization
    - easier to manage a virtual server than a real one
    - Give me 10,000 servers for 2 hours
 - cloud servers run web apps, almost entirely


Amazon is the real "parent" of cloud computing
 - they got into the cloud because of Christmas
 - they had servers that were doing nothing 11 months
   out of the year
   - so why not rent them out
 - they provide more and more infrastructure for
   building large web applications, based on their
   internal tech


Networking

The Internet is a network of networks
  SCS network is connected to the Carleton backbone
  which is connected to the Unicentre's network.  The
  backbone also connects to Carleton's Internet service providers (ISPs)

Firewalls on the Internet block all kinds of network traffic.
Mainly for 'security reasons'

Now, arbitrary network traffic gets routed through HTTP.
Because it can reach all of the Internet
 - other protocols get squeezed into HTTP
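
To make "squeezed into HTTP" concrete, here is an illustrative sketch that packs arbitrary bytes into the text of an HTTP POST request and unpacks them again. The `/tunnel` path, the `example.com` host, and the base64 body encoding are assumptions for this example only; real tunnelling schemes differ in detail.

```javascript
// Illustrative only: wraps an arbitrary byte payload in an HTTP/1.1 POST
// request and recovers it on the other side.
function wrapInHttp(payload, host) {
  const body = Buffer.from(payload).toString('base64');
  return [
    'POST /tunnel HTTP/1.1',
    'Host: ' + host,
    'Content-Type: text/plain',
    'Content-Length: ' + Buffer.byteLength(body),
    '',
    body,
  ].join('\r\n');
}

function unwrapFromHttp(requestText) {
  // Headers and body are separated by the first blank line.
  const split = requestText.indexOf('\r\n\r\n');
  const body = requestText.slice(split + 4);
  return Buffer.from(body, 'base64');
}

const original = Buffer.from([0x16, 0x03, 0x01, 0x00, 0xff]); // arbitrary bytes
const request = wrapInHttp(original, 'example.com');
const recovered = unwrapFromHttp(request);
console.log(request.split('\r\n')[0], recovered.equals(original));
```

To a firewall this looks like ordinary web traffic, which is the reason other protocols end up riding on HTTP.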

WebRTC

Learn networking but realize modern networks just ship HTTP

* Databases, indexes
* Security
  - web security issues
* The future

Posting Assignment 5 solutions now

Student Notes

Announcements

  • There might be an exam review on April 7th. There will also be a review session on April 11th (time TBD, probably AM). Final is on April 12th (9:00 AM).
  • Solutions for Assignment 5 (Interactive Graphs) have been posted on the course website.

What We Covered in the Course

  • First, what did we cover in the course so far?
    • JavaScript
    • Some HTML, CSS
    • Clients versus servers
    • HTTP
    • Synchronous versus asynchronous code (callbacks)
    • Node, Jade, Express
    • Browser developer tools
    • Closures, lexical scoping
    • A little bit of networking
    • Databases (noSQL, MongoDB)
      • Queries
    • A little bit of cryptography (SSL, certificates, password hashing)
    • A little security (cross-site scripting, input validation)
    • Client-side versus server-side JavaScript (jQuery)
    • DOM basics
    • Basic AJAX
  • This is a good amount of material, especially given that COMP 2406 is an introductory class. However, it is only a small portion of the technologies upon which the modern web is based.

What did we leave out?

  • There's much more to HTML and CSS
    • The stuff we left out wasn't essential to an introductory class like 2406, and can be learned easily through online resources like MDN or Codecademy. You might have to learn them on your own if you intend to work on an actual web development project. Thanks to technologies like Express, however, you don't really have to know a lot of HTML and CSS to get a basic website up and running.
  • SQL
    • The way databases are accessed
    • Transactions
    • Indexing (how to make database access faster)
  • PHP: language + HTML templates
    • There are a lot of serious web applications, both proprietary and open source, that use PHP. If you understand Java and JavaScript well, PHP isn't very hard to learn. It's basically a language combined with HTML templates.
    • Instead, we learned Node + Jade to generate HTML
  • Meteor, AngularJS, and other client-side JS libraries and frameworks
    • There are a lot of JS libraries: low-level as well as high-level
    • There are JS libraries to do almost everything under the sun
  • Other server-side languages and frameworks
    • E.g. ASP.NET, .NET, Java-based frameworks, Ruby on Rails
    • There's even a COBOL-based server-side language
    • Learning each of these frameworks takes time and energy, but it might be worthwhile if you end up working with them.
    • Many fundamental concepts will be the same across the different languages
  • There's much more to the DOM
    • The DOM is a very complicated data structure; there's much more to learn than what we covered in class.
  • There's more to AJAX
    • AJAX provides a way for the client and server to talk in the background
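
The indexing bullet earlier in this list says indexes make database access faster. A toy sketch of why, using made-up records: without an index every record may have to be examined, while an index is a lookup structure built once and then consulted directly. Real databases typically use tree or hash structures rather than a plain Map.

```javascript
// Toy illustration of indexing; the records and field names are made up.
const users = [
  { id: 1, email: 'ann@example.com' },
  { id: 2, email: 'bob@example.com' },
  { id: 3, email: 'cat@example.com' },
];

// Without an index: examine records one by one until a match is found.
function findByScan(records, email) {
  let examined = 0;
  for (const record of records) {
    examined += 1;
    if (record.email === email) return { record, examined };
  }
  return { record: null, examined };
}

// With an index: build a lookup table once, then jump straight to the record.
function buildIndex(records, field) {
  const index = new Map();
  for (const record of records) index.set(record[field], record);
  return index;
}

const emailIndex = buildIndex(users, 'email');
const scanned = findByScan(users, 'cat@example.com');
const indexed = emailIndex.get('cat@example.com');
console.log(scanned.examined, indexed.id);
```

The trade-off is that the index takes extra space and has to be updated on every write.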

Scaling

  • So, is this how the "big players" do web development?
    • They have more servers
    • The challenge is how to use all of those servers
    • Scalability becomes a key issue
  • The big players have to be able to serve a massive number of people at the same time. Not only do they have to have more servers, they also need to utilize the additional hardware efficiently. The tough part is figuring out how to make use of the extra computational resources.
  • HTTP is inherently stateless. How do you scale stateless machines?
    • Replicate, load-balance
      • These things can be done automatically for you if you use the cloud. For example: AWS.
  • To store state, you'd need a bigger database
    • The database has to be able to scale to whatever load you put on it
    • Companies like Oracle sell big, scalable databases
  • Big websites have some mix of:
    • Scalable databases
      • Databases must be designed for concurrency
    • Replication of application on servers (many instances running)
      • Allows you to serve a large number of clients concurrently
    • Caching for static content
      • Preemptive loading of some content to reduce dynamic database access
    • The state of your app is not stored on the application servers; it is stored in the database
    • Good databases are designed for concurrency, for good write performance, and for reliable storage
    • However, as a web designer, your job is not to reinvent the wheel. Pick the proper DB option for your particular app.
    • Your app should try to minimize DB accesses in order to increase performance
  • E.g.: e-commerce
    • While shopping, users mostly get static content, which is easy to scale.
    • However, purchases have to hit the database and have to invoke writes. Writes are much more expensive than reads.
  • To scale up, you don't need to change your basic application. But you'll have to engineer the system so that it can handle a large number of simultaneous database queries and writes.
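
The e-commerce example above can be sketched as a cache sitting in front of the database. Everything here is a stand-in: the "database" is an in-memory object that counts how often it is touched, and the `item:42` key and the `makeCachedStore` helper are hypothetical names.

```javascript
// Hypothetical cache-aside sketch with an in-memory stand-in database.
function makeDatabase(initial) {
  const rows = new Map(Object.entries(initial));
  const stats = { reads: 0, writes: 0 };
  return {
    stats,
    read(key) { stats.reads += 1; return rows.get(key); },
    write(key, value) { stats.writes += 1; rows.set(key, value); },
  };
}

function makeCachedStore(db) {
  const cache = new Map();
  return {
    // Reads are served from the cache when possible.
    get(key) {
      if (!cache.has(key)) cache.set(key, db.read(key));
      return cache.get(key);
    },
    // Writes always reach the database, and the stale cache entry is dropped.
    set(key, value) {
      db.write(key, value);
      cache.delete(key);
    },
  };
}

const db = makeDatabase({ 'item:42': { name: 'kettle', stock: 5 } });
const store = makeCachedStore(db);

for (let i = 0; i < 1000; i++) store.get('item:42'); // browsing
store.set('item:42', { name: 'kettle', stock: 4 });   // one purchase
const afterPurchase = store.get('item:42');
console.log(db.stats, afterPurchase.stock);
```

A thousand page views cost the database a single read, but the purchase still has to go through as a write. Browsing scales with caches and replicas; purchases scale only as far as the database does.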

Security

  • Input validation/sanitization
    • Don't let users talk to the browser, server, or database directly (no code injection)
  • Use good cryptography
    • Communications security (SSL/TLS)
    • Authentication (passwords, etc.)
  • Security policy (what is allowed, what isn't allowed)
  • Ways of enforcing the security policy
    • Policy specification and enforcement are both difficult
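
As a small illustration of sanitization against cross-site scripting, the sketch below escapes the characters that let user-supplied text be interpreted as HTML. The function name is hypothetical, and it only covers text placed between tags or inside quoted attribute values. In practice a template engine's built-in escaping (Jade escapes values output with `=` by default) is the better choice.

```javascript
// Minimal HTML-escaping sketch; not a replacement for a template engine.
function escapeHtml(text) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

const comment = '<script>alert("xss")</script>';
const safe = escapeHtml(comment);
console.log(safe);
```

After escaping, the browser displays the comment as text instead of running it as a script.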

Cloud Computing

  • Remote servers that handle a lot of data and processing for clients
  • The challenge is to coordinate applications across all the servers
  • Basic tool is virtualization
    • Easier to manage a virtual server than a real server
  • It's easy to forget that cloud servers almost entirely run web apps

Amazon

  • Amazon is the real "parent" of cloud computing
  • They got into the cloud because of Christmas
    • They had servers that were doing nothing 11 months out of the year
    • Why not rent them out?
    • That's how Amazon got into the cloud computing business!
    • They provide more and more infrastructure for building large web apps, based on their internal technologies.
    • They sell a scalable cloud computing service called AWS Elastic Beanstalk
      • "elastic" refers to the scalability aspect of the service
      • Handles load-balancing and instance replication automatically

Networking

  • Latency versus bandwidth
    • Latency: how long data takes to reach the destination (more hops generally means more delay)
    • Bandwidth: maximum download/upload rate (how much data can be moved per second)
  • IPv6
  • Firewalls
  • NAT
  • The internet is a "network of networks". For instance, the SCS network is connected to the Carleton backbone, which is connected to the Unicentre's network. The backbone is also connected to Carleton's ISPs.
  • Firewalls on the internet block all kinds of network traffic, mainly due to "security concerns". However, firewalls almost always let web traffic through.
  • Now, arbitrary network traffic gets routed through HTTP, mainly because HTTP can reach all of the internet. This means that other protocols get squeezed into HTTP. Learn networking, but realize that modern networks just ship all kinds of data within HTTP.
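
The latency versus bandwidth distinction at the top of this list can be made concrete with a rough model: transfer time is approximately latency plus size divided by bandwidth. The link numbers below are made-up examples, not measurements, and the model ignores protocol overhead.

```javascript
// Rough model: time = latency + size / bandwidth.
function transferSeconds(sizeBytes, latencySeconds, bandwidthBitsPerSecond) {
  return latencySeconds + (sizeBytes * 8) / bandwidthBitsPerSecond;
}

const smallRequest = 2 * 1000;        // 2 kB, e.g. a small AJAX response
const largeDownload = 100 * 1000000;  // 100 MB, e.g. a video

// Link A: low latency, modest bandwidth. Link B: high latency, high bandwidth.
const linkA = { latency: 0.01, bandwidth: 10e6 };  // 10 ms, 10 Mbit/s
const linkB = { latency: 0.5, bandwidth: 100e6 };  // 500 ms, 100 Mbit/s

const smallA = transferSeconds(smallRequest, linkA.latency, linkA.bandwidth);
const smallB = transferSeconds(smallRequest, linkB.latency, linkB.bandwidth);
const largeA = transferSeconds(largeDownload, linkA.latency, linkA.bandwidth);
const largeB = transferSeconds(largeDownload, linkB.latency, linkB.bandwidth);
console.log({ smallA, smallB, largeA, largeB });
```

Small requests are dominated by latency, so the low-latency link wins even with a tenth of the bandwidth. Large downloads are dominated by bandwidth, so the result flips. Interactive web apps that make many small requests care mostly about latency.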

Topics for Next Lecture

  • Databases, indexes
  • Security (web security issues)