WebFund 2016W Lecture 23
Video
The video from the lecture given on April 5, 2016 is now available.
Notes
In class
Lecture 23
----------
Scalability
* You replicate your web application; the replicas should be
  "embarrassingly parallel" (no direct interaction between them)
* Communication between servers happens through the
  backend database
Why not have the web servers talk directly to each other?
 - you then have to figure out how to do
   synchronization/concurrency right
 - that's what databases are for!
So how in the world do you scale up databases?
First answer: use a minimal solution
 - only get the functionality that you want
First rule of scalability
 - you can't do everything at scale
So, you have to choose what you will do
Why are sacrifices necessary?
latency versus bandwidth
bandwidth: bits transferred per second on average
latency: time to get first bit of response after request
Consider a large truck full of hard disks driving
across Canada.
  - very, very high bandwidth
  - very, very high latency as well!
    (2 weeks to get first bit of response)
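A rough sketch of the truck arithmetic. The number of disks and the disk size are made-up assumptions for illustration; only the 2-week trip time comes from the notes.

```javascript
// Illustrative numbers only: disk count and disk size are assumptions.
const DISKS_IN_TRUCK = 10000;            // assumed
const BYTES_PER_DISK = 1e12;             // assumed: 1 TB per disk
const TRIP_SECONDS = 14 * 24 * 60 * 60;  // the "2 weeks" from the notes

// Average bandwidth = total bits delivered / time taken to deliver them.
function truckBandwidthBitsPerSecond(disks, bytesPerDisk, tripSeconds) {
  const totalBits = disks * bytesPerDisk * 8;
  return totalBits / tripSeconds;
}

const bps = truckBandwidthBitsPerSecond(
  DISKS_IN_TRUCK, BYTES_PER_DISK, TRIP_SECONDS);

// Latency is the whole trip: nothing arrives until the truck does.
console.log(`bandwidth: ${(bps / 1e9).toFixed(1)} Gbit/s`);
console.log(`latency:   ${TRIP_SECONDS} seconds`);
```

Under these assumptions the truck averages tens of gigabits per second, yet the first bit still takes two weeks to show up.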
Ideally, you want high bandwidth and low latency
 - bandwidth you get through parallelism
 - latency has to be engineered
A "supercomputer" is one with low-latency memory access,
for LOTS of memory
  - so it has to have fast interconnects
  - thus, accesses to different nodes aren't much
    slower than local accesses
The challenge for large web apps is having the database
answer queries with low latency
But some amount of latency is inevitable
 - speed of light is finite
So if you want fast access to your webserver worldwide
 - you need to replicate across the globe
 - be close to your clients
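To see why being close to clients matters, here is a small sketch of the lower bound the speed of light puts on a round trip. The distances are hypothetical, and real networks are slower than this bound (signals in fibre travel slower than light in vacuum, and routers add delay).

```javascript
// Speed of light in vacuum, in km per second.
const SPEED_OF_LIGHT_KM_PER_S = 299792.458;

// Best-case round-trip time in milliseconds for a server distanceKm away:
// the request has to go there and the response has to come back.
function minRoundTripMs(distanceKm) {
  return (2 * distanceKm / SPEED_OF_LIGHT_KM_PER_S) * 1000;
}

// Hypothetical distances, chosen only for illustration.
const farServerKm = 10000;  // a server on another continent
const nearServerKm = 100;   // a replica in a nearby city

console.log(`far:  ${minRoundTripMs(farServerKm).toFixed(1)} ms`);
console.log(`near: ${minRoundTripMs(nearServerKm).toFixed(2)} ms`);
```

No amount of server tuning gets the far server under that bound; only moving a replica closer does.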
NoSQL databases became popular because of latency
concerns
 - you needed to be as fast as possible,
 - so strip it to the bone
Use an in-memory key-value store if it is sufficient
  - lowest latency
  - least functionality
If you have to, use an SQL database
  - highest latency
  - most functionality
Or use something in between (MongoDB)
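As a minimal sketch of the "least functionality" end of that range, an in-memory key-value store can be little more than a wrapper around a JavaScript Map. The class name, method names and sample keys are hypothetical, not any particular product's API.

```javascript
// A toy in-memory key-value store (hypothetical API, for illustration).
// It supports exactly three things: put, get and remove by key.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined if the key is absent
  }
  remove(key) {
    return this.data.delete(key); // true if something was removed
  }
}

const store = new KeyValueStore();
store.put("session:42", { user: "alice", cart: ["book"] });

console.log(store.get("session:42").user); // alice
console.log(store.get("session:99"));      // undefined
```

There are no joins, no queries by value and no transactions here. That is the trade: every operation is a single lookup, and anything fancier has to be done by the application.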
Once you choose the type of database, you OPTIMIZE
 - minimize I/O and computation required per access
   (read or write)
 - example: query optimization
    - how you form the query
    - how database is organized
Count the number of web pages that have the word
 "amazing" in them
How?
 - first, need a database with a copy of the web pages
 - then, you could do linear search through all
   of the web pages...
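The linear search approach might look like the sketch below. The page objects, their fields and the sample text are invented for illustration; a real copy of the web would not fit in an array.

```javascript
// Hypothetical stand-in for "a database with a copy of the web pages".
const pages = [
  { url: "http://example.com/a", text: "What an amazing day" },
  { url: "http://example.com/b", text: "Nothing to see here" },
  { url: "http://example.com/c", text: "Amazing, truly amazing!" },
];

// Linear search: look at every word of every page, on every query.
function countPagesContaining(pages, word) {
  const target = word.toLowerCase();
  let count = 0;
  for (const page of pages) {
    const words = page.text.toLowerCase().split(/\W+/);
    if (words.includes(target)) {
      count += 1; // count each page once, however often the word appears
    }
  }
  return count;
}

console.log(countPagesContaining(pages, "amazing")); // 2
```

The work per query grows with the total size of all the pages, which is hopeless at web scale.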
I ask this because a web search is a massive challenge
in query optimization
 - need to limit scope as early as possible in query
 - organize data so queries are quick to be answered
    - precompute as much as possible
The best you can do is table lookup. So have the right
tables ready!
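One way to have the right table ready for the "amazing" question is to precompute, for each word, the set of pages that contain it. This is only a sketch: the page objects and function names are hypothetical.

```javascript
// Hypothetical sample data, for illustration only.
const pages = [
  { url: "http://example.com/a", text: "What an amazing day" },
  { url: "http://example.com/b", text: "Nothing to see here" },
  { url: "http://example.com/c", text: "Amazing, truly amazing!" },
];

// Precompute once: word -> Set of URLs of pages containing that word.
function buildWordTable(pages) {
  const table = new Map();
  for (const page of pages) {
    for (const word of page.text.toLowerCase().split(/\W+/)) {
      if (word === "") continue; // split can yield empty strings
      if (!table.has(word)) table.set(word, new Set());
      table.get(word).add(page.url);
    }
  }
  return table;
}

// Answering a query is now a single table lookup.
function countPages(table, word) {
  const urls = table.get(word.toLowerCase());
  return urls ? urls.size : 0;
}

const wordTable = buildWordTable(pages);
console.log(countPages(wordTable, "amazing")); // 2
```

All the scanning happens once, when the table is built; each query afterwards costs one lookup no matter how many pages there are.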
Key tool is making an INDEX
 - table of search term and pointers to data
E.g., you have a table of customers sorted by ID
 - have an index of names, so a table of names versus
   IDs
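The customer example could be sketched as follows. The records and function names are made up, and the sketch assumes names are unique; with duplicate names the index would have to map each name to a list of IDs.

```javascript
// Hypothetical customer table, kept sorted by ID.
const customers = [
  { id: 101, name: "Alice" },
  { id: 205, name: "Bob" },
  { id: 317, name: "Carol" },
];

// Lookup by ID: binary search works because the table is sorted by ID.
function findById(table, id) {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (table[mid].id === id) return table[mid];
    if (table[mid].id < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

// The index: a second table of names versus IDs (assumes unique names).
const nameIndex = new Map(customers.map((c) => [c.name, c.id]));

// Lookup by name: name -> ID through the index, then ID -> record.
function findByName(table, index, name) {
  const id = index.get(name);
  return id === undefined ? undefined : findById(table, id);
}

console.log(findByName(customers, nameIndex, "Bob").id); // 205
```

Without the index, a search by name would have to scan the whole table, since sorting by ID says nothing about where a given name sits.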
Code
Note: this version has node_modules removed; copy that directory from analyzeLogs-sol or run "npm install".