WebFund 2016W Lecture 23
Video
The video from the lecture given on April 5, 2016 is now available.
Notes
In class
Lecture 23
----------

Scalability
 * You replicate your web application; it should be "embarrassingly parallel"
   (no direct interaction between replicas)
 * Communication between servers happens through the backend database

Why not have the web servers talk directly to each other?
 - you then have to figure out how to do synchronization/concurrency right
 - that's what databases are for!

So how in the world do you scale up databases?

First answer: use a minimal solution
 - only get the functionality that you want

First rule of scalability
 - you can't do everything at scale

So, you have to choose what you will do

Why are sacrifices necessary?

Latency versus bandwidth
  bandwidth: bits transferred per second on average
  latency: time to get first bit of response after request

Consider a large truck full of hard disks driving across Canada.
 - very, very high bandwidth
 - very, very high latency as well! (2 weeks to get first bit of response)

Ideally, you want high bandwidth and low latency
 - bandwidth you get through parallelism
 - latency has to be engineered

A "supercomputer" is one with low-latency memory access, for LOTS of memory
 - so it has to have fast interconnects
 - thus, accesses to different nodes aren't much slower than local accesses

The challenge for large web apps is having the database answer queries with low latency

But some amount of latency is inevitable
 - speed of light is finite

So if you want fast access to your webserver worldwide
 - you need to replicate across the globe
 - be close to your clients

NoSQL databases became popular because of latency concerns
 - you needed to be as fast as possible,
 - so strip it to the bone

Use an in-memory key-value store if it is sufficient
 - lowest latency
 - least functionality

If you have to, use an SQL database
 - highest latency
 - most functionality

Or use something in between (MongoDB)

Once you choose the type of database, you OPTIMIZE
 - minimize I/O and computation required per access (read or write)
 - example: query optimization
   - how you form the query
   - how the database is organized

Count the number of web pages that have the word "amazing" in them

How?
 - first, need a database with a copy of the web pages
 - then, you could do linear search through all of the web pages...

I ask this because a web search is a massive challenge in query optimization
 - need to limit scope as early as possible in query
 - organize data so queries are quick to be answered
 - precompute as much as possible

The best you can do is table lookup. So have the right tables ready!

Key tool is making an INDEX
 - table of search terms and pointers to data

E.g., you have a table of customers sorted by ID
 - have an index of names, so a table of names versus IDs
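The indexing idea above can be sketched in a few lines of JavaScript. This is only an illustration, not code from the lecture: the sample pages, the customer records, and every function name are made up, and an in-memory Map stands in for what a real database would keep as an on-disk index. It contrasts a linear search for "amazing" with a precomputed word index, then builds the names-versus-IDs table for customers.

```javascript
// Hypothetical sample data: a tiny "copy of the web".
const pages = [
  { url: "example.com/1", text: "An amazing day for an amazing race" },
  { url: "example.com/2", text: "Nothing special here" },
  { url: "example.com/3", text: "Truly AMAZING results" },
];

// Split text into lowercase words.
function words(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Slow way: linear search through every page on every query.
function countPagesLinear(pages, term) {
  const t = term.toLowerCase();
  return pages.filter((p) => words(p.text).includes(t)).length;
}

// Precompute once: a table from each word to the set of pages containing it.
function buildWordIndex(pages) {
  const index = new Map();
  pages.forEach((page, i) => {
    for (const w of new Set(words(page.text))) {
      if (!index.has(w)) index.set(w, new Set());
      index.get(w).add(i);
    }
  });
  return index;
}

// Fast way: one table lookup per query.
function countPagesIndexed(index, term) {
  const hits = index.get(term.toLowerCase());
  return hits ? hits.size : 0;
}

// Hypothetical customer table, sorted by ID.
const customers = [
  { id: 1, name: "Alice" },
  { id: 2, name: "Bob" },
  { id: 3, name: "Alice" },
];

// Index of names versus IDs, so a lookup by name avoids scanning the table.
function buildNameIndex(customers) {
  const index = new Map();
  for (const c of customers) {
    if (!index.has(c.name)) index.set(c.name, []);
    index.get(c.name).push(c.id);
  }
  return index;
}

const wordIndex = buildWordIndex(pages);
const nameIndex = buildNameIndex(customers);

console.log(countPagesLinear(pages, "amazing"));      // 2
console.log(countPagesIndexed(wordIndex, "amazing")); // 2
console.log(nameIndex.get("Alice"));                  // IDs 1 and 3
```

Both counts agree, but the indexed version does its scanning once, up front, which is the "precompute as much as possible" point: the cost moves from query time to index-building time (and to keeping the index up to date on writes).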
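Returning to the truck of hard disks from the latency versus bandwidth discussion, a rough calculation shows how the two measures come apart. The disk count and disk size here are assumptions chosen only for illustration; the two-week trip time is the one figure taken from the notes.

```javascript
// Illustrative assumptions (not from the lecture): truck capacity.
const disks = 10000;          // assumed number of disks in the truck
const bytesPerDisk = 4e12;    // assumed 4 TB per disk

// From the notes: about two weeks before the first bit arrives.
const tripSeconds = 14 * 24 * 3600;

// Bandwidth is total bits delivered divided by total time, on average.
const totalBits = disks * bytesPerDisk * 8;
const bandwidthBitsPerSec = totalBits / tripSeconds;

console.log(`latency: ${tripSeconds} seconds before the first bit arrives`);
console.log(`bandwidth: ${(bandwidthBitsPerSec / 1e9).toFixed(1)} Gbit/s averaged over the trip`);
```

Under these assumptions the truck averages a few hundred gigabits per second, yet nothing at all arrives for two weeks. Adding more trucks (parallelism) raises the bandwidth but does nothing for the latency.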
Code
Note this version has node_modules removed; copy this directory from analyzeLogs-sol or run "npm install".