|     |   | 
| Line 1: | Line 1: | 
|  | ==Notes== |  | ==Notes== | 
|  | 
 |  | 
|  | <pre>
 |  | 
|  | Lecture 10
 |  | 
|  | ----------
 |  | 
|  | BigTable & MapReduce
 |  | 
|  | 
 |  | 
|  | Questions
 |  | 
|  | 
 |  | 
|  | BigTable
 |  | 
|  |  - single row transactions
 |  | 
|  |  - sparse semi-structured data, what do they mean?
 |  | 
|  | 
 |  | 
|  | MapReduce
 |  | 
|  |  - what happens when the master goes down?
 |  | 
|  |  - restarting tasks seems wasteful, why do that?
 |  | 
|  |  - single master, failure unlikely how?!
 |  | 
|  | 
 |  | 
|  | What's the problem MapReduce is trying to solve?
 |  | 
|  |  - how to process *large* amounts of data quickly
 |  | 
|  |    - really, summarize data
 |  | 
|  |  - think of this as web crawl data
 |  | 
|  | 
 |  | 
|  | example problem: what's the frequency of words on the entire web (as crawled by google?)
 |  | 
|  | 
 |  | 
|  | What's the strategy?
 |  | 
|  |  - map: analyze data in small chunks in parallel,
 |  | 
|  |         i.e., run a program on each bit of data separately
 |  | 
|  | 
 |  | 
|  |         (example: have a program calculate word frequencies in
 |  | 
|  | 	 a web page)
 |  | 
|  |  - reduce: combine results to summarize
 |  | 
|  |     - must be an operation that doesn't depend on order
 |  | 
|  |       of operations
 |  | 
|  |     - get a list, produce a smaller list where every element is
 |  | 
|  |       of the same type as before
 |  | 
|  |     - example: combine word frequencies for individual pages
 |  | 
|  |       to get aggregate word frequencies
 |  | 
|  | 
 |  | 
|  | When fully reduced, you have word frequencies for the entire web.
 |  | 
|  | 
 |  | 
|  | Can every problem be expressed as a map and a reduce?
 |  | 
|  |  - no!
 |  | 
|  | 
 |  | 
|  | Can a lot of problems that people care about be expressed this way?
 |  | 
|  |  - yes!
 |  | 
|  | 
 |  | 
|  | map is embarassingly parallel
 |  | 
|  |   - so can be maximally sped up
 |  | 
|  |   - think of this as extracting the features you care about
 |  | 
|  | 
 |  | 
|  | reduce can be slow, but normally it shouldn't be that hard
 |  | 
|  |  - can't have order dependency though
 |  | 
|  |  - here we process the features, but in a non-order-dependent way
 |  | 
|  | 
 |  | 
|  | Why do pieces "start over" when a node crashes during a mapreduce operation?
 |  | 
|  |  - we don't have access to the local state that was computed
 |  | 
|  |     - local storage is way faster for this sort of thing!
 |  | 
|  |  - and even if you did, you can't trust that state
 |  | 
|  |     - because it could have been messed up for a while
 |  | 
|  | 
 |  | 
|  | What about master crashing?
 |  | 
|  |  - if it is a long running job, the app developer should
 |  | 
|  |    do their own checkpointing
 |  | 
|  |  - if it isn't, we don't care!
 |  | 
|  |  - and we can generally scale the machines used
 |  | 
|  |    (at a place like google) to make individual runs pretty quick
 |  | 
|  | 
 |  | 
|  | MapReduce is everywhere now
 |  | 
|  |  - sometimes too much, it can be used when the paradigm
 |  | 
|  |    doesn't really fit the task
 |  | 
|  |     - anything that has more data dependencies, i.e. depends on
 |  | 
|  |       order of execution
 |  | 
|  |     
 |  | 
|  | https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
 |  | 
|  |  - "dwarves" of parallel computation
 |  | 
|  |  - they talk about what can be parallized and what can't
 |  | 
|  |  - most of what we talk about in this class is embararringly parallel
 |  | 
|  |     - except for the parts that aren't
 |  | 
|  | 
 |  | 
|  | BigTable
 |  | 
|  |  - "database" that makes use of parallel storage, i.e., GFS
 |  | 
|  |  - overall very parallel, but certain parts are not
 |  | 
|  |     - what is very centralized?
 |  | 
|  |     - look at Figure 4, most data is in SSTables stored in GFS, but the root isn't, it is in Chubby
 |  | 
|  | 
 |  | 
|  | Chubby gives robust config information
 |  | 
|  |  - the state of BigTable at a high level
 |  | 
|  |  - has to be very centralized, very available
 |  | 
|  | 
 |  | 
|  | Note that Chubby *doesn't* scale well, it only scales a bit
 |  | 
|  |  - like 5 machines
 |  | 
|  |  - but it scales enough to do what it needs to
 |  | 
|  |  - it is the part of the state of the system that
 |  | 
|  |    you can NEVER allow to be inconsistent
 |  | 
|  | 
 |  | 
|  | Parallel systems are hard because they can get partitioned in various ways and state can become inconsistent
 |  | 
|  |  - and communication to maintain consistency goes up exponentially with size if done naively
 |  | 
|  | 
 |  | 
|  | Chubby is a fully connected graph amongst its nodes
 |  | 
|  |  - all messages are seen by all nodes
 |  | 
|  | 
 |  | 
|  | But efficient parallel systems can't be fully connected
 |  | 
|  |  - way, way too much communication
 |  | 
|  | 
 |  | 
|  | How does BigTable handle mutable state?
 |  | 
|  |  - can you update arbitrary entries?  not directly!
 |  | 
|  |     - data is immutable
 |  | 
|  |  - if you want to change something, you have to write it to a
 |  | 
|  |    new table and eventually consolidate to get rid of old data
 |  | 
|  |     - a bit like a log-structured filesystem
 |  | 
|  | 
 |  | 
|  | immutability is a very powerful idea in distributed systems
 |  | 
|  |  - if state can't change, you never have to communicate updates
 |  | 
|  | 
 |  | 
|  | Does this architecture have privacy implications?
 |  | 
|  |  - deletes don't immediately delete data
 |  | 
|  |    - modern solution: delete encryption key
 |  | 
|  | </pre>
 |  |