Notes
Bigtable & MapReduce
--------------------
When you think about Bigtable, focus on Figure 1 (to understand what it is doing) and Figure 4 (to understand how).
Remember that data stored in GFS needs to be structured (because records can be duplicated). Bigtable is one of the ways GFS files can be organized.
To what extent is Bigtable a database?
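To make the "what it is doing" side concrete, here is a minimal sketch of the data model only: a sorted map from (row key, column key, timestamp) to a value. It is a toy in-memory stand-in, not how Bigtable is implemented; the class name `ToyTable`, its methods, and the integer timestamps are hypothetical and chosen for illustration. The sample row is loosely modelled on the web-table example.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of a (row, column, timestamp) -> value map.

    Hypothetical illustration only: no tablets, no SSTables, no GFS.
    """

    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield row keys in sorted order within [start_row, end_row)."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row


table = ToyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v1")
table.put("com.cnn.www", "contents:", 5, "<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 4, "<html>other")

print(table.get("com.cnn.www", "contents:"))     # newest version
print(table.get("com.cnn.www", "contents:", 4))  # version as of timestamp 4
print(list(table.scan("com.", "com.~")))         # range scan over sorted row keys
```

Note what the sketch leaves out: there are no joins, no secondary indexes, and no multi-row transactions. Whether that still counts as a database is the question to discuss.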
For MapReduce, think about the kind of tasks Google wanted to perform on its web crawls:
- generating an index, for example
- gathering statistics
Consider the complexity of the tasks you could do with map and the tasks you could do with reduce, noting that map is embarrassingly parallel and reduce isn't.
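As an illustration of the index-building task, here is a minimal single-process sketch of the map, shuffle, and reduce phases applied to an inverted index. It is not Google's implementation; `map_fn`, `reduce_fn`, `run_mapreduce`, and the sample documents are hypothetical names and data made up for this example.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map: emit (word, doc_id) once per distinct word; needs only this one document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_fn(word, doc_ids):
    """Reduce: needs every doc_id emitted for this word before it can finish."""
    return word, sorted(doc_ids)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, then group by key (the shuffle), then reduce, all in one process."""
    groups = defaultdict(list)
    for key, value in inputs.items():
        # Each map call is independent of the others, so these could run in parallel.
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # The reduce phase starts only after all map output has been grouped.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


docs = {"d1": "the quick fox", "d2": "the lazy dog"}
index = run_mapreduce(docs, map_fn, reduce_fn)
print(index["the"])  # documents containing "the"
print(index["fox"])  # documents containing "fox"
```

The sketch shows where the asymmetry comes from: a map call sees one input record and nothing else, while a reduce call cannot complete until the grouping step has collected all values for its key. Reduce calls for different keys are still independent of each other, so the limit is the grouping barrier and any skew in key sizes.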
Answer in groups:
- To what extent is Bigtable a database?
- How does the design of GFS influence the implementation of Bigtable?
- What problems can be solved with MapReduce? What problems can't be solved efficiently?