DistOS 2023W 2023-03-15


Notes

Bigtable & MapReduce
--------------------

When you think about BigTable, focus on Figure 1 (to understand what it is doing) and Figure 4 (to understand how).

Remember that GFS requires data to be stored in a structured form (because records can be duplicated, readers need to be able to recognize and skip the copies). BigTable is one of the ways GFS files can be organized.

To what extent is BigTable a database?


For MapReduce, think about the kind of tasks Google wanted to perform on its web crawls
 - generating an index, for example
 - gather statistics

Consider the complexity of the tasks you could do with map and the tasks you could do with reduce, noting that map is embarrassingly parallel and reduce isn't.
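
To make the map/reduce split concrete, here is a minimal single-process sketch of building an inverted index, one of the tasks mentioned above. It is not Google's API: the names `map_fn`, `reduce_fn`, and `map_reduce` and the three sample documents are assumptions made up for illustration. The point is that each `map_fn` call looks at one document only, while `reduce_fn` has to wait for a grouping (shuffle) step that collects every value for a key.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """Emit (word, doc_id) pairs for one document; needs no other document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def reduce_fn(word, doc_ids):
    """Combine all values emitted for one key into a sorted posting list."""
    return word, sorted(doc_ids)

def map_reduce(docs):
    # Map phase: every call is independent, so it could run on any machine.
    pairs = [pair for doc_id, text in docs.items() for pair in map_fn(doc_id, text)]
    # Shuffle phase: group values by key. A key cannot be reduced until
    # every map output for that key has been collected.
    groups = defaultdict(list)
    for word, doc_id in pairs:
        groups[word].append(doc_id)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(word, ids) for word, ids in groups.items())

# Hypothetical stand-in for a web crawl.
docs = {"d1": "the quick fox", "d2": "the lazy dog", "d3": "quick quick dog"}
index = map_reduce(docs)
```

Here `index["quick"]` lists the documents containing "quick". Tasks that fit this shape (counting, indexing, gathering statistics) work well; tasks that need many dependent rounds over the same data are where the model starts to look inefficient.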

Answer in groups
 - To what extent is BigTable a database?
 - How does the design of GFS influence the implementation of BigTable?
 - What problems can be solved with MapReduce? What problems can't (efficiently)?

AFTER GROUP DISCUSSIONS
 - BigTable is a database, but not a relational one
 - really, a structured key-value store
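
As a rough illustration of "structured key-value store", the sketch below models a table as a map from (row, column, timestamp) to value, which is the shape shown in Figure 1 of the paper. The class `TinyTable` and its methods are hypothetical names, not BigTable's API, and the `org.example` row is made-up sample data. Note what is missing compared with a relational database: no joins, no schema beyond column names, no multi-row transactions.

```python
class TinyTable:
    """Hypothetical toy model: (row, column, timestamp) -> value."""

    def __init__(self):
        self.cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self.cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        versions = self.cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan(self, row_prefix):
        """Yield (row, column) keys in sorted order for rows starting with row_prefix."""
        for row, column in sorted(self.cells):
            if row.startswith(row_prefix):
                yield row, column

table = TinyTable()
table.put("com.cnn.www", "contents:", 3, "<html>v1")
table.put("com.cnn.www", "contents:", 6, "<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("org.example", "contents:", 4, "<html>other")  # made-up row
```

Asking for `table.get("com.cnn.www", "contents:")` returns the newest version, while passing `timestamp=5` returns the older one. Keeping rows sorted is what makes prefix scans over related rows (here, reversed domain names) cheap.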

Google builds infrastructure by combining systems, building on top of what they have
 - very little open sourced
 - but open source clones have been created; these make up much of Hadoop

BigTable & GFS
 - note how BigTable is oriented around immutability and appends
   - not how standard databases are built!
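
A toy sketch of what "immutability and appends" can look like in practice: writes go to an append-only log and an in-memory table, the in-memory table is periodically frozen into an immutable sorted table, and deletes are recorded as markers rather than by rewriting old data. The class `TinyTablet`, the `memtable_limit` parameter, and the Python lists standing in for GFS files are all assumptions for illustration; the real system is far more involved.

```python
TOMBSTONE = object()  # marker that records a delete without touching old tables

class TinyTablet:
    def __init__(self, memtable_limit=2):
        self.log = []        # append-only commit log (stands in for a GFS file)
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # immutable sorted snapshots, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.log.append((key, value))  # append first; nothing is overwritten
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)  # a delete is just another appended write

    def _flush(self):
        # Freeze the memtable into a new immutable table sorted by key.
        frozen = tuple(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.sstables.append(frozen)
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then tables newest to oldest.
        sources = [self.memtable] + [dict(t) for t in reversed(self.sstables)]
        for source in sources:
            if key in source:
                value = source[key]
                return None if value is TOMBSTONE else value
        return None

tablet = TinyTablet()
tablet.put("a", 1)
tablet.put("b", 2)    # memtable is full, so it is frozen into the first table
tablet.put("a", 10)   # newer value shadows the old one; the old table is untouched
tablet.delete("b")    # the delete is stored as a tombstone in a second table
```

After these calls the first frozen table still holds the original values of "a" and "b"; reads see the newer state only because they consult newer tables first. A standard database would typically update the record in place instead.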