DistOS 2023W 2023-03-15
Latest revision as of 16:56, 15 March 2023
Notes
Bigtable & MapReduce
--------------------

When you think about BigTable, focus on Figure 1 (to understand what it is doing) and Figure 4 (to understand how).

Remember that GFS requires structured information to be stored (because data can be duplicated); BigTable is one of the ways GFS files can be organized.

To what extent is BigTable a database?

For MapReduce, think about the kind of tasks Google wanted to perform on its web crawls
 - generating an index, for example
 - gathering statistics

Consider the complexity of the tasks that you could do with map and that you could do with reduce, noting that map is embarrassingly parallel and reduce isn't.

Answer in group
 - To what extent is BigTable a database?
 - How does the design of GFS influence the implementation of BigTable?
 - What problems can be solved with MapReduce? What problems can't (efficiently)?

AFTER GROUP DISCUSSIONS
 - BigTable is a database, but not a relational one
   - really, a structured key-value store

Google builds infrastructure by combining systems, building on top of what they have
 - very little open sourced
 - but open source reimplementations have been created; this is much of Hadoop

BigTable & GFS
 - note how BigTable is oriented around immutability and appends
   - not how standard databases are built!
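
The index-building task and the map/reduce split mentioned above can be sketched in a few lines. This is only a single-process illustration, not Google's MapReduce API or Hadoop's: the function names (map_phase, shuffle, reduce_phase, build_index) and the two toy documents are made up for this example. The point it tries to show is that each map call looks at one document on its own (so it could run anywhere, in any order), while reduce for a given word can only start once every map output for that word has been gathered.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit (word, doc_id) pairs for ONE document.

    Needs no other document, which is why map parallelizes trivially.
    """
    return [(word.lower(), doc_id) for word in text.split()]


def shuffle(pairs):
    """Group emitted values by key.

    This step needs the output of every map call, so it is the
    synchronization point between map and reduce.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, doc_ids):
    """Combine all values for one key into a sorted, de-duplicated list."""
    return word, sorted(set(doc_ids))


def build_index(docs):
    """Run map over every document, shuffle, then reduce per word."""
    pairs = []
    for doc_id, text in docs.items():  # each iteration is independent
        pairs.extend(map_phase(doc_id, text))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, ids) for word, ids in grouped.items())


# Hypothetical toy "crawl" used only for illustration.
docs = {"d1": "big table", "d2": "big file system"}
index = build_index(docs)
```

Here index maps each word to the documents containing it. Tasks that fit this shape (per-item work, then per-key combination) suit MapReduce; tasks that need many rounds of shared, changing state are the ones to think about for the "can't (efficiently)" question.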