DistOS 2019F 2019-11-27
Readings
SCOPE & YARN
- Chaiken et al., "SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets" (PVLDB 2008)
- Vavilapalli et al., "Apache Hadoop YARN: Yet Another Resource Negotiator" (SoCC 2013)
- Optional: Isard et al., "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks" (EuroSys 2007)
Discussion Questions
- Both papers start off with a criticism of MapReduce but then go off in very different directions. What are those directions?
- How close is SCOPE to SQL? Why didn't they just use SQL?
- What is Cosmos? How is it related to SCOPE? To Dryad?
- How similar is Cosmos to other systems we have discussed?
- What were the requirements for YARN, and how were they achieved?
- Why do you think Microsoft didn't commercialize SCOPE or Cosmos? See this question comparing Cosmos/SCOPE to Data Lake Store and U-SQL, which are apparently built on YARN and WebHDFS.