DistOS 2014W Lecture 21

From Soma-notes

Revision as of 22:57, 19 April 2014

Presentation

Marking

  • marked mostly on presentation, not content
  • basically we want to communicate the basic structure of the paper, and do so in a way that isn't boring

Content

  • concrete, not "head in the clouds"
  • present the area
  • compare and contrast the papers
  • 10-minute talk, 5 minutes of feedback
  • basic argument
  • basic references

Form

  • show the work we've done on paper
  • try to get feedback
  • think of it as a rough draft
  • try to get people to read the paper
  • enthusiasm
  • powerpoints are easier
  • don't read slides
  • no whole sentences on slides
  • look at talks by Mark Shuttleworth

MapReduce

MapReduce rests on a clever observation: a simple abstraction can solve most distributed problems. It's all about programming to an abstraction that is efficiently parallelizable. Note that the system as a whole is not actually simple, because it sits atop a mountain of code. It requires something like BigTable, which requires something like GFS, which requires something like Chubby. Despite this, it lets programmers easily do distributed computation using a simple framework that hides the messy details of parallelization.

  • Restricted programming model
  • Interestingly, large-scale problems can be implemented with it.
  • Easy to program, powerful for certain classes of problems, it scales well.
  • MapReduce job model is VERY limited though. You can't do things like simulations.
  • MapReduce is problem specific.
    • Naiad is less problem specific and allows you to do more.

The core idea is programming to an abstraction that is efficiently parallel. Everything we have learnt until now was about infrastructure. Classic OS abstractions were about files. Now we are using a programming abstraction.

Example: word frequency in a document.


How does it work?

  • Two steps: Map and Reduce. The user writes these.
    • Map takes a single input key-value pair (e.g. a named document) and converts it to an intermediate representation: a list of new (k, v) pairs.
    • Reduce takes the intermediate representation and merges the values for each key.
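
The word-frequency example can be sketched in a few lines of Python. This is only an illustration of the programming model, not the real framework. The names map_fn, reduce_fn and run_mapreduce are made up for this sketch, and the driver runs in a single process, where a real system would spread the work across many machines.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """Map: one (document name, contents) pair -> list of intermediate (word, 1) pairs."""
    return [(word.lower(), 1) for word in text.split()]

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values that share one key."""
    return word, sum(counts)

def run_mapreduce(documents, map_fn, reduce_fn):
    """Hypothetical single-process stand-in for the framework (an assumption)."""
    groups = defaultdict(list)
    # Map phase: every document is processed independently of the others.
    for name, text in documents.items():
        for key, value in map_fn(name, text):
            groups[key].append(value)  # group intermediate values by key
    # Reduce phase: every key is processed independently of the others.
    return dict(reduce_fn(key, values) for key, values in groups.items())

docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog and the fox",
}
word_counts = run_mapreduce(docs, map_fn, reduce_fn)
print(word_counts)
```

The user only writes map_fn and reduce_fn. Because each map call sees one document and each reduce call sees one key, the framework is free to run them in parallel.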

Implementation

  • Uses commodity hardware and GFS.
  • Master/Slave relationship amongst machines. Master delegates tasks to slaves.
  • Intermediate representation saved as files.
  • Many MapReduce jobs can happen in sequence.
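
The points above can be made concrete with a small sketch. It assumes a local temporary directory in place of real cluster storage, a fixed number of reduce tasks (NUM_REDUCERS), and a run_job function that plays the master's role by handing out tasks one at a time. All of these names are hypothetical and chosen for illustration; the real system distributes the tasks over many machines.

```python
import json
import tempfile
import zlib
from collections import defaultdict
from pathlib import Path

NUM_REDUCERS = 2  # assumed number of reduce tasks

def partition(key):
    """Pick the reduce task for a key (deterministic hash, an assumption)."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_REDUCERS

def run_map_task(task_id, records, map_fn, workdir):
    """One map task: write intermediate pairs to one file per reduce partition."""
    buckets = defaultdict(list)
    for key, value in records:
        for k, v in map_fn(key, value):
            buckets[partition(k)].append([k, v])
    for r, pairs in buckets.items():
        (workdir / f"map-{task_id}-part-{r}.json").write_text(json.dumps(pairs))

def run_reduce_task(r, reduce_fn, workdir):
    """One reduce task: read every intermediate file for partition r and merge."""
    groups = defaultdict(list)
    for path in sorted(workdir.glob(f"map-*-part-{r}.json")):
        for k, v in json.loads(path.read_text()):
            groups[k].append(v)
    return [reduce_fn(k, vs) for k, vs in groups.items()]

def run_job(records, map_fn, reduce_fn):
    """Stand-in for the master: delegates map tasks, then reduce tasks."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        for task_id, record in enumerate(records):
            run_map_task(task_id, [record], map_fn, workdir)
        output = []
        for r in range(NUM_REDUCERS):
            output.extend(run_reduce_task(r, reduce_fn, workdir))
        return output

# Job 1: word count.
def wc_map(name, text):
    return [(w.lower(), 1) for w in text.split()]

def wc_reduce(word, ones):
    return word, sum(ones)

# Job 2: how many distinct words occur n times (consumes job 1's output).
def hist_map(word, count):
    return [(count, 1)]

def hist_reduce(count, ones):
    return count, sum(ones)

docs = [("a.txt", "the quick brown fox"), ("b.txt", "the lazy dog and the fox")]
word_counts = run_job(docs, wc_map, wc_reduce)
histogram = run_job(word_counts, hist_map, hist_reduce)
print(dict(histogram))
```

The second job takes the output of the first as its input, which is how jobs are run in sequence. The only communication between the map and reduce phases goes through the intermediate files.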

Naiad

Where MapReduce was suited for a specific family of solutions, Naiad tries to generalize the solution to apply parallelization to a much wider family. Naiad supports MapReduce style solutions, but also many other solutions. However, the tradeoff was simplicity. It's like we took MapReduce and took away its low barrier to entry. The idea is to create a constrained graph that can easily be parallelized.

  • More complicated than MapReduce
  • Talks about timely dataflow graphs
  • It's all about graph algorithms: a graph abstraction
  • Restrictions on graphs so that they can be mapped to parallel computation
  • How to fit anything to this model is a big question.
  • More general than MapReduce.
  • After reading the MapReduce paper, you could easily write a MapReduce job. After reading the Naiad paper, you can't. Naiad is super complicated.
  • Their model is super complicated. It doesn't minimize our cognitive load.
  • Doesn't scale well. After about 40 nodes, there is no further improvement in performance, whereas MapReduce can scale to thousands of nodes.
  • Nobody wants to use it because the abstraction is complicated.