DistOS 2014W Lecture 21: Difference between revisions

From Soma-notes
Cdelahou (talk | contribs)
MapReduce stuff
Cdelahou (talk | contribs)
More on M/R
Line 34: Line 34:
* Restricted programming model
* Restricted programming model
* Interestingly large scale problems can be implemented with this
* Interestingly large scale problems can be implemented with this
* Easy to program, powerful for certain classes, it scales like no ones business.  
* Easy to program, powerful for certain classes of problems, it scales well.
* Kind of empoweraged model
* MapReduce job model is VERY limited though. You can't do things like simulations.
* MapReduce is problem specific.  
** Naiad is less problem specific and allows you to do more.
 
Programming to an abstraction that is efficiently parllel. We have learnt all about infrastructure until now.  
Programming to an abstraction that is efficiently parllel. We have learnt all about infrastructure until now.  
Classic OS abstractions were about files. Now we used programming abstraction.
Classic OS abstractions were about files. Now we used programming abstraction.


Example: word frequency in a document.
Example: word frequency in a document.


=== How does it work? ===
=== How does it work? ===

Revision as of 02:43, 20 April 2014

Presentation

Marking

  • marked mostly on presentation, not content
  • basically we want to communicate the basic structure of the paper, and do so in a way that isn't boring

Content

  • concrete, not "head in the clouds"
  • present the area
  • compare and contrast the papers
  • 10 minutes talk, 5 minutes feedback
  • basic argument
  • basic references

Form

  • show the work we've done on paper
  • try to get feedback
  • think of it as a rough draft
  • try to get people to read the paper
  • enthusiasm
  • powerpoints are easier
  • don't read slides
  • no whole sentences on slides
  • look at talks by Mark Shuttleworth

MapReduce

A clever observation that a simple solution could solve most distributed problems. It's all about programming to an abstraction that is efficiently parallelizable. Note that it's not actually a simple solution, because it sits atop a mountain of code. It requires something like BigTable which requires something like GFS, which requires something like Chubby. Despite this, it allows for programmers to easily do distributed computation using a simple framework that hides the messy details of parrallelization.

  • Restricted programming model
  • Interestingly large scale problems can be implemented with this
  • Easy to program, powerful for certain classes of problems, it scales well.
  • MapReduce job model is VERY limited though. You can't do things like simulations.
  • MapReduce is problem specific.
    • Naiad is less problem specific and allows you to do more.

Programming to an abstraction that is efficiently parllel. We have learnt all about infrastructure until now. Classic OS abstractions were about files. Now we used programming abstraction.

Example: word frequency in a document.


How does it work?

  • Two steps. Map and Reduce. The user writes theses.
    • Map takes a single input key-value pair (eg. a named document) and converts it to an intermediate (k,v) representation. A list of new key-values.
    • Reduce: Take the intermediate representation and merge the values.

Implementation

  • Uses commodity HW and GFS.
  • Master/Slave relationship amongst machines. Master delegates tasks to slaves.
  • Intermediate representation saved as files.
  • Many MapReduce jobs can happen in sequence.

Naiad

Where MapReduce was suited for a specific family of solutions, Naiad tries to generalize the solution to apply parallelization to a much wider family. Naiad supports MapReduce style solutions, but also many other solutions. However, the tradeoff was simplicity. It's like we took MapReduce and took away its low barrier to entry. The idea is to create a constrained graph that can easily be parallelized.

  • Complicated than Map Reduce
  • Talks about Timely dataflow graphs
  • Its all about Graph algorithms - Graph abstraction
  • Restrictions on graphs so that they can be mapped to parllel computation
  • How to fit anything to this model is a big question.
  • More general than map reduce
  • Not very useful.