DistOS 2014W Lecture 21

Presentation

Marking

marked mostly on presentation, not content
basically we want to communicate the basic structure of the paper, and do so in a way that isn't boring

Content

concrete, not "head in the clouds"
present the area
compare and contrast the papers
10 minutes talk, 5 minutes feedback
basic argument
basic references

Form

show the work we've done on paper
try to get feedback
think of it as a rough draft
try to get people to read the paper
enthusiasm
powerpoints are easier
don't read slides
no whole sentences on slides
look at talks by Mark Shuttleworth

MapReduce

MapReduce is a programming model and an associated implementation for processing and generating large data sets.A clever observation that a simple solution could solve most distributed problems. It's all about programming to an abstraction that is efficiently parallelizable. Note that it's not actually a simple solution, because it sits atop a mountain of code. It requires something like BigTable which requires something like GFS, which requires something like Chubby. Despite this, it allows for programmers to easily do distributed computation using a simple framework that hides the messy details of parrallelization.

Restricted programming model
Interestingly large scale problems can be implemented with this
Easy to program, powerful for certain classes of problems, it scales well.
MapReduce job model is VERY limited though. You can't do things like simulations.
MapReduce is problem specific.
- Naiad is less problem specific and allows you to do more.

Programming to an abstraction that is efficiently parllel. We have learnt all about infrastructure until now. Classic OS abstractions were about files. Now we used programming abstraction.

Example: word frequency in a document.

How does it work?

Two steps. Map and Reduce. The user writes theses.
- Map takes a single input key-value pair (eg. a named document) and converts it to an intermediate (k,v) representation. A list of new key-values.
- Reduce: Take the intermediate representation and merge the values.

Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication.

Execution

When the user program calls the MapReduce function, the following sequence of actions occurs.

splits the input �files into M pieces of typically
starts up many copies of the program on a cluster of machines
One of the copies of the program is special the master. The rest are workers that are assigned work by the master.
A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-de�fined Map function.
Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.
The reduce worker iterates over the sorted intermediate data and for each unique intermediate key encountered, it passes the key and the corresponding set of intermediate values to the user's Reduce function.
When all map tasks and reduce tasks have been completed, the master wakes up the user program.
After successful completion, the output of the mapreduce execution is available in the R output �files.

The master keeps several data structures. For each map task and reduce task, it stores the state (idle, in-progress, or completed), and the identity of the worker machine (for non-idle tasks).

Fault Tolerance

The master pings every worker periodically. If no response is received from a worker in a certain amount of time, the master marks the worker as failed. Any map tasks completed by the worker are reset back to their initial idle state, and therefore become eligible for scheduling on other workers.

In case of Master Failure, It is easy to make the master write periodic checkpoints of the master data structures described above. If the master task dies, a new copy can be started from the last checkpointed state.

Implementation

Uses commodity HW and GFS.
Master/Slave relationship amongst machines. Master delegates tasks to slaves.
Intermediate representation saved as files.
Many MapReduce jobs can happen in sequence.

Naiad

Where MapReduce was suited for a specific family of solutions, Naiad tries to generalize the solution to apply parallelization to a much wider family. Naiad supports MapReduce style solutions, but also many other solutions. However, the tradeoff was simplicity. It's like we took MapReduce and took away its low barrier to entry. The idea is to create a constrained graph that can easily be parallelized.

More complicated than Map Reduce
Talks about Timely dataflow graphs
Its all about Graph algorithms - Graph abstraction
Restrictions on graphs so that they can be mapped to parllel computation
How to fit anything to this model is a big question.
More general than map reduce.

After reading the MapReduce paper, you could easily write a map reduce job. After reading the Naiad, you can't. Naiad is super complicated.
Their model is super complicated. It doesn't minimize our cognitive load.
Doesn't scale at all. After about 40 nodes, there is no improvement in performance. MapReduce can scale to thousands of nodes and scales forever.
Nobody wants to use it because the abstraction is complicated.