MapReduce, Globus, BOINC


Revision as of 19:14, 26 March 2008

Readings

Ian Foster and Carl Kesselman, "Computational Grids" (1998)

Ian Foster, "Globus Toolkit Version 4: Software for Service-Oriented Systems" (2006)

David P. Anderson, "BOINC: A System for Public-Resource Computing and Storage" (2004)

Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters" (2004)

Notes

BOINC

Premise? A local client on your machine downloads a 'workunit', churns through the data, uploads the results, and downloads a new 'workunit'
Why do we care?
- Entertainment?
- How is this an OS paradigm? What is it useful for?
  - It isn't really an OS, just a method to have your mass computation done
  - More of a distributed scheduler?
    - Not even: the scheduler is central, only the computation is spread out
  - How many systems have we seen that have accomplished mass computation on millions of uncontrolled computers?
    - ummm... none?
  - As an OS?
    - An OS is something that is created to run programs
    - This is a special case allowing us to run specific programs (BUT IS IT AN OS?)
  - Useful for "embarrassingly parallel" programs
Perfect for large scale simulation?
- But then you need LOTS of communication, and this system does not have interconnects
The type of problems that we most care about tend not to be THAT parallel
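
The fetch/compute/report cycle described in the premise can be sketched in a few lines. This is only an illustrative toy, not BOINC's actual API: the `Scheduler` class, `client_loop`, and `sum_of_squares` are hypothetical names invented for the sketch, and the "network" is just in-process method calls. It shows the two points from the notes: the scheduler is central, and clients never talk to each other.

```python
from collections import deque


class Scheduler:
    """Hypothetical central scheduler: hands out workunits, collects results."""

    def __init__(self, workunits):
        self.pending = deque(workunits)  # (unit_id, data) pairs waiting for a client
        self.results = {}                # unit_id -> reported result

    def fetch(self):
        # Return the next workunit, or None when nothing is left.
        return self.pending.popleft() if self.pending else None

    def report(self, unit_id, result):
        self.results[unit_id] = result


def client_loop(scheduler, compute):
    """One volunteer client: fetch, compute, report, repeat until no work remains."""
    done = 0
    while (unit := scheduler.fetch()) is not None:
        unit_id, data = unit
        scheduler.report(unit_id, compute(data))  # no contact with other clients
        done += 1
    return done


def sum_of_squares(chunk):
    # Stand-in for the real science code; each chunk is independent of the others.
    return sum(x * x for x in chunk)


# Split one big job (squares of 0..39) into four independent workunits.
units = [(i, list(range(i * 10, (i + 1) * 10))) for i in range(4)]
scheduler = Scheduler(units)
completed = client_loop(scheduler, sum_of_squares)
total = sum(scheduler.results.values())
```

Because no workunit depends on another, any number of clients could run `client_loop` against the same scheduler in any order. A simulation whose chunks must exchange state every step does not fit this shape, which is the objection raised above.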

So what would a distributed OS be for?
- Shared communication!
  - But we don't have much in the way that works well.
An OS typically provides a lot of services, together in one package
- We have been seeing that there are no complete packages, just pieces and parts. Why?
  - Computers are changing too fast? Same *NIX OS, same TCP/IP stack... so more of the same, why no true solution?
  - Communication is unreliable? Yes, but that is also nothing new

If distributed file systems were successful, they would be in use all the time, but they aren't. Reason? PERFORMANCE

Takeaway message?
Can't handle communication - how do you abstract access to resources when driven through a network?
- As a result, we have many many specialized solutions for particular workloads.
If you are willing to not have communication between nodes, you gain a HUGE amount of computation.

The most reliable systems are the ones that forget communication.
