MapReduce, Globus, BOINC

From Soma-notes

Revision as of 15:14, 26 March 2008

Readings

Ian Foster and Carl Kesselman, "Computational Grids" (1998)

Ian Foster, "Globus Toolkit Version 4: Software for Service-Oriented Systems" (2006)

David P. Anderson, "BOINC: A System for Public-Resource Computing and Storage" (2004)

Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters" (2004)


Notes

BOINC

  • Premise? A local client on your machine downloads a 'workunit', churns through the data, reports the results, and downloads a new 'workunit'
  • Why do we care?
    • Entertainment?
    • How is this an OS paradigm? What is it useful for?
      • It isn't really an OS, just a method to have your mass computation done
      • More of a distributed scheduler?
        • Not even that: the scheduler is central, only the computation is spread out en masse
      • How many systems have we seen that have accomplished mass computation on millions of uncontrolled computers?
        • ummm... none?
      • As an OS?
        • An OS is something that is created to run programs
        • This is a special case allowing us to run specific programs (BUT IS IT AN OS?)
      • Useful for "embarrassingly parallel" programs
  • Perfect for large scale simulation?
    • But then you need LOTS of communication, and this system does not have interconnects
  • The type of problems that we most care about tend not to be THAT parallel
  • So what would a distributed OS be for?
    • Shared communication!
      • But we don't have much in that area that works well.
  • An OS typically provides a lot of services, together in one package
    • We have been seeing that there are no complete packages, just pieces and parts. Why?
      • Computers are changing too fast? Same *NIX OS, same TCP/IP stack... so more of the same, why no true solution?
      • Communication is unreliable? Yes, but that is also nothing new
  • If people found that distributed file systems were successful, they would be in use all the time, but they aren't. Reason? PERFORMANCE
  • Take-away message?
  • Can't handle communication - how do you abstract access to resources when driven through a network?
    • As a result, we have many, many specialized solutions for particular workloads.
  • If you are willing to not have communication between nodes, you gain a HUGE amount of computation.
  • The most reliable systems are the ones that forget about communication.
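
The workunit cycle from the BOINC premise above (fetch a workunit, crunch it, report the result, fetch another) can be sketched roughly as follows. This is a toy, in-process simulation, not BOINC's real client or server API: `CentralScheduler`, `crunch`, `client_step`, and `run_project` are all hypothetical names invented for illustration, and the "computation" is a placeholder sum of squares. The point it tries to show is that clients only ever talk to the central scheduler and never to each other, which is what makes the workload embarrassingly parallel.

```python
from collections import deque


class CentralScheduler:
    """Hypothetical stand-in for a project server (not BOINC's real API)."""

    def __init__(self, workunits):
        self.pending = deque(workunits)  # workunits not yet handed out
        self.results = {}                # workunit id -> reported result

    def fetch_workunit(self):
        # Hand out the next workunit, or None when nothing is left.
        return self.pending.popleft() if self.pending else None

    def report_result(self, wu_id, result):
        self.results[wu_id] = result


def crunch(data):
    # Placeholder computation; each workunit is independent of all others.
    return sum(x * x for x in data)


def client_step(scheduler):
    """One cycle of a client: fetch, compute, report. False if no work."""
    workunit = scheduler.fetch_workunit()
    if workunit is None:
        return False
    wu_id, data = workunit
    scheduler.report_result(wu_id, crunch(data))
    return True


def run_project(workunits, num_clients):
    """Let clients take turns until the scheduler runs out of workunits."""
    scheduler = CentralScheduler(workunits)
    completed = [0] * num_clients  # workunits finished per client
    active = True
    while active:
        active = False
        for client in range(num_clients):
            # Clients contact only the scheduler, never one another.
            if client_step(scheduler):
                completed[client] += 1
                active = True
    return scheduler.results, completed


if __name__ == "__main__":
    units = [(i, list(range(i + 1))) for i in range(5)]
    results, completed = run_project(units, num_clients=2)
    print(results)
    print(completed)
```

Adding more clients here needs no change to `crunch` or to the other clients, only more calls to the scheduler. A simulation that needed neighbouring nodes to exchange state at every step would not fit this shape, which matches the note above about the lack of interconnects.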