DistOS 2015W Session 9

From Soma-notes
Revision as of 01:39, 12 March 2015 by Ksherif (talk | contribs) (→‎BOINC)

BOINC

  • Public Resource Computing Platform
  • Gives scientists the ability to use large amounts of computation resources.
  • Clients do not connect directly with each other; instead, they talk to a central server located at Berkeley
  • The goals of BOINC are:
  • 1) Reduce the barriers to entry
  • 2) Share resources among autonomous projects
  • 3) Support diverse applications
  • 4) Reward participants
A BOINC project is identified by a single master URL,
which serves as the project's home page as well as a directory of its servers.

SETI@Home

  • Uses public resource computing to analyze radio signals in search of extraterrestrial intelligence
  • Needs a good-quality telescope to search for radio signals and a large amount of computational power, which was not available locally
  • It has not yet found extraterrestrial intelligence, but it has established the credibility of public resource computing projects, whose computing resources are donated by the public
  • Uses BOINC as the backbone of the project
  • Uses a relational database to store information on a large scale, and a multi-threaded server to distribute work to clients
  • The quality of the data in this architecture cannot be trusted; the main incentive to use it, however, is that it is a cheap and easy way to scale the work across a very large number of machines
  • Provides social incentives to encourage users to join the system
  • This computation model still exists, but not in the legitimate world.

MapReduce

  • A programming model presented by Google for large-scale parallel computations
  • Uses the Map() and Reduce() functions found in functional programming languages
  • Map (Filtering)
  • Takes a function and applies it to all elements of the given data set
  • Reduce (Summary)
  • Accumulates results from the data set using a given function
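
The Map and Reduce steps above can be sketched on a single machine. This is only an illustration of the programming model, not Google's implementation: the word-count task, the helper names (`map_phase`, `shuffle`, `reduce_phase`), and the sample input are all assumptions made for the example, and a real framework would run each phase in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, map_fn):
    """Apply map_fn to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs


def shuffle(pairs):
    """Group values by key, standing in for the framework's step between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reduce_fn):
    """Fold each key's list of values into a single result using reduce_fn."""
    return {key: reduce(reduce_fn, values) for key, values in groups.items()}


def word_count_map(line):
    """Hypothetical map function: emit (word, 1) for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def word_count(lines):
    pairs = map_phase(lines, word_count_map)
    return reduce_phase(shuffle(pairs), lambda a, b: a + b)


counts = word_count(["the quick fox", "the lazy dog", "The end"])
print(counts)
```

The user supplies only `word_count_map` and the addition lambda; everything else is generic plumbing, which is the part a MapReduce framework takes over.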

Naiad

  • A programming model similar to MapReduce but with streaming capabilities, so that results are available almost instantaneously
  • A distributed system for executing data-parallel, cyclic dataflow programs, offering high throughput and low latency
  • Aims to provide a general-purpose system that fulfills these requirements and also supports a wide variety of high-level programming models.
  • Real Time Applications:
  • Batch iterative Machine Learning:

Vowpal Wabbit (VW), an open-source distributed machine learning library, performs each iteration in three phases: each process updates its local state; the processes independently train on their local data; and the processes jointly compute a global average, which is an AllReduce.
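
The three phases can be sketched as a single-process simulation. This is not VW's or Naiad's code: the 1-D linear model, the gradient step, the data shards, and the function names are assumptions chosen to keep the example small, and `all_reduce_average` merely imitates what a networked AllReduce would deliver to every process.

```python
def local_gradient(w, shard):
    """Mean squared-error gradient of the model y = w * x on one worker's data."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_average(values):
    """Simulated AllReduce: every worker receives the same global average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def train(shards, iterations=50, lr=0.02):
    global_w = 0.0
    for _ in range(iterations):
        # Phase 1: each process updates its local state from the shared weight.
        local_ws = [global_w] * len(shards)
        # Phase 2: each process trains independently on its own local data.
        local_ws = [w - lr * local_gradient(w, shard)
                    for w, shard in zip(local_ws, shards)]
        # Phase 3: the processes jointly average their weights (AllReduce).
        global_w = all_reduce_average(local_ws)[0]
    return global_w


# Hypothetical data drawn from y = 3x, split across two workers.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
weight = train(shards)
print(round(weight, 4))
```

Because every iteration ends with a global average that feeds the next iteration, the computation is cyclic, which is the kind of dataflow Naiad is designed to execute.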

  • Streaming Acyclic Computation

Kineograph (also from Microsoft) is a system that processes Twitter data and provides counts of the occurrence of hashtags as well as links between popular tags. For comparison, an equivalent application was written using Naiad in 26 lines of code and ran close to 2x faster.
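
To make the shape of that computation concrete, here is a minimal single-machine sketch of streaming hashtag counts and tag-to-tag links. It is not the Naiad or Kineograph program: the `HashtagStream` class, the regular expression, and the sample tweets are hypothetical, and treating a "link" as two hashtags appearing in the same tweet is an assumption made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#(\w+)")


class HashtagStream:
    """Keeps running hashtag counts and co-occurrence links as tweets arrive."""

    def __init__(self):
        self.counts = Counter()  # hashtag -> number of tweets containing it
        self.links = Counter()   # (tag_a, tag_b) -> number of shared tweets

    def on_tweet(self, text):
        # Deduplicate and sort so each pair of tags has one canonical order.
        tags = sorted({tag.lower() for tag in HASHTAG.findall(text)})
        self.counts.update(tags)
        self.links.update(combinations(tags, 2))

    def top(self, k):
        """Return the k most frequent hashtags seen so far."""
        return self.counts.most_common(k)


stream = HashtagStream()
for tweet in ["Loving #SOSP and #naiad", "#Naiad is fast", "no tags here"]:
    stream.on_tweet(tweet)
print(stream.top(1))
```

Results are updated after every tweet rather than after a whole batch, which is the property the streaming comparison is about.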

  • The Naiad paper won the best paper award at SOSP 2013. See the project page on the Microsoft Research website: http://research.microsoft.com/en-us/projects/naiad/ . Further down that page are videos that explain Naiad, including Derek Murray's presentation at SOSP 2013.