DistOS 2014W Lecture 16

From Soma-notes

Public Resource Computing


Outline for upcoming lectures

All the papers that would be covered in upcoming lectures have been posted on Wiki. These papers will be more difficult in comparison to the papers we have covered so far, so we should be prepared to allot more time for studying these papers and come prepared in class. We may abandon the way of discussing the papers in group and instead everyone would ask the questions about what,they did not understand from paper so it would allow us to discuss the technical detail better. Professor will not be taking the next class, instead our TA would discuss the two papers on how to conduct a literature survey, which should help with our projects. The rest of the papers will deal with many closely related systems. In particular, we will be looking at distributed hash tables and systems that use distributed hash tables.


Project proposal

There were 11 proposals and out of which professor found 4 to be in the state of getting accepted and has graded them 10/10. professor has mailed to everyone with the feedback about the project proposal so that we can incorporate those comments and submit the project proposals by coming Saturday ( the extended deadline). the deadline has been extended so that every one can work out the flaws in their proposal and get the best grades (10/10). Project Presentation are to be held on 1st and 3rd april. People who got 10/10 should be ready to present on Tuesday as they are ahead and better prepared for it, there should be 6 presentation on Tuesday and rest on Thursday. Under-grad will have their final exam on 24th April. 24th April is also the date to turn-in the final project report.

Public Resource Computing

The paper assigned for readings were on SETI and BOINC. BOINC is the system SETI is built upon, there are other projects running on the same system like Folding@home etc. In particular, we want to discuss the following: What is public resource computing? How does public resource computing relate to the various computational models and systems that we have seen this semester? How are they similar in design, purpose, and technologies? How is it different?

The main purpose of public resource computing was to have a universally accessible, easy-to-use, way of sharing resources. This is interesting as it differs from some of the systems we have looked at that deal with the sharing of information rather than resources.

For computational parallelism, you need a highly parallel problem. SETI@home and folding@home give examples of such problems. In public resource computing, particularly with the BOINC system, you divide the problem into work units. People voluntarily install the clients on their machines, running the program to work on work units that are sent to their clients in return for credits. In the past, it has been institutes, such as universities, running services with other people connecting in to use said service. Public resource computing turns this use case on its head, having the institute (e.g., the university) being the one using the service while other people are contributing to said service voluntarily. In the files systems we have covered so far, people would want access to the files stored in a network system, here a system wants to access people's machines to utilize the processing power of their machine. Since they are contributing voluntarily, how do you make these users care about the system if something were to happen? The gamification of the system causes many users to become invested in the system. People are doing work for credits and those with the most credits are showcased as major contributors. They can also see the amount of resources (e.g., process cycles) they have devoted to the cause on the GUI of the installed client. When the client produces results for the work unit it was processing, it sends the result to the server.

For fault tolerance, such as malicious clients are faulty processors, redundant computing is done. Work units are processed multiple times. Work units are later taken off of the clients as dictated by the following two cases:

  1. They receive the number of results, n, for a certain work unit, they take the answer that the majority gave.
  2. They have transmitted a work unit m times and have not gotten back the n expected responses.

It should be noted that, in doing this, it is possible that some work units are never processed. The probability of this happening can be reduced by increasing the value of m, though.

General Discussion

So, given all this, how would we generally define public resource computing/public interest computing? It is essentially using the public as a resource--you are voluntarily giving up your extra compute cycles for projects (this is a little like donating blood--public resource computing is a vampire). Looking at public resource computing like this, we can contrast it with a botnet. What is the difference? Both system are utilizing client machines to perform/aid in some task. The answer: consent. You are consensually contributing to a project rather than being (unknowingly) forced to. Other differences are the ends/resources that you want as well as reliability. With a botnet, you can trust that a higher proportion of your users are following your commands exactly (as they have no idea they are performing them). Whereas, in public resource computing, how can you guarantee that clients are doing what you want?

Comparisons

Basic Comparison with other File systems , we have covered so far -

1) Use-Cases have been turned on their head. In the files systems we have covered so far, People would want access to the files stored in a network system, here a system wants to access people's machines to utilize the processing power of their machine. 2) In other file systems it was about many clients sharing the data, here it is more about sharing the processing power. In Folding@home, the system can store some of its data at client's storage but that is not the public resource computing's main focus. 3) It is nothing like systems like OceanStore where there is no centralized authority, in BOINC the master/slave relation between the centralized server and the clients installed across users' machine can still be visualized and it is more like GFS in that sense because GFS also had a centralized metadata server. 4) Public resource systems are like BOTNETs but people install these clients with consent and there is no need for communication between the clients ( it is not peer to peer network). It could be made to communicate at peer to peer level but it would risk security as clients are not trusted in the network

Trust Model and Fault Tolerance

Reliability - How does SETI address the questions of fault tolerance ? They use replication for reliability, work units are assigned to multiple clients and the results that are returned to server can be analyzed to find the outliers in order to detect the malicious users but that addresses the situations of fault tolerance from client perspective. SETI has a centralized server, which can go down and when it does, it uses exponential back off mechanism to push back the clients and ask them to wait before sending the result again but whenever a server comes back up many clients may try to access the server at once and may crash the server once again, this may cause the ddos manufactured by the server's own inadequacies.The Exponential backup approach is similar to the one adopted in resolving the TCP congestion. public resource computing don't need to be very highly reliable because it is used by scientists/ researchers who can bring the system back up, in case it goes down and start again. There are however few measures discussed within SETI like read-only data back-up etc . Compare this to highly reliable systems like ceph or oceanstore , which could recover the data in case of node crashes.

Skype was modelled much like a public resource computing network (before Microsoft took over) as the network would choose super nodes to act as routers. These super nodes would be the machines with higher reliability and better processing powers. After MS' takeover the supernodes have been centralized and the election of supernodes functionality has been removed from the system.

SETI is an example of "embarrassingly parallel" workload where the problem has inherent nature to lend itself to be divided into work-units and be computed in-parallel without any need to consolidate the results, it is called "embarrassingly parallel" because there is little to no efforts required to distribute the work load in parallel and you don't get much praise for doing it. one more example of "embarrassingly parallel" from the file systems that we have covered so far could be web-indexing in GFS. Any file system that we have discussed so far, which doesnt trust the clients can be modeled to work as public sharing system.

Note: Public resource computing is also very similar to mapreduce, which we will be discussing later in the course. Make sure to keep public resource computing in mind when we reach this.