Difference between revisions of "DistOS 2014W Lecture 16"

From Soma-notes
Jump to navigation Jump to search
Line 4: Line 4:
== Outline for upcoming lectures ==
== Outline for upcoming lectures ==


All the papers that would be covered in upcoming lectures have been posted on Wiki. These paper will be more difficult in comparison to the papers we have covered so far, so we should be prepared to allot more time for studying these papers and come prepared in class. We may abandon the way of discussing the papers in group and instead everyone would ask the questions about what,they did not understand from paper so it would allow us to discuss the technical detail better.
All the papers that would be covered in upcoming lectures have been posted on Wiki. These papers will be more difficult in comparison to the papers we have covered so far, so we should be prepared to allot more time for studying these papers and come prepared in class. We may abandon the way of discussing the papers in group and instead everyone would ask the questions about what,they did not understand from paper so it would allow us to discuss the technical detail better.
Professor will not be taking the next class, instead our TA would discuss the two papers on how to conduct a literature survey, which should help with our projects.  
Professor will not be taking the next class, instead our TA would discuss the two papers on how to conduct a literature survey, which should help with our projects.  
The rest of the papers will deal with many closely related systems. In particular, we will be looking at distributed hash tables and systems that use distributed hash tables.




Line 15: Line 16:
== Public Resource Computing  ==
== Public Resource Computing  ==


The paper assigned for readings were on SETI and BOINC. BOINC is the system SETI is built upon, there are other projects running on the same system like Folding@home etc.  
The paper assigned for readings were on SETI and BOINC. BOINC is the system SETI is built upon, there are other projects running on the same system like Folding@home etc. In particular, we want to discuss the following:
What is public resource computing and can we compare it to the file systems we have studied so far in this course - 
What is public resource computing? How does public resource computing relate to the various computational models and systems that we have seen this semester? How are they similar in design, purpose, and technologies? How is it different?
   
   
In public resource system, you divide the problem in to work units and people voluntarily install the clients on their machines to have the program run to work upon the work unit assigned to their client in return for credits, people with most credits get to be showcased as major contributors.People can see the amount of resources (process cycles etc) they have devoted for the cause on the GUI of the client installed.
The main purpose of public resource computing was to have a universally accessible, easy-to-use, way of sharing resources. This is interesting as it differs from some of the systems we have looked at that deal with the sharing of information rather than resources.
When client produces results for the work unit it was working upon, it sends the result to the server.
 
For computational parallelism, you need a highly parallel problem. SETI@home and folding@home give examples of such problems. In public resource computing, particularly with the BOINC system, you divide the problem into work units. People voluntarily install the clients on their machines, running the program to work on work units that are sent to their clients in return for credits. In the past, it has been institutes, such as universities, running services with other people connecting in to use said service. Public resource computing turns this use case on its head, having the institute (e.g., the university) being the one using the service while other people are contributing to said service voluntarily. In the files systems we have covered so far, people would want access to the files stored in a network system, here a system wants to access people's machines to utilize the processing power of their machine. Since they are contributing voluntarily, how do you make these users care about the system if something were to happen? The gamification of the system causes many users to become invested in the system. People are doing work for credits and those with the most credits are showcased as major contributors. They can also see the amount of resources (e.g., process cycles) they have devoted to the cause on the GUI of the installed client. When the client produces results for the work unit it was processing, it sends the result to the server.
For fault tolerance, such as malicious clients are faulty processors, redundant computing is done. Work units are processed multiple times.
Work units are later taken off of the clients as dictated by the following two cases:
# They receive the number of results, '''n''', for a certain work unit, they take the answer that the majority gave.
# They have transmitted a work unit '''m''' times and have not gotten back the n expected responses.
It should be noted that, in doing this, it is possible that some work units are never processed. The probability of this happening can be reduced by increasing the value of '''m''', though.
 
So, given all this, how would we generally define public resource computing/public interest computing? It is essentially using the public as a resource--you are voluntarily giving up your extra compute cycles for projects (this is a little like donating blood--public resource computing is a vampire). Looking at public resource computing like this, we can contrast it with a botnet. What is the difference? Both system are utilizing client machines to perform/aid in some task. The answer: consent. You are consensually contributing to a project rather than being (unknowingly) forced to. Other differences are the ends/resources that you want as well as reliability. With a botnet, you can trust that a higher proportion of your users are following your commands exactly (as they have no idea they are performing them). Whereas, in public resource computing, how can you guarantee that clients are doing what you want?
    
    
Comparison with other File systems , we have covered so far -
Basic Comparison with other File systems , we have covered so far -


1) Use-Cases have been turned on their head. In the files systems we have covered so far, People would want access to the files stored in a network system, here a system wants to access people'machine to utilize the processing power of their machine.
1) Use-Cases have been turned on their head. In the files systems we have covered so far, People would want access to the files stored in a network system, here a system wants to access people's machines to utilize the processing power of their machine.
2) In other file systems it was about many clients sharing the data, here it is more about sharing the processing power. In Folding@home, system can store some of its data at client's storage but that is not the public resource computing's main focus.
2) In other file systems it was about many clients sharing the data, here it is more about sharing the processing power. In Folding@home, the system can store some of its data at client's storage but that is not the public resource computing's main focus.
3) It is nothing like systems like OceanStore where there is no centralized authority, in BOINC the master/slave relation between the centralized server and the clients installed across users' machine can still be visulaized and it is more like GFS in that sense because GFS also had a centralized metadata server.
3) It is nothing like systems like OceanStore where there is no centralized authority, in BOINC the master/slave relation between the centralized server and the clients installed across users' machine can still be visualized and it is more like GFS in that sense because GFS also had a centralized metadata server.
4) Public resource systems are like BOTNETs but people install these clients with consent and there is no need for communication between the clients ( it is not peer to peer network). It could be made to communicate at peer to peer level but it would risk security as clients are not trusted in the network
4) Public resource systems are like BOTNETs but people install these clients with consent and there is no need for communication between the clients ( it is not peer to peer network). It could be made to communicate at peer to peer level but it would risk security as clients are not trusted in the network


Line 31: Line 40:
public resource computing don't need to be very highly reliable because it is used by scientists/ researchers who can bring the system back up, in case it goes down and start again. There are however few measures discussed within SETI like read-only data back-up etc . Compare this to highly reliable systems like ceph or oceanstore , which could recover the data in case of node crashes.  
public resource computing don't need to be very highly reliable because it is used by scientists/ researchers who can bring the system back up, in case it goes down and start again. There are however few measures discussed within SETI like read-only data back-up etc . Compare this to highly reliable systems like ceph or oceanstore , which could recover the data in case of node crashes.  
      
      
Skype was modeled like public resource computing network( before Microsoft took over), network would choose super nodes to act as routers, these super nodes would be the machines with higher reliability and better processing powers. After MS' takeover the supernodes have been centralized and the election of supernode part has been removed from the system.
Skype was modelled much like a public resource computing network (before Microsoft took over) as the network would choose super nodes to act as routers. These super nodes would be the machines with higher reliability and better processing powers. After MS' takeover the supernodes have been centralized and the election of supernodes functionality has been removed from the system.
 
SETI is an example of "embarrassingly parallel" workload where the problem has inherent nature to lend itself to be divided into work-units and be computed in-parallel without any need to consolidate the results, it is called "embarrassingly parallel" because there is little to no efforts required to distribute the work load in parallel and you don't get much praise for doing it. one more example of "embarrassingly parallel" from the file systems that we have covered so far could be web-indexing in GFS. Any file system that we have discussed so far, which doesnt trust the clients can be modeled to work as public sharing system.
SETI is an example of "embarrassingly parallel" workload where the problem has inherent nature to lend itself to be divided into work-units and be computed in-parallel without any need to consolidate the results, it is called "embarrassingly parallel" because there is little to no efforts required to distribute the work load in parallel and you don't get much praise for doing it. one more example of "embarrassingly parallel" from the file systems that we have covered so far could be web-indexing in GFS. Any file system that we have discussed so far, which doesnt trust the clients can be modeled to work as public sharing system.
Note: Public resource computing is also very similar to mapreduce, which we will be discussing later in the course. Make sure to keep public resource computing in mind when we reach this.

Revision as of 15:34, 12 March 2014

Public Resource Computing


Outline for upcoming lectures

All the papers that would be covered in upcoming lectures have been posted on Wiki. These papers will be more difficult in comparison to the papers we have covered so far, so we should be prepared to allot more time for studying these papers and come prepared in class. We may abandon the way of discussing the papers in group and instead everyone would ask the questions about what,they did not understand from paper so it would allow us to discuss the technical detail better. Professor will not be taking the next class, instead our TA would discuss the two papers on how to conduct a literature survey, which should help with our projects. The rest of the papers will deal with many closely related systems. In particular, we will be looking at distributed hash tables and systems that use distributed hash tables.


Project proposal

There were 11 proposals and out of which professor found 4 to be in the state of getting accepted and has graded them 10/10. professor has mailed to everyone with the feedback about the project proposal so that we can incorporate those comments and submit the project proposals by coming Saturday ( the extended deadline). the deadline has been extended so that every one can work out the flaws in their proposal and get the best grades (10/10). Project Presentation are to be held on 1st and 3rd april. People who got 10/10 should be ready to present on Tuesday as they are ahead and better prepared for it, there should be 6 presentation on Tuesday and rest on Thursday. Under-grad will have their final exam on 24th April. 24th April is also the date to turn-in the final project report.

Public Resource Computing

The paper assigned for readings were on SETI and BOINC. BOINC is the system SETI is built upon, there are other projects running on the same system like Folding@home etc. In particular, we want to discuss the following: What is public resource computing? How does public resource computing relate to the various computational models and systems that we have seen this semester? How are they similar in design, purpose, and technologies? How is it different?

The main purpose of public resource computing was to have a universally accessible, easy-to-use, way of sharing resources. This is interesting as it differs from some of the systems we have looked at that deal with the sharing of information rather than resources.

For computational parallelism, you need a highly parallel problem. SETI@home and folding@home give examples of such problems. In public resource computing, particularly with the BOINC system, you divide the problem into work units. People voluntarily install the clients on their machines, running the program to work on work units that are sent to their clients in return for credits. In the past, it has been institutes, such as universities, running services with other people connecting in to use said service. Public resource computing turns this use case on its head, having the institute (e.g., the university) being the one using the service while other people are contributing to said service voluntarily. In the files systems we have covered so far, people would want access to the files stored in a network system, here a system wants to access people's machines to utilize the processing power of their machine. Since they are contributing voluntarily, how do you make these users care about the system if something were to happen? The gamification of the system causes many users to become invested in the system. People are doing work for credits and those with the most credits are showcased as major contributors. They can also see the amount of resources (e.g., process cycles) they have devoted to the cause on the GUI of the installed client. When the client produces results for the work unit it was processing, it sends the result to the server. For fault tolerance, such as malicious clients are faulty processors, redundant computing is done. Work units are processed multiple times. Work units are later taken off of the clients as dictated by the following two cases:

  1. They receive the number of results, n, for a certain work unit, they take the answer that the majority gave.
  2. They have transmitted a work unit m times and have not gotten back the n expected responses.

It should be noted that, in doing this, it is possible that some work units are never processed. The probability of this happening can be reduced by increasing the value of m, though.

So, given all this, how would we generally define public resource computing/public interest computing? It is essentially using the public as a resource--you are voluntarily giving up your extra compute cycles for projects (this is a little like donating blood--public resource computing is a vampire). Looking at public resource computing like this, we can contrast it with a botnet. What is the difference? Both system are utilizing client machines to perform/aid in some task. The answer: consent. You are consensually contributing to a project rather than being (unknowingly) forced to. Other differences are the ends/resources that you want as well as reliability. With a botnet, you can trust that a higher proportion of your users are following your commands exactly (as they have no idea they are performing them). Whereas, in public resource computing, how can you guarantee that clients are doing what you want?

Basic Comparison with other File systems , we have covered so far -

1) Use-Cases have been turned on their head. In the files systems we have covered so far, People would want access to the files stored in a network system, here a system wants to access people's machines to utilize the processing power of their machine. 2) In other file systems it was about many clients sharing the data, here it is more about sharing the processing power. In Folding@home, the system can store some of its data at client's storage but that is not the public resource computing's main focus. 3) It is nothing like systems like OceanStore where there is no centralized authority, in BOINC the master/slave relation between the centralized server and the clients installed across users' machine can still be visualized and it is more like GFS in that sense because GFS also had a centralized metadata server. 4) Public resource systems are like BOTNETs but people install these clients with consent and there is no need for communication between the clients ( it is not peer to peer network). It could be made to communicate at peer to peer level but it would risk security as clients are not trusted in the network

Reliability - How does SETI address the questions of fault tolerance ? They use replication for reliability, work units are assigned to multiple clients and the results that are returned to server can be analyzed to find the outliers in order to detect the malicious users but that addresses the situations of fault tolerance from client perspective. SETI has a centralized server, which can go down and when it does, it uses exponential back off mechanism to push back the clients and ask them to wait before sending the result again but whenever a server comes back up many clients may try to access the server at once and may crash the server once again, this may cause the ddos manufactured by the server's own inadequacies.The Exponential backup approach is similar to the one adopted in resolving the TCP congestion. public resource computing don't need to be very highly reliable because it is used by scientists/ researchers who can bring the system back up, in case it goes down and start again. There are however few measures discussed within SETI like read-only data back-up etc . Compare this to highly reliable systems like ceph or oceanstore , which could recover the data in case of node crashes.

Skype was modelled much like a public resource computing network (before Microsoft took over) as the network would choose super nodes to act as routers. These super nodes would be the machines with higher reliability and better processing powers. After MS' takeover the supernodes have been centralized and the election of supernodes functionality has been removed from the system.

SETI is an example of "embarrassingly parallel" workload where the problem has inherent nature to lend itself to be divided into work-units and be computed in-parallel without any need to consolidate the results, it is called "embarrassingly parallel" because there is little to no efforts required to distribute the work load in parallel and you don't get much praise for doing it. one more example of "embarrassingly parallel" from the file systems that we have covered so far could be web-indexing in GFS. Any file system that we have discussed so far, which doesnt trust the clients can be modeled to work as public sharing system.

Note: Public resource computing is also very similar to mapreduce, which we will be discussing later in the course. Make sure to keep public resource computing in mind when we reach this.