DistOS-2011W Public Goods: Difference between revisions
Related paper (listed under Administration): Distributed decision making: a research agenda, http://portal.acm.org/citation.cfm?id=43930
Revision as of 15:51, 8 March 2011
Members
- Lester Mundt - lmundt at connect.carleton.ca
- Fahim Rahman - frahman at connect.carleton.ca
- Andrew Schoenrock - aschoenr at scs.carleton.ca
Tuesday March 1
Key components:
- Distributed File System
- Can use something previously presented in class
- Distributed computation
- Administration
- How much does a person need to contribute to the system?
- How will users submit (small) services they would like to have run?
- How can very large services be established (from idea to implementation)?
Todo for March 8th:
- Find papers on distributed computation and administration
Thursday March 3
- Seek out papers on specific topic of distributed web cache.
- Discussed two other interesting public goods or services.
- Image registry
- DNA registry: one article suggests that an uncompressed human genome takes between 1.5 and 30 terabytes, but that more efficient formats bring it down to 1.5 GB
http://www.genetic-future.com/2008/06/how-much-data-is-human-genome-it.html
One example paper: "Improving Web Server Performance by Caching Dynamic Data"
Tuesday March 8
Currently our idea of public goods consists of:
- Distributed file storage
- Ceph seems to be an ideal candidate for this
- Distributed Computation
- To avoid a tragedy of the commons with both the storage and computation resources, only predetermined, agreed-upon computation that has a net benefit to everyone participating will take place. This computation is based on the data stored at each node, and the results (metadata) will be stored locally. It can use idle cycles, much like BOINC projects.
- Must allow for querying of metadata to allow users to effectively search processed data.
- Maybe we also need to consider moving data between client machines to fully utilize available resources. For instance, assume a given machine has relatively little available storage but a lot of free computation cycles. In this situation, once its data has been processed, that data should be moved to a client machine with available storage, and data from a machine with relatively few free computation cycles should be moved to the original machine for processing.
- Administration
- Main issue: how are services agreed upon? Once a service is implemented (e.g. an image store), distributing it and getting it running isn't a major issue.
- Maybe users should submit potential services to be run on the stored data and "the system" should decide what to do. This could/should be done based on available computation cycles, amount of generated metadata and the overall "popularity" of the service.
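To make the last administration point concrete, here is a minimal sketch of how "the system" might pick among submitted services using the three factors mentioned above (available computation cycles, amount of generated metadata, popularity). Everything in it is an assumption for discussion: the `ServiceProposal` fields, the weights `w_pop` and `w_meta`, the scoring formula, and the greedy selection are hypothetical and not something the group has agreed on.

```python
from dataclasses import dataclass


@dataclass
class ServiceProposal:
    """A user-submitted service (all fields are hypothetical estimates)."""
    name: str
    cpu_hours_needed: float  # estimated computation cost, must be > 0
    metadata_gb: float       # estimated amount of metadata it would generate
    votes: int               # stand-in for the service's "popularity"


def select_services(proposals, idle_cpu_hours, w_pop=1.0, w_meta=0.1):
    """Greedily accept proposals with the best score per CPU hour.

    Score (an assumed formula): popularity is rewarded, generated
    metadata is penalised, and the result is divided by the cost.
    """
    def score(p):
        return (w_pop * p.votes - w_meta * p.metadata_gb) / p.cpu_hours_needed

    accepted = []
    remaining = idle_cpu_hours
    for p in sorted(proposals, key=score, reverse=True):
        # Skip anything that no longer fits in the idle-cycle budget.
        if p.cpu_hours_needed <= remaining:
            accepted.append(p.name)
            remaining -= p.cpu_hours_needed
    return accepted


if __name__ == "__main__":
    submitted = [
        ServiceProposal("image-tagging", cpu_hours_needed=10, metadata_gb=5, votes=100),
        ServiceProposal("dna-indexing", cpu_hours_needed=50, metadata_gb=100, votes=120),
        ServiceProposal("web-cache-stats", cpu_hours_needed=5, metadata_gb=1, votes=10),
    ]
    print(select_services(submitted, idle_cpu_hours=20))
```

A greedy pass like this is only one option. It is simple to run on every node with the same inputs, but it does not address how nodes would agree on the inputs (votes, cost estimates) in the first place, which is the distributed decision-making question raised above.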