DistOS 2023W 2023-03-01


Latest revision as of 00:53, 2 March 2023

Discussion Questions

You have until 12:10 to discuss the following:

OceanStore

  • What is the threat model underlying OceanStore security?
  • How does this compare to the threat model of modern cloud storage providers?
  • Would you use OceanStore? Why or why not?

BOINC

  • What was the original inspiration for this work?
  • What kind of problems is this style of computing suitable for? What problems is it not suitable for?
  • What is the threat model for BOINC-type systems? Does this threat model make it suitable or unsuitable for some applications?
  • Why aren't BOINC-style systems so popular anymore?

Notes

BOINC & OceanStore
------------------

BOINC
 - what is it good for?  (what problems?)
   - embarrassingly parallel problems
   - "work units" can be checked and combined relatively cheaply

Note that with BOINC bandwidth is at a premium
 - want to send relatively small chunks of data to clients that require a significant amount of analysis

A classic supercomputer is very good at low-latency, high-bandwidth communication.  BOINC isn't!

BOINC is an example of "volunteer computing": people allowing their computers to be used for tasks from which they get no direct benefit
 - what is the malicious form of this? a botnet, e.g., one mining cryptocurrency

Why did volunteer computing become popular?
 - dream of making use of vast computing resources of computers attached to the Internet that are otherwise not doing much else
   - note most were desktop systems that were left on for long periods of time running screensavers
 - "academic nature" of the past Internet

But nowadays we care about power consumption, we worry about malware, and researchers aren't demanding access to these resources (they can get their own computers and do their own data analysis much more easily)

But even with BOINC, we have a trust issue
 - clients may not trust projects, but in practice that isn't much of an issue
 - but projects cannot trust clients...because of gamification
    - some will want to get higher scores so they can get to the top of leaderboards
    - so the project must be able to verify whether clients did the work they say they've done
    - general solution: send work units to multiple computers, make sure their answers agree
       - don't have to do it for all work units, just enough to detect cheaters
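
The redundancy check above can be sketched roughly as follows. This is only an illustration, not BOINC's actual validator: the names `honest_client`, `cheating_client`, `validate`, and `detection_probability` are all hypothetical, the "work" is a stand-in sum of squares, and the spot-check formula assumes each work unit is independently chosen for checking with the same probability.

```python
from collections import Counter


def honest_client(work_unit):
    # Stand-in for real analysis: sum of squares of the inputs.
    return sum(x * x for x in work_unit)


def cheating_client(work_unit):
    # Claims credit without doing the work.
    return -1


def validate(work_unit, clients, quorum=2):
    """Send one work unit to several clients; accept the most common
    answer only if at least `quorum` clients reported it."""
    results = [client(work_unit) for client in clients]
    answer, count = Counter(results).most_common(1)[0]
    return answer if count >= quorum else None


def detection_probability(check_fraction, forged_units):
    """Chance that at least one forged unit is among those re-checked,
    assuming each unit is checked independently with `check_fraction`."""
    return 1 - (1 - check_fraction) ** forged_units


# Two honest clients outvote one cheater.
print(validate([1, 2, 3], [honest_client, honest_client, cheating_client]))
# With no agreement, the result is rejected.
print(validate([1, 2, 3], [honest_client, cheating_client]))
# Re-checking only 10% of units still catches a persistent cheater.
print(detection_probability(0.10, 50))
```

Note that this only catches cheaters who disagree with honest clients; colluding clients that return the same wrong answer would pass a simple agreement check.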


BOINC makes use of lots of untrusted computers to do large-scale computation
OceanStore makes use of lots of untrusted computers to do large-scale storage

Trust model for OceanStore is that storage computers aren't trusted
 - data is replicated and encrypted, spread across multiple systems
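
A rough sketch of what "replicated and encrypted on untrusted storage" means for a client. This is not OceanStore's design or API: the nodes are plain dictionaries, and `seal`, `unseal`, `store`, and `fetch` are hypothetical names. The cipher is a toy SHA-256 keystream chosen only so the example needs nothing beyond the standard library; real systems would use a vetted cipher.

```python
import hashlib
import hmac
import secrets


def _keystream(key, nonce, length):
    # Toy keystream: SHA-256 over key, nonce, and a counter. Illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(enc_key, mac_key, plaintext):
    # Encrypt, then append an HMAC tag so tampering can be detected later.
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def unseal(enc_key, mac_key, blob):
    # Return the plaintext, or None if the blob is missing or was modified.
    if len(blob) < 48:
        return None
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


def store(nodes, blob):
    # Replicate the same sealed blob on every (untrusted) node.
    for node in nodes:
        node["blob"] = blob


def fetch(nodes, enc_key, mac_key):
    # Try replicas in turn; accept the first one that verifies.
    for node in nodes:
        data = unseal(enc_key, mac_key, node.get("blob", b""))
        if data is not None:
            return data
    return None
```

The point of the sketch is that the keys never leave the client: nodes see only ciphertext, and a node that drops or alters its replica is simply skipped as long as one intact replica survives.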

In some sense OceanStore is very familiar
 - we store data on giant sets of remote systems
 - but the trust model is VERY different
   - with today's cloud storage, the party you pay is the one that controls all of the storage


If OceanStore were proposed today, storage would probably be paid for with cryptocurrency.  But it had no monetization model; it assumed people would just make contracts to figure it out.  (To be fair, it was a research project, not a commercial endeavor.)

It turns out that in business, it is mostly best to just have one entity handle the payment and the work.  So that's why we have giant cloud providers and not decentralized computing and storage spread across untrusted parties.

If your infrastructure is trusted (i.e., the one you pay controls the resources and can give you guarantees), you don't need all the overhead of dealing with untrusted systems.

Cryptocurrency and blockchain-based systems are trying to bring back some of this decentralization, but at huge efficiency costs and with many other problems.
 - the trust issue never goes away in distributed systems