DistOS 2014W Lecture 14


OceanStore

What is the dream?

The dream was to create a persistent storage system that was highly available and universally accessible: a global, ubiquitous, persistent data store. OceanStore was meant to be a utility managed by multiple parties, with no single party having total control or a monopoly over the system. To support high availability, the design relied on heavy redundancy and fault tolerance. For persistence, everything was archived and nothing was ever truly deleted, much like commits in a version control system. This was possibly motivated by the realization that the easier it is to delete things, the easier it is to lose things.

The basic assumption made by the designers of OceanStore, however, was that none of the servers could be trusted. To cope with this, the servers held only opaque, encrypted data. Because the servers never interpret the data they hold, the system could be used for more than files (e.g., for whole databases).

The system utilized nomadic data, meaning that data could be cached anywhere, unlike with NFS and AFS where only specific servers can cache the data.

Why did the dream die?

The main reason the OceanStore dream died was the assumption that no actor can be trusted; everything else they did was right. This assumption made the system needlessly complicated, because everything had to be rebuilt to accommodate it. It was also unrealistic: it is normally assumed that at least some of the actors can be trusted, and other successful distributed systems are built on a more trusting model. In short, a solution that accommodates untrusted actors everywhere is just too expensive.

Technology

  • The trust model was the most attractive feature, and it is what ultimately killed the system.
    • The untrusted assumption was a huge burden on the system. The technical limitations it forced made OceanStore uncompetitive.
    • It is simply easier and more convenient to trust a given system.
    • Every system can be compromised, even with this mistrust built in.
  • The public key system reduces usability
    • If you lose your key, you're S.O.L.
    • If you wanted to remove someone's access to an object, you would have to re-encrypt the object with a new key and give that key to every user who still has access to the object.
  • Security
    • There is no security mechanism on the server side.
    • You cannot know who accesses the data.
  • Economic side
    • The economic model is unconvincing as defined. The authors suggest that a collection of companies will host OceanStore servers, and consumers will buy capacity (not unlike web-hosting of today).
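The revocation cost noted under the public key bullet above can be made concrete with a small sketch. It is only an illustration: `toy_cipher` is a deliberately insecure stand-in for real encryption, and `SharedObject`, `keys_held`, and `revoke` are hypothetical names, not part of OceanStore.

```python
import hashlib
import secrets


def toy_cipher(key, data):
    """XOR data with a SHA-256 counter keystream. Toy only, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class SharedObject:
    """An encrypted object plus a record of which reader holds which key."""

    def __init__(self, plaintext, readers):
        self.key = secrets.token_bytes(32)
        self.ciphertext = toy_cipher(self.key, plaintext)
        self.keys_held = {reader: self.key for reader in readers}

    def read(self, reader):
        # A reader decrypts with whatever key they were last given.
        return toy_cipher(self.keys_held[reader], self.ciphertext)

    def revoke(self, reader):
        """Re-encrypt under a fresh key; return how many keys were re-sent."""
        plaintext = toy_cipher(self.key, self.ciphertext)
        self.key = secrets.token_bytes(32)
        self.ciphertext = toy_cipher(self.key, plaintext)
        redistributed = 0
        for other in self.keys_held:
            if other != reader:
                # Every remaining reader must receive the new key.
                self.keys_held[other] = self.key
                redistributed += 1
        return redistributed


obj = SharedObject(b"quarterly report", ["alice", "bob", "carol"])
resent = obj.revoke("carol")
```

Revoking one reader out of three forces a full re-encryption and two key deliveries. The revoked reader still holds the old key, but it no longer decrypts the stored ciphertext. The work grows with both object size and the number of remaining readers, which is the usability cost the notes point at.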

Use Cases

  • A subset of the features already exists
    • Blackberry and Google offer similar services.
    • These current services are each owned by one company, not by many providers.
    • You cannot sell back your services as a user.
      • e.g., you cannot sell your extra storage back to the utility.

Pond: What insights?

  • They actually built it.
  • Can't assume the use of any infrastructure, so they rebuild everything!
    • Built over the internet.
    • Tapestry (routing).
    • GUIDs for object identification (the object naming scheme).
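One common way to get names that can be checked without trusting the server is to derive identifiers from secure hashes. The sketch below illustrates that idea only; the use of SHA-1, the concatenation order, and the function names are assumptions for illustration, not a description of Pond's exact scheme.

```python
import hashlib


def block_guid(block):
    """Name a data block by the hash of its contents (self-verifying)."""
    return hashlib.sha1(block).hexdigest()


def object_guid(owner_public_key, name):
    """Name an object by hashing the owner's key together with a readable name."""
    return hashlib.sha1(owner_public_key + name.encode("utf-8")).hexdigest()


def verify_block(guid, block):
    """A client can check data from an untrusted server against its GUID."""
    return block_guid(block) == guid


guid = block_guid(b"some file contents")
ok = verify_block(guid, b"some file contents")
tampered = verify_block(guid, b"some file c0ntents")
```

Because the identifier depends on the content, a server that returns altered data is caught by the client, which fits the assumption that no server can be trusted.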

Benchmarks

  • Really good read speed, really bad write speed.

Storage overhead

  • How much extra storage does their storage model require?
  • A factor of 4.8x the space needed (you effectively get about 1/5th of the raw storage)
  • Expensive, but good value (data is backed up, replicated, etc.)
  • Consider how important an update is before making it
    • more storage space is burned as more updates are made
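The arithmetic behind the 4.8x figure can be written out directly. This is a small sketch; the function names and the 100 GB example are illustrative assumptions, and only the 4.8x factor comes from the notes.

```python
def usable_fraction(overhead_factor):
    """Fraction of raw capacity left for user data at a given overhead."""
    return 1.0 / overhead_factor


def raw_bytes_needed(logical_bytes, overhead_factor):
    """Raw capacity consumed to store a given amount of user data."""
    return logical_bytes * overhead_factor


OVERHEAD = 4.8  # storage overhead factor quoted in the notes

fraction = usable_fraction(OVERHEAD)
raw_for_100_gb = raw_bytes_needed(100, OVERHEAD)
```

With a 4.8x overhead the usable fraction is roughly 0.21, which is where the "about 1/5th" rule of thumb comes from, and 100 GB of user data consumes about 480 GB of raw capacity.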

Update performance

  • No data is mutated. It is diffed and archived.
  • An update creates a new version of an object and distributes that version.
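The "never mutate, always add a version" model can be sketched as an append-only object. This is a simplification under stated assumptions: `VersionedObject` is a hypothetical class, it stores full copies rather than diffs, and it ignores distribution entirely.

```python
class VersionedObject:
    """Append-only object: updates add versions, nothing is overwritten."""

    def __init__(self, initial):
        self._versions = [bytes(initial)]

    def update(self, new_data):
        """Archive a new version and return its version number."""
        self._versions.append(bytes(new_data))
        return len(self._versions) - 1

    def read(self, version=None):
        """Read a specific version, or the latest one if none is given."""
        if version is None:
            version = len(self._versions) - 1
        return self._versions[version]

    def stored_bytes(self):
        """Total bytes held across all archived versions."""
        return sum(len(v) for v in self._versions)


doc = VersionedObject(b"draft")
latest = doc.update(b"draft, revised")
```

Old versions remain readable after an update, and `stored_bytes()` only ever grows. That is the reason every update has a lasting storage cost.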

Benchmarks in a nutshell

  • Everything is expensive!
  • High latency

Other stuff

  • Byzantine fault tolerance
    • A Byzantine fault-tolerant network replicates the data in such a way that even if m of the n nodes in the network fail, you can still recover all of the data. As m increases, however, the number of network messages that must be exchanged also increases, so there is a tradeoff. (Classical Byzantine agreement protocols need n ≥ 3m + 1 replicas to tolerate m faulty ones.)
    • Assumes that certain actors are malicious.
  • Bitcoin
    • Trusted vs Untrusted.
    • It is considered to be untrusted, but a huge amount of trust is needed when exchanges are made.
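The Byzantine fault tolerance tradeoff above can be put in numbers. The sketch assumes the classical bound of n ≥ 3m + 1 replicas for m faulty nodes, and it counts one all-to-all round as n(n − 1) messages; the function names are illustrative, and real protocols run several such rounds per update.

```python
def min_replicas(m):
    """Smallest group size that tolerates m Byzantine nodes (n >= 3m + 1)."""
    return 3 * m + 1


def max_faults(n):
    """Largest number of Byzantine nodes a group of n replicas tolerates."""
    return (n - 1) // 3


def all_to_all_messages(n):
    """Messages in one round where every replica sends to every other."""
    return n * (n - 1)


# Group size and per-round message count for m = 1, 2, 3 tolerated faults.
table = [(m, min_replicas(m), all_to_all_messages(min_replicas(m)))
         for m in (1, 2, 3)]
```

Going from one tolerated fault to three raises the group from 4 to 10 replicas and a single all-to-all round from 12 to 90 messages, which is why larger m gets expensive quickly.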

What's worth salvaging from the dream?

  • Using spare resources in other locations.
  • Similar routing systems are used in large peer-to-peer systems.

How to read a research paper

  • Start with the intro
    • Figure out what the problem is
  • Then see the related work for context
  • Then go to the conclusion. Focus on the results.
  • Then fill in the gaps by reading specific parts of the body