DistOS 2014W Lecture 14
OceanStore
What is the dream?
The dream was to create a persistent storage system that was highly available and universally accessible: a global, ubiquitous, persistent data storage solution. OceanStore was meant to be a utility managed by multiple parties, with no one party having total control or a monopoly over the system. To support the goal of high availability, the design relied on heavy redundancy and fault tolerance. For persistence, everything was archived and nothing was ever truly deleted. This can be likened to working in version control, where every change is kept as a commit. This is possibly due to the realization that the easier it is to delete things, the easier it is to lose things.
The basic assumption made by the designers of OceanStore, however, was that none of the servers could be trusted. To support this, the system held only opaque/encrypted data. As such, the system could be used for more than files (e.g., for whole databases).
The system utilized nomadic data, meaning that data could be cached anywhere, unlike with NFS and AFS where only specific servers can cache the data.
Why did the dream die?
The biggest reason the OceanStore dream died was the assumption that none of the actors could be trusted; everything else they did was right. This assumption made the system needlessly complicated, because the designers had to rebuild everything to accommodate it. It was also unrealistic, since it is normally assumed that at least some of the actors can be trusted. Other successful distributed systems are built on a more trusting model. In short, a solution that accommodates the untrusted-actors assumption is just too expensive.
Technology
As outlined above, the trust model (read: a fundamentally untrusted model) was both the most attractive feature of OceanStore and the one that ultimately killed it. The untrusted assumption placed a huge burden on the system, forcing technical limitations that made OceanStore uncompetitive with other solutions. It is simply much easier and more convenient to trust a given system. It should also be noted that every system can be compromised, even one built on this mistrust.
The public key system also reduces usability: if a user loses their key, they are completely out of luck and would need to acquire a new key. It also makes revocation awkward. To remove a user's access to an object, you would have to re-encrypt the object with a new key and provide that key to every user who should keep access; only holders of the new key could then read the object.
With regard to security, there is no security mechanism on the server side. The server cannot know who is accessing the data. On the economic side, the model is unconvincing as defined. The authors suggest that a collection of companies will host OceanStore servers and consumers will buy capacity (not unlike web hosting today).
Use Cases
- A subset of the features already exists.
- Blackberry and Google offer similar services.
- These current services are each owned by one company, not by many providers.
- You cannot sell back your services as a user.
- e.g., you cannot sell your extra storage back to the utility.
Pond: What insights?
- They actually built it.
- Can't assume the use of any infrastructure, so they rebuild everything!
- Built over the internet.
- Tapestry (routing).
- GUID for object identification. Object naming scheme.
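The GUID naming point above can be illustrated with a minimal sketch. It assumes that a GUID is derived by hashing (an object's GUID from the owner's public key plus a name, a block's GUID from its content), which lets a client check data returned by an untrusted server. The helper names and the choice of SHA-1 are assumptions for illustration, not details taken from these notes.

```python
import hashlib


def object_guid(owner_public_key: bytes, name: str) -> str:
    """Hypothetical helper: GUID as the hash of the owner's key plus a name."""
    return hashlib.sha1(owner_public_key + name.encode("utf-8")).hexdigest()


def block_guid(block: bytes) -> str:
    """Hypothetical helper: GUID of a read-only block is the hash of its content."""
    return hashlib.sha1(block).hexdigest()


def verify_block(guid: str, block: bytes) -> bool:
    """Check that the bytes returned by an untrusted server match the GUID."""
    return block_guid(block) == guid
```

Because the name is derived from the content, a server that tampers with a block cannot make it match the GUID the client asked for, so the client does not need to trust the server to detect corruption.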
Benchmarks
- Really good read speed, really bad write speed.
Storage overhead
- How much do they increase the storage needed in order to implement their storage model?
- A factor of 4.8x the space needed (you'll have about 1/5th of the raw storage as usable capacity).
- Expensive, but good value (data is backed up, replicated, etc.).
- Considerations of importance before making an update
- burn more storage space as more updates are made
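The overhead arithmetic above can be made concrete with a short sketch. The 4.8x factor is the figure quoted in these notes; the function names and the example sizes are made up for illustration.

```python
OVERHEAD_FACTOR = 4.8  # storage overhead quoted in the notes above


def raw_storage_needed(logical_bytes: float, factor: float = OVERHEAD_FACTOR) -> float:
    """Raw storage consumed to hold `logical_bytes` of user data."""
    return logical_bytes * factor


def usable_capacity(raw_bytes: float, factor: float = OVERHEAD_FACTOR) -> float:
    """User data that fits in `raw_bytes` of raw storage."""
    return raw_bytes / factor


if __name__ == "__main__":
    # 1000 GB of raw disk holds roughly 208 GB of user data (about 1/5th).
    print(round(usable_capacity(1000.0), 1))
```

Since every update is archived rather than overwritten, the logical size passed to `raw_storage_needed` keeps growing with each update, which is why it is worth weighing the importance of an update before making it.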
Update performance
- No data is mutated. It is diffed and archived.
- Creating a new version of an object and distributing that object.
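The update model above can be sketched as an append-only object, where an update adds a version and never mutates an old one. This is a toy illustration with a hypothetical class name; it stores full copies of each version rather than diffs, which is a simplification, and it does not model distribution to other servers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VersionedObject:
    """Toy append-only object: updates add versions, nothing is overwritten."""

    versions: List[bytes] = field(default_factory=list)

    def update(self, new_content: bytes) -> int:
        """Archive new_content as a new version and return its version number."""
        self.versions.append(new_content)
        return len(self.versions) - 1

    def read(self, version: int = -1) -> bytes:
        """Read a specific version; the default is the latest one."""
        return self.versions[version]

    def stored_bytes(self) -> int:
        """Total bytes held across all archived versions."""
        return sum(len(v) for v in self.versions)
```

Every call to `update` increases `stored_bytes`, which reflects the cost noted under storage overhead: old versions stay readable, but each update burns more space.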
Benchmarks in a nutshell
- Everything is expensive!
- High latency
Other stuff
- Byzantine fault tolerance
- A Byzantine fault tolerant network replicates the data in such a way that even if m nodes out of a total of n nodes in the network fail, you would still be able to recover all of the data. However, as m increases, the number of network messages that must be exchanged also increases, so there is a tradeoff.
- Assuming certain actors are malicious
- Bitcoin
- Trusted vs Untrusted.
- It is considered to be untrusted, but it takes a huge amount of trust when exchanges are made.
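The tradeoff in the Byzantine fault tolerance point above can be put into rough numbers. This sketch assumes the classic bound for Byzantine agreement, n >= 3m + 1, which these notes do not state, and it uses an all-to-all exchange as a crude estimate of messages per round. The function names are hypothetical and the message count is not a measurement from Pond.

```python
def max_faulty(n: int) -> int:
    """Largest m that n replicas can tolerate, assuming n >= 3m + 1."""
    return (n - 1) // 3


def min_replicas(m: int) -> int:
    """Smallest n that tolerates m faulty replicas, assuming n >= 3m + 1."""
    return 3 * m + 1


def approx_messages(n: int) -> int:
    """Rough estimate: every replica sends one message to every other replica."""
    return n * (n - 1)


if __name__ == "__main__":
    for m in range(1, 5):
        n = min_replicas(m)
        print(m, n, approx_messages(n))
```

Under these assumptions, tolerating one more faulty node requires three more replicas, and the estimated message count grows roughly with the square of the replica count.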
What's worth salvaging from the dream?
- Using spare resources in other locations.
- Similar routing systems are used in large peer-to-peer systems.
How to read a research paper
- Start with the intro.
- Figure out what the problem is.
- Then see the related work for context.
- Then go to the conclusion. Focus on results.
- Then fill in the gaps by reading specific parts of the body.