DistOS-2011W Wuala

From Soma-notes

Introduction

We live in an unprecedented era of communication, with most modern computers connected to one another via the internet. This interconnectivity enables instant communication and information sharing. The same connectivity should also protect users from file loss: a user's files need not exist solely on local storage, because they can also be stored on the internet. Files stored on the internet should then be available from any computer a user deigns to sit at. Others share this vision and have started to implement it, and there are now abundant online storage services. However, these systems all impose a central authority that controls access and provides the storage itself. We prefer the concept of a service run between peers, where users give up space and bandwidth on their local systems and in turn receive fast, secure and reliable distributed online storage. Currently, the system that comes closest to this is Wuala.

The rest of this document is organized as follows: section two briefly discusses alternatives to Wuala, section three explores the design behind Wuala, section four provides a brief usage report, section five discusses Wuala, section six concludes the paper, and section seven lists the references cited.

Previous Work/Alternative Solutions

There are a number of existing works in the area of online/distributed storage. They can be separated into two main divisions: distributed file systems and service-oriented systems.

Distributed file systems include academic offerings such as OpenAFS (Andrew File System), OceanStore and Ceph, as well as the commercially driven GFS (Google File System) and its open-source counterpart HDFS (Hadoop Distributed File System). The strengths, weaknesses and viability of these solutions have been examined thoroughly in class, and I won't rehash those discussions here.

The other division of solutions can be visualized as a small two-dimensional matrix: one axis separates open-source from closed-source offerings, and the other separates backup-only services from backup-and-sync services. The number of closed-source options is exceptionally high, including SugarSync, Syncplicity, SpiderOak, BOX.Net, Tonido, Unilium, BackBlaze, Mozy, Carbonite, UbuntuOne, Wuala and the most prolific of the bunch, Dropbox. On the open-source side there are Cyberduck, iFolder, RubyDrop and SparkleShare. Neither list is exhaustive, but together they cover a large portion of the well-known offerings.

Common to the majority of these offerings is that storage is centralized. Some of the corporations behind the closed-source solutions run their own data centres, but most rely on the storage services of existing cloud providers. The exceptions are RubyDrop and SparkleShare, both of which build on Git, a distributed version control system. Unfortunately, neither is ready for general use; both are still in the early stages of development. In contrast, Wuala is a working system that uses its participants' machines to meet its storage requirements.

System Description

Preamble

Wuala's designers intended it to be fast, secure and reliable. With these goals in mind, they spent three years on research, implementation and publication before public testing began. Wuala is an academic project turned commercial offering: its roots can be traced to a group of graduate students at the Swiss Federal Institute of Technology Zurich (ETH Zurich), who now run the service through their company, Caleido AG. As a result, no source code is available, but thanks to Wuala's academic roots there are a number of papers and presentations describing the engineering and ideas behind it.

Overview

Wuala is a distributed storage system. It can be thought of as a peer-to-peer system with both a cryptographic overlay and a distribution overlay. Unlike most other storage systems, Wuala leverages the storage and bandwidth of its clients' systems. Because it relies on contributions from its users, Wuala has a subsystem that ensures fairness and discourages freeloading.

Wuala provides private user storage, but it also provides a fairly rich infrastructure for sharing files. Users can share files at three levels: with individuals, with groups and publicly. These permissions are stored as metadata along with the data they describe.

Clients both put files into and get files from the Wuala cloud (network). The cloud itself can be conceived of as being composed of client, storage and super nodes<ref> Grolimund Dominik. 2007. Wuala - A Distributed File System. Computer Engineering and Networks Laboratory, ETH Zurich. http://www.youtube.com/watch?v=3xKZ4KGkQY8</ref>. In this simplified view, super nodes are responsible for routing requests, storage nodes are responsible for storing files, and clients are responsible only for consumption.

In more detail: Wuala breaks every stored file into fragments for speed and reliability. Because these fragments are spread across many storage nodes, the system needs an efficient routing layer to locate them quickly for both storage and retrieval.

Consider the two basic scenarios. When a file is added to Wuala, it is encrypted on the client system and then broken into fragments. These fragments are given to a super node, which uses a deterministic (predictable) algorithm to distribute them to other super nodes, which in turn send them to storage nodes. When a client requests a file, the request is sent to one of the super nodes the client knows about; that super node examines the request and routes it through other super nodes until it reaches the storage nodes holding fragments of the file. These storage nodes then communicate directly with the client to allow concurrent downloads of the fragments. Finally, the client reassembles the fragments and decrypts the file.
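
The put/get round trip above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Wuala's implementation: encryption is omitted (the input is assumed to be ciphertext already), the chain of super node hops is collapsed into a single lookup, and consistent hashing on a SHA-256 ring stands in for the unpublished "deterministic algorithm". The names FRAGMENT_SIZE, Ring, put and get are hypothetical.

```python
import hashlib
from bisect import bisect_right

# Hypothetical fragment size, kept tiny for the demo; Wuala's real value is not given here.
FRAGMENT_SIZE = 4


def fragment(data: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Split (already encrypted) file contents into fixed-size fragments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def ring_position(label: str) -> int:
    """Hash a label onto a 2**32 identifier ring with SHA-256."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:4], "big")


class Ring:
    """Deterministic placement: a key belongs to the first node clockwise of its hash."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = sorted((ring_position(name), name) for name in node_names)
        self.positions = [pos for pos, _ in self.nodes]

    def owner(self, key: str) -> str:
        idx = bisect_right(self.positions, ring_position(key)) % len(self.nodes)
        return self.nodes[idx][1]


def put(ring: Ring, file_id: str, data: bytes) -> dict[str, dict[str, bytes]]:
    """Store fragment i of file_id on the owner of key '<file_id>:<i>'."""
    stores: dict[str, dict[str, bytes]] = {}
    for i, frag in enumerate(fragment(data)):
        key = f"{file_id}:{i}"
        stores.setdefault(ring.owner(key), {})[key] = frag
    return stores


def get(ring: Ring, stores: dict[str, dict[str, bytes]], file_id: str, count: int) -> bytes:
    """Recompute the placement, fetch each fragment and reassemble them in order."""
    keys = [f"{file_id}:{i}" for i in range(count)]
    return b"".join(stores[ring.owner(key)][key] for key in keys)


ring = Ring([f"supernode-{n}" for n in range(8)])
ciphertext = b"stand-in for AES-encrypted bytes"
stores = put(ring, "file-42", ciphertext)
restored = get(ring, stores, "file-42", len(fragment(ciphertext)))
```

Because placement depends only on the key, any node that knows the file identifier and fragment count can recompute where every fragment lives, which is what makes the lookup "predictable".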

Speed

One of Wuala's primary design goals is speed. Its designers wanted files to be recoverable from multiple sources, allowing parallel downloads, so they chose to store each file as a number of fragments spread across the Wuala cloud, which permits concurrent access<ref> Grolimund Dominik. 2007. Wuala - A Distributed File System. Computer Engineering and Networks Laboratory, ETH Zurich. http://www.youtube.com/watch?v=3xKZ4KGkQY8</ref>.

Wuala also uses this fragmented structure to ensure fast access in unusual circumstances. If a fragment held by a single storage node suddenly comes into high demand, the storage node can redirect requests to other clients that recently downloaded that fragment. This lets the system scale to high demand for a single file, much as other peer-to-peer systems do.
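
A minimal sketch of the redirect idea follows, assuming a simple count-based rule: after a hypothetical HOT_THRESHOLD of direct downloads, the node answers further requests with a pointer to one of the last RECENT_PEERS clients that fetched the fragment. The threshold policy and all names are illustrative assumptions; the source does not say how Wuala decides a fragment is "hot".

```python
import random
from collections import deque


class StorageNode:
    """Hypothetical storage node that offloads hot fragments to recent downloaders."""

    HOT_THRESHOLD = 3  # direct downloads served before a fragment counts as "hot"
    RECENT_PEERS = 5   # how many recent downloaders to remember per fragment

    def __init__(self, fragments: dict[str, bytes], seed: int = 0) -> None:
        self.fragments = fragments
        self.served: dict[str, int] = {}
        self.recent: dict[str, deque] = {}
        self.rng = random.Random(seed)

    def request(self, fragment_id: str, client: str) -> tuple[str, object]:
        """Return ("data", bytes) or ("redirect", peer) for a fragment request."""
        peers = self.recent.get(fragment_id)
        if self.served.get(fragment_id, 0) >= self.HOT_THRESHOLD and peers:
            # Hot fragment: send the client to a peer that already has a copy.
            return ("redirect", self.rng.choice(list(peers)))
        self.served[fragment_id] = self.served.get(fragment_id, 0) + 1
        self.recent.setdefault(fragment_id, deque(maxlen=self.RECENT_PEERS)).append(client)
        return ("data", self.fragments[fragment_id])


node = StorageNode({"frag-7": b"..."})
replies = [node.request("frag-7", f"client-{i}") for i in range(5)]
```

A fuller version would also remember redirected clients as new sources once their downloads finish, so the pool of peers grows with demand.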

Minimizing write times is another way of ensuring fast access, and it is one reason the designers of Wuala chose erasure codes for the fragments. Replication would copy the entire file several times across the Wuala cloud, increasing the time needed for a write to finish and for maintenance actions to complete. Erasure codes reduce the amount of data that must be stored to provide reliable storage.
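
To make the write-volume argument concrete, here is a sketch using the simplest erasure code, a single XOR parity fragment (RAID-5 style): m data fragments plus one parity fragment, any m of which rebuild the file. This is a stand-in chosen for brevity; the parameters of the code Wuala actually uses are not given in the sources cited here. Two-way replication and this code both survive one lost copy or fragment, but the parity code writes only (m + 1)/m of the file size.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, m: int) -> list[bytes]:
    """Split data into m equal data fragments plus one XOR parity fragment (n = m + 1)."""
    size = -(-len(data) // m)             # ceiling division
    padded = data.ljust(size * m, b"\0")  # zero-pad so all fragments have equal length
    frags = [padded[i * size:(i + 1) * size] for i in range(m)]
    return frags + [reduce(xor_bytes, frags)]


def decode(frags: list[Optional[bytes]], length: int) -> bytes:
    """Rebuild the data from any m of the n = m + 1 fragments (one may be None)."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can only replace one lost fragment")
    frags = list(frags)
    if missing:
        # XOR of all n fragments is zero, so the lost one is the XOR of the rest.
        frags[missing[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return b"".join(frags[:-1])[:length]


def bytes_written(size: int, copies: int, m: int) -> tuple[int, int]:
    """Upload volume for whole-file replication vs. the (m + 1, m) parity code."""
    return size * copies, -(-size // m) * (m + 1)


data = b"Wuala stores fragments, not whole copies"
frags = encode(data, m=4)
frags[2] = None  # the node holding fragment 2 is offline
recovered = decode(frags, len(data))
```

For a 1000-byte file, bytes_written(1000, copies=2, m=4) gives (2000, 1250): the same tolerance to one loss for 37.5% less upload.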

While fragments sit on a storage node, they reside in untrusted storage, so they are kept in an encrypted file system on that node. Wuala has access rights similar to those of other file systems, but most cryptographic storage methods make changing these rights very slow, because entire portions of the file system need to be re-encrypted<ref> Grolimund Dominik, Meisser Luzius, Schmid Stefan, Wattenhofer Roger. 2006. Cryptree: A Folder Tree Structure for Cryptographic File Systems. Computer Engineering and Networks Laboratory, ETH Zurich. http://dcg.ethz.ch/publications/srds06.pdf</ref>. The designers of Wuala therefore built a hierarchical encrypted file system in which permissions are inherited through the folder structure, allowing them to be changed in constant time.
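
Why inheritance makes permission changes cheap can be illustrated with a deliberately simplified stand-in: each child's key is derived one-way from its parent's key with HMAC-SHA256, so sharing a single folder key grants access to the whole subtree, however large. The Cryptree paper's actual construction differs, and this sketch does not model revocation. All function names are hypothetical.

```python
import hashlib
import hmac


def child_key(parent_key: bytes, name: str) -> bytes:
    """Derive a child's key from its parent's key (hypothetical, one-way)."""
    return hmac.new(parent_key, name.encode(), hashlib.sha256).digest()


def key_for_path(root_key: bytes, path: str) -> bytes:
    """Walk down a '/'-separated path, deriving one key per level."""
    key = root_key
    for part in filter(None, path.split("/")):
        key = child_key(key, part)
    return key


def key_below(granted_key: bytes, granted_path: str, target_path: str) -> bytes:
    """Derive a descendant's key using only the key that was shared for granted_path."""
    granted, target = granted_path.strip("/"), target_path.strip("/")
    if granted and target != granted and not target.startswith(granted + "/"):
        raise PermissionError(f"{target_path} is not inside {granted_path}")
    return key_for_path(granted_key, target[len(granted):])


root = bytes(32)  # demo root key; a real client would generate this randomly
docs_key = key_for_path(root, "/home/alice/docs")  # the single key Alice shares
shared = key_below(docs_key, "/home/alice/docs", "/home/alice/docs/2011/report.txt")
```

Granting access touches one key regardless of how many files sit below the folder, while the one-way derivation keeps the holder from climbing to sibling or parent folders.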

Routing is the last significant area the designers of Wuala focused on to obtain speed. Wuala's routing can be thought of as a structured overlay network<ref> Grolimund Dominik, Müller Peter. 2007. A Pattern Language for Overlay Networks in Peer-to-Peer Systems. Department of Computer Science, ETH Zurich. http://people.inf.ethz.ch/lehnerh/pm/publications/getpdf.php?bibname=Own&id=GrolimundMueller06.pdf</ref>: a system that, given a key, finds the node holding the information associated with that key. In Wuala the routing table stores paths between super nodes. Each super node is connected to its immediate neighbours and, additionally, to some randomly selected super nodes. These random connections reduce the number of hops needed to route a request to O(log n), where n is the number of super nodes in the system.
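
The effect of random long-range links can be sketched as greedy routing on a ring. Assumptions, all ours: super nodes are numbered 0..n-1 around the ring with keys already hashed to positions, each node keeps its clockwise neighbour plus a few long links whose distances follow a harmonic distribution (as in the Symphony overlay), and routing is clockwise-only. Whether Wuala draws its random links this way is not stated in the source.

```python
import math
import random


def build_overlay(n: int, links: int, seed: int = 1) -> list[list[int]]:
    """Give each node its clockwise neighbour plus `links` random long-range links."""
    rng = random.Random(seed)
    table = []
    for node in range(n):
        out = {(node + 1) % n}
        for _ in range(links):
            # Harmonic distribution: short links are common, long ones rarer.
            distance = int(math.exp(math.log(n) * (rng.random() - 1.0)) * n)
            out.add((node + min(max(distance, 1), n - 1)) % n)
        table.append(sorted(out))
    return table


def route(table: list[list[int]], source: int, target: int) -> list[int]:
    """Greedy clockwise routing: take the link that gets closest without overshooting."""
    n = len(table)
    path = [source]
    current = source
    while current != target:
        remaining = (target - current) % n
        current = max(
            (peer for peer in table[current] if (peer - current) % n <= remaining),
            key=lambda peer: (peer - current) % n,
        )
        path.append(current)
    return path


table = build_overlay(1024, links=10)
path = route(table, source=0, target=700)
print(f"{len(path) - 1} hops")  # a neighbour-only ring would need 700 hops
```

The clockwise neighbour guarantees progress, so routing always terminates; the long links are what cut the hop count from linear to roughly logarithmic.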

Security

Security was another primary design goal. All files are encrypted on the client system with 128-bit AES before being fragmented and stored in the Wuala cloud. As a result, the client's system supplies all the computation for encryption and decryption, and security is improved because the user's password, which is encrypted with 2048-bit RSA, never leaves the client's system.

Wuala also allows users to share their files at different levels of granularity: user, group and world. These permissions exist within the metadata stored with the fragments. One potential information leak is that users with access to a shared file could see who else has access to it. Wuala's underlying encrypted file system, Cryptree<ref> Grolimund Dominik, Meisser Luzius, Schmid Stefan, Wattenhofer Roger. 2006. Cryptree: A Folder Tree Structure for Cryptographic File Systems. Computer Engineering and Networks Laboratory, ETH Zurich. http://dcg.ethz.ch/publications/srds06.pdf</ref>, is designed to avoid this potential security hole.

Routing is a potential source of attack. Super nodes are drawn from the client population, so a malicious user could reverse engineer the election method and set up a subset of super nodes. In a traditional routing system the super nodes would have static routes, often knowing only their immediate neighbours. This could leave the system susceptible to a partition attack, since the malicious user could remove all of their super nodes at once and fracture the network. Wuala's routing does not consist solely of immediate neighbours or any other static information: in addition to immediate neighbours, it includes a number of random super nodes. These random connections also change over time, because every request carries routing information that any super node routing it can examine. Routing super nodes can add a connection to any super node mentioned in this information, and they do so periodically.
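
The rotating set of random links can be sketched as follows, assuming a hypothetical cap of MAX_RANDOM random links and a policy of swapping at most one link per refresh. The class and method names are illustrative; the source only says that super nodes learn of others from routing information in requests and periodically add connections.

```python
import random


class RoutingTable:
    """Hypothetical super node table: fixed neighbours plus rotating random links."""

    MAX_RANDOM = 4  # assumed cap on random links

    def __init__(self, me: str, neighbours: list[str], seed: int = 0) -> None:
        self.me = me
        self.neighbours = set(neighbours)
        self.random_links: set[str] = set()
        self.candidates: set[str] = set()
        self.rng = random.Random(seed)

    def observe(self, request_route: list[str]) -> None:
        """Record super nodes mentioned in the routing info of a passing request."""
        for node in request_route:
            if node != self.me and node not in self.neighbours:
                self.candidates.add(node)

    def refresh(self) -> None:
        """Periodically swap one random link for a newly observed super node."""
        fresh = sorted(self.candidates - self.random_links)
        if not fresh:
            return
        if len(self.random_links) >= self.MAX_RANDOM:
            victim = self.rng.choice(sorted(self.random_links))
            self.random_links.discard(victim)
        self.random_links.add(self.rng.choice(fresh))


table = RoutingTable("S1", neighbours=["S0", "S2"])
table.observe(["S0", "S1", "S9", "S17"])  # routing info carried by a passing request
table.refresh()
```

Because the links an attacker would need to cut keep drifting, removing a fixed set of malicious super nodes is less likely to partition the network.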

Reliability

To join the Wuala cloud as a member the client's system must be online a minimum of 17% of the time which works out to about 4 hours a day. This is considered the baseline for reliability in Wuala's calculations.

Erasure codes are used instead of raw replication. Erasure codes are a well-known method of protecting data against loss with minimal redundancy and are used in RAID (Redundant Array of Independent Disks). Simply put, an erasure code encodes a file into n fragments such that any m of them (m < n) suffice to reconstruct the original. This achieves a given level of availability with far less redundant data than replicating the whole file across the Wuala cloud.
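
The reliability benefit can be estimated with a short calculation. Assuming each node is online independently with probability p = 0.17 (Wuala's minimum, about 4 of 24 hours), a whole-file replica scheme is available if any copy is online, while an erasure-coded file is available if at least m of its n fragments are. The parameters below (10 copies versus n = 100, m = 10, both 10x storage) are hypothetical choices for comparison, not Wuala's actual settings.

```python
from math import comb


def replication_availability(p: float, copies: int) -> float:
    """Chance that at least one whole-file replica is on an online node."""
    return 1 - (1 - p) ** copies


def erasure_availability(p: float, n: int, m: int) -> float:
    """Chance that at least m of n fragments are online (binomial tail)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(m, n + 1))


P_ONLINE = 0.17  # Wuala's minimum online time, used as a worst case

# Both schemes below store 10x the file size.
rep = replication_availability(P_ONLINE, copies=10)
ec = erasure_availability(P_ONLINE, n=100, m=10)
print(f"replication: {rep:.3f}  erasure code: {ec:.3f}")
```

With the same storage overhead, spreading many small fragments turns a low per-node uptime into a high file availability, because the file only needs a fraction of its fragments online at once.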

Clients are responsible for checking the health of their own files. Periodically, the client software checks whether any of the nodes hosting the client's fragments have permanently gone offline. If some hosting storage nodes are missing, the client regenerates the fragments that were on those nodes and resubmits them to the Wuala cloud. A side benefit is that no centralized server is needed for this maintenance.
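
The planning half of this client-side repair can be sketched as below. It assumes the client tracks a hypothetical placement map (fragment index to hosting node) and already knows which nodes are permanently offline; how Wuala detects permanent departure is not described in the source. Regenerating the lost fragment contents, for example by decoding the file from the surviving fragments and re-encoding, is left out.

```python
def plan_repairs(placement: dict[int, str], offline: set[str],
                 live_nodes: list[str]) -> dict[int, str]:
    """Pick a new host for every fragment whose node has permanently gone offline."""
    used = set(placement.values()) - offline
    # Avoid nodes already holding one of this file's fragments, to keep losses independent.
    spare = [node for node in live_nodes if node not in used and node not in offline]
    repairs = {}
    for index, host in sorted(placement.items()):
        if host in offline:
            if not spare:
                raise RuntimeError("not enough live storage nodes to re-host fragments")
            repairs[index] = spare.pop(0)
    return repairs


placement = {0: "node-a", 1: "node-b", 2: "node-c"}
repairs = plan_repairs(placement, offline={"node-b"},
                       live_nodes=["node-a", "node-c", "node-d", "node-e"])
```

Here fragment 1 would be regenerated and uploaded to node-d, so the file returns to full redundancy without any central coordinator.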

Fairness

Experiences/Comparison

Discussion

Conclusion

References

<ref> Grolimund Dominik, Meisser Luzius, Schmid Stefan, Wattenhofer Roger. 2006. Cryptree: A Folder Tree Structure for Cryptographic File Systems. Computer Engineering and Networks Laboratory, ETH Zurich. http://dcg.ethz.ch/publications/srds06.pdf</ref>

<ref> Grolimund Dominik, Meisser Luzius, Schmid Stefan, Wattenhofer Roger. 2006. Havelaar: A Robust and Efficient Reputation System for Active Peer-to-Peer Systems. Computer Engineering and Networks Laboratory, ETH Zurich. http://dcg.ethz.ch/publications/netecon06.pdf</ref>

<ref> Grolimund Dominik, Müller Peter. 2007. A Pattern Language for Overlay Networks in Peer-to-Peer Systems. Department of Computer Science, ETH Zurich. http://people.inf.ethz.ch/lehnerh/pm/publications/getpdf.php?bibname=Own&id=GrolimundMueller06.pdf</ref>

<ref> Grolimund Dominik. 2007. Wuala - A Distributed File System. Computer Engineering and Networks Laboratory, ETH Zurich. http://www.youtube.com/watch?v=3xKZ4KGkQY8</ref>

<references/>