DistOS-2011W Wuala

Introduction

We live in an unprecedented era of communication, with most modern computers connected to each other via the internet. This interconnectivity provides an age of instant communication and information sharing. The same connectivity should protect users from file loss: a user's files should not have to exist solely on local storage but should also be stored on the internet. Files stored on the internet should then be available from any computer a user deigns to sit at. This is a vision that others share and have started to implement; there are now abundant services that provide online storage. However, these systems all impose a central authority that controls access and provides the storage itself. We prefer the concept of a service run between peers, where users give up space and bandwidth from their local systems and in turn receive fast, secure and reliable distributed online storage. Currently the system that comes closest to this is Wuala.

The rest of the document is organized as follows: section two briefly discusses alternative solutions to Wuala, section three explores the design behind Wuala, section four provides a brief usage report, section five provides discussion of Wuala, section six contains the paper's conclusion and section seven contains the references cited in this paper.

Previous Work/Alternative Solutions

In the area of online/distributed storage there are a number of existing works. They can be separated into two main divisions: distributed file systems and service-oriented systems.

Distributed file systems include academic offerings such as OpenAFS (Andrew File System), OceanStore and Ceph, and the commercially driven GFS (Google File System) and its open source implementation HDFS (Hadoop Distributed File System). The strengths, weaknesses and viability of these solutions have been examined thoroughly in class and I won't rehash those discussions here.

The other division of solutions can be visualized as a small two-dimensional matrix, with one axis consisting of open source versus closed source and the second axis divided into backup only versus backup and sync. The number of closed source options is exceptionally high, including SugarSync, Syncplicity, SpiderOak, BOX.Net, Tonido, Unilium, BackBlaze, Mozy, Carbonite, UbuntuOne, Wuala and the most prolific of the bunch, Dropbox. On the open source side there are Cyber Duck, iFolder, RubyDrop and SparkleShare. Neither list is exhaustive, but both cover a large portion of the well known offerings.

Common to the majority of these offerings is that storage is centralized. Some of the corporations behind the closed source solutions run their own data centres, but most rely on the storage services of current cloud providers. The exceptions are RubyDrop and SparkleShare, both of which rely on Git, a distributed version control system. Unfortunately neither is ready for general use; both are still in the early stages of development. In contrast, Wuala is a working system that uses the storage of participants' machines for its storage requirements.

System Description

Preamble

Wuala's designers intended it to be fast, secure and reliable. With these goals they researched, implemented and published for three years before public testing. Wuala is an academic project turned commercial offering. Its roots can be traced to a group of graduate students at the Swiss Federal Institute of Technology Zurich (ETH Zurich). The students involved now run the service through their company Celeido Inc. As such there is no source code available, but due to Wuala's academic roots there are a number of papers and presentations that present the engineering and ideas behind Wuala.

Overview

Wuala is a distributed storage system. It can be thought of as a peer-to-peer system with both a cryptographic overlay and a distribution overlay. Unlike most other storage systems, Wuala leverages the storage and bandwidth of its clients' systems. As a system that relies on the contributions of its users, Wuala has a subsystem that ensures fairness and discourages freeloading.

Wuala provides private user storage, but it also provides a fairly rich infrastructure for sharing files. Users can share files at three levels: with individuals, with groups and with the public. These permissions are stored as metadata along with the data they describe.

Clients both put files into and get files from the Wuala cloud (network). The cloud itself can be conceived of as being composed of client, storage and super nodes<ref name="video"> Gromilund Dominik. 2007. Wuala - A Distributed File System. Computer Engineering and Networks Laboratory, ETH Zurich. http://www.youtube.com/watch?v=3xKZ4KGkQY8</ref>. In this oversimplified view super nodes are responsible for routing requests, storage nodes are responsible for storing the files, and clients are only responsible for consumption.

In more detail: in Wuala all stored files are broken into fragments for speed and reliability. These fragments are stored on a number of storage nodes, which requires significant routing solutions to provide quick lookup for storage and retrieval.

The two basic scenarios are as follows. When a file is added to Wuala it is encrypted on the client system and then broken into fragments. The fragments are given to a super node, which distributes them using a predictable algorithm to other super nodes, which in turn send them to storage nodes. When a client requests a file, the request is sent to one of the super nodes that the client knows about; the request is examined and routed to other super nodes, which finally forward it to the storage nodes that contain fragments of the file. These storage nodes then communicate directly with the client to allow concurrent downloads of the fragments. The fragments are then reassembled and finally decrypted by the client.
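The put and get paths described above can be sketched in a few lines. This is only an illustrative model under stated assumptions: the hash-based `place` function stands in for the "predictable algorithm" (the source does not say what Wuala actually uses), the nodes are plain dictionaries, the fragment size is arbitrary, and encryption is left out entirely. All function names are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

FRAGMENT_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow (assumed value)


def split(data: bytes, size: int = FRAGMENT_SIZE) -> list:
    """Break an (already encrypted) byte string into fixed-size fragments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place(file_id: str, index: int, node_count: int) -> int:
    """Deterministically map a fragment to a node number (assumed placement rule)."""
    digest = hashlib.sha256(f"{file_id}:{index}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def put(file_id: str, data: bytes, nodes: list) -> int:
    """Store every fragment on the node chosen by place(); return the fragment count."""
    fragments = split(data)
    for index, fragment in enumerate(fragments):
        nodes[place(file_id, index, len(nodes))][(file_id, index)] = fragment
    return len(fragments)


def get(file_id: str, fragment_count: int, nodes: list) -> bytes:
    """Fetch all fragments concurrently and reassemble them in order."""
    def fetch(index: int) -> bytes:
        return nodes[place(file_id, index, len(nodes))][(file_id, index)]

    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in input order, so the join restores the file.
        return b"".join(pool.map(fetch, range(fragment_count)))
```

Because placement is a pure function of the file identifier and fragment index, a reader can locate every fragment without any central directory, which is the property the routing layer relies on.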

Speed

One of the primary design goals of Wuala is speed. The designers of Wuala wanted files to be recoverable from multiple sources, allowing parallel downloads. They decided to store files as a number of fragments across the Wuala cloud, as this provides the data in a manner that allows concurrent access<ref name="video"> Gromilund Dominik. 2007. Wuala - A Distributed File System. Computer Engineering and Networks Laboratory, ETH Zurich. http://www.youtube.com/watch?v=3xKZ4KGkQY8</ref>.

This fragmented structure is used by Wuala to ensure fast access in unusual circumstances as well. If a single storage node contains a fragment that suddenly comes into high demand, the storage node can redirect requests to other clients that recently downloaded that fragment. This ensures the system scales to high demand for a single file, and is similar to other peer-to-peer systems.
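A minimal sketch of that redirect behaviour is given here. The load threshold, the bookkeeping of recent downloaders and the class itself are assumptions made for illustration; the source only states that a busy storage node can point requesters at clients that recently fetched the same fragment.

```python
class StorageNode:
    """Toy storage node that redirects requests once it is busy (hypothetical model)."""

    def __init__(self, name, max_active=2):
        self.name = name
        self.max_active = max_active  # assumed limit on simultaneous downloads
        self.active = 0               # downloads currently in progress
        self.recent = {}              # fragment id -> clients that finished downloading it

    def request(self, fragment_id, client):
        """Return ("serve", node name) or ("redirect", name of a recent downloader)."""
        holders = [c for c in self.recent.get(fragment_id, []) if c != client]
        if self.active >= self.max_active and holders:
            return ("redirect", holders[-1])
        self.active += 1
        return ("serve", self.name)

    def finished(self, fragment_id, client):
        """Record a completed download so the client can serve later requesters."""
        self.active -= 1
        self.recent.setdefault(fragment_id, []).append(client)
```

With `max_active=1`, a first client is served, and once it has finished, a third request arriving while a second download is in progress is redirected to that first client.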

Minimizing write times is another way of ensuring fast access. This is one reason the designers of Wuala went with erasure codes for the fragments of data. Replication would cause the entire file to be copied a number of times across the Wuala cloud. This would increase the amount of time for a write to finish and maintenance actions to complete. Erasure codes decrease the amount of data that needs to be stored to provide reliable storage.

While files are on a storage node they reside in untrusted storage. The fragments are encrypted and as such reside in an encrypted file system on the storage node. Wuala has access rights similar to those of other file systems, but most cryptographic storage methods make changing these rights very slow because entire portions of the file system need to be re-encrypted <ref name="cryptree"> Gromilund Dominik, Meisser Luzius, Schmid Stefan, Wattenhofer Roger. 2006. Cryptree: A Folder Tree Structure for Cryptographic File Systems. Computer Engineering and Networks Laboratory, ETH Zurich. http://dcg.ethz.ch/publications/srds06.pdf</ref>. The designers of Wuala have designed a hierarchical encrypted file system that allows permissions to be changed in constant time by making the permissions inheritable within the structure.
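To illustrate why inheritable keys make granting access cheap, the toy below derives each child's key from its parent's key. This is not the Cryptree construction from the cited paper; HMAC-based derivation, the function names and the example paths are all assumptions chosen only to show the inheritance idea with the standard library.

```python
import hashlib
import hmac


def child_key(parent_key: bytes, name: str) -> bytes:
    """Derive the key of a child folder or file from its parent's key (assumed scheme)."""
    return hmac.new(parent_key, name.encode(), hashlib.sha256).digest()


def key_for_path(start_key: bytes, path: list) -> bytes:
    """Walk down the tree from start_key, deriving one key per path component."""
    key = start_key
    for part in path:
        key = child_key(key, part)
    return key


# The owner holds the root key; sharing one folder means handing over one key.
root_key = bytes(32)  # placeholder key material for the example
photos_key = key_for_path(root_key, ["photos"])

# The recipient can reach everything below "photos" without any re-encryption.
shared_view = key_for_path(photos_key, ["2010", "a.jpg"])
owner_view = key_for_path(root_key, ["photos", "2010", "a.jpg"])
```

Granting access to a subtree costs one key transfer regardless of how many files it contains, which is the flavour of the constant-time claim; revocation and the separation of read and write access are where the real design does considerably more work.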

Routing is the last significant area that the designers of Wuala focused on to obtain speed. Wuala's routing can be thought of as a structured overlay network<ref name="language"> Gromilund Dominik, Miller Peter. 2007. A Pattern Language for Overlay Networks in Peer-to-Peer Systems. Department of Computer Science, ETH Zurich. http://people.inf.ethz.ch/lehnerh/pm/publications/getpdf.php?bibname=Own&id=GrolimundMueller06.pdf</ref>, which is a system that, given a key, finds the node that contains the information associated with that key. In Wuala the routing table stores paths between super nodes. Each super node is connected to its immediate neighbours, but additionally each super node is connected to some randomly selected super nodes. These random connections reduce the number of hops it takes for a request to get routed to O(log n), where n is the number of super nodes in the system.
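The effect of the random links can be seen in a small simulation. The sketch below arranges hypothetical super nodes in a ring, optionally adds a few random links per node, and measures shortest-path hop counts with a breadth-first search. The ring topology, node count, number of random links and fixed seed are assumptions, and shortest paths are an optimistic stand-in for whatever route Wuala's routing actually takes.

```python
import random
from collections import deque


def build_overlay(n, random_links, seed=0):
    """Ring of n nodes, each linked to both neighbours plus some random peers."""
    rng = random.Random(seed)
    links = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    for i in range(n):
        for _ in range(random_links):
            j = rng.randrange(n)
            if j != i:
                links[i].add(j)
                links[j].add(i)
    return links


def hops_from(links, source):
    """Breadth-first search: minimum hop count from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in links[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_hops(links, source=0):
    """Mean hop count from source to all other nodes."""
    dist = hops_from(links, source)
    return sum(dist.values()) / (len(dist) - 1)


ring_only = average_hops(build_overlay(256, random_links=0))
with_shortcuts = average_hops(build_overlay(256, random_links=2))
```

With neighbour links alone the average distance grows linearly with the number of nodes (about n/4 on a ring), while a handful of random links per node cuts it to a few hops.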

Security

Wuala was designed with security as another primary goal. All files are encrypted on the client system with 128-bit AES before being fragmented and stored in the Wuala cloud. This lets the client's system provide all the computational resources for encrypting and decrypting, and it also provides additional security because the user's password, which is encrypted with 2048-bit RSA, never leaves the client's system.

Wuala also allows users to share their files with different levels of granularity. These levels are user, group and world. These permissions exist within the metadata that is stored with the fragments. One potential leak of information is that users who have access to shared files could see who else has access to the same files. This information leakage has been accounted for, and Wuala's underlying encrypted file system, Cryptree<ref name="cryptree"/>, is designed to avoid this potential security hole.

Routing is a potential source of attack. Super nodes are drawn from the client population, so it is possible for a malicious user to reverse engineer the method of election and set up a subset of super nodes. In a traditional routing system the super nodes would have static routes, often knowing only about their immediate neighbours. This could leave the system susceptible to a partition attack, as a malicious user could remove all of their super nodes at once and thus fracture the network. Wuala's routing does not consist solely of the immediate neighbours or any other static information. While it does contain immediate neighbours, it also includes a number of random super nodes. These randomly connected super nodes also change over time, since all requests contain routing information that can be examined by any routing super node. A routing super node can in turn opt to add a connection to any of the super nodes mentioned in this information, and does so periodically.

Reliability

To join the Wuala cloud as a member the client's system must be online a minimum of 17% of the time which works out to about 4 hours a day. This is considered the baseline for reliability in Wuala's calculations.

Erasure codes are used instead of raw replication. Erasure codes are a well known method of ensuring data integrity with minimal redundancy and have been used by RAID (Redundant Array of Independent Disks). Simply explained, erasure codes allow a file to be split into n fragments such that any m of them (with m less than n) are enough to reconstruct the original file. This increases reliability while decreasing the amount of replication across the Wuala cloud that is needed to ensure availability of the file.

Clients are responsible for checking the health of their files. Periodically the client software checks whether any of the nodes that host the client's fragments have permanently gone offline. If some of the hosting storage nodes are missing, the client regenerates the fragments that were on the missing nodes and resubmits them to the Wuala cloud. This has the additional benefit of requiring no centralized server.
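The smallest possible erasure code makes the n and m idea concrete. The sketch below uses a single XOR parity fragment, the same trick as single-parity RAID, so n = 3 and m = 2: any two of the three fragments rebuild the data. The source does not say which code Wuala uses; real systems need codes that tolerate many more losses, so this is only an illustration and the function names are hypothetical.

```python
def encode(data: bytes) -> list:
    """Split data into two halves and add one XOR parity fragment (n = 3, m = 2)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length; a real code would record the true length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]


def decode(fragments: list) -> bytes:
    """Rebuild the data when at most one fragment is missing (missing one is None)."""
    a, b, parity = fragments
    if a is None:
        a = bytes(x ^ y for x, y in zip(b, parity))
    elif b is None:
        b = bytes(x ^ y for x, y in zip(a, parity))
    return a + b
```

Storing the three fragments costs 1.5 times the file size and survives the loss of any one node, whereas surviving one loss with plain replication costs 2 times the file size.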
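One client-side maintenance pass might look like the sketch below. The data structures are assumptions: `placement` maps fragment indexes to node names, `gone` is the set of nodes judged permanently offline, `spare_nodes` lists candidate replacement nodes, and `regenerate` is a caller-supplied function that rebuilds a fragment (for example by decoding the file from the surviving fragments and re-encoding it).

```python
def maintenance_pass(placement, gone, spare_nodes, regenerate):
    """Re-place fragments whose host has vanished; return the uploads to perform.

    placement is updated in place so the next pass sees the new hosts.
    """
    uploads = []
    spares = iter(spare_nodes)
    for index in sorted(placement):
        if placement[index] in gone:
            new_node = next(spares)  # raises StopIteration if no spare node is left
            uploads.append((index, new_node, regenerate(index)))
            placement[index] = new_node
    return uploads
```

Because each client repairs only its own files, the work is spread across the owners of the data rather than concentrated in a coordinating server.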

Fairness

As a system that relies on the contributions of its users, Wuala needs methods to encourage contribution and deter freeloading. Users' contributions are measured in three main dimensions: uptime, storage and bandwidth<ref name="video"/>.

First is sufficient uptime. To keep the total number of fragments reasonable while ensuring reliable recovery, Wuala needs each storage and super node to average an uptime of 17%, which is approximately four hours a day. Wuala rewards users whose node(s) exceed the minimum uptime, as described under storage.

1GB of storage in the Wuala cloud is granted upon registration. Users with sufficient uptime can volunteer additional storage on any system that uses their credentials. The amount of storage granted on the Wuala cloud is (storage volunteered x uptime), so both storage and uptime are rewarded, each in proportion to the other. An interesting property of the Wuala system is that a single user can have one or many systems tied to their account. They can in turn have one or more systems volunteering storage, and the storage granted to the user is the sum of all the contributing nodes' (storage x uptime) products.
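A back-of-the-envelope calculation shows why low uptime pushes the design toward many fragments rather than whole-file copies. The sketch assumes nodes are online independently with probability 0.17, a target availability of 99.9% and m = 100 fragments needed for reconstruction; the independence assumption, the target and m are illustrative choices, not figures from Wuala.

```python
import math


def availability(n, m, p):
    """Probability that at least m of n independently hosted fragments are online."""
    def log_term(k):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed in log space for stability
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log(1 - p))

    fewer_than_m = sum(math.exp(log_term(k)) for k in range(m))
    return 1.0 - fewer_than_m


def copies_needed(p, target):
    """Smallest number of full replicas so that at least one is online with prob >= target."""
    copies = 1
    while 1 - (1 - p) ** copies < target:
        copies += 1
    return copies


def fragments_needed(m, p, target):
    """Smallest n such that any-m-of-n reconstruction succeeds with prob >= target."""
    n = m
    while availability(n, m, p) < target:
        n += 1
    return n


copies = copies_needed(0.17, 0.999)          # storage overhead: copies x file size
n = fragments_needed(100, 0.17, 0.999)       # storage overhead: (n / 100) x file size
```

Under these assumptions plain replication needs roughly 38 full copies of the file, while the erasure-coded layout reaches the same target with a storage overhead `n / 100` that is several times smaller.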
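The storage rule can be written down directly. In this sketch the per-node product and the sum over nodes come from the text; treating the 1GB registration grant as additive and excluding nodes below the 17% minimum are assumptions, as are the names.

```python
MIN_UPTIME = 0.17  # minimum uptime fraction given in the text
BASE_GB = 1.0      # storage granted at registration (assumed to add to earned storage)


def granted_storage_gb(nodes):
    """nodes: iterable of (volunteered_gb, uptime_fraction) pairs for one account."""
    earned = sum(gb * uptime for gb, uptime in nodes if uptime >= MIN_UPTIME)
    return BASE_GB + earned


# Three machines on one account: the last one is online too rarely to count.
total = granted_storage_gb([(100, 0.5), (20, 0.25), (50, 0.1)])
```

Here the first machine earns 50GB and the second 5GB, so the account ends up with 56GB including the registration grant.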

Measuring uptime and contributed storage is relatively simple, and both can be made difficult to falsify. The last dimension of contribution, bandwidth, is harder to measure: the other two can be quantified directly, whereas bandwidth has to be qualified. To qualify the level of bandwidth contribution Wuala relies on a reputation system for storage nodes. This reputation monitoring is done in five steps<ref name="reputation"> Gromilund Dominik, Meisser Luzius, Schmid Stefan, Wattenhofer Roger. 2006. Havelaar: A Robust and Efficient Reputation System for Active Peer-to-Peer Systems. Computer Engineering and Networks Laboratory, ETH Zurich. http://dcg.ethz.ch/publications/netecon06.pdf</ref>:

  1. Every node keeps records of the amount of bandwidth provided by each node that it conducts transactions with.
  2. Periodically the nodes report their observations to their immediate neighbours.
  3. The neighbours discard any data from the reporting node about itself, to avoid falsification, and aggregate the remaining reports.
  4. The reputation of storage nodes is updated locally on each client.
  5. When concurrent requests are made to a storage node it consults the reputation of the requesting nodes and allots proportionately more bandwidth to nodes with a better reputation.

These three dimensions combine to allow Wuala to function as a distributed system, encouraging contribution by promoting fairness.
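The five steps can be sketched as a small class. This is a simplified reading of the list, not the Havelaar protocol itself: summing reported amounts as the aggregation rule, interpreting step 3 as dropping a reporter's claims about itself, and the `+ 1.0` smoothing that keeps newcomers from receiving nothing are all assumptions, as are the names.

```python
from collections import defaultdict


class Node:
    """Toy participant in a bandwidth reputation scheme (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.observed = defaultdict(float)    # step 1: peer -> amount it uploaded to us
        self.reputation = defaultdict(float)  # step 4: peer -> locally aggregated score

    def record_transfer(self, peer, amount):
        """Step 1: note how much bandwidth a peer provided in a transaction."""
        self.observed[peer] += amount

    def report(self):
        """Step 2: the observations this node sends to its neighbours."""
        return dict(self.observed)

    def receive_report(self, reporter, report):
        """Steps 3 and 4: drop the reporter's claims about itself, aggregate the rest."""
        for peer, amount in report.items():
            if peer == reporter:
                continue
            self.reputation[peer] += amount


def share_bandwidth(node, requesters, capacity):
    """Step 5: split capacity among concurrent requesters in proportion to reputation."""
    scores = {r: node.reputation[r] + 1.0 for r in requesters}
    total = sum(scores.values())
    return {r: capacity * score / total for r, score in scores.items()}
```

For example, if node "b" reports that it uploaded 1000 units itself and that "c" uploaded 30, a neighbour ignores the self-claim, credits "c" with 30, and later gives "c" the larger share of its bandwidth.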

Experiences/Comparison

In this section we will discuss our experience using Wuala.

Installation

All installation files were obtained from Wuala's website http://www.wuala.com

Windows

  • downloaded WualaSetup.exe and followed the instructions
  • since the Windows install was the first install, I created my Wuala account during it

Mac OSX

  • downloaded WualaInstaller.dmg
  • copied wuala.app to the Applications directory
  • installed MacFUSE that was included in the Wuala installer

Linux (Ubuntu)

  • downloaded and opened the Wuala debian package
  • with the package added to the Synaptic Package Manager selected Wuala for installation

The installation for all three OSs was quick and painless. After installation on each OS Wuala appeared as an application and as a mounted drive. The application was identical on all three OSs.

One thing to note is that after installation I was unable to 'trade' local space for storage on the Wuala cloud until I had exceeded the minimum 17% uptime by waiting about 4 hours.

Tests

Note that these tests are not exhaustive but are used to indicate the level of viability for a casual user of online storage.

Large File Performance

To test large file performance we copied a 1.5GB movie file to the Wuala drive. The connection of the client system committing the file was capped at 100KB/s. With a 1.5GB file this led us to expect an optimal upload time of approximately 41 hours. Wuala encrypts all files before fragmenting and uploading; the encryption took approximately two minutes. The upload took approximately 56 hours for the file to be fully transferred, an increase of roughly 37% over the original estimate. Considering the varying nature of the internet connection used and the fact that my reputation has not yet been established in the Wuala cloud, this seems acceptable. A more complete test would be to try again after several weeks of use.
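The estimate can be checked with a short calculation. The sketch assumes decimal units (1GB = 10^9 bytes, 1KB = 10^3 bytes) and ignores protocol and redundancy overhead; the function names are illustrative.

```python
def ideal_transfer_hours(size_bytes, rate_bytes_per_s):
    """Transfer time in hours for a link that runs at its cap the whole time."""
    return size_bytes / rate_bytes_per_s / 3600


def overhead_percent(observed, expected):
    """How much longer the observed time was than the expected time, in percent."""
    return (observed - expected) / expected * 100


at_100_kb = ideal_transfer_hours(1.5e9, 100e3)  # a little over 4 hours
at_10_kb = ideal_transfer_hours(1.5e9, 10e3)    # a little under 42 hours
slowdown = overhead_percent(56, 41)             # between 36% and 37%
```

Under these assumptions a 100KB/s cap corresponds to a bit over four hours, while an estimate of about 41 hours corresponds to a sustained rate nearer 10KB/s, so either the stated rate or the stated hours may be off by a factor of ten. The relative slowdown is the same in both readings, since it depends only on the ratio of observed to expected time.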

Small/Medium File Performance

To test medium file performance we moved a hierarchical structure of music files to the Wuala drive. The average size of the music files was 5MB, and the structure contained 163 files and approximately 180 directories. The encryption process took less than two minutes for the entire structure.

At the limit of 100KB/s the naive calculation suggests that a 6MB file should be transferred in about a minute. Observed times were closer to 5 minutes per 6MB, which is five times the original estimate (a 400% increase). For such small files the internet connection used in testing could easily be observed to be maintaining near-peak throughput of 100KB/s, despite its vagaries. Again, a more complete test would be to try again after several weeks of use, once my clients have had a chance to establish a positive reputation in the Wuala cloud.

Update Rate

We tested updating the same file from two computers. The latency between the commit on one system and the verification on the other was no longer than the time it took us to switch between the two systems via a switch box, which puts the total time under 2 seconds. Of ten successive samples, only once were we able to open the file before the local version had been updated via the Wuala cloud.

Concurrency Test

Two systems were used to open the same file, make changes to it and save at the same time. In this test one of the systems showed the user a cryptic report attempting to explain that the current version was now out of date. Saving again overwrote the other file. Luckily Wuala supports versioning of files, and the other versions of the file can be reopened.

With the exception of the small/medium file test, the tests suggested reasonable performance and behavior. However, due to the reputation system in Wuala, bandwidth-sensitive tests cannot be considered conclusive at this time.

Discussion

There are a number of other features that Wuala provides that have not been tested in this paper.

Wuala allows backups of folders outside of the Wuala drive and, alternatively, synchronization of folders outside of the Wuala drive as well. This extends Wuala's functionality far past industry leader Dropbox, but other services such as SpiderOak do provide similar capability.

Wuala also has a rich sharing experience that has not been explored in this paper.

Finally, Wuala has a web start system that provides a Java app launched from the browser instead of being downloaded and installed. This is more secure since, again, the password is kept on the local machine and the encryption is done locally as well. This is arguably a superior alternative to industry leader Dropbox and its web interface, and is considerably more secure.

There are some unanswered questions however.

If one leaves the Wuala network accidentally (think of a machine failing while its owner is on vacation) and the contribution level thus falls below 17%, what happens to the stored files?

Similarly, what happens to stored files when a computer leaves the Wuala network? When do storage nodes determine that some fragments are no longer needed and should be removed?

Wuala has a commercial side as well, and users can buy storage instead of contributing storage. How does this affect the reputation system? Are these paid users at the highest level, or at a higher tier than even the most active contributor?

These are some of the least technical questions that quickly arose during the research on Wuala.

Conclusion

Wuala is a unique offering amongst a sea of similar online storage services. It is based on an architecture that seems to be tremendously scalable and relies on the resources of a heterogeneous network of systems; thus Wuala can be considered a distributed file system. Unlike many other distributed file systems, Wuala is neither specifically designed for a limited type of interaction nor exists solely in the realm of academia. Wuala is currently a commercial offering that retains its academic roots by providing free storage to those willing to contribute. Finally, Wuala delivers on its promise of speed, security and reliability, making it an excellent system. The only significant shortcoming is that Wuala is closed source and thus doesn't provide source for academic analysis.

References

<references/>