DistOS-2011W Distributed File Sharing
Author: Omi Iyamu oiyamu@gmail.com
PDF available at [PDF]
Abstract
File sharing is a tool necessary for group collaboration, a simple way to make your files available to others, and a convenient way to access file contents across multiple machines. This paper discusses, at a high level, the file-sharing systems currently in use and the strategies they employ to facilitate file sharing. In section 2, file sharing systems are categorized by scale into Local Area Network sharing and Internet based sharing. Section 3 discusses the steps involved in sharing a file using the systems introduced in section 2. Finally, section 4 discusses the challenges that must be overcome to develop an effective file sharing system for a distributed operating system and suggests how some of them may be overcome.
Introduction
File sharing in a distributed environment should differ from that in a local environment. In this paper, the term distributed operating system refers to an Internet based operating system, so the distributed environment under discussion is the Internet. The term local environment refers to a local area network.
The scope of this paper is just a review of a few file-sharing systems. The motivation is to determine what challenges need to be addressed in the development of a file sharing system that can be deployed on a distributed operating system.
The discussion is kept at a high level so that readers without a strong technical background can follow it. However, some background in computer science or a similar field is assumed.
File Sharing systems
The main differences between file sharing systems are the modes of access and the methods used to transfer the shared files. There are many types of file sharing systems; I have categorized them into two types based on scale. Section 2.1 covers Local Area Network sharing, which can be considered small-scale file sharing. Section 2.2 covers Internet based file-sharing systems, which can be considered large-scale file sharing.
Local Area Network Sharing
The computers on a Local Area Network (LAN) have some degree of trust between them. The key advantages of sharing systems designed for Local Area Networks are the ability to set access restrictions on shared files and higher transfer speeds. Examples are AFP (Apple Filing Protocol), used by Apple, and SMB (Server Message Block), used by Windows.
Internet Based File Sharing
There are a number of Internet based or online file sharing systems that take different approaches to file sharing. Some examples are peer-2-peer networks, discussed in section 2.2.1, and FTP (File Transfer Protocol), discussed in section 2.2.2.
Peer-2-peer Systems
Peer-2-peer is one of the most commonly used approaches to file sharing. User computers act as both client and server nodes and share content among themselves. Peer-2-peer file-sharing systems work in two main styles: one involves the use of torrents and the other does not.
- Torrent style
Of all the torrent based peer-2-peer networks, Bit-torrent is the most commonly used today [1]. In itself, Bit-torrent is just a file downloading protocol that enables simultaneous downloading from different sources holding the exact same file.
- Non-torrent style
This is the style of older peer-2-peer networks like Kazaa. Unlike torrent networks, there is a centralized server that holds information about who is sharing what files, and downloading is done from one single computer to another single computer.
File Transfer Protocol
FTP, as the name suggests, is a file transfer protocol. A file is transferred from a single source computer to a single receiving computer. FTP file systems are often password protected to ensure that only authorized users access the files. To access an FTP file system you need to know the IP address or domain name of the computer you want to access. When a file is requested, the complete file is downloaded onto the requesting computer.
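As a rough illustration of this access pattern, the following Python sketch uses the standard ftplib module to log in and download one complete file. The helper names fetch_file and download_from_server are hypothetical, and the host name in the comment is a placeholder rather than a real server.

```python
from ftplib import FTP


def fetch_file(ftp, remote_name, local_path):
    """Download the complete remote file to local_path; return bytes written."""
    written = 0
    with open(local_path, "wb") as out:
        def save_block(block):
            nonlocal written
            out.write(block)
            written += len(block)
        # RETR asks the server for the whole file; ftplib passes it to the
        # callback one block at a time.
        ftp.retrbinary("RETR " + remote_name, save_block)
    return written


def download_from_server(host, user, password, remote_name, local_path):
    """Connect by domain name or IP address, authenticate, and download."""
    with FTP(host) as ftp:         # placeholder host, e.g. "ftp.example.org"
        ftp.login(user, password)  # password-protected access
        return fetch_file(ftp, remote_name, local_path)
```

Separating fetch_file from the connection step is only a convenience of this sketch; it lets the transfer logic be exercised without a live server.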
File Sharing Process
Although there are numerous file sharing protocols, the sharing process can generally be broken into three main steps: sharing the file itself, finding the shared file, and accessing or transferring the shared file. This section discusses the process for peer-2-peer networks and Local Area Networks.
Sharing the file
Sharing the actual file is the process of setting up a file for sharing. Different file sharing systems follow different processes to enable a file for sharing.
Peer-2-peer sharing
Peer-2-peer torrent networks generally follow a submission process for file sharing. With Bit torrent, a user injects new content by uploading a torrent file to a torrent search website such as supernova.com and creating a seed with the first copy of the file [1]. Bit torrent has a mediator system that checks the content of files to make sure they are what they claim to be. When a user submits a new file, a mediator has to check it before it is allowed into the sharing network. After a user has submitted several files that passed mediation, the user is promoted to unmediated submitter status. This means the user is trusted enough to submit files that are injected directly into the sharing network without being mediated [1]. Non-torrent peer-2-peer networks don’t follow this submission system; to share a file, you usually just place it in the share directory used by the third-party peer-2-peer application.
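The submission and promotion flow described above can be sketched roughly as follows. This is an illustrative model only: the class names are hypothetical, and TRUST_THRESHOLD is an assumed value, since the text says only that "several" files must pass mediation.

```python
from dataclasses import dataclass, field

TRUST_THRESHOLD = 3  # assumption: the source only says "several files"


@dataclass
class Submitter:
    name: str
    approved_count: int = 0

    @property
    def unmediated(self):
        """True once enough submissions have passed mediation."""
        return self.approved_count >= TRUST_THRESHOLD


@dataclass
class SharingNetwork:
    published: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def submit(self, user, torrent_name):
        # Trusted submitters bypass the mediator entirely.
        if user.unmediated:
            self.published.append(torrent_name)
            return "published"
        self.pending.append((user, torrent_name))
        return "pending"

    def mediate(self, approve):
        """A mediator reviews the oldest pending submission."""
        user, torrent_name = self.pending.pop(0)
        if approve:
            self.published.append(torrent_name)
            user.approved_count += 1
        return approve
```

In this sketch a rejected submission is simply dropped; how a real network handles rejections is not covered by the source.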
There is no notion of access restrictions in peer-2-peer file sharing. Users generally have unrestricted access to shared content; it can be downloaded, edited, and re-uploaded by anyone.
Local Area Network sharing
In Local Area Networks, setting up a file to be shared does not involve any submission process or mediation. Because members of the network have some level of trust between them, all you have to do to set up a file for sharing is go into the file’s properties and enable its sharing property. Access restrictions can also be set to restrict read and/or write access to the files or directories being shared.
- Read only
In this setting the user is only allowed to view contents of the file. This is to say that no changes can be made to the root file. The only way around this is to copy the particular file over and make changes to your local copy.
- Write only
This setting is used on directories. In this setting a directory will be turned into a drop box. That is to say another user on the network can write files to the given directory but cannot view the contents of the directory. Access to read the contents of the directory is only for the owner of the directory.
- Read and Write
This setting allows the user to make changes to the file and save these changes to the root file, so the file does not need to be copied over. In the case of a directory, its contents can be modified remotely.
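The three settings above can be modeled as a small permission check. The following Python sketch is a simplified in-memory model with hypothetical names; it is not how AFP or SMB implement permissions, only an illustration of the read-only, write-only (drop box), and read-and-write behaviours.

```python
from enum import Flag, auto


class Access(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    READ_WRITE = READ | WRITE


class SharedDirectory:
    """In-memory stand-in for a directory shared on a LAN."""

    def __init__(self, owner, access):
        self.owner = owner
        self.access = access  # what other users on the network may do
        self.files = {}

    def _allowed(self, user, needed):
        # The owner always has full access to the directory.
        return user == self.owner or needed in self.access

    def list_files(self, user):
        if not self._allowed(user, Access.READ):
            raise PermissionError(user + " may not read this directory")
        return sorted(self.files)

    def write_file(self, user, name, data):
        if not self._allowed(user, Access.WRITE):
            raise PermissionError(user + " may not write to this directory")
        self.files[name] = data
```

A directory created with Access.WRITE behaves like the drop box described above: other users can add files, but only the owner can list them.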
Finding the file
People share files so that they or other people may access them remotely. As such, finding a file that has been shared is a key step in the process of sharing. Methods of locating shared files differ between sharing systems.
Peer-2-peer file search
In peer-2-peer systems, finding the shared files you want is fairly easy. Non-torrent networks like Kazaa have a centralized server that holds lists of who is sharing what [3]. To search through this list, a third-party peer-2-peer application is needed. However, the file lists on these systems are poorly cleaned, which results in users sometimes downloading “fake” files.
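A centralized list of who is sharing what, as described above, can be sketched as a simple index from file names to peers. The class and method names below are hypothetical. Note that the index trusts whatever names peers announce, which is one way "fake" files can end up in search results.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central server: maps shared file names to the peers offering them."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def announce(self, peer, file_names):
        """A peer reports the files in its share directory."""
        for name in file_names:
            self._peers_by_file[name.lower()].add(peer)

    def search(self, keyword):
        """Return matching file names with the peers that claim to hold them."""
        keyword = keyword.lower()
        return {
            name: sorted(peers)
            for name, peers in self._peers_by_file.items()
            if keyword in name
        }
```

The actual download would then happen directly between the searching peer and one of the peers returned by search.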
In torrent networks like Bit-torrent where the shared files are checked on submission, the likelihood of downloading a fake file is reduced. However, searching for a shared file is done via third party search engines like supernova.com and isohunt.com.
Local Area Network file search
In local area networks, you need to know where a shared file is located in order to find it. If you are looking for a particular file and do not know its location, you may have to comb through the entire network manually in search of it.
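The manual combing described above amounts to walking every share you can reach. As a minimal sketch, assuming the network shares are already mounted as local directories (the function name and the example paths in the usage note are hypothetical), this could look like:

```python
import fnmatch
import os


def find_in_shares(share_roots, pattern):
    """Walk every mounted share and return paths whose file name matches."""
    matches = []
    for root in share_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Compare in lower case so the match ignores letter case.
                if fnmatch.fnmatch(name.lower(), pattern.lower()):
                    matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

For example, find_in_shares(["/mnt/team", "/mnt/archive"], "budget*.xls") would visit every directory under both mount points, which is why this approach becomes slow as the number of shares grows.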
Transferring the file
To access a file over any network, some level of transfer needs to be made, whether temporary or permanent. Files are transferred temporarily if they only need to be viewed or edited. Files are transferred permanently if they are being copied or moved completely. File sharing systems like peer-2-peer only transfer files permanently, whereas most file sharing systems on a local area network only make a permanent transfer when a copy or cut command is executed.
Peer-2-peer file transfer
After the user has identified the target file, there are two main ways the file can be transferred, depending on the type of peer-2-peer network.
- Single user to single user transfer
In this style of transfer, the complete file is downloaded from a single source. Non-torrent peer-2-peer networks use this style of transfer. Torrent networks use this style only when the shared file has a single seed.
- Multiple users to single user transfer
In this style of transfer, the file is downloaded simultaneously from multiple sources. This is the style mostly used by torrent networks like Bit torrent. Files shared on torrent networks are split into chunks. The torrent file itself holds information about seeds for the particular shared file. As such, different chunks of the shared file are downloaded simultaneously onto the user’s computer and reassembled. This way much higher download speeds can be achieved than with single-to-single user transfers.
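The multiple-source transfer above can be sketched in a few lines of Python. This is a simplified in-memory model, not the Bit torrent wire protocol: each seed is represented as a plain list of chunks, the chunk size is tiny for readability, and chunks are assigned to seeds round-robin, all of which are assumptions of the sketch. The metadata list stands in for the torrent file and is assumed to carry one SHA-1 digest per chunk so that each chunk can be verified.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # assumed, tiny for readability; real chunks are far larger


def split_into_chunks(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def make_metadata(data):
    """Stand-in for the torrent file: one SHA-1 digest per chunk."""
    return [hashlib.sha1(chunk).hexdigest() for chunk in split_into_chunks(data)]


def download(metadata, seeds):
    """Fetch chunks from several seeds in parallel, verify, and reassemble."""
    def fetch(index):
        seed = seeds[index % len(seeds)]  # round-robin choice (assumption)
        chunk = seed[index]
        if hashlib.sha1(chunk).hexdigest() != metadata[index]:
            raise ValueError("chunk %d failed verification" % index)
        return chunk

    with ThreadPoolExecutor(max_workers=len(seeds)) as pool:
        # map() returns results in index order, so joining reassembles the file.
        chunks = list(pool.map(fetch, range(len(metadata))))
    return b"".join(chunks)
```

With a single seed the same function degenerates into the single-user-to-single-user transfer, which matches the behaviour described for files that have only one seed.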
Local operating system file transfer
In a local area network setting, files are generally viewed from the root. Technically, the complete file or portions of it are transferred to main memory and viewed from there, the same way they would be if you had a local copy. The only difference is that instead of the transfer being made from your local storage (hard drive) to main memory, the transfer is from a remote storage device somewhere on the network to main memory. This is practical mainly because transfer speeds over a local network are faster than over the Internet. Because the file stays at its source, access restrictions can be properly enforced.
Sharing of Distributed Files
When we think of file sharing we generally think of the file being located on our computer. With a distributed file system, the file we want to share most likely will not physically be on our computer. This adds a level of complexity to the actual sharing of the file.
File sharing for a distributed operating system has to be scalable enough to be deployed over the Internet. This means that traditional AFP and SMB approaches will have difficulty scaling up to the task. Examples of file sharing systems that already work at this scale, as discussed, are peer-2-peer networks and FTP. To define an effective file sharing system for a distributed operating system, the following challenges need to be addressed.
- Transfer speed
When a file is to be transferred, it should be transferred at the highest speed possible. A torrent approach may not be a complete answer, as multiple copies of the file are needed to improve speed. This is a serious problem for sensitive files, for which a user may not want multiple copies located all over the Internet.
- Duplicate files
Already today, common files like music files may have millions of copies located on different computers all over the world. For a distributed file system, having so many copies of the same file is an inefficient use of space and should be avoided where possible.
- File integrity
Corrupted or fake files are an issue in sharing because they may end up corrupting computers that access them. One way this is mitigated today is through reporting systems in which users can report a fake or corrupted file to the host or source. Another approach is automated checking systems that go through files and check their integrity. In torrent systems, as previously discussed, mediators check files manually.
- File backup
This is a solution that helps with file integrity as well as data loss. If it is determined that a file has lost its integrity, there needs to be a mechanism to restore it. The easiest way to do this is to restore the file from a good backup. Data or file loss can happen in many ways, for instance if the server on which the file is stored goes down. In this case, a backup copy needs to be located somewhere else that the user can access.
- Access restrictions
File sharing systems like FTP, AFP and SMB can restrict a user’s ability to access a particular file with authentication mechanisms. Having such capabilities in a distributed environment is certainly necessary so that sharing can be both flexible and restricted where required. AFP and SMB take access restrictions further and also restrict read and write capabilities.
- Search capability
This can be looked at as more of a convenience than a need; it would be nice for a user to be able to search through all the shared files that he or she has access to. Having this will certainly aid in the development of more user friendly distributed operating systems.
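Two of the challenges above, duplicate files and file integrity, can be illustrated together with a content-addressed store, in which a file is stored under the hash of its contents. This is one possible technique, not something the systems reviewed here are known to use; the class below is a hypothetical in-memory sketch, and SHA-256 is an assumed choice of hash.

```python
import hashlib


class ContentStore:
    """Stores each distinct file content once, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}  # digest -> bytes, one copy per distinct content
        self._names = {}  # shared name -> digest

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same digest, so it is stored once.
        self._blobs.setdefault(digest, data)
        self._names[name] = digest
        return digest

    def get(self, name):
        digest = self._names[name]
        data = self._blobs[digest]
        # Re-hash on read to detect corruption of the stored copy.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("integrity check failed for " + name)
        return data

    def stored_copies(self):
        return len(self._blobs)
```

When get reports a failed integrity check, the file would have to be restored from a backup copy held elsewhere, which this sketch does not model.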
Conclusion
File sharing is necessary to accomplish many collaborative tasks, not only in the workplace but in other areas as well. We have discussed the differences between some of the popular file sharing systems in use today, such as peer-2-peer networks and Local Area Network file sharing. The similarity between them is that the shared files are stored on the host computers. In a distributed environment this may not be the case. Through the study of current file sharing systems, we have found that in order to develop an effective file sharing system for a distributed operating system, challenges such as transfer speeds, duplicate files, file integrity, file backup, access restrictions, and search capabilities need to be addressed. Current file sharing systems address some of these issues, but no single one addresses all of them properly. As such, a hybrid between Local Area Network sharing and Internet based file sharing may be needed.
References
[1] J. Pouwelse, P. Garbacki, D. Epema, H. Sips. The Bit-torrent P2P File-Sharing System. Delft University of Technology, Delft, The Netherlands.
[2] R. Bhagwan, S. Savage, and G. M. Voelker. Understanding availability. In International Workshop on Peer to Peer Systems, Berkeley, CA, USA, February 2003.
[3] B. Cohen. Incentives build robustness in BitTorrent. In Workshop on Economics of Peer-to-Peer Systems, Berkeley, USA, May 2003.
[4] S. Saroiu, P. K. Gummadi, and S. D. Gribble. A Measurement Study of Peer-to-peer File Sharing Systems. University of Washington, Seattle, WA, USA.
[5] N. Leibowitz, M. Ripeanu, and A. Wierzbicki. Deconstructing the Kazaa network. In 3rd IEEE Workshop on Internet Applications (WIAPP’03), San Jose, CA, USA, June 2003.
[6] R. Sherwood, R. Braud, and B. Bhattacharjee. Slurpie: A cooperative bulk data transfer protocol. In IEEE Infocom, Hong Kong, China, March 2004.
[7] B.T. Loo, J.M. Hellerstein, R. Huebsch, S. Shenker, and I. Stoica. Enhancing P2P File-Sharing with an Internet-Scale Query Processor. UC Berkeley. VLDB Conference, Toronto, Canada, 2004.