DistOS 2014W Lecture 8

==NFS and AFS (Jan 30)==

'''Readings:'''
* [http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-11/sandberg-nfs.pdf Russel Sandberg et al., "Design and Implementation of the Sun Network Filesystem" (1985)]
* [http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-11/howard-afs.pdf John H. Howard et al., "Scale and Performance in a Distributed File System" (1988)]

==NFS==

Group 1:

1) per-operation traffic.

2) RPC-based: easy to program with, but a very [http://www.joelonsoftware.com/articles/LeakyAbstractions.html leaky abstraction] (see the sketch after this list).

3) unreliable.
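
A minimal sketch of why the RPC abstraction leaks (this example is mine, not from the papers): the stub below has almost the same shape as a local function, but it hides a network round trip, so every caller is forced to handle failure modes a local call can never have. The <code>flaky_transport</code> function is a stand-in for a real network layer.

<pre>
#include <stdio.h>
#include <stdlib.h>

/* A local call: cannot fail, no error path needed. */
static int add_local(int a, int b) { return a + b; }

/* Stand-in for a real network send/receive; fails ~30% of the time
 * to mimic dropped packets and unreachable servers. */
static int flaky_transport(int a, int b, int *result)
{
    if (rand() % 10 < 3)
        return -1;              /* "timeout" */
    *result = a + b;            /* pretend the server computed this */
    return 0;
}

/* The RPC stub: looks like add_local, but the abstraction leaks --
 * every caller must now check for and handle network failure. */
static int add_rpc(int a, int b, int *result)
{
    return flaky_transport(a, b, result);
}

int main(void)
{
    int sum;
    printf("local: %d\n", add_local(2, 3));
    if (add_rpc(2, 3, &sum) != 0)
        fprintf(stderr, "rpc: call failed (network error)\n");
    else
        printf("rpc:   %d\n", sum);
    return 0;
}
</pre>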

Group 2:

1) designed to share disks over a network, not files.

2) more UNIX-like: it tried to maintain UNIX file semantics on both the client and the server side.

3) portable: it was meant to work (as a server) across many filesystem types.

4) used UDP: if a request is dropped, just send it again (see the retry sketch after this list).

5) it does not try to minimize network traffic.

6) used the vnode/VFS layer as a transparent interface to local disks.

7) did not require much dedicated hardware.

8) later versions took on features of AFS.

9) its stateless protocol conflicts with files being stateful by nature.
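
A minimal sketch of the "just ask again" retry loop that a stateless UDP protocol allows (my own illustration; the server address, port, and request string are placeholders, not real NFS wire format): the client sets a receive timeout and retransmits a few times. This is only safe because NFS requests were designed to be largely idempotent, so a duplicated request does no harm.

<pre>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define RETRIES 3

int main(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in srv = {0};
    srv.sin_family = AF_INET;
    srv.sin_port = htons(2049);                      /* classic NFS port */
    inet_pton(AF_INET, "192.0.2.1", &srv.sin_addr);  /* placeholder server */

    /* Give up on a reply after 1 second, then retransmit. */
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

    const char req[] = "GETATTR /export/home";  /* stand-in for a real request */
    char reply[512];

    for (int attempt = 1; attempt <= RETRIES; attempt++) {
        sendto(s, req, sizeof req, 0, (struct sockaddr *)&srv, sizeof srv);
        ssize_t n = recvfrom(s, reply, sizeof reply, 0, NULL, NULL);
        if (n >= 0) {
            printf("got %zd-byte reply on attempt %d\n", n, attempt);
            close(s);
            return 0;
        }
        fprintf(stderr, "attempt %d timed out, retrying\n", attempt);
    }
    fprintf(stderr, "server unreachable after %d attempts\n", RETRIES);
    close(s);
    return 1;
}
</pre>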

Group 3:

1) its caching assumptions are invalid (clients can see stale data).

2) no dedicated locking mechanism: the designers couldn't decide on a locking strategy, so they left it to users of NFS to run their own separate locking service (a sketch of the kind of advisory lock clients end up taking follows this list).

3) bad security.
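
Since NFS itself shipped without locks, cooperating programs had to coordinate through a side mechanism (eventually the separate lockd/NLM service, or ad-hoc lock files). A minimal sketch of such an advisory <code>fcntl()</code> lock; note that it only excludes processes that also bother to take the lock, and on early NFS it only worked if every client went through the same external lock service. The filename is a placeholder.

<pre>
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("shared.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Advisory write lock over the whole file.  Only cooperating
     * processes that also take locks are excluded -- nothing stops
     * a client that skips this step. */
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = 0,
        .l_len    = 0,          /* 0 == lock through end of file */
    };
    if (fcntl(fd, F_SETLKW, &fl) < 0) {   /* blocks until granted */
        perror("fcntl(F_SETLKW)");
        return 1;
    }

    /* ... critical section: read/modify/write the file ... */

    fl.l_type = F_UNLCK;
    fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
</pre>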

Other:
* Clients mount the full remote filesystem; there is no common namespace.
* Hostname lookup and address binding happen once, at mount time.

==AFS==

Group 1:

1) designed for 5000 to 10000 clients.

2) high integrity.

Group 2:

1) designed to share files over a network, not disks; it presents a single filesystem with a common namespace.

2) better scalability.

3) better security (Kerberos).

4) minimizes network traffic.

5) less UNIX-like.

6) pluggable authentication.

7) needs more kernel memory, since its operations are more complex.

8) the inode concept is replaced with a fid (file identifier; sketched below).
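
A sketch of the shape of an AFS-style fid. The field names follow the convention used by later OpenAFS code (volume, vnode, uniquifier); treat the details as my assumption rather than a quote from the 1988 paper. The point is that a fid names a file by volume rather than by a disk-specific inode number, so files can move between servers without clients noticing.

<pre>
#include <stdio.h>
#include <stdint.h>

/* An AFS-style file identifier: location-independent, unlike an
 * inode number that is only meaningful on one local disk. */
struct afs_fid {
    uint32_t volume;   /* which volume the file lives in          */
    uint32_t vnode;    /* index of the file within that volume    */
    uint32_t unique;   /* generation number, so a recycled vnode
                          slot is not confused with the old file  */
};

int main(void)
{
    /* Placeholder values for illustration only. */
    struct afs_fid f = { .volume = 536870912, .vnode = 2, .unique = 1 };
    printf("fid = %u.%u.%u (%zu bytes)\n",
           (unsigned)f.volume, (unsigned)f.vnode, (unsigned)f.unique,
           sizeof f);
    return 0;
}
</pre>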

Group 3:

1) its caching assumptions are valid (the server uses callbacks to invalidate stale client caches).

2) locking is provided.

3) good security.

Other:
* Caches full files locally on open; writes the modified file back to the server on close (see the sketch below).
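
A minimal sketch of that whole-file open/close behaviour (entirely my own simulation: two ordinary local files stand in for the server copy and the client cache). Open fetches the whole file once, all reads and writes then hit the local copy with no network traffic, and close pushes the file back in full, which is why close acts as the commit point, as the class discussion below notes.

<pre>
#include <stdio.h>

/* The "server" and the client "cache" are just local files here. */
#define SERVER_COPY "server_hello.txt"
#define CACHE_COPY  "cache_hello.txt"

/* Copy src to dst in full; returns 0 on success. */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb"), *out = fopen(dst, "wb");
    if (!in || !out) { if (in) fclose(in); if (out) fclose(out); return -1; }
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);
    return 0;
}

/* open: fetch the whole file from the "server" into the local cache,
 * then operate purely on the cached copy (no per-read network traffic). */
static FILE *afsish_open(void)
{
    if (copy_file(SERVER_COPY, CACHE_COPY) != 0)
        return NULL;
    return fopen(CACHE_COPY, "r+b");
}

/* close: write the (possibly modified) cached copy back in full.
 * This is the "commit point": a failure here means the update is lost. */
static int afsish_close(FILE *f)
{
    fclose(f);
    return copy_file(CACHE_COPY, SERVER_COPY);
}

int main(void)
{
    /* Seed the fake server with some data. */
    FILE *seed = fopen(SERVER_COPY, "w");
    if (!seed) { perror("seed"); return 1; }
    fputs("hello from the server\n", seed);
    fclose(seed);

    FILE *f = afsish_open();
    if (!f) { perror("afsish_open"); return 1; }
    fseek(f, 0, SEEK_END);
    fputs("appended by the client\n", f);     /* local edit, no network */
    if (afsish_close(f) != 0) { perror("store on close"); return 1; }
    return 0;
}
</pre>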

==Class Discussion==

NFS and AFS took substantially different approaches to the many problems they faced. While we consider AFS to have made generally better choices, it was not widely adopted because it was complex and difficult to set up, administer, and maintain. NFS, however, was comparatively simple. Its protocol and API were relatively stateless (hence the use of UDP), and it shared information at the file level rather than the block level. It was also built on RPC, which was convenient to program in but was (as we have already discussed) a bad abstraction, since it hid the inherent flakiness of the network. This use of RPC led to security and reliability problems in NFS.

AFS took a more thorough approach to figuring out coherent consistency guarantees and how to implement them efficiently. The AFS designers considered the network a bottleneck and tried to reduce the amount of chatter over it by making heavy use of caching. The 'open' and 'close' operations in AFS were critical, with 'close' assuming an importance similar to that of a 'commit' operation in a well-designed database system. The security model of AFS was also interesting: rather than a UNIX access-list-based implementation, AFS used a single sign-on system based on Kerberos.