DistOS 2018F 2018-10-01

Readings

NFS & AFS (+ Literature reviews)

Russel Sandberg et al., "Design and Implementation of the Sun Network Filesystem" (1985)
John H. Howard et al., "Scale and Performance in a Distributed File System" (1988)

Harvey, "What Is a Literature Review?" (DOC) (PPT)
Taylor, "The Literature Review: A Few Tips On Conducting It" (PDF)

Notes

In-class lecture notes:

NFS: client-server file sharing

Server has directory, you can access it

How much work were they trying to do? They were trying to do as little work as possible.

Simple to implement and allow clients to load files

How did they break POSIX (Portable Operating System Interface), i.e. classic UNIX semantics? NFS is stateless, an implementation choice that caused them to break things. Of open, read, write and close, the calls NFS left out of the protocol were open and close, b/c the server is stateless.
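To make the stateless idea concrete, here is a minimal sketch (not the NFS protocol itself). The names `stateless_read` and `Client` are made up for illustration. The "server" function gets everything it needs in each request and remembers nothing afterwards; the file offset lives on the client side only.

```python
import os


def stateless_read(path, offset, count):
    """Serve one self-contained read request; nothing is remembered afterwards."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads at an explicit offset, so no file pointer state is needed
        return os.pread(fd, count, offset)
    finally:
        os.close(fd)


class Client:
    """Client-side state: the current offset is tracked here, not on the server."""

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def read(self, count):
        data = stateless_read(self.path, self.offset, count)
        self.offset += len(data)
        return data

    def seek(self, offset):
        # Seeking never contacts the server; it only changes later requests
        self.offset = offset
```

Because each request carries the offset, the server can restart between two reads and the client would not notice, which is the property the stateless design was after.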

The paper we read today does not describe NFS as it is today; it describes NFS v1 or v2. v3 changed a lot and v4 changed a lot more. It was stateless b/c they didn't want the server to keep track of the clients: all the client had to do was send a read or a write and then it was done. The original NFS allowed limited caching. Seeking (changing the file pointer) still puts no state on the server, but it changes subsequent reads and writes, so the client kernel must keep track of the offset (a byte offset that goes along with each NFS request rather than being stored on the server).

A path name is translated to a token for later lookups, so you don't need to parse strings every time, but the server needs to be able to translate the token back to the file.

Corner case: Unix and open files. What happens in Unix when you delete a file that is open for writing? The program using it still exists, and the file takes up space on disk until the process closes it. For example, you delete the log files but the disk is still full, so you must kill the process (send it a signal) to really free the space. For NFS this doesn't work, so they had a crude hack: if a client removed the last link to an open file, it would rename the file to .nfsXXXX, which someone would later need to go and clean up. This works b/c the client knew the file was still open and knew the reference count. In Unix there is no delete, only unlink, which removes a link to the inode.
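The local Unix behaviour described above (unlink removes the name, but the inode survives while the file is open) can be seen with a short experiment. This is a sketch for a local POSIX file system; the function name `unlink_while_open` is just for illustration.

```python
import os
import tempfile


def unlink_while_open():
    """Return (name_still_exists, data_read_back) after unlinking an open file."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"log line\n")
        os.unlink(path)                      # remove the last link (the name)
        name_exists = os.path.exists(path)   # the directory entry is gone
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)              # the open descriptor still works
        return name_exists, data
    finally:
        os.close(fd)                         # only now can the space be freed
```

A stateless server cannot know that some client still has the file open, which is why the NFS client had to fall back on the rename trick instead.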

Key thing to know about NFS: how do you determine who has access to what file? In Unix, by user ID and group ID. A request comes in for a specific file; who does the permission check in NFS? The client kernel, b/c it is assumed that the user IDs and group IDs are synced with the server. If a user on the client wants to access a file and is not allowed, the client kernel denies it. Once the file system is mounted, the local kernel can pretend to be any ID on the remote server, except the root user (but root is a chameleon). NFS: No File Security.
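A toy model of why this amounts to "No File Security": whichever side runs the check, the user ID it is based on is whatever the client machine says it is. Everything below (`FileEntry`, `server_read`, the `NOBODY_UID` value) is a hypothetical sketch, not real NFS code, and it only looks at owner and "other" read bits.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    owner_uid: int
    mode: int      # Unix permission bits, e.g. 0o600
    data: bytes


NOBODY_UID = 65534  # assumption: a conventional unprivileged ID for remapped root


def server_read(entry, claimed_uid):
    """Toy check that believes whatever uid the client puts in the request."""
    # A claimed root (uid 0) is not honored and is remapped to an unprivileged ID
    uid = NOBODY_UID if claimed_uid == 0 else claimed_uid
    if uid == entry.owner_uid:
        # Owner: only the owner read bit applies
        if entry.mode & 0o400:
            return entry.data
    elif entry.mode & 0o004:
        # Everyone else: the "other" read bit applies
        return entry.data
    raise PermissionError("access denied")
```

Someone who is root on a client machine cannot act as root on the server, but can simply claim to be uid 1000 and read that user's files, which is the "chameleon" problem.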

Why no file security, why so transparent? Clients were trusted and run by a central admin. On the wire, things could be encrypted in the RPC layer using DES (Data Encryption Standard). This was back in the 80s, but they would not ship it b/c they could not export it without getting an export license. It was turned off; there was an option to enable it but no one did.

NFS was widely used in large installations, but it had major problems: scalability and security. You have to trust client computers to do anything with the file system, and every file access goes through the server instead of being cached. Data is not safe with NFS.

Every read with NFS generates an RPC. Every file access, every read/write, means doing an RPC, which means network traffic. NFS was designed in a world where they said "the network is the computer". They built systems that didn't have local hard drives, so all file access was remote. For diskless workstations, 10-20 systems, it would work well. With AFS, you don't want to do network traffic on every read and write, only on open and close, thereby reducing the network traffic required. The client keeps a cached copy of the file that only needs to be validated, or maybe only the changes have to be sent back.
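The difference in network traffic can be sketched with a toy server that just counts round trips. The class and function names here are invented for illustration, and real NFS and AFS clients are far more involved (block caching, cache validation, and so on).

```python
class Server:
    """Toy file server that counts how many requests it receives."""

    def __init__(self, files):
        self.files = dict(files)
        self.rpcs = 0

    def read(self, name, offset, count):
        self.rpcs += 1
        return self.files[name][offset:offset + count]

    def fetch(self, name):
        self.rpcs += 1
        return self.files[name]


def read_per_rpc(server, name, chunk):
    """NFS-like: every read call is a round trip to the server."""
    out, offset = b"", 0
    while True:
        part = server.read(name, offset, chunk)
        if not part:          # empty result means end of file
            return out
        out += part
        offset += len(part)


def read_whole_file(server, name, chunk):
    """AFS-like: one fetch at open, then all reads hit the local copy."""
    cached = server.fetch(name)                  # open: the only round trip
    out = b""
    for offset in range(0, len(cached), chunk):
        out += cached[offset:offset + chunk]     # local reads, no network
    return out                                   # unmodified file: close sends nothing
```

Reading a 1000-byte file in 100-byte chunks costs the first client 11 requests (10 data reads plus the one that hits end of file), while the second client makes a single request.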

When they changed from NFS to AFS: with NFS, if the network goes down, what happens? The system freezes; it can't do any reads or writes and just waits (blocks until the network comes back). So you kind of know when things are bad and you don't lose much data, but you can't do anything more. With AFS, you do your reads, writes, etc., and then do a close. If the network is down, the close fails and you lose all your changes. Close has a return value and close can fail; that is part of the POSIX standard. Here close is a commit, so whether it succeeds or not matters! The API changed in a way that is not obvious, which breaks things: b/c close can fail, programs needed to change to check the result of close. It is POSIX, but weird. It is hard to know what your assumptions are until they are violated; change how something works and the mental model is potentially wrong. In the usual mental model, close just tells the system you are done. There can also be conflicts due to multiple edits to the same file once the network is back.
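A small sketch of "close is a commit". `CachingFile` and `save` are hypothetical names, the "server" is a plain dict, and the network is a flag; the point is only that writes succeed locally and the failure shows up at close, so the caller has to check it.

```python
class CachingFile:
    """Toy AFS-like file: writes go to a local buffer, close() ships them."""

    def __init__(self, server, name, network):
        self.server = server
        self.name = name
        self.network = network
        self.buffer = bytearray(server.get(name, b""))

    def write(self, data):
        self.buffer += data          # purely local, so it always succeeds

    def close(self):
        if not self.network["up"]:
            raise OSError("close failed: server unreachable, changes not stored")
        self.server[self.name] = bytes(self.buffer)


def save(server, name, data, network):
    """Return True only if the data actually reached the server."""
    f = CachingFile(server, name, network)
    f.write(data)
    try:
        f.close()                    # this is where the commit happens
    except OSError:
        return False
    return True
```

A program that ignores the result of close would report success in the second case even though nothing was stored, which is exactly the changed assumption the notes describe.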

With NFS, the way IDs were synced was a classic system called YP (Yellow Pages). You still see references to YP, but the name was a trademark, so it was changed to Network Information Service (NIS). It is the thing for syncing password file entries.

With AFS, the client is not trusted; you must authenticate using Kerberos. You talk to an auth server, authenticate to it, and it gives you a ticket (a temporary key good for 8 hours). Then you go to the network and get resources (files and mail), but the auth server only has to be involved when you log in to get the tickets to do everything. AFS depends on you having tickets: if the tickets expire, all of a sudden you cannot access anything, and you must renew the tickets with a command or log out and log in again.

AFS scalability tricks beyond open/close and caching: it had volumes, an abstraction in the file server, so you can move files around in the file system without requiring the data in memory, and there can be multiple copies of the same volume. There were read-only replicas and then later read/write replicas; replication and fault tolerance were added in. It also had globally unique file names. AFS was cool b/c you could navigate to /afs/athena.mit.edu or /afs/andrew.cmu.edu. Why don't we use AFS? B/c we use the web. Everything was slow, and when you went outside your AFS cell, things would break.

One of the biggest things: they did not build a web browser over it.

AFS was not easy to set up; NFS took a few seconds to set up.

AFS had cool ideas, but the web took off b/c it was easy to set up.

AFS was a Multics-like thing vs. NFS, which was like Unix. When you see access control lists, expect an overly complex security system.

Literature Review:

Pick an area related to distributed OS and do a literature review of it. If you scope it too broadly, you would have to write a book about it. He wants 10-20 pages: a proper review of the topic picked. The trick is to pick a narrow enough topic that you find interesting.

How to limit scope? Don't start at a high level and gather papers; that is the wrong approach. It gives an incomplete view of the chosen topic and is too random. What to do instead? Pick a paper. Pick one paper that you think is interesting, look for things that are related to it, look at who the paper cites, and grow outward. Basically, you talk about this narrow thing and what the papers have in common.

How do you find the paper? You want one paper that you understand, have read carefully, and that is reasonably well cited. Go through the major conferences and pick a paper. You cannot pick a paper we looked at in class; those papers are not narrow enough, and you need something more specialized. You are better off taking something related to what you find interesting, e.g. cryptocurrencies, graphics, or usability: your area of comp sci.

If you are going to get something done by the end of the term: the test is on the 17th, just before the break, so when the break comes, spend time finding papers.

Outline: an abstract, an outline and 10 references. Going from 1 to 10 is not hard if you have a good paper to start with.

In the cloud space: IPFS, Solid, etc. For distributed computation: what is the history? Where does it come from? Tell a factual story.