DistOS 2021F 2021-09-21

Discussion Questions

Discussion questions for NFS:
What type of workloads was it designed for? How would the system appear to regular users?
To what extent did it scale?
What was their distribution strategy?
Did any of their design choices clearly limit their scalability?
How did their design choices affect semantics versus a single system (e.g., standard UNIX semantics)?
Could files be bigger than one computer could handle?

Notes

Lecture 4: NFS
--------------

What did you think of NFS?
 - stateless is cool
 - synchronous RPC calls
 - recovery strategy (lack thereof)
 - soft vs hard mounting
 - file operations abstracted from actual data manipulation
    (VFS)
 - portability
 - no direct mention of scalability
 - performance hacks (caching)
   - client-side caching is a bit hacky, but why?
     - because the client doesn't know the true state of the file,
       it just hopes the server didn't change it from underneath
       it
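
To make the caching problem above concrete, here is a minimal sketch of a client that trusts its cached copy for a fixed timeout. The class names, the `timeout` parameter, and the explicit `now` argument are all made up for illustration; this is not how any real NFS client is structured, it only shows why a read can return stale data.

```python
class Server:
    """Toy file server holding the contents of a single file."""
    def __init__(self, data):
        self.data = data

    def write(self, data):
        self.data = data

    def read(self):
        return self.data


class CachingClient:
    """Toy client that trusts its cached copy until a timeout expires."""
    def __init__(self, server, timeout):
        self.server = server
        self.timeout = timeout      # how long the cache is assumed valid
        self.cached = None
        self.fetched_at = None

    def read(self, now):
        expired = self.fetched_at is None or now - self.fetched_at >= self.timeout
        if expired:
            # Only here does the client learn the server's true state.
            self.cached = self.server.read()
            self.fetched_at = now
        return self.cached


server = Server("v1")
client = CachingClient(server, timeout=3)
first = client.read(now=0)   # cache is empty, so this goes to the server
server.write("v2")           # some other client changes the file
stale = client.read(now=1)   # inside the timeout: the old copy is returned
fresh = client.read(now=5)   # timeout has passed: the client refetches
```

Between `now=0` and `now=3` the client has no way of noticing the change, which is the "hope" described above.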

What happens when two clients write to the same file at the same time?
 - bad things happen! (almost anything, including data loss)
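
One of the bad things is a lost update. A toy sketch, assuming each client reads the whole file, changes its own copy, and writes the whole thing back with no locking (the helper names are hypothetical):

```python
# The "server" is just a dict holding one file's contents.
server_file = {"data": "line1"}

def client_read():
    return server_file["data"]

def client_write(data):
    server_file["data"] = data

# Both clients read before either one writes.
a_copy = client_read()
b_copy = client_read()

# Each appends to its own private copy and writes the whole file back.
client_write(a_copy + ";from_a")
client_write(b_copy + ";from_b")

final = server_file["data"]   # the last writer wins; A's change is gone
```

Nothing reports an error here. Client A's write simply disappears.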

NFS is often referred to as "no file security"
 - time isn't synched
 - clients can report whatever user they want to
   - no strong user authentication on server, takes client's
     word for it
     - that's why root is mapped to nobody
 - data is unencrypted and unauthenticated
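
A rough sketch of why "takes client's word for it" is a problem. The function names are invented for illustration, and 65534 is assumed as the uid for nobody (a common convention, not something from the lecture):

```python
NOBODY_UID = 65534  # assumption: conventional uid for "nobody"

def effective_uid(claimed_uid, root_squash=True):
    """Return the uid the toy server acts as for a request.

    The server never verifies claimed_uid. Its only defence is
    mapping a claimed root (uid 0) to nobody.
    """
    if claimed_uid == 0 and root_squash:
        return NOBODY_UID
    return claimed_uid

def can_read(file_owner_uid, claimed_uid):
    """Owner-only permission check based purely on the claimed uid."""
    return effective_uid(claimed_uid) == file_owner_uid

# A client claiming to be root is squashed to nobody...
root_blocked = not can_read(file_owner_uid=1000, claimed_uid=0)
# ...but a client that simply claims to be uid 1000 gets in.
impersonation_works = can_read(file_owner_uid=1000, claimed_uid=1000)
```

So mapping root to nobody only stops the most obvious attack; anyone who is root on a client can still claim any other uid.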

Why didn't Sun implement strong encryption & authentication tech?
 - could do it using symmetric cryptography, could be quite fast
 - the real reason was export controls
 - (they intended the RPC mechanism to have strong crypto)

Cryptography-related technology was export controlled, just like other military technology
 - like nuclear technology
 - you could only export weak crypto
 - this is why early web browsers used 40-bit keys
   rather than 128-bit keys

Note that modern NFS is very different (NFSv4 and up)
 - has crypto, auth
 - stateful

NFS was, and still is, the standard way for UNIX-like systems to share filesystems
 - SMB/CIFS is the Windows equivalent (Samba is the UNIX implementation of it)

Stateless server was a very important decision
 - why did they choose it?

* you don't have to sync client/server state
  - so if one crashes, the other doesn't care
* less server-side resources (RAM requirements) to keep
  track of clients
  - helps with scalability (handling more clients)
* simpler server
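
A minimal sketch of what stateless means here, assuming every read request carries everything the server needs (a file handle, an offset, and a count). The class and the handle name `fh1` are hypothetical:

```python
class StatelessServer:
    """Toy server that remembers nothing about clients between calls."""
    def __init__(self, storage):
        self.storage = storage   # stands in for the disk; survives restarts

    def read(self, handle, offset, count):
        # No open-file table and no per-client cursor: the request
        # itself says which file, where to start, and how much to read.
        return self.storage[handle][offset:offset + count]


storage = {"fh1": b"hello, distributed world"}

server = StatelessServer(storage)
before = server.read("fh1", 0, 5)

# Simulate a crash and reboot: a brand new server object over the same disk.
server = StatelessServer(storage)
after = server.read("fh1", 7, 11)   # the client just carries on
```

The client never has to re-open anything after the "reboot", because the server never knew the file was open in the first place.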

Do we see (somewhat) stateless servers in modern distributed systems?
 - yes, the web!
 - HTTP is stateless; HTTPS has session information but is still *mostly* stateless

Modern practice is to make as much of your stack stateless as possible, and centralize state in a database (that can be made separately scalable)
 - but minimize database accesses as that can impact performance
 - note stateless allows for caching (proxy servers/content distribution networks)

UNIX wasn't picky about state
 - kernel keeps track of lots of state about processes
 - caused problems when we tried to distribute processes & files
   - the "disgusting" file staying open after unlink

So we have to be very careful about how we manage state when we go distributed

How do you scale an NFS-based system?
 - need more than one server to handle load
 - classic way is to split up filesystem into multiple ones, export them separately
   - an automounter would often be used to mount volumes as necessary

/nfs/users/soma <-- this would be mounted when I went into
                    this directory

(/nfs/users might be mounted, or /nfs/users/soma might be mounted,
depending upon how big it was)

A classic strategy would be to break up further

/nfs/users/s/soma
  - can split up the filesystem by hierarchy
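
The layout above can be sketched as a simple mapping from username to path, plus a mapping from path bucket to server. The server names in `SERVERS` and the bucketing rule are made up for illustration; a real site would list this by hand in its automounter maps:

```python
SERVERS = ["fs0", "fs1", "fs2"]   # hypothetical file server names

def home_path(user, root="/nfs/users"):
    """Split home directories by first letter, e.g. /nfs/users/s/soma."""
    return f"{root}/{user[0]}/{user}"

def server_for(user):
    """Pick a server from the first letter of the username.

    This is a static, manual partition: if one bucket gets too big,
    an administrator has to move directories and update the mapping.
    """
    return SERVERS[ord(user[0]) % len(SERVERS)]

path = home_path("soma")
server = server_for("soma")
```

Nothing rebalances automatically, which is what makes this feel like the manual memory partitioning mentioned next.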

This is almost like managing memory before virtual memory
 - manual partitions & overlays

What happens if a server doesn't reply to a client?
 - could get a hang (hard mount) or a timeout (soft mount)
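
A sketch of the difference, assuming a hard mount retries forever and a soft mount gives up after a fixed number of tries. `call_with_retries`, `max_tries`, and the flaky RPC helper are hypothetical names, not real mount options:

```python
def make_flaky_rpc(failures):
    """Build a fake RPC that times out `failures` times, then succeeds."""
    state = {"left": failures}
    def rpc():
        if state["left"] > 0:
            state["left"] -= 1
            raise TimeoutError("no reply from server")
        return "ok"
    return rpc

def call_with_retries(rpc, soft, max_tries=3):
    """Retry an RPC: forever if hard, up to max_tries if soft."""
    tries = 0
    while True:
        tries += 1
        try:
            return rpc()
        except TimeoutError:
            if soft and tries >= max_tries:
                raise   # soft mount: report the error to the application
            # hard mount: keep retrying, so the application just hangs

# The server misses five requests and then comes back.
hard_result = call_with_retries(make_flaky_rpc(5), soft=False)

try:
    call_with_retries(make_flaky_rpc(5), soft=True)
    soft_result = "ok"
except TimeoutError:
    soft_result = "timed out"
```

With a hard mount the application eventually gets its answer but may wait indefinitely; with a soft mount it gets an error it probably was not written to handle.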

NFS works, but it really isn't scalable.  You need a pretty different design to scale

(Note that HTTP isn't scalable on its own either; we figured out ways to make it scale because it was stateless.  So NFS could have been made scalable, but there was never the incentive: too many other issues.)

NFS never became universal in part because it was trying so hard to be UNIX-like
 - note that HTTP isn't copying anything, it is its own thing

Modern NFS can be made scalable, and how do I know?
 - Amazon does it!
 - but I don't think their solutions are standard?

When we start looking at other remote/distributed filesystems,
be sure to consider
 - how stateful?
 - how UNIX-like?
 - how are errors/failures treated?

First experience should come out next week
 - we'll update the due dates
 - playing with kubernetes in a very simple way