DistOS 2021F 2021-10-07
==Notes==
<pre>
Lecture 9
---------

Plan for next week for groups
- generated by a script
- posted at start of class, you'll need to
  manually go to your assigned room
- based on what you submit
- 4 groups:
  - high response grades
  - low response grades
  - high quiz grades
  - low quiz grades
- groups will be first assigned inside each group randomly and then
  based on nearby groups (high with high, low with low)
- will respect group exclusion list

Zookeeper & Chubby
- why do we need these?
- and why don't we have them on individual systems, or on older
  distributed systems (Plan 9, NFS, LOCUS, etc.)?

We need them because
- failures: hosts die, so we need more than one to run the service
  - so we need to be able to recover
- but really, it is *because* it is distributed that we need these complex algorithms

Servers A and B
Clients X and Y
A and B are supposed to provide access to the "same" data
Client X reads and writes to A
Client Y reads and writes to B

"like networked mutex-semaphore systems"
- except there is NO SHARED MEMORY

You want multiple hosts to behave as if they were the same host
- unified view of state that, externally, is always consistent
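
An illustrative Java sketch of the problem (my own example, names made up): two
plain HashMaps stand in for servers A and B, and with no coordination protocol
between them the two copies of the "same" key silently diverge:

    import java.util.HashMap;
    import java.util.Map;

    public class DivergingReplicas {
        public static void main(String[] args) {
            // Servers A and B each hold their own copy of the "same" data.
            Map<String, String> serverA = new HashMap<>();
            Map<String, String> serverB = new HashMap<>();

            // Client X writes only to A; client Y writes only to B.
            serverA.put("owner-of-lock", "client-X");
            serverB.put("owner-of-lock", "client-Y");

            // Each client reads from its own server, and each believes it holds the lock.
            System.out.println("A says: " + serverA.get("owner-of-lock")); // client-X
            System.out.println("B says: " + serverB.get("owner-of-lock")); // client-Y
            // With no shared memory and no coordination service, nothing
            // ever reconciles these two copies.
        }
    }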

With older systems, if you wanted a unified view of state, it would just go to one host
- but here we need more than one, for scalability, performance, and reliability

Chubby really takes this seriously
Zookeeper can get these sorts of guarantees but allows flexibility
- trade off consistency for performance

consistency: act like copies are "the same"
- changes are visible everywhere at the same time
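
An illustrative sketch of that tradeoff with ZooKeeper's Java client (the
ensemble address and the znode /app/flag are assumptions, not from the lecture):
a plain read is answered by whatever replica you are connected to and may be
slightly stale; calling sync() first makes that replica catch up with the
leader before answering, at the cost of an extra round trip:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.ZooKeeper;

    public class ReadConsistencyDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 10_000, event -> {});

            // Fast path: served locally by our replica, possibly stale.
            byte[] maybeStale = zk.getData("/app/flag", false, null);

            // Stronger path: sync() brings our replica up to the leader's
            // state, so the next read reflects all committed writes.
            CountDownLatch synced = new CountDownLatch(1);
            zk.sync("/app/flag", (rc, path, ctx) -> synced.countDown(), null);
            synced.await();
            byte[] upToDate = zk.getData("/app/flag", false, null);

            System.out.println(new String(maybeStale) + " / " + new String(upToDate));
            zk.close();
        }
    }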

Classic single system operating systems take shared state as a given
- get it for free with shared RAM
- in NUMA systems (i.e., any modern system with multiple cores),
  this can take a hit, but the hardware makes it mostly true
  - and when it isn't, we can use special CPU instructions to
    make sure it is true
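
For contrast, a small Java sketch of the single-machine case: classes like
AtomicLong map onto the CPU's atomic read-modify-write instructions, so every
core observes one consistent counter with no extra protocol:

    import java.util.concurrent.atomic.AtomicLong;

    public class SingleHostSharedState {
        public static void main(String[] args) throws InterruptedException {
            // One counter in shared RAM; increments use hardware atomic
            // instructions (compare-and-swap / fetch-and-add).
            AtomicLong counter = new AtomicLong();

            Runnable work = () -> {
                for (int i = 0; i < 1_000_000; i++) {
                    counter.incrementAndGet();
                }
            };
            Thread t1 = new Thread(work);
            Thread t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join();  t2.join();

            System.out.println(counter.get()); // always 2000000
        }
    }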

Note that Zookeeper is open source and is widely used
- Chubby is Google-specific
  - and I don't know of an open source implementation of it

The Hadoop project is a collection of projects for building large-scale distributed systems
- many based on papers published by Google
- all developed by parties other than Google, mostly other major internet
  corporations (Yahoo was one of the first; Facebook and others continued the work)

Kubernetes was developed by Google
- I think they finally realized that people were developing tech
  outside their bubble and it meant that hires would know others'
  tech, not theirs

Note that Google doesn't use Kubernetes internally
- they have Borg, which we will discuss next week

Sometimes you need everyone to be on the same page; that's when you use these sorts of services
- not for data, but for metadata, config, etc.
- they pay a huge price for consistency guarantees
  - limited data storage
  - limited access methods
  - limited scalability (they don't scale per se, they allow
    other systems to scale)
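
An illustrative sketch of the metadata/config use case with ZooKeeper's Java
client (the znode path /app/config and the ensemble address are assumptions):
every node reads one small config blob and gets a watch notification when it
changes:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ConfigWatch {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 10_000, event -> {});

            // One-shot watch: fires when /app/config changes; real code would
            // re-read the data and re-register the watch at that point.
            Watcher onChange = (WatchedEvent e) -> {
                if (e.getType() == Watcher.Event.EventType.NodeDataChanged) {
                    System.out.println("config changed, reload it");
                }
            };

            byte[] config = zk.getData("/app/config", onChange, null);
            System.out.println("current config: " + new String(config));

            Thread.sleep(60_000); // keep the session alive so the watch can fire
            zk.close();
        }
    }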

Chubby turns 5 computers into 1 file server
- but not really more than 5
- but 5 is enough for fault tolerance, and load is relatively low
  because "locks" are coarse-grained
- workload could be served by fewer machines, but these can be
  distributed around infrastructure
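
Chubby itself isn't public, but its main coarse-grained pattern (electing a
master by holding one lock) can be sketched with ZooKeeper's Java API; the
paths and host name here are made up, and the parent node /service is assumed
to exist:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class MasterElection {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 10_000, event -> {});
            try {
                // Ephemeral node: it exists only while our session is alive,
                // so the "lock" is released automatically if this host dies.
                zk.create("/service/leader", "host-1".getBytes(),
                          ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                // Held for minutes or hours at a time, hence "coarse-grained".
                System.out.println("I am the master");
            } catch (KeeperException.NodeExistsException e) {
                // Someone else holds the lock; watch it so we can retry later.
                zk.exists("/service/leader",
                          ev -> System.out.println("leader changed, retry"));
                System.out.println("standing by as a replica");
            }
        }
    }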

Zookeeper is based on Zab
Chubby is based on Paxos
etcd is based on Raft

Each algorithm has its tradeoffs
- the important point is that you need something that provides a consistent view to build larger distributed systems
- it seems to be easier to do this as a service rather than as a library
  an application uses
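
Zab, Paxos, and Raft differ in the details, but all three commit an update only
once a majority of replicas acknowledge it; a small sketch of the quorum
arithmetic behind the "5 computers" sizing above:

    public class QuorumMath {
        // With n = 2f + 1 replicas, any two majorities overlap in at least one
        // replica, so the service stays consistent while tolerating f crashes.
        static int quorumSize(int n)      { return n / 2 + 1; }
        static int faultsTolerated(int n) { return (n - 1) / 2; }

        public static void main(String[] args) {
            int n = 5; // a typical Chubby cell / ZooKeeper ensemble size
            System.out.println("acks needed to commit: " + quorumSize(n));      // 3
            System.out.println("crashes tolerated:     " + faultsTolerated(n)); // 2
        }
    }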
- | |||
</pre>