Difference between revisions of "DistOS 2021F 2021-10-07"

From Soma-notes
Previous revision (page creation)

Notes

<pre>
Lecture 8: NASD & GFS
---------------------

Questions?
 - is NASD a NAS?
 - how cost efficient? NASD or GFS?
 - having the file server out of the loop, security issues with
   NASD?
 - checkpoint system?
 - NASD in use?
 - why just kill chunkservers?  Not shut down?
 - GFS file security?

NAS is just a file server
 - dedicated, but a file server
 - generally uses standard network file sharing protocols
   (CIFS, NFS)

NASD is a different beast
 - disks are object servers, not file servers
   - objects are just variable-sized chunks of data + metadata
     (no code)
 - contrast with blocks

With object-based distributed filesystems, we've added a level of indirection
 - the file server translates files to sets of objects and handles file
   metadata
 - object servers store the objects

Why add this level of indirection?  Why not just use fixed-sized blocks?

(In GFS, instead of objects we have chunks, with a bit less metadata)

Objects are all about parallel access
 - to enable performance

A client can ask for objects from multiple object servers at once
 - the file server doesn't have to be involved at all

The classic way we did redundancy & reliability in storage is with RAID

Sounds like most of you haven't used RAID
 - Redundant Array of Inexpensive/Independent Disks
 - the idea is to combine multiple drives together to get
   larger, higher-performance, more reliable storage

RAID-0: striping
RAID-1: mirroring
RAID-5: striping + parity

With RAID, data is distributed across disks at the block level
 - drives have no notion of files, just blocks

The modern insight with distributed storage is that distributing at the block layer is too low level
 - better to distribute bigger chunks, like objects!

Read objects in parallel, rather than blocks
 - files are big, so it is feasible to read multiple objects in parallel

We do "mirroring" with objects/chunks, i.e., have multiple copies
 - parity/erasure codes are mostly not worth the effort for
   these systems (but later systems will use such things)

Security
 - NASD security?  How can clients securely access
   individual drives?

In Linux (POSIX), capabilities are a way to split up root access
 - but that is actually not the "normal" meaning of capabilities
   in a security context

Capabilities are tokens a process can present to a service to enable access
 - a separate authentication server gives out capability tokens
 - the idea is that the authentication server doesn't have to check
   when access is done; the check can be done in advance

With capabilities, the drives can control access without
needing to understand users, groups, etc.
 - they just have to understand the tokens and have a way to
   verify them
 - make sure the tokens can't be faked!
   (a sketch of this token idea follows these notes)

Most single sign-on systems tend to have some sort of capability-like token underneath if they are really distributed

Note that capability tokens are ephemeral
 - they normally expire after a relatively short period of time (minutes or hours)
   - needed to prevent replay attacks

Imagine having 10,000 storage servers and one authentication server
 - if the auth server had to be involved in every file access, it
   would become a bottleneck
 - but with capabilities it can issue tokens at a much slower rate
   and sit back while mass data transfers happen

Capabilities are at the heart of NASD

What about GFS?
 - nope, it assumes a trusted data center
 - I think it has UNIX-like file permissions, but
   nothing fancy
   - just to prevent accidental file damage

What was GFS for?
 - building a search engine
 - i.e., downloading and indexing the entire web!
   - data comes in from crawlers
   - indices are built as batch jobs

Are GFS files regular files?
 - they are weird because they are sets of records
   - records can be duplicated, so they must have unique IDs
 - record: think web page
   - have to account for the crawler messing up and
     downloading the same info multiple times
     (i.e., if the crawler had a hardware or
      software fault)
</pre>
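
The capability tokens described in the NASD notes above can be sketched with a generic HMAC scheme. This is a hypothetical illustration, not NASD's actual token format: an auth server signs (object, rights, expiry) with a key it shares with the drive, and the drive then verifies access locally without knowing anything about users or groups.

<pre>
# Hypothetical capability tokens: an auth server mints HMAC-signed
# tokens; object servers verify them with a shared key. Not NASD's
# real format -- just the general idea from the notes above.
import hmac, hashlib, json, time

SHARED_KEY = b"key-shared-between-auth-server-and-drive"

def mint_capability(object_id: str, rights: str, ttl_s: int = 600) -> dict:
    """Auth server: issue a token granting `rights` on `object_id`."""
    body = {"object": object_id, "rights": rights,
            "expires": int(time.time()) + ttl_s}
    msg = json.dumps(body, sort_keys=True).encode()
    body["mac"] = hmac.new(SHARED_KEY, msg, hashlib.sha256).hexdigest()
    return body

def verify_capability(token: dict, object_id: str, op: str) -> bool:
    """Object server: check the token locally -- no call to the auth server."""
    body = {k: v for k, v in token.items() if k != "mac"}
    msg = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SHARED_KEY, msg, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, token.get("mac", ""))  # not forged
            and token["object"] == object_id                     # right object
            and op in token["rights"]                            # right operation
            and token["expires"] > time.time())                  # not expired

cap = mint_capability("obj-42", "r")
print(verify_capability(cap, "obj-42", "r"))   # True
print(verify_capability(cap, "obj-42", "w"))   # False: no write right
</pre>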

Latest revision as of 22:23, 12 October 2021

Notes

Lecture 9
---------

Plan for next week for groups
 - generated by a script
 - posted at start of class, you'll need to
   manually go to your assigned room
 - based on what you submit
 - 4 groups:
   - high response grades
   - low response grades
   - high quiz grades
   - low quiz grades
 - groups will first be assigned randomly within each category and then
   from nearby categories (high with high, low with low)
 - will respect group exclusion list


Zookeeper & Chubby
 - why do we need these?
 - and why don't we have them on individual systems, or on older
   distributed systems (Plan 9, NFS, LOCUS etc)

We need them because
 - failures: hosts die, so we need more than one to run the service
   - so need to be able to recover
 - but really, it is *because* it is distributed that we need these complex algorithms

Servers A and B
Clients X and Y

A and B are supposed to provide access to the "same" data

Client X reads and writes to A
Client Y reads and writes to B

"like networked mutex-semaphore systems"
  - except there is NO SHARED MEMORY

You want multiple hosts to behave as if they were the same host
 - unified view of state that, externally, is always consistent
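
A toy illustration of the problem in plain Python (no real servers; the names A, B, X, Y follow the scenario above): without any coordination between the two replicas, writes from the two clients make the "same" data diverge, which is exactly what a unified, externally consistent view rules out.

<pre>
# Two "replicas" of the same data with no coordination between them.
# Server names A/B and clients X/Y follow the scenario in the notes;
# everything else here is made up for illustration.
replica_A = {}   # server A's copy
replica_B = {}   # server B's copy

def write(replica, key, value):
    replica[key] = value          # local write only -- nothing is propagated

def read(replica, key):
    return replica.get(key)

write(replica_A, "config", "v1")  # client X writes to A
write(replica_B, "config", "v2")  # client Y writes to B

print(read(replica_A, "config"))  # v1
print(read(replica_B, "config"))  # v2 -- the "same" data now disagrees
# A coordination service (Chubby, Zookeeper, etcd) exists to prevent
# exactly this: all replicas agree on a single order of updates.
</pre>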

With older systems, if you wanted a unified view of state, it would just go to one host
 - but here we need more than one, for scalability, performance, and reliability

Chubby really takes this seriously
Zookeeper can get these sorts of guarantees but allows flexibility
 - trade off consistency for performance

consistency: act like copies are "the same"
  - changes are visible everywhere at the same time
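
As a concrete example of that trade-off, Zookeeper serves reads from whichever server a client is connected to, which is fast but can be slightly stale, and provides sync() for when a client needs to see the latest committed state before reading. A minimal sketch with the kazoo Python client; the hostname and znode path are placeholders.

<pre>
# Sketch of the consistency/performance trade-off with the kazoo
# Zookeeper client (pip install kazoo). Host and path are placeholders.
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1.example.com:2181")
zk.start()
zk.ensure_path("/demo/config")

# Fast path: the read is answered by whichever server this client is
# connected to, so it can lag slightly behind updates that went through
# other servers.
value, stat = zk.get("/demo/config")

# Paying for stronger consistency: sync() makes our server catch up with
# the leader before the next read, so we see the latest committed update.
zk.sync("/demo/config")
value, stat = zk.get("/demo/config")
print(value, stat.version)

zk.stop()
</pre>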

Classic single system operating systems take shared state as a given
 - get it for free with shared RAM
 - in NUMA systems (i.e., any modern system with multiple cores),
   this can take a hit, but the hardware makes it mostly true
   - and when it isn't we can use special CPU instructions to
     make sure it is true

Note that Zookeeper is open source and is widely used
 - Chubby is google-specific
 - and I don't know of an open source implementation of it

The Hadoop project is a collection of projects for building large-scale distributed systems
 - many based on papers published by Google
 - all developed by parties other than Google, but still major internet corporations (Yahoo was one of these early on; development continued with Facebook etc.)

Kubernetes was developed by Google
 - I think they finally realized that people were developing tech
   outside their bubble and it meant that hires would know others'
   tech, not theirs

Note that Google doesn't use Kubernetes internally
 - they have Borg, which we will discuss next week

Sometimes you need everyone to be on the same page; that's when you use these sorts of services
 - not for data, but for metadata, config, etc. (see the sketch after this list)
 - they pay a huge price for consistency guarantees
    - limited data storage
    - limited access methods
    - limited scalability (they don't scale per se, they allow
      other systems to scale)
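
A sketch of the typical metadata/config usage with the kazoo Python client (hostname, path, and config contents are made up): keep a small piece of configuration in a znode and have every interested process watch it, so they all stay on the same page and observe updates in a single agreed order.

<pre>
# Storing a small piece of shared config and watching it with kazoo.
# Znodes are meant for small metadata, not bulk data storage.
# Host and path are placeholders.
import json
import time
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1.example.com:2181")
zk.start()

zk.ensure_path("/myapp/config")
zk.set("/myapp/config", json.dumps({"feature_x": True}).encode())

# Every process that registers this watch is notified when the config
# changes; all clients observe updates in the same order.
@zk.DataWatch("/myapp/config")
def on_config_change(data, stat):
    if data is not None:
        print("config version", stat.version, "->", json.loads(data))

zk.set("/myapp/config", json.dumps({"feature_x": False}).encode())
time.sleep(1)   # give the watch a moment to fire before exiting
zk.stop()
</pre>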

Chubby turns 5 computers into 1 file server
  - but not really more than 5
  - but 5 is enough for fault tolerance, and load is relatively low
    because "locks" are coarse-grained
  - workload could be served by fewer machines, but these can be
    distributed around infrastructure
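
Chubby's API is Google-internal, but the coarse-grained locking pattern it provides looks roughly like Zookeeper's lock recipe: acquire a named lock, hold it for as long as you play a role (e.g., primary), then release. A sketch with kazoo's Lock recipe; the lock path, identifier, hostname, and serve_as_primary() are placeholders.

<pre>
# Coarse-grained lock in the Chubby style, sketched with kazoo's Lock
# recipe. Lock path, identifier, and host are placeholders; Chubby's
# own API is Google-internal and differs in detail.
from kazoo.client import KazooClient

def serve_as_primary():
    # Placeholder for the application's actual primary-role work.
    print("acting as primary")

zk = KazooClient(hosts="zk1.example.com:2181")
zk.start()

lock = zk.Lock("/myapp/primary-lock", identifier="server-17")

# Coarse-grained: held for the whole time this process acts as primary
# (minutes to days), not per-request -- which keeps the load on the
# small 5-node ensemble low.
with lock:
    serve_as_primary()

zk.stop()
</pre>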

Zookeeper is based on Zab
Chubby is based on Paxos
etcd is based on Raft

Each algorithm has its tradeoffs
 - the important point is that you need something that provides a consistent view in order to build larger distributed systems
 - it seems to be easier to do this as a service rather than as a library
   an application uses
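
For example, etcd exposes the same "consistent view as a service" pattern outside the Google and Hadoop ecosystems; a sketch assuming the python-etcd3 client, with endpoint, key, and lock name as placeholders.

<pre>
# Same "consistent view as a service" pattern via etcd (Raft underneath),
# using the python-etcd3 client (pip install etcd3). Endpoint, key, and
# lock name are placeholders.
import etcd3

etcd = etcd3.client(host="etcd1.example.com", port=2379)

# Small config value; etcd reads are linearizable by default.
etcd.put("/myapp/config/feature_x", "true")
value, metadata = etcd.get("/myapp/config/feature_x")
print(value)                      # b'true'

# Coarse-grained lock backed by a TTL lease, comparable to the kazoo
# Lock sketch above.
with etcd.lock("myapp-primary", ttl=30):
    print("acting as primary")
</pre>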