DistOS 2021F 2021-10-12

Notes

Lecture 10
----------
BigTable & MapReduce

Questions

BigTable
 - single row transactions
 - sparse semi-structured data, what do they mean?

MapReduce
 - what happens when the master goes down?
 - restarting tasks seems wasteful, why do that?
 - single master, failure unlikely how?!

What's the problem MapReduce is trying to solve?
 - how to process *large* amounts of data quickly
   - really, summarize data
 - think of this as web crawl data

example problem: what's the frequency of words on the entire web (as crawled by Google)?

What's the strategy?
 - map: analyze data in small chunks in parallel,
        i.e., run a program on each bit of data separately

        (example: have a program calculate word frequencies in
         a web page)
 - reduce: combine results to summarize
    - must be an operation that doesn't depend on order
      of operations
    - get a list, produce a smaller list where every element is
      of the same type as before
    - example: combine word frequencies for individual pages
      to get aggregate word frequencies

When fully reduced, you have word frequencies for the entire web.
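
A toy Python sketch of the two functions for this word-count example (not from the lecture; the names are made up, and a real MapReduce run distributes these across thousands of workers):

from collections import Counter

def map_page(url, page_text):
    # "map" step: runs independently on each page; url is the input key
    # (unused here), page_text is the value; emits (word, count) pairs
    for word, count in Counter(page_text.lower().split()).items():
        yield word, count

def reduce_word(word, counts):
    # "reduce" step: combine every count emitted for one word; summing is
    # associative and commutative, so the order counts arrive in doesn't matter
    return word, sum(counts)

# per-page map output for one made-up page:
print(dict(map_page("http://example.com", "the web the web is big")))
# -> {'the': 2, 'web': 2, 'is': 1, 'big': 1}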

Can every problem be expressed as a map and a reduce?
 - no!

Can a lot of problems that people care about be expressed this way?
 - yes!

map is embarrassingly parallel
  - so can be maximally sped up
  - think of this as extracting the features you care about

reduce can be slow, but normally it shouldn't be that hard
 - can't have order dependency though
 - here we process the features, but in a non-order-dependent way
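
A small illustration of that order-independence (made-up partial counts): because merging counts is associative and commutative, partial results from different workers can be combined in whatever order they arrive.

import random
from collections import Counter
from functools import reduce

# partial word counts produced by three different workers (made-up data)
partials = [Counter(a=2, b=1), Counter(b=3), Counter(a=1, c=4)]

random.shuffle(partials)   # workers can finish in any order
total = reduce(lambda x, y: x + y, partials, Counter())
assert total == Counter(a=3, b=4, c=4)   # same aggregate regardless of order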

Why do pieces "start over" when a node crashes during a MapReduce operation?
 - we don't have access to the local state that was computed
    - local storage is way faster for this sort of thing!
 - and even if you did, you can't trust that state
    - because it could have been messed up for a while

What about master crashing?
 - if it is a long running job, the app developer should
   do their own checkpointing
 - if it isn't, we don't care!
 - and we can generally scale the machines used
   (at a place like google) to make individual runs pretty quick
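
A minimal sketch of what application-level checkpointing could look like (entirely an assumption, not Google's code; the file name and state format are made up):

import json, os

CHECKPOINT = "job.ckpt"   # hypothetical checkpoint file

def load_checkpoint():
    # on restart, resume from the last saved position instead of starting over
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"next_chunk": 0, "partials": {}}

def save_checkpoint(state):
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)   # atomic rename: never a half-written checkpoint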

MapReduce is everywhere now
 - sometimes too much, it can be used when the paradigm
   doesn't really fit the task
    - anything that has more data dependencies, i.e. depends on
      order of execution
    
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf
 - "dwarves" of parallel computation
 - they talk about what can be parallelized and what can't
 - most of what we talk about in this class is embarrassingly parallel
    - except for the parts that aren't

BigTable
 - "database" that makes use of parallel storage, i.e., GFS
 - overall very parallel, but certain parts are not
    - what is very centralized?
    - look at Figure 4: most data is in SSTables stored in GFS, but the location of the root tablet is kept in Chubby
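
A toy model of that lookup path (data and names made up; in the real system each step below is an RPC to a tablet server reading an SSTable in GFS, and only the first, tiny pointer lives in Chubby):

# three-level tablet location hierarchy, modeled as plain dicts
chubby = {"/bigtable/root-location": "root-tablet"}
tablets = {
    "root-tablet":       {("users", "row17"): "metadata-tablet-3"},  # root METADATA tablet
    "metadata-tablet-3": {("users", "row17"): "user-tablet-42"},     # second-level METADATA
}

def locate_tablet(table, row_key):
    root = chubby["/bigtable/root-location"]    # level 1: tiny pointer in Chubby
    meta = tablets[root][(table, row_key)]      # level 2: root METADATA tablet
    return tablets[meta][(table, row_key)]      # level 3: METADATA -> user tablet

print(locate_tablet("users", "row17"))   # -> user-tablet-42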

Chubby gives robust config information
 - the state of BigTable at a high level
 - has to be very centralized, very available

Note that Chubby *doesn't* scale well, it only scales a bit
 - like 5 machines
 - but it scales enough to do what it needs to
 - it is the part of the state of the system that
   you can NEVER allow to be inconsistent

Parallel systems are hard because they can get partitioned in various ways and state can become inconsistent
 - and communication to maintain consistency goes up exponentially with size if done naively

Chubby is a fully connected graph amongst its nodes
 - all messages are seen by all nodes

But efficient parallel systems can't be fully connected
 - way, way too much communication

How does BigTable handle mutable state?
 - can you update arbitrary entries?  not directly!
    - data is immutable
 - if you want to change something, you have to write it to a
   new table and eventually consolidate to get rid of old data
    - a bit like a log-structured filesystem
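
A toy sketch of that idea (not BigTable's actual design details): an "update" is just a newer immutable table, reads prefer the newest table, and consolidation merges everything so stale versions can finally be dropped.

class ImmutableStore:
    """Toy model: each 'table' is a frozen batch of rows; old ones are never modified."""
    def __init__(self):
        self.tables = []                    # oldest first

    def write_table(self, rows):
        self.tables.append(dict(rows))      # every batch of writes is a new immutable table

    def read(self, key):
        # newest table wins, like checking the newest SSTable first
        for table in reversed(self.tables):
            if key in table:
                return table[key]
        return None

    def consolidate(self):
        # "major compaction": merge into one table, dropping superseded versions
        merged = {}
        for table in self.tables:           # oldest first, so newer values overwrite
            merged.update(table)
        self.tables = [merged]

store = ImmutableStore()
store.write_table({"row1": "v1", "row2": "a"})
store.write_table({"row1": "v2"})            # the "update" is just a newer table
assert store.read("row1") == "v2"
store.consolidate()                          # old data for row1 is gone now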

immutability is a very powerful idea in distributed systems
 - if state can't change, you never have to communicate updates

Does this architecture have privacy implications?
 - deletes don't immediately delete data
   - modern solution: delete encryption key
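
One way the "delete the key" idea can be sketched (hypothetical record names; assumes the third-party cryptography package, not any particular system's implementation): give each record its own key, so deleting the key makes the ciphertext sitting in old immutable tables unrecoverable.

from cryptography.fernet import Fernet   # assumes the 'cryptography' package

records = {}   # record_id -> ciphertext (may live "forever" in immutable storage)
keys = {}      # record_id -> key (the only thing we ever actually delete)

def store(record_id, data: bytes):
    key = Fernet.generate_key()
    keys[record_id] = key
    records[record_id] = Fernet(key).encrypt(data)

def read(record_id) -> bytes:
    return Fernet(keys[record_id]).decrypt(records[record_id])

def delete(record_id):
    # "crypto-shredding": the ciphertext may persist in old tables and backups,
    # but without the key it is unrecoverable
    del keys[record_id]

store("user42", b"personal data")
assert read("user42") == b"personal data"
delete("user42")
# records["user42"] still exists, but can no longer be decrypted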