DistOS 2018F 2018-11-07: Difference between revisions
|  Created page with "==Readings==  * [http://research.google.com/archive/mapreduce.html Dean & Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters" (OSDI 2004)] * [https://www.useni..." | |||
| Line 5: | Line 5: | ||
| ==Notes== | ==Notes== | ||
| In-Class Lecture notes: | |||
| MapReduce: | |||
| Parallel computations | |||
| Some things are inherently serial so, with computation, some things will take the same time regardless of cores or processors. | |||
| Palatalization hard, some amount of coordination between the systems to allow them to work together. | |||
| Systems that scale to large number of cores, systems. The ones that minimize coordination succeed.  | |||
| For instance, GFS, they did a lot of things to reduce coordination to allow it to scale....does cause a bit of a mess but to clean up the mess would slow things down. | |||
| Models of computation that take advantage of these systems, MapReduce is the cleanest analysis. What sort of analysis do they talk about doing. Indexing, counting across things, grep....a search engine is just a big grep.  | |||
| Why not run grep individually on all the computers, why do you need the framework? | |||
| Coordinating and consolidating the results of the machine. All MapReduce is, pieces of data computations done and then combine together. | |||
| Function programming aspect....stateless...don’t maintain state. The state of variables does not change (only binding can change). In a parallel system, coordinating maintaining state. If no state, don’t need to coordinate. Map & Reduce...if stateful, might have to undo computations if they mess up, side-affects. Could not run the code over and over again on the same data. If made purely functional, the answer will be the same no-matter how many times it has run. Fault tolerance. Duplicating work to make sure the overall computation finishes on time. Run multiple times to ensure the same answers. Aspects of fault tolerance show up when not doing functional programming. 10000 machines to run a job, computers can fail during computation and you don’t care b/c we don’t care about maintaining state....master server gathers together, combining function to get results. When you do a search, everything you get has been pre-computed. | |||
| MapReduce is an ad-hoc kinda that is not used anymore due to fundamental limitations but is the correct paradigm. Limitation: problem has to fit into the map and reduce...only coordination on the reduce side, no coordination on the map side. TensorFlow is not embarrassingly parallel at all. Have to worry about interactions. What is the model to break them up. Unit is a Tensor. A multidimensional array. Lot of mathematics that are defined on multidimensional arrays. TensorFlow will breakup computation into tensors. Functional programming in a sense but in a different context. Doing mathematics, if can define the math at a high level, the underlying system can refactor it at a high level. How it fits together, not our problem....divided up into Tensors and math ops on Tensors and parallelism. | |||
| AI is primary task for TensorFlow. Why not do other things like this? Large-scale simulations, other use case for clusters. Are maintaining large amount of state, in those systems, do partitioning based on state.  | |||
| Game world...game parallelization. Divide the world into different places, going from one world to another could be going from one server to another.  | |||
| Why do we need insane computations for doing AI....optimizing it and AI algorithms today are very stupid. Need lots and lots of data with lots and lots of samples. The learning is not advanced the pattern abstraction is almost brute force. How many miles AI driven cars have driven...to train a single model. How many miles does a person have to drive to learn how to drive.....driving is a task based on a world model we have been building our whole lives. Self driving cars, have to teach them how the world works...don’t have a world model. Can you learn how the world works from driving a car.  | |||
| Check-Pointing in MapReduce...why not?....failures that happen during the computation don’t matter, just restart and do that part....but TensorFlow cares a lot more about reliability thing.  | |||
| The master, how long does a MapReduce job run? Not long, a few minutes to an hour tops. Parallelism has made everything quick. What kind of neural net is TensorFlow training? Facial recognition, recommendations, translations...models running for a long time with TBs of input for days or weeks...lots of state being created of the entire neural networks and save along the line so implemented check-pointing. What do you care about saving? b/c saving all the state is expensive...save every hour and results  that are good when neural network is doing well.  | |||
| Genetic algorithms...the next wave in AI.  | |||
| Fitness function, some layers are garbage, some have good fitness and then combine them together (using a operation called crossover)...mutation, flip some bits to reproduce solutions and then do it again...recompute fitness...an abstraction of natural evolution. | |||
Revision as of 02:56, 9 November 2018
Readings
- Dean & Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters" (OSDI 2004)
- Martin Abadi et al., "TensorFlow: A System for Large-Scale Machine Learning" (OSDI 2016)
Notes
In-Class Lecture notes:
MapReduce:
Parallel computations Some things are inherently serial so, with computation, some things will take the same time regardless of cores or processors. Palatalization hard, some amount of coordination between the systems to allow them to work together. Systems that scale to large number of cores, systems. The ones that minimize coordination succeed. For instance, GFS, they did a lot of things to reduce coordination to allow it to scale....does cause a bit of a mess but to clean up the mess would slow things down. Models of computation that take advantage of these systems, MapReduce is the cleanest analysis. What sort of analysis do they talk about doing. Indexing, counting across things, grep....a search engine is just a big grep. Why not run grep individually on all the computers, why do you need the framework? Coordinating and consolidating the results of the machine. All MapReduce is, pieces of data computations done and then combine together. Function programming aspect....stateless...don’t maintain state. The state of variables does not change (only binding can change). In a parallel system, coordinating maintaining state. If no state, don’t need to coordinate. Map & Reduce...if stateful, might have to undo computations if they mess up, side-affects. Could not run the code over and over again on the same data. If made purely functional, the answer will be the same no-matter how many times it has run. Fault tolerance. Duplicating work to make sure the overall computation finishes on time. Run multiple times to ensure the same answers. Aspects of fault tolerance show up when not doing functional programming. 10000 machines to run a job, computers can fail during computation and you don’t care b/c we don’t care about maintaining state....master server gathers together, combining function to get results. When you do a search, everything you get has been pre-computed.
MapReduce is an ad-hoc kinda that is not used anymore due to fundamental limitations but is the correct paradigm. Limitation: problem has to fit into the map and reduce...only coordination on the reduce side, no coordination on the map side. TensorFlow is not embarrassingly parallel at all. Have to worry about interactions. What is the model to break them up. Unit is a Tensor. A multidimensional array. Lot of mathematics that are defined on multidimensional arrays. TensorFlow will breakup computation into tensors. Functional programming in a sense but in a different context. Doing mathematics, if can define the math at a high level, the underlying system can refactor it at a high level. How it fits together, not our problem....divided up into Tensors and math ops on Tensors and parallelism.
AI is primary task for TensorFlow. Why not do other things like this? Large-scale simulations, other use case for clusters. Are maintaining large amount of state, in those systems, do partitioning based on state. Game world...game parallelization. Divide the world into different places, going from one world to another could be going from one server to another.
Why do we need insane computations for doing AI....optimizing it and AI algorithms today are very stupid. Need lots and lots of data with lots and lots of samples. The learning is not advanced the pattern abstraction is almost brute force. How many miles AI driven cars have driven...to train a single model. How many miles does a person have to drive to learn how to drive.....driving is a task based on a world model we have been building our whole lives. Self driving cars, have to teach them how the world works...don’t have a world model. Can you learn how the world works from driving a car.
Check-Pointing in MapReduce...why not?....failures that happen during the computation don’t matter, just restart and do that part....but TensorFlow cares a lot more about reliability thing. The master, how long does a MapReduce job run? Not long, a few minutes to an hour tops. Parallelism has made everything quick. What kind of neural net is TensorFlow training? Facial recognition, recommendations, translations...models running for a long time with TBs of input for days or weeks...lots of state being created of the entire neural networks and save along the line so implemented check-pointing. What do you care about saving? b/c saving all the state is expensive...save every hour and results that are good when neural network is doing well.
Genetic algorithms...the next wave in AI.
Fitness function, some layers are garbage, some have good fitness and then combine them together (using a operation called crossover)...mutation, flip some bits to reproduce solutions and then do it again...recompute fitness...an abstraction of natural evolution.