DistOS 2018F 2018-11-07

Readings

Dean & Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters" (OSDI 2004)
Martin Abadi et al., "TensorFlow: A System for Large-Scale Machine Learning" (OSDI 2016)

Notes

In-Class Lecture notes:

MapReduce:

Parallel computation: some things are inherently serial, so some computations take the same time regardless of how many cores or processors you have. Parallelization is hard because it takes some amount of coordination between systems for them to work together. Among systems that scale to large numbers of cores or machines, the ones that minimize coordination succeed. GFS, for instance, did a lot of things to reduce coordination so that it could scale. That does cause a bit of a mess, but cleaning up the mess would slow things down.

Of the models of computation that take advantage of these systems, MapReduce is the cleanest. What sort of analysis do they talk about doing? Indexing, counting across things, grep. A search engine is just a big grep. Why not run grep individually on all the computers; why do you need the framework? For coordinating and consolidating the results from the machines. That is all MapReduce is: computations are done on pieces of data and the results are then combined.

Functional programming aspect: stateless, no state is maintained. The value of a variable does not change (only the binding can change). In a parallel system, maintaining state requires coordination; if there is no state, you don't need to coordinate. If map and reduce were stateful, they would have side effects, you might have to undo computations when something goes wrong, and you could not run the code over and over again on the same data. If they are purely functional, the answer is the same no matter how many times the code runs.

Fault tolerance: work is duplicated to make sure the overall computation finishes on time, and rerunning a task gives the same answer. Fault tolerance becomes a real concern when you are not doing functional programming. With 10,000 machines running a job, computers can fail during the computation and we don't care, because no state is being maintained. The master server gathers the pieces together and applies the combining function to get the results. When you do a search, everything you get has been pre-computed.
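
The map-then-combine idea can be sketched in a few lines of Python. This is a single-process toy, not the system from the paper: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names chosen for illustration, and the "machines" are just dictionary entries. The point is that both functions are pure, so any map call could be re-run (for example after a machine failure) and give the same result.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    """Pure map step: emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in text.split()]

def reduce_fn(word, counts):
    """Pure reduce step: combine all counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(documents):
    """Toy driver playing the role of the master: map, group by key, reduce."""
    grouped = defaultdict(list)
    for doc_name, text in documents.items():
        # Each map call depends only on its input, so it could run on any
        # machine, or be re-run after a failure, with the same result.
        for key, value in map_fn(doc_name, text):
            grouped[key].append(value)
    return dict(reduce_fn(key, values) for key, values in grouped.items())

documents = {
    "page1": "Argentina won and Argentina celebrated",
    "page2": "a trip to argentina",
}
result = run_mapreduce(documents)
print(result["argentina"])
```

The only place where results from different documents meet is the grouping step before reduce, which is where the coordination lives.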

MapReduce is a somewhat ad hoc system that is not really used anymore because of its fundamental limitations, but it is the correct paradigm. Limitation: the problem has to fit into map and reduce. Coordination happens only on the reduce side; there is no coordination on the map side. TensorFlow is not embarrassingly parallel at all, so you have to worry about interactions. What is the model for breaking the work up? The unit is a tensor, a multidimensional array. A lot of mathematics is defined on multidimensional arrays, and TensorFlow breaks a computation up into operations on tensors. It is functional programming in a sense, but in a different context: if you can define the math at a high level, the underlying system can refactor it. How it all fits together is not our problem. The work is divided into tensors and math ops on tensors, and the system handles the parallelism.
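
As a rough illustration of refactoring math defined at a high level, the sketch below uses plain Python lists rather than any TensorFlow API. It relies on one simple mathematical equivalence: multiplying a matrix in independent row blocks and stacking the partial results gives the same answer as multiplying it whole. The names `matmul` and `matmul_by_row_blocks` are hypothetical, and real systems choose their partitioning very differently.

```python
def matmul(a, b):
    """Plain matrix product of two 2-D lists (rank-2 tensors)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def matmul_by_row_blocks(a, b, block_rows):
    """Split `a` into row blocks, multiply each block independently, and
    stack the partial results. Each block could go to a different worker."""
    result = []
    for start in range(0, len(a), block_rows):
        block = a[start:start + block_rows]
        result.extend(matmul(block, b))
    return result

a = [[1, 2], [3, 4], [5, 6], [7, 8]]
b = [[1, 0], [2, 1]]
whole = matmul(a, b)
pieces = matmul_by_row_blocks(a, b, block_rows=2)
print(whole == pieces)
```

Unlike the map side of MapReduce, most tensor operations do not decompose this cleanly, which is why the system has to worry about interactions between the pieces.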

AI is the primary task for TensorFlow. Why not do other things this way? Large-scale simulations are the other big use case for clusters. Those systems maintain a large amount of state, so they partition the work based on that state. Game worlds are an example of this kind of parallelization: divide the world into different places, so that going from one place to another can mean going from one server to another.
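
A minimal sketch of that kind of partitioning, assuming a 2-D world cut into a fixed grid with one server per cell. The cell size, grid width, and the names `server_for` and `move` are all made up for illustration, and positions are assumed to stay inside the world bounds.

```python
def server_for(x, y, cell_size=100, grid_width=4):
    """Map a world position to the index of the server owning that grid cell."""
    cell_x = int(x // cell_size)
    cell_y = int(y // cell_size)
    return cell_y * grid_width + cell_x

def move(player, dx, dy):
    """Move a player and report (old server, new server, crossed a boundary?)."""
    old_server = server_for(player["x"], player["y"])
    player["x"] += dx
    player["y"] += dy
    new_server = server_for(player["x"], player["y"])
    return old_server, new_server, old_server != new_server

player = {"x": 95, "y": 10}
print(move(player, 10, 0))
```

When the last element of the returned tuple is true, the player's state would have to be handed from one server to the other, which is exactly the coordination cost that stateless MapReduce avoids.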

Why do we need insane amounts of computation to do AI? It is optimization, and AI algorithms today are very stupid. They need lots and lots of data with lots and lots of samples. The learning is not advanced; the pattern abstraction is almost brute force. Consider how many miles AI-driven cars have driven to train a single model, and how many miles a person has to drive to learn how to drive. For us, driving is a task built on a world model we have been building our whole lives. Self-driving cars don't have a world model, so you have to teach them how the world works. Can you learn how the world works from driving a car?

Checkpointing in MapReduce: why not? Failures that happen during the computation don't matter; just restart and redo that part. TensorFlow cares a lot more about reliability. What about the master? How long does a MapReduce job run? Not long, a few minutes to an hour tops; parallelism has made everything quick. What kind of neural nets is TensorFlow training? Facial recognition, recommendations, translation: models that train for days or weeks on TBs of input. A lot of state is created for the entire neural network and needs to be saved along the way, so they implemented checkpointing. What do you care about saving? Saving all the state is expensive, so save, say, every hour, and save results that are good, when the neural network is doing well.
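
A minimal sketch of application-controlled periodic checkpointing, using only the Python standard library rather than TensorFlow's own checkpoint mechanism. The "training step" is a stand-in that just adds 0.5 to a weight, and `save_checkpoint`, `load_checkpoint`, and `train` are hypothetical names. The first call simulates a job that dies after step 7; the second resumes from the last checkpoint instead of from scratch.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write to a temp file and rename, so a crash never leaves a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)

def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "weight": 0.0}

def train(path, total_steps, checkpoint_every):
    """Resume from the latest checkpoint and train up to total_steps."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        state["weight"] += 0.5          # stand-in for one real training step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state

checkpoint_path = os.path.join(tempfile.mkdtemp(), "model.json")
train(checkpoint_path, total_steps=7, checkpoint_every=3)    # job "dies" after step 7
resumed = load_checkpoint(checkpoint_path)                    # last save was at step 6
final = train(checkpoint_path, total_steps=10, checkpoint_every=3)
print(resumed["step"], final["step"])
```

The `checkpoint_every` parameter is the trade-off from the notes: saving more often loses less work on failure but costs more, since all the state has to be written each time.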

Genetic algorithms: presented as the next wave in AI.

Start with a fitness function: some candidate solutions are garbage and some have good fitness. Combine the good ones together (using an operation called crossover), then apply mutation (flip some bits) to produce new solutions, recompute fitness, and do it again. It is an abstraction of natural evolution.
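
The loop described above can be sketched as follows. This is a toy under stated assumptions, not a method from either paper: the fitness function simply counts 1 bits, selection keeps the fitter half of the population unchanged, and all names and parameter values (`evolve`, `rate=0.05`, the population size, the seed) are picked for illustration.

```python
import random

def fitness(bits):
    """Toy fitness function: the number of 1 bits in the candidate."""
    return sum(bits)

def crossover(parent_a, parent_b, rng):
    """Single-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def mutate(bits, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]

def evolve(generations=30, pop_size=20, length=16, seed=0):
    """Run the select/crossover/mutate loop; return best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: pop_size // 2]       # discard the low-fitness half
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pop_size - len(survivors))
        ]
        population = survivors + children             # survivors are kept unchanged
    return best_per_generation

history = evolve()
print(history[0], history[-1])
```

Because the survivors are carried over unchanged, the best fitness in this sketch can never go down from one generation to the next; evaluating fitness for each candidate is independent work, which is what makes this style of search easy to spread over many machines.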

More notes

Systems that scale well minimize coordination. Recall that lots of coordination means lots of serialization and less performance.
- In GFS, for example, we don't know how many times a record was appended.
MapReduce is for embarrassingly parallel problems.
We are interested in MODELS OF COMPUTATION that take advantage of massive parallelization.
MapReduce is for indexing and for counting. A search engine is like a big grep, so why not just use grep? Because you need coordination to take advantage of thousands of computers.
Functional programming is stateless. If x=5, then x always equals 5; you can only create a new x. We always pay for coordinating state.
Because MapReduce has no state, you don't have to worry about side effects. If a machine flakes, just do the work on another one; there is no state to manage. Inherent fault tolerance!
Larger scale programming will probably be done with functional programming. MapReduce allows you to run a job on thousands of computers.
Example: count the number of times "argentina" occurs in a lot of pages; voila, it appears 50 million times. But when you query Google you are not starting a MapReduce job; everything has been precomputed.
MapReduce has fundamental limitations: the problem has to fit into the MapReduce paradigm. You can't do computations that relate one map task to another.
- TensorFlow allows you to do this; TensorFlow is NOT for embarrassingly parallel computation at all.
The fundamental data type in TensorFlow is the tensor, a multidimensional array; a matrix is the 2D case.
Tensors can be enormous, so TensorFlow will break a tensor up, compute on the pieces, and combine the results; it can do this through mathematical equivalences.
A lot of AI can be represented as tensor operations. Large-scale simulations do not map nicely to tensors. In large-scale computations you need state, and you need some partitioning based on space. E.g., in a weather simulation you divide the space into cubes of 1 km or something like that. Same thing in a game world, where you go from one server to another when you change place.
AI algorithms today are stupid and require lots of training; the learning is not sophisticated. Think about how many miles AI cars have driven. People only have to drive a few. Humans have been building a world model since childhood; learning to drive only adds slightly to it. You cannot learn the way the world works by driving a car. A toddler learns physics as they grow up. So you need a lot of data and computation.
No checkpoint in MapReduce since you can just redo it and jobs are short.
In TensorFlow you need checkpoints. Models in TensorFlow train for days, with lots of state being created. Checkpoints are implemented under the control of the application: you don't want to save too often, and you just keep the neural nets that do well.
Anil believes that future of AI is in Genetic algorithms.
Neural networks:
- In neural networks you have a lot of weights, and you backpropagate to adjust weights.
- Deep learning means neural networks with lots of layers.
- Need to encode data into the input layers.
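
To make "adjust weights" concrete, here is a minimal sketch of the update rule for a single linear neuron trained by gradient descent on squared error. It is the degenerate one-layer case only: real backpropagation applies the chain rule to pass the error back through many layers. The function name, learning rate, epoch count, and data are assumptions chosen for illustration.

```python
def train_neuron(samples, learning_rate=0.1, epochs=200):
    """Fit y = w * x + b by gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            prediction = w * x + b
            error = prediction - target
            # Gradients of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]    # generated from y = 2x + 1
w, b = train_neuron(samples)
print(round(w, 2), round(b, 2))
```

Even this tiny model takes hundreds of passes over three samples to recover two numbers, which hints at why networks with many weights need so much data and computation.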
A genetic algorithm starts with a random population and a fitness function. Successful individuals get combined together, with mutation applied to the result, and then you refine further over more generations.
Most of AI is searching through a space.