Notes
Lecture 15
----------
- experiences
- proposal
- midterm update
- participation
Spanner
- a big, distributed (semi-)relational database
- very consistent
  - supports SQL
- all of the query parts
- management, maybe not so much?
- big deal, because of usability
- developers know SQL
- want transactions, helpful for consistency
across tables
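To see what a cross-table transaction buys you, here's a minimal sketch using SQLite from the Python standard library as a stand-in (Spanner's actual API, SQL dialect, and distribution machinery are very different; the `accounts`/`ledger` tables are made up for illustration):

```python
# Sketch: both tables change atomically, or neither does.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
    CREATE TABLE ledger   (account_id INTEGER, delta INTEGER);
    INSERT INTO accounts VALUES (1, 100), (2, 50);
""")

def transfer(conn, src, dst, amount):
    # "with conn" opens a transaction; it commits on success and
    # rolls back every statement if anything raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.execute("INSERT INTO ledger VALUES (?, ?), (?, ?)",
                     (src, -amount, dst, amount))

transfer(conn, 1, 2, 30)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The point is the guarantee, not the engine: developers get to reason about multi-table updates as a single all-or-nothing step, which is exactly what NoSQL stores typically gave up.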
in distributed systems, we're always making tradeoffs
between functionality, scalability, and complexity
- normally we just think about functionality vs scalability
(SQL vs NoSQL)
- but add complexity and you can get functionality & scalability at the same time
Spanner is proprietary to Google, others have
made their own versions (CockroachDB)
Tradeoff also shows up in TensorFlow
- for "machine learning"
- what is it really for?
- working with n-dimensional arrays (i.e. tensors)
- and we can do neural networks if we can do fast
tensor processing
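Concretely, a 2-D tensor is just a matrix, and the core operation is things like matrix multiply. A pure-Python sketch of the math (TensorFlow's whole job is dispatching this kind of operation to fast hardware kernels; nested lists here are just for illustration):

```python
# Multiply a (rows x inner) matrix by an (inner x cols) matrix.
def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

# (2x3) @ (3x2) -> (2x2)
out = matmul([[1, 2, 3],
              [4, 5, 6]],
             [[7, 8],
              [9, 10],
              [11, 12]])
```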
Is this just the same thing as MapReduce?
- what's different?
  - not embarrassingly parallel!
- have to communicate between tasks as they run,
not just at the end (i.e., during reduce)
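A toy sketch of why training isn't embarrassingly parallel: on every step, each worker computes a gradient on its data shard, and then all workers must exchange (here: average) those gradients before anyone can take the next step. The loss and shard values are made up; the averaging stands in for the per-step communication that MapReduce only does once, during reduce.

```python
# Data-parallel gradient descent on loss (w - x)^2, one shard per worker.
def train(shards, w=0.0, lr=0.1, steps=20):
    for _ in range(steps):
        # each "worker" computes its local gradient
        grads = [2 * (w - x) for x in shards]
        # communication barrier: all-reduce before ANY worker continues
        w -= lr * sum(grads) / len(grads)
    return w

w = train([1.0, 2.0, 3.0])  # converges toward the shard mean, 2.0
```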
Modern machine learning is based on large, mutable models
- MANY parameters (weights in the neural network)
Basic idea of a neural network
- input, hidden, and output nodes
- input nodes are connected to layers of hidden nodes
- hidden nodes are connected to output nodes
- weights on connections between nodes determine
how values are transformed as they propagate along connections between nodes
So here, take an input tensor, transform it a bunch of times until you get an output tensor
All "deep" learning means is that it's a neural network
with many, many layers of hidden nodes
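Putting the pieces above together, here's a forward pass through a tiny "deep" network: the input vector goes through several hidden layers, and the weights on each layer's connections determine how values are transformed as they propagate. The weights and sizes are made up for illustration (real networks have vastly more parameters, and trained rather than hand-picked weights):

```python
# One hidden-layer transformation: each output node is a weighted sum
# of the inputs, followed by a ReLU nonlinearity.
def relu(v):
    return [max(0.0, x) for x in v]

def layer(weights, inputs):
    # one output per row of weights
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def forward(layers, inputs):
    for weights in layers[:-1]:
        inputs = relu(layer(weights, inputs))  # hidden layers
    return layer(layers[-1], inputs)           # output layer (no ReLU)

net = [
    [[1.0, -1.0], [0.5, 0.5]],   # hidden layer 1: 2 -> 2
    [[1.0, 1.0], [-1.0, 2.0]],   # hidden layer 2: 2 -> 2
    [[1.0, 1.0]],                # output layer:   2 -> 1
]
out = forward(net, [2.0, 1.0])
```

Adding "depth" is just appending more weight matrices to `net`; the per-layer computation never changes, which is why fast tensor processing is all you need.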
The cool part about TensorFlow is you don't have to care about the hardware
- your data model can be efficiently mapped onto a wide
variety of architectures
- big change from past efforts in supercomputing