DistOS 2023W 2023-04-03
Latest revision as of 17:52, 3 April 2023
Notes
Spanner & TensorFlow
--------------------

Last two papers!

April 5th - class wrap-up discussion, exam review
April 10 - project presentations

Spanner - big, distributed SQL database (mostly)
 - at Google
 - compare with Bigtable, Dynamo (NoSQL systems)
   - what is the difference in functionality?
   - why does it matter?
 - HOW?! what is the "neat trick"?
   - has to do with time, but why?
 - to what degree is Spanner a full relational database, like Postgres?

TensorFlow
 - how does it compare with MapReduce?
 - what type of operations does it handle?
 - why are these operations important?
 - how are they done in a scalable fashion?
 - why isn't this more general?

AFTER DISCUSSION

Only presentations on April 10th, no class on April 12th

If you wish to volunteer to present on April 5th, please PM me
 - time left for speakers on the 10th will depend on how many volunteer on the 5th
 - may have "lightning talks" of 3-4 min

Thoughts on Spanner?

So why Spanner?
 - wanted to support more complex queries (i.e., ones involving multiple tables)
 - programmers are used to using SQL
 - and transactions
   - great for updates without corrupting data (i.e., no partial, incomplete updates)

SQL is the obvious choice for making a database
 - but earlier systems compromised on it to support scalability
 - how does Spanner do SQL & scalability?

TrueTime is just a way to get really accurate time, with a known bound on the error
 - generally from GPS (the paper also uses atomic clocks)

GPS turns time information into location
 - only works with super accurate clocks
 - GPS satellites are just atomic clocks that broadcast their time

So any computer with a GPS receiver has access to a very accurate clock

So put them in servers, they have accurate time, and now we can get an absolute ordering of events

Normally you can't use local clocks to determine relative order of events because of clock skew.
 - but with really accurate clocks (and a known bound on the error), you can!
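
A minimal sketch of why a known bound on clock error is enough to order events. The idea of time as an interval, plus waiting out the uncertainty before a commit becomes visible ("commit wait"), is from the Spanner paper. Everything else here is an assumption for illustration: the names TTInterval, tt_now, commit and definitely_before are hypothetical, and the EPSILON value is made up.

```python
import time
from dataclasses import dataclass

# Assumed worst-case clock error in seconds (illustrative number only).
EPSILON = 0.005


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


def tt_now(clock=time.time, epsilon=EPSILON):
    """Return an interval assumed to contain the true current time."""
    t = clock()
    return TTInterval(t - epsilon, t + epsilon)


def definitely_before(a, b):
    """True only when the two intervals do not overlap and a comes first."""
    return a.latest < b.earliest


def commit(clock=time.time, epsilon=EPSILON, sleep=time.sleep):
    """Pick a commit timestamp, then wait until it is certainly in the past."""
    ts = tt_now(clock, epsilon).latest
    # Commit wait: block until even the earliest possible "now" is after ts.
    while tt_now(clock, epsilon).earliest <= ts:
        sleep(epsilon / 10)
    return ts


# Two commits made one after the other always get increasing timestamps,
# even though neither one knows the exact time.
first = commit()
second = commit()
print(first < second)
```

The cost of the trick is the wait itself: each commit pauses for roughly twice the error bound, so a smaller bound means faster commits.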

Traditionally databases were the hardest part of a web app to scale
 - but now we really can scale them as far as we want

TensorFlow

Parallel computing is hard
 - especially when communication is slow/expensive

Distributed OS is largely trying to hide the complexity of parallel computing
from apps
 - provide the right abstractions

POSIX files and processes aren't the right abstractions

But append-only files, containers, and specialized computation abstractions are
 - MapReduce, BOINC were our first examples, required embarrassingly parallel problems
 - TensorFlow is an effort to parallelize machine learning apps in a general way
   - abstraction: dataflow

So why TensorFlow?
 - dataflow with tensors
 - tensors are just multidimensional arrays
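
A toy sketch of "dataflow with tensors", assuming tensors are plain nested lists. This is not TensorFlow's API: Node, run, matmul, add and relu are hypothetical names for illustration. The point is only that the computation is described as a graph of operations first and evaluated afterwards, so a runtime is free to place independent nodes on different machines.

```python
from dataclasses import dataclass


def matmul(a, b):
    """Multiply two 2-D tensors stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Add two 2-D tensors of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(a):
    """Replace negative entries with zero."""
    return [[max(0, x) for x in row] for row in a]


@dataclass
class Node:
    name: str
    op: object = None   # a callable, or None for an input node
    inputs: tuple = ()  # names of the nodes this one depends on


def run(graph, fetch, feeds):
    """Evaluate node `fetch`, computing each dependency at most once."""
    nodes = {n.name: n for n in graph}
    cache = dict(feeds)

    def value(name):
        if name not in cache:
            node = nodes[name]
            if node.op is None:
                raise KeyError(f"no value fed for input node {name!r}")
            cache[name] = node.op(*(value(i) for i in node.inputs))
        return cache[name]

    return value(fetch)


# y = relu(x @ w + b), written as a graph rather than as direct calls.
graph = [
    Node("x"), Node("w"), Node("b"),
    Node("xw", matmul, ("x", "w")),
    Node("z", add, ("xw", "b")),
    Node("y", relu, ("z",)),
]
result = run(graph, "y", {"x": [[1, 2]], "w": [[1, -1], [0, 1]], "b": [[0, -5]]})
print(result)
```

Unlike MapReduce, the graph can have any shape, not just one map stage followed by one reduce stage.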