DistOS 2018F 2018-10-31

==Notes==
GFS: Why do they want to store stuff?
What is it being used for?
The search engine: a copy of the web for the crawler, plus the indices built from the crawler. That is what motivated the design: bring in lots of data and do something with it. The crawler goes out and downloads everything it can find. What kind of file system does that need? Lots of small files? No, that is not efficient. Instead, GFS had an atomic record append. The crawler finds things and assembles a bunch of pages into one data structure: a record, an application object describing what the crawler found. Atomic append: got it, saved, done. The atomicity matters because the file is not a byte stream; it is a sequence of application objects appended one after another.
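To make the record idea concrete, here is a minimal sketch in Python (with invented helper names; the real GFS interface is an internal C++ RPC API) of what "application objects, not a byte stream" means: each append is a self-delimiting record that a reader can walk later.

<syntaxhighlight lang="python">
# Sketch: records are length-prefixed application objects, not raw bytes.
import struct

def pack_record(payload: bytes) -> bytes:
    """Length-prefix a payload so it travels as one self-contained record."""
    return struct.pack(">I", len(payload)) + payload

def unpack_records(data: bytes):
    """Walk a chunk's bytes and yield the records appended into it."""
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        yield data[offset:offset + length]
        offset += length
</syntaxhighlight>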
If the crawler is chugging along, tries to do a write, and there is a problem with the write, it does it again, trying a few times. How many times did the record get written? At least once. Record append gives no guarantee against duplicates when you write something at the end, so the application level has to detect records that got written multiple times. That trade-off makes sense in the context of a very large web crawler: enormous files, each with a big slice of the internet in it. Large files informed the rest of the design too. What is the chunk size? 64 MB, because it was designed for a large web crawler that wants to save everything. Lots of files and lots of metadata? No, and that matters: they did a lot of work to minimize metadata by design choice. One master server stores all the metadata, and you want it to be fast; it cannot be the bottleneck, so it is loaded up with RAM and everything runs from memory: a metadata lookup at the master, then straight to the chunkserver.
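A hedged sketch of that at-least-once behaviour: here append_fn stands in for the file system's record-append call (it is not the real GFS API), the client retries on failure, and readers deduplicate by a record ID the application embeds itself.

<syntaxhighlight lang="python">
import uuid

def append_with_retry(append_fn, payload: bytes, max_tries: int = 5) -> str:
    record_id = uuid.uuid4().hex               # app-level ID used for dedup
    record = record_id.encode() + b"|" + payload
    for _ in range(max_tries):
        try:
            append_fn(record)                  # may succeed even if the ack is lost
            return record_id
        except IOError:
            continue                           # retry: record may now exist 0..N times
    raise IOError("append failed after retries")

def deduplicate(records):
    """Reader-side pass: keep only the first copy of each record ID."""
    seen = set()
    for rec in records:
        record_id, _, payload = rec.partition(b"|")
        if record_id not in seen:
            seen.add(record_id)
            yield payload
</syntaxhighlight>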
Ceph has an entire metadata cluster and allocates metadata dynamically; GFS has one master server with replicas, and that's it. Seems dumb, why? Because it is simpler, ridiculously simple. GFS is not a general-purpose file system and is not POSIX compatible; it was developed for a web crawler. 64 MB chunks and one metadata server, optimized for atomic append operations on objects that may be replicated. Why doesn't Google use GFS now? It was optimized for batch processing: build an index and make it available to the search engine. Search is now oriented completely towards fresh data. But GFS is an interesting design from the perspective of specialized storage, built for one specific application, with lots of trade-offs: high performance but limited functionality. If doing a read, what is the process? Ask the master first which chunks you are looking for; the master replies with the chunk info for that byte range; then the client pulls the data from the chunkservers as fast as it can digest it. The master server is not involved most of the time. The coarse-grained nature of the chunks (64 MB of data per small metadata exchange) gives an amplification factor: the leverage that lets the system scale.
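The read path in pseudocode, a sketch only (master.lookup and the chunkserver read call are invented names standing in for the GFS RPCs): one small metadata exchange per 64 MB chunk, then the bulk data flows without the master.

<syntaxhighlight lang="python">
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks keep master traffic coarse-grained

def read_range(master, chunkservers, path: str, offset: int, length: int) -> bytes:
    data = b""
    while length > 0:
        chunk_index = offset // CHUNK_SIZE
        handle, locations = master.lookup(path, chunk_index)  # tiny metadata RPC
        within = offset % CHUNK_SIZE
        n = min(length, CHUNK_SIZE - within)
        data += chunkservers[locations[0]].read(handle, within, n)  # bulk transfer
        offset += n
        length -= n
    return data
</syntaxhighlight>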
Writes are more complicated. Three is the magic number (3 replicas): send the data out, wait for acks, and possibly re-transmit until done. How big a buffer do you need? A good size, several GB of RAM. In the crawling application, lots of systems are all writing to the same file. The order in which their records will be appended? Who knows, but order doesn't matter, which is why it is a weird kind of file.
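A simplified sketch of that write path, assuming an invented replica.append RPC: the client buffers the record until every replica acks, and a single failure forces a full retry, which is why the buffer has to be generous.

<syntaxhighlight lang="python">
REPLICAS = 3   # the magic number from the lecture

def replicated_append(replicas, record: bytes, max_tries: int = 5) -> bool:
    for _ in range(max_tries):
        acks = 0
        for replica in replicas[:REPLICAS]:
            try:
                replica.append(record)   # stand-in for the chunkserver RPC
                acks += 1
            except IOError:
                break                    # one failed replica forces a full retry
        if acks == REPLICAS:
            return True                  # record durable on all replicas
    return False                         # give up; caller may re-queue the record
</syntaxhighlight>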
Chubby: what is Chubby? A file system, a hierarchical key-value store. Can you store big files? No, 256 KB, but it is a file system. Why not use a regular file system, why Chubby? Consistency: it is a consistent distributed file system, where reads and writes look the same at any time; readers and writers see the same view. The systems we looked at previously were not completely consistent. Chubby needs this because it is going to be used for locks. Lock files are fine on an individual system, but here we have a distributed file system that is completely consistent so it can be used for locking. That is not easy, kind of crazy. They use Paxos to give everyone the same state. How many failures can you tolerate? Paxos needs 2f+1 replicas to tolerate f failures, so if you want to tolerate two failures you should have five machines: a high level of reliability.
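The quorum arithmetic behind that five-machine figure, written out: Paxos needs a majority of replicas alive, so tolerating f crash failures takes 2f + 1 nodes.

<syntaxhighlight lang="python">
def replicas_needed(f: int) -> int:
    """Nodes required to tolerate f crash failures under Paxos."""
    return 2 * f + 1

def failures_tolerated(n: int) -> int:
    """Crash failures an n-node cell survives while keeping a majority."""
    return (n - 1) // 2

assert replicas_needed(2) == 5       # tolerate 2 failures -> 5-node Chubby cell
assert failures_tolerated(5) == 2
</syntaxhighlight>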
When GFS wants to elect a master for a cell, clients go to Chubby, find out who the master is, and start talking to that master. If the master is down or not responding, ask Chubby, which elects a new one. Chubby is the reliable backbone for deciding who is in charge: when a master fails, switch over and transmit the new state to everyone quickly. Chubby is how it was done.
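A sketch of election-by-lock, with a made-up Chubby client API (open, try_acquire, read, and write are illustrative names, not Chubby's actual calls): every candidate races for the same small lock file, and the winner writes its identity into the file for everyone else to read.

<syntaxhighlight lang="python">
def elect_master(chubby, my_address: str) -> str:
    lock = chubby.open("/ls/cell/gfs-master")    # hypothetical lock-file path
    if lock.try_acquire():
        lock.write(my_address)                   # winner advertises itself
        return my_address
    return lock.read()                           # lost the race: use the winner
</syntaxhighlight>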
Paxos does not scale. Five servers, go to 50? No: it is a chatty protocol. For the things you truly need consistency for, you can get it, but you must pay, so the rest of the system has to be less consistent. If you force consistency on everything all the time, you never get performance. Paxos is the algorithm Chubby uses to coordinate state in the file system; it is about managing uncertainty when everyone needs the same state. Think of Chubby as a replicated data structure with Paxos ordering the updates to it: you could ask any of the nodes for the current value and get the same answer from all of them. There might be a delay, a "don't know yet", but when it gives an answer, it is the correct answer.
Sequencer in Chubby: consensus on the ordering of states. Who goes first, second, third, and fourth? If you enforce ordering, you pay a price. You can do this with a lock generation number. Ordering, temporal ordering, is more difficult than consensus on state.
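A toy version of a lock generation number, to show the mechanism (an illustration of the idea, not Chubby's implementation): each acquisition bumps the generation, and servers reject requests stamped with a stale one.

<syntaxhighlight lang="python">
class SequencedLock:
    """Toy lock whose generation number orders its successive holders."""
    def __init__(self):
        self.generation = 0
        self.held = False

    def acquire(self) -> int:
        assert not self.held
        self.held = True
        self.generation += 1
        return self.generation          # the "sequencer" the holder passes around

    def release(self):
        self.held = False

def check_sequencer(lock: SequencedLock, generation: int) -> bool:
    """A server validates a request was issued under the current lock epoch."""
    return lock.held and generation == lock.generation
</syntaxhighlight>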
Hadoop and ZooKeeper are the things people built after Google published: a whole set of technologies, not identical to GFS and Chubby but inspired by them.
How does Hadoop compare to AWS? Amazon has its own version of this stuff. You can go to Amazon, get a bunch of VMs, and deploy Hadoop yourself; what AWS offers is to do it for you. What they offer as the cheapest storage is S3.
