DistOS 2018F 2018-11-14
Latest revision as of 18:54, 18 November 2018
Readings
- Beaver et al., "Finding a needle in Haystack: Facebook’s photo storage" (OSDI 2010)
- Muralidhar et al., "f4: Facebook's Warm BLOB Storage System" (OSDI 2014)
Notes
Lecture Nov 14: Haystack & F4
CDN (not the Canadian Dairy Network): talking about Haystack means talking about CDNs. A CDN is a Content Distribution Network, a large-scale cache for data. The idea is to have servers replicated around the world, close to people and holding the things they want, so that instead of going to the main server a client goes to a local server and gets a copy from there. Akamai, started by folks from MIT, was the pioneer; its models became the content distribution networks that large-scale systems now depend on. A CDN reduces the latency of page loads.
CDNs do not serve an entire website. They are bad at serving dynamic content such as email. A web app is not in a CDN because no one else should be asking for the same email, so it should not be replicated. When what you see is specific to you, a CDN makes no sense; it only pays off when the same content is served across many page views. Code on the client has to fetch custom data, and that is where a CDN does not work. See Figure 3: the browser goes both to the CDN and to Haystack storage.
Photos are the problem. The original solution was NFS, which performed badly because of too many disk accesses. The cause was metadata: having to read the inode and then the actual contents of the file was too much. A normal file system has to separate metadata from data because metadata has different access patterns, but that did not make sense for Facebook, which did not want separate reads for each. Why can they get away with merging metadata with data in the needle (Figure 5), where the two are intertwined? Because the data is immutable. The game changes for both metadata and data when they do not change. Find the photo and everything about it is in one place: keep track of the headers, then read everything in one go. There is no pointer to an inode; all that is needed is which of a set of big files holds the photo, and the photo's offset within it. That reduces file operations and so increases performance.
The key insight is simply that the data is immutable. Storing metadata and data together gives a fast access pattern.
Having the photo name gets you to the data quickly. The data has to be immutable. But fast is no good if it is not durable: protection and redundancy are needed. Don't store every photo once, store it multiple times. The indexes are kept in memory.
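The needle idea above can be sketched as follows. This is a minimal illustration, not Haystack's actual on-disk format: the `Volume` class, the 12-byte header layout, and the use of an in-memory `BytesIO` as the backing file are all assumptions made for the example (the real needle in Figure 5 of the paper carries more fields).

```python
import io
import struct

# Hypothetical needle header: 8-byte photo id, 4-byte payload length.
HEADER = struct.Struct("<QI")


class Volume:
    """Many photos appended to one large file, located via an in-memory index."""

    def __init__(self, backing_file):
        self.f = backing_file
        self.index = {}  # photo_id -> offset of its needle in the file

    def append(self, photo_id, data):
        offset = self.f.seek(0, io.SEEK_END)
        self.f.write(HEADER.pack(photo_id, len(data)))  # metadata...
        self.f.write(data)                              # ...right next to data
        self.index[photo_id] = offset  # photos are immutable, never rewritten
        return offset

    def read(self, photo_id):
        # The in-memory lookup replaces the inode lookup; header and photo
        # are then read from one place in the file.
        self.f.seek(self.index[photo_id])
        stored_id, size = HEADER.unpack(self.f.read(HEADER.size))
        assert stored_id == photo_id
        return self.f.read(size)


volume = Volume(io.BytesIO())
volume.append(1, b"cat.jpg bytes")
volume.append(2, b"dog.jpg bytes")
print(volume.read(2))
```

Because a needle is never modified after it is written, the index entry stays valid for the life of the volume, which is what makes the single-offset lookup safe.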
f4 is built on Hadoop (HDFS), and Haystack uses XFS.
With the index in memory, the disk only has to be touched to read the photo itself, which is faster. At this scale solid-state disks are not used; they are way too expensive. Haystack is great for serving photos, so what are the problems with it? Replication factors. The numbers are fractional; the lecture put this down to failures in practice (the f4 paper itself accounts for 3.6 as three replicas, each with 1.2x RAID-6 overhead). A factor of 3.6 means everyone's photos are stored between three and four times, with at least three copies on average. That is too expensive: removing even one replica saves on storage costs.
2010-2014 is when people started paying attention to Facebook, specifically privacy. A difference between Haystack and f4: f4 deletes quickly, while Haystack only marked items for deletion. So f4 gives cheaper storage and better deletes. What is the trick? For storage, the same thing as used in RAID 5: stripe the data and keep parity bits, from which lost data can be rebuilt. For deletes, encrypt everything: every photo has an encryption key stored in a separate database, and deleting the key deletes the data. This matters because modern systems replicate data everywhere (logs, journals, etc.) to guard against failures, with copies on top of copies at every scale. If you encrypt and then delete the key, everything is gone.
Haystack becomes the photo cache, for photos being accessed quickly; f4 is used for warm storage. Why not use f4 for everything? It has the parity overhead and fewer replicas to read from. With multiple replicas, reads can go to them in parallel, so Haystack is good for hot content while f4 is better for colder, but not completely cold, content.
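The RAID 5 style parity trick mentioned above can be illustrated with plain XOR. This is only a sketch of the general idea: the block contents and the `xor_blocks` helper are made up for the example, and f4 itself, according to the paper, uses Reed-Solomon coding within a data center and XOR across data centers.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three equally sized data blocks of one stripe, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose block 1, then rebuild it from the surviving blocks and the parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

Here four blocks are stored for every three blocks of data, yet any single lost block can be recovered. The price is that a rebuild has to read every other block in the stripe, which is part of why parity-based storage suits warm rather than hot data.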
Amazon Glacier is cheap storage, really cheap, but it cannot be accessed quickly: getting data from Glacier to S3 takes hours. The data is not online and might be on tapes sitting offline, so it is no good for data that needs immediate access. Going from f4 to Haystack will take a while, but not that long, a couple of seconds.
Cold storage for disaster recovery is traditionally what cold storage is about, but it is not useful for online services. Haystack gets a durability guarantee from replication but also a performance benefit. Note how much engineering goes into these seemingly trivial uses.
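The storage cost of replication versus parity can be made concrete with a little arithmetic. The figures below (1.2x RAID-6 overhead, Reed-Solomon(10, 4), and the 2.8 and 2.1 factors) come from the f4 paper rather than from these notes, and the `effective_replication` helper is a hypothetical name used only for this sketch.

```python
from fractions import Fraction


def effective_replication(copies, per_copy_overhead):
    """Bytes stored per byte of user data."""
    return copies * per_copy_overhead


raid6 = Fraction(12, 10)    # 1.2x RAID-6 overhead on each Haystack host
rs_10_4 = Fraction(14, 10)  # Reed-Solomon(10, 4): 14 blocks per 10 data blocks

haystack = effective_replication(3, raid6)               # three full replicas
f4_two_copies = effective_replication(2, rs_10_4)        # coded copy in 2 sites
f4_xor = effective_replication(Fraction(3, 2), rs_10_4)  # XOR across 3 sites

print(float(haystack), float(f4_two_copies), float(f4_xor))
```

Exact fractions are used so that the results come out as 3.6, 2.8 and 2.1 without floating-point rounding noise.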
Headline text
- CDN
  - CDN stands for Content Distribution Network, essentially a large-scale cache. The idea is to have servers replicated around the world, close to people, so that requests are served locally instead of going to faraway servers.
  - Akamai is a big name in CDNs; its developers were physicists at MIT who used fluid mechanics to model people's usage and data flow.
  - A CDN is necessary for large-scale systems.
  - CDNs do not serve entire websites. They are bad at serving dynamic content: email is not in a CDN, as that would make no sense. Whenever what you see is specific to you, a CDN does not help; it only makes sense for highly replicated data, not personalized data.
  - Browsers go to the CDN for some data, but also go to Haystack.
  - CDNs normally rewrite the URL.
  - AMP is in effect a CDN for websites: publish in this format and Google will help deliver it faster and also provide analytics.
- Haystack
  - NFS was used initially, but it needed too many disk accesses because of metadata. Separating metadata makes sense in classic file systems but not for Facebook.
  - The needle merges metadata with data; they can get away with this because the data is immutable.
  - A bunch of photos are combined into a single file, which is then stored on a classic file system.
  - The name of a photo is just an offset into a big file from which both metadata and data are fetched, instead of going through the inode structure.
  - Fast is not enough; they also have to replicate it.
  - Haystack uses XFS and does its own replication; f4 is built on HDFS.
  - The lecture's explanation was that the replication factor is never exactly 3 because of servers coming back and other issues, giving a replication factor of 3.6 (the f4 paper itself attributes 3.6 to three replicas, each with 1.2x RAID-6 overhead).
    - This is too high a replication factor.
- f4
  - People started caring about Facebook at this time, in particular about security and deletes.
  - f4 enables deletes: it marks things for deletion, and the space is eventually reclaimed. Cheaper storage, better deletes.
  - Facebook encrypts everything. Every photo has its own encryption key, stored in another database. To delete the photo, delete the encryption key.
  - The only way to delete things quickly in modern systems is to have everything encrypted and delete the key.
  - Haystack is for hot storage, f4 is for warm storage. Haystack in effect becomes the photo CDN.
  - Why not use f4 for everything?
    - f4 is inherently slower because of the parity scheme.
    - The best thing about the multiple replicas in Haystack is that they are faster to read from.
  - Cold storage: data which is not accessed often and takes time to retrieve, e.g. Amazon Glacier, which is really cheap but can't be accessed immediately.
  - Facebook doesn't really have cold storage.
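The delete-by-key idea from the f4 bullets can be sketched as below. Everything here is an assumption made for illustration: `key_db` and `blob_store` are hypothetical stand-ins for the separate key database and the replicated storage, and `keystream_xor` is a toy cipher built from SHA-256 that only stands in for a real cipher such as AES and must not be used for actual security.

```python
import hashlib
import secrets


def keystream_xor(key, data):
    """Toy cipher: XOR data with a SHA-256 counter keystream (illustration only)."""
    out = bytearray()
    for block_start in range(0, len(data), 32):
        counter = (block_start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[block_start:block_start + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


key_db = {}      # hypothetical key database, kept apart from the blobs
blob_store = {}  # stands in for every replica, log and backup of the blob


def put(photo_id, photo):
    key_db[photo_id] = secrets.token_bytes(32)
    blob_store[photo_id] = keystream_xor(key_db[photo_id], photo)


def get(photo_id):
    key = key_db.get(photo_id)
    if key is None:
        return None  # ciphertext may still exist, but it is unreadable
    return keystream_xor(key, blob_store[photo_id])


def delete(photo_id):
    del key_db[photo_id]  # one small delete; no replica is touched


put(7, b"holiday photo")
print(get(7))
delete(7)
print(get(7))
```

The point of the design is that the delete touches one small record, while the encrypted bytes can linger in replicas and logs without being readable; the space they occupy can be reclaimed later.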