<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://homeostasis.scs.carleton.ca/wiki/index.php?action=history&amp;feed=atom&amp;title=DistOS_2021F_2021-11-04</id>
	<title>DistOS 2021F 2021-11-04 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://homeostasis.scs.carleton.ca/wiki/index.php?action=history&amp;feed=atom&amp;title=DistOS_2021F_2021-11-04"/>
	<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2021F_2021-11-04&amp;action=history"/>
	<updated>2026-05-12T23:28:39Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2021F_2021-11-04&amp;diff=23433&amp;oldid=prev</id>
		<title>Soma: Created page with &quot;==Notes==  &lt;pre&gt; Lecture 14 ----------  BLOBs  - photos, videos, or other binary data  - immutable  traditional filesystems have to support reading &amp; writing at any time  - he...&quot;</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2021F_2021-11-04&amp;diff=23433&amp;oldid=prev"/>
		<updated>2021-11-04T23:54:18Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;==Notes==  &amp;lt;pre&amp;gt; Lecture 14 ----------  BLOBs  - photos, videos, or other binary data  - immutable  traditional filesystems have to support reading &amp;amp; writing at any time  - he...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==Notes==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Lecture 14&lt;br /&gt;
----------&lt;br /&gt;
&lt;br /&gt;
BLOBs&lt;br /&gt;
 - photos, videos, or other binary data&lt;br /&gt;
 - immutable&lt;br /&gt;
&lt;br /&gt;
traditional filesystems have to support reading &amp;amp; writing at any time&lt;br /&gt;
 - here, we just need creation and reading&lt;br /&gt;
&lt;br /&gt;
Haystack&lt;br /&gt;
 - what&amp;#039;s the problem that it solved?&lt;br /&gt;
   - making photo storage more efficient&lt;br /&gt;
      - higher density&lt;br /&gt;
      - more requests per second&lt;br /&gt;
&lt;br /&gt;
How do you get faster?  Do less work&lt;br /&gt;
 - reduce disk operations by reducing what has to be read&lt;br /&gt;
 - they benchmarked, noticed metadata lookups were taking lots of time&lt;br /&gt;
   - because you first read metadata, then read data&lt;br /&gt;
&lt;br /&gt;
Idea: keep a cache of metadata in RAM, so we can just do data I/O&lt;br /&gt;
 - to do this, need to minimize metadata&lt;br /&gt;
&lt;br /&gt;
Why do regular filesystems have so much metadata?&lt;br /&gt;
(Does anyone know how ISO9660/CD-ROM/DVD/etc is organized?)&lt;br /&gt;
&lt;br /&gt;
Traditional filesystems have to allow files to grow and shrink arbitrarily&lt;br /&gt;
 - have to add or remove blocks from a file&lt;br /&gt;
   - removed blocks need to be allocated to new files&lt;br /&gt;
&lt;br /&gt;
optical media filesystems are designed to be read only&lt;br /&gt;
 - so the layout can be optimized for reads (written once, never modified)&lt;br /&gt;
&lt;br /&gt;
regular filesystems are optimized by organizing files into ranges of blocks&lt;br /&gt;
 - want as few ranges as possible per file, but the ranges can be&lt;br /&gt;
   anywhere on disk&lt;br /&gt;
&lt;br /&gt;
With a read-only filesystem, we can just lay out files one at a time, each getting as many blocks as it needs in sequence&lt;br /&gt;
 - file a is blocks 2-50, file b is blocks 51-1000, etc.&lt;br /&gt;
Note this GREATLY reduces size of metadata&lt;br /&gt;
 - because metadata includes the list of blocks&lt;br /&gt;
 - what&amp;#039;s the most compact way to represent a list of blocks?  a range&lt;br /&gt;
&lt;br /&gt;
1, 5, 100, 2, ... vs 2-10?&lt;br /&gt;
&lt;br /&gt;
remember with BLOBs we&amp;#039;re talking about large files&lt;br /&gt;
  - megabytes up&lt;br /&gt;
  - blocks are 4k normally&lt;br /&gt;
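&lt;br /&gt;
Rough sketch (Python) of the metadata savings from using a range instead of a block list.&lt;br /&gt;
Assumes each block number is stored in 8 bytes; that size and the helper names&lt;br /&gt;
block_list_bytes and extent_bytes are made up for illustration, not from the lecture.&lt;br /&gt;
&lt;br /&gt;
```python
# Assumption: each block number is stored as an 8-byte integer.
BLOCK_NUMBER_BYTES = 8


def block_list_bytes(num_blocks):
    """Metadata size when every block of the file is listed individually."""
    return num_blocks * BLOCK_NUMBER_BYTES


def extent_bytes():
    """Metadata size when the file is one contiguous range (start, end)."""
    return 2 * BLOCK_NUMBER_BYTES


# A 4 MiB file made of 4 KiB blocks occupies 1024 blocks.
blocks_in_file = (4 * 1024 * 1024) // (4 * 1024)
print(block_list_bytes(blocks_in_file), "bytes as a block list")
print(extent_bytes(), "bytes as a single range")
```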
&lt;br /&gt;
With haystack, we have a big haystack with lots of needles&lt;br /&gt;
 - a needle is just a file (data + metadata), but stored sequentially&lt;br /&gt;
   (metadata first, then data)&lt;br /&gt;
&lt;br /&gt;
In memory, I just need the name of the file and its starting and ending blocks&lt;br /&gt;
 - from that I get all the data at once&lt;br /&gt;
&lt;br /&gt;
What happens when I delete a needle (a file)?&lt;br /&gt;
 - fragmentation, i.e., unused storage&lt;br /&gt;
&lt;br /&gt;
Facebook would mark a file as being deleted, but it wouldn&amp;#039;t delete it&lt;br /&gt;
(and space would only be reclaimed when the entire haystack was compacted)&lt;br /&gt;
 - could stick around for a long time&lt;br /&gt;
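&lt;br /&gt;
Toy sketch (Python) of the needle idea: append-only volume, metadata index in RAM,&lt;br /&gt;
delete as a mark, space reclaimed only by compaction. This is not the real Haystack&lt;br /&gt;
format. Assumptions: the class and method names are hypothetical, the index stores&lt;br /&gt;
byte offset and size instead of block numbers, and the on-disk needle header is omitted.&lt;br /&gt;
&lt;br /&gt;
```python
import io


class Haystack:
    """Toy append-only store: needles are written back to back in one volume."""

    def __init__(self):
        self.volume = io.BytesIO()   # stands in for the large on-disk file
        self.index = {}              # name: (offset, size), kept in RAM
        self.deleted = set()         # names marked deleted, space not reclaimed

    def put(self, name, data):
        offset = self.volume.seek(0, io.SEEK_END)   # always append at the end
        self.volume.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name):
        if name in self.deleted or name not in self.index:
            return None
        offset, size = self.index[name]   # metadata comes from RAM, not disk
        self.volume.seek(offset)
        return self.volume.read(size)     # one sequential read for the data

    def delete(self, name):
        self.deleted.add(name)            # only a mark; the bytes stay in place

    def compact(self):
        """Copy live needles into a fresh volume to reclaim deleted space."""
        fresh = Haystack()
        for name in self.index:
            if name not in self.deleted:
                fresh.put(name, self.get(name))
        return fresh


store = Haystack()
store.put("a.jpg", b"AAAA")
store.put("b.jpg", b"BB")
store.delete("a.jpg")
print(len(store.volume.getvalue()), "bytes used before compaction")
print(len(store.compact().volume.getvalue()), "bytes used after compaction")
```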
&lt;br /&gt;
A key goal for F4 was to facilitate quick data deletion&lt;br /&gt;
 - so delete =&amp;gt; delete encryption key&lt;br /&gt;
&lt;br /&gt;
Diversion: How do SSDs work?&lt;br /&gt;
 - randomly accessible array of blocks (interface)&lt;br /&gt;
 - looks like a hard disk mostly, except there&amp;#039;s one problem&lt;br /&gt;
    - writes damage individual storage cells&lt;br /&gt;
    - generally only good for hundreds to thousands of writes,&lt;br /&gt;
      individually, then they fail&lt;br /&gt;
&lt;br /&gt;
Normal filesystems have hot spots, places where there are lots of write accesses&lt;br /&gt;
 - for example, inodes, especially the accessed timestamp&lt;br /&gt;
&lt;br /&gt;
So, for any sort of flash storage to be practical, we have to smooth out write hotspots&lt;br /&gt;
 - can&amp;#039;t write to the same area of flash repeatedly&lt;br /&gt;
&lt;br /&gt;
Solution: every write goes to a new set of storage cells&lt;br /&gt;
 - this is &amp;quot;wear leveling&amp;quot;&lt;br /&gt;
&lt;br /&gt;
But what if I want to keep writing to the same block?&lt;br /&gt;
 - abstraction level: writing to block 2000 repeatedly actually goes&lt;br /&gt;
   to many different storage cells&lt;br /&gt;
     - the drive keeps track of which cells hold the current block 2000, returns those when it is requested&lt;br /&gt;
&lt;br /&gt;
So, there is massive duplication of repeatedly written files in SSDs.&lt;br /&gt;
 - and there is background garbage collection to get rid of old ones&lt;br /&gt;
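&lt;br /&gt;
Toy sketch (Python) of the block-to-cell mapping, wear leveling, and garbage collection.&lt;br /&gt;
Everything here (ToyFlash, one cell per block, picking the least-worn free cell) is an&lt;br /&gt;
assumption for illustration; real drives erase in larger units than they write and use&lt;br /&gt;
more elaborate policies. The sketch also assumes spare cells are always available.&lt;br /&gt;
&lt;br /&gt;
```python
class ToyFlash:
    """Toy flash translation layer: logical blocks map to physical cells."""

    def __init__(self, num_cells):
        self.cells = [None] * num_cells      # physical storage
        self.writes = [0] * num_cells        # wear counter per cell
        self.mapping = {}                    # logical block: physical cell
        self.free = list(range(num_cells))   # cells available for writing
        self.stale = []                      # cells holding outdated copies

    def write(self, block, data):
        if not self.free:
            self.collect_garbage()
        # Wear leveling: pick the least-worn free cell, never overwrite in place.
        cell = min(self.free, key=lambda c: self.writes[c])
        self.free.remove(cell)
        self.cells[cell] = data
        self.writes[cell] += 1
        if block in self.mapping:
            self.stale.append(self.mapping[block])   # old copy is now garbage
        self.mapping[block] = cell

    def read(self, block):
        return self.cells[self.mapping[block]]

    def collect_garbage(self):
        """Erase stale cells so they can be written again."""
        for cell in self.stale:
            self.cells[cell] = None
        self.free.extend(self.stale)
        self.stale = []


flash = ToyFlash(num_cells=4)
for version in range(10):
    flash.write(2000, version)   # same logical block every time
print(flash.read(2000), flash.writes)
```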
&lt;br /&gt;
&lt;br /&gt;
Writing always to new places, basically &amp;quot;appending&amp;quot; to the disk always,&lt;br /&gt;
is what a &amp;quot;log-structured filesystem&amp;quot; does&lt;br /&gt;
 - makes writes fast at the expense of slower reads unless you use lots of cache&lt;br /&gt;
&lt;br /&gt;
haystack is like a log-structured filesystem, kind of (whole files are written so don&amp;#039;t have the fragmentation issue of regular log-structured filesystems)&lt;br /&gt;
&lt;br /&gt;
Why &amp;quot;warm&amp;quot; storage?&lt;br /&gt;
 - hot is stuff that is being frequently accessed&lt;br /&gt;
 - cold is infrequently accessed&lt;br /&gt;
&lt;br /&gt;
so warm is in between&lt;br /&gt;
 - data that has been around for a while; access should be pretty fast, but doesn&amp;#039;t need to be the fastest&lt;br /&gt;
    - can optimize for durability and space efficiency&lt;br /&gt;
&lt;br /&gt;
want to have the same kind of guarantees we get with lots of replicas, but with fewer replicas&lt;br /&gt;
 - hence, erasure coding&lt;br /&gt;
&lt;br /&gt;
Same idea as RAID-5, but more sophisticated&lt;br /&gt;
 - can lose a disk but still reconstruct the data (at reduced performance)&lt;br /&gt;
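&lt;br /&gt;
Minimal sketch (Python) of the RAID-5 style idea using a single XOR parity block.&lt;br /&gt;
The block contents and the xor_blocks helper are made up for illustration. More&lt;br /&gt;
sophisticated erasure codes (such as Reed-Solomon) tolerate more than one lost block.&lt;br /&gt;
&lt;br /&gt;
```python
def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


data = [b"disk", b"zero", b"blob"]   # three data blocks on three disks
parity = xor_blocks(data)            # stored on a fourth disk

# Lose the second disk, then rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```
&lt;br /&gt;
Here four blocks are stored for three blocks of data, versus nine blocks for three full replicas.&lt;br /&gt;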
&lt;br /&gt;
Trade computation for storage space&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Soma</name></author>
	</entry>
</feed>