<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://homeostasis.scs.carleton.ca/wiki/index.php?action=history&amp;feed=atom&amp;title=DistOS_2021F_2021-10-05</id>
	<title>DistOS 2021F 2021-10-05 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://homeostasis.scs.carleton.ca/wiki/index.php?action=history&amp;feed=atom&amp;title=DistOS_2021F_2021-10-05"/>
	<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2021F_2021-10-05&amp;action=history"/>
	<updated>2026-05-12T23:28:13Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2021F_2021-10-05&amp;diff=23408&amp;oldid=prev</id>
		<title>Soma: Created page with &quot;==Notes==  &lt;pre&gt; Lecture 8: NASD &amp; GFS ---------------------  Questions?  - is NASD a NAS?  - how cost efficient?  NASD or GFS?  - having the file server out of the loop, secu...&quot;</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2021F_2021-10-05&amp;diff=23408&amp;oldid=prev"/>
		<updated>2021-10-13T02:22:46Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;==Notes==  &amp;lt;pre&amp;gt; Lecture 8: NASD &amp;amp; GFS ---------------------  Questions?  - is NASD a NAS?  - how cost efficient?  NASD or GFS?  - having the file server out of the loop, secu...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==Notes==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Lecture 8: NASD &amp;amp; GFS&lt;br /&gt;
---------------------&lt;br /&gt;
&lt;br /&gt;
Questions?&lt;br /&gt;
 - is NASD a NAS?&lt;br /&gt;
 - how cost efficient?  NASD or GFS?&lt;br /&gt;
 - having the file server out of the loop, security issues with&lt;br /&gt;
   NASD?&lt;br /&gt;
 - checkpoint system?&lt;br /&gt;
 - NASD in use?&lt;br /&gt;
 - why just kill chunkservers?  Not shut down?&lt;br /&gt;
 - GFS file security?&lt;br /&gt;
&lt;br /&gt;
NAS is just a file server&lt;br /&gt;
 - dedicated, but a file server&lt;br /&gt;
 - generally use standard network file sharing protocols&lt;br /&gt;
   (CIFS, NFS)&lt;br /&gt;
&lt;br /&gt;
NASD is a different beast&lt;br /&gt;
 - disks are object servers, not file servers&lt;br /&gt;
   - objects are just variable-sized chunks of data + metadata&lt;br /&gt;
     (no code)&lt;br /&gt;
   - contrast with blocks&lt;br /&gt;
&lt;br /&gt;
With object-based distributed filesystems, we&amp;#039;ve added a level of indirection&lt;br /&gt;
 - file server translates files to sets of objects, handles file&lt;br /&gt;
   metadata&lt;br /&gt;
 - object servers store objects&lt;br /&gt;
&lt;br /&gt;
Why add this level of indirection?  Why not just use fixed-sized blocks?&lt;br /&gt;
&lt;br /&gt;
(In GFS, instead of objects we have chunks, with a bit less metadata)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Objects are all about parallel access&lt;br /&gt;
 - to enable performance&lt;br /&gt;
&lt;br /&gt;
Client can ask for objects from multiple object servers at once&lt;br /&gt;
 - file server doesn&amp;#039;t have to be involved at all&lt;br /&gt;
&lt;br /&gt;
The classic way we did redundancy &amp;amp; reliability in storage is with RAID&lt;br /&gt;
&lt;br /&gt;
Sounds like most of you haven&amp;#039;t used RAID&lt;br /&gt;
 - Redundant Array of Inexpensive/Independent Disks&lt;br /&gt;
 - idea is to combine multiple drives together to get&lt;br /&gt;
   more capacity, higher performance, and more reliable storage&lt;br /&gt;
&lt;br /&gt;
RAID-0: striping&lt;br /&gt;
RAID-1: mirroring&lt;br /&gt;
RAID-5: striping + parity&lt;br /&gt;
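&lt;br /&gt;
Sketch of the parity idea (illustrative Python, not from the lecture)&lt;br /&gt;
 - parity block is the XOR of the data blocks in a stripe&lt;br /&gt;
 - a lost block is rebuilt by XORing the survivors with the parity&lt;br /&gt;
 - assumptions: one stripe, made-up block contents, and the helper&lt;br /&gt;
   name xor_blocks is hypothetical; real RAID-5 also rotates the&lt;br /&gt;
   parity block across drives&lt;br /&gt;
&lt;pre&gt;
```python
from functools import reduce


def xor_blocks(blocks):
    # XOR equal-length byte blocks together, one byte column at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks, each on its own (hypothetical) drive.
data = [b"NASD", b"GFS!", b"RAID"]
parity = xor_blocks(data)  # would be stored on a fourth drive

# The drive holding data[1] fails: rebuild from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```
&lt;/pre&gt;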
&lt;br /&gt;
With RAID, data is distributed across disks at the block level&lt;br /&gt;
 - drives have no notion of files, just blocks&lt;br /&gt;
&lt;br /&gt;
The modern insight in distributed storage is that distributing at the block layer is too low-level&lt;br /&gt;
 - better to distribute bigger chunks, like objects!&lt;br /&gt;
&lt;br /&gt;
Read objects in parallel, rather than blocks&lt;br /&gt;
 - files are big, so feasible to read multiple objects in parallel&lt;br /&gt;
&lt;br /&gt;
We do &amp;quot;mirroring&amp;quot; with objects/chunks, i.e. have multiple copies&lt;br /&gt;
 - parity/erasure codes mostly not worth the effort for&lt;br /&gt;
   these systems (but later systems will use such things)&lt;br /&gt;
&lt;br /&gt;
Security&lt;br /&gt;
 - NASD security?  How can clients securely access&lt;br /&gt;
   individual drives?&lt;br /&gt;
&lt;br /&gt;
In Linux (POSIX) capabilities are a way to split up root access&lt;br /&gt;
 - but that is actually not the &amp;quot;normal&amp;quot; meaning of capabilities&lt;br /&gt;
   in a security context&lt;br /&gt;
&lt;br /&gt;
Capabilities are tokens a process can present to a service to enable access&lt;br /&gt;
 - separate authentication server gives out capability tokens&lt;br /&gt;
 - idea is the authentication server doesn&amp;#039;t have to check&lt;br /&gt;
   at access time; the check can be done in advance&lt;br /&gt;
&lt;br /&gt;
With capabilities, the drives can control access without&lt;br /&gt;
needing to understand users, groups, etc.&lt;br /&gt;
 - a drive just has to understand the tokens and have a way to&lt;br /&gt;
   verify them&lt;br /&gt;
 - make sure the tokens can&amp;#039;t be faked!&lt;br /&gt;
&lt;br /&gt;
Most single sign-on systems tend to have some sort of capability-like token underneath if they are really distributed&lt;br /&gt;
&lt;br /&gt;
Note that capability tokens are ephemeral&lt;br /&gt;
 - normally expire after a relatively short period of time (minutes or hours)&lt;br /&gt;
    - needed to prevent replay attacks&lt;br /&gt;
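&lt;br /&gt;
Sketch of an unforgeable, expiring token (illustrative Python, not the NASD wire format)&lt;br /&gt;
 - issuer signs (object, rights, expiry) with a keyed MAC&lt;br /&gt;
 - drive recomputes the MAC and checks the expiry; it never looks up&lt;br /&gt;
   users or groups&lt;br /&gt;
 - assumptions: SHARED_KEY, issue_capability, drive_allows and the&lt;br /&gt;
   token fields are all hypothetical names; a key shared between the&lt;br /&gt;
   issuer and the drive is assumed&lt;br /&gt;
&lt;pre&gt;
```python
import hashlib
import hmac
import time

# Hypothetical secret shared by the token issuer and the drive.
SHARED_KEY = b"demo-key-not-for-real-use"


def issue_capability(object_id, rights, lifetime_s, now=None):
    # Run by the issuing server: sign (object, rights, expiry).
    now = time.time() if now is None else now
    expires = int(now) + lifetime_s
    message = f"{object_id}|{rights}|{expires}".encode()
    tag = hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()
    return {"object_id": object_id, "rights": rights, "expires": expires, "tag": tag}


def drive_allows(cap, object_id, wanted, now=None):
    # Run by the drive: recompute the tag, then check object, rights, expiry.
    now = time.time() if now is None else now
    message = f"{cap['object_id']}|{cap['rights']}|{cap['expires']}".encode()
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, cap["tag"])
            and cap["object_id"] == object_id
            and wanted in cap["rights"]
            and cap["expires"] > now)


cap = issue_capability("obj-42", "r", lifetime_s=300, now=1000)
forged = dict(cap, rights="rw")  # client edits the rights, cannot fix the tag
print(drive_allows(cap, "obj-42", "r", now=1100))     # fresh, untampered token
print(drive_allows(forged, "obj-42", "w", now=1100))  # tampered token
print(drive_allows(cap, "obj-42", "r", now=2000))     # same token after expiry
```
&lt;/pre&gt;
 - only the fresh, untampered token is accepted; note the issuer is not&lt;br /&gt;
   contacted during any of the three checks&lt;br /&gt;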
&lt;br /&gt;
Imagine having 10,000 storage servers and one authentication server&lt;br /&gt;
 - if auth server had to be involved in every file access,&lt;br /&gt;
   would become a bottleneck&lt;br /&gt;
 - but with capabilities it can issue them at a much slower rate&lt;br /&gt;
   and sit back while mass data transfers happen&lt;br /&gt;
&lt;br /&gt;
Capabilities are at the heart of NASD&lt;br /&gt;
&lt;br /&gt;
What about GFS?&lt;br /&gt;
 - nope, assumes a trusted data center&lt;br /&gt;
 - I think it has UNIX-like file permissions, but&lt;br /&gt;
   nothing fancy&lt;br /&gt;
    - just to prevent accidental file damage&lt;br /&gt;
&lt;br /&gt;
What was GFS for?&lt;br /&gt;
 - building a search engine&lt;br /&gt;
 - i.e., downloading and indexing the entire web!&lt;br /&gt;
   - data comes in from crawlers&lt;br /&gt;
   - indices built as batch jobs&lt;br /&gt;
&lt;br /&gt;
Are GFS files regular files?&lt;br /&gt;
 - they are weird because they are sets of records&lt;br /&gt;
   - records can be duplicated, so each must have a unique ID&lt;br /&gt;
 - for a record, think web page&lt;br /&gt;
   - have to account for crawler messing up and&lt;br /&gt;
     downloading same info multiple times&lt;br /&gt;
     (i.e., if the crawler had a hardware or&lt;br /&gt;
      software fault)&lt;br /&gt;
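&lt;br /&gt;
Sketch of reader-side duplicate filtering (illustrative Python, not GFS code)&lt;br /&gt;
 - reader keeps the first copy of each record ID and skips repeats&lt;br /&gt;
 - assumptions: records are (id, payload) pairs, the IDs shown are&lt;br /&gt;
   made up, and dedupe_records is a hypothetical helper&lt;br /&gt;
&lt;pre&gt;
```python
def dedupe_records(records):
    # Keep the first copy of each record ID, preserving arrival order.
    seen = set()
    unique = []
    for record_id, payload in records:
        if record_id not in seen:
            seen.add(record_id)
            unique.append((record_id, payload))
    return unique


# A crawler retry appended the same page twice.
appended = [("url-1", "page A"), ("url-2", "page B"), ("url-1", "page A")]
print(dedupe_records(appended))
```
&lt;/pre&gt;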
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>Soma</name></author>
	</entry>
</feed>