OceanStore & GPFS

From Soma-notes

==Readings==
[http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/oceanstore-sigplan.pdf John Kubiatowicz et al., "OceanStore: An Architecture for Global-Scale Persistent Storage" (2000)]

[http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/fast2003-pond.pdf Sean Rhea et al., "Pond: the OceanStore Prototype" (2003)]

[http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/gpfs-fast02.pdf Frank Schmuck and Roger Haskin, "GPFS: A Shared-Disk File System for Large Computing Clusters" (2002)]

[http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/walker-xufs-worlds06.pdf Edward Walker, "A Distributed File System for a Wide-Area High Performance Computing Infrastructure" (2006)]

==Questions==
Is it worth it??

=OceanStore=
Pros
*The only machine you need to trust is your own
*Data is highly durable due to file versioning
*Information is divorced from location
**So long as you can reliably obtain the information, it doesn't matter where it is located
*Applicable to many data storage situations, not designed for one specific case
*Routing is decentralized
*As long as 2/3 of the network is up, all data is available
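
The "information divorced from location" point can be illustrated with content-based naming. This is a minimal sketch, not OceanStore's actual code: it assumes a block is named by the SHA-1 hash of its contents, and the names guid, Node and fetch are hypothetical helpers invented for the example. Because the name is derived from the data, a client can take the block from any node, trusted or not, and check it locally.

```python
import hashlib
from typing import Dict, List


def guid(data: bytes) -> str:
    """Name a block by the SHA-1 hash of its contents (hypothetical helper)."""
    return hashlib.sha1(data).hexdigest()


class Node:
    """A storage server that may hold copies of some blocks (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store a block and return the name it can later be fetched by."""
        key = guid(data)
        self.blocks[key] = data
        return key


def fetch(key: str, nodes: List[Node]) -> bytes:
    """Return the block from whichever node has a valid copy."""
    for node in nodes:
        data = node.blocks.get(key)
        # An untrusted node could return garbage, so re-hash before accepting.
        if data is not None and guid(data) == key:
            return data
    raise KeyError(key)
```

For example, if node a stores a block with key = a.put(b"hello") and node b holds a corrupted copy under the same key, fetch(key, [b, a]) skips b's copy and returns the data from a.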
 
Cons
*Cryptography is very expensive to compute (key generation is slow)
*The utility model doesn't make economic sense: people prefer not to pay for access to their own data
 
 
=GPFS=
Distributed file system designed for clusters.
Maximum file system size of 4096 TB.

Pros
*Massively parallel: data is striped across many disks
*Reads and writes are therefore very fast
*Optional redundancy
*Locking mechanism
**Two options
***1. Data shipping
****Distributed
****The first client to request access to a file receives the token
****Other clients must request access from the current token holder
*****The current holder grants partial access to the file (it splits the token and hands over the requested portion)
***2. Centralized locking
****Faster for small disk configurations
*Extreme reliability
**A hot-swappable disk can be removed and replaced with a blank one, and the missing data is regenerated onto the blank disk
**Journaling records token ownership, which helps recovery when the node holding a token dies
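
The token hand-off described in the locking bullets above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not GPFS code: Token and TokenManager are hypothetical names, ranges are half-open byte ranges, and the holder gives up exactly the range requested, whereas a real system would negotiate how much to relinquish.

```python
from dataclasses import dataclass
from typing import List

INF = float("inf")  # stands for "to the end of the file"


@dataclass
class Token:
    holder: str
    start: float
    end: float  # exclusive


class TokenManager:
    """Tracks which node holds the token for which byte range (illustrative)."""

    def __init__(self):
        self.tokens: List[Token] = []

    def acquire(self, node: str, start: int, end: int) -> Token:
        """Grant `node` a token covering at least [start, end)."""
        if not self.tokens:
            # The first requester receives a token for the whole file.
            tok = Token(node, 0, INF)
            self.tokens.append(tok)
            return tok
        remaining = []
        for tok in self.tokens:
            overlaps = tok.start < end and start < tok.end
            if not overlaps:
                remaining.append(tok)
                continue
            # Split the overlapping token: its holder keeps whatever lies
            # outside the requested range and gives up the rest.
            if tok.start < start:
                remaining.append(Token(tok.holder, tok.start, start))
            if end < tok.end:
                remaining.append(Token(tok.holder, end, tok.end))
        granted = Token(node, start, end)
        remaining.append(granted)
        self.tokens = remaining
        return granted
```

For example, if node A asks first it holds the whole file; when node B then asks for bytes 4096 to 8192, A is left with the ranges before 4096 and from 8192 onward, and B holds the middle.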
 
Cons
*Every node must be trusted: designed for clusters, not for use across a LAN/WAN
*Not appropriate for distributed networks
 
=XUFS=
*User-space implementation
*Designed to be simple
*Very generic

Latest revision as of 16:52, 25 February 2008
