OceanStore & GPFS

From Soma-notes

Latest revision as of 20:52, 25 February 2008

Readings

John Kubiatowicz et al., "OceanStore: An Architecture for Global-Scale Persistent Storage" (2000)

Sean Rhea et al., "Pond: the OceanStore Prototype" (2003)

Frank Schmuck and Roger Haskin, "GPFS: A Shared-Disk File System for Large Computing Clusters" (2002)

Edward Walker, "A Distributed File System for a Wide-Area High Performance Computing Infrastructure" (2006)

Questions

Is it worth it?


OceanStore

Pros

  • The only machine you need to trust is your own
  • Data is highly durable due to file versioning
  • Information divorced from location
    • So long as you can reliably obtain information, it doesn't matter where it is located
  • Applicable to many data storage situations, not just one specific case
  • Routing is decentralized
  • As long as 2/3 of the network is up, all data is available

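One way to picture "information divorced from location" is a store where a block's name is derived from its contents rather than from the machine holding it. The sketch below is only an illustration under assumptions: the `Node` class, the `put`/`get` helpers, the SHA-256 naming and the replica placement are all hypothetical and are not taken from the OceanStore papers. It does show why only your own box needs to be trusted: a reader can check any copy against its name, so it does not matter which node answers.

```python
import hashlib


class Node:
    """A hypothetical storage node holding blocks keyed by content hash."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


def block_id(data: bytes) -> str:
    # Name a block by the SHA-256 of its contents (an illustrative choice).
    return hashlib.sha256(data).hexdigest()


def put(nodes, data: bytes, replicas: int = 2) -> str:
    bid = block_id(data)
    # Placement is derived from the id here; any placement policy would do.
    start = int(bid, 16) % len(nodes)
    for i in range(replicas):
        nodes[(start + i) % len(nodes)].blocks[bid] = data
    return bid


def get(nodes, bid: str) -> bytes:
    # Accept the first copy whose hash matches the id, so the node that
    # answers does not have to be trusted.
    for node in nodes:
        data = node.blocks.get(bid)
        if data is not None and block_id(data) == bid:
            return data
    raise KeyError(bid)


nodes = [Node("a"), Node("b"), Node("c")]
bid = put(nodes, b"hello")
print(get(nodes, bid))
```

The caller only ever holds `bid`; which nodes store the replicas is an internal detail that can change without the caller noticing.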
Cons

  • Cryptography is very expensive to compute (key generation is slow)
  • The utility model doesn't make economic sense: people prefer not to pay for access to their own data


GPFS

Distributed file system designed for clusters. Max size of 4096 TB.

Pros

  • Massively parallel - data is striped across many disks
  • Therefore read/write is very fast
  • Option of redundancy
  • Locking mechanism
    • Two options
      • 1. Data shipping
        • Distributed
        • First client to request access to file receives token
        • Other clients must request access from the current owner of the token
          • The current owner grants access to part of the file (splits the token and hands over a portion)
      • 2. Centralized locking
        • Faster when the number of disks is small
  • Extreme reliability
    • A hot-swap disk can be pulled and replaced with a blank one, and the missing data is regenerated onto the blank disk
    • Journaling records token ownership - helps recovery when the node holding a token dies
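The token hand-off described under the locking bullets can be sketched as a toy byte-range table. This is a minimal illustration under assumptions, not GPFS code: the `TokenManager` class and its `request` method are hypothetical names, and a single object plays the role of both the token owner and the requester's negotiation with it. The first client to ask receives a token for the whole file; a later request makes the current holders give up only the overlapping part.

```python
class TokenManager:
    """Toy byte-range token table; ranges are half-open [start, end)."""

    def __init__(self, file_size):
        self.file_size = file_size
        self.tokens = {}  # client name -> list of (start, end) ranges held

    def request(self, client, start, end):
        if not self.tokens:
            # First requester receives a token covering the whole file.
            self.tokens[client] = [(0, self.file_size)]
            return self.tokens[client]
        own = self.tokens.setdefault(client, [])
        if any(s <= start and end <= e for s, e in own):
            return own  # already covered by a token this client holds
        # Every other holder splits its token and keeps the non-overlapping part.
        for holder, ranges in list(self.tokens.items()):
            if holder == client:
                continue
            kept = []
            for s, e in ranges:
                if e <= start or end <= s:
                    kept.append((s, e))  # no overlap, keep as is
                    continue
                if s < start:
                    kept.append((s, start))  # piece before the requested range
                if end < e:
                    kept.append((end, e))  # piece after the requested range
            self.tokens[holder] = kept
        own.append((start, end))
        return own


tm = TokenManager(file_size=100)
tm.request("A", 0, 10)    # A is first, so it gets the whole file
tm.request("B", 50, 100)  # A gives up the tail
tm.request("C", 20, 30)   # A splits again around C's range
print(tm.tokens)
```

Handing the first client the whole file keeps the common case cheap: a single writer never has to ask again, and splitting only happens once a second client actually shows up.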

Cons

  • Every node must be trusted! Designed for clusters, not for use across a LAN/WAN
  • Not appropriate for distributed networks.

XUFS

  • User-space implementation
  • Designed to be simple
  • Very generic