OceanStore & GPFS

From Soma-notes

Revision as of 16:50, 25 February 2008

Readings

John Kubiatowicz et al., "OceanStore: An Architecture for Global-Scale Persistent Storage" (2000)

Sean Rhea et al., "Pond: the OceanStore Prototype" (2003)

Frank Schmuck and Roger Haskin, "GPFS: A Shared-Disk File System for Large Computing Clusters" (2002)

Edward Walker, "A Distributed File System for a Wide-Area High Performance Computing Infrastructure" (2006)

Questions

Is it worth it??


OceanStore

Pros

  • The only machine you need to trust is your own
  • Data is highly durable thanks to file versioning
  • Information is divorced from location
    • As long as you can reliably obtain the information, it doesn't matter where it is stored
  • Applicable to many data storage situations, not built for one specific case
  • Routing is decentralized
  • As long as about 2/3 of the network is up, all data is available
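The "2/3 of the network" figure matches the usual Byzantine fault-tolerance bound: a group of n = 3f + 1 servers can keep agreeing on updates while up to f of them are down or misbehaving, so more than 2/3 must be up and correct. The arithmetic below assumes this is what the note refers to. In Pond, that bound governs the small "inner ring" of servers that serializes updates. The function names are illustrative only.

```python
# Assumption: the notes' "2/3 of the network" is read as the Byzantine
# fault-tolerance bound n >= 3f + 1 used for agreement among OceanStore servers.

def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1, i.e. how many servers may fail or misbehave."""
    return (n - 1) // 3

def min_correct(n: int) -> int:
    """Servers that must be up and correct: n - f, which is always more than 2/3 of n."""
    return n - max_faulty(n)

def can_make_progress(n: int, up: int) -> bool:
    """True if `up` correct servers out of n are enough to keep agreeing on updates."""
    return up >= min_correct(n)

for n in (4, 7, 10):
    print(n, max_faulty(n), min_correct(n))
# 4 1 3
# 7 2 5
# 10 3 7
```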

Cons

  • Cryptography is computationally expensive (e.g., key generation is slow)
  • The utility model (paying a provider a fee, like a utility) doesn't make economic sense: people prefer not to pay for access to their own data


GPFS

Shared-disk file system designed for clusters
Maximum file system size: 4096 TB

Pros

  • Massively parallel: data is striped across many disks
    • Therefore reads and writes are very fast
  • Option of redundancy
  • Locking mechanism
    • Two options
      • Data shipping
        • Distributed
        • The first client to request access to a file receives a token for it
        • Other clients must request access from the current token holder
          • The holder grants access to part of the file (the token is split and the requested byte range is handed over)
      • Centralized locking
        • Faster in small configurations (few disks)
  • Extreme reliability
    • A hot-swappable disk can literally be pulled and replaced with a blank one, and the missing data is then regenerated onto the new disk
    • Journaling records token ownership, which helps recovery when the node holding a token dies
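The notes do not say how the blank disk regenerates its data. The sketch assumes a common mechanism, RAID-5-style XOR parity. Data is striped across disks in fixed-size blocks, which is what lets reads and writes proceed in parallel. One extra parity block per stripe then allows any single lost block to be recomputed from the survivors. The sketch covers a single stripe only. `BLOCK_SIZE`, `write_stripe`, and `rebuild` are hypothetical names, not GPFS interfaces.

```python
from functools import reduce
from typing import List, Optional

BLOCK_SIZE = 4  # hypothetical, tiny so the example is easy to follow

def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def write_stripe(data: bytes, num_data_disks: int) -> List[bytes]:
    """Cut data into one block per data disk (zero-padded) and append an XOR parity block."""
    capacity = BLOCK_SIZE * num_data_disks
    if len(data) > capacity:
        raise ValueError("this sketch handles a single stripe only")
    padded = data.ljust(capacity, b"\0")
    blocks = [padded[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in range(num_data_disks)]
    return blocks + [xor_blocks(blocks)]  # the last "disk" holds parity

def rebuild(disks: List[Optional[bytes]]) -> List[bytes]:
    """Regenerate the single missing disk (None) by XORing all surviving blocks."""
    missing = disks.index(None)
    survivors = [d for d in disks if d is not None]
    restored = list(disks)
    restored[missing] = xor_blocks(survivors)
    return restored

disks = write_stripe(b"GPFS stripes!", 4)  # 4 data blocks + 1 parity block
disks[2] = None                            # pull a disk, insert a blank one
restored = rebuild(disks)
print(b"".join(restored[:4]).rstrip(b"\0"))  # b'GPFS stripes!'
```

XOR parity only survives one failed disk per stripe. Tolerating more failures takes extra redundancy, such as replication or stronger codes.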

Cons

  • Everything must be trusted! Designed for clusters, not across LAN/WAN
  • Not appropriate for distributed networks.
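The distributed token scheme from the GPFS pros list can be sketched as a per-file table of byte ranges. This is a simplified model with several assumptions. There is no separate token manager, there are no read/write token modes, and the current holder always gives up exactly the requested range. `FileTokens`, `acquire`, and `holder` are hypothetical names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

WHOLE_FILE_END = math.inf  # open-ended range: "from here to the end of the file and beyond"

@dataclass
class FileTokens:
    """Hypothetical byte-range token table for one file.

    Each entry is (start, end, owner) covering bytes [start, end); entries never overlap.
    """
    ranges: List[Tuple[float, float, str]] = field(default_factory=list)

    def acquire(self, node: str, start: float, end: float) -> None:
        """Give `node` the token for bytes [start, end)."""
        if not self.ranges:
            # The first client to ask gets a token for the whole file.
            self.ranges = [(0, WHOLE_FILE_END, node)]
            return
        kept = []
        for s, e, owner in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e, owner))  # no overlap: unchanged
                continue
            # Overlap: the current holder keeps only the parts outside [start, end).
            if s < start:
                kept.append((s, start, owner))
            if e > end:
                kept.append((end, e, owner))
        kept.append((start, end, node))
        self.ranges = sorted(kept, key=lambda r: r[0])

    def holder(self, offset: float) -> Optional[str]:
        """Return the node whose token covers `offset`, if any."""
        for s, e, owner in self.ranges:
            if s <= offset < e:
                return owner
        return None

tokens = FileTokens()
tokens.acquire("A", 0, 4096)      # A is first: its token covers the whole file
tokens.acquire("B", 8192, 12288)  # B asks A, and A splits its token
print(tokens.ranges)
# [(0, 8192, 'A'), (8192, 12288, 'B'), (12288, inf, 'A')]
```

Granting the whole file up front means a node working alone never has to ask again. The cost of splitting is paid only once another node actually shows up.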


XUFS

  • User-space implementation
  • Designed to be simple
  • Very generic