<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Dkirillov</id>
	<title>Soma-notes - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Dkirillov"/>
	<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php/Special:Contributions/Dkirillov"/>
	<updated>2026-05-02T03:06:07Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_23&amp;diff=19052</id>
		<title>DistOS 2014W Lecture 23</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_23&amp;diff=19052"/>
		<updated>2014-04-23T22:15:25Z</updated>

		<summary type="html">&lt;p&gt;Dkirillov: /* Metadata management in Distributed File System - Sandarbh */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&#039;&#039;&#039;Presentations&#039;&#039;&#039;&lt;br /&gt;
===Distributed Shared Memory Systems - Mojgan===&lt;br /&gt;
* Introduction to DSM systems&lt;br /&gt;
* Advantages and Disadvantages&lt;br /&gt;
* Classification of DSM systems&lt;br /&gt;
* Design considerations&lt;br /&gt;
* Examples of DSM systems&lt;br /&gt;
 - OpenSSI&lt;br /&gt;
 - Mermaid&lt;br /&gt;
 - MOSIX&lt;br /&gt;
 - DDM&lt;br /&gt;
&lt;br /&gt;
===Survey: Fault Tolerance in Distributed File System - Mohammed===&lt;br /&gt;
* Abstract&lt;br /&gt;
* Introduction&lt;br /&gt;
** About fault tolerance in any distributed system. Comparison between different file systems. &lt;br /&gt;
** What&#039;s more suitable for mobile-based systems. &lt;br /&gt;
** Why is satisfying high fault tolerance one of the main issues for DFS&#039;s? &lt;br /&gt;
* Replication and fault tolerance&lt;br /&gt;
** What are the replica and placement policies? What is synchronization, and what is its benefit? &lt;br /&gt;
   - Synchronous Method&lt;br /&gt;
   - Asynchronous Method&lt;br /&gt;
   - Semi-Asynchronous Method&lt;br /&gt;
* Cache consistency and fault tolerance&lt;br /&gt;
** What is the cache? What is its benefit? Cache consistency? &lt;br /&gt;
  - Write Once Read Many (WORM)&lt;br /&gt;
  - Transactional Locking - Read and write locks&lt;br /&gt;
  - Leasing&lt;br /&gt;
* Example DFS mentioned in the paper&lt;br /&gt;
** Google File Systems&lt;br /&gt;
** HDFS &lt;br /&gt;
** MOOSEFS&lt;br /&gt;
** iRODS&lt;br /&gt;
** GlusterFS&lt;br /&gt;
** Lustre&lt;br /&gt;
** Ceph&lt;br /&gt;
** PARADISE for mobile&lt;br /&gt;
* Conclusion&lt;br /&gt;
&lt;br /&gt;
===Survey on Control Plane Frameworks for Software Defined Networking - Sijo===&lt;br /&gt;
* Introduction&lt;br /&gt;
** Traditional Networks - Control Plane and Forwarding Plane&lt;br /&gt;
** Software Defined Networking&lt;br /&gt;
 - Proposes decoupling the control and forwarding planes into independent layers&lt;br /&gt;
 - Network entities or nodes are specialized elements which do the forwarding &lt;br /&gt;
 - Control applications work on the logical view of the network provided by the controller, without having to worry about managing state distribution, topology discovery, etc.&lt;br /&gt;
* Theme, Argument Outline&lt;br /&gt;
 - Need for using distributed-systems design principles and tools in SDN controller design to achieve scalability and reliability &lt;br /&gt;
* Controller Platforms&lt;br /&gt;
 - Centralized and Distributed approaches&lt;br /&gt;
 - Identify the need to use distributed designs in controller platforms&lt;br /&gt;
 - For centralized, it started with NOX, then Maestro, Beacon, Floodlight, POX, OpenDaylight&lt;br /&gt;
 - For distributed: ONIX, HyperFlow, YANC, ONOS&lt;br /&gt;
 - Leverage parallel processing capabilities&lt;br /&gt;
* In detail about two systems: &lt;br /&gt;
** ONIX&lt;br /&gt;
** ONOS &lt;br /&gt;
* References&lt;br /&gt;
&lt;br /&gt;
===Metadata management in Distributed File System - Sandarbh===&lt;br /&gt;
* What is metadata? &lt;br /&gt;
- Defined by bare-minimum functions for MDS (Metadata Server)&lt;br /&gt;
- Monitor the performance of DFS so that it can be used further&lt;br /&gt;
- Structure of metadata in Paper&lt;br /&gt;
* Why is Metadata management difficult? &lt;br /&gt;
- 50% of file operations are metadata operations&lt;br /&gt;
- Size of metadata&lt;br /&gt;
- Distribute the load evenly across all MDS&lt;br /&gt;
- Be able to handle thousands of clients &lt;br /&gt;
- Be able to handle file/directory permission change&lt;br /&gt;
- Recover data if some MDS goes down&lt;br /&gt;
- Be POSIX compliant&lt;br /&gt;
- Be able to scale: adding a new MDS shouldn&#039;t cause ripples&lt;br /&gt;
- Contrasting goals - replication and consistency - Average case improvements vs guaranteed performance for each access&lt;br /&gt;
* Static sub-tree partitioning&lt;br /&gt;
- Advantage - Clients know which MDS to contact for the file - Prefix caching &lt;br /&gt;
- Disadvantage - Directory hot spot formation&lt;br /&gt;
* Static hashing based partitioning &lt;br /&gt;
- Hash the filename or file identifier and assign it to an MDS (sketched below)&lt;br /&gt;
- Advantage  - Distributes load evenly - Gets rid of hotspots&lt;br /&gt;
- Disadvantage &lt;br /&gt;
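A minimal sketch (illustrative only; the MDS names are invented) of static hash-based partitioning of metadata paths across metadata servers:&lt;br /&gt;
&lt;pre&gt;
import hashlib

MDS_NODES = ["mds0", "mds1", "mds2"]   # hypothetical metadata servers

def mds_for(path):
    # stable hash of the full path; the same path always maps to the same MDS
    h = int(hashlib.md5(path.encode()).hexdigest(), 16)
    return MDS_NODES[h % len(MDS_NODES)]

print(mds_for("/home/alice/thesis.tex"))
&lt;/pre&gt;
Note the tie to the scaling bullet above: with plain modulo hashing, adding a new MDS remaps almost every path.&lt;br /&gt;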
* &amp;quot;Don&#039;t ask me where your server is&amp;quot; approach&lt;br /&gt;
- E.g., Ceph, GlusterFS, OceanStore, Hierarchical Bloom filters, Cassandra&lt;br /&gt;
- Responsibilities - Replica management, Consistency, Access control, Recover metadata in case of crash, Talk to each other to handle the load dynamically &lt;br /&gt;
* What&#039;s not in the slides &lt;br /&gt;
- Not focused on replication of metadata&lt;br /&gt;
- Semantic based search&lt;br /&gt;
* Structure of the survey&lt;br /&gt;
- Conventional metadata systems&lt;br /&gt;
- No-metadata approach&lt;br /&gt;
- Metadata approach of file systems designed for specific goals: GFS, Haystack, etc.&lt;br /&gt;
- Evolution history&lt;br /&gt;
- Comparison within category&lt;br /&gt;
- Cover reliability and consistency part&lt;br /&gt;
- Summarize learnings with expected trends&lt;br /&gt;
&lt;br /&gt;
===Distributed Stream Processing - Ronak Chaudhari===&lt;br /&gt;
* About Stream processing&lt;br /&gt;
- Data streams &lt;br /&gt;
- DBMS vs Stream processing &lt;br /&gt;
* Applications&lt;br /&gt;
- Monitoring applications&lt;br /&gt;
- Military applications&lt;br /&gt;
- Financial analysis&lt;br /&gt;
- Tracking applications&lt;br /&gt;
* Aurora &lt;br /&gt;
- Process incoming streams&lt;br /&gt;
- It has its own query algebra&lt;br /&gt;
- System Model - Query Model - Runtime Architecture &lt;br /&gt;
- QOS criteria&lt;br /&gt;
- SQuAL - Query algebra&lt;br /&gt;
- Aurora GUI&lt;br /&gt;
- Challenges in distributed operation&lt;br /&gt;
* Aurora vs Medusa&lt;br /&gt;
* Medusa&lt;br /&gt;
- Architecture&lt;br /&gt;
- Addition to Aurora - Lookup and Brain&lt;br /&gt;
- Failure detection&lt;br /&gt;
- Transfer of processing &lt;br /&gt;
- System API&lt;br /&gt;
- Load management &lt;br /&gt;
- High availability &lt;br /&gt;
- Benefits&lt;br /&gt;
* References&lt;/div&gt;</summary>
		<author><name>Dkirillov</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_19&amp;diff=18900</id>
		<title>DistOS 2014W Lecture 19</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_19&amp;diff=18900"/>
		<updated>2014-03-25T00:59:17Z</updated>

		<summary type="html">&lt;p&gt;Dkirillov: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&lt;br /&gt;
== Dynamo ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Key-value store.&lt;br /&gt;
* Goal: build a distributed storage system that is:&lt;br /&gt;
** Scalable&lt;br /&gt;
** Simple: key-value&lt;br /&gt;
** Highly available&lt;br /&gt;
* Guarantees Service Level Agreements (SLAs).&lt;br /&gt;
* Highly concurrent.&lt;br /&gt;
* No dynamic routing.&lt;br /&gt;
* 0-hop DHT: a node does not route a request through intermediate nodes; it has a direct link to the destination.&lt;br /&gt;
* Dynamo sacrifices consistency under certain failure scenarios.&lt;br /&gt;
* It has a partitioning algorithm.&lt;br /&gt;
* Consistent hashing: the output range of a hash function is treated as a fixed circular space or “ring” (sketched in code below).&lt;br /&gt;
* The key space is linear and partitioned across the nodes.&lt;br /&gt;
* “Virtual nodes”: each node can be responsible for more than one virtual node.&lt;br /&gt;
* Each data item is replicated at N hosts.&lt;br /&gt;
* “Preference list”: the list of nodes responsible for storing a particular key.&lt;br /&gt;
* Sacrifices strong consistency for availability.&lt;br /&gt;
* It works at the scale of about 100 servers; it is not designed to be much bigger.&lt;br /&gt;
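A minimal sketch of consistent hashing with virtual nodes and an N-host preference list, to make the “ring” concrete; the node names, hash choice, and parameters are invented, and this is not Dynamo&#039;s actual implementation:&lt;br /&gt;
&lt;pre&gt;
import bisect
import hashlib

def _h(s):
    # any stable hash works; MD5 is used here only for illustration
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=4, replicas=3):
        self.replicas = replicas
        # each physical node owns several positions ("virtual nodes") on the ring
        self.ring = sorted((_h(f"{n}#{v}"), n) for n in nodes for v in range(vnodes))

    def preference_list(self, key):
        # walk clockwise from the key position, collecting the first
        # N distinct physical nodes (requires at least N physical nodes)
        i = bisect.bisect(self.ring, (_h(key),))
        out = []
        while len(out) != self.replicas:
            node = self.ring[i % len(self.ring)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = Ring(["node-A", "node-B", "node-C", "node-D"])
print(ring.preference_list("cart:12345"))   # first entry acts as the coordinator
&lt;/pre&gt;
Adding or removing one node only shifts the keys adjacent to its virtual-node positions, which is the point of using a ring rather than a plain modulo hash.&lt;br /&gt;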
&lt;br /&gt;
&lt;br /&gt;
== Bigtable ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* BigTable is a distributed storage system for managing structured data.&lt;br /&gt;
* Designed to scale to a very large size&lt;br /&gt;
* It stores columns together; the rows are web pages and the columns are their contents.&lt;br /&gt;
* Each page has incoming links &lt;br /&gt;
* A BigTable is a sparse, distributed, persistent, multi-dimensional sorted map.&lt;br /&gt;
* It has many columns and looks like a table.&lt;br /&gt;
* Each row has arbitrary columns.&lt;br /&gt;
* It is a multi-dimensional map.&lt;br /&gt;
* An SSTable provides a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings (see the sketch after this list).&lt;br /&gt;
* Large tables are broken into tablets at row boundaries; each tablet holds a contiguous range of rows.&lt;br /&gt;
* Metadata operations: Create/delete tables, column families, change metadata.&lt;br /&gt;
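A minimal sketch (my own illustration, not code from the paper) of the SSTable idea: an immutable map from byte-string keys to byte-string values, kept sorted so a lookup is a binary search:&lt;br /&gt;
&lt;pre&gt;
import bisect

class SSTable:
    """Immutable, sorted key-to-value map; built once, never mutated."""
    def __init__(self, items):
        pairs = sorted(items)                 # sort once at build time
        self.keys = [k for k, _ in pairs]
        self.vals = [v for _, v in pairs]

    def get(self, key, default=None):
        i = bisect.bisect_left(self.keys, key)
        if i != len(self.keys) and self.keys[i] == key:
            return self.vals[i]
        return default

t = SSTable([(b"com.example/b", b"page b"), (b"com.example/a", b"page a")])
print(t.get(b"com.example/a"))   # b"page a"
&lt;/pre&gt;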
&lt;br /&gt;
&lt;br /&gt;
The question to consider is: can BigTable be used in a shopping-cart type of scenario, where latency and availability are the main focus (or, to rephrase, can BigTable be used in place of Dynamo and vice versa)? The answer is: it can be, but it wouldn&#039;t be as good as Dynamo on the latency parameter. Dynamo would probably do a lot better, but only because BigTable was not designed to work under such a scenario; its use cases were different. There is no one solution that solves all the problems in the world of distributed file systems, no silver bullet, no one-size-fits-all. File systems are usually designed for specific use cases and work best for them; later, if the need arises, they can be molded to work in other scenarios as well, and they may provide good enough performance for those later-added goals, but they will work best for the use cases that were the targets in the beginning.&lt;br /&gt;
&lt;br /&gt;
== General talk ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
* Read the introduction and conclusion of each paper, and think about the use cases in the paper more than about how the authors solve the problem.&lt;/div&gt;</summary>
		<author><name>Dkirillov</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_8&amp;diff=18583</id>
		<title>DistOS 2014W Lecture 8</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_8&amp;diff=18583"/>
		<updated>2014-02-06T17:27:10Z</updated>

		<summary type="html">&lt;p&gt;Dkirillov: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Group 1==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NFS:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1) per-operation traffic&lt;br /&gt;
&lt;br /&gt;
2) RPC-based&lt;br /&gt;
&lt;br /&gt;
3) unreliable&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;AFS:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1) designed for 5000 clients&lt;br /&gt;
&lt;br /&gt;
2) high integrity.&lt;br /&gt;
&lt;br /&gt;
==Group 2==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NFS:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1) designed to share disks over a network, not files&lt;br /&gt;
&lt;br /&gt;
2) more UNIX like&lt;br /&gt;
&lt;br /&gt;
3) portable&lt;br /&gt;
&lt;br /&gt;
4) use UDP&lt;br /&gt;
&lt;br /&gt;
5) it does not minimize network traffic.&lt;br /&gt;
&lt;br /&gt;
6) used the vnode interface&lt;br /&gt;
&lt;br /&gt;
7) does not need much hardware equipment&lt;br /&gt;
&lt;br /&gt;
8) later versions took on features of AFS&lt;br /&gt;
&lt;br /&gt;
9) the stateless protocol conflicts with files being stateful by nature.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;AFS:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1) designed to share files over a network, not disks&lt;br /&gt;
&lt;br /&gt;
2) better scalability&lt;br /&gt;
&lt;br /&gt;
3) better security.&lt;br /&gt;
&lt;br /&gt;
4) minimize network traffic.&lt;br /&gt;
&lt;br /&gt;
5) less UNIX like&lt;br /&gt;
&lt;br /&gt;
6) plugin authentication&lt;br /&gt;
&lt;br /&gt;
7) needs more kernel storage due to complex commands&lt;br /&gt;
&lt;br /&gt;
8) inode concept replaced with fid&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Group 3==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;NFS:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1) cache assumption invalid.&lt;br /&gt;
&lt;br /&gt;
2) no locking&lt;br /&gt;
&lt;br /&gt;
3) bad security&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;AFS:&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
1) cache assumption valid&lt;br /&gt;
&lt;br /&gt;
2) locking&lt;br /&gt;
&lt;br /&gt;
3) good security.&lt;br /&gt;
&lt;br /&gt;
==Group 4==&lt;/div&gt;</summary>
		<author><name>Dkirillov</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_10&amp;diff=18582</id>
		<title>DistOS 2014W Lecture 10</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS_2014W_Lecture_10&amp;diff=18582"/>
		<updated>2014-02-06T17:17:02Z</updated>

		<summary type="html">&lt;p&gt;Dkirillov: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Context==&lt;br /&gt;
&lt;br /&gt;
== GFS ==&lt;br /&gt;
&lt;br /&gt;
* Very different because of the workload it is designed for.&lt;br /&gt;
** Because of the number of small files that have to be indexed for the web, etc., it is no longer practical to have a filesystem that stores these individually. Too much overhead. Punts problem to userspace, incl. record delimitation.&lt;br /&gt;
* Doesn&#039;t care about latency, surprising considering it&#039;s Google, the guys who changed the TCP IW (initial window) standard recommendations for latency.&lt;br /&gt;
* Mostly seeking through entire file.&lt;br /&gt;
* Paper from 2003, mentions still using 100BASE-T links.&lt;br /&gt;
* Data-heavy, metadata light. Contacting the metadata server is a rare event.&lt;br /&gt;
* Really good that they designed for unreliable hardware:&lt;br /&gt;
** All the replication&lt;br /&gt;
** Data checksumming&lt;br /&gt;
* Performance degrades for small random-access workloads; use another filesystem.&lt;br /&gt;
* Path of least resistance to scale, not to do something super CS-smart.&lt;br /&gt;
* Google used to re-index every month, swapping out indexes. Now, it&#039;s much more online. GFS is now just a layer to support a more dynamic layer.&lt;br /&gt;
&lt;br /&gt;
=== Segue on drives ===&lt;br /&gt;
&lt;br /&gt;
* Structure of GFS does match some other modern systems:&lt;br /&gt;
** Hard drives are like parallel tapes, very suited for streaming.&lt;br /&gt;
** Flash devices are log-structured too, but have an abstracting firmware. You want to do erasure in bulk, in the &#039;&#039;&#039;background&#039;&#039;&#039;. Used to be we needed specialized FS for MTDs to get better performance; though now we have better microcontrollers in some embedded systems to abstract away the hardware.&lt;br /&gt;
* Architectures that start big, often end up in the smallest things.&lt;br /&gt;
&lt;br /&gt;
== How other filesystems compare to GFS and Ceph ==&lt;br /&gt;
&lt;br /&gt;
* Data and metadata are held together.&lt;br /&gt;
** Doesn&#039;t account for different access patterns:&lt;br /&gt;
*** Data → big, long transfers&lt;br /&gt;
*** Metadata → small, low latency&lt;br /&gt;
** Can&#039;t scale separately&lt;br /&gt;
* By design, a file is a fraction of the size of a server&lt;br /&gt;
** Huge files spread over many servers not even in the cards for NFS&lt;br /&gt;
** Meant for small problems, not web-scale&lt;br /&gt;
*** Google has a copy of the publicly accessible internet&lt;br /&gt;
**** Their strategy is to copy the internet to index it&lt;br /&gt;
**** Insane → insane filesystem&lt;br /&gt;
* Designed for lower latency&lt;br /&gt;
* Designed for POSIX semantics; how the requirements that led to the ‘standard’ evolved&lt;br /&gt;
* Even mainframes, scale-up solutions, ultra-reliable systems, with data sets bigger than RAM don&#039;t have this scale.&lt;br /&gt;
* Reliability was a property of the host, not the network&lt;br /&gt;
* Point-to-point access; much less load-balancing, even in AFS&lt;br /&gt;
** Single point of entry, single point of failure, bottleneck&lt;br /&gt;
* No notion of data replication.&lt;br /&gt;
&lt;br /&gt;
==Ceph==&lt;br /&gt;
&lt;br /&gt;
* Ceph is crazy and tries to do everything&lt;br /&gt;
* Unlike GFS, distributes metadata, not just for read-only copies&lt;br /&gt;
* Unlike GFS, the OSDs have some intelligence and autonomously distribute the data, rather than being controlled by a master.&lt;br /&gt;
** Uses hashing in the distribution process to &#039;&#039;&#039;uniformly&#039;&#039;&#039; distribute data&lt;br /&gt;
** The actual algorithm for distributing data (sketched in code below): file + offset → hash(object ID) → CRUSH(placement group) → OSD&lt;br /&gt;
** Each client has knowledge of the entire storage network.&lt;br /&gt;
** Tracks failure groups (same breaker, switch, etc.), hot data, etc.&lt;br /&gt;
** The number of replicas is changeable on the fly, but the placement group is not&lt;br /&gt;
*** For example, if every client on the planet is accessing the same file, you can scale out for that data.&lt;br /&gt;
** You don&#039;t ask where to go, you just go, which makes this very scalable&lt;br /&gt;
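A toy stand-in for the pipeline above (my own sketch, not the real CRUSH algorithm; the object size, placement-group count, and OSD names are invented):&lt;br /&gt;
&lt;pre&gt;
import hashlib

CHUNK = 4 * 2**20   # assumed fixed object size (4 MiB)

def _h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def locate(filename, offset, num_pgs, osds, replicas=2):
    """file + offset → object ID → placement group → OSDs"""
    object_id = f"{filename}.{offset // CHUNK}"   # which chunk of the file
    pg = _h(object_id) % num_pgs                  # hash the object into a placement group
    # deterministic pseudo-random ranking of OSDs per PG (CRUSH stand-in)
    ranked = sorted(osds, key=lambda osd: _h(f"{pg}:{osd}"))
    return object_id, pg, ranked[:replicas]

print(locate("bigfile", 9_000_000, num_pgs=128, osds=["osd1", "osd2", "osd3", "osd4"]))
&lt;/pre&gt;
Because the mapping is a pure function of the name and the cluster map, any client can compute data locations locally, which is why “you just go”.&lt;br /&gt;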
* CRUSH is sufficiently advanced to be called magic.&lt;br /&gt;
** O(log n) in the size of the data&lt;br /&gt;
** CPUs are stupidly fast, so the above is minimal overhead, whereas the network, despite being fast, has latency, etc. Computation scales much better than communication.&lt;br /&gt;
* Storage is composed of variable-length atoms&lt;/div&gt;</summary>
		<author><name>Dkirillov</name></author>
	</entry>
</feed>