Talk:Untrusted Distributed Storage
Latest revision as of 01:02, 6 November 2008
Group 1
Pond:
1) Why use Java?
2) How was the inner-ring chosen?
3) How big was the prototype able to scale?
Farsite:
1) What is convergence cryptography?
2) What are the advantages/disadvantages of not locking the directory name of an open file handle?
3) What assumptions did they make about concurrency in the system and how did they plan to handle it?
Retro:
1) Why did this never move beyond a research project?
2) (in lessons learned) networking turned out to be the limiting factor over disk space.
	Why had they assumed that networking wouldn't be an issue?
3) What planned goals did they achieve?
Group 2
OceanStore
- What was the purpose of introspection in terms of nomadic data?
- How does the less-reliable-but-faster probabilistic lookup work?
- What is a Bloom filter and how is it used in OceanStore?
FarSite
- Farsite was designed to look like NTFS. How do Farsite's semantics differ from NTFS?
- How is the content lease system similar to lease systems in distributed systems we've already seen, and which is most similar?
- What is the scope of Farsite? Could it work as a worldwide file system like OceanStore?
Retro
- How did the lease system change between planning and implementation?
- What was the programming model used in the implementation of Farsite?
- What was the biggest disadvantage to the implementation?
From other groups:
Group 1:
- Why use Java? 
 Java is strongly typed and has a built-in garbage collector, which makes development easier and faster. The other reason was that they wanted an event-driven architecture for the system, and the SEDA prototype, SandStorm, was available.
- How was the inner ring chosen? 
 The "responsible party" publishes sets of failure-independent nodes discovered through offline measurement and analysis. The inner ring is selected from each of the 3f + 1 independent node sets.
- How big was the prototype able to scale? 
 There are no clear scalability figures, but Pond was outperformed in most of the benchmarks they report.
Group 3:
- What is the Byzantine protocol? 
 The Byzantine protocol is a distributed decision process in which all non-faulty participants reach the same decision as long as more than 2/3 of the participants follow the protocol correctly. If 1/3 or more are faulty, the protocol can fail. The number of messages the protocol requires is quadratic in the number of participants, so using it to synchronize a large number of replicas is fairly infeasible. Also, for Pond, authentication is required in the form of proactive threshold signatures.
- What is common between OceanStore and GFS in terms of environment requirements?
 OceanStore and GFS are quite different. OceanStore uses a network overlay, GFS uses a master server. OceanStore uses untrusted servers, GFS uses trusted servers. OceanStore uses hierarchical replication, GFS uses lazy replication. In fact, the only comparable aspect is the number of replicas generated by both systems.
- How did they meet the expectations of the original OceanStore paper and vision? (i.e. how did they solve design problems) 
 One important design decision was not to support explicit locks or leases on data, and to rely instead on the update model for consistency. Because updates are atomic, locks can be built at the application layer if they are needed. They also added more UIDs to make objects easier to access, and they implemented erasure coding to allow more reliable archiving of data.
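 A rough sketch of the arithmetic behind the Byzantine answer above, for anyone who wants to plug in numbers. The function names are made up for illustration, and the message count assumes one simple all-to-all round, which is a simplification of the real protocol and only meant to show the quadratic growth.

```python
def ring_size(f: int) -> int:
    """Smallest number of participants that tolerates f faulty ones (3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def max_faulty(n: int) -> int:
    """Largest f such that n >= 3f + 1, i.e. strictly fewer than 1/3 faulty."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


def all_to_all_messages(n: int) -> int:
    """Messages in one round where each of n participants sends to the other n - 1.

    Assumption: a single all-to-all round stands in for the protocol's
    communication pattern; the point is only that the count grows as n^2.
    """
    return n * (n - 1)


if __name__ == "__main__":
    for n in (4, 7, 10, 100):
        print(n, max_faulty(n), all_to_all_messages(n))
```

 With 4 participants one fault is tolerated; with 3 participants none is, which is why the inner ring is sized as 3f + 1.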
Group 4:
- What is Tapestry and how does it work? 
 Tapestry is a decentralized object location and routing system: a scalable overlay network, built on TCP/IP and designed to manage the location of resources. Instead of routing to an IP address as in TCP/IP, requests are sent to a GUID. Tapestry is also locality aware, so it routes the message, with high probability, to the physical host holding the copy of the resource closest to the message source. A physical host joins Tapestry by supplying a GUID to identify itself, so that other hosts can route messages to it. Hosts publish the GUIDs of their resources so other hosts can route messages to those resources. Tapestry does not restrict hosts from unpublishing resources or leaving the network, nor does it restrict where resources are located on a host.
- How did they meet the expectations of the original OceanStore paper and vision? (i.e. how did they solve design problems) 
 Refer to Group 3 Q3.
- What is the difference between the primary and secondary replicas? 
 Each object has one primary replica and multiple secondary replicas. The primary replica serializes and applies all updates to the object and creates a digital certificate, called a heartbeat, mapping an AGUID to the VGUID of the most recent version. The heartbeat is a tuple containing an AGUID, a VGUID, a timestamp and a version sequence number. Heartbeats are requested regularly to ensure freshness. The primary replica also enforces access control restrictions and serializes concurrent updates from multiple users. A secondary replica finds a pre-existing replica to serve as its parent, which is the primary replica if there are no other secondaries. Secondary replicas are thus the child nodes of dissemination trees rooted at the primary replica, and they handle client requests without contacting the inner ring. Pond's update model propagates updates from the primary replica down to its child secondary replicas.
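 To make the heartbeat tuple concrete, here is a minimal sketch of it as a record, plus a helper that picks the freshest heartbeat for an object. The field and function names are assumptions for illustration, not Pond's actual classes, and the signature that makes a heartbeat a certificate is left out entirely.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Heartbeat:
    """The four fields named in the answer above (signature omitted)."""
    aguid: str        # identifies the object across all of its versions
    vguid: str        # identifies one specific version of the object
    timestamp: float  # when the primary replica issued the heartbeat
    sequence: int     # version sequence number


def freshest(aguid: str, heartbeats: Iterable[Heartbeat]) -> Optional[Heartbeat]:
    """Return the heartbeat with the highest sequence number for this AGUID.

    Assumption: the sequence number alone decides freshness; a real client
    would also verify the certificate and check the timestamp.
    """
    candidates = [h for h in heartbeats if h.aguid == aguid]
    return max(candidates, key=lambda h: h.sequence, default=None)


if __name__ == "__main__":
    seen = [
        Heartbeat("obj-A", "ver-1", 100.0, 1),
        Heartbeat("obj-A", "ver-2", 160.0, 2),
        Heartbeat("obj-B", "ver-9", 150.0, 9),
    ]
    print(freshest("obj-A", seen))
```

 A reader holding an AGUID would use the freshest heartbeat to learn which VGUID to fetch.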
Group 3 - Farsite
OceanStore
1) What is convergence cryptography?
Ans. The file key is used to encrypt the hashes rather than to encrypt the file blocks directly.
2) What are the advantages/disadvantages of not locking the directory name of an open file handle?
Ans. Advantages: the results of a directory rename do not have to be propagated synchronously to all descendant directory groups during the rename, which would unacceptably slow the operation, particularly for directories near the root of the namespace tree.
Disadvantages: because propagation is lazy, other users do not see the new name immediately, and more than one user can change the name at the same time.
3) What assumptions did they make about concurrency in the system and how did they plan to handle it?
Ans. The authors assume that no file is both read by many users and frequently updated by at least one user. To handle concurrency, Farsite uses four classes of leases: content leases, name leases, mode leases, and access leases.
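The answer to question 1 above is terse, so here is a toy sketch of the idea as we understand it: each block is encrypted under a key derived from its own hash, and the file key encrypts only the hashes. Everything here is an assumption for illustration. The XOR keystream cipher is a stand-in and is not secure, and none of the names come from Farsite.

```python
import hashlib
from typing import List, Tuple


def _keystream(key: bytes, length: int) -> bytes:
    """Hash-derived keystream. Toy construction, NOT a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice restores the input."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def encrypt_file(blocks: List[bytes], file_key: bytes) -> Tuple[List[bytes], List[bytes]]:
    """Encrypt each block under its own hash, then the hashes under the file key."""
    hashes = [hashlib.sha256(b).digest() for b in blocks]
    enc_blocks = [toy_cipher(h, b) for h, b in zip(hashes, blocks)]
    enc_hashes = [toy_cipher(file_key + i.to_bytes(8, "big"), h)
                  for i, h in enumerate(hashes)]
    return enc_blocks, enc_hashes


def decrypt_file(enc_blocks: List[bytes], enc_hashes: List[bytes],
                 file_key: bytes) -> List[bytes]:
    """Recover the block hashes with the file key, then the blocks themselves."""
    hashes = [toy_cipher(file_key + i.to_bytes(8, "big"), eh)
              for i, eh in enumerate(enc_hashes)]
    return [toy_cipher(h, eb) for h, eb in zip(hashes, enc_blocks)]


if __name__ == "__main__":
    blocks = [b"hello world", b"same content"]
    a_blocks, a_hashes = encrypt_file(blocks, b"alice-file-key")
    b_blocks, b_hashes = encrypt_file(blocks, b"bob-file-key")
    print(a_blocks == b_blocks)  # identical plaintext gives identical ciphertext
    print(a_hashes == b_hashes)  # the encrypted hashes differ per file key
```

The point of the design is that two users storing the same data produce the same encrypted blocks, so the system can spot duplicates without reading the plaintext, while only holders of the file key can recover the hashes needed to decrypt.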
Retro
1) What is the main target environment for Farsite?
Ans. The target environments were governments and universities.
2) What are the 3 different types of certificates? And what are their purposes?
Ans. Namespace certificates - associate the root of a file system namespace with a set of machines that manage the root metadata.
User certificates - associate a user with his personal public key, so that the user's identity can be validated for access control.
Machine certificates - associate a machine with its own public key, which is used for establishing the validity of the machine as a physically unique resource.
3) What is convergent encryption?
Ans. The file key is used to encrypt the hashes rather than to encrypt the file blocks directly.
Pond
1) Farsite was designed to look like NTFS. How do Farsite’s semantics differ from NTFS?
Ans. First: Farsite has a multi-reader, single-writer policy. Additional attempts to read an open file receive a handle to a snapshot of the file, which does not change to reflect updates by remote writers. An application can query the Farsite client to find out whether it has a snapshot handle or a true file handle, but this is not part of NTFS semantics.
Second: NTFS does not allow a directory to be renamed if there is an open handle on a file in the directory or in any of its descendants. Farsite instead implements the Unix-like semantics of not name-locking an open file's path.
2) How is the content lease system similar to lease systems in distributed systems we’ve already seen, and which is most similar?
3) What is the scope of Farsite? Could it work as a worldwide file system like OceanStore?
Ans. Farsite's main scope was that of a university, government or large company. Analysis points to a scale of approximately 10^5 machines, whereas OceanStore is around 10^10.
Group 4 - Farsite Retrospective
Some questions were received by paper so they are put here as well as "FROM <GROUP>"
TO OceanStore
- What is their business model?
- What is introspection and what are its many applications?
- What are the advantages of using a “Version Control System” over a typical file system model?
TO Pond
- What is Tapestry and how does it work?
- How did they meet the expectations of the original OceanStore paper and vision? (i.e. how did they solve design problems)
- What is the difference between the primary and secondary replicas?
TO FARSITE
- What was the target environment for FARSITE?
- What were the three different types of certificates and what were their purposes?
- What is convergent encryption?
FROM Pond
- How did the lease system change between planning and implementation?
- What are the programming models used in the implementation?
- What was the biggest disadvantage to their implementation?
From FARSITE(?)
- What are the trends in technology that justify FARSITE?
- Were the team members familiar with the Windows SIS driver?
- What's WebDAV?