Soma-notes - User contributions [en]

Talk:Untrusted Distributed Storage

2008-11-06T01:02:22Z

Bo:

==Group 1==
Pond: 
1) Why use Java? 
2) How was the inner-ring chosen? 
3) How big was the prototype able to scale? 

Farsite: 
1) What is convergent encryption? 
2) What are the advantages/disadvantages of not locking the directory name of an open file handle? 
3) What assumptions did they make about concurrency in the system and how did they plan to handle it? 

Retro: 
1) Why did this never move beyond a research project? 
2) (in lessons learned) networking turned out to be the limiting factor over disk space. 
Why had they assumed that networking wouldn't be an issue? 
3) What planned goals did they achieve? 

==Group 2==
OceanStore

# What was the purpose of introspection in terms of nomadic data?
# How does the less-reliable-but-faster probabilistic lookup work?
# What is a Bloom filter and how is it used in OceanStore?

FarSite

# Farsite was designed to look like NTFS. How do Farsite's semantics differ from NTFS?
# How is the content lease system similar to lease systems in distributed systems we've already seen, and which is most similar?
# What is the scope of Farsite? Could it work as a worldwide filesystem like OceanStore?

Retro

# How did the lease system change between planning and implementation?
# What was the programming model used in the implementation of Farsite?
# What was the biggest disadvantage to the implementation?

From other groups:

Group 1:
# Why use Java? Java is strongly typed and has a built-in garbage collector, which makes development easier and faster. The other reason was that they wanted an event-driven architecture for the system, and the SEDA prototype, Sandstorm, was available. 
# How was the inner ring chosen? The "responsible party" publishes sets of failure-independent nodes discovered through offline measurement and analysis. The inner ring is selected from each of the 3f + 1 independent node sets. 
# How big was the prototype able to scale? There are no clear scalability figures, but Pond was outperformed in most of the reported benchmarks.

Group 3:
# What is the Byzantine protocol? The Byzantine protocol is a distributed decision process in which all non-faulty participants reach the same decision as long as more than 2/3 of the participants follow the protocol correctly. If 1/3 or more of the participants are faulty, the protocol can fail. The protocol requires a number of messages quadratic in the number of participants, so using it to synchronize a large number of replicas is infeasible. Pond also requires authentication, in the form of proactive threshold signatures. 
# What is common between OceanStore and GFS in terms of environment requirements? OceanStore and GFS are quite different. OceanStore uses a network overlay; GFS uses a master server. OceanStore uses untrusted servers; GFS uses trusted servers. OceanStore uses hierarchical replication; GFS uses lazy replication. In fact, the only comparable aspect is the number of replicas generated by both systems. 
# How did they meet the expectations of the original OceanStore paper and vision? (i.e. how did they solve design problems) One important design decision was not to support explicit locks or leases on data, and instead to rely on the update model for consistency. Because updates are atomic, locks can be built at the application layer if they are needed. They also added more UIDs to make objects easier to access, and they implemented erasure coding to allow more reliable archiving of data.
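
The fault bound in the Byzantine answer above can be worked through numerically. This is a minimal sketch, assuming the usual sizing of 3f + 1 participants to tolerate f faults (the same 3f + 1 that appears in the inner-ring answer) and assuming one all-to-all exchange per round. The function names are hypothetical and are not taken from Pond.

```python
def max_faulty(n: int) -> int:
    """Largest number of faulty participants f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one participant")
    return (n - 1) // 3


def min_ring_size(f: int) -> int:
    """Smallest group size that tolerates f faulty participants."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def pairwise_messages(n: int) -> int:
    """Messages in one all-to-all round (assumption: everyone sends to everyone else)."""
    return n * (n - 1)


if __name__ == "__main__":
    for n in (4, 7, 10, 100):
        print(n, max_faulty(n), pairwise_messages(n))
```

The message count grows with the square of the group size, which is why agreement is kept to a small inner ring rather than run across every replica.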

Group 4:
# What is Tapestry and how does it work? Tapestry is a decentralized object location and routing system: a scalable overlay network, built on TCP/IP and designed to manage the location of resources. Instead of routing to an IP address as in TCP/IP, requests are sent to a GUID. Tapestry is also locality aware, so with high probability it routes the message to the physical host holding the copy of the resource closest to the message source. A physical host joins Tapestry by supplying a GUID to identify itself, so that other hosts can route messages to it. Hosts publish the GUIDs of their resources so that other hosts can route messages to those resources. Tapestry does not restrict hosts from unpublishing resources or leaving the network, nor does it restrict where resources are located on a host. 
# How did they meet the expectations of the original OceanStore paper and vision? (i.e. how did they solve design problems) Refer to Group 3 Q3. 
# What is the difference between the primary and secondary replicas? Each object has one primary and multiple secondary replicas. The primary replica serializes and applies all updates to the object and creates a digital certificate, called a heartbeat, mapping an AGUID to the VGUID of the most recent version. The heartbeat is a tuple containing an AGUID, a VGUID, a timestamp and a version sequence number. Heartbeats are regularly requested to ensure freshness. The primary replica also enforces access control restrictions and serializes concurrent updates from multiple users. Secondary replicas find a pre-existing replica to serve as a parent, usually a primary replica if there are no other secondaries. Secondary replicas are also the child nodes of dissemination trees with primary replicas as the root, but they don't contact the inner ring to handle client requests. The update model of Pond allows updates to propagate from the primary replica to the child secondary replicas.
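
The heartbeat tuple described above can be pictured as a small record. This is only a sketch: the field names, the `is_fresh` check and the `latest` helper are assumptions made for illustration, and the signature that makes a real heartbeat a certificate is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    aguid: str        # names the object across all of its versions
    vguid: str        # names one specific version of the object
    timestamp: float  # when the primary replica issued the heartbeat
    sequence: int     # version sequence number


def is_fresh(hb: Heartbeat, now: float, max_age: float) -> bool:
    """Hypothetical freshness rule: accept heartbeats no older than max_age seconds."""
    return 0 <= now - hb.timestamp <= max_age


def latest(a: Heartbeat, b: Heartbeat) -> Heartbeat:
    """Pick the heartbeat naming the newer version of the same object."""
    if a.aguid != b.aguid:
        raise ValueError("heartbeats refer to different objects")
    return a if a.sequence >= b.sequence else b
```

A reader holding a stale heartbeat would request a new one and then fetch the version named by its VGUID.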

==Group 3 - Farsite==

OceanStore

1) What is convergent encryption?

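Ans. Each file block is encrypted with a key derived from the hash of the block's own contents, so identical blocks produce identical ciphertext. The file key is then used to encrypt the hashes rather than to encrypt the file blocks directly.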
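
The idea behind convergent encryption is that the key comes from the content itself, so two users storing the same data produce the same ciphertext and the system can detect duplicates without reading the plaintext. Below is a minimal sketch under loud assumptions: the XOR keystream built from SHA-256 is a toy stand-in for a real cipher and is not secure, all function names are hypothetical, and the step where the content hash is itself encrypted with the file key is omitted.

```python
import hashlib


def content_key(plaintext: bytes) -> bytes:
    """Key derived from the data itself (hash of the plaintext)."""
    return hashlib.sha256(plaintext).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 of key plus a counter. Illustration only, NOT secure."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with the toy keystream; applying it twice restores the data."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def convergent_encrypt(plaintext: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext); the key must be kept to decrypt later."""
    key = content_key(plaintext)
    return key, toy_xor(key, plaintext)


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return toy_xor(key, ciphertext)
```

Two independent calls on the same block yield byte-identical ciphertext, which is the property that lets duplicate blocks be coalesced.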

2) What are the advantages/disadvantages of not locking the directory name of an open file handle?

Ans. Advantages - The results of directory rename operations do not have to be propagated synchronously
to all descendant directory groups during the rename operation, which would unacceptably
slow down the rename operation, particularly for directories near the root of the namespace tree.

Disadvantages - Because they used lazy propagation, other users wouldn’t see the new name immediately; also,
more than one user can change the name at the same time.

3) What assumptions did they make about concurrency in the system and how did they plan to handle it?

Ans. The authors assume that no files are both read by many users and also frequently updated by at least one user.
How did they handle it? There are four classes of leases in Farsite: content leases, name leases, mode leases, and
access leases.

Retro

1) What is the main target environment for Farsite?

Ans. The target was government and university environments.

2) What are the 3 different type of certificates? And what are their purposes?

Ans. Namespace certificates - associate the root of a file system namespace with a set of machines that manage the root metadata.

User certificates - associate a user with his personal public key, so that the user's identity can be validated for access control.

Machine certificates - associate a machine with its own public key, which is used for establishing the validity of the machine as a physically unique resource.

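3) What is convergent encryption?

Ans. Each file block is encrypted with a key derived from the hash of the block's own contents, so identical blocks produce identical ciphertext. The file key is then used to encrypt the hashes rather than to encrypt the file blocks directly.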

Ponds

1) Farsite was designed to look like NTFS. How do Farsite’s semantics differ from NTFS?

Ans.
First: Farsite has a multi-reader, single-writer policy. Additional attempts to read an open file will receive a handle to a snapshot of the file, which will not change to reflect updates by remote writers. An application can query the Farsite client to find out whether it has a snapshot handle or a true file handle, but this is not part of NTFS semantics.

Second: NTFS does not allow a directory to be renamed if there is an open handle on a file in the directory or in any of its descendants. Farsite instead implements the Unix-like semantics of not name-locking an open file's path.

2) How is the content lease system similar to lease systems in distributed systems we’ve already seen, and which is most similar?

3) What is the scope of Farsite? Could it work as a worldwide file system like OceanStore?

Ans. Farsite's main scope was that of a university, government or large company. Analysis points to a scale of approximately 10^5 machines, whereas OceanStore is around 10^10.

==Group 4 - Farsite Retrospective==
Some questions were received on paper, so they are listed here as well, marked "FROM <GROUP>".

TO OceanStore
# What is their business model?
# What is introspection and what are its many applications?
# What are the advantages of using a “Version Control System” over a typical file system model?

TO Pond
# What is Tapestry and how does it work?
# How did they meet the expectations of the original OceanStore paper and vision? (i.e. how did they solve design problems)
# What is the difference between the primary and secondary replicas?

TO FARSITE
# What was the target environment for FARSITE?
# What were the three different types of certificates and what were their purposes?
# What is convergent encryption?

FROM Pond

# How did the lease system change between planning and implementation?
# What are the programming models used in the implementation?
# What was the biggest disadvantage to their implementation?

From FARSITE(?)

# What are the trends in technology that justify FARSITE?
# Were the team members familiar with the Windows SIS driver?
# What's WebDAV?

Talk:Distributed Shared Memory

2008-09-26T20:35:57Z

Bo:

Here is the page where we will be discussing the DSM readings.

== IVY ==
[[User:Soma|Anil]]: What were the key characteristics of IVY? What exactly did Kai Li build?

[[user:Alireza|Alireza]] : IVY was a software-based DSM system developed to allow users to share their local memories in a distributed manner. IVY was designed to be used in loosely coupled environments. It had five main modules: memory allocation, process management, initialization, remote operation and memory mapping. The main advantage of IVY was the performance gain in parallel applications.

[[user:Alireza|Alireza]] :(Question) Name some applications that you think would benefit from using an IVY environment. A distributed database system is the one mentioned in the dissertation. Think of something different.

[[user:Azalia|Azalia]]:(Answer) Some current sample applications are Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) applications that serve multiple users across an organization. As another example, a billing system that has to calculate the telephone bills of thousands of customers can benefit from this environment by calculating the bills of multiple customers at the same time on different machines distributed across the network. Even though each customer's bill can be calculated separately, using a shared memory space for reading input values such as cost per minute or cost per text message can be very useful. In addition, since each customer bill is a separate object, the write operations go to different pages of the shared memory, so even a multiple-writer algorithm does not introduce any concurrency issue in this case.

[[user:Azalia|Azalia]]:(Question)
What is the potential problem in centralized manager algorithm? What is the alternative algorithm?

[[user:Abecevello|Abecevello]]:(Answer)
There are actually several potential problems with using a centralized manager algorithm. We can even get philosophical about this question (and possibly touch on something not covered in the paper). A centralized manager algorithm represents a single point of failure. If the machine running the centralized algorithm fails, then the entire distributed system fails. A centralized component in a distributed system (not only a distributed shared memory system) probably defeats the whole point of making a distributed system. You distribute a system, so why not distribute all of it? Especially today, distributed systems are built so that if one computer fails, the entire system doesn't fail. To answer the second question, the alternative is of course a distributed manager algorithm. In the Kai Li paper the centralized manager problem was discussed in terms of performance, but that too comes down to centralized vs. distributed. In a distributed algorithm there are multiple machines to do the work, so is it not conceivable that it would be faster overall?

[[user:AlexC|AlexC]]:(Answer) Certainly the central point of failure is a serious problem. It also represents a clear target for an attacker (as opposed to unintentional failure). This is the main reason why these massive spamming botnets have been so successful: they are effective distributed systems, with no clear point to attack to disable them. More specifically to DSM implementations, the driving reason for a distributed manager algorithm was not to avoid a central point of failure (although it was likely a secondary consideration). The primary reason was to avoid a performance bottleneck.

[[user:RobertB|RobertB]]:(Answer) While some form of distributed (or hybrid) manager algorithm is the alternative to a centralized one, it too has its own set of issues that need to be considered. With decisions being made in several locations, the chances of data corruption, incorrect reads, preemptive writes, etc. are all potentially increased. Thus the implementation of a distributed manager would be a lot more complex and time-consuming, not to mention difficult to debug, which may impact one's decision to go that route.
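
To make the bottleneck argument in this thread concrete, here is a much simplified sketch of a centralized manager. It is an assumption-laden illustration, not Li's algorithm as written: the class and method names are hypothetical, page transfer and locking are left out, and the only point shown is that every page fault in the system passes through one object.

```python
class CentralManager:
    """Single node that tracks the owner and the read copies of every page."""

    def __init__(self, initial_owner: str, num_pages: int) -> None:
        self.owner = {p: initial_owner for p in range(num_pages)}
        self.copy_set = {p: set() for p in range(num_pages)}
        self.requests_handled = 0  # every fault from every node lands here

    def read_fault(self, node: str, page: int) -> str:
        """Record the reader and return the owner it should fetch the page from."""
        self.requests_handled += 1
        self.copy_set[page].add(node)
        return self.owner[page]

    def write_fault(self, node: str, page: int) -> set:
        """Make node the new owner; return the nodes whose copies must be invalidated."""
        self.requests_handled += 1
        stale = (self.copy_set[page] | {self.owner[page]}) - {node}
        self.copy_set[page] = set()
        self.owner[page] = node
        return stale


if __name__ == "__main__":
    manager = CentralManager("A", num_pages=2)
    print(manager.read_fault("B", 0))
    print(sorted(manager.write_fault("B", 0)))
    print(manager.requests_handled)
```

If the machine holding this object stops, no node can resolve a fault, and under load its request counter grows with the total fault rate of the whole system. A distributed manager spreads the `owner` table across nodes at the cost of the extra complexity described above.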

[[user:adam_k|Adam_K]]: (Answer) An important characteristic of DSM in general is to make use of processor cycles and memory resources in a loosely coupled system that otherwise would remain idle. This is desirable not just for speeding up divide-and-conquer scenarios, but also for creating a memory capacity larger than any individual system could accommodate. Such an environment is necessary for certain applications involving very large models or simulations, where it may not be practical to partition the data set.

== Current DSM systems? ==

[[User:soma|Anil]]: What is a current production system that uses distributed shared memory? What about the underlying problem makes DSM a good technological fit?

[[user:Azalia|Azalia]]:(Answer) What is a current production system that uses distributed shared memory?
Any application with complex independent steps that can be parallelized would be suitable for a DSM environment. Some current sample applications are Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) applications that serve multiple users across an organization.
What about the underlying problem makes DSM a good technological fit?
Apart from DSM, there are alternative methods for distributed environments (e.g. RPC and message passing), but they have some inadequacies that DSM was introduced to address. For instance, message passing and RPC have difficulties in sending complex data structures and pointers over the network because of the different memory address spaces. Distributed shared memory can be a solution to this problem since all the processors share the same memory address space. In addition, if we consider current RPC technologies like Web Services, we'll realize that for each task we have to pack and send a lot of XML data around. With DSM we can share a memory space and avoid overloading the network with XML messages.

[[user:Colin|Colin]]:(Answer) Any system with a great deal of variability in load on its processors could benefit from DSM. This is because the unified address space makes process migration, and thus load balancing, simpler.
(Question) How much more efficient is the movement of data across the networks on a system that implements DSM? Does it not send a comparable amount of data on a page fault as message passing or RPC would to invoke a remote call?

[[user:AlexC|AlexC]]:(Answer) Cluster computing comes to mind. It depends on what type of cluster is implemented. A cluster built for High Availability would likely not implement DSM, because the primary way to accomplish high availability is redundancy. A cluster built for Load Balancing may implement DSM: if it is load balancing large computations with shared data, it would be a good candidate. However, if the cluster was built for load balancing many small jobs (say a really big webserver), then it would not be a good DSM candidate. If a cluster was built for large computations then it would certainly be a good candidate for DSM.
While looking at clusters I also noticed a Java implementation for cluster computing called JavaSpaces that implements distributed shared memory. However, instead of sharing memory pages, it shares Java objects.

[[user:eltonc|eltonc]]: (Answer) I am guessing the computers that are used to compute or verify the largest prime number use DSM.

[[user:RobertB|RobertB]]: (Answer) The Kerrighed software project provides DSM as one of its features. As with IVY, it has a sequential consistency model. It seems to be used for numerical analysis and other large-scale scientific projects. FASTLINK is an application running on TreadMarks, another DSM implementation, which is used for genetic linkage analysis.

[[user:NeilDickson|NeilDickson]]: (Answer) To put the question slightly more bluntly: "Is there any large, useful problem to solve that wouldn't be better solved either on a standard HPC cluster (using MPI) or a standard distribution model like SETI@Home or Folding@Home?" My pseudo-answer to this question is probably not, and if such a problem is out there somewhere, we haven't found it yet. Regardless of whether there is some convoluted example where DSM has good convenience & performance characteristics, the key is that there needs to be a problem that matters before it can be called "useful". There's no doubt that DSM can work, but I'm not convinced it's worth any mind. As an example, as much as computing the quadrillionth bit of pi might be a humorous way to kill some time, 99.999% of the world doesn't care. Maybe DSM researchers should focus on finding some "problem" for their "solution"?

== Difference between DSM and NUMA? ==

[[User:Soma|Anil]] What are the differences between DSM and NUMA? Under what circumstances are each appropriate?

[[User:Alirez|Alireza]]: NUMA follows the SMP paradigm, where there is a common memory bus for accessing shared memories. In addition, one of the most important aspects of NUMA is that it gives processors different access times based on their locations. For instance, processors can have faster access to their local memories. In addition, NUMA access to memory is hardware based.

[[User: Joshua Tessier|Joshua Tessier]]: Correct me if I'm wrong, but NUMA is basically a type of DSM. In a NUMA system, each processor has access to a common memory; however, this common memory is distributed across the processors. For example, if there are 8 processors, the total memory is divided into 8 sections. As stated above, the processors have different access times to the memory stores. Meanwhile, DSM is just distributed shared memory in general, not a specific type like NUMA.

[[User:ABecevello|ABecevello]]: It seems from doing some background reading on NUMA (since I noticed the articles didn't really describe it very well) that NUMA is primarily used for memory access by multiple processors in a single computer. DSM as described sounds like it's designed for memory accesses between computers. From my reading, it looks like NUMA runs best when multiple processors do NOT attempt to access the same memory all at the same time.

[[User:eltonc|eltonc]]: From what I have read, the NUMA architecture first tries to see if the requested data can be found in local memory, but since there is a limit on how much local memory a computer can have, it then tries to access the remote memory (memory on a different node than the CPU currently running the process). This remote memory is usually on the same machine, whereas with DSM the memory is usually on a different computer.

[[User:laszlo|Laszlo]] NUMA and DSM both include separate memory for each processor for fast access, as well as the ability to access other processors' memories. The difference is that DSM is distributed and NUMA is not. NUMA does provide a means for using more than one processor to bypass the performance bottleneck; however, NUMA was designed for SMP machines, where all the processors are linked together within the same physical machine. The main point of DSM is to have memory shared between physically distinct machines. This provides both fault tolerance (a single multiprocessor machine will fail all at once) and the flexibility to add or remove machines to make the distributed system bigger or smaller. It is discussed in the paper that DSM would allow the extra cycles on each workstation to be used when the primary user was not using the workstation. NUMA does not have the option to do this, because a single multiprocessor machine cannot be used like separate workstations can. Additionally, a DSM system simplifies the use of other system components such as the hard disk. Since each processor has its own peripherals, disks and circuits, there may be fewer problems using these resources.

== DSM Implementations? ==
[[User:Azalia|Azalia]]:(Question) What are the different types of DSM Implementations?

[[User:Ywahyudi|Yohan]]:(Answer) There are 3 different types of DSM implementation. The first is software-level implementation, which can be achieved in user-level run-time library routines, the OS, or the programming language; examples are IVY, Mermaid, Munin, etc. The second is hardware-level implementation, which ensures automatic replication of shared data in local memories and processor caches, transparently to the software layer; examples are Memnet, Dash, SCI, KSR1, etc. Since software is used in hardware solutions to optimize memory references, and hardware is used in software solutions such as virtual memory management, the third is hybrid-level implementation, which is a combination of both. Examples of such implementations are Plus, Galactica Net, Alewife, etc.

[[User:Joshua Tessier|Joshua Tessier]]:(Question) Does the hybrid solution hold much relevance today? From what I got in the paper, it came to light due to some limitations of the hardware/OS layers at the time. Today, we have a ton of different tools at our disposal and these limitations are no longer present. How would such a solution be divided today?

[[User:William_Wilson|William]]:(Answer) While the general need for DSM systems may be declining, when they are desired, the hybrid solution does hold some weight today. It is of course a balance of scalability vs. performance; I don't think there will ever be a day when software performance exceeds hardware. In order to reduce latencies, a hybrid approach would be the most beneficial, but only in the correct circumstances. Software implementations are much easier to integrate (especially those which do not modify OS functionality), while pure hardware, although costs have come down, would not be all that economical, as the system would involve a lot of custom hardware and thus custom software to get the proper benefit from it. A hybrid would allow the hardware to perform certain processes in parallel with the software, much like the internal workings of a single computer. The unfortunate downside would be for programmers: updating hardware and software simultaneously and efficiently is much more difficult than updating software alone, and of course there is the money involved in developing such a system.

[[user:Alireza|Alireza]] : (Answer) I believe one of the main problems of using any kind of hardware solution is portability and adaptability. As soon as we start combining hardware with our solution, we introduce a factor that requires special machines and thus reduces the adaptability of our solution. Obviously, customized hardware can surpass software running on general-purpose hardware from a performance perspective, but we should always ask this question during the early stages of our design: Do we want a special-purpose solution or an adaptable solution?

[[user:ABecevello|ABecevello]]:(Question) What security features does DSM provide? More specifically, do the systems described in the papers offer any mechanisms to ensure untrusted machines cannot join the DSM system? (For general knowledge...) Do any current DSM systems offer additional security features?

[[user:Azalia|Azalia]] : (Answer) As far as the material in the dissertation and the other paper goes, I did not notice any discussion of DSM security. However, I did some further research about this and found this paper: (http://www-static.cc.gatech.edu/~milos/rogers_pact06.pdf) This paper was published in 2006 and, according to the paper, it is the first work related to DSM security.

== Performance vs. Ease of Use ==

[[User:NeilDickson|NeilDickson]]: (Inquiry for Opinions) Is it just me, or does the first paper discuss performance '''way''' too much considering that it can't provide better performance than message passing (by definition, since it uses message passing underneath)? I suppose that it's important for it to claim/show that the performance isn't too bad relative to message passing (which I can't tell because he badly botched his "real" performance analysis in section 6), but unlike the second paper, he never explains the only major benefit of DSM, which is ease of use. There's mention that it'd be better to use than message passing, but he doesn't explain how or why very clearly, whereas the second paper immediately addresses that well.

A semi-related question: did he actually get a PhD for this paper? If so, it speaks to why Yale is not known for computer science.

[[User:Ywahyudi|Yohan]]: (Question) Speaking about the performance and reliability of a system, we know that performance is a huge issue for Li. And in chapter 7.4, he said that he didn't take reliability into consideration at all. However, we all know that a fast application with an almost linear running time will be useless if it cannot deliver the expected result. So my question is: is performance a lot more important compared to reliability, at least during Li's era? And what aspects (during Li's era) made the performance of a system more important than its reliability? (Recall that when RPC was first introduced, reliability and security were not addressed either.)

[[User:BoWang|Bo]]: (Answer) Performance and reliability more or less go hand in hand. If the RPC implementation loses packets, those packets would need to be resent and the programs would hang until some result was received, which would decrease the performance. Li's main reason for not factoring in reliability is the fact that his experiment set was reproducible such that if one computation failed to return, he would just recompute it.

[[user:adam_k|Adam_K]]: (Question) Regarding reliability more specifically: how does DSM deal with node failure? That is, if a node holding the only copy of a certain block of the shared address space fails, without explicit redundancy, how does the system guard against loss of data?