Talk:Distributed Shared Memory: Difference between revisions

From Soma-notes
Revision as of 19:40, 26 September 2008

Here is the page where we will be discussing the DSM readings.

IVY

Anil: What were the key characteristics of IVY? What exactly did Kai Li build?

Alireza : IVY was a software-based DSM system developed to let users share their local memories in a distributed manner. It was designed for loosely coupled environments and had five main modules: memory allocation, process management, initialization, remote operation, and memory mapping. The main advantage of IVY was the performance gained in parallel applications compared to running them on a single machine.

Alireza :(Question) Name some applications that you think would benefit from using an IVY environment. A distributed database system is the one mentioned in the dissertation; think of something different.

Azalia:(Answer) Some current examples could be Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) applications that serve multiple users across an organization. As another example, imagine a billing system that has to calculate the telephone bills of thousands of customers: it could benefit from this environment by calculating the bills of multiple customers at the same time on different machines distributed across the network. Even though each customer's bill can be calculated separately, a shared memory space for reading input values like cost per minute or cost per text message would be very useful. In addition, since each customer's bill is a separate object, the writes go to different pages of the shared memory, so even a multiple-writer algorithm does not introduce any concurrency issue in this case.
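A minimal sketch of the billing scenario above, assuming a shared read-only rate table and per-customer results written to separate slots (all names and numbers are invented for illustration; threads stand in for the distributed machines):

```python
from concurrent.futures import ThreadPoolExecutor

# Shared, read-only "memory": every worker reads the same rate table.
RATES = {"per_minute": 0.05, "per_text": 0.10}

# Per-customer usage; made up for illustration.
usage = {
    "alice": {"minutes": 100, "texts": 20},
    "bob":   {"minutes": 30,  "texts": 200},
}

def bill(customer):
    u = usage[customer]
    # Each worker produces only its own result ("its own page"),
    # so concurrent writers never touch the same data.
    return customer, round(u["minutes"] * RATES["per_minute"]
                           + u["texts"] * RATES["per_text"], 2)

with ThreadPoolExecutor() as pool:
    bills = dict(pool.map(bill, usage))

print(bills)  # {'alice': 7.0, 'bob': 21.5}
```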

Azalia:(Question) What is the potential problem with the centralized manager algorithm? What is the alternative algorithm?

Abecevello:(Answer) There are actually several potential problems with a centralized manager algorithm. We can even get philosophical about this question (possibly something not covered in the paper). A centralized manager represents a single point of failure: if the machine running it fails, the entire distributed system fails. A centralized component in a distributed system (not even just a distributed shared memory system) arguably defeats the whole point of making a distributed system. You distributed the system, so why not distribute all of it? Especially today, distributed systems are built so that if one computer fails, the entire system doesn't fail. To answer the second question, the alternative is of course a distributed manager algorithm. In the Kai Li paper the centralized manager problem was discussed in terms of performance, but this too comes down to centralized vs. distributed: in a distributed algorithm there are multiple machines to do the work, so is it not conceivable that it would be faster overall?
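A toy sketch of the two ownership-lookup schemes being contrasted here, assuming Li's fixed distributed manager style of statically assigning each page to a manager node (the class and method names are made up, not Li's code):

```python
class CentralManager:
    """One machine holds the owner table for every page:
    a bottleneck and a single point of failure."""
    def __init__(self):
        self.owner = {}          # page number -> owning node

    def find_owner(self, page):
        return self.owner.get(page)


class FixedDistributedManager:
    """Fixed distributed scheme: responsibility for page p is
    statically assigned to node (p mod N), so ownership queries
    spread across all N nodes instead of hitting one machine."""
    def __init__(self, n_nodes):
        self.n = n_nodes
        self.tables = [{} for _ in range(n_nodes)]

    def manager_of(self, page):
        return page % self.n     # which node answers queries for this page

    def find_owner(self, page):
        return self.tables[self.manager_of(page)].get(page)


dm = FixedDistributedManager(4)
print(dm.manager_of(7))   # 3 -- queries for page 7 go to node 3
```

The mod mapping is just one static choice; the point is that no single node fields every lookup, and losing one node loses only its slice of the table.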

AlexC:(Answer) Certainly the central point of failure is a serious problem. It also represents a clear target for an attacker (as opposed to unintentional failure). This is the main reason the massive spamming botnets have been so successful: they are effective distributed systems, with no clear point to attack in order to disable them. More specifically to DSM implementations, the driving reason for a distributed manager algorithm was not to avoid a central point of failure (although that was likely a secondary consideration); the primary reason was to avoid a performance bottleneck.

RobertB:(Answer) While some form of distributed (or hybrid) manager algorithm is the alternative to a centralized one, it too has its own set of issues that need to be considered. With decisions being made in several locations, the chances of data corruption, incorrect reads, preemptive writes, etc. are all potentially increased. Thus the implementation of a distributed manager would be a lot more complex and time-consuming, not to mention difficult to debug, which may impact one's decision to go that route.

Current DSM systems?

Anil: What is a current production system that uses distributed shared memory? What about the underlying problem makes DSM a good technological fit?

Azalia:(Answer) What is a current production system that uses distributed shared memory? Any application with complex independent steps that can be parallelized would be suitable for a DSM environment. Some current examples could be Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) applications that serve multiple users across an organization. What about the underlying problem makes DSM a good technological fit? Apart from DSM there are alternative methods for communication in a distributed environment (e.g., RPC and message passing), but they have inadequacies that DSM was introduced to address. For instance, message passing and RPC have difficulty sending complex data structures and pointers over the network, because the machines have different memory address spaces. Distributed shared memory can solve this problem, since all the processors share the same memory address space. In addition, if we consider current RPC technologies like Web Services, we'll realize that for each task we have to pack and send a lot of XML data around. With DSM we can share a memory space and avoid overloading the network with XML messages.
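A toy illustration of the pointer problem mentioned above: a raw pointer is an address in the sender's address space, so it doesn't survive message passing, whereas offsets into a shared (or identically mapped) region stay valid everywhere. The layout and field names here are invented for the sketch:

```python
import pickle

shared = bytearray(64)              # stand-in for a shared memory page

# Store a "linked" record at offset 0 whose next-pointer is an offset
# into the region, not a machine address: [next_offset, value]
shared[0:2] = (16).to_bytes(1, "little") + (42).to_bytes(1, "little")
shared[16:18] = (0).to_bytes(1, "little") + (99).to_bytes(1, "little")

def read_node(buf, off):
    return buf[off], buf[off + 1]   # (next_offset, value)

# Any node mapping the same region interprets the offsets identically:
nxt, val = read_node(shared, 0)
assert (nxt, val) == (16, 42)

# Message passing, by contrast, must serialize the whole structure for
# every exchange (cf. Web Services shipping XML around for each call):
payload = pickle.dumps(bytes(shared))
print(len(payload), "bytes on the wire")
```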

Colin:(Answer) Any system with a great deal of variability in the load on its processors could benefit from DSM, because the unified address space makes process migration, and thus load balancing, simpler. (Question) How much more efficient is the movement of data across the network in a system that implements DSM? Does it not send a comparable amount of data on a page fault as message passing or RPC would to invoke a remote call?

AlexC:(Answer) Cluster computing comes to mind. It depends what type of cluster is implemented: a cluster built for high availability would likely not implement DSM, because the primary way to accomplish high availability is redundancy. A cluster built for load balancing may implement DSM; if it was load balancing large computations with shared data, it would be a good candidate. However, if the cluster was built for load balancing many small jobs (say, a really big web server), then it would not be a good DSM candidate. If a cluster was built for large computations, then it would certainly be a good candidate for DSM. While looking at clusters I also noticed a Java implementation for cluster computing called JavaSpaces that implements distributed shared memory; instead of sharing memory pages, however, it shares Java objects.

eltonc: (Answer) I am guessing the computers that are used to compute or verify the largest prime number use DSM.

RobertB: (Answer) The Kerrighed software project provides DSM as one of its features. As with IVY, it has a sequential consistency model. It seems to be used for numerical analysis and other large-scale scientific projects. FASTLINK is an application running on TreadMarks, another DSM implementation, which is used for genetic linkage analysis.

NeilDickson: (Answer) To ask a slightly more blunt wording of the question: "Is there any large, useful problem to solve that wouldn't be better solved either on a standard HPC cluster (using MPI) or a standard distribution model like SETI@Home or Folding@Home?" My pseudo-answer to this question is probably not, and if such a problem is out there somewhere, we haven't found it yet. Regardless of whether there is some convoluted example where DSM has good convenience & performance characteristics, the key is that there needs to be a problem that matters before it can be called "useful". There's no doubt that DSM can work, but I'm not convinced it's worth any mind. As an example, as much as computing the quadrillionth bit of pi might be a humourous way to kill some time, 99.999% of the world doesn't care. Maybe DSM researchers should focus on finding some "problem" for their "solution"?

Difference between DSM and NUMA?

Anil: What are the differences between DSM and NUMA? Under what circumstances are each appropriate?

Alireza: NUMA follows the SMP paradigm, where there is a common memory bus for accessing shared memory. One of the most important aspects of NUMA is that it gives processors different access times based on their locations; for instance, local processors have faster access to local memories. In addition, NUMA's access to memory is hardware-based.

Joshua Tessier: Correct me if I'm wrong, but NUMA is basically a type of DSM. In a NUMA system, each processor has access to a common memory, but this common memory is distributed across the processors: if there are 8 processors, the total memory is divided into 8 sections. As stated above, the processors have different access times to the memory stores. Meanwhile, DSM is just distributed shared memory in general, not a specific kind like NUMA.

ABecevello: It seems from doing some background reading on NUMA (since I noticed the articles didn't really describe it very well) that NUMA is primarily used for memory access by multiple processors in a single computer, whereas DSM as described sounds like it's designed for memory accesses between computers. From my reading, it looks like NUMA runs best when multiple processors do NOT attempt to access the same memory at the same time.

eltonc: From what I have read, the NUMA architecture first tries to see if the requested data can be found in local memory, but since there is a limit on how much local memory a computer can have, it then tries to access remote memory (memory on a different node than the CPU currently running the process). This remote memory is usually on the same machine, whereas with DSM the memory is usually on a different computer.
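A toy model of the lookup order described above. The latency numbers and the even split of memory across nodes are invented for illustration; real ratios vary widely by machine:

```python
LOCAL_NS, REMOTE_NS, DSM_NET_US = 80, 200, 500

def access_cost(cpu_node, addr, node_of, dsm=False):
    """Cost of touching `addr` from `cpu_node`, where
    `node_of(addr)` gives the home node of the address."""
    home = node_of(addr)
    if home == cpu_node:
        return LOCAL_NS              # local memory: fastest
    if not dsm:
        return REMOTE_NS             # remote node, same machine (NUMA)
    return DSM_NET_US * 1000         # different computer (DSM page fetch)

# 4 nodes, memory split evenly into sections of 1024 bytes each:
node_of = lambda addr: addr // 1024

print(access_cost(0, 100, node_of))             # 80   (local)
print(access_cost(0, 3000, node_of))            # 200  (remote NUMA)
print(access_cost(0, 3000, node_of, dsm=True))  # 500000 (over the network)
```

The orders-of-magnitude gap between the NUMA and DSM cases is the crux of the distinction: NUMA's "remote" is still the same machine.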

DSM Implementations?

Azalia:(Question) What are the different types of DSM Implementations?

Yohan:(Answer) There are three different types of DSM implementation. The first is software-level implementation, which can be achieved in a user-level run-time library routine, the OS, or the programming language; examples include IVY, Mermaid, and Munin. The second is hardware-level implementation, which ensures automatic replication of shared data in local memories and processor caches, transparently to the software layer; examples include Memnet, Dash, SCI, and KSR1. Since software can be used within hardware support to optimize memory references, and hardware can be used within software solutions (such as virtual memory management), the third type is the hybrid-level implementation, which combines both. Examples include Plus, Galactica Net, and Alewife.

Joshua Tessier:(Question) Does the hybrid solution hold much relevance today? From what I gathered from the paper, it came about due to some limitations of the hardware/OS layers at the time. Today we have a ton of different tools at our disposal, and those limitations are no longer present. How would such a solution be divided today?

William:(Answer) While the general need for DSM systems may be shrinking, when they are desired the hybrid solution does hold some weight today. It is, of course, a balance of scalability vs. performance; I don't think there will ever be a day when software performance exceeds hardware. To reduce latencies a hybrid approach would be the most beneficial, but only in the right circumstances. Software implementations are much easier to integrate (especially those which do not modify OS functionality), while pure hardware, although costs have come down, would not be all that economical, since the system would involve a lot of custom hardware and thus custom software to get the proper benefit from it. A hybrid would allow the hardware to perform certain processes in parallel with the software, much like the internal workings of a single computer. The unfortunate downside would be for programmers: updating hardware and software simultaneously and efficiently is much more difficult than updating software alone, not to mention the money involved in developing such a system.

Alireza : (Answer) I believe one of the main problems with utilizing any kind of hardware solution is the portability and adaptability issue. As soon as we combine hardware with our solution, we introduce a factor that requires special machines and thus reduces the adaptability of our solution. Obviously, customized hardware can surpass software running on general-purpose hardware from a performance perspective, but we should always ask this question during the early stages of design: do we want a special-purpose solution or an adaptable one?

ABecevello:(Question) What security features does DSM provide? More specifically, do the systems described in the papers offer any mechanisms to ensure untrusted machines cannot join the DSM system? (For general knowledge...) Do any current DSM systems offer additional security features?

Azalia : (Answer) As far as the material discussed in the dissertation and the other paper goes, I did not notice any discussion of DSM security. However, I did some further research and found this paper: (http://www-static.cc.gatech.edu/~milos/rogers_pact06.pdf). It was published in 2006 and, according to the authors, is the first work related to DSM security.

Performance vs. Ease of Use

NeilDickson: (Inquiry for Opinions) Is it just me, or does the first paper discuss performance way too much considering that it can't provide better performance than message passing (by definition, since it uses message passing underneath)? I suppose that it's important for it to claim/show that the performance isn't too bad relative to message passing (which I can't tell because he badly botched his "real" performance analysis in section 6), but unlike the second paper, he never explains the only major benefit of DSM, which is ease of use. There's mention that it'd be better to use than message passing, but he doesn't explain how or why very clearly, whereas the second paper immediately addresses that well.

A semi-related question: did he actually get a PhD for this paper? If so, it speaks to why Yale is not known for computer science.

Yohan: (Question) Speaking about the performance and reliability of a system, we know that performance was a huge issue for Li, and in chapter 7.4 he said that he didn't take the reliability aspect into consideration at all. However, we all know that a fast application with almost linear running time is useless if it cannot deliver the expected result. So my questions are: is performance much more important than reliability, at least in Li's era? And what aspects (during Li's era) made the performance of a system more important than its reliability? (Recall that when RPC was first introduced, reliability and security were not addressed either.)