Talk:Distributed Shared Memory

Here is the page where we will be discussing the DSM readings.

IVY

Anil: What were the key characteristics of IVY? What exactly did Kai Li build?

Alireza: IVY was a software-based DSM system developed to let users share their local memories in a distributed manner. IVY was designed for loosely coupled environments. It had five main modules: memory allocation, process management, initialization, remote operation, and memory mapping. The main advantage of IVY was the performance gained in parallel applications.

Alireza: (Question) Name some applications that you think would benefit from using the IVY environment. A distributed database system is the one mentioned in the dissertation. Think of something different.

Azalia: (Answer) Current examples could include Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) applications that serve multiple users across an organization. As another example, a billing system that has to calculate the telephone bills of thousands of customers could benefit from this environment by calculating the bills of multiple customers at the same time on different machines distributed across the network. Even though each customer's bill can be calculated separately, a shared memory space for reading input values such as cost per minute or cost per text message can be very useful. In addition, since each customer bill is a separate object, the writes go to different pages of the shared memory, so even a multiple-writer algorithm does not introduce any concurrency issues in this case.
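
Whether each bill really lands on its own page depends on how the records are laid out. Here is a minimal sketch that lists the pages more than one worker would write. It assumes a hypothetical fixed record size, a 4096-byte page, and round-robin assignment of customers to workers; none of these come from the dissertation.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # bytes; an assumed page size, not a figure from the readings


def pages_shared_by_writers(num_customers, record_size, num_workers,
                            page_size=PAGE_SIZE):
    """Return the pages that more than one worker would write to.

    Customer c's bill record occupies bytes [c * record_size,
    (c + 1) * record_size) of the shared region, and customer c is
    handled by worker c % num_workers (hypothetical layout and assignment).
    """
    writers = defaultdict(set)
    for customer in range(num_customers):
        worker = customer % num_workers
        first_page = (customer * record_size) // page_size
        last_page = (customer * record_size + record_size - 1) // page_size
        for page in range(first_page, last_page + 1):
            writers[page].add(worker)
    return sorted(page for page, ws in writers.items() if len(ws) > 1)


# Page-sized records: every bill has a page to itself.
print(pages_shared_by_writers(8, 4096, 2))
# Quarter-page records: workers end up writing to the same pages.
print(pages_shared_by_writers(8, 1024, 2))
```

With page-sized records no page has two writers. With smaller records several bills share a page, and the writers contend for it even though the bills themselves are independent.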

Azalia:(Question) What is the potential problem in centralized manager algorithm? What is the alternative algorithm?

Abecevello: (Answer) There are actually several potential problems with using a centralized manager algorithm. We can even get philosophical about this question (possibly touching on something not covered in the paper). A centralized manager algorithm represents a single point of failure. If the machine running the centralized manager fails, then the entire distributed system fails. A centralized component in a distributed system (not only a distributed shared memory system) arguably goes against the whole point of making the system distributed. You have distributed the system, so why not distribute all of it? Especially today, distributed systems are built so that if one computer fails, the entire system doesn't fail. To answer the second question, the alternative is of course a distributed manager algorithm. In the Kai Li paper the centralized manager problem was discussed in terms of performance, and that argument also comes down to centralized vs. distributed. In a distributed algorithm there are multiple machines to do the work, so is it not conceivable that it would be faster overall?

AlexC: (Answer) Certainly the central point of failure is a serious problem. It also represents a clear target for an attacker (as opposed to unintentional failure). This is the main reason why massive spamming botnets have been so successful: they are effective distributed systems, with no clear point to attack in order to disable them. As for DSM implementations specifically, the driving reason for a distributed manager algorithm was not to avoid a central point of failure (although that was likely a secondary consideration). The primary reason was to avoid a performance bottleneck.

RobertB: (Answer) While some form of distributed (or hybrid) manager algorithm is the alternative to a centralized one, it has its own set of issues that need to be considered. With decisions being made in several locations, the chances of data corruption, incorrect reads, preemptive writes, etc. are all potentially increased. The implementation of a distributed manager would therefore be a lot more complex and time-consuming, not to mention difficult to debug, which may affect one's decision to go that route.
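
To make the bottleneck argument concrete, here is a minimal sketch of a centralized manager. It is a simplification for discussion, not Li's actual implementation: the class and method names are hypothetical, and locking, message passing, and the page transfer itself are left out.

```python
class CentralManager:
    """Hypothetical central manager: one node tracks every page's owner."""

    def __init__(self, num_pages, initial_owner):
        # owner[p] is the node currently allowed to write page p.
        self.owner = {p: initial_owner for p in range(num_pages)}
        # copyset[p] is the set of nodes holding read-only copies of page p.
        self.copyset = {p: set() for p in range(num_pages)}
        # Every fault anywhere in the system is counted here.
        self.requests_handled = 0

    def read_fault(self, page, node):
        """Record the reader and return the owner it should fetch the page from."""
        self.requests_handled += 1
        self.copyset[page].add(node)
        return self.owner[page]

    def write_fault(self, page, node):
        """Transfer ownership to the writer.

        Returns the previous owner (which must hand over the page) and
        the set of read copies that must be invalidated first.
        """
        self.requests_handled += 1
        to_invalidate = self.copyset[page] - {node}
        old_owner = self.owner[page]
        self.copyset[page] = set()
        self.owner[page] = node
        return old_owner, to_invalidate


mgr = CentralManager(num_pages=4, initial_owner=0)
mgr.read_fault(2, node=1)
mgr.read_fault(2, node=2)
old_owner, invalidated = mgr.write_fault(2, node=3)
print(old_owner, sorted(invalidated), mgr.owner[2], mgr.requests_handled)
```

Every fault from every node increments the same counter on the same machine. That is the performance bottleneck, and the same machine is also the single point of failure.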

Adam_K: (Answer) An important characteristic of DSM in general is that it makes use of processor cycles and memory resources in a loosely coupled system that would otherwise remain idle. This is desirable not just for speeding up divide-and-conquer scenarios, but also for creating a memory capacity larger than any individual system could accommodate. Such an environment is necessary for certain applications involving very large models or simulations, where it may not be practical to partition the data set.

Yacoub A.: (Question) How would several processes trying to access the same data block in the dynamic distributed manager algorithm significantly degrade the performance of the system?
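
As background for this question, here is a minimal sketch of how probable-owner hints for a single page might be followed and updated. It is a simplification under stated assumptions, not the algorithm exactly as given in the dissertation: the function names and the dictionary representation are hypothetical, and only write faults are modelled.

```python
def locate_owner(prob_owner, start):
    """Follow probable-owner hints from `start` to the true owner.

    prob_owner maps each node to its current guess for the page's owner.
    The true owner is the node whose hint points at itself.
    """
    path = [start]
    node = start
    while prob_owner[node] != node:
        node = prob_owner[node]
        path.append(node)
    return node, path


def write_fault(prob_owner, requester):
    """Move ownership to `requester` and return the number of forwarding hops."""
    _, path = locate_owner(prob_owner, requester)
    # Every node that handled the request now points at the new owner.
    for node in path:
        prob_owner[node] = requester
    return len(path) - 1


# Five nodes with stale hints forming a chain 4 -> 3 -> 2 -> 1 -> 0 (0 owns the page).
hints = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}
print(write_fault(hints, 4))  # request forwarded along the whole chain
print(write_fault(hints, 2))  # hints were refreshed, so the path is short
print(write_fault(hints, 0))  # ownership moved again, so one extra hop
```

When several nodes keep writing the same page, ownership keeps moving. Each fault pays for the forwarding hops, the invalidations, and another page transfer, so the page can spend more time in transit than in use.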

Current DSM systems?

Anil: What is a current production system that uses distributed shared memory? What about the underlying problem makes DSM a good technological fit?

Azalia: (Answer) On the first question (a current production system that uses distributed shared memory): any application with complex independent steps that can be parallelized would be suitable for a DSM environment. Current examples could include Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) applications that serve multiple users across an organization. On the second question (what about the underlying problem makes DSM a good technological fit): the alternatives to DSM in a distributed environment, such as RPC and message passing, have some inadequacies that DSM was introduced to address. For instance, message passing and RPC have difficulty sending complex data structures and pointers over the network, because the machines have different memory address spaces. Distributed shared memory can solve this problem, since all the processors share the same memory address space. In addition, with current RPC technologies like Web Services, each task requires packing and sending a lot of XML data around. With DSM we can share a memory space and avoid overloading the network with XML messages.

Colin: (Answer) Any system with a great deal of variability in the load on its processors could benefit from DSM. This is because the unified address space makes process migration, and thus load balancing, simpler. (Question) How much more efficient is the movement of data across the network on a system that implements DSM? Does it not send a comparable amount of data on a page fault as message passing or RPC would to invoke a remote call?
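
One rough way to frame Colin's question is a byte count. The sketch below assumes a 4096-byte page and a fixed message size per remote call. Both are illustrative assumptions rather than measurements, and protocol headers, invalidations, and latency are ignored.

```python
import math

PAGE_SIZE = 4096  # bytes moved per page fault; an assumed value


def dsm_bytes(page_faults, page_size=PAGE_SIZE):
    """Bytes moved if each page fault ships one whole page."""
    return page_faults * page_size


def rpc_bytes(calls, bytes_per_call):
    """Bytes moved if each remote call ships a fixed-size request plus reply."""
    return calls * bytes_per_call


def break_even_calls(bytes_per_call, page_size=PAGE_SIZE):
    """Smallest number of remote calls that moves at least one page's worth of data."""
    return math.ceil(page_size / bytes_per_call)


print(dsm_bytes(1), rpc_bytes(1, 64), break_even_calls(64))
```

Under these assumptions a single touch of a remote page costs far more than one small call. DSM only comes out ahead when a fetched page is then reused locally many times before it is invalidated.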

AlexC: (Answer) Cluster computing comes to mind, though it depends on what type of cluster is implemented. A cluster built for high availability would likely not implement DSM, because the primary way to accomplish high availability is redundancy. A cluster built for load balancing would be a good candidate for DSM if it were balancing large computations with shared data. However, if the cluster were built for load balancing many small jobs (say, a really big web server), it would not be a good DSM candidate. A cluster built for large computations would certainly be a good candidate for DSM. While looking at clusters I also noticed a Java technology for cluster computing called JavaSpaces that implements distributed shared memory. However, instead of sharing memory pages, it shares Java objects.

eltonc: (Answer) I am guessing the computers that are used to compute or verify the largest prime number use DSM.

RobertB: (Answer) The Kerrighed software project provides DSM as one of its features. As with IVY, it has a sequential consistency model. It seems to be used for numerical analysis and other large-scale scientific projects. FASTLINK, an application used for genetic linkage analysis, runs on TreadMarks, another DSM implementation.

NeilDickson: (Answer) To ask a slightly more blunt wording of the question: "Is there any large, useful problem to solve that wouldn't be better solved either on a standard HPC cluster (using MPI) or a standard distribution model like SETI@Home or Folding@Home?" My pseudo-answer to this question is probably not, and if such a problem is out there somewhere, we haven't found it yet. Regardless of whether there is some convoluted example where DSM has good convenience & performance characteristics, the key is that there needs to be a problem that matters before it can be called "useful". There's no doubt that DSM can work, but I'm not convinced it's worth any mind. As an example, as much as computing the quadrillionth bit of pi might be a humorous way to kill some time, 99.999% of the world doesn't care. Maybe DSM researchers should focus on finding some "problem" for their "solution"?

Difference between DSM and NUMA?

Anil: What are the differences between DSM and NUMA? Under what circumstances is each appropriate?

Alireza: NUMA follows the SMP paradigm, in which there is a common memory bus for accessing shared memory. One of the most important aspects of NUMA is that memory access time differs depending on where a processor sits relative to the memory: a processor accesses its local memory faster than remote memory. NUMA memory access is also hardware-based.

Joshua Tessier: Correct me if I'm wrong, but NUMA is basically a type of DSM. In a NUMA system, each processor has access to a common memory, but this common memory is distributed across the processors. For example, if there are 8 processors, the total memory is divided into 8 sections. As stated above, the processors have different access times to the different memory sections. Meanwhile, DSM is just distributed shared memory in general, not a specific type like NUMA.

ABecevello: It seems from doing some background reading on NUMA (since I noticed the articles didn't really describe it very well) that NUMA is primarily used for memory access by multiple processors in a single computer. DSM as described sounds like it's designed for memory accesses between computers. From my reading, it looks like NUMA runs best when multiple processors do NOT attempt to access the same memory all at the same time.

eltonc: From what I have read, the NUMA architecture first tries to find the requested data in local memory. Since there is a limit on how much local memory a node can have, it then tries to access remote memory (memory on a different node than the CPU currently running the process). This remote memory is usually in the same machine, whereas with DSM the memory is usually on a different computer.

Laszlo: (Answer) NUMA and DSM both give each processor its own memory for fast access, as well as the ability to access other processors' memories. The difference is that DSM is distributed and NUMA is not. NUMA does provide a means of using more than one processor to bypass the performance bottleneck, but it was designed for SMP machines, where all the processors are linked together within the same physical machine. The main point of DSM is to have memory shared between physically distinct machines. This provides both fault tolerance (a single multiprocessor machine will fail all at once) and the flexibility to add or remove machines to make the distributed system bigger or smaller. The paper discusses how DSM would allow the spare cycles on each workstation to be used when its primary user is not using it. NUMA does not have this option, because a single multiprocessor machine cannot be used like separate workstations can. Additionally, a DSM system simplifies the use of other system components such as the hard disk. Since each processor has its own peripherals, disks, and circuits, there may be fewer problems using these resources.

DSM Implementations?

Azalia:(Question) What are the different types of DSM Implementations?

Yohan: (Answer) There are three different types of DSM implementation. The first is software-level implementation, which can be achieved in user-level run-time library routines, the OS, or the programming language; examples are IVY, Mermaid, Munin, etc. The second is hardware-level implementation, which ensures automatic replication of shared data in local memories and processor caches, transparently to the software layer; examples are Memnet, Dash, SCI, KSR1, etc. Since hardware solutions use software support to optimize memory references, and software solutions use hardware such as virtual memory management, the third type is hybrid-level implementation, which combines both. Examples of such implementations are Plus, Galactica Net, Alewife, etc.

Joshua Tessier:(Question) Does the hybrid solution hold much relevance today? From what I got in the paper, it came to light due to some limitations of the hardware/OS layers at the time. Today, we have a ton of different tools at our disposal and these limitations are no longer present. How would such a solution be divided today?

William: (Answer) While the general need for DSM systems may be declining, the hybrid solution does hold some weight today when they are desired. It is of course a balance of scalability vs. performance; I don't think there will ever be a day when software performance exceeds hardware. To reduce latencies a hybrid approach would be the most beneficial, but only in the right circumstances. Software implementations are much easier to integrate (especially those which do not modify OS functionality), while pure hardware, although costs have come down, would not be all that economical, as the system would involve a lot of custom hardware and thus custom software to get the proper benefit from it. A hybrid approach would allow the hardware to perform certain processes in parallel with the software, much like the internal workings of a single computer. The unfortunate downside would fall on programmers: updating hardware and software simultaneously and efficiently is much more difficult than updating software alone, and then there is the money involved in developing such a system.

Alireza: (Answer) I believe one of the main problems with any kind of hardware solution is portability and adaptability. As soon as we start combining hardware with our solution, we introduce a factor that requires special machines and thus reduces the adaptability of our solution. Obviously, customized hardware can outperform software running on general-purpose hardware, but we should always ask this question during the early stages of our design: do we want a special-purpose solution or an adaptable solution?

ABecevello: (Question) What security features does DSM provide? More specifically, do the systems described in the papers offer any mechanisms to ensure untrusted machines cannot join the DSM system? (For general knowledge...) Do any current DSM systems offer additional security features?

Azalia: (Answer) As far as the material in the dissertation and the other paper goes, I did not notice any discussion of DSM security. However, I did some further research and found this paper: (http://www-static.cc.gatech.edu/~milos/rogers_pact06.pdf). It was published in 2006 and, according to the paper, it is the first work related to DSM security.

Dave: (Answer) After browsing the article mentioned above, it seems as though the earlier systems offered only software security measures and failed to implement hardware security measures. This became apparent in later years as people began to attach "snooping devices" to various parts of the systems. They could then use these devices to capture information that happened to be flowing through that part of the system. Since the article is from 2006, I assume there are probably a few DSM systems out there that implement some sort of hardware security measures, but it seems to be a relatively new idea.

Performance vs. Ease of Use

NeilDickson: (Inquiry for Opinions) Is it just me, or does the first paper discuss performance way too much considering that it can't provide better performance than message passing (by definition, since it uses message passing underneath)? I suppose that it's important for it to claim/show that the performance isn't too bad relative to message passing (which I can't tell because he badly botched his "real" performance analysis in section 6), but unlike the second paper, he never explains the only major benefit of DSM, which is ease of use. There's mention that it'd be better to use than message passing, but he doesn't explain how or why very clearly, whereas the second paper immediately addresses that well.

A semi-related question: did he actually get a PhD for this paper? If so, it speaks to why Yale is not known for computer science.

Yohan: (Question) Speaking of the performance and reliability of a system, we know that performance is a huge issue for Li. In chapter 7.4 he says that he did not take reliability into consideration at all. However, a fast application with an almost linear running time is useless if it cannot deliver the expected result. So my question is: was performance really that much more important than reliability, at least in Li's era? And what aspects of that era made the performance of a system more important than its reliability? (Recall that when RPC was first introduced, reliability and security were not addressed either.)

Bo: (Answer) Performance and reliability more or less go hand in hand. If the RPC implementation loses packets, those packets would need to be resent and the programs would hang until some result was received, which would decrease the performance. Li's main reason for not factoring in reliability is the fact that his experiment set was reproducible such that if one computation failed to return, he would just recompute it.

Adam_K: (Question) Regarding reliability more specifically: how does DSM deal with node failure? That is, if a node holding the only copy of a certain block of the shared address space fails, and there is no explicit redundancy, how does the system guard against loss of data?

Laszlo: (Answer) That is a very good question about reliability of nodes. The advantage of having DSM is the ease of use because accessing memory by address is very easy to do. If a system runs out of memory and has to destroy a page of memory it will kill the process that owned that memory. If a node in the network is lost, one would assume the process would have to be terminated because it is hard to handle the condition of missing memory as a programmer. I think that usually the node with the required memory would also contain that process, so losing that node would implicitly kill the process. Maybe it could keep the process around in case the node came back online. To guard against the loss of data, a DSM system could be set up with a certain number of extra copies of each page. If the physical network was known to be very reliable the redundancy could be removed to make more memory available. This flexibility is one of the advantages of a DSM system.
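
The idea of keeping extra copies of each page can be sketched as follows. This is a hypothetical placement scheme for illustration only (round-robin over a fixed node list). It is not a mechanism described in the readings, and the function names are made up.

```python
def place_replicas(num_pages, nodes, copies):
    """Assign each page to `copies` distinct nodes, round-robin.

    Hypothetical scheme: page p lives on nodes p, p+1, ... (mod len(nodes)).
    """
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    return {
        page: {nodes[(page + i) % len(nodes)] for i in range(copies)}
        for page in range(num_pages)
    }


def lost_pages(placement, failed_nodes):
    """Pages whose every copy sat on a failed node."""
    failed = set(failed_nodes)
    return [page for page, holders in placement.items() if holders <= failed]


nodes = ["a", "b", "c"]
single = place_replicas(6, nodes, copies=1)
double = place_replicas(6, nodes, copies=2)
print(lost_pages(single, {"b"}))       # pages held only by node b are gone
print(lost_pages(double, {"b"}))       # one failure is survivable with two copies
print(lost_pages(double, {"a", "b"}))  # two failures can still lose pages
```

The trade-off Laszlo mentions shows up directly. Each extra copy uses memory that could otherwise hold distinct pages, and every write has to reach all copies of the page.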