DistOS 2021F 2021-09-23
Discussion questions
- What is the basic idea of distributed shared memory?
- How does distributed shared memory compare to virtual memory on a single CPU system? How about a system with many cores?
- How aware does a programmer need to be of DSM to use it? How aware to use it efficiently?
- What are the key mechanisms supporting DSM?
- How common do you think DSM is in the cloud today? Why?
Notes
Lecture 5

Group reports
- So far they look mostly good.
- Please try to add some structure: organizing around the questions asked/topics discussed helps, and section headings are nice.
- 1-2 pages is what most of you are turning in, which is good.
- But please use complete sentences, don't just use bullet points/phrases; fragments are harder to understand and more ambiguous.

DSM

What is the basic idea of distributed shared memory?
- A process should be able to run across multiple computers: different threads on different hosts, but all sharing the same memory/address space.
- Instead of multiple processes communicating over the network, we have one process sharing information with itself through shared memory, just like any multithreaded program.
- (We can also share just part of a process's address space; then it is like two processes sharing part of their memory on a single system.)

Do we like to program multithreaded programs?
- In general, it is the hardest way to implement things.
- But on multicore systems it can be the fastest, because shared memory is a fast way to share state: it avoids copying data out of messages.

Does this apply to a cluster of systems?
- NO, not at all, because there "shared memory" is an illusion implemented by COPYING DATA OVER THE NETWORK.
- So it can never be faster than just sending messages back and forth.
- Why do DSM, then, if not for performance? Ease of use, or legacy code.

How does distributed shared memory compare to virtual memory on a single CPU system?
- Very similar: basically we're swapping across the network instead of to disk.
- But with DSM, data can change while it is "swapped out", because another node may write to it.

How about a system with many cores?
- Also very similar, except:
  - everything is implemented in hardware,
  - the "network" (the interconnect) is VERY fast, with ultra-low latency,
  - we don't have to worry about network failures; if the interconnect fails, the system is dead anyway,
  - copying has to happen all the time, between the caches of cores and main memory.

How common do you think DSM is in the cloud today? Why?
- In a sense, DSM is alive and well, just on multicore systems.
- But everywhere else... not so much. It just isn't more efficient in software; it is better to send messages using some other abstraction.

Isn't a distributed cache a DSM?
- Not really: a distributed cache is much more specialized. Think content distribution networks (CDNs).

How aware does a programmer need to be of DSM to use it?
- Not at all: it is transparent.

How aware to use it efficiently?
- VERY aware: if you aren't careful, performance will go into the toilet.
- And it is hard to tell when you're making things slow; we don't think about memory access that way most of the time.
- Think about how difficult it is to write cache-efficient code. Add in a network, and the complexity goes up, as does the cost of failing to do the right thing.

Eventual consistency vs. strong consistency
- If two nodes access the same memory, do they HAVE to see the same thing immediately?
- If they don't, you can get away with eventual consistency and improve performance.
- But then, why use shared memory?

If you're tuning your system, is it easier to
- play with memory placement and DSM algorithms, or
- optimize network usage?

Note the trend here
- When we try to fool the developer into thinking there isn't a network, we get performance and scaling bottlenecks.
- We need abstractions that are inherently network-aware: ones that account for latency, bandwidth, reliability (and security issues).
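The point that DSM's "shared memory" is an illusion implemented by copying data over the network can be sketched as a toy model. All names below are hypothetical, not any real DSM system's API: each node keeps a local page cache, a read miss "fetches" the page from the authoritative copy (one simulated network transfer), and a write invalidates other nodes' cached copies, so remote reads after a write pay for another transfer.

```python
# Toy sketch of page-based distributed shared memory (hypothetical names).
# Each node has a local page cache; misses cost a simulated network transfer,
# and a write invalidates every other node's cached copy of that page.

class ToyDSM:
    def __init__(self, num_nodes, num_pages):
        self.caches = [dict() for _ in range(num_nodes)]  # node -> {page: value}
        self.memory = {p: 0 for p in range(num_pages)}    # authoritative copies
        self.transfers = 0                                # simulated network traffic

    def read(self, node, page):
        cache = self.caches[node]
        if page not in cache:              # "page fault": fetch over the network
            cache[page] = self.memory[page]
            self.transfers += 1
        return cache[page]

    def write(self, node, page, value):
        self.read(node, page)              # ensure the writer has a local copy
        self.caches[node][page] = value
        self.memory[page] = value
        for other, cache in enumerate(self.caches):
            if other != node:              # invalidate stale remote copies
                cache.pop(page, None)

dsm = ToyDSM(num_nodes=2, num_pages=4)
dsm.write(0, 0, 7)      # node 0 writes page 0 (one transfer to fault it in)
a = dsm.read(1, 0)      # node 1 must fetch page 0 over the network
b = dsm.read(1, 0)      # second read hits the local cache: no new transfer
```

Every cache miss here is a network round trip that plain message passing would have made explicit, which is why the notes conclude DSM can never beat messaging on performance.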
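The eventual-vs-strong consistency trade-off above can also be sketched with two toy replicas (again, hypothetical names, not a real system): a strong write updates every replica before returning, while an eventual write returns after updating only the local replica and propagates later, so a reader at another node can briefly observe a stale value.

```python
# Toy sketch of strong vs. eventual consistency between replicas (hypothetical).

class Replica:
    def __init__(self):
        self.value = 0

def strong_write(replicas, value):
    # Strong consistency: the write does not "complete" until every replica
    # has it, which costs a round of synchronous network messages.
    for r in replicas:
        r.value = value

def eventual_write(local, value, pending):
    # Eventual consistency: update the local replica and return immediately;
    # propagation to the other replicas is queued for later.
    local.value = value
    pending.append(value)

def propagate(pending, replicas):
    # Later, the queued updates finally reach all replicas.
    for value in pending:
        for r in replicas:
            r.value = value
    pending.clear()

a, b = Replica(), Replica()
pending = []
eventual_write(a, 42, pending)
stale = b.value              # b has not seen the write yet: reads the old 0
propagate(pending, [a, b])
fresh = b.value              # after propagation, b finally sees 42
```

The stale read is exactly the performance-vs-semantics trade: eventual consistency avoids the synchronous round trip, but then two nodes touching the "same" memory no longer see the same thing, which undermines the point of a shared-memory abstraction.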