DistOS 2021F 2021-09-23
Discussion questions
- What is the basic idea of distributed shared memory?
- How does distributed shared memory compare to virtual memory on a single CPU system? How about a system with many cores?
- How aware does a programmer need to be of DSM to use it? How aware to use it efficiently?
- What are the key mechanisms supporting DSM?
- How common do you think DSM is in the cloud today? Why?
Notes
Lecture 5

Group reports
- So far they look mostly good.
- Please try to add some structure: organizing around the questions asked/topics discussed helps, and section headings are nice.
- 1-2 pages is what most of you are turning in, which is good.
- But please use complete sentences, don't just use bullet points/phrases; fragments are harder to understand and more ambiguous.

DSM

What is the basic idea of distributed shared memory?
- A process should be able to run across multiple computers: different threads on different hosts, but all sharing the same memory/address space.
- Instead of multiple processes communicating over the network, we have one process sharing information with itself through shared memory, just like any multithreaded program.
- (We can also share just part of a process's address space; then it is like two processes sharing part of their memory on a single system.)

Do we like to program multithreaded programs?
- In general, it is the hardest way to implement things.
- But on multicore systems it can be the fastest, because shared memory is a fast way to share state: it avoids copying data out of messages.

Does this apply to a cluster of systems?
- NO, not at all, because there "shared memory" is an illusion implemented by COPYING DATA OVER THE NETWORK.
- So it can never be faster than just sending messages back and forth.
- Why do DSM, then, if not for performance? Ease of use, or legacy code.

How does distributed shared memory compare to virtual memory on a single CPU system?
- Very similar: basically we're swapping across the network instead of to disk.
- But with DSM, data can change while it is "swapped out", because another node may write to it.

How about a system with many cores?
- Also very similar, except:
  - everything is implemented in hardware,
  - the "network" (the interconnect) is VERY fast, with ultra-low latency,
  - we don't have to worry about network failures; if the interconnect fails, the system is dead anyway,
  - copying has to happen all the time, between the caches of cores and main memory.

How common do you think DSM is in the cloud today? Why?
- In a sense, DSM is alive and well, just on multicore systems.
- But everywhere else... not so much. It just isn't more efficient in software; it is better to send messages using some other abstraction.

Isn't a distributed cache a DSM?
- Not really: a distributed cache is much more specialized. Think content distribution networks (CDNs).

How aware does a programmer need to be of DSM to use it?
- Not at all: it is transparent.

How aware to use it efficiently?
- VERY aware: if you aren't careful, performance will go into the toilet.
- And it is hard to tell when you're making things slow; we don't think about memory access that way most of the time.
- Think about how difficult it is to write cache-efficient code. Add in a network, and the complexity goes up, as does the cost of failing to do the right thing.

Eventual consistency vs. strong consistency
- If two nodes access the same memory, do they HAVE to see the same thing immediately?
- If they don't, you can get away with eventual consistency and improve performance.
- But then, why use shared memory?

If you're tuning your system, is it easier to
- play with memory placement and DSM algorithms, or
- optimize network usage?

Note the trend here
- When we try to fool the developer into thinking there isn't a network, we get performance and scaling bottlenecks.
- We need abstractions that are inherently network-aware: ones that account for latency, bandwidth, reliability (and security issues).
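The point that DSM's "shared memory" is an illusion implemented by copying data over the network can be sketched as a toy model. All names below are hypothetical, not any real DSM system's API: each node keeps a local page cache, a read miss "fetches" the page from the authoritative copy (one simulated network transfer), and a write invalidates other nodes' cached copies, so remote reads after a write pay for another transfer.

```python
# Toy sketch of page-based distributed shared memory (hypothetical names).
# Each node has a local page cache; misses cost a simulated network transfer,
# and a write invalidates every other node's cached copy of that page.

class ToyDSM:
    def __init__(self, num_nodes, num_pages):
        self.caches = [dict() for _ in range(num_nodes)]  # node -> {page: value}
        self.memory = {p: 0 for p in range(num_pages)}    # authoritative copies
        self.transfers = 0                                # simulated network traffic

    def read(self, node, page):
        cache = self.caches[node]
        if page not in cache:              # "page fault": fetch over the network
            cache[page] = self.memory[page]
            self.transfers += 1
        return cache[page]

    def write(self, node, page, value):
        self.read(node, page)              # ensure the writer has a local copy
        self.caches[node][page] = value
        self.memory[page] = value
        for other, cache in enumerate(self.caches):
            if other != node:              # invalidate stale remote copies
                cache.pop(page, None)

dsm = ToyDSM(num_nodes=2, num_pages=4)
dsm.write(0, 0, 7)      # node 0 writes page 0 (one transfer to fault it in)
a = dsm.read(1, 0)      # node 1 must fetch page 0 over the network
b = dsm.read(1, 0)      # second read hits the local cache: no new transfer
```

Every cache miss here is a network round trip that plain message passing would have made explicit, which is why the notes conclude DSM can never beat messaging on performance.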
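The eventual-vs-strong consistency trade-off above can also be sketched with two toy replicas (again, hypothetical names, not a real system): a strong write updates every replica before returning, while an eventual write returns after updating only the local replica and propagates later, so a reader at another node can briefly observe a stale value.

```python
# Toy sketch of strong vs. eventual consistency between replicas (hypothetical).

class Replica:
    def __init__(self):
        self.value = 0

def strong_write(replicas, value):
    # Strong consistency: the write does not "complete" until every replica
    # has it, which costs a round of synchronous network messages.
    for r in replicas:
        r.value = value

def eventual_write(local, value, pending):
    # Eventual consistency: update the local replica and return immediately;
    # propagation to the other replicas is queued for later.
    local.value = value
    pending.append(value)

def propagate(pending, replicas):
    # Later, the queued updates finally reach all replicas.
    for value in pending:
        for r in replicas:
            r.value = value
    pending.clear()

a, b = Replica(), Replica()
pending = []
eventual_write(a, 42, pending)
stale = b.value              # b has not seen the write yet: reads the old 0
propagate(pending, [a, b])
fresh = b.value              # after propagation, b finally sees 42
```

The stale read is exactly the performance-vs-semantics trade: eventual consistency avoids the synchronous round trip, but then two nodes touching the "same" memory no longer see the same thing, which undermines the point of a shared-memory abstraction.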