Distributed OS Overview

Distributed Operating Systems

At what level do you want to start the distribution?

Hardware Layer
If memory is shared, communication is trivial, as in parallel (multicore) computers.

Kernel Layer
Distributing at the kernel layer avoids changes to the API and to user space. So why don't we share at the lower kernel layer? The main reasons are security and performance. Memory is assumed to be fast (low latency), so distributing at this layer means sharing memory across every computer in the system, and sharing memory in that way is a challenge. So why isn't virtual memory enough? Mainly because of contention over memory. Virtual memory spread across machines is slow: duplicating memory pages and synchronizing them across the network takes too much time.

Process Layer
At the process layer, we can distribute over the network using TCP/IP. Unfortunately, TCP/IP has high latency. We can communicate directly over Ethernet to reduce the latency, but that is a viable solution only for LAN-based systems. Some latency will therefore always be present.

We can deal with latency by caching a local copy of the data, which reduces the amount of communication required. The downside is that caching introduces a need for synchronization.
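The trade-off between caching and synchronization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RemoteStore`, `CachingClient`, and the `round_trips` counter are hypothetical names invented here, and the "network" is simulated by counting calls rather than by real communication.

```python
class RemoteStore:
    """Hypothetical stand-in for a remote node; each call counts as one round trip."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def read(self, key):
        self.round_trips += 1
        return self.data[key]

    def write(self, key, value):
        self.round_trips += 1
        self.data[key] = value


class CachingClient:
    """Keeps a local copy of every value it has read from the store."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        # Serve from the local copy when possible: no round trip needed.
        if key not in self.cache:
            self.cache[key] = self.store.read(key)
        return self.cache[key]

    def invalidate(self, key):
        # Synchronization step: drop the local copy so the next read refetches.
        self.cache.pop(key, None)


store = RemoteStore()
store.write("x", 1)
client = CachingClient(store)

for _ in range(3):
    client.read("x")          # only the first read reaches the store

store.write("x", 2)           # another party updates the remote value
stale = client.read("x")      # the cached copy is now out of date
client.invalidate("x")
fresh = client.read("x")      # refetched after invalidation
```

Three reads cost a single round trip, but the cached value goes stale as soon as someone else writes, which is exactly the synchronization problem noted above. Real systems decide when to call something like `invalidate` through a coherence protocol, leases, or timeouts; none of that is modelled here.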

We can also deal with latency by compressing the data and splitting up the computation "wisely". Computers are not wise enough to do this effectively, so it is left to the programmers. Programmers split up such computations by introducing client/server architectures, by using web applications and web services, and by using distributed file systems. Spam is an example: it uses Internet resources to communicate by email with large numbers of users. In that sense spam is a feature of the Internet; everyone should be able to send an email to everyone, and spam exploits that resource. Distributed operating systems on the scale of the Internet that are capable of wise resource management do not yet exist.
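As a rough illustration of compressing data and letting the programmer choose how to split a computation, the Python sketch below partitions a list into chunks, compresses each chunk before it would be sent, and sums the partial results. The function names (`split`, `pack`, `worker`, `distributed_sum`) are hypothetical, the workers run in the same process rather than on remote machines, and JSON plus `zlib` is just one possible encoding.

```python
import json
import zlib


def split(items, parts):
    """Programmer-chosen partitioning: contiguous chunks of near-equal size."""
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def pack(chunk):
    """Serialize and compress a chunk to reduce the bytes sent over the network."""
    return zlib.compress(json.dumps(chunk).encode("utf-8"))


def worker(payload):
    """Hypothetical remote worker: unpack the chunk and return a partial sum."""
    chunk = json.loads(zlib.decompress(payload).decode("utf-8"))
    return sum(chunk)


def distributed_sum(items, parts):
    """Send one compressed chunk to each worker and combine the partial sums."""
    payloads = [pack(chunk) for chunk in split(items, parts)]
    return sum(worker(payload) for payload in payloads)


total = distributed_sum(list(range(1000)), 4)
```

Each worker returns one number instead of its whole chunk, so the split also reduces how much data travels back. Picking the chunk size and deciding what runs where is the "wise" part that remains with the programmer.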

What Makes a Good Distributed Operating System?

A good distributed OS should provide:

Reliable, dynamically scalable storage
More processing power (CPU capacity that scales linearly)
Manageability (it should be as easy to manage as a single computer)
Ease of programming
Support for single sign-on
A single system image
Reliability (fault tolerance to software and hardware errors)
Dynamic reconfiguration
The ability to exploit local resources