WebOS, PlanetLab, Starfish

From Soma-notes

Readings

Amin Vahdat et al., "WebOS: Operating System Services for Wide Area Applications" (1998)

Adnan Agbaria and Roy Friedman, "Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations" (2003)

Larry Peterson et al., "Experiences Building PlanetLab" (2006)

Thomas Anderson and Timothy Roscoe, "Learning from PlanetLab" (2006)




WebOS

Key features

  • High Availability
  • Lower Latency
  • Fault Tolerance

Consensus: WebOS isn't really a distributed OS

Main components

  • Smart Client
  • WebFS
  • Global naming scheme based on URLs
  • Process control system
  • CRISIS authentication/authorization system (Certificates with ACLs)
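CRISIS combines certificates, which bind a principal to a signing authority, with per-resource ACLs, and resources are named by URL. The toy sketch below assumes an HMAC over a shared authority key in place of the public-key signatures real certificates use. The names `issue_certificate`, `verify_certificate`, `authorize`, the key, and the ACL layout are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical authority key; a stand-in for public-key signatures.
AUTHORITY_KEY = b"demo-authority-key"


def issue_certificate(principal: str) -> tuple[str, str]:
    """Return a toy certificate: (principal, tag signed by the authority)."""
    tag = hmac.new(AUTHORITY_KEY, principal.encode(), hashlib.sha256).hexdigest()
    return principal, tag


def verify_certificate(cert: tuple[str, str]) -> bool:
    """Check that the tag really was produced by the authority for this principal."""
    principal, tag = cert
    expected = hmac.new(AUTHORITY_KEY, principal.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)


# Hypothetical ACLs keyed by global URL name: resource -> {principal: rights}
ACLS = {
    "http://example.org/webfs/report.txt": {"alice": {"read", "write"}, "bob": {"read"}},
}


def authorize(cert: tuple[str, str], resource: str, right: str) -> bool:
    """Grant access only if the certificate verifies AND the ACL lists the right."""
    if not verify_certificate(cert):
        return False
    principal = cert[0]
    return right in ACLS.get(resource, {}).get(principal, set())
```

For example, a valid certificate for "bob" can read the file but not write it, and a certificate with a forged tag is rejected before the ACL is even consulted.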

Key ideas that were/were not adopted from WebOS

Adopted:

  • General idea of wide area dynamic distribution -> Akamai (but primarily for static content)
  • Global naming using URLs

Not Adopted:

  • CRISIS
  • WebFS (Although WebDAV could be said to be related)
  • Smart client (for web sites)

What are the pros and cons of using smart clients to do load balancing?

Pro:

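  • Distributes the load-balancing work to clients instead of a central front end
  • More flexible: each service can supply its own server-selection policy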

Con:

  • Vulnerable to Denial of Service or other forms of attacks
  • Extra network overhead to locate a service
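A smart client chooses a replica itself and fails over when one is down. The sketch below is minimal and assumes the client already holds a replica list with load estimates; fetching that list is the "extra network overhead" above. `Replica`, `choose_replica`, and `call_with_failover` are hypothetical names, and "least loaded" is just one possible policy.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Replica:
    host: str
    load: float  # last reported load; lower is better


def choose_replica(replicas: list[Replica], exclude: set[str]) -> Optional[Replica]:
    """Pick the least-loaded replica not yet tried; break ties randomly."""
    candidates = [r for r in replicas if r.host not in exclude]
    if not candidates:
        return None
    best = min(r.load for r in candidates)
    return random.choice([r for r in candidates if r.load == best])


def call_with_failover(replicas: list[Replica], request: Callable[[str], str]) -> str:
    """Try replicas from least to most loaded, moving on when one fails."""
    tried: set[str] = set()
    while True:
        replica = choose_replica(replicas, tried)
        if replica is None:
            raise RuntimeError("all replicas failed")
        tried.add(replica.host)
        try:
            return request(replica.host)
        except ConnectionError:
            continue  # fault tolerance: fall through to the next replica
```

Nothing forces a client to run this policy. A malicious client can skip `choose_replica` and hammer one host, which is the denial-of-service concern listed above.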


Starfish

Key features

MPI
  • Fast message passing
  • Gives programmers more control
OCaml
Pros
  • Well suited to recursive algorithms
  • Compiles to bytecode, so it is portable in theory
  • Same code runs on heterogeneous hardware
  • Well-known language
Cons
  • Slower performance?
  • Not entirely portable; maybe use just-in-time compilation?

Checkpointing

  • Save / resume state
  • Enables process migration
  • Maybe better suited to be implemented in the application, not OS?
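Application-level checkpointing can be quite small when the application knows which state matters. The sketch below is minimal and assumes that state is a picklable dict. It does not reproduce Starfish's own checkpointing mechanism, and `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical names.

```python
import os
import pickle
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically: dump to a temp file, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # a crash mid-write never leaves a torn checkpoint


def load_checkpoint(path: str) -> dict:
    """Resume from the last checkpoint, or start fresh if none exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {"i": 0, "total": 0}


def run(path: str, n: int, stop_after: Optional[int] = None) -> dict:
    """Sum 0..n-1, checkpointing after every step; optionally 'crash' early."""
    state = load_checkpoint(path)
    while state["i"] < n:
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(path, state)
        if stop_after is not None and state["i"] >= stop_after:
            return state  # simulate the process dying here
    return state
```

Process migration is the same operation: copy the checkpoint file to another machine and call `run` there.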

Management

  • Had a Java interface that let a user log in from any node and see the full system status
  • Used the distributed nature of daemons to communicate this information
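Any node can report full system status because every daemon keeps a complete view. The sketch below is in-process and assumes reliable broadcast to all group members, with a plain loop standing in for a real group communication layer. `Daemon`, `Group`, and the status strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Daemon:
    node: str
    # Full system view: node name -> latest status reported by that node
    view: dict[str, str] = field(default_factory=dict)

    def deliver(self, sender: str, status: str) -> None:
        self.view[sender] = status


class Group:
    """Stand-in for a group communication layer with reliable broadcast."""

    def __init__(self) -> None:
        self.members: list[Daemon] = []

    def join(self, daemon: Daemon) -> None:
        self.members.append(daemon)

    def broadcast(self, sender: Daemon, status: str) -> None:
        # Every member, including the sender, receives every update
        for member in self.members:
            member.deliver(sender.node, status)
```

After any sequence of broadcasts, every daemon's `view` is identical, so a management GUI attached to any one of them sees the whole system.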


Experiences Building PlanetLab

Key ideas

  • Global platform to distribute/test network services
  • Scale
  • Should be able to monitor and stop disruptive traffic
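Traffic can only be stopped once it is attributed to the slice that sent it. The sketch below is minimal and assumes per-packet records already tagged with the sending slice. It flags slices whose total bytes exceed a limit, worst first. `flag_disruptive`, the record format, and the threshold rule are hypothetical.

```python
from collections import Counter


def flag_disruptive(packets: list[tuple[str, str, int]], byte_limit: int) -> list[str]:
    """Sum bytes sent per slice and return slices over the limit, worst first.

    Each packet record is (slice_name, destination, size_in_bytes).
    """
    sent: Counter = Counter()
    for slice_name, _dest, size in packets:
        sent[slice_name] += size
    return [s for s, total in sent.most_common() if total > byte_limit]
```

A real monitor would also look at patterns such as many distinct destinations, but the attribution step stays the same.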

Challenges

  • Resource allocation is a significant challenge, especially in a distributed system such as PlanetLab
  • Providing a global platform for long-term services
  • Implementing a trust relationship between node owners and service developers (users)

PLC (PlanetLab Central)

Summary

  • Fulfills the trust mechanism required for the network
  • Acts as a middleman / mediator between node owners and users
  • However, if someone breaks into PLC, the entire system is compromised

Implementation

  • Each node's resources are divided among slices using Linux VServers (lightweight, container-style isolation); the part of a slice running on one node is a sliver
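A slice spans many nodes, and each node hosts slivers of many slices. The data-structure sketch below illustrates that two-way relationship. `Node`, `create_slice`, `slice_view`, and the node and slice names are hypothetical, not PlanetLab's node manager interface.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Names of the slices that have a sliver (one VServer) on this node
    slivers: set[str] = field(default_factory=set)


def create_slice(slice_name: str, nodes: list[Node]) -> None:
    """Instantiate one sliver of the slice on every chosen node."""
    for node in nodes:
        node.slivers.add(slice_name)


def slice_view(nodes: list[Node]) -> dict[str, list[str]]:
    """Regroup per-node slivers into slices: slice name -> nodes it spans."""
    spans: dict[str, list[str]] = defaultdict(list)
    for node in nodes:
        for slice_name in node.slivers:
            spans[slice_name].append(node.name)
    return dict(spans)
```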

Criticisms

  • Bandwidth allocation was slice-based, not node-based

Learning from PlanetLab

Centralized trust

Pro

  • Easier to manage, one entity to trust

Con

  • One point of control; no competing authorities to choose from when deciding whom to trust

Centralized resource control

Pro

  • All resources are controlled by PLC

Con

  • No incentive for users or administrators to conserve resources

Decentralized management

Pro

  • Provide bare-bones management and try to foster competition between third-party services

Con

  • Third parties had little motivation to build these services; it's hard work

Treat bandwidth as free

Pro

  • Free bandwidth!

Con

  • No incentive to conserve bandwidth

Provide only best-effort service

Pro

  • No limit on the number of processes which can be run

Con

  • Some processes may crowd out the computation of others (CPU/disk hogging)
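Best-effort sharing gives no guarantees: a slice's CPU share depends on how many other slices are busy. The sketch below uses a work-conserving max-min fair split among slices to show this. It is an assumption that this matches PlanetLab's scheduler in detail, since real schedulers add weights and reservations. `fair_shares` is a hypothetical name, and demands are fractions of one CPU.

```python
def fair_shares(demands: dict[str, float], capacity: float = 1.0) -> dict[str, float]:
    """Max-min fair split of capacity among slices with the given demands."""
    shares = {s: 0.0 for s in demands}
    remaining = capacity
    unsatisfied = {s for s, d in demands.items() if d > 0}
    while unsatisfied and remaining > 1e-12:
        equal = remaining / len(unsatisfied)
        for s in sorted(unsatisfied):
            # Never give a slice more than it asked for; leftovers are re-split
            grant = min(equal, demands[s] - shares[s])
            shares[s] += grant
            remaining -= grant
        unsatisfied = {s for s in unsatisfied if demands[s] - shares[s] > 1e-12}
    return shares
```

With a greedy slice next to a light one asking for 0.1 CPU, the greedy slice gets about 0.9. Add three more greedy slices and all four drop to 0.25, which illustrates why no experiment can count on a fixed amount.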

Linux is the execution environment

Pro

  • Provides a familiar programming environment

Con

  • Weak isolation between experiments
  • No global allocation for resources
  • A homogeneous testbed is a poor fit for distributed experimentation

Don't provide distributed OS services

Pro

Con

Evolve the API

Pro

  • Adaptable API, built up from the ground up

Con

  • Never had a good API: inconsistent and ever-changing, giving an unstable programming environment

Focus on the machine room

Pro

  • Allocate a big machine here, another there, etc.

Con

  • Bad for distributed OSes