WebOS, PlanetLab, Starfish

Latest revision as of 19:55, 19 March 2008

Readings

Amin Vahdat et al., "WebOS: Operating System Services for Wide Area Applications" (1998)

Adnan Agbaria and Roy Friedman, "Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations" (2003)

Larry Peterson et al., "Experiences Building PlanetLab" (2006)

Thomas Anderson and Timothy Roscoe, "Learning from PlanetLab" (2006)

WebOS

Key features

High Availability
Lower Latency
Fault Tolerance

Consensus: WebOS isn't really a distributed OS

Main components

Smart Client
WebFS
Global naming scheme based on URLs
Process control system
CRISIS authentication/authorization system (Certificates with ACLs)

Key ideas that were/were not adopted from WebOS

Adopted:

General idea of wide area dynamic distribution -> Akamai (but primarily for static content)
Global naming using URLs

Not Adopted:

CRISIS
WebFS (Although WebDAV could be said to be related)
Smart client (for web sites)

What are the pros and cons of using smart clients to do load balancing?

Pro:

Distributes computation
More flexible

Con:

Vulnerable to Denial of Service or other forms of attacks
Extra network overhead to locate a service
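
To make the trade-off above concrete, here is a minimal Python sketch of client-side load balancing with failover. It is not the WebOS Smart Client implementation: the replica names, the load table, and the `fetch` callback are all hypothetical, and a real client would first have to obtain the load information over the network (the "extra overhead" noted above).

```python
import random
from typing import Callable, Dict, List


def pick_replica(loads: Dict[str, float]) -> str:
    """Pick the replica with the lowest reported load (ties broken at random)."""
    lowest = min(loads.values())
    candidates = [name for name, load in loads.items() if load == lowest]
    return random.choice(candidates)


def smart_fetch(loads: Dict[str, float], fetch: Callable[[str], str]) -> str:
    """Try replicas from least to most loaded, failing over on connection errors."""
    remaining = dict(loads)  # copy so the caller's table is not modified
    errors: List[str] = []
    while remaining:
        replica = pick_replica(remaining)
        try:
            return fetch(replica)
        except ConnectionError as exc:
            errors.append(f"{replica}: {exc}")
            del remaining[replica]  # do not retry a failed replica
    raise ConnectionError("all replicas failed: " + "; ".join(errors))


if __name__ == "__main__":
    # Hypothetical replicas and a fake transport, for illustration only.
    def fake_fetch(replica: str) -> str:
        if replica == "a.example.org":
            raise ConnectionError("down")
        return f"response from {replica}"

    table = {"a.example.org": 0.1, "b.example.org": 0.5, "c.example.org": 0.9}
    print(smart_fetch(table, fake_fetch))
```

Because the selection logic runs on the client, the servers do no balancing work, but every client must be trusted to behave and to hold reasonably fresh load data.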

Starfish

Key features

MPI

Fast message passing
Allows the programmers more control

OCaml

Pros

Recursive algorithms
Uses Bytecodes - portable in theory
Same code on heterogeneous hardware
Well-known language

Cons

Slow performance?
Not entirely portable, maybe Just-In-Time?

Checkpointing

Save / resume state
Provides process migration
Maybe better suited to be implemented in the application, not OS?
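
The question of doing checkpointing in the application can be illustrated with a small Python sketch of application-level save/resume. This is not how Starfish implements checkpointing; the file name, the state layout, and the `stop_after` parameter (used to simulate a crash) are assumptions made for the example.

```python
import os
import pickle
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temp file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        pickle.dump(state, tmp)
    os.replace(tmp_path, path)  # readers never see a half-written checkpoint


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path: str, total_steps: int, stop_after: Optional[int] = None) -> dict:
    """Sum 0..total_steps-1, checkpointing after every step.

    stop_after simulates a crash after that many steps in this run.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break
        state["acc"] += state["step"]
        state["step"] += 1
        save_checkpoint(path, state)
        done_this_run += 1
    return state


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")  # hypothetical file name
    print(run(ckpt, 10, stop_after=4))  # interrupted run
    print(run(ckpt, 10))                # resumes from the saved step
```

Copying the checkpoint file to another machine and calling `run` there would resume the job on that machine, which is the sense in which checkpointing also provides process migration. The application decides what counts as state, which keeps the checkpoint small but puts the burden on the programmer.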

Management

Had a Java interface which allowed any node to log in and see full system status
Used the distributed nature of daemons to communicate this information

Experiences Building PlanetLab

Key ideas

Global platform to distribute/test network services
Scale
Should be able to monitor and stop disruptive traffic

Challenges

Resource allocation is significant, especially in a distributed system such as PlanetLab
Providing a global platform for long-term services
Implementing a trust relationship between node owners and service developers (users)

PLC (PlanetLab Central)

Summary

Fulfills the trust mechanism required for the network
Acts as a middle man / mediator between node owners and users
However, if someone breaks into the PLC, the entire system is compromised

Implementation

Each node's resources are split among slices using VServers (lightweight, isolated process groups)

Criticisms

Bandwidth allocation was slice-based, not node-based
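
One way to read this criticism: with only per-slice limits, the total traffic leaving a node grows with the number of active slices. The Python sketch below contrasts independent per-slice caps with a single node-wide cap shared max-min fairly. It is purely illustrative and not PlanetLab's actual mechanism; the slice names, demands, and caps (in arbitrary units such as Mbps) are hypothetical.

```python
from typing import Dict


def per_slice_caps(demands: Dict[str, float], slice_cap: float) -> Dict[str, float]:
    """Cap each slice independently; the node total grows with the slice count."""
    return {name: min(demand, slice_cap) for name, demand in demands.items()}


def node_cap_fair_share(demands: Dict[str, float], node_cap: float) -> Dict[str, float]:
    """Split one node-wide cap among slices using max-min fairness."""
    alloc = {name: 0.0 for name in demands}
    unsatisfied = dict(demands)
    remaining = node_cap
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        satisfied = {n: d for n, d in unsatisfied.items() if d <= share}
        if not satisfied:
            # Every remaining slice wants more than an equal share: split evenly.
            for name in unsatisfied:
                alloc[name] = share
            break
        # Fully serve the small demands, then redistribute what is left.
        for name, demand in satisfied.items():
            alloc[name] = demand
            remaining -= demand
            del unsatisfied[name]
    return alloc


if __name__ == "__main__":
    demands = {"slice_a": 2.0, "slice_b": 8.0, "slice_c": 8.0}  # hypothetical
    capped = per_slice_caps(demands, slice_cap=5.0)
    shared = node_cap_fair_share(demands, node_cap=10.0)
    print(capped, sum(capped.values()))
    print(shared, sum(shared.values()))
```

With the per-slice cap the node as a whole can still send more than the hosting site may want to allow, whereas the node-wide cap bounds the total regardless of how many slices are active.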

Learning from PlanetLab

Centralized trust

Pro

Easier to manage, one entity to trust

Con

One point of control, no competition as to who to trust

Centralized resource control

Pro

All resources are controlled by PLC

Con

No incentive for users or administrators to conserve resources

Decentralized management

Pro

Provide bare-bones management, try to foster competition between 3rd party services

Con

No motivation to do so, it's hard work

Treat bandwidth as free

Pro

Free bandwidth!

Con

No incentive to conserve bandwidth

Provide only best-effort service

Pro

No limit on the number of processes which can be run

Con

Some processes may crowd out others (CPU/disk hogging)

Linux is the execution environment

Pro

Provides a familiar programming environment

Con

Weak isolation between experiments
No global allocation for resources
Having a homogeneous test-bed is poor for distributed experimentation

Don't provide distributed OS services

Pro

Con

Evolve the API

Pro

Adaptable API, ground-up

Con

Never had a good API: inconsistent, ever-changing, unstable programming environment

Focus on the machine room

Pro

Allocate a big machine here, another there, etc.

Con

Bad for distributed OSes