WebOS, PlanetLab, Starfish
Readings
Amin Vahdat et al., "WebOS: Operating System Services for Wide Area Applications" (1998)
Larry Peterson et al., "Experiences Building PlanetLab" (2006)
Thomas Anderson and Timothy Roscoe, "Learning from PlanetLab" (2006)
WebOS
Key features
- High Availability
- Lower Latency
- Fault Tolerance
Consensus: WebOS isn't really a distributed OS
Main components
- Smart Client
- WebFS
- Global naming scheme based on URLs
- Process control system
- CRISIS authentication/authorization system (Certificates with ACLs)
Key ideas that were/were not adopted from WebOS
Adopted:
- General idea of wide area dynamic distribution -> Akamai (but primarily for static content)
- Global naming using URLs
Not Adopted:
- CRISIS
- WebFS (Although WebDAV could be said to be related)
- Smart client (for web sites)
What are the pros and cons of using smart clients to do load balancing?
Pro:
- Distributes computation
- More flexible
Con:
- Vulnerable to denial-of-service and other attacks
- Extra network overhead to locate a service
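To make the trade-off concrete, here is a minimal sketch of client-side load balancing with failover. It is not the WebOS Smart Client implementation: the class name, the `send` callback, and the penalty value are all assumptions made for illustration. The client keeps its own load estimate per replica, tries the least-loaded one first, and falls back to the next on failure.

```python
class SmartClient:
    """Hypothetical client-side load balancer (illustrative only)."""

    def __init__(self, replicas, send):
        self.replicas = list(replicas)
        # Assumed transport: send(replica, payload) returns a response
        # or raises ConnectionError if the replica is unreachable.
        self.send = send
        # Local estimate of load per replica; the client has no global view.
        self.load = {r: 0 for r in self.replicas}

    def request(self, payload):
        # Try replicas from least to most loaded, failing over on error.
        for replica in sorted(self.replicas, key=lambda r: self.load[r]):
            try:
                response = self.send(replica, payload)
            except ConnectionError:
                # Penalize the unreachable replica so it is tried last next time.
                self.load[replica] += 10
                continue
            self.load[replica] += 1
            return replica, response
        raise ConnectionError("no replica reachable")
```

The pros show up as the absence of any server-side front end: the selection logic runs on the client and can use any policy. The cons show up too: every failed attempt is extra network traffic, and the servers have to trust that clients actually follow the policy.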
Starfish
Key features
MPI
- Fast message passing
- Allows the programmers more control
OCaml
Pros
- Recursive algorithms
- Uses bytecode, so portable in theory
- Same code on heterogeneous hardware
- Well known language
Cons
- Slow performance?
- Not entirely portable, maybe Just-In-Time?
Checkpointing
- Save / resume state
- Provides process migration
- Maybe better suited to be implemented in the application, not OS?
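As a rough illustration of the application-level alternative raised in the last bullet, the sketch below saves and resumes the state of a simple loop. It is not how Starfish checkpoints MPI processes; the function names and the use of `pickle` are assumptions chosen for brevity.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    # Write to a temporary file, then rename atomically, so a crash
    # mid-write never leaves a partially written checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path, default):
    # Resume from the saved state if one exists, else start fresh.
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run_sum(path, n, stop_after=None):
    """Sum 0..n-1, checkpointing after every step.

    stop_after simulates a failure after that many steps (returns None).
    """
    state = load_checkpoint(path, {"i": 0, "total": 0})
    steps = 0
    while state["i"] < n:
        if stop_after is not None and steps == stop_after:
            return None  # simulated crash
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(path, state)
        steps += 1
    return state["total"]
```

Calling `run_sum(path, 10, stop_after=4)` and then `run_sum(path, 10)` with the same path resumes at step 4 and returns 45. Because the checkpoint is just a file, a second machine that can read it could continue the computation, which is the sense in which checkpointing enables migration. The application knows exactly which state matters (two integers here), which is the argument for doing this in the application rather than the OS.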
Management
- Had a Java interface which allowed any node to log in and see full system status
- Used the distributed nature of daemons to communicate this information
Experiences Building PlanetLab
Key ideas
- Global platform to distribute/test network services
- Scale
- Should be able to monitor and stop disruptive traffic
Challenges
- Resource allocation is significant, especially in a distributed system such as PlanetLab
- Providing a global platform for long-term services
- Implementing a trust relationship between node owners and service developers (users)
PLC (PlanetLab Central)
Summary
- Fulfills the trust mechanism required for the network
- Acts as a middle man / mediator between node owners and users
- If someone breaks into PLC, however, the entire system is compromised
Implementation
- Nodes are split into slices using VServers, lightweight process groups
Criticisms
- Bandwidth allocation was slice-based, not node-based
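One way to read this criticism: with a cap per slice, the total traffic leaving a node grows with the number of active slices, so the node owner cannot bound it. The sketch below contrasts that with a single node-wide cap shared max-min fairly among slices. The function names and all numbers are made up for illustration and do not describe PlanetLab's actual mechanism.

```python
def node_egress_with_slice_caps(demands, per_slice_cap):
    # Each slice is limited independently; the node total is unbounded
    # in the number of slices.
    return sum(min(d, per_slice_cap) for d in demands.values())


def max_min_share(demands, node_cap):
    # Max-min fair allocation of one node-wide cap: satisfy the smallest
    # demands first, then split what is left evenly among the rest.
    alloc = {}
    remaining = node_cap
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        fair = remaining / len(pending)
        name, demand = pending[0]
        if demand <= fair:
            alloc[name] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining slice wants more than the fair share.
            for other, _ in pending:
                alloc[other] = fair
            break
    return alloc


# Illustrative demands in Mbps for three hypothetical slices.
demands = {"a": 1, "b": 8, "c": 8}
slice_total = node_egress_with_slice_caps(demands, per_slice_cap=5)
node_alloc = max_min_share(demands, node_cap=10)
```

With these numbers, per-slice caps of 5 Mbps let the node send 11 Mbps in total, while a 10 Mbps node cap gives slice `a` its full 1 Mbps and splits the remaining 9 Mbps evenly between `b` and `c`.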
Learning from PlanetLab
Centralized trust
Pro
- Easier to manage, one entity to trust
Con
- Single point of control; no competition over whom to trust
Centralized resource control
Pro
- All resources are controlled by PLC
Con
- No incentive for users or administrators to conserve resources
Decentralized management
Pro
- Provide bare-bones management, try to foster competition between 3rd party services
Con
- No motivation to do so, it's hard work
Treat bandwidth as free
Pro
- Free bandwidth!
Con
- No incentive to conserve bandwidth
Provide only best-effort service
Pro
- No limit on the number of processes which can be run
Con
- Some processes may crowd out others (CPU/disk hogging)
Linux is the execution environment
Pro
- Provides a familiar programming environment
Con
- Weak isolation between experiments
- No global allocation for resources
- Having a homogeneous test-bed is poor for distributed experimentation
Don't provide distributed OS services
Pro
Con
Evolve the API
Pro
- Adaptable API, ground-up
Con
- Never had a good API, inconsistent, ever changing, unstable programming environment
Focus on the machine room
Pro
- Allocate a big machine here, another there, etc.
Con
- Bad for distributed OSes