Early RPC & the Alto: Difference between revisions

From Soma-notes
==Readings==
*'''[http://homeostasis.scs.carleton.ca/~soma/distos/fall2008/alto.pdf Thacker et al., "Alto: A Personal computer" (1979)]'''
Read this to learn about the Alto, the distributed computing system that pioneered the core technology of current computers.


*'''[http://homeostasis.scs.carleton.ca/~soma/distos/fall2008/nelson1981-rpc.pdf Bruce J. Nelson, ''Remote Procedure Call'' (1981)]:'''
Compare the perspective of this RPC implementation description with
the more design-oriented focus of Nelson's thesis.
<!--


*'''[http://homeostasis.scs.carleton.ca/~soma/distos/fall2008/rfc1050.txt Sun Microsystems, "RPC: Remote Procedure Call Protocol Specification" (1988)]'''


This article explains the basics of a modern RPC service, SOAP, along with related technologies.
-->
==Notes==
How is the implementation of RPC in the Birrell/Nelson paper similar to what we'd do today?
*it uses stubs
*RPC
*distributed naming/addressing
*exportable interfaces
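A minimal sketch of the stub idea in Python, not the Birrell/Nelson implementation: the client stub marshals a call into bytes, a stand-in <code>transport</code> function plays the part of the network, and the server stub dispatches against a table of exported procedures. The names <code>EXPORTED</code>, <code>transport</code> and <code>ClientStub</code>, and the choice of JSON as the wire format, are assumptions made for illustration only.

```python
import json

# Server side: the "exported interface" is a table of callable procedures.
# (Hypothetical procedures, chosen only for the example.)
EXPORTED = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def server_dispatch(request_bytes: bytes) -> bytes:
    """Server stub: unmarshal the request, call the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = EXPORTED[request["proc"]]
    result = procedure(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def transport(request_bytes: bytes) -> bytes:
    """Stand-in for the network; a real system would send packets here."""
    return server_dispatch(request_bytes)


class ClientStub:
    """Client stub: looks like a local object, but every call is marshalled."""

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"proc": name, "args": list(args)})
            reply = transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


calc = ClientStub()
print(calc.add(2, 3))
print(calc.upper("alto"))
```

Note that <code>calc.add(2, 3)</code> reads exactly like a local call; that transparency is the point of stubs, and it is what the rest of these notes criticize.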
More focus on practicality rather than theory in this paper.
They intended RPC to become the sole platform for communications.
====So how feasible is all-RPC based network communication?====
*Doubles the size of any given data transfer.
*Stateless communication; we want state.
'''RPC''' - Synchronous; call, response.
'''Message Passing''' - Asynchronous. We require ridiculous hacks to use RPC in this way.
'''Broadcasting/Multicasting''' - not feasible using RPC.
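
The difference between the two styles can be sketched with threads and queues standing in for machines and the network. This is an illustration under assumptions, not a real RPC library: <code>serve</code>, <code>rpc_call</code> and <code>send_message</code> are hypothetical names. The RPC-style caller blocks until a reply arrives; the message sender returns immediately and never hears back.

```python
import queue
import threading


def serve(inbox: queue.Queue) -> None:
    """Hypothetical server loop: doubles each number it receives."""
    while True:
        value, reply_box = inbox.get()
        if value is None:            # shutdown signal
            break
        if reply_box is not None:    # RPC style: the caller is waiting
            reply_box.put(value * 2)


def rpc_call(inbox: queue.Queue, value: int) -> int:
    """RPC style: send the request, then block until the response arrives."""
    reply_box = queue.Queue(maxsize=1)
    inbox.put((value, reply_box))
    return reply_box.get(timeout=5)


def send_message(inbox: queue.Queue, value: int) -> None:
    """Message passing: enqueue and return immediately; no reply is expected."""
    inbox.put((value, None))


inbox = queue.Queue()
server = threading.Thread(target=serve, args=(inbox,))
server.start()

answer = rpc_call(inbox, 21)   # blocks for the reply
send_message(inbox, 7)         # fire and forget
inbox.put((None, None))        # ask the server to stop
server.join()
print(answer)
```

Getting asynchronous behaviour out of <code>rpc_call</code> would mean wrapping it in extra threads or callbacks, which is the kind of hack the notes refer to.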
====Why RPC as the dominant platform, then?====
Ease of use vs. messages.
The rest is just details, albeit important ones:
*Fault tolerance needs to be addressed
*Security
*Performance
XML was not feasible back then, because it's way too big.
So use compression? That requires CPU cycles, which they didn't have.
Making smaller, leaner protocols reduces flexibility but increases performance.
XML - really flexible, not good performance.
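
To make the size trade-off concrete, here is a rough comparison of one call encoded two ways. Both formats are invented for this example (the fixed binary layout and the <code>call</code>/<code>arg</code> element names are assumptions, not SOAP and not the wire format from the paper); only the relative sizes matter.

```python
import struct
import xml.etree.ElementTree as ET


def encode_binary(proc_id: int, a: int, b: int) -> bytes:
    """Lean, rigid encoding: a 2-byte procedure id and two 4-byte unsigned ints."""
    return struct.pack("!HII", proc_id, a, b)


def decode_binary(payload: bytes):
    """Inverse of encode_binary; both ends must agree on the layout in advance."""
    return struct.unpack("!HII", payload)


def encode_xml(proc_name: str, a: int, b: int) -> bytes:
    """Flexible, self-describing encoding: names and structure travel with the data."""
    call = ET.Element("call", {"proc": proc_name})
    ET.SubElement(call, "arg").text = str(a)
    ET.SubElement(call, "arg").text = str(b)
    return ET.tostring(call)


binary_payload = encode_binary(1, 2, 3)
xml_payload = encode_xml("add", 2, 3)
print(len(binary_payload), len(xml_payload))
```

The binary form cannot gain a third argument without changing both ends; the XML form can, at the cost of carrying the markup on every call.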
====Implementations/Applications of RPCs at that time?====
*File sharing
*Games
For any of their applications, performance was the real show-stopper.
The entire system was based on elegance; the entire thing could have been done using message passing, but they wanted a good system.
SOAP (XML + RPC) is an example of an RPC implementation.
*Web services are really big blobs of code interconnecting via their external APIs.
Web services are different from what Xerox was doing because the intent was fundamentally different.
Xerox was looking to maximize the use of its machines.
You had very few authors building these applications. Not large programs, by necessity.
Written from the ground up, in microcode. Complete control of the systems by the programmers who built them.
Today, we mostly don't control the software stack. We're building on top of decades of development, and have to cross over administrative boundaries and deal with people with different intents, sometimes malicious.
====So where are RPCs still used today, outside of web services?====
Well, what is the point of a firewall?
*To let through a given set of services, and block others.
What do they block?
*File & Printer Sharing
*Net Send!
*Chat Clients/Messaging
*Service Discovery on the network (again, file and printer sharing)
Most Windows services talk to each other using RPC, either locally or to programs on other computers.
Interprocess communications can be implemented via RPC.
Almost everything a firewall blocks in practice is an RPC mechanism; RPC is everywhere.
'''Why are web services implemented over HTTP?'''
No firewall/security issues; port 80 is fine with the firewall, everything is let through.
'''Underlying security properties of SOAP?'''
Bad. Insecure, fundamentally.
Original sin of RPC: ease of use and transparency.
In effect they were trying to turn a network of computers into one computer.
You don't want random computers online to be part of your computer.
'''Before REST:'''
*CORBA
*COM
*SUN-RPC
*NFS (built on RPC) (also known as No File Security instead of Network File System)
====Step back; where are we with the study of distributed OS?====
Operating systems isolate processes from each other, but processes still need a means of communication. This is called IPC (interprocess communication). RPC is just IPC across different machines and different OSes, so that they work as one OS.
RPCs are still not a distributed operating system:
*They don't handle resource sharing/allocation
*No security
====What if you had an underlying shared memory?====
Enables threads across multiple computers. This is more like a real system we want to create.
Why didn't they take RPC further?
Speaking of original sin, the Xerox PARC folks realized:
*Resources need to be shared
*Security issues in doing this
'''Why not just turn everything into one giant system?'''
Discrete machines; you want a degree of separation.
Even so, you could build mechanisms for doing that. What's the real reason?
Performance.
====Question: why create a distributed OS at all?====
Resources are being wasted/not used; why shouldn't other people use them? So resource sharing.
Why weren't normal OS processes being used?
Performance.
'''Fundamental reason''': OSes are tightly coupled systems.
Any bit of RAM is the same as any other bit of RAM;
any bit of CPU time is equivalent to any other bit of CPU time.
Building a distributed system, can we have that kind of coupling?
You can program it; it can be built.
What's the difference between cycles on my machine, and another machine? Latency.
We don't yet have a completely effective way to access other machines' resources.
RPC is a first step, but it's transparent; you never know when you're making a function call across the network.
To make performance feasible, either you have to know when you're doing it, or the machine has to be able to control it.
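
A back-of-the-envelope illustration of why knowing matters. All figures below are assumed for the example, not measured: each unit of work costs 1 microsecond and a network round trip costs 1 millisecond. Code that knows it is crossing the network can batch many calls into one round trip; transparent code pays the round trip on every call.

```python
# All figures are assumptions chosen for illustration, not measurements.
WORK_NS = 1_000            # assumed cost of one unit of work (1 microsecond)
ROUND_TRIP_NS = 1_000_000  # assumed network round trip (1 millisecond)
CALLS = 1_000


def total_local_ns(calls: int) -> int:
    """Every call runs on the local machine."""
    return calls * WORK_NS


def total_remote_ns(calls: int) -> int:
    """Transparent RPC: every call silently pays a full round trip."""
    return calls * (WORK_NS + ROUND_TRIP_NS)


def total_batched_ns(calls: int) -> int:
    """The caller knows the work is remote and ships all calls in one round trip."""
    return calls * WORK_NS + ROUND_TRIP_NS


for label, total in [
    ("local", total_local_ns(CALLS)),
    ("remote, one call at a time", total_remote_ns(CALLS)),
    ("remote, batched", total_batched_ns(CALLS)),
]:
    print(f"{label}: {total // 1_000_000} ms")
```

Under these assumptions the three totals are 1 ms, 1001 ms and 2 ms: the work itself is cheap, and almost all of the cost comes from how many times the network is crossed.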
Virtual memory's an interesting analogy; it works, despite being transparent.
We haven't figured out an analogue for computation that doesn't kill performance.
What we're doing is identical to what people were doing with overlays before virtual memory.
Before virtual memory, people had to do the memory-to-disk, disk-to-memory manually. One way was with overlays, which were a huge pain.
Distributed shared memory; do external function calls, where the external machine knows your local state.
Latencies are a real problem in this kind of system.
RPCs make it easy to program this kind of thing, in exactly the wrong way.
Why hide the syntax, so long as it's easy to use? Why not just have a separate "external function call" syntax?
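
One way such a syntax could look, sketched in Python under assumptions (<code>RemoteRef</code>, <code>remote_call</code> and <code>fake_transport</code> are hypothetical names, and the transport is a local stand-in): a remote procedure is a handle that cannot be called like a function, so every network crossing is visible at the call site and must state a timeout.

```python
class RemoteRef:
    """Hypothetical handle to a procedure that lives on another machine.

    It deliberately has no __call__, so it cannot be mistaken for a local function.
    """

    def __init__(self, name, transport):
        self.name = name
        self.transport = transport


def remote_call(ref: RemoteRef, *args, timeout_s: float):
    """The only way across the network: the caller must say so and pick a timeout."""
    return ref.transport(ref.name, args, timeout_s)


calls_over_network = []


def fake_transport(name, args, timeout_s):
    """Stand-in for the network; records which procedures were invoked remotely."""
    calls_over_network.append(name)
    table = {"square": lambda x: x * x}
    return table[name](*args)


square = RemoteRef("square", fake_transport)

local_result = sum([1, 2, 3])                           # ordinary local call
remote_result = remote_call(square, 4, timeout_s=0.5)   # visibly remote
print(local_result, remote_result, calls_over_network)
```

Writing <code>square(4)</code> raises a <code>TypeError</code> here, which is the intended effect: still easy to use, but no longer possible to cross the network by accident.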
The problem with RPC is that it makes it easy to move entire APIs at a time outside of the application.
Huge security problem.
This is why we have firewalls, to stop RPCs from talking to machines they shouldn't be talking to.
'''So how do we secure SOAP?'''
...XML Firewalls. Need to parse the XML and decide what to do with it. This is basically an arms race; what will we have next? Tunneling over SOAP?
'''So the things we need to understand are:''' what RPCs are, why they're useful, how they flake out on all the hard problems of a distributed OS.

Latest revision as of 14:13, 17 September 2008

For Nelson's thesis, you only need to read the thesis summary, which starts at page 224 in the PDF. If you have time, however, I'd suggest looking at the rest, particularly the introduction and related work.