DistOS 2014W Lecture 7

==UNIX and Plan 9 (Jan. 28)==
* [http://homeostasis.scs.carleton.ca/~soma/distos/fall2008/unix.pdf Dennis M. Ritchie and Ken Thompson, "The UNIX Time-Sharing System" (1974)]
* [http://homeostasis.scs.carleton.ca/~soma/distos/2014w/presotto-plan9.pdf Presotto et al., "Plan 9, A Distributed System" (1991)]
* [http://homeostasis.scs.carleton.ca/~soma/distos/2014w/pike-plan9.pdf Pike et al., "Plan 9 from Bell Labs" (1995)]



==Project==

We discussed moving the proposal due date back a week. We also discussed spending the class before that date (Thursday, February 6) discussing the primary papers people had chosen, in order to provide preliminary feedback. Anil spent some time going through the papers from OSDI 2012 and discussing which ones would make good projects and why.

* Pick a primary paper.
* Find papers that cite that paper, papers it cites, etc. to collect a body of related work.
* Don't just give a history, tell a story!
* Do not try to summarize papers.
* Try to identify a pattern, a common ground between the papers.
* Tell a story that connects several papers in the topic you choose.


Pick a conference (USENIX is pretty systems oriented, maybe LISA), go through its papers, and find something interesting.

Examples from OSDI 2012:

* datacenter (filesystems for doing X, heat management, etc.)
* web stuff
* distributed shared memory
* distributed network I/O infrastructure
* distributed databases (potentially)
* anonymity systems


==Unix and Plan 9==

* Multics was a complex system, and that complexity hurt it: it was slower and consequently used less.
* Multics was not for end users; it was designed to support "utility computing", wherein computation was a service to be charged for.

UNIX was built as "a castrated version of Multics", which was a very complex system. Multics was, arguably, so far ahead of its time that we are only now achieving its ambitions. Unix was much more modest, and therefore much more achievable and successful: just enough infrastructure to avoid reinventing the wheel, built by a couple of programmers for their own use.

* Just enough infrastructure to run my programs
* It was really just supposed to be used by programmers
* "By programmers for programmers"

Unix was not designed as a product or commercial entity at all. It was licensed out because AT&T was under severe antitrust scrutiny at the time.

They wanted a few simple abstractions, so they made everything a file. The only difference amongst most files was that you could seek on some and not on others. Berkeley promptly broke this abstraction by introducing sockets for networking.

Plan 9 finally introduced networking using the right abstractions, but it was too late; Sun Microsystems had already licensed Berkeley Unix and commercialized it. Arguably the reason the BSD folks didn't use the file abstraction was the difference in reliability. Files are generally reliable, and failures with them are catastrophic, so many applications simply didn't include logic to handle such I/O errors. Networks are much less reliable, and applications have to deal gracefully with timeouts and other errors.

In Anil's opinion, Plan 9's use of the file abstraction to represent the network was not a good design. File I/O rarely fails, but networks are inherently flaky, and loss of connectivity is normal. File-system abstractions do not capture that flakiness: the network simply does not have the reliability characteristics of mass storage, and how to deal with that fact while still presenting a file interface was a major question the Plan 9 designers left unanswered. Things that have different failure modes require different APIs. Anil added that Plan 9 was an elegant attempt at representing everything with the file abstraction, but, as noted above, they were trying too hard.

The better approach in distributed systems is this: if things have different semantics, they should have abstractions that reflect their characteristics. APIs should expose those characteristics rather than hide them away and pretend things behave like something else in pursuit of too much generality.

Plan 9 implemented procfs, a directory that exposes running processes as files. This was later adopted by Linux.

In Anil's opinion, another reason Plan 9 was not widely adopted was that it arrived late. By the time Plan 9 came out in the 1990s, UNIX systems with networking were already widespread, driven by the success of the Internet.

Another valuable point Anil made: for a technology to get adopted and become successful, it should address a niche with no successful incumbents. There should be a champion use for the technology. Tech doesn't keep existing just because it is cool.

Tangent about programming languages: C was for system programming. Java was for enterprise programming.