Using the Operating System
These notes have not yet been reviewed for correctness.
Lecture 3: Using the Operating System
Administrative
Course Notes
The course webpage has been changed to point to this wiki. Notes for lectures will continue to be posted here.
We still need volunteers to take notes and put them up. Don't forget the up to 3% bonus for doing so.
Lab 1
Lab 1 will be up soon. Starting tomorrow, there are labs. Please show up.
If you're clever, you can probably find most of the answers online. Avoid looking up the answers though, because you'll learn much more than just the answers if you explore while using the computer.
You'll do better on the tests if you do the labs.
The point of this course is to build up a conceptual model of how computers work. This conceptual model is not made up of answers; it's made up of connections. You'll start to make these connections by doing the labs.
The lab will be posted as a PDF. You can print it out and bring it with you. When you go to hand in the lab, print your answers on a separate piece of paper. Answers will be due in two weeks.
All functioning lab machines are running Debian Linux 4.0 (etch). They should be connected to the internet. They should have a browser on them called IceWeasel. It's really Mozilla Firefox. Because Mozilla has trademarked the name Firefox, in order to use the name, you have to use exactly their binary distribution. Debian could have had a waiver, but because Debian is about freedom, it didn't want its users to be bound by the terms of the Firefox agreement.
One thing you'll notice while studying operating systems is that there's a lot of culture. This is because users get used to a particular way of doing things. For example, lots of us are probably used to Windows and how it works. If you changed the fundamentals of how Windows worked, many of us would be unhappy.
Some of the things we'll be studying are based on decisions made long ago, often arbitrarily, or for a technical reason that was true at the time. Even if a decision was wrong then, or is wrong now, we're often stuck with it.
We're going to look at some of this baggage in operating systems as we progress through the course.
For the lab, we'll be given a set of questions in 2 parts.
Part A we should be able to finish during the hour, give or take a few minutes. If it takes you 3-4 hours, you're probably doing something wrong.
Part B will take longer. It will need more research and a bit more reading. You should be working together. If you have trouble finding a buddy, talk to Dr. S. Talk to each other to learn. The purpose isn't just getting the right answers; it's learning about the operating system.
We're going to look at:
- Processes, and how Unix deals with them.
- How the parts of the system are divided up.
- Dynamic libraries: where they are, and where they fit in memory.
- What the dependencies are.
- How the graphical subsystem fits in: X-Windows, the classical Unix graphical environment.
- Practice on the command line. (If you're wondering where the command line fits into modern graphical environments, read Neal Stephenson's essay "In the Beginning... was the Command Line".)
Try to finish part A in the lab. Answers due in class 2 weeks from today.
Term Paper
For the term paper... You don't have to do a pure literature review. You could do an original operating system extension. There's one caveat: you've got to clear what you're going to do with Professor Somayaji. You have to get permission a week before the outline is due (the paper outline is due Oct 29).
What types of things is he thinking of? Say you wanted to implement a new filesystem. This is inherently more work, because you still have to give a nice write-up, and the report should still cite other work.
All of us should have started the process by next week, even if it's just googling for 15 minutes. Just google and see what results come up. If you start now, you'll have time to pick a topic that you like, instead of the first thing that comes along. It's better to work on something you like than to be stuck reading papers you're not interested in.
If you want to find good OS papers:
- The USENIX Association has a number of systems-oriented conferences:
- OSDI
- USENIX Annual Technical Conference
- LISA
Using the Operating System
Chapter 2 looks at the programming model of an operating system. The operating system provides certain abstractions to help programmers work with it.
What are some examples of abstractions?
Files
A file is a metaphor. What was the original metaphor? The manila-coloured folder that we put paper in. It's interesting to note that a physical file is used to hold many pages or documents, but a computer file is a single document. Instead, a directory holds many files, which are each generally one document. The metaphor hasn't made much sense for a long time, but it is still in use.
What is a file?
A file is a bytestream you can read from and write to.
We also have an abstraction called a byte, 256 possible values, 0-255. We as computer scientists think we can represent just about anything with these.
- file: named bytestream(s).
In modern operating systems there can be more than one bytestream in a file. When there is more than one bytestream, we call this a forked file.
An early operating system that used forked files was the classic Mac OS (up through Mac OS 9). On a traditional system, you get a single sequence of bytes when you open a file. With a forked file, when you read it you get some data, but there is also other data hanging around. We'll talk about that later.
The standard API calls for a file are:
- open
- read
- write
- close
- seek
As well as other operations that one might need to perform on files, such as:
- truncate
- append - (seek to end of file and write)
- execute
Why open and close? Why can't we just operate on a filename? Because it (usually) takes a long time to go through the filesystem to find a file. Open and close are optimizations -- the abstraction is a stateful interface. You start by using open to obtain some sort of "handle" representing the file, and you pass this "handle" value to read and write. When you're done, closing the file frees the resources allocated when opening it. On most systems you can only have a certain number of files open at any given time.
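Here's a minimal sketch of the stateful idiom in C (assuming the POSIX flavour of these calls; the path is just an example):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* open walks the filesystem once and hands back a small
           "handle" (a file descriptor); read and write just use it */
        int fd = open("/etc/hostname", O_RDONLY);   /* example path */
        if (fd < 0) {
            perror("open");
            return 1;
        }
        char buf[128];
        ssize_t n = read(fd, buf, sizeof buf);      /* read up to 128 bytes */
        if (n > 0)
            write(STDOUT_FILENO, buf, (size_t)n);   /* echo them to stdout */
        close(fd);   /* free the per-open state the kernel was holding */
        return 0;
    }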
There are some filesystems where open and close don't do much of anything, such as some networked filesystems.
Files represent storage, on disks... They're random access. If they're random access like RAM, why don't we access disks like we access RAM? Why couldn't we just allocate objects as we need them? We could indeed do this, but it turns out that there's a reason we don't generally do this.
The file interface is a procedural interface.
One nice thing about files is that they're a minimal-functionality interface. The concept of minimal functionality is a recurring theme you'll find when we discuss filesystems.
The abstraction used to interface the filesystem shouldn't prohibit you from creating particular forms of applications. If we chose to use an object model, we'd be implying you don't want to give arbitrary access to the data on disk, as objects tend to encapsulate their data.
The abstraction listed above is the minimal abstraction for efficiently managing persistent storage (disks).
This doesn't necessarily mean this is the most absolutely minimal abstraction. An even more minimal abstraction would be to just treat storage devices as a bunch of fixed size blocks. However, that's getting too low level, because now all programs have to worry about where they put files.
Because the file abstraction is reasonably good, it's stuck around for decades.
Fundamentally, though, it's a legacy. Some operating systems have tried to get away from it. Look at PalmOS: it resisted having files for a long time, and eventually gave in to support removable media, but the primary OS and API still don't support files. Microsoft has wanted to get away from the legacy file abstraction too, but somehow it doesn't seem to happen.
Processes and Threads
There are lots of other devices, but from an OS level, there are two other big ones: the CPU and RAM. These two are generally abstracted with processes. The process is the basic abstraction in operating systems for these two, but it is not the only abstraction. There are also threads.
CPU + RAM are abstracted as:
- processes
- threads
A process may have multiple threads. A thread shares memory with its process.
- A process is an exclusive allocation of CPU and RAM.
- A thread is a non-exclusive allocation of RAM within a process,
but is an exclusive allocation of CPU.
- One or more threads constitute a process.
Another way to talk about processes is in terms of address spaces and execution context:
- An address space is just a virtual version of RAM. It may
  be instantiated in physical memory, or it may not be. It's a set of addresses you can call your own.
- Execution context is CPU state (registers, processor status
  words, etc.). There's lots of state surrounding the processor when it's running a program. This state can be saved, and then restored to resume execution at a later time.
- A thread is one execution context matched with an address space.
- A process is one or more execution contexts plus an address space.
- A single-threaded process has one execution context, and one address space.
- A multithreaded process has multiple execution contexts, and one address space.
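As a minimal sketch of the multithreaded case (using POSIX threads, which we haven't formally introduced; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    int x = 5;   /* one address space: every thread sees this same x */

    static void *other_context(void *arg) {
        (void)arg;
        x = 7;               /* a write in one execution context... */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, other_context, NULL);
        pthread_join(t, NULL);    /* wait for the other context to finish */
        printf("x = %d\n", x);    /* ...is visible in this one: prints 7 */
        return 0;
    }

Note that the pthread_join is doing real work here: without it, the value printed would depend on which context ran first. We'll see this issue again below.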
The concept of multiple address spaces is relatively new in mainstream computing. If you go back to the old days of MS-DOS, there was only one address space: the physical address space. We had things like TSRs, the 640 KB limit, etc. There was no virtualization of memory. For programs to run at the same time, they had to co-exist in the one physical address space.
If you don't have multiple address spaces, you don't have processes and threads. At best, you have threads, sharing the one address space you have.
Historically, threads were abstracted differently than they are now, with three operations: FORK, JOIN, and QUIT. (These are capitalized here to differentiate them from the newer terms.)
Why FORK? Think of a fork in the road. You're going along, then things split. A FORK is supposed to represent that split.
The main thing to note about FORKing is that you're creating two execution contexts that may be sharing memory. Execution may start at the same place, but may diverge. How do you stop creating more and more of these, and bring them back under control or stop them? That's the JOIN operation. The program tracks how many threads are running; if you JOIN and you're not the last one running, you just go away; otherwise you synchronize back into the main thread.
What's QUIT? QUIT stops the whole program -- all execution. It cuts off all threads, even if the caller is one of the branches and not the main thread.
This was one of the earliest ways to abstract multiple execution contexts.
What if, when you did the fork, you made a copy of the entire process? There are now two separate instances of the program with the same state. The difference here is that if you quit one, the other stays around -- but the difference is more profound: they're not sharing the same address space (nor execution context). This is the Unix model of processes.
In the Unix process model the system starts with only one process: init. It starts running, then it creates a copy of itself with fork, then another, etc.
The lecture diagram showed a process setting x to 5 and then forking, with one branch later setting x to 7. What is the value of x on the bottom-left-most branch? x is 5 in the Unix process model. However, if these were threads, x could be 7 or 5, depending on how fast the threads are running: it might be 5 if the thread asking for the value of x runs before the thread setting x to 7. This is known as a race condition, because we don't know which thread will run or finish first.
In Unix, they decided to make things simple and have separate processes. These processes can't change the state of their parents or children. To share a value, you have to set it before forking (or use other means of communication).
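A minimal sketch of this in C (assuming a POSIX system), with the x values from the diagram:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int x = 5;                 /* set before forking: both copies see 5 */
        pid_t pid = fork();
        if (pid == 0) {            /* child */
            x = 7;                 /* changes only the child's copy */
            printf("child:  x = %d\n", x);
            return 0;
        }
        wait(NULL);                /* parent: let the child finish first */
        printf("parent: x = %d\n", x);   /* still 5: separate address spaces */
        return 0;
    }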
There's a small glitch with what we've said so far about Unix processes: that they have exactly the same state when you fork. If this were true, they'd always do the same thing. How do they know that they're different?
It turns out that Unix fork is very simple, yet it handles this. The idiom you'll usually see is:
pid = fork();
fork takes no arguments.
When you fork, the return value of fork is the pid (process ID) of the new process, or 0 if you're the child.
The tree of processes effectively becomes a family tree. (However, with some bizarre genealogy that we'll see later)
What you usually do is check the value of pid: if it's 0, do one thing; otherwise do something else. If pid is nonzero, it is the pid of the child process we just created by forking, and you usually use it to keep track of your child. The classic use of fork is to create disposable children that do a specific task for a short while, then go away.
The nice thing about this model is that it keeps things separate. You don't need to worry about what the child is doing. If you want to communicate, you have to explicitly set up to do this. There are some standard ways of doing that communication. We'll look at these later too.
So now we know how to make new processes. How do we make them do something different? In principle we don't need anything else: we could open a file, read new code, then jump to the new code. However, we have exec(). Exec replaces the running program with the specified program, but preserves the pid.
In Unix, to start a new program you usually fork() and then exec() the desired program in the child. If you don't fork() first, exec() will wipe out the original program, replacing it with the one you called exec on.
Exec causes the kernel to throw away the old address space and set up a new one containing the new binary. The pid stays the same, though.
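A sketch of the fork-then-exec idiom (running /bin/ls purely as an example):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {
            /* child: throw away this address space and load /bin/ls;
               the pid stays the same */
            execl("/bin/ls", "ls", "-l", (char *)NULL);
            perror("execl");       /* only reached if exec failed */
            exit(1);
        }
        waitpid(pid, NULL, 0);     /* parent: pid is the child's process ID */
        return 0;
    }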
The Windows equivalent is CreateProcess().
CreateProcess() takes lots of arguments describing how to create the new process (what to load, permissions, etc.); fork takes none. With fork(), you set things up yourself, and most of the settings carry over to the new program (including open files). Note how different these two are.
In Unix, you have the building blocks to do things, and you have to put them together yourself. In Windows, you have the single API call to do them all at once. Neither is strictly right or wrong.
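For contrast, a rough sketch of the Windows side (using the ANSI Win32 calls; notepad.exe is just an example program):

    #include <windows.h>

    int main(void) {
        STARTUPINFOA si;
        PROCESS_INFORMATION pi;
        ZeroMemory(&si, sizeof si);
        si.cb = sizeof si;
        ZeroMemory(&pi, sizeof pi);

        /* one call specifies everything up front: program, command line,
           security attributes, handle inheritance, flags, environment,
           working directory, window settings... */
        if (CreateProcessA(NULL, "notepad.exe", NULL, NULL, FALSE,
                           0, NULL, NULL, &si, &pi)) {
            WaitForSingleObject(pi.hProcess, INFINITE);
            CloseHandle(pi.hProcess);
            CloseHandle(pi.hThread);
        }
        return 0;
    }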
On older systems, when a big process was forked, everything was copied. On newer systems, fork doesn't necessarily copy everything: with virtual memory you can share much of the memory between the two processes.
In older APIs there was vfork(): suspend the parent, fork, exec, then let the parent and child both run again. The idea was to avoid the copying when the first thing you were going to do was exec.
The basic idea that makes this efficient is that the descriptions of the virtual memory address spaces don't have to be mutually exclusive. You could have 10 programs sharing portions of their address spaces -- the read-only portions like the program code, but not the read-write portions.
What if you didn't want to exec after forking? A classic example is a daemon listening on the network for incoming connections. When a request comes in, the main program could deal with it, but then it would also have to keep checking for more requests at the same time. Instead, the typical Unix idiom is to fork off a child to process the connection, and then go back and wait for more.
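A sketch of that idiom (the port number and the handle_connection work are made up for illustration):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* hypothetical per-connection work */
    static void handle_connection(int fd) {
        const char msg[] = "hello\n";
        write(fd, msg, sizeof msg - 1);
    }

    int main(void) {
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(12345);        /* arbitrary example port */
        bind(listen_fd, (struct sockaddr *)&addr, sizeof addr);
        listen(listen_fd, 16);
        signal(SIGCHLD, SIG_IGN);            /* don't accumulate zombie children */

        for (;;) {
            int conn_fd = accept(listen_fd, NULL, NULL);
            if (conn_fd < 0)
                continue;
            if (fork() == 0) {               /* child: handle one connection */
                close(listen_fd);
                handle_connection(conn_fd);
                close(conn_fd);
                _exit(0);
            }
            close(conn_fd);                  /* parent: go back and wait for more */
        }
    }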
You can have shared memory, but the default model for processes is that nothing is shared, while the threading model is that everything is shared. For threads you have to implement protections; for processes, you have to opt in to sharing.
Processes win out on reliability: there are fewer chances for errors, because you control exactly what state is shared.
Another thing we'll talk about later regarding threads versus processes is how they play out on multiple cores. This depends on the implementation, and is sometimes a little tricky.
Chapter two is about the model presented to the programmer: an API for your processes and threads to talk to the world.
This course is fundamentally about how these things are implemented. It's useful to know about these tricks so that you know how the computer is used. It turns out the same tricks are useful in lots of other circumstances, such as concurrency, which most applications have to deal with. You'll learn this here because the OS people did it first.
Graphics
This part of the lecture should help you with the lab. It's about graphics.
We've talked about some standard abstractions so far: files, processes, threads.
However, the thing you really interact with is the keyboard, mouse, and display. In the standard Unix model, these are not a part of the operating system. They're implemented in an application.
The Unix philosophy is that if you don't have to put something in the kernel, don't put it there; and if you do, make it interchangeable.
The standard way to do graphics in Unix is X-Windows, or X for short. Before X there was the W system. There was a Y system at one point, as well as Sun's NeWS.
There was also a system called Display PostScript. PostScript is a fully fledged programming language, originally used for printers. It was developed for laser printers by a little company called Adobe. When laser printers came out, they had really high resolutions, and it was hard to get the data necessary to print a page to the printer fast enough... so PostScript programs were sent to the printer instead. In the early days of the Macintosh, the processor in the printer was more powerful than the processor in the computer. PostScript is a funny little language: it's a postfix language. Instead of saying things like "4 + 5" you say "4 5 +" -- you push the operands onto the stack, then run an operator on them. The same goes for function calls.
In the 80s, there were many competing technologies for doing graphics in the Unix world. X won. But Display PostScript also kind of won, because Macs use Display PDF in a system called Quartz, which was created as a successor to Display PostScript. Because PostScript was linear, it was hard to parallelize; PDF is easier to parallelize.
NeXT was the first to use Display PostScript... NeXT was founded by Steve Jobs. OS X is Unix with Display PDF, and you can run X-Windows on top of that.
X-Windows lets you open windows on remote computers. The way you create a window on your local computer is the same way you open a window on a remote computer thousands of miles away. X is based on something called the X Window Protocol. It just happens to work locally as well (with some optimizations, like shared memory), but the messages were designed to work well over Ethernet.
This was created by folks who wanted to talk to hundreds of computers, such as the supercomputer in another room, but see the windows on their own computer.
Consider what you have to do to see a remote window in Windows: you fire up the Remote Desktop Client, and you get the whole desktop remotely. If you want to see 10 computers, you end up with 10 windows containing 10 desktops and 10 Start buttons. This difference is a result of X-Windows being designed for networks and Windows being designed for one computer.
The terminology for X-Windows is a bit backwards from what we're used to: the server is what we mostly think of as a client. The server is what controls access to the display: it runs where your display is, and controls your display, mouse, and keyboard. To display a window, remotely or locally, you run a program known in X-Windows as a client, which connects over the network to display a window on your X server.
A funny thing about X is that it took the abstraction to an extreme. The people who created X-Windows didn't know anything about usability or graphics or art; the original X-Windows tools were created by regular programmers. Technically, underneath, it's very nice. But they knew what they didn't know, so they made it possible for users to decide what things should look like themselves: you can swap out a few programs and everything keeps working.
This means that when you move your mouse to a window -- what happens? Does it take focus or not? One policy is known as click-to-focus. In older X setups, you could just point your mouse at a window and focus would follow it. This is potentially very efficient, but also very confusing if you're not used to it... Or how do you handle key sequences, or minimizing? Who decides how all of this is done? They had the idea of something called a Window Manager. This goes back to X servers providing the technical minimum, so that you're not limited to one behaviour. The Window Manager is just another X client, with some special privileges, so it can run anywhere -- it could run thousands of miles away.
This is why on Linux there's GNOME, KDE, etc. There's Motif, GTK, Qt, IceWM, AfterStep, Blackbox, Sawfish, fvwm, twm, etc. Other graphical toolkits are abstracted away too. These choices are all available because the X-Windows people left things very open by not making the choice for us. This does make things a little confusing at times, though, because each application could have different assumptions.