Operating Systems 2014F Lecture 2
Audio from the lecture given on September 10, 2014 is now available.
{{{ machine state, program counter, process states, paging / swapping, process, running program, virtualizing, time/space sharing, mechanisms, policy }}}
Chapter 4 in the book:
processes - key abstraction in a modern operating system
Sitting all day kills you - it seriously reduces your life expectancy. Working out doesn't necessarily make up for sitting all day, but walking around for 5 minutes every hour helps. Anyone use typing break programs? An occupational hazard of the career path you have chosen is that you sit in front of a computer typing. Anil started typing Dvorak early in order to avoid repetitive strain injuries.
xwrits saves your wrists on a *nix machine: it shows hand gestures to tell you it's time to get up and take a break. You can get it to insult you with hand gestures, and the gesture it uses depends on culture.
When typing it is important to take breaks.
You need to distinguish between programs and processes. "Program" is imprecise in the context of operating systems.
Is your web browser a program? A web browser is a lot of little programs that together make up one big program. It is not a precise thing. What is precise is an executable. An executable is a file on disk that can be exec'd. (Disks are no longer disks; they are all kinds of things such as solid state, etc.) This is the Unix version of the statement. There is a system call, execve, that takes a file as one of its parameters; that file is then loaded into a process, obliterating whatever else was in the process.
Code can take many forms in a computer system; this is just one form of data.
For example, say you have a text file that contains a JavaScript or Perl program. That is a program, and it is also a text document, but the operating system kernel does not really recognize it as an executable. You cannot hand it to the execve system call to run as is. The system has to run it indirectly: it has to find another executable (an interpreter) to run that code. So you have executables and you have processes.
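A minimal sketch of running code indirectly, in Python. The file name hello.py and the helper run_script are made up for illustration, and a Unix-like system is assumed. The text file is created without execute permission, so asking the kernel to exec it directly is expected to fail, while handing it to the interpreter executable works.

```python
import os
import subprocess
import sys
import tempfile


def run_script(source):
    """Return (direct_exec_worked, output_when_run_through_the_interpreter)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hello.py")  # hypothetical file name
        with open(path, "w") as f:
            f.write(source)

        # Try to exec the text file itself. It was created without execute
        # permission, so the kernel is expected to refuse.
        try:
            subprocess.run([path])
            direct_exec_worked = True
        except OSError:
            direct_exec_worked = False

        # Run it indirectly: exec the interpreter, which reads the file as data.
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, check=True
        )
        return direct_exec_worked, result.stdout


if __name__ == "__main__":
    print(run_script("print('hello from a script')\n"))
```

Here sys.executable plays the role of the "other executable" that actually gets exec'd.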
A process is an executable that has been executed: loaded into memory and started running. You should think of a process as an abstraction of a computer that can only run one program at a time. (On older personal computers, early 1960s or something, there is no abstraction of a process and no notion of running more than one program at a time. Logically speaking, when you wanted to run a program, all of memory would be loaded with that program; when you wanted to quit the program, you cut the power, i.e. turned the computer off.) They run one program at a time: you load it off the disk, and it has complete control of the machine. A process is the abstraction you get when you say: we don't want every program to have complete control of the computer, because I do not want to have to reboot the computer to switch programs. I want to run different programs concurrently, for multiple reasons; for example, to chain multiple programs together to produce a result (a Unix pipeline). The process gives each running program (each executable) its own virtual computer to run on.
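The pipeline idea can be sketched in Python: two processes run concurrently, with the output of the first connected to the input of the second. Both stages here are small made-up Python one-liners standing in for real Unix tools, and run_pipeline is a name chosen for this example.

```python
import subprocess
import sys


def run_pipeline():
    """Run 'producer | sorter' as two concurrent processes and return the result."""
    producer = subprocess.Popen(
        [sys.executable, "-c", "print('b'); print('a'); print('c')"],
        stdout=subprocess.PIPE,
    )
    sorter = subprocess.Popen(
        [sys.executable, "-c",
         "import sys; sys.stdout.writelines(sorted(sys.stdin))"],
        stdin=producer.stdout,  # the pipe connects the two processes
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the pipe so only the two children hold it.
    producer.stdout.close()
    output, _ = sorter.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline(), end="")
```

Each stage gets its own process, so neither needs to know that the other exists.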
Virtualizing / virtualization (the term is rather overloaded): what am I talking about when I say virtual? Something that isn't real. It's not a real thing. When people talk about virtual reality, they are talking about something that can be experienced. In a computer science context, when we say virtual we are really talking about an abstraction: what we actually have, the real thing, is not good enough; it doesn't have the qualities you want, so you transform it into something more useful (in some way). When we talk about a virtual machine, we are talking about a machine (computer) that does not exist, in the sense that it is not embodied in actual hardware.
(From the theoretical side of computer science:) All programming languages or programming systems are, to a first approximation, equivalent: a system that is Turing complete can run anything. Turning one Turing complete system into another Turing complete system is the process of virtualization. The one you've often heard of is the language-based virtual machine, for example the Java virtual machine. Really, you could talk about this any time you run a higher-level language (Perl, JavaScript, Python, etc.). That code does not run directly on the processor. It runs inside another program that has some kind of virtual machine. Strictly speaking, a lot of languages can be interpreted, which means that you have a program that goes through line by line and figures out what that line is supposed to do and what the next instruction is. The point is that almost no modern language operates that way. What they all go through is some sort of translation phase that converts the source to some binary code (byte code), and then the runtime runs the byte code. That runtime is what's called a virtual machine. But virtual machines are everywhere when we are talking about trying to run programs. Operating systems can be thought of as implementing a virtual machine, and the virtual machine they implement is the process. There is a key difference between the virtual machine that makes processes and the VM that is typically in these language-based virtual machines, though the difference is getting smaller. Any idea what this difference is?
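The translation phase can be seen in Python itself, as a small sketch: the built-in compile turns source text into a code object holding byte code, and the runtime then executes that byte code. The helper name translate_and_run is made up for this example, and the exact instructions listed vary between Python versions.

```python
import dis


def translate_and_run(source):
    """Compile source to byte code, run it, and return (code object, namespace)."""
    code = compile(source, "<example>", "exec")  # translation phase
    namespace = {}
    exec(code, namespace)  # the runtime (virtual machine) runs the byte code
    return code, namespace


if __name__ == "__main__":
    code, namespace = translate_and_run("x = 1 + 2\n")
    print(type(code.co_code))  # the raw byte code is a bytes object
    print(namespace["x"])
    dis.dis(code)  # human-readable listing; differs between versions
```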
Java virtual machine - executes byte codes. The hardware can't interpret byte code.
What is the nature of the binary format that is being run in an operating system process? What format is that code? Machine code: the code that is understood by the processor. Machine code here, byte code there; what's the difference? The hardware can't interpret byte code; the processor needs another program to translate that language. Why can't the processor understand Java byte code? It could: there are chips that run Java byte code natively. What's worse, the machine code that your processor "understands"? It actually doesn't. Take modern instruction sets such as x86 or x86-64 (the most common for a PC), ARM machine language, that sort of thing. This language is too annoying to use internally inside the microprocessor: it's not efficient, and it was not designed to run very fast. So the processor actually has a front end that takes that code and translates it to another byte code. There have been processor startups where, instead of having this done directly on the chip, they put something like a Java virtual machine on the processor. Why am I saying this? The virtual and the real in computer science are often hard to tell apart. What is virtual to one group could actually be real to another group. When you are coding in Java or C, that language is real to you. That is the abstraction you are working in, but there are actually other levels below you. That generally is not the real level. When you are dealing with millions of transistors, there is a lot of abstraction. The process is the virtual machine that you run executables in: you take a file on disk and load it into a process. There is a little problem with this concept: is there a one-to-one mapping between programs on disk and programs in memory? Not at all! Most programs on disk are not running at any given time. A given program on disk can be running in many different processes; you can have multiple instances of the same program running at the same time.
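A small sketch of one executable running in several processes at once, in Python. The helper name spawn_two_instances is made up; the same interpreter executable is started twice, and each instance reports its own process ID.

```python
import subprocess
import sys


def spawn_two_instances():
    """Start the same executable twice and return the two process IDs."""
    cmd = [sys.executable, "-c", "import os; print(os.getpid())"]
    # Start both before waiting on either, so they exist at the same time.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        for _ in range(2)
    ]
    return [int(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    print(spawn_two_instances())
```

One file on disk, two processes, two distinct process IDs.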
Logically in an API you have to distinguish between the
A thread is not a process.
Process = thread(s) + address space
Threads mean having more than one CPU running around inside the address space. What happens if two of them run the same code at once? One can change the loop index of a loop from outside of the loop. The right way to think of this is: don't do that when you put more than one CPU inside the address space. For a long time operating systems supported only one CPU inside the address space: they supported lots of processes, but only one CPU in each. That's kind of limiting. But why do you want your running programs to be sharing memory? The main reason is to communicate. Shared memory has one advantage: it can be very fast. But how do you make sure you don't overwrite each other's messages?
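One common way to keep threads from overwriting each other's updates is a lock around the shared data. The sketch below uses Python's threading module; the function name count_with_lock and the counts are made up for illustration.

```python
import threading


def count_with_lock(n_threads, n_increments):
    """Have several threads increment one shared counter under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            # Only one thread at a time may read-modify-write the counter.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(count_with_lock(4, 10000))
```

The lock is a tiny API on top of the shared memory: it costs some overhead, but every increment is accounted for.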
Modern computation - big systems - you do almost everything you can - when you share memory you put some sort of API on top of it to control access. The only problem is - other than having potential overhead.
An operating system is a set of mechanisms and policies that allow for time and space sharing of the computer. The processor is a limited resource: with time sharing, one program gets it for a while, then another gets it for a while. Space sharing applies to RAM (memory).
Virtual memory and physical memory
A program running with full privileges can still have a segmentation fault. The kernel can also have a segmentation fault; when this happens the machine will crash hard.
Environment variables for X:
new view - direct graphical output from another computer on the network - bad because of latency - new
How to get around lag - run more of the code on the client instead of the server. Have the X clients transfer some code to the X server, to run on the server. It is like a website: the browser downloads the page and the code runs in your browser. Same thing, different technology stack.
Mechanisms vs. Policy -
mechanisms - things to do things - the knobs that let us manipulate program state - should be maximally flexible so that they can implement whatever policies you want.
policies - what you should do
X Server <= mechanism
window manager, toolkit <= policy
Windows - one call - CreateProcess() <--- many different parameters
Unix - two calls - fork() and execve(file, cmdline, env)
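The two Unix calls can be sketched from Python, whose os module wraps them directly (Unix-like systems only). The helper name fork_and_exec is made up for this example: fork() copies the current process, and execve() then replaces the child's contents with a new executable.

```python
import os
import sys


def fork_and_exec(argv, env):
    """Run argv[0] in a child process and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the new executable.
        try:
            os.execve(argv[0], argv, env)
        finally:
            # Only reached if execve failed; never fall back into parent code.
            os._exit(127)
    # Parent: wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # child was killed by a signal


if __name__ == "__main__":
    code = fork_and_exec(
        [sys.executable, "-c", "import sys; sys.exit(7)"], dict(os.environ)
    )
    print(code)
```

Splitting creation into two calls means the child can adjust its own state (files, environment) between fork() and execve(), which is why execve() itself needs so few parameters.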