Soma-notes - User contributions [en]

Distributed OS Overview

2008-01-15T01:49:15Z

Bishopdesmond33: minor typos

== Distributed Operating Systems ==

[[Image:OS4000_Distributed.png|A distributed operating system.]]

At what level do you want to start the distribution?
'''Hardware Layer''' 
If memory is shared, communication is trivial, e.g. on parallel computers (multi-core).

'''Kernel Layer''' 
Distributing at the kernel layer avoids API and user-space changes. So why don't we share at the lower kernel layer? The main reasons are security and performance. Memory is assumed to be fast (low latency), so distributing at this layer means sharing memory across every computer in the distribution, which is a challenge. So why isn't virtual memory enough? Typically because of contention over memory. Virtual memory shared over a network is slow; duplicating memory pages and synchronizing them across the network takes too much time.

'''Process Layer''' 
At the process layer, we can perform the distribution over the network using TCP/IP. Unfortunately, TCP/IP has high latency. We can use Ethernet directly to reduce the latency, but that is only a viable solution for LAN-based systems. Therefore, latency will always be present.
We can deal with latency by caching a local copy of data, which reduces the amount of communication required. The downside is that caching introduces a need for synchronization.
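
The caching trade-off described above can be sketched in a few lines. This is only an illustration: <code>RemoteStore</code>, <code>CachingClient</code> and the <code>invalidate</code> method are hypothetical names, and the "network" is just a counter of round trips.

```python
class RemoteStore:
    """Stands in for data held on another machine; counts round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def read(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def write(self, key, value):
        self.round_trips += 1
        self.data[key] = value


class CachingClient:
    """Keeps a local copy of every value it has read or written."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def read(self, key):
        # Only go to the remote store when there is no local copy.
        if key not in self.cache:
            self.cache[key] = self.store.read(key)
        return self.cache[key]

    def write(self, key, value):
        self.store.write(key, value)
        self.cache[key] = value

    def invalidate(self, key):
        # Synchronization step: drop the local copy so the next read is fresh.
        self.cache.pop(key, None)


store = RemoteStore()
store.data["x"] = 1
a = CachingClient(store)
b = CachingClient(store)

first = a.read("x")    # costs one round trip
second = a.read("x")   # served from the local copy
trips_after_reads = store.round_trips

b.write("x", 2)        # another client changes the value
stale = a.read("x")    # a still sees its old local copy
a.invalidate("x")
fresh = a.read("x")    # a fetches the new value
```

The second read costs no communication, but client <code>a</code> keeps seeing the old value until something tells it to invalidate its copy. Deciding who sends that notification, and when, is the synchronization problem.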

We can also deal with latency by compressing the data and splitting up the computation "wisely". Computers are not wise enough to do this effectively, so it is left to the programmers. Programmers split up such computations by introducing client/server architectures, by using web applications and web services, and by using distributed file systems. For example, spam uses Internet resources to communicate by email with large numbers of users. In that sense spam is a feature of the Internet: everyone should be able to send an email to everyone, and spam uses that resource. Distributed operating systems on the scale of the Internet that are capable of wise resource management do not yet exist.
 
== What Makes a Good Distributed Operating System? ==

A good distributed OS should:
* Be reliable and support dynamically scalable storage.
* Provide more processing power (CPU power should scale linearly).
* Be manageable (as easy to manage as a single computer).
* Make it easy to write programs.
* Support single sign-on.
* Present a single system image.
* Be fault tolerant (to both software and hardware errors).
* Support dynamic reconfiguration.
* Exploit local resources.

Operating System Organization

2007-10-01T02:13:42Z

Bishopdesmond33: minor typo & grammar fixes

==Operating System Organization==

===Device Management===

How do kernels communicate with devices such as a network card? How do drivers for such devices fit into the kernel? We need a mechanism to allow applications to communicate with the devices. Most kernels use a form of message passing, often using a registration system. For example, a network card device driver would register itself with the kernel and identify itself as a network card (as opposed to, say, a mouse). MS-DOS used interrupt handlers instead.

Glenn's talk focused mostly on how the Linux kernel works.

====File Abstraction in Device Management====
In Linux there exists a "/proc" directory which contains special information. This directory does not actually exist on disk. When the kernel receives a request to read a file in this directory, it retrieves system information and serves it up as a file. For example, executing more /proc/cpuinfo gives:

<pre>
$ more /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 1.73GHz
stepping : 8
cpu MHz : 798.000
cache size : 2048 KB
...
</pre>
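
Because the kernel serves this information as plain text, ordinary file-handling code can consume it. The following is a minimal sketch; <code>parse_cpuinfo</code> and <code>SAMPLE</code> are hypothetical names, and the sample text is copied from the output above so the example does not depend on running under Linux.

```python
def parse_cpuinfo(text):
    """Parse 'key : value' lines into a list of dicts, one per processor."""
    processors = []
    current = {}
    for line in text.splitlines():
        if ":" not in line:
            # A line without a colon (such as a blank line) ends one processor's block.
            if current:
                processors.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


SAMPLE = """processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : Intel(R) Pentium(R) M processor 1.73GHz
"""

cpus = parse_cpuinfo(SAMPLE)
```

On a Linux machine the same function could be fed the real file with <code>parse_cpuinfo(open("/proc/cpuinfo").read())</code>.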

Each process that is currently running on the system gets its own directory in /proc, with the process ID (pid) as the directory name. For example, for the process with pid 2 there exists "/proc/2/", which contains more information about that process.

=====/dev=====

The "/dev" directory actually exists on the file system and contains entries for devices (called nodes). For example, the first hard drive on the system might reside at "/dev/hda". Each device entry has a major node number and a minor node number. For example, the hard drive specified by "/dev/hda" might have major node number "3" and minor node number "0". At first the node numbers were pre-defined and there could be no more than 255 of them. These major/minor node numbers are used to link the specific device types into the kernel. These nodes existed in "/dev" even if the devices were not connected to the system.
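
The major and minor numbers of a node can be inspected from user space. The sketch below uses Python's standard <code>os</code> and <code>stat</code> modules on a Unix-like system; <code>describe_node</code> is a hypothetical helper name, and the numbers 3 and 0 are simply the example values from the paragraph above.

```python
import os
import stat


def describe_node(path):
    """Return (kind, major, minor) for a device node, or None for anything else."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "char"
    else:
        return None
    # st_rdev packs the major and minor numbers into a single device id.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


# Pack and unpack the example numbers from the text (major 3, minor 0).
dev = os.makedev(3, 0)
pair = (os.major(dev), os.minor(dev))

# A regular directory is not a device node.
not_a_device = describe_node(".")
```

Calling <code>describe_node</code> on an entry under /dev reports whether it is a block or character device along with its number pair.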

=====devfs=====

This was eventually replaced with a new system called "devfs" (device file system), which was a pseudo-file system similar to /proc. Devfs is implemented in the kernel and knows about the currently available hardware. Some problems still existed with this system: it was implemented in the kernel, and thus a change to hardware required an update to the kernel; and the major and minor node numbers were still fixed in the kernel. It would be nice to dynamically reassign major nodes to unknown devices that are actually present on the system. Devfs also prevented the renaming of nodes in /dev. For example, previously one could rename /dev/hda to /dev/cdrom, but it would still actually point at hard-drive a. This behaviour was prevented in devfs.

=====udev=====

Another problem existed with hot-pluggable devices (such as USB devices). Minor node numbers were assigned by the kernel in the order in which the devices were discovered, so devices might have different node numbers after a reboot. Also, no notifications occurred when a device was connected to or disconnected from the system.

Devfs has since been replaced by a new system named "udev", which is implemented in user space, not kernel space. The ability to rename nodes in /dev was enabled again by udev. The issue regarding hot-pluggable devices was addressed by permitting minor node numbers to be dynamically assigned. udev also notifies applications when hardware is connected or disconnected. Network cards are a special case - the kernel knows about network protocols, so network cards must be accessed using a different interface.

=====Other files, pipes and sockets=====

An example call for opening the CD-ROM may look something like:
<pre>
handle = open("/dev/cdrom", ...)
</pre>
This is a call to the kernel, which will use the CD-ROM device driver to read from the disc.

Pipes and sockets both operate as files and support the basic operations:
* open
* read
* write
* close

Pipes are used for inter-process communication. Each process can open one end of the pipe, and then they can read or write to it.
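
A minimal sketch of the file-like behaviour of a pipe, using Python's standard <code>os</code> module. For simplicity both ends are used inside a single process here; in real inter-process communication each end would be held by a different process. Note that an unnamed pipe is created with <code>os.pipe()</code> rather than opened by path.

```python
import os

# Create the pipe: the kernel hands back one descriptor per end.
read_end, write_end = os.pipe()

# Write to one end and close it, exactly as with a file.
os.write(write_end, b"hello through the pipe")
os.close(write_end)

# Read from the other end, then close it too.
message = os.read(read_end, 1024)
os.close(read_end)
```

The calls are the same read, write and close operations listed above, applied to descriptors that happen to refer to a pipe instead of a file on disk.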

Sockets are used in a similar manner to communicate over a network.

===Kernel Development===

Standard development tools aren't always helpful when debugging during kernel development. How can you debug a kernel that crashes before the display drivers work? One approach used with the Linux kernel was to output Morse code on the keyboard LEDs.

Often developers must work around bugs in hardware, as it is usually much cheaper to fix it in software than to change the hardware design.

===Process and Thread Management===

Context switching between different processes is very expensive in terms of execution time. Different situations call for different strategies for managing context switches. For example, consider terminals and servers. Server systems can get away with fewer context switches, as they can completely serve a request for a web page before moving on to the next request. On a terminal, the user is present and expects instant feedback. If the mouse is moved and the process that moves the mouse pointer does not respond quickly, the user will perceive the machine as slow or unresponsive.

===Memory Management===

Chapter 3 gives just a brief introduction to memory management.

Each process has its own virtual memory map. Pentium III and Pentium 4 processors used 32-bit addressing, which gives a maximum address space of 4 GB. Note that there may not even be 4 GB of physical RAM available to be used. When a process accesses memory that is not currently in RAM, the operating system must retrieve it from disk (paging).
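
The 4 GB figure follows directly from the address width, as the short calculation below shows. The 4 KiB page size and the helper name <code>split_address</code> are assumptions made for illustration; the notes do not state a page size.

```python
ADDRESS_BITS = 32
PAGE_SIZE = 4096  # assumed 4 KiB pages, not stated in the notes

# Every distinct 32-bit address names one byte.
address_space_bytes = 2 ** ADDRESS_BITS
address_space_gib = address_space_bytes // 2 ** 30


def split_address(addr, page_size=PAGE_SIZE):
    """Split a virtual address into (page number, offset within the page)."""
    return divmod(addr, page_size)


page, offset = split_address(0x12345678)
```

The page number is what the operating system looks up to decide whether the page is in RAM or has to be brought in from disk; the offset is unchanged by that lookup.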

The operating system also needs to protect the kernel's memory space from other applications. Whether the CPU is running in supervisor (kernel) mode or user mode determines the level of access to memory.

===Kernel Design===

Monolithic versus microkernel: how much should be implemented in the kernel? A microkernel design means that processes and applications do more of the work, which requires more context switching, but it permits the kernel to be more reliable.

An example monolithic kernel might include things such as:
* network driver
* display
* clock

A micro kernel might only include the minimal items:
* memory allocation
* process switching

===File Systems===

File systems consist of many kinds of items:
* directories
* files
* device nodes
* links (in Windows these are called shortcuts)
* pipes

In DOS, only directories, files and device nodes were used. In Windows, they are well hidden under the path "\\" or they are given special names, such as "AUX", "COM0" or "LPT0".

Computer Organization

2007-10-01T02:11:47Z

Bishopdesmond33: /* Bootstrapping */

==Computer Organization==

===Introduction===

* Information has been posted on the [http://homeostasis.scs.carleton.ca/os/index.php/Main_Page class website] regarding running a Linux distribution on your own home machine (or a virtual desktop environment). Students are encouraged to familiarize themselves with the Linux/Unix environment as it is far less restricted and more robust than Windows/OSX.
* Our research paper (i.e. Operating Systems) should be well on its way. Students should have key words and topics narrowed down.

===Stored Program Computer===

The idea of a programmable computing machine goes back to the early 1800s with Charles Babbage, whose "''Difference Engine''" was followed by his design for the programmable "''Analytical Engine''". Since then the idea has evolved into what we now consider '''Von Neumann Architecture'''. Although Von Neumann contributed only one piece of the puzzle of modern day computer architecture, the architecture is still named after him. In more recent times the overall Von Neumann model has changed, but conceptually it is still the same. Several different components talk to each other (via buses) all at the same time, which can become very complicated.

[[Image:Neumann.jpg]]

===Central Processing Unit===

The '''central processing unit''' (CPU) is made up of an Arithmetic Logic Unit (ALU) and Control Unit (CU).

====Arithmetic Logic Unit====

The '''ALU''' can perform all arithmetic (addition, subtraction, multiplication, division) and logical (AND, OR, NOT ... ) calculations at a very rapid rate.

====Control Unit====

The '''CU''' is made up of several components and duties :

* ''Program counter'' or PC is the address of the currently executing instruction.
* ''Status word'' is used to store information regarding the current state of the control unit. Information such as overflows and execution status can be stored. A word is typically what the processor natively deals with in terms of data size. Most modern processors are 32- or 64-bit. That is, their registers store data that is 32 or 64 bits wide, which we call a word. The size of the word varies from CPU to CPU.
* Responsible for the ''fetch-decode cycle'', which is an algorithm that grabs an instruction from main memory, decodes it and then hands it off to the specific part of the processor that handles the instruction (often involving the ALU).
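
The three items above can be tied together in a toy simulation. This is a deliberately simplified sketch, not a model of any real CPU: the instruction names (<code>LOAD</code>, <code>ADD</code>, <code>HALT</code>), the single accumulator and the 8-bit word size are all assumptions chosen to keep the example short.

```python
WORD_MASK = 0xFF  # assumed 8-bit word, so values above 255 overflow


def run(program):
    """Run a list of (opcode, operand) pairs and return (accumulator, status word)."""
    pc = 0    # program counter: address of the next instruction to fetch
    acc = 0   # a single register that the ALU operates on
    status = {"halted": False, "overflow": False}  # toy status word

    while not status["halted"]:
        instruction = program[pc]       # fetch from "main memory"
        opcode, operand = instruction   # decode
        pc += 1                         # advance to the following instruction

        # Execute: hand the work to the right part of the processor.
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            total = acc + operand               # the ALU does the arithmetic
            status["overflow"] = total > WORD_MASK
            acc = total & WORD_MASK             # keep only one word
        elif opcode == "HALT":
            status["halted"] = True
        else:
            raise ValueError(f"unknown opcode: {opcode}")

    return acc, status


small, small_flags = run([("LOAD", 1), ("ADD", 2), ("HALT", 0)])
wrapped, wrapped_flags = run([("LOAD", 200), ("ADD", 100), ("HALT", 0)])
```

The second program adds 200 and 100, which does not fit in an 8-bit word, so the overflow bit in the status word is set and the accumulator wraps around.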

The above description of the control unit is very simplified. It is actually a reflection of a CPU in the early to mid 1980s. '''Moore's Law''', as commonly stated, says that the number of transistors on a CPU chip doubles every 18 months. This "law" was introduced in the 1960s and currently still holds true. '''Why?''' Mainly because the industry targets this figure. Given '''Moore's Law''', you can imagine how much CPU architecture has changed since the 1980s.

====Transistors====

With all of these extra transistors, what can we do with all of them? One (or both) of two things :

=====Integration=====

'''Integration''' refers to more and more components getting packed into the CPU. Devices such as :

* ''Memory Controller'' (circuitry needed to drive the RAM)
* ''Math Co-Processor'' (handles floating point values)
* ''Memory Management Unit'' (MMU - used for virtual memory)
* ''L1 & L2 Cache'' (used for fast memory access when we don't want to go all the way to main memory. L1 cache is small but fast. L2 cache is large but slower).
* ''Systems on a Chip'' : systems that have all transistors incorporated into a single chip. These currently exist, but are fairly slow in regards to overall performance.

=====Duplication=====

'''Why have one ALU when we can have 10?''' Modern processors use this technique quite a bit. Certain components can only be duplicated so much. '''Why?''' Aside from hardware constraints, we also have bottlenecks. For instance, a CPU has only one PC (program counter). What is the point of having several ALUs if the program counter cannot keep up?

The most common forms of '''duplication''' include :

* ''Pipelining'' : In pipelined CPUs a specific functional unit of the CPU is partitioned into K smaller parts, called '''pipeline stages'''. When an operation is executed, it goes through the K stages of execution. The stages can execute independently and in parallel because they are separate units of hardware, so each stage can work on a different operation at the same time and no part of the CPU is left idle. What happens when the CPU does not yet know which instruction comes next, for example after a conditional branch? The CPU can guess which way execution will go and continue on its way. If it later learns that the path it took was incorrect, it can roll back its changes and carry on with the correct path. If the guesses are constantly wrong, the system slows to a crawl. This entire process of 'guessing' is known as '''speculative execution'''.

[[Image:Pipeline.jpg]]

* ''Thread Level Parallelism'' : if we can't get a single instruction stream parallelized, we can just have several separate threads. This means we can share all of the address space information such as cache because the threads will be running in the same process.

* ''Multi core'' : the most current technology in practice. After several years of packing CPUs with more and more features, bottlenecks began to arise. '''Solution?''' Duplicate the CPU! As the industry currently shows, more and more computers are being released with '''dual''' or even '''quad core''' processors. What are some bottlenecks that can arise with '''multi core processors'''? The bus is one obvious answer. If we increase the amount of information the buses can handle, we can accommodate multiple CPUs.

* ''CPU cache'' : Why not get rid of RAM altogether and simply put all of it on the CPU as L1 or L2 cache? Cache memory is known as '''static memory''', whereas RAM is known as '''dynamic memory''' or '''DRAM'''. DRAM can store information using about 1/4 of the number of transistors that static memory needs. This is why there is only around 4MB of CPU cache available in most modern day processors. What RAM lacks in speed it makes up for in size and parallelism.
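
The benefit of pipelining from the list above can be estimated with a little arithmetic. The sketch assumes an idealized pipeline in which every stage takes exactly one cycle and there are no stalls or wrong guesses; the function names and the numbers (100 instructions, 5 stages) are made up for illustration.

```python
def unpipelined_cycles(n_instructions, k_stages):
    """Each instruction passes through all K stages before the next one starts."""
    return n_instructions * k_stages


def pipelined_cycles(n_instructions, k_stages):
    """The first instruction takes K cycles to fill the pipeline;
    after that, one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return k_stages + (n_instructions - 1)


N, K = 100, 5
slow = unpipelined_cycles(N, K)
fast = pipelined_cycles(N, K)
speedup = slow / fast
```

With many instructions the speedup approaches K, which is why a wrong guess that forces the pipeline to be refilled is so costly.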

The book " [http://books.google.ca/books?id=R7Frpn3g9AEC&dq=&pg=PP1&ots=f0kVd8_wKX&sig=66x6OTPMjfzJneY-mOHyQJOC1iM&prev=http://www.google.ca/search%3Fhl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26hs%3DdDa%26sa%3DX%26oi%3Dspell%26resnum%3D0%26ct%3Dresult%26cd%3D1%26q%3DComputer%2BArchitecture%2BHennessy%2BPAtterson%26spell%3D1&sa=X&oi=print&ct=title#PPP1,M1 Computer Architecture : A Quantitative Approach] " by Hennessy and Patterson is an excellent book on CPU Architecture.

What does all of this have to do with Operating Systems? If the OS does not understand resources (underlying hardware), then the system is doomed. We have to design the OS around the architecture (especially important with multi-core systems).

===Latency & Bandwidth===

Imagine the following situation : We have two facilities, A & B. A very large amount of information has to be transferred between the locations. We can do it in one of two ways :

* Load up a semi-truck full of hard drives that are full of the information needed.
* Connect the locations together via fiber optic cable to transfer the information.

'''Q''' : What method will allow the most information to be transported?

'''A''' : Obviously the truck full of hard drives, especially with modern day drives being capable of storing up to a terabyte of information. The amount of information that can be transferred in a given time is known as '''bandwidth'''.

'''Q''' : What is the quickest way to get one single bit of information from location B to location A?

'''A''' : This time, the answer is the fiber optic cable. How long it takes for a piece of information to arrive is known as '''latency'''.
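
The two answers can be checked with rough numbers. Every figure below is an assumption picked for illustration (10,000 one-terabyte drives, a 10 hour drive, a 10 Gbit/s fiber link with a 50 ms delay); none of them come from the lecture.

```python
# Assumed figures, for illustration only.
DRIVES_ON_TRUCK = 10_000
BYTES_PER_DRIVE = 10 ** 12           # one terabyte per drive
TRUCK_TRIP_SECONDS = 10 * 60 * 60    # a 10 hour drive from B to A
FIBER_BITS_PER_SECOND = 10 * 10 ** 9 # 10 Gbit/s link
FIBER_DELAY_SECONDS = 0.05           # time for the first bit to arrive

# Bandwidth: how much data arrives per second, averaged over the transfer.
truck_bits = DRIVES_ON_TRUCK * BYTES_PER_DRIVE * 8
truck_bits_per_second = truck_bits / TRUCK_TRIP_SECONDS

# Latency: how long until a single bit arrives.
truck_latency = TRUCK_TRIP_SECONDS
fiber_latency = FIBER_DELAY_SECONDS + 1 / FIBER_BITS_PER_SECOND

truck_wins_bandwidth = truck_bits_per_second > FIBER_BITS_PER_SECOND
fiber_wins_latency = fiber_latency < truck_latency
```

Under these assumptions the truck delivers on the order of terabits per second, far more than the fiber, yet a single bit takes ten hours to arrive by truck and a fraction of a second by fiber.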

[[Image:Hds.jpg]]

The above situation is also apparent when dealing with RAM. RAM is high-latency and high-bandwidth. Moving large amounts of data to the CPU can sometimes be sped up using the cache. This is a very commonly studied topic in the world of RAM - CPU connections.

Other devices and their latency :

* CPU registers : lowest latency
* L2 & L1 Cache : low latency
* Hard disk : highest latency

====I/O====

There are several different ways '''Input/Output''' devices can communicate with the CPU and RAM :

* ''Polled I/O'' : Think of it as giving a person a task. You can poke the person every 5 minutes to see if they have completed the task. If the person has finished the task, they respond with a 'yes', otherwise 'no'. In this scenario, one party has to poke every 5 minutes and the other party has to respond every time they get poked. Obviously, not the most efficient solution.

* ''Interrupt I/O'' : In this situation, when giving the person the task, we simply ask them to tell us when they have completed it. Much more efficient than polling.
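
The difference between the two approaches can be seen in a small simulation. This is a sketch under simplifying assumptions: <code>Device</code>, <code>run_polled</code> and <code>run_interrupt</code> are hypothetical names, time is modelled as discrete ticks, and the "interrupt" is just a callback.

```python
class Device:
    """A simulated device that finishes its task after a fixed number of ticks."""

    def __init__(self, ticks_needed):
        self.ticks_left = ticks_needed
        self.on_done = None  # optional handler, playing the role of an interrupt

    def is_done(self):
        return self.ticks_left == 0

    def tick(self):
        if self.ticks_left > 0:
            self.ticks_left -= 1
            # The device itself announces completion, once.
            if self.ticks_left == 0 and self.on_done is not None:
                self.on_done()


def run_polled(ticks_needed):
    """Ask the device on every tick whether it is done; return how often we asked."""
    device = Device(ticks_needed)
    checks = 0
    while True:
        checks += 1
        if device.is_done():
            return checks
        device.tick()


def run_interrupt(ticks_needed):
    """Register a handler and never ask; return how often we were notified."""
    device = Device(ticks_needed)
    notifications = []
    device.on_done = lambda: notifications.append("done")
    for _ in range(ticks_needed):
        device.tick()
    return len(notifications)


polled_checks = run_polled(5)
interrupt_notifications = run_interrupt(5)
```

With polling, the number of checks grows with how long the task takes; with the interrupt style there is a single notification no matter how long the device needs.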

If we have a '''small''' amount of data being processed by the I/O device, then our system runs relatively efficiently. If the data is large, the communication between the I/O device, the CPU and the RAM becomes frequent. There are several solutions to this problem:

* ''Memory Mapped I/O'' : devices are associated with logical primary memory addresses rather than having a specialized device address table.

* ''Direct Memory Access (DMA)'' : devices and their controllers are able to read and write primary memory directly, without the CPU getting in the way. The DMA controller performs the same kind of transfers the CPU would, except that conceptually it does not need any registers. This direct access can increase the speed significantly.

* ''Intelligent Devices'' : a final alternative is to have memory and chip sets that act in a way the CPU does on the I/O device itself. This means we do not need to talk to the CPU or RAM at all (it can all be done on the device itself).

===SIMD Processors===

'''Single Instruction Multiple Data (SIMD)''' processors handle one instruction accompanied by an entire array of data. This type of architecture is quite efficient at processing large amounts of data that all need to be operated on using the same instruction. Processing speeds are increased significantly because the processor does not have to decode an instruction for every piece of data (since all pieces of data use one single instruction). Multimedia devices like MP3 players and gaming consoles all take advantage of this technology. These machines are also known as vector machines.

The Sony PS3 has one MIMD (multiple instruction, multiple data) processor and several SIMD processors. Compilers need to be adjusted and game programmers need to adapt their writing skills in favor of this new technology. Someone eventually has to slave over a library of machine language instructions in order to optimize the particular application to SIMD technology (not the most exciting task).

===Boot Process===

How is the OS started? How does the computer go from powering on to running the operating system (Windows, OSX ...)? This process takes a really long time when you consider how fast computers have become. '''Why''' is this? The system must determine all of the hardware present in the machine, along with each device's manufacturer, specifications, requirements etc. It can become a real mess! Before the OS can even begin to start, all of the hardware in the system must be properly configured. This all begins with the BIOS.

====BIOS====

'''BIOS (basic input/output system)''' is the code that runs before any other code runs. It is built into the system. In older times, the BIOS was burned into the machine and never changed. Nowadays, it can be flashed and reprogrammed. This can occur when bugs in the BIOS are so troublesome, that an update is needed (typically when the OS does not work well with the hardware). The BIOS configures all the hardware by running diagnostics etc. After hardware is configured, how is the OS launched? This process is not direct, mainly because the BIOS has limited capabilities because of legacy issues. The BIOS is configured in a way that it believes your computer's hardware is reflective of the year 1982. This means it believes you only have several megabytes of RAM, an 8086 processor, limited disk space and so on. It does however, know how to load information of up to 512 bytes. This is where bootstrapping comes into play.

====Bootstrapping====

After hardware is configured, the BIOS runs a small amount of code (approx 400 bytes, while the remainder of the 512 bytes is used to store partition tables) known as the '''boot block'''. The BIOS loads this code into memory, where it is executed. The '''boot block''' is just enough code to load another program that makes the machine reflect more of what it really is (not an 8086). From here we can initialize RAM, the hard disk and eventually start the OS.
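
The split between code and partition tables can be illustrated by slicing up a 512-byte sector. The exact sizes used here (446 bytes of code, four 16-byte partition entries and a two-byte <code>0x55 0xAA</code> signature) follow the classic PC master boot record layout, which is an assumption on top of the "approx 400 bytes" figure in the notes; <code>split_boot_sector</code> is a hypothetical helper and the sector is synthetic, not read from a disk.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446          # classic MBR layout (assumed, see lead-in)
PARTITION_ENTRY_SIZE = 16
PARTITION_ENTRIES = 4
SIGNATURE = b"\x55\xaa"       # marks the sector as bootable


def split_boot_sector(sector):
    """Split a 512-byte sector into (boot code, partition entries, has_signature)."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a boot sector must be exactly 512 bytes")
    table_end = BOOT_CODE_SIZE + PARTITION_ENTRY_SIZE * PARTITION_ENTRIES
    code = sector[:BOOT_CODE_SIZE]
    table = sector[BOOT_CODE_SIZE:table_end]
    entries = [
        table[i:i + PARTITION_ENTRY_SIZE]
        for i in range(0, len(table), PARTITION_ENTRY_SIZE)
    ]
    return code, entries, sector[table_end:] == SIGNATURE


# A synthetic, empty sector: zeroed code and table, followed by the signature.
sector = bytes(BOOT_CODE_SIZE) + bytes(64) + SIGNATURE
code, entries, bootable = split_boot_sector(sector)
```

Whatever the boot code does, it has to fit in the first slice, which is why it can do little more than load a larger program.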

Macintosh computers run a next generation, more sophisticated BIOS replacement called '''EFI'''. EFI allows the computer to go from firmware to OS startup more easily and directly, because it recognizes the hardware of the computer for what it truly is. Older Macs use '''open boot''', a BIOS equivalent that allows the user to enter commands via a prompt.


* ''Memory Mapped I/O'' : devices are associated with logical primary memory addresses rather than having a specialized device address table.

* ''Direct Memory Access (DMA)'' is when devices and their controllers are able to read and write information directly from the primary memory without any CPU getting in the way. The DMA contains the same algorithms the CPU uses, except conceptually it does not need any registers. This direct access can increase the speed significantly.

* ''Intelligent Devices'' : a final alternative is to have memory and chip sets that act in a way the CPU does on the I/O device itself. This means we do not need to talk to the CPU or RAM at all (it can all be done on the device itself).

===SIMD Processors===

'''Single Instruction Multiple Data (SIMD)''' processors handle one instruction accompanied with an entire array of data. This type of architecture is quite efficient at processing large amounts of data that all need to be operated on using the same instruction. Processing speeds are increased significantly because the processor does not have to decode every single introduction (since all pieces of data are only using one single instruction). Multimedia devices like MP3 players and gaming consoles all take advantage of this technology, these machines are also known as vector machines.

The Sony PS3 has one MIMD (multiple instruction, multiple data) processor and several SIMD processors. Compilers need to be adjusted and game programmers need to adapt their writing skills in favor of this new technology. Someone eventually has to slave over a library of machine language instructions in order to optimize the particular application to SIMD technology (not the most exciting task).

===Boot Process===

How is the OS started? How does the computer go from powering on to running the Operating Systems (Windows, OSX ...). This process takes a really long time when you consider how fast computers have become, '''why''' is this? This occurs because the system must determine all of the hardware present in the machine, each hardware's manufacturer, specifications, requirements etc. It can become a real mess! Before the OS can even begin to start, all of the hardware in the system must be properly configured. This all begins with the BIOS.

====BIOS====

'''BIOS (basic input/output system)''' is the code that runs before any other code runs. It is built into the system. In older times, the BIOS was burned into the machine and never changed. Nowadays, it can be flashed and reprogrammed. This can occur when bugs in the BIOS are so troublesome, that an update is needed (typically when the OS does not work well with the hardware). The BIOS configures all the hardware by running diagnostics etc. After hardware is configured, how is the OS launched? This process is not direct, mainly because the BIOS has limited capabilities because of legacy issues. The BIOS is configured in a way that it believes your computer's hardware is reflective of the year 1982. This means it believes you only have several megabytes of RAM, an 8086 processor, limited disk space and so on. It does however, know how to load information of up to 512 bytes. This is where bootstrapping comes into play.

====Bootstrapping====

After hardware is configured, the BIOS runs a small amount of code (approx 400 bytes, while the remaining amount is used to store partition tables) known as the '''boot block'''. The BIOS loads this algorithm into memory where it is executed. The '''boot block''' is just enough code to load another algorithm that makes the machine reflect more of what it really is (not an 8086). From here we can initialize RAM, hard disk and eventually start the OS.

Macintosh computers, run a next generation, more sophisticated BIOS called '''EFI'''. The EFI allows the computer to go from BIOS to OS startup more easily and direct. This is because the EFI recognizes the hardware of the computer to what it truly is. Older macs use '''open boot''', which is a BIOS that allows the user to enter commands via a prompt.

Computer Organization

2007-10-01T02:07:25Z

Bishopdesmond33:

==Computer Organization==

===Introduction===

* Information has been posted on the [http://homeostasis.scs.carleton.ca/os/index.php/Main_Page class website] regarding running a Linux distribution on your own home machine (or in a virtual desktop environment). Students are encouraged to familiarize themselves with the Linux/Unix environment, as it is far less restricted and more robust than Windows/OSX.
* Our research paper (i.e. Operating Systems) should be well on its way. Students should have key words and topics narrowed down.

===Stored Program Computer===

The idea of a programmable computer goes back to the 1800s and Charles Babbage. His "''Difference Engine''" was a fixed-function calculator, while his later "''Analytical Engine''" design could be programmed. The stored program concept itself, in which instructions and data live in the same memory, took shape in the 1940s and is what we now consider '''Von Neumann Architecture'''. Although Von Neumann was only a piece of the puzzle of modern day computer architecture, we still refer to the architecture as being named after him. In more recent times, the overall Von Neumann model has changed, but conceptually, it is still the same. Several different components talk to each other (via buses) all at the same time, which can become very complicated.

[[Image:Neumann.jpg]]

===Central Processing Unit===

The '''central processing unit''' (CPU) is made up of an Arithmetic Logic Unit (ALU) and Control Unit (CU).

====Arithmetic Logic Unit====

The '''ALU''' can perform all arithmetic (addition, subtraction, multiplication, division) and logical (AND, OR, NOT ... ) calculations at a very rapid rate.

====Control Unit====

The '''CU''' is made up of several components and duties :

* ''Program counter'' or PC is a register that holds the address of the instruction being fetched (on most architectures it already points at the next instruction while the current one executes).
* ''Status word'' is used to store information regarding the current state of the control unit. Information such as overflows and execution status can be stored. A word is typically what the processor natively deals with in terms of data size. Most modern processors are 32-bit or 64-bit. That is, their registers store values that are 32 or 64 bits wide, which we call a word. The size of the word will vary from CPU to CPU.
* Responsible for the ''fetch-decode cycle'', which is a loop that grabs an instruction from main memory, decodes it and then hands it off to the specific part of the processor that handles the instruction (oftentimes involving the ALU).
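
The program counter, status word and fetch-decode cycle described above can be sketched as a toy one-register machine. This is only an illustration: the opcode names (<code>LOAD</code>, <code>ADD</code>, <code>JMP</code>, <code>HALT</code>), the 32-bit word size and the single carry flag are assumptions made for the example, not the instruction set of any real CPU.

```python
WORD_BITS = 32                       # assumed word size for this toy machine
WORD_MASK = (1 << WORD_BITS) - 1     # largest value that fits in one word


def run(program):
    """Execute a list of (opcode, operand) pairs; operands are non-negative ints."""
    pc = 0          # program counter: index of the next instruction to fetch
    acc = 0         # single accumulator register
    carry = False   # one bit of the "status word"
    while pc < len(program):
        opcode, operand = program[pc]    # fetch
        pc += 1                          # PC now points at the next instruction
        if opcode == "LOAD":             # decode, then hand off to the right unit
            acc = operand & WORD_MASK
        elif opcode == "ADD":            # this is the part an ALU would perform
            total = acc + operand
            carry = total > WORD_MASK    # the result did not fit in one word
            acc = total & WORD_MASK
        elif opcode == "JMP":
            pc = operand                 # a jump simply overwrites the PC
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return acc, carry, pc


# Adding 1 to the largest 32-bit value wraps to 0 and sets the carry flag.
print(run([("LOAD", WORD_MASK), ("ADD", 1), ("HALT", 0)]))
```

Note that the word size shows up only in the mask: changing <code>WORD_BITS</code> to 64 changes when the carry flag is raised, not how the loop works.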

The above description of the control unit is very simplified. It is actually a reflection of a CPU in the early to mid 1980s. '''Moore's Law''' states that the number of transistors that sit on a CPU chip doubles roughly every two years (18 months is the figure often quoted). This "law" was introduced in the 1960s and currently still holds true. '''Why?''' Mainly because the industry sets its own targets around this figure. Given '''Moore's Law''', you can imagine how much CPU architecture has changed since the 1980s.

====Transistors====

What can we do with all of these extra transistors? One (or both) of two things :

=====Integration=====

'''Integration''' refers to more and more components getting packed into the CPU. Devices such as :

* ''Memory Controller'' (circuitry needed to drive the RAM)
* ''Math Co-Processor'' (handles floating point values)
* ''Memory Management Unit'' (MMU - translates virtual addresses to physical addresses, which is what makes virtual memory possible)
* ''L1 & L2 Cache'' (used for fast memory access when we don't want to go all the way to main memory. L1 cache is small but fast. L2 cache is larger but slower).
* ''Systems on a Chip'' : systems that have all of their components incorporated into a single chip. These currently exist, but are fairly slow in regards to overall performance.

=====Duplication=====

'''Why have one ALU when we can have 10?''' Modern processors use this technique quite a bit, but certain components can only be duplicated so much. '''Why?''' Aside from hardware constraints, we also have bottlenecks. For instance, a CPU has only one PC (program counter). What is the point of having several ALUs if the program counter cannot keep up?

The most common forms of '''duplication''' include :

* ''Pipelining'' : In pipelined CPUs a specific functional unit of the CPU is partitioned into K smaller parts, called '''pipeline stages'''. When an operation is to be executed, it goes through the K stages of execution. The stages can execute independently and in parallel because they are separate units of hardware, so while one instruction occupies a later stage the next instruction can already occupy an earlier one and no part of the CPU is left idle. What happens when the CPU reaches a branch and does not yet know which instruction comes next? The CPU can guess which way the branch will go and continue on its way. If it later learns that the path it took was incorrect, it can roll back its changes and carry on with the correct path. If the guesses are constantly wrong, the system slows down considerably. This entire process of 'guessing' is known as '''speculative execution'''.

[[Image:Pipeline.jpg]]

* ''Thread Level Parallelism'' : if we can't get a single instruction stream parallelized, we can just have several separate threads. This means we can share all of the address space information such as cache because the threads will be running in the same process.

* ''Multi core'' : the most current technology in practice. After several years of packing CPUs with more and more features, bottlenecks began to arise. '''Solution?''' Duplicate the CPU! As the industry currently shows, more and more computers are being released with '''dual''' or even '''quad core''' processors. What are some bottlenecks that can arise with '''multi core processors'''? The bus is one obvious answer. If we increase the amount of information the buses can handle, we can accommodate multiple CPUs.

* ''CPU cache'' : Why not get rid of RAM altogether and simply put all of it on the CPU as L1 or L2 cache? Cache is built from '''static memory''' (SRAM), whereas RAM is '''dynamic memory''' or '''DRAM'''. A DRAM cell needs only one transistor and a capacitor per bit, whereas a static memory cell typically needs four to six transistors, so DRAM stores the same information with a fraction of the transistors. This is why there is only around 4MB of CPU cache available in most modern day processors. What RAM lacks in speed it makes up in size and parallelism.
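
To see why the pipelining item in the list above pays off, here is a minimal sketch of the cycle counts involved. It assumes an idealized pipeline in which every stage takes exactly one cycle and there are no stalls or wrong guesses; the function names are made up for this example.

```python
def unpipelined_cycles(instructions, stages):
    """Each instruction passes through all K stages before the next one starts."""
    return instructions * stages


def pipelined_cycles(instructions, stages):
    """The first instruction needs K cycles; after that one finishes every cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


def speedup(instructions, stages):
    """How many times faster the idealized pipeline is for this workload."""
    return unpipelined_cycles(instructions, stages) / pipelined_cycles(instructions, stages)


# 1000 instructions on a 5-stage pipeline: 5000 cycles versus 1004 cycles.
print(unpipelined_cycles(1000, 5), pipelined_cycles(1000, 5), round(speedup(1000, 5), 2))
```

For long instruction streams the speedup approaches K, the number of stages. Every wrong guess during speculative execution throws away partially completed work, which is why frequent wrong guesses eat into this gain.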

The book " [http://books.google.ca/books?id=R7Frpn3g9AEC&dq=&pg=PP1&ots=f0kVd8_wKX&sig=66x6OTPMjfzJneY-mOHyQJOC1iM&prev=http://www.google.ca/search%3Fhl%3Den%26client%3Dfirefox-a%26rls%3Dorg.mozilla:en-US:official%26hs%3DdDa%26sa%3DX%26oi%3Dspell%26resnum%3D0%26ct%3Dresult%26cd%3D1%26q%3DComputer%2BArchitecture%2BHennessy%2BPAtterson%26spell%3D1&sa=X&oi=print&ct=title#PPP1,M1 Computer Architecture : A Quantitative Approach] " by Hennessy and Patterson is an excellent book on CPU Architecture.

What does all of this have to do with Operating Systems? If the OS does not understand resources (underlying hardware), then the system is doomed. We have to design the OS around the architecture (especially important with multi-core systems).

===Latency & Bandwidth===

Imagine the following situation : we have two facilities, A & B. A very large amount of information has to be transferred between the locations. We can do it in one of two ways :

* Load up a semi-truck full of hard drives that are full of the information needed.
* Connect the locations together via fiber optic cable to transfer the information.

'''Q''' : What method will allow the most information to be transported?

'''A''' : Obviously the truck full of hard drives, especially with modern day drives being capable of storing up to a terabyte of information. The amount of information that can be transferred in a given amount of time is known as '''bandwidth'''.

'''Q''' : What is the quickest way to get one single bit of information from location B to location A?

'''A''' : This time, the answer is the fiber optic cable. The delay before the first piece of information arrives is known as '''latency'''.
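
The two answers above can be checked with some back-of-the-envelope arithmetic. The numbers in this sketch are purely hypothetical (a 10 hour drive, a truck carrying 1000 one-terabyte drives, and a fiber link with 10 ms latency moving 125 MB per second); only the shape of the calculation matters.

```python
import math

TB = 10**12  # bytes in one (decimal) terabyte


def truck_seconds(size_bytes, trip_s=10 * 3600, capacity_bytes=1000 * TB):
    """Nothing arrives until the truck does; extra data needs extra trips."""
    trips = max(1, math.ceil(size_bytes / capacity_bytes))
    return trips * trip_s


def fiber_seconds(size_bytes, latency_s=0.01, bytes_per_s=125_000_000):
    """Total time = latency for the first byte + size divided by bandwidth."""
    return latency_s + size_bytes / bytes_per_s


for size in (1, 1000 * TB):
    print(size, "bytes: truck", truck_seconds(size), "s, fiber", fiber_seconds(size), "s")
```

With these assumed numbers a single byte takes a hundredth of a second over the fiber but ten hours by truck, while the full 1000 terabytes takes ten hours by truck but about three months over the fiber.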

[[Image:Hds.jpg]]

The same trade-off appears when dealing with RAM. Compared with the CPU's own registers and cache, RAM is high-latency but high-bandwidth. Moving large amounts of data to the CPU can sometimes be sped up using the cache. How best to connect RAM and CPU is a very commonly studied topic.

Other devices and their latency :

* CPU registers : lowest latency
* L2 & L1 Cache : low latency
* Hard disk : highest latency

====I/O====

There are several different ways '''Input/Output''' devices can communicate with the CPU and RAM :

* ''Polled I/O'' : Think of it as giving a person a task. You can poke the person every 5 minutes to see if they have completed the task. If the person has finished the task, they have to respond with a 'yes', otherwise 'no'. In this scenario, one party has to poke every 5 minutes and the other party has to respond every time they get poked. Obviously, not the most efficient solution.

* ''Interrupt I/O'' : In this situation, when giving the person the task, we simply ask them to tell us when they have completed it. Much more efficient than polling.
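
The difference between the two approaches can be sketched with a simulated device. Everything here is hypothetical: the <code>Device</code> class, its "ticks" of simulated time and the <code>on_done</code> callback stand in for real hardware and a real interrupt handler.

```python
class Device:
    """Toy device that finishes its task after a fixed number of ticks."""

    def __init__(self, ticks_needed):
        if ticks_needed < 1:
            raise ValueError("ticks_needed must be at least 1")
        self.remaining = ticks_needed
        self.on_done = None              # optional "interrupt handler"

    def is_done(self):
        return self.remaining == 0

    def tick(self):
        """Advance simulated time by one step."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0 and self.on_done is not None:
                self.on_done()           # the device raises its "interrupt"


def polled(ticks_needed):
    """Return how many times the CPU had to ask 'are you done yet?'."""
    dev = Device(ticks_needed)
    checks = 0
    while True:
        checks += 1
        if dev.is_done():
            return checks
        dev.tick()


def interrupt_driven(ticks_needed):
    """Return how many times the CPU was contacted about the task."""
    dev = Device(ticks_needed)
    notifications = []
    dev.on_done = lambda: notifications.append("done")
    while not notifications:
        dev.tick()   # this loop only moves simulated time; the CPU never queries the device
    return len(notifications)


print(polled(100), interrupt_driven(100))
```

In the polled version the number of status checks grows with how long the device takes, while in the interrupt-driven version the CPU hears from the device exactly once.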

If we have a '''small''' amount of data being processed by the I/O device, then our system runs relatively efficiently. If the data is large in size, the communication involved between the I/O device, the CPU and the RAM is frequent. There are several solutions to this problem:

* ''Memory Mapped I/O'' : device registers are assigned addresses in the primary memory address space rather than having a specialized device address table, so the CPU talks to a device with ordinary memory reads and writes.

* ''Direct Memory Access (DMA)'' : devices and their controllers are able to read and write information directly from the primary memory without the CPU getting in the way. The DMA controller carries out the same copying the CPU would otherwise do, but it only needs to keep track of where the data comes from, where it goes and how much is left. This direct access can increase the speed significantly.

* ''Intelligent Devices'' : a final alternative is to put memory and chip sets that act the way the CPU does on the I/O device itself. This means much of the work does not need to involve the CPU or RAM at all (it can be done on the device itself).

===SIMD Processors===

'''Single Instruction Multiple Data (SIMD)''' processors handle one instruction accompanied by an entire array of data. This type of architecture is quite efficient at processing large amounts of data that all need to be operated on using the same instruction. Processing speeds are increased significantly because the processor does not have to decode an instruction for every single piece of data (since all pieces of data use the same instruction). Multimedia devices like MP3 players and gaming consoles all take advantage of this technology. Machines built around this idea are also known as vector machines.

The Sony PS3 has one MIMD (multiple instruction, multiple data) processor and several SIMD processors. Compilers need to be adjusted and game programmers need to adapt their coding style in favor of this new technology. Someone eventually has to slave over a library of machine language instructions in order to optimize the particular application for SIMD technology (not the most exciting task).
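
The saving in instruction decodes can be illustrated with a small model. This sketch does not use any real SIMD hardware; it is plain Python that only counts how many "instructions" would be issued if one add could operate on several array elements at once. The function name and the default width of 4 lanes are assumptions for the example.

```python
def simd_add(a, b, lanes=4):
    """Add two equal-length lists, counting one 'instruction' per group of lanes."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    result = []
    instructions = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))  # one vector add
        instructions += 1
    return result, instructions


values, issued = simd_add([1, 2, 3, 4, 5, 6, 7, 8], [10] * 8, lanes=4)
print(values, issued)   # 8 additions performed with 2 modeled instructions
```

Calling the same function with <code>lanes=1</code> models an ordinary scalar processor, which issues one instruction per element.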

===Boot Process===

How is the OS started? How does the computer go from powering on to running the Operating System (Windows, OSX ...)? This process takes a really long time when you consider how fast computers have become. '''Why''' is this? It is because the system must determine all of the hardware present in the machine, each device's manufacturer, specifications, requirements etc. It can become a real mess! Before the OS can even begin to start, all of the hardware in the system must be properly configured. This all begins with the BIOS.

====BIOS====

'''BIOS (basic input/output system)''' is the code that runs before any other code runs. It is built into the system. In older times, the BIOS was burned into the machine and never changed. Nowadays, it can be flashed and reprogrammed. This is done when bugs in the BIOS are so troublesome that an update is needed (typically when the OS does not work well with the hardware). The BIOS configures all the hardware by running diagnostics etc. After hardware is configured, how is the OS launched? This process is not direct, mainly because the BIOS has limited capabilities because of legacy issues. The BIOS behaves as if your computer's hardware dates from the year 1982. This means it believes you have an 8086 processor, which can address only about one megabyte of RAM, limited disk space and so on. It does, however, know how to load a single 512-byte block of information from disk. This is where bootstrapping comes into play.

====Bootstrapping====

After hardware is configured, the BIOS loads a small amount of code known as the '''boot block''' from the first 512-byte sector of the boot disk into memory, where it is executed. Only about 400 bytes of that sector hold code (at most 446 in the classic PC layout), while the remaining amount is used to store the partition table. The '''boot block''' is just enough code to load a larger program that makes the machine reflect more of what it really is (not an 8086). From here we can initialize RAM, hard disk and eventually start the OS.
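
As a rough illustration of how the 512 bytes are divided, the sketch below parses a boot sector using the classic PC master boot record layout: 446 bytes of code, four 16-byte partition table entries and a two-byte signature (0x55 0xAA) at the very end. The function name and the returned dictionary keys are made up for this example, and the sector it parses is built in memory rather than read from a real disk.

```python
import struct

SECTOR_SIZE = 512
CODE_SIZE = 446            # bootstrap code area in the classic layout
ENTRY_SIZE = 16            # one partition table entry
SIGNATURE = b"\x55\xaa"    # last two bytes of a valid boot sector

# status byte, 3 skipped CHS bytes, type byte, 3 skipped CHS bytes,
# then little-endian 32-bit start sector and sector count
ENTRY_FORMAT = "<B3xB3xII"


def parse_boot_sector(sector):
    """Return the non-empty partition table entries of a 512-byte boot sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("a boot sector is exactly 512 bytes")
    if sector[510:512] != SIGNATURE:
        raise ValueError("missing 0x55 0xAA boot signature")
    partitions = []
    for index in range(4):
        start = CODE_SIZE + index * ENTRY_SIZE
        status, ptype, lba_start, sectors = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        if ptype != 0:                       # type 0 marks an unused entry
            partitions.append({
                "bootable": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return partitions


# Build a fake sector with one bootable partition to exercise the parser.
fake = bytearray(SECTOR_SIZE)
fake[CODE_SIZE:CODE_SIZE + ENTRY_SIZE] = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 2048, 204800)
fake[510:512] = SIGNATURE
print(parse_boot_sector(bytes(fake)))
```

The arithmetic accounts for every byte: 446 + 4 × 16 + 2 = 512, which is why the bootstrap code has to be so small.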

Intel-based Macintosh computers run a next generation, more sophisticated replacement for the BIOS called '''EFI'''. The EFI allows the computer to go from firmware to OS startup more easily and directly. This is because the EFI recognizes the hardware of the computer for what it truly is. Older Macs use '''Open Firmware''' (Sun's version is known as OpenBoot), which is firmware that allows the user to enter commands via a prompt.