Soma-notes - User contributions [en]

COMP 3000 Essay 2 2010 Question 6

2010-12-02T14:56:58Z

Abondio2: /* Background Concepts */

=Paper=
'''Effective Data-Race Detection for the Kernel'''

Paper: http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf

Video: http://homeostasis.scs.carleton.ca/osdi/video/erickson.mp4

Authors: John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk from Microsoft Research

=Background Concepts=

A data race is a potentially catastrophic event which can be alarmingly common in modern concurrent systems. When two threads access the same memory location at the same same time, and at least one of those accesses is a write operation, there exists a potential data race condition. If the race is not handled properly, it could have a wide range of negative consequences. In the best case, there might be corruption rendering the affected data unreadable; this may not be a major problem if there exist archived, non-corrupted versions of the data. In the worst case, a process (possibly even the kernel itself) may freak out and crash, unable to decide what to do about the unexpected input it receives.

Traditional dynamic data-race detection programs operate by running an isolated runtime and comparing it with the currently active runtime, to find situations that would have resulted in a data race if the runtimes were not isolated. DataCollider operates by temporarily setting up breakpoints at random memory access instances. If a certain memory access hits a breakpoint, DataCollider springs into action. The breakpoint causes the memory access instruction to be postponed, and so the instruction pretty much goes to sleep until DataCollider has finished its job. The job is like taking before and after photographs of something; DataCollider records the data stored at the address the instruction was attempting to access, then allows the instruction to execute. Then DataCollider records the data again. If the before and after records do not match, then another thread has tampered with the data at the same time that this instruction was trying to read it; this is precisely the definition of a data race.

Most existing data race detectors use static detection techniques. These involve analysing program source code to determine where simultaneous accesses occur. This method is typically seen as less effective because it produces a warning every time synchronous accesses occur; the program then has to sort out all the false warnings from the legitimate error reports. The problem is that there are no heuristics that can consistently eliminate the false warnings without also eliminating some of the legitimate reports. DataCollider uses a dynamic detection technique, which involves analysing program output and recognizing anomalous data accesses. Dynamic detectors also produce false warnings, but not nearly as often as static detectors.

=Research problem=
What is the research problem being addressed by the paper? How does this problem relate to past related work?

The research problem being addressed by this paper is the detection of erroneous data races inside the kernel without creating much overhead. This problem occurs because read/write access instructions in processes are not always atomic (e.g two read/write commands may happen simultaneously). There are so many ways a data race error may occur that it is very hard to catch them all.

The research team’s program DataCollider needs to detect errors between the hardware and kernel as well as errors in context thread synchronization in the kernel which must synchronize between user-mode processes, interrupts and deferred procedure calls. As shown in the Background Concepts section, this error can create unwanted problems in kernel modules. The research group created DataCollider which puts breakpoints in memory accesses to check if two system calls are calling the same piece of memory. There have been attempts at a solution to this problem in the past that ran in user-mode, but not in kernel mode, and they produced excessive overhead. There are many problems with trying to apply these techniques to a kernel.

One technique that some detectors in the past have used is the “happens before” method. This checks whether one access happened before another or if the other happened first, and if neither of those options were the case, the two accesses were done simultaneously. This method gathers true data race errors but is very hard to implement.

Another method used is the “lock-set” approach. This method checks all of the locks that are held currently by a thread, and if all the accesses do not have at least one common lock, the method sends a warning. This method has many false alarms since many variables nowadays are shared using other ways than locks or have very complex locking systems that lockset cannot understand.

Both these methods produce excessive overhead due to the fact that they have to check every single memory call at runtime. In the next section we will discuss how DataCollider uses a new way to check for data race errors, that produces barely any overhead.

=Contribution=
What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Proving that there is a problem with classic race detectors: 
The main contribution that DataCollider provides is the unique idea of using hardware breakpoints in a data race detector. The question is why is a unique idea necessary. Why does DataCollider have to "reinvent the wheel". There has been a plethora of race condition testers invented in the last two decades, and almost all of the dynamic data race detectors can be lumped into three categories. They either implement lock-set, happens-before, or a hybrid of the two types of detection. The research team for DataCollider looked at several of these implementations of race condition testers to find ways of improving their own program, and found that there are major problems in the classic ways of detecting race conditions. 

Some of the programs that were referenced were: 

* Eraser: A Dynamic Data Race Detector for Multithreaded Programs 
* RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking 
* PACER: Proportional Detection of Data Races 
* LiteRace: Effective Sampling for Lightweight Data-Race Detection 
* MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs 

Eraser: A Dynamic Data Race Detector for Multithreaded Programs[http://delivery.acm.org/10.1145/270000/265927/p391-savage.pdf?key1=265927&key2=7323721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437]
 
lock-set based reasoning 

Eraser, a data race detector programmed in 1997, was one of the earlier data race detectors invented. It may have been a useful and revolutionary program of its time, however, it uses very low level techniques compared to most data race detectors today. One of the reason why it is unsuccessful is because it only checks whether memory accesses use proper locking techniques. If a memory access is found that does not use a lock, then Eraser will report a data race. In many cases, the misuse of proper locking techniques is a conscious decision by the programmer, so Eraser will report many false positives. Modern locking systems are also very complicated and have several different kinds of locks for different situations. It is difficult for one program to handle upwards of 12 types of locks, especially when they are very complicated. This does not take into account all of the benign problems such as date of access variables. Locking systems are notorious for reporting false positives such as this, and it is near impossible to change the architecture of the algorithm to ignore benign cases. 

PACER: Proportional Detection of Data Races[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
Pacer, a happens-before data race detector, uses the FastTrack algorithm to detect data races. FastTrack uses vector-clocks to keep track of two potentially conflicting threads. If the two threads conflict, a data race is thrown, and the state of the program is saved. Pacer samples a percentage of each memory access, (from 1 to 3 percent) and runs the FastTrack algorithm on each thread that accesses that part of memory. Similar to Pacer, DataCollider samples a percentage of the program's memory accesses, but instead of using vector-clocks to catch the second thread, hardware breakpoints are used. Pacer runs with an overhead of approximately one to three times the speed of the original program because it requires a fair amount of processing power to maintain the vector-clocks. Hardware break points are considerably faster than vector-clocks, and as a consequence, DataCollider runs with less overhead than Pacer. 

LiteRace: Effective Sampling for Lightweight Data-Race Detection[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
LiteRace, similar to Pacer, samples a percentage of memory accesses from a program. Where it differs is the parts of memory that LiteRace samples the most. The "hot spot" regions of memory are ones that are accessed most by the program. Since they are accessed the most, chances are that they have already been successfully debugged, or if there are data races there, they are benign. LiteRace detects these areas in memory as hot spots, and samples them at a much lower rate. This improves LiteRace's chances of capturing a valid data race at a much lower sampling rate. Where DataCollider bests LiteRace is based on LiteRace's installing mechanism. LiteRace needs to be recompiled into the software it is trying to debug, whereas DataColleder's breakpoints do not require any code changes to the program. This is a major success for DataCollider because often third party testers do not have the source code for a program. 

RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Trackings[http://delivery.acm.org/10.1145/1100000/1095832/p221-yu.pdf?key1=1095832&key2=8433721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437] 
combination of lock-set and happens-before reasoning 

RaceTrack uses a unique technique in order to detect data races. The program being debugged is run on top of RaceTrack as a virtual machine using the .NET framework, and it will examine all of the memory accesses that the program requests. As soon as suspicious behavior is exhibited, a warning is sent off to be later evaluated when the program terminates. RaceTrack uses this technique because several process intensive inspections of the state of the machine must be checked, and doing this on the fly is expensive. There are many problems with RaceTrack. It is very successful at detecting a vast percentage of data races, however, it has a high overhead and requires extreme amounts of memory. RaceTrack must save the state of the entire machine every time a warning is produced, and it also has to save each threads memory accesses to check which memory access "happened before". Since most warnings thrown are found to be benign, saving the state of the machine wastes computational power and memory. Long running programs also prove to be a problem, where the computer being debugged will run out of memory to store all of the warning states before the program terminates. It then will have to either increase overhead significantly to store the warnings on disk, or it will have to delete some warnings to make room for new ones. 

MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs[http://docs.google.com/viewer?a=v&q=cache:C8gWk-H3GmEJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.73.9551%26rep%3Drep1%26type%3Dpdf+MultiRace:+Efficient+on-the-fly+data+race+detection+in+multithreaded+C%2B%2B+programs&hl=en&gl=ca&pid=bl&srcid=ADGEESj1jYlzXMOwgbh7SVntUsHxVeI1TvmkU8Oslkm-L9gq-NIyglj5eD48rtkcziUQUynmjOmZojsyzw_tBRiLN6T0n6iiDZyUiFjBUfLijQbzNsRpDQCsMpn-xTiIqK2PUj4DXwoM&sig=AHIEtbRBHpMvb5fel3XOi5oASAogumY-rg] 
combination of lock-set and happens-before reasoning 

MultiRace is another hybrid style race condition debugger that uses two unique algorithms. The first algorithm, Djit is the happens-before iteration, which INSERT STUFF HERE. The second is an improved iteration of the lock-set algorithm. MultiRace is the most similar program to DataCollider in terms of their goals. Both strive to decrease overhead to near standard running times of the program itself, and to increase the program transparency for maximum user compatibility. MultiRace itself is several orders of magnitude more complicated than DataCollider, but since MultiRace hides its complexity from the user with transparency, it is still simple to use. It is arguable that MultiRace is superior for detecting races for C++ programs, however, MultiRace is not compatible with any other programming language. Since DataCollider uses hardware breakpoints, the coding language of the program is irrelevant. Also, since DataCollider avoids using both lock-set and happens before algorithms, it is versatile enough to even debug kernels. 

DataCollider is a very unique program. Most other dynamic race condition testers can be lumped into the three groups lock-set, happens-before, or hybrid. DataCollider, however, recognizes the errors of these styles of detection, and manages to avoid them completely. Even though there are issues with false positives and benign races, DataCollider provides very simple, versatile, and lightweight functionality in debugging a program. Future programs may take this unique style of race detection and add their own functionality to improve upon it. It could be that DataCollider could inspire a ground breaking solution to race conditions and how to detect them.

=Critique=

===Style===
This paper is well put together. It has a strong flow and there is nothing that seems out of place. The authors start with an introduction and then immediately identify key definitions that are used throughout the paper. In the second section which follows the introduction the authors identify the definition of a Data-Race as it relates to their paper. This is important since it is a key concept that is required to understand the entire paper. This definition is required because as the authors state there is no standard for exactly how to define a data-race.[1] In addition to important definitions any background information that is relevant to this paper is presented at the beginning. The key idea which the paper is based on in this case Data Collider and its implementation is explained. An evaluation and conclusion of Data Collider follow its description. The order of the sections makes sense and the author is not jumping around from one concept to another. The organization of the sections and information provided make the paper easy to follow and understand.

===Content===
=====Data Collider:=====
DataCollider seems like a very innovative piece of software. It’s new use of breakpoints inside kernel-space instead of lock-set or happens-before methods in user-mode let it check data race errors in the very kernel itself without producing as much overhead as its old contenders (it even finds data races for overheads less than five percent). One thing to note about DataCollider is that ninety percent of its output to the user is false alarms. This means that after running DataCollider, the user has to sift through all of the gathered data to find the ten percent of data that actually contains real data race errors.[1] The team of creator’s were able to create a to sort through all of the material it collects to only spit out the valuable information, but the creators still found some false alarms in the output . They have noted though that some users like to see the benign reports so that they can make design changes to their programs to make them more portable and scalable and therefore decided not to implement this. Even though DataCollider returns 90% false alarms the projects team have still been able to locate 25 errors in the Windows operating system. Of those 25 errors 12 have already been fixed.[1] This shows that DataCollider is an effective tool in locating data race errors within the kernel effectively enough that they can be corrected.

The overhead of any application running is very important to all users. The developers of DataCollider ran various tests to determine the overhead of running DataCollider based on the number of breakpoints. These results were included in the final paper. DataCollider has a low overall base overhead and it is only after 1000 breakpoints a second does the run time overhead increase drastically.[1] This adds to the effectiveness of DataCollider. Having a low overhead is very important to use of an application.

=References=
[1] Erickson, Musuvathi, Burchhardt, Olynyk, Effective Data-Race Detection for the Kernel, Microsoft Research, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf PDF]

COMP 3000 Essay 2 2010 Question 6

2010-12-02T14:51:23Z

Abondio2: /* Background Concepts */

=Paper=
'''Effective Data-Race Detection for the Kernel'''

Paper: http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf

Video: http://homeostasis.scs.carleton.ca/osdi/video/erickson.mp4

Authors: John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk from Microsoft Research

=Background Concepts=

A data race is a potentially catastrophic event which can be alarmingly common in modern concurrent systems. When one thread attempts to read or write on a memory location at the same time that another thread is writing on the same location, there exists a potential data race condition. If the race is not handled properly, it could have a wide range of negative consequences. In the best case, there might be data corruption rendering the affected files unreadable and useless; this may not be a major problem if there exist archived, non-corrupted versions of the data. In the worst case, a process (possibly even the operating system itself) may freak out and crash, unable to decide what to do about the unexpected input it receives.

Traditional data-race detection programs operate by running an isolated runtime and comparing it with the currently active runtime, to find situations that would have resulted in a data race if the runtimes were not isolated. DataCollider operates by temporarily setting up breakpoints at random memory access instances. If a certain memory access hits a breakpoint, DataCollider springs into action. The breakpoint causes the memory access instruction to be postponed, and so the instruction pretty much goes to sleep until DataCollider has finished its job. The job is like taking a before and after photograph of something; DataCollider records the data stored at the address the instruction was attempting to access, then allows the instruction to execute. Then DataCollider records the data again. If the before and after records do not match, then another thread has tampered with the data at the same time that this instruction was trying to read it; this is precisely the definition of a data race.

Most existing data race detectors use static detection techniques. These involve analysing program source code to determine where simultaneous accesses occur. This method is typically seen as less effective because it produces a warning every time synchronous accesses occur; the program then has to sort out all the false warnings from the legitimate error reports. The problem is that there are no heuristics that can consistently eliminate the false warnings without also eliminating some of the legitimate reports. DataCollider uses a dynamic detection technique, which involves analysing program output and recognizing anomalous data accesses. Dynamic detectors also produce false warnings, but not nearly as often as static detectors.

=Research problem=
What is the research problem being addressed by the paper? How does this problem relate to past related work?

The research problem being addressed by this paper is the detection of erroneous data races inside the kernel without creating much overhead. This problem occurs because read/write access instructions in processes are not always atomic (e.g two read/write commands may happen simultaneously). There are so many ways a data race error may occur that it is very hard to catch them all.

The research team’s program DataCollider needs to detect errors between the hardware and kernel as well as errors in context thread synchronization in the kernel which must synchronize between user-mode processes, interrupts and deferred procedure calls. As shown in the Background Concepts section, this error can create unwanted problems in kernel modules. The research group created DataCollider which puts breakpoints in memory accesses to check if two system calls are calling the same piece of memory. There have been attempts at a solution to this problem in the past that ran in user-mode, but not in kernel mode, and they produced excessive overhead. There are many problems with trying to apply these techniques to a kernel.

One technique that some detectors in the past have used is the “happens before” method. This checks whether one access happened before another or if the other happened first, and if neither of those options were the case, the two accesses were done simultaneously. This method gathers true data race errors but is very hard to implement.

Another method used is the “lock-set” approach. This method checks all of the locks that are held currently by a thread, and if all the accesses do not have at least one common lock, the method sends a warning. This method has many false alarms since many variables nowadays are shared using other ways than locks or have very complex locking systems that lockset cannot understand.

Both these methods produce excessive overhead due to the fact that they have to check every single memory call at runtime. In the next section we will discuss how DataCollider uses a new way to check for data race errors, that produces barely any overhead.

=Contribution=
What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Proving that there is a problem with classic race detectors: 
The main contribution that DataCollider provides is the unique idea of using hardware breakpoints in a data race detector. The question is why is a unique idea necessary. Why does DataCollider have to "reinvent the wheel". There has been a plethora of race condition testers invented in the last two decades, and almost all of the dynamic data race detectors can be lumped into three categories. They either implement lock-set, happens-before, or a hybrid of the two types of detection. The research team for DataCollider looked at several of these implementations of race condition testers to find ways of improving their own program, and found that there are major problems in the classic ways of detecting race conditions. 

Some of the programs that were referenced were: 

* Eraser: A Dynamic Data Race Detector for Multithreaded Programs 
* RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking 
* PACER: Proportional Detection of Data Races 
* LiteRace: Effective Sampling for Lightweight Data-Race Detection 
* MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs 

Eraser: A Dynamic Data Race Detector for Multithreaded Programs[http://delivery.acm.org/10.1145/270000/265927/p391-savage.pdf?key1=265927&key2=7323721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437]
 
lock-set based reasoning 

Eraser, a data race detector programmed in 1997, was one of the earlier data race detectors invented. It may have been a useful and revolutionary program of its time, however, it uses very low level techniques compared to most data race detectors today. One of the reason why it is unsuccessful is because it only checks whether memory accesses use proper locking techniques. If a memory access is found that does not use a lock, then Eraser will report a data race. In many cases, the misuse of proper locking techniques is a conscious decision by the programmer, so Eraser will report many false positives. Modern locking systems are also very complicated and have several different kinds of locks for different situations. It is difficult for one program to handle upwards of 12 types of locks, especially when they are very complicated. This does not take into account all of the benign problems such as date of access variables. Locking systems are notorious for reporting false positives such as this, and it is near impossible to change the architecture of the algorithm to ignore benign cases. 

PACER: Proportional Detection of Data Races[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
Pacer, a happens-before data race detector, uses the FastTrack algorithm to detect data races. FastTrack uses vector-clocks to keep track of two potentially conflicting threads. If the two threads conflict, a data race is thrown, and the state of the program is saved. Pacer samples a percentage of each memory access, (from 1 to 3 percent) and runs the FastTrack algorithm on each thread that accesses that part of memory. Similar to Pacer, DataCollider samples a percentage of the program's memory accesses, but instead of using vector-clocks to catch the second thread, hardware breakpoints are used. Pacer runs with an overhead of approximately one to three times the speed of the original program because it requires a fair amount of processing power to maintain the vector-clocks. Hardware break points are considerably faster than vector-clocks, and as a consequence, DataCollider runs with less overhead than Pacer. 

LiteRace: Effective Sampling for Lightweight Data-Race Detection[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
LiteRace, similar to Pacer, samples a percentage of memory accesses from a program. Where it differs is the parts of memory that LiteRace samples the most. The "hot spot" regions of memory are ones that are accessed most by the program. Since they are accessed the most, chances are that they have already been successfully debugged, or if there are data races there, they are benign. LiteRace detects these areas in memory as hot spots, and samples them at a much lower rate. This improves LiteRace's chances of capturing a valid data race at a much lower sampling rate. Where DataCollider bests LiteRace is based on LiteRace's installing mechanism. LiteRace needs to be recompiled into the software it is trying to debug, whereas DataColleder's breakpoints do not require any code changes to the program. This is a major success for DataCollider because often third party testers do not have the source code for a program. 

RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Trackings[http://delivery.acm.org/10.1145/1100000/1095832/p221-yu.pdf?key1=1095832&key2=8433721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437] 
combination of lock-set and happens-before reasoning 

RaceTrack uses a unique technique in order to detect data races. The program being debugged is run on top of RaceTrack as a virtual machine using the .NET framework, and it will examine all of the memory accesses that the program requests. As soon as suspicious behavior is exhibited, a warning is sent off to be later evaluated when the program terminates. RaceTrack uses this technique because several process intensive inspections of the state of the machine must be checked, and doing this on the fly is expensive. There are many problems with RaceTrack. It is very successful at detecting a vast percentage of data races, however, it has a high overhead and requires extreme amounts of memory. RaceTrack must save the state of the entire machine every time a warning is produced, and it also has to save each threads memory accesses to check which memory access "happened before". Since most warnings thrown are found to be benign, saving the state of the machine wastes computational power and memory. Long running programs also prove to be a problem, where the computer being debugged will run out of memory to store all of the warning states before the program terminates. It then will have to either increase overhead significantly to store the warnings on disk, or it will have to delete some warnings to make room for new ones. 

MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs[http://docs.google.com/viewer?a=v&q=cache:C8gWk-H3GmEJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.73.9551%26rep%3Drep1%26type%3Dpdf+MultiRace:+Efficient+on-the-fly+data+race+detection+in+multithreaded+C%2B%2B+programs&hl=en&gl=ca&pid=bl&srcid=ADGEESj1jYlzXMOwgbh7SVntUsHxVeI1TvmkU8Oslkm-L9gq-NIyglj5eD48rtkcziUQUynmjOmZojsyzw_tBRiLN6T0n6iiDZyUiFjBUfLijQbzNsRpDQCsMpn-xTiIqK2PUj4DXwoM&sig=AHIEtbRBHpMvb5fel3XOi5oASAogumY-rg] 
combination of lock-set and happens-before reasoning 

MultiRace is another hybrid style race condition debugger that uses two unique algorithms. The first algorithm, Djit is the happens-before iteration, which INSERT STUFF HERE. The second is an improved iteration of the lock-set algorithm. MultiRace is the most similar program to DataCollider in terms of their goals. Both strive to decrease overhead to near standard running times of the program itself, and to increase the program transparency for maximum user compatibility. MultiRace itself is several orders of magnitude more complicated than DataCollider, but since MultiRace hides its complexity from the user with transparency, it is still simple to use. It is arguable that MultiRace is superior for detecting races for C++ programs, however, MultiRace is not compatible with any other programming language. Since DataCollider uses hardware breakpoints, the coding language of the program is irrelevant. Also, since DataCollider avoids using both lock-set and happens before algorithms, it is versatile enough to even debug kernels. 

DataCollider is a very unique program. Most other dynamic race condition testers can be lumped into the three groups lock-set, happens-before, or hybrid. DataCollider, however, recognizes the errors of these styles of detection, and manages to avoid them completely. Even though there are issues with false positives and benign races, DataCollider provides very simple, versatile, and lightweight functionality in debugging a program. Future programs may take this unique style of race detection and add their own functionality to improve upon it. It could be that DataCollider could inspire a ground breaking solution to race conditions and how to detect them.

=Critique=

===Style===
This paper is well put together. It has a strong flow and there is nothing that seems out of place. The authors start with an introduction and then immediately identify key definitions that are used throughout the paper. In the second section which follows the introduction the authors identify the definition of a Data-Race as it relates to their paper. This is important since it is a key concept that is required to understand the entire paper. This definition is required because as the authors state there is no standard for exactly how to define a data-race.[1] In addition to important definitions any background information that is relevant to this paper is presented at the beginning. The key idea which the paper is based on in this case Data Collider and its implementation is explained. An evaluation and conclusion of Data Collider follow its description. The order of the sections makes sense and the author is not jumping around from one concept to another. The organization of the sections and information provided make the paper easy to follow and understand.

===Content===
=====Data Collider:=====
DataCollider seems like a very innovative piece of software. It’s new use of breakpoints inside kernel-space instead of lock-set or happens-before methods in user-mode let it check data race errors in the very kernel itself without producing as much overhead as its old contenders (it even finds data races for overheads less than five percent). One thing to note about DataCollider is that ninety percent of its output to the user is false alarms. This means that after running DataCollider, the user has to sift through all of the gathered data to find the ten percent of data that actually contains real data race errors.[1] The team of creator’s were able to create a to sort through all of the material it collects to only spit out the valuable information, but the creators still found some false alarms in the output . They have noted though that some users like to see the benign reports so that they can make design changes to their programs to make them more portable and scalable and therefore decided not to implement this. Even though DataCollider returns 90% false alarms the projects team have still been able to locate 25 errors in the Windows operating system. Of those 25 errors 12 have already been fixed.[1] This shows that DataCollider is an effective tool in locating data race errors within the kernel effectively enough that they can be corrected.

The overhead of any application running is very important to all users. The developers of DataCollider ran various tests to determine the overhead of running DataCollider based on the number of breakpoints. These results were included in the final paper. DataCollider has a low overall base overhead and it is only after 1000 breakpoints a second does the run time overhead increase drastically.[1] This adds to the effectiveness of DataCollider. Having a low overhead is very important to use of an application.

=References=
[1] Erickson, Musuvathi, Burchhardt, Olynyk, Effective Data-Race Detection for the Kernel, Microsoft Research, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf PDF]

COMP 3000 Essay 2 2010 Question 6

2010-12-02T14:47:54Z

Abondio2: /* Background Concepts */

=Paper=
'''Effective Data-Race Detection for the Kernel'''

Paper: http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf

Video: http://homeostasis.scs.carleton.ca/osdi/video/erickson.mp4

Authors: John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk from Microsoft Research

=Background Concepts=

A data race is a potentially catastrophic event which can be alarmingly common in modern concurrent systems. When one thread attempts to read or write on a memory location at the same time that another thread is writing on the same location, there exists a potential data race condition. If the race is not handled properly, it could have a wide range of negative consequences. In the best case, there might be data corruption rendering the affected files unreadable and useless; this may not be a major problem if there exist archived, non-corrupted versions of the data. In the worst case, a process (possibly even the operating system itself) may freak out and crash, unable to decide what to do about the unexpected input it receives.

Traditional data-race detection programs operate by running an isolated runtime and comparing it with the currently active runtime, to find situations that would have resulted in a data race if the runtimes were not isolated. DataCollider operates by temporarily setting up breakpoints at random memory access instances. If a certain memory access hits a breakpoint, DataCollider springs into action. The breakpoint causes the memory access instruction to be postponed, and so the instruction pretty much goes to sleep until DataCollider has finished its job. The job is like taking a before and after photograph of something; DataCollider records the data stored at the address the instruction was attempting to access, then allows the instruction to execute. Then DataCollider records the data again. If the before and after records do not match, then another thread has tampered with the data at the same time that this instruction was trying to read it; this is precisely the definition of a data race.

Most existing data race detectors use static detection techniques. These involve analysing program source code to determine where simultaneous accesses occur. This method is typically seen as less effective because it produces a warning every time synchronous accesses occur; the program then has to sort out all the false warnings from the legitimate error reports. The problem is that there are no heuristics that can consistently eliminate the false warnings without also eliminating some of the legitimate reports. DataCollider uses a dynamic detection technique, which involves analysing program output and recognizing anomalous data accesses. Dynamic detectors also produce false warnings, but not nearly as often as static detectors.

--[[User:Abondio2|Austin Bondio]] 14:47, 2 December 2010 (UTC)

=Research problem=
What is the research problem being addressed by the paper? How does this problem relate to past related work?

The research problem being addressed by this paper is the detection of erroneous data races inside the kernel without creating much overhead. This problem occurs because read/write access instructions in processes are not always atomic (e.g two read/write commands may happen simultaneously). There are so many ways a data race error may occur that it is very hard to catch them all.

The research team’s program DataCollider needs to detect errors between the hardware and kernel as well as errors in context thread synchronization in the kernel which must synchronize between user-mode processes, interrupts and deferred procedure calls. As shown in the Background Concepts section, this error can create unwanted problems in kernel modules. The research group created DataCollider which puts breakpoints in memory accesses to check if two system calls are calling the same piece of memory. There have been attempts at a solution to this problem in the past that ran in user-mode, but not in kernel mode, and they produced excessive overhead. There are many problems with trying to apply these techniques to a kernel.

One technique that some detectors in the past have used is the “happens before” method. This checks whether one access happened before another or if the other happened first, and if neither of those options were the case, the two accesses were done simultaneously. This method gathers true data race errors but is very hard to implement.

Another method used is the “lock-set” approach. This method checks all of the locks that are held currently by a thread, and if all the accesses do not have at least one common lock, the method sends a warning. This method has many false alarms since many variables nowadays are shared using other ways than locks or have very complex locking systems that lockset cannot understand.

Both these methods produce excessive overhead due to the fact that they have to check every single memory call at runtime. In the next section we will discuss how DataCollider uses a new way to check for data race errors, that produces barely any overhead.

=Contribution=
What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Proving that there is a problem with classic race detectors: 
The main contribution that DataCollider provides is the unique idea of using hardware breakpoints in a data race detector. The question is why is a unique idea necessary. Why does DataCollider have to "reinvent the wheel". There has been a plethora of race condition testers invented in the last two decades, and almost all of the dynamic data race detectors can be lumped into three categories. They either implement lock-set, happens-before, or a hybrid of the two types of detection. The research team for DataCollider looked at several of these implementations of race condition testers to find ways of improving their own program, and found that there are major problems in the classic ways of detecting race conditions. 

Some of the programs that were referenced were: 

* Eraser: A Dynamic Data Race Detector for Multithreaded Programs 
* RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking 
* PACER: Proportional Detection of Data Races 
* LiteRace: Effective Sampling for Lightweight Data-Race Detection 
* MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs 

Eraser: A Dynamic Data Race Detector for Multithreaded Programs[http://delivery.acm.org/10.1145/270000/265927/p391-savage.pdf?key1=265927&key2=7323721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437]
 
lock-set based reasoning 

Eraser, a data race detector programmed in 1997, was one of the earlier data race detectors invented. It may have been a useful and revolutionary program of its time, however, it uses very low level techniques compared to most data race detectors today. One of the reason why it is unsuccessful is because it only checks whether memory accesses use proper locking techniques. If a memory access is found that does not use a lock, then Eraser will report a data race. In many cases, the misuse of proper locking techniques is a conscious decision by the programmer, so Eraser will report many false positives. Modern locking systems are also very complicated and have several different kinds of locks for different situations. It is difficult for one program to handle upwards of 12 types of locks, especially when they are very complicated. This does not take into account all of the benign problems such as date of access variables. Locking systems are notorious for reporting false positives such as this, and it is near impossible to change the architecture of the algorithm to ignore benign cases. 

PACER: Proportional Detection of Data Races[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
Pacer, a happens-before data race detector, uses the FastTrack algorithm to detect data races. FastTrack uses vector-clocks to keep track of two potentially conflicting threads. If the two threads conflict, a data race is thrown, and the state of the program is saved. Pacer samples a percentage of each memory access, (from 1 to 3 percent) and runs the FastTrack algorithm on each thread that accesses that part of memory. Similar to Pacer, DataCollider samples a percentage of the program's memory accesses, but instead of using vector-clocks to catch the second thread, hardware breakpoints are used. Pacer runs with an overhead of approximately one to three times the speed of the original program because it requires a fair amount of processing power to maintain the vector-clocks. Hardware break points are considerably faster than vector-clocks, and as a consequence, DataCollider runs with less overhead than Pacer. 

LiteRace: Effective Sampling for Lightweight Data-Race Detection[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
LiteRace, similar to Pacer, samples a percentage of memory accesses from a program. Where it differs is the parts of memory that LiteRace samples the most. The "hot spot" regions of memory are ones that are accessed most by the program. Since they are accessed the most, chances are that they have already been successfully debugged, or if there are data races there, they are benign. LiteRace detects these areas in memory as hot spots, and samples them at a much lower rate. This improves LiteRace's chances of capturing a valid data race at a much lower sampling rate. Where DataCollider bests LiteRace is based on LiteRace's installing mechanism. LiteRace needs to be recompiled into the software it is trying to debug, whereas DataColleder's breakpoints do not require any code changes to the program. This is a major success for DataCollider because often third party testers do not have the source code for a program. 

RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Trackings[http://delivery.acm.org/10.1145/1100000/1095832/p221-yu.pdf?key1=1095832&key2=8433721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437] 
combination of lock-set and happens-before reasoning 

RaceTrack uses a unique technique in order to detect data races. The program being debugged is run on top of RaceTrack as a virtual machine using the .NET framework, and it will examine all of the memory accesses that the program requests. As soon as suspicious behavior is exhibited, a warning is sent off to be later evaluated when the program terminates. RaceTrack uses this technique because several process intensive inspections of the state of the machine must be checked, and doing this on the fly is expensive. There are many problems with RaceTrack. It is very successful at detecting a vast percentage of data races, however, it has a high overhead and requires extreme amounts of memory. RaceTrack must save the state of the entire machine every time a warning is produced, and it also has to save each threads memory accesses to check which memory access "happened before". Since most warnings thrown are found to be benign, saving the state of the machine wastes computational power and memory. Long running programs also prove to be a problem, where the computer being debugged will run out of memory to store all of the warning states before the program terminates. It then will have to either increase overhead significantly to store the warnings on disk, or it will have to delete some warnings to make room for new ones. 

MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs[http://docs.google.com/viewer?a=v&q=cache:C8gWk-H3GmEJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.73.9551%26rep%3Drep1%26type%3Dpdf+MultiRace:+Efficient+on-the-fly+data+race+detection+in+multithreaded+C%2B%2B+programs&hl=en&gl=ca&pid=bl&srcid=ADGEESj1jYlzXMOwgbh7SVntUsHxVeI1TvmkU8Oslkm-L9gq-NIyglj5eD48rtkcziUQUynmjOmZojsyzw_tBRiLN6T0n6iiDZyUiFjBUfLijQbzNsRpDQCsMpn-xTiIqK2PUj4DXwoM&sig=AHIEtbRBHpMvb5fel3XOi5oASAogumY-rg] 
combination of lock-set and happens-before reasoning 

MultiRace is another hybrid style race condition debugger that uses two unique algorithms. The first algorithm, Djit is the happens-before iteration, which INSERT STUFF HERE. The second is an improved iteration of the lock-set algorithm. MultiRace is the most similar program to DataCollider in terms of their goals. Both strive to decrease overhead to near standard running times of the program itself, and to increase the program transparency for maximum user compatibility. MultiRace itself is several orders of magnitude more complicated than DataCollider, but since MultiRace hides its complexity from the user with transparency, it is still simple to use. It is arguable that MultiRace is superior for detecting races for C++ programs, however, MultiRace is not compatible with any other programming language. Since DataCollider uses hardware breakpoints, the coding language of the program is irrelevant. Also, since DataCollider avoids using both lock-set and happens before algorithms, it is versatile enough to even debug kernels. 

DataCollider is a very unique program. Most other dynamic race condition testers can be lumped into the three groups lock-set, happens-before, or hybrid. DataCollider, however, recognizes the errors of these styles of detection, and manages to avoid them completely. Even though there are issues with false positives and benign races, DataCollider provides very simple, versatile, and lightweight functionality in debugging a program. Future programs may take this unique style of race detection and add their own functionality to improve upon it. It could be that DataCollider could inspire a ground breaking solution to race conditions and how to detect them.

=Critique=

===Style===
This paper is well put together. It has a strong flow and there is nothing that seems out of place. The authors start with an introduction and then immediately identify key definitions that are used throughout the paper. In the second section which follows the introduction the authors identify the definition of a Data-Race as it relates to their paper. This is important since it is a key concept that is required to understand the entire paper. This definition is required because as the authors state there is no standard for exactly how to define a data-race.[1] In addition to important definitions any background information that is relevant to this paper is presented at the beginning. The key idea which the paper is based on in this case Data Collider and its implementation is explained. An evaluation and conclusion of Data Collider follow its description. The order of the sections makes sense and the author is not jumping around from one concept to another. The organization of the sections and information provided make the paper easy to follow and understand.

===Content===
=====Data Collider:=====
DataCollider seems like a very innovative piece of software. It’s new use of breakpoints inside kernel-space instead of lock-set or happens-before methods in user-mode let it check data race errors in the very kernel itself without producing as much overhead as its old contenders (it even finds data races for overheads less than five percent). One thing to note about DataCollider is that ninety percent of its output to the user is false alarms. This means that after running DataCollider, the user has to sift through all of the gathered data to find the ten percent of data that actually contains real data race errors.[1] The team of creator’s were able to create a to sort through all of the material it collects to only spit out the valuable information, but the creators still found some false alarms in the output . They have noted though that some users like to see the benign reports so that they can make design changes to their programs to make them more portable and scalable and therefore decided not to implement this. Even though DataCollider returns 90% false alarms the projects team have still been able to locate 25 errors in the Windows operating system. Of those 25 errors 12 have already been fixed.[1] This shows that DataCollider is an effective tool in locating data race errors within the kernel effectively enough that they can be corrected.

The overhead of any application running is very important to all users. The developers of DataCollider ran various tests to determine the overhead of running DataCollider based on the number of breakpoints. These results were included in the final paper. DataCollider has a low overall base overhead and it is only after 1000 breakpoints a second does the run time overhead increase drastically.[1] This adds to the effectiveness of DataCollider. Having a low overhead is very important to use of an application.

=References=
[1] Erickson, Musuvathi, Burchhardt, Olynyk, Effective Data-Race Detection for the Kernel, Microsoft Research, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf PDF]

COMP 3000 Essay 2 2010 Question 6

2010-12-02T14:46:09Z

Abondio2: /* Background Concepts */

=Paper=
'''Effective Data-Race Detection for the Kernel'''

Paper: http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf

Video: http://homeostasis.scs.carleton.ca/osdi/video/erickson.mp4

Authors: John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk from Microsoft Research

=Background Concepts=
Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.

A data race is a potentially catastrophic event which can be alarmingly common in modern concurrent systems. When one thread attempts to read or write on a memory location at the same time that another thread is writing on the same location, there exists a potential data race condition. If the race is not handled properly, it could have a wide range of negative consequences. In the best case, there might be data corruption rendering the affected files unreadable and useless; this may not be a major problem if there exist archived, non-corrupted versions of the data. In the worst case, a process (possibly even the operating system itself) may freak out and crash, unable to decide what to do about the unexpected input it receives.

Traditional data-race detection programs operate by running an isolated runtime and comparing it with the currently active runtime, to find situations that would have resulted in a data race if the runtimes were not isolated. DataCollider operates by temporarily setting up breakpoints at random memory access instances. If a certain memory access hits a breakpoint, DataCollider springs into action. The breakpoint causes the memory access instruction to be postponed, and so the instruction pretty much goes to sleep until DataCollider has finished its job. The job is like taking a before and after photograph of something; DataCollider records the data stored at the address the instruction was attempting to access, then allows the instruction to execute. Then DataCollider records the data again. If the before and after records do not match, then another thread has tampered with the data at the same time that this instruction was trying to read it; this is precisely the definition of a data race.

Most existing data race detectors use static detection techniques. These involve analysing program source code to determine where simultaneous accesses occur. This method is typically seen as less effective because it produces a warning every time synchronous accesses occur; the program then has to sort out all the false warnings from the legitimate error reports. The problem is that there are no heuristics that can consistently eliminate the false warnings without also eliminating some of the legitimate reports. DataCollider uses a dynamic detection technique, which involves analysing program output and recognizing anomalous data accesses. Dynamic detectors also produce false warnings, but not nearly as often as static detectors.

[Don't worry guys; that's not all I've got. I'm still working on it.]

--[[User:Abondio2|Austin Bondio]] 01:56, 2 December 2010 (UTC)

=Research problem=
What is the research problem being addressed by the paper? How does this problem relate to past related work?

The research problem being addressed by this paper is the detection of erroneous data races inside the kernel without creating much overhead. This problem occurs because read/write access instructions in processes are not always atomic (e.g two read/write commands may happen simultaneously). There are so many ways a data race error may occur that it is very hard to catch them all.

The research team’s program DataCollider needs to detect errors between the hardware and kernel as well as errors in context thread synchronization in the kernel which must synchronize between user-mode processes, interrupts and deferred procedure calls. As shown in the Background Concepts section, this error can create unwanted problems in kernel modules. The research group created DataCollider which puts breakpoints in memory accesses to check if two system calls are calling the same piece of memory. There have been attempts at a solution to this problem in the past that ran in user-mode, but not in kernel mode, and they produced excessive overhead. There are many problems with trying to apply these techniques to a kernel.

One technique that some detectors in the past have used is the “happens before” method. This checks whether one access happened before another or if the other happened first, and if neither of those options were the case, the two accesses were done simultaneously. This method gathers true data race errors but is very hard to implement.

Another method used is the “lock-set” approach. This method checks all of the locks that are held currently by a thread, and if all the accesses do not have at least one common lock, the method sends a warning. This method has many false alarms since many variables nowadays are shared using other ways than locks or have very complex locking systems that lockset cannot understand.

Both these methods produce excessive overhead due to the fact that they have to check every single memory call at runtime. In the next section we will discuss how DataCollider uses a new way to check for data race errors, that produces barely any overhead.

=Contribution=
What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Proving that there is a problem with classic race detectors: 
The main contribution that DataCollider provides is the unique idea of using hardware breakpoints in a data race detector. The question is why is a unique idea necessary. Why does DataCollider have to "reinvent the wheel". There has been a plethora of race condition testers invented in the last two decades, and almost all of the dynamic data race detectors can be lumped into three categories. They either implement lock-set, happens-before, or a hybrid of the two types of detection. The research team for DataCollider looked at several of these implementations of race condition testers to find ways of improving their own program, and found that there are major problems in the classic ways of detecting race conditions. 

Some of the programs that were referenced were: 

* Eraser: A Dynamic Data Race Detector for Multithreaded Programs 
* RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking 
* PACER: Proportional Detection of Data Races 
* LiteRace: Effective Sampling for Lightweight Data-Race Detection 
* MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs 

Eraser: A Dynamic Data Race Detector for Multithreaded Programs[http://delivery.acm.org/10.1145/270000/265927/p391-savage.pdf?key1=265927&key2=7323721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437]
 
lock-set based reasoning 

Eraser, a data race detector programmed in 1997, was one of the earlier data race detectors invented. It may have been a useful and revolutionary program of its time, however, it uses very low level techniques compared to most data race detectors today. One of the reason why it is unsuccessful is because it only checks whether memory accesses use proper locking techniques. If a memory access is found that does not use a lock, then Eraser will report a data race. In many cases, the misuse of proper locking techniques is a conscious decision by the programmer, so Eraser will report many false positives. Modern locking systems are also very complicated and have several different kinds of locks for different situations. It is difficult for one program to handle upwards of 12 types of locks, especially when they are very complicated. This does not take into account all of the benign problems such as date of access variables. Locking systems are notorious for reporting false positives such as this, and it is near impossible to change the architecture of the algorithm to ignore benign cases. 

PACER: Proportional Detection of Data Races[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
Pacer, a happens-before data race detector, uses the FastTrack algorithm to detect data races. FastTrack uses vector-clocks to keep track of two potentially conflicting threads. If the two threads conflict, a data race is thrown, and the state of the program is saved. Pacer samples a percentage of each memory access, (from 1 to 3 percent) and runs the FastTrack algorithm on each thread that accesses that part of memory. Similar to Pacer, DataCollider samples a percentage of the program's memory accesses, but instead of using vector-clocks to catch the second thread, hardware breakpoints are used. Pacer runs with an overhead of approximately one to three times the speed of the original program because it requires a fair amount of processing power to maintain the vector-clocks. Hardware break points are considerably faster than vector-clocks, and as a consequence, DataCollider runs with less overhead than Pacer. 

LiteRace: Effective Sampling for Lightweight Data-Race Detection[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
LiteRace, similar to Pacer, samples a percentage of memory accesses from a program. Where it differs is the parts of memory that LiteRace samples the most. The "hot spot" regions of memory are ones that are accessed most by the program. Since they are accessed the most, chances are that they have already been successfully debugged, or if there are data races there, they are benign. LiteRace detects these areas in memory as hot spots, and samples them at a much lower rate. This improves LiteRace's chances of capturing a valid data race at a much lower sampling rate. Where DataCollider bests LiteRace is based on LiteRace's installing mechanism. LiteRace needs to be recompiled into the software it is trying to debug, whereas DataColleder's breakpoints do not require any code changes to the program. This is a major success for DataCollider because often third party testers do not have the source code for a program. 

RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Trackings[http://delivery.acm.org/10.1145/1100000/1095832/p221-yu.pdf?key1=1095832&key2=8433721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437] 
combination of lock-set and happens-before reasoning 

RaceTrack uses a unique technique in order to detect data races. The program being debugged is run on top of RaceTrack as a virtual machine using the .NET framework, and it will examine all of the memory accesses that the program requests. As soon as suspicious behavior is exhibited, a warning is sent off to be later evaluated when the program terminates. RaceTrack uses this technique because several process intensive inspections of the state of the machine must be checked, and doing this on the fly is expensive. There are many problems with RaceTrack. It is very successful at detecting a vast percentage of data races, however, it has a high overhead and requires extreme amounts of memory. RaceTrack must save the state of the entire machine every time a warning is produced, and it also has to save each threads memory accesses to check which memory access "happened before". Since most warnings thrown are found to be benign, saving the state of the machine wastes computational power and memory. Long running programs also prove to be a problem, where the computer being debugged will run out of memory to store all of the warning states before the program terminates. It then will have to either increase overhead significantly to store the warnings on disk, or it will have to delete some warnings to make room for new ones. 

MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs[http://docs.google.com/viewer?a=v&q=cache:C8gWk-H3GmEJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.73.9551%26rep%3Drep1%26type%3Dpdf+MultiRace:+Efficient+on-the-fly+data+race+detection+in+multithreaded+C%2B%2B+programs&hl=en&gl=ca&pid=bl&srcid=ADGEESj1jYlzXMOwgbh7SVntUsHxVeI1TvmkU8Oslkm-L9gq-NIyglj5eD48rtkcziUQUynmjOmZojsyzw_tBRiLN6T0n6iiDZyUiFjBUfLijQbzNsRpDQCsMpn-xTiIqK2PUj4DXwoM&sig=AHIEtbRBHpMvb5fel3XOi5oASAogumY-rg] 
combination of lock-set and happens-before reasoning 

MultiRace is another hybrid style race condition debugger that uses two unique algorithms. The first algorithm, Djit is the happens-before iteration, which INSERT STUFF HERE. The second is an improved iteration of the lock-set algorithm. MultiRace is the most similar program to DataCollider in terms of their goals. Both strive to decrease overhead to near standard running times of the program itself, and to increase the program transparency for maximum user compatibility. MultiRace itself is several orders of magnitude more complicated than DataCollider, but since MultiRace hides its complexity from the user with transparency, it is still simple to use. It is arguable that MultiRace is superior for detecting races for C++ programs, however, MultiRace is not compatible with any other programming language. Since DataCollider uses hardware breakpoints, the coding language of the program is irrelevant. Also, since DataCollider avoids using both lock-set and happens before algorithms, it is versatile enough to even debug kernels. 

DataCollider is a very unique program. Most other dynamic race condition testers can be lumped into the three groups lock-set, happens-before, or hybrid. DataCollider, however, recognizes the errors of these styles of detection, and manages to avoid them completely. Even though there are issues with false positives and benign races, DataCollider provides very simple, versatile, and lightweight functionality in debugging a program. Future programs may take this unique style of race detection and add their own functionality to improve upon it. It could be that DataCollider could inspire a ground breaking solution to race conditions and how to detect them.

=Critique=

===Style===
This paper is well put together. It has a strong flow and there is nothing that seems out of place. The authors start with an introduction and then immediately identify key definitions that are used throughout the paper. In the second section which follows the introduction the authors identify the definition of a Data-Race as it relates to their paper. This is important since it is a key concept that is required to understand the entire paper. This definition is required because as the authors state there is no standard for exactly how to define a data-race.[1] In addition to important definitions any background information that is relevant to this paper is presented at the beginning. The key idea which the paper is based on in this case Data Collider and its implementation is explained. An evaluation and conclusion of Data Collider follow its description. The order of the sections makes sense and the author is not jumping around from one concept to another. The organization of the sections and information provided make the paper easy to follow and understand.

===Content===
=====Data Collider:=====
DataCollider seems like a very innovative piece of software. It’s new use of breakpoints inside kernel-space instead of lock-set or happens-before methods in user-mode let it check data race errors in the very kernel itself without producing as much overhead as its old contenders (it even finds data races for overheads less than five percent). One thing to note about DataCollider is that ninety percent of its output to the user is false alarms. This means that after running DataCollider, the user has to sift through all of the gathered data to find the ten percent of data that actually contains real data race errors.[1] The team of creator’s were able to create a to sort through all of the material it collects to only spit out the valuable information, but the creators still found some false alarms in the output . They have noted though that some users like to see the benign reports so that they can make design changes to their programs to make them more portable and scalable and therefore decided not to implement this. Even though DataCollider returns 90% false alarms the projects team have still been able to locate 25 errors in the Windows operating system. Of those 25 errors 12 have already been fixed.[1] This shows that DataCollider is an effective tool in locating data race errors within the kernel effectively enough that they can be corrected.

The overhead of any application running is very important to all users. The developers of DataCollider ran various tests to determine the overhead of running DataCollider based on the number of breakpoints. These results were included in the final paper. DataCollider has a low overall base overhead and it is only after 1000 breakpoints a second does the run time overhead increase drastically.[1] This adds to the effectiveness of DataCollider. Having a low overhead is very important to use of an application.

=References=
[1] Erickson, Musuvathi, Burchhardt, Olynyk, Effective Data-Race Detection for the Kernel, Microsoft Research, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf PDF]

COMP 3000 Essay 2 2010 Question 6

2010-12-02T14:43:10Z

Abondio2: /* Background Concepts */

=Paper=
'''Effective Data-Race Detection for the Kernel'''

Paper: http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf

Video: http://homeostasis.scs.carleton.ca/osdi/video/erickson.mp4

Authors: John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, Kirk Olynyk from Microsoft Research

=Background Concepts=
Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.

A data race is a potentially catastrophic event which can be alarmingly common in modern concurrent systems. When one thread attempts to read or write on a memory location at the same time that another thread is writing on the same location, there exists a potential data race condition. If the race is not handled properly, it could have a wide range of negative consequences. In the best case, there might be data corruption rendering the affected files unreadable and useless; this may not be a major problem if there exist archived, non-corrupted versions of the data. In the worst case, a process (possibly even the operating system itself) may freak out and crash, unable to decide what to do about the unexpected input it receives.

Traditional data-race detection programs operate by running an isolated runtime and comparing it with the currently active runtime, to find situations that would have resulted in a data race if the runtimes were not isolated. DataCollider operates by temporarily setting up breakpoints at random memory access instances. If a certain memory access hits a breakpoint, DataCollider springs into action. The breakpoint causes the memory access instruction to be postponed, and so the instruction pretty much goes to sleep until DataCollider has finished its job. The job is like taking a before and after photograph of something; DataCollider records the data stored at the address the instruction was attempting to access, then allows the instruction to execute. Then DataCollider records the data again. If the before and after records do not match, then another thread has tampered with the data at the same time that this instruction was trying to read it; this is precisely the definition of a data race.

Most existing data race detectors use static detection techniques. These involve analysing program source code to determine where simultaneous accesses occur. This method is typically seen as less effective because it produces a warning every time synchronous accesses occur; the program then has to sort out all the false warnings from the legitimate error reports. The problem is that there are no heuristics that can consistently eliminate the false warnings without also eliminating some of the legitimate reports.

[Don't worry guys; that's not all I've got. I'm still working on it.]

--[[User:Abondio2|Austin Bondio]] 01:56, 2 December 2010 (UTC)

=Research problem=
What is the research problem being addressed by the paper? How does this problem relate to past related work?

The research problem being addressed by this paper is the detection of erroneous data races inside the kernel without creating much overhead. This problem occurs because read/write access instructions in processes are not always atomic (e.g two read/write commands may happen simultaneously). There are so many ways a data race error may occur that it is very hard to catch them all.

The research team’s program DataCollider needs to detect errors between the hardware and kernel as well as errors in context thread synchronization in the kernel which must synchronize between user-mode processes, interrupts and deferred procedure calls. As shown in the Background Concepts section, this error can create unwanted problems in kernel modules. The research group created DataCollider which puts breakpoints in memory accesses to check if two system calls are calling the same piece of memory. There have been attempts at a solution to this problem in the past that ran in user-mode, but not in kernel mode, and they produced excessive overhead. There are many problems with trying to apply these techniques to a kernel.

One technique that some detectors in the past have used is the “happens before” method. This checks whether one access happened before another or if the other happened first, and if neither of those options were the case, the two accesses were done simultaneously. This method gathers true data race errors but is very hard to implement.

Another method used is the “lock-set” approach. This method checks all of the locks that are held currently by a thread, and if all the accesses do not have at least one common lock, the method sends a warning. This method has many false alarms since many variables nowadays are shared using other ways than locks or have very complex locking systems that lockset cannot understand.

Both these methods produce excessive overhead due to the fact that they have to check every single memory call at runtime. In the next section we will discuss how DataCollider uses a new way to check for data race errors, that produces barely any overhead.

=Contribution=
What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

Proving that there is a problem with classic race detectors: 
The main contribution that DataCollider provides is the unique idea of using hardware breakpoints in a data race detector. The question is why is a unique idea necessary. Why does DataCollider have to "reinvent the wheel". There has been a plethora of race condition testers invented in the last two decades, and almost all of the dynamic data race detectors can be lumped into three categories. They either implement lock-set, happens-before, or a hybrid of the two types of detection. The research team for DataCollider looked at several of these implementations of race condition testers to find ways of improving their own program, and found that there are major problems in the classic ways of detecting race conditions. 

Some of the programs that were referenced were: 

* Eraser: A Dynamic Data Race Detector for Multithreaded Programs 
* RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Tracking 
* PACER: Proportional Detection of Data Races 
* LiteRace: Effective Sampling for Lightweight Data-Race Detection 
* MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs 

Eraser: A Dynamic Data Race Detector for Multithreaded Programs[http://delivery.acm.org/10.1145/270000/265927/p391-savage.pdf?key1=265927&key2=7323721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437]
 
lock-set based reasoning 

Eraser, a data race detector programmed in 1997, was one of the earlier data race detectors invented. It may have been a useful and revolutionary program of its time, however, it uses very low level techniques compared to most data race detectors today. One of the reason why it is unsuccessful is because it only checks whether memory accesses use proper locking techniques. If a memory access is found that does not use a lock, then Eraser will report a data race. In many cases, the misuse of proper locking techniques is a conscious decision by the programmer, so Eraser will report many false positives. Modern locking systems are also very complicated and have several different kinds of locks for different situations. It is difficult for one program to handle upwards of 12 types of locks, especially when they are very complicated. This does not take into account all of the benign problems such as date of access variables. Locking systems are notorious for reporting false positives such as this, and it is near impossible to change the architecture of the algorithm to ignore benign cases. 

PACER: Proportional Detection of Data Races[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
Pacer, a happens-before data race detector, uses the FastTrack algorithm to detect data races. FastTrack uses vector-clocks to keep track of two potentially conflicting threads. If the two threads conflict, a data race is thrown, and the state of the program is saved. Pacer samples a percentage of each memory access, (from 1 to 3 percent) and runs the FastTrack algorithm on each thread that accesses that part of memory. Similar to Pacer, DataCollider samples a percentage of the program's memory accesses, but instead of using vector-clocks to catch the second thread, hardware breakpoints are used. Pacer runs with an overhead of approximately one to three times the speed of the original program because it requires a fair amount of processing power to maintain the vector-clocks. Hardware break points are considerably faster than vector-clocks, and as a consequence, DataCollider runs with less overhead than Pacer. 

LiteRace: Effective Sampling for Lightweight Data-Race Detection[http://www.cs.ucla.edu/~dlmarino/pubs/pldi09.pdf] 
happens-before reasoning 
LiteRace, similar to Pacer, samples a percentage of memory accesses from a program. Where it differs is the parts of memory that LiteRace samples the most. The "hot spot" regions of memory are ones that are accessed most by the program. Since they are accessed the most, chances are that they have already been successfully debugged, or if there are data races there, they are benign. LiteRace detects these areas in memory as hot spots, and samples them at a much lower rate. This improves LiteRace's chances of capturing a valid data race at a much lower sampling rate. Where DataCollider bests LiteRace is based on LiteRace's installing mechanism. LiteRace needs to be recompiled into the software it is trying to debug, whereas DataColleder's breakpoints do not require any code changes to the program. This is a major success for DataCollider because often third party testers do not have the source code for a program. 

RaceTrack: Efficient Detection of Data Race Conditions via Adaptive Trackings[http://delivery.acm.org/10.1145/1100000/1095832/p221-yu.pdf?key1=1095832&key2=8433721921&coll=DL&dl=ACM&CFID=116768888&CFTOKEN=55577437] 
combination of lock-set and happens-before reasoning 

RaceTrack uses a unique technique in order to detect data races. The program being debugged is run on top of RaceTrack as a virtual machine using the .NET framework, and it will examine all of the memory accesses that the program requests. As soon as suspicious behavior is exhibited, a warning is sent off to be later evaluated when the program terminates. RaceTrack uses this technique because several process intensive inspections of the state of the machine must be checked, and doing this on the fly is expensive. There are many problems with RaceTrack. It is very successful at detecting a vast percentage of data races, however, it has a high overhead and requires extreme amounts of memory. RaceTrack must save the state of the entire machine every time a warning is produced, and it also has to save each threads memory accesses to check which memory access "happened before". Since most warnings thrown are found to be benign, saving the state of the machine wastes computational power and memory. Long running programs also prove to be a problem, where the computer being debugged will run out of memory to store all of the warning states before the program terminates. It then will have to either increase overhead significantly to store the warnings on disk, or it will have to delete some warnings to make room for new ones. 

MultiRace: Efficient on-the-fly data race detection in multithreaded C++ programs[http://docs.google.com/viewer?a=v&q=cache:C8gWk-H3GmEJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.73.9551%26rep%3Drep1%26type%3Dpdf+MultiRace:+Efficient+on-the-fly+data+race+detection+in+multithreaded+C%2B%2B+programs&hl=en&gl=ca&pid=bl&srcid=ADGEESj1jYlzXMOwgbh7SVntUsHxVeI1TvmkU8Oslkm-L9gq-NIyglj5eD48rtkcziUQUynmjOmZojsyzw_tBRiLN6T0n6iiDZyUiFjBUfLijQbzNsRpDQCsMpn-xTiIqK2PUj4DXwoM&sig=AHIEtbRBHpMvb5fel3XOi5oASAogumY-rg] 
combination of lock-set and happens-before reasoning 

MultiRace is another hybrid style race condition debugger that uses two unique algorithms. The first algorithm, Djit is the happens-before iteration, which INSERT STUFF HERE. The second is an improved iteration of the lock-set algorithm. MultiRace is the most similar program to DataCollider in terms of their goals. Both strive to decrease overhead to near standard running times of the program itself, and to increase the program transparency for maximum user compatibility. MultiRace itself is several orders of magnitude more complicated than DataCollider, but since MultiRace hides its complexity from the user with transparency, it is still simple to use. It is arguable that MultiRace is superior for detecting races for C++ programs, however, MultiRace is not compatible with any other programming language. Since DataCollider uses hardware breakpoints, the coding language of the program is irrelevant. Also, since DataCollider avoids using both lock-set and happens before algorithms, it is versatile enough to even debug kernels. 

DataCollider is a very unique program. Most other dynamic race condition testers can be lumped into the three groups lock-set, happens-before, or hybrid. DataCollider, however, recognizes the errors of these styles of detection, and manages to avoid them completely. Even though there are issues with false positives and benign races, DataCollider provides very simple, versatile, and lightweight functionality in debugging a program. Future programs may take this unique style of race detection and add their own functionality to improve upon it. It could be that DataCollider could inspire a ground breaking solution to race conditions and how to detect them.

=Critique=

===Style===
This paper is well put together. It has a strong flow and there is nothing that seems out of place. The authors start with an introduction and then immediately identify key definitions that are used throughout the paper. In the second section which follows the introduction the authors identify the definition of a Data-Race as it relates to their paper. This is important since it is a key concept that is required to understand the entire paper. This definition is required because as the authors state there is no standard for exactly how to define a data-race.[1] In addition to important definitions any background information that is relevant to this paper is presented at the beginning. The key idea which the paper is based on in this case Data Collider and its implementation is explained. An evaluation and conclusion of Data Collider follow its description. The order of the sections makes sense and the author is not jumping around from one concept to another. The organization of the sections and information provided make the paper easy to follow and understand.

===Content===
=====Data Collider:=====
DataCollider seems like a very innovative piece of software. It’s new use of breakpoints inside kernel-space instead of lock-set or happens-before methods in user-mode let it check data race errors in the very kernel itself without producing as much overhead as its old contenders (it even finds data races for overheads less than five percent). One thing to note about DataCollider is that ninety percent of its output to the user is false alarms. This means that after running DataCollider, the user has to sift through all of the gathered data to find the ten percent of data that actually contains real data race errors.[1] The team of creator’s were able to create a to sort through all of the material it collects to only spit out the valuable information, but the creators still found some false alarms in the output . They have noted though that some users like to see the benign reports so that they can make design changes to their programs to make them more portable and scalable and therefore decided not to implement this. Even though DataCollider returns 90% false alarms the projects team have still been able to locate 25 errors in the Windows operating system. Of those 25 errors 12 have already been fixed.[1] This shows that DataCollider is an effective tool in locating data race errors within the kernel effectively enough that they can be corrected.

The overhead of any application running is very important to all users. The developers of DataCollider ran various tests to determine the overhead of running DataCollider based on the number of breakpoints. These results were included in the final paper. DataCollider has a low overall base overhead and it is only after 1000 breakpoints a second does the run time overhead increase drastically.[1] This adds to the effectiveness of DataCollider. Having a low overhead is very important to use of an application.

=References=
[1] Erickson, Musuvathi, Burchhardt, Olynyk, Effective Data-Race Detection for the Kernel, Microsoft Research, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Erickson.pdf PDF]

COMP 3000 Essay 2 2010 Question 6

2010-12-02T01:57:39Z

Abondio2: /* Research problem */

COMP 3000 Essay 2 2010 Question 6

2010-12-02T01:57:11Z

Abondio2: /* Research problem */

COMP 3000 Essay 2 2010 Question 6

2010-12-02T01:56:14Z

Abondio2: /* Background Concepts */

COMP 3000 Essay 2 2010 Question 6

2010-11-29T18:57:16Z

Abondio2: /* Background Concepts */

COMP 3000 Essay 2 2010 Question 6

2010-11-29T15:44:37Z

Abondio2: /* Background Concepts */

User:Abondio2

2010-11-23T15:35:02Z

Abondio2:

Sup. I'm Austin.

Y'all can email me at abondio2@connect.carleton.ca OR bondioal@msn.com

In other news:

[[user:tmalone|Trevor]] is a twit.

[[user:tpham|Tuan]] is an O-train.

Talk:COMP 3000 Essay 2 2010 Question 6

2010-11-23T15:33:19Z

Abondio2:

'''Actual group members'''

- Nicholas Shires nshires@connect.carleton.ca

- Andrew Zemancik andy.zemancik@gmail.com

- [[user:abondio2|Austin Bondio]] -> abondio2@connect.carleton.ca

- David Krutsko :: dkrutsko at connect.carleton.ca

If everyone could just post there names and contact information.--[[User:Azemanci|Azemanci]] 02:57, 15 November 2010 (UTC)

'''Who's Doing What'''

I'll do 'Research Problem' and help out with the 'Critique' section, the professor said that part was pretty big [[User:Nshires|Nshires]] 20:45, 21 November 2010 (UTC)

Ill do Contribution: [[User:Achamney|Achamney]] 03:50, 22 November 2010 (UTC)

I've noticed a couple things for controversy, even though its not my topic
The biggest thing i saw was that dataCollider reports non-erroneous operations 90% of the time. This makes the user have to sift through all of the reports to separate the problems from the benign races. [[User:Achamney|Achamney]] 17:18, 22 November 2010 (UTC)

Hey guys, sorry I'm late to the party. I'll get started with Background Concepts. - [[user:abondio2|Austin Bondio]] 15:33, 23 November 2010 (UTC)

Talk:COMP 3000 Essay 2 2010 Question 6

2010-11-23T15:33:02Z

Abondio2:

User:Abondio2

2010-11-16T16:15:01Z

Abondio2:

Sup. I'm Austin.

Contact information:

Name: Austin Bondio

email: abondio2@connect.carleton.ca OR bondioal@msn.com

[[user:tmalone|Trevor Malone]] is a twit.

User:Abondio2

2010-11-16T15:22:55Z

Abondio2:

Sup. I'm Austin.

Contact information:

Name: Austin Bondio

email: abondio2@connect.carleton.ca OR bondioal@msn.com

[[user:tmalone|Trevor Malone]] is a fool.

User:Tmalone

2010-11-16T15:21:39Z

Abondio2: Created page with "Sup. I'm Trevor. I'm super lame."

Sup. I'm Trevor. I'm super lame.

User:Abondio2

2010-11-16T15:19:56Z

Abondio2:

Sup. I'm Austin.

Contact information:

Name: Austin Bondio

email: abondio2@connect.carleton.ca OR bondioal@msn.com

[[user:tmalone|Trevor Malone]] is a bitch.

User:Abondio2

2010-11-16T15:17:51Z

Abondio2:

Sup. I'm Austin.

Contact information:
Name: Austin Bondio
email: abondio2@connect.carleton.ca OR bondioal@msn.com

[[user:tmalone|Trevor Malone]] is a bitch.

User:Abondio2

2010-11-16T15:17:42Z

Abondio2:

Sup. I'm Austin.

Contact information:
Name: Austin Bondio
email: abondio2@connect.carleton.ca OR bondioal@msn.com

[[user:tmalone:Trevor Malone]] is a bitch.

User:Abondio2

2010-11-16T15:17:12Z

Abondio2:

Sup. I'm Austin.

Contact information:
Name: Austin Bondio
email: abondio2@connect.carleton.ca OR bondioal@msn.com

Trevor Malone is a bitch.

User:Abondio2

2010-11-16T15:12:52Z

Abondio2: Created page with "Sup. I'm Austin."

Sup. I'm Austin.

Talk:COMP 3000 Essay 2 2010 Question 6

2010-11-16T15:12:30Z

Abondio2:

Talk:COMP 3000 Essay 2 2010 Question 6

2010-11-15T15:55:13Z

Abondio2:

'''Actual group members'''

- Nicholas Shires

- Andrew Zemancik andy.zemancik@gmail.com

- Austin Bondio -> abondio2@connect.carleton.ca

If everyone could just post there names and contact information.--[[User:Azemanci|Azemanci]] 02:57, 15 November 2010 (UTC)

COMP 3000 Essay 1 2010 Question 5

2010-10-15T04:52:07Z

Abondio2: /* Answer */

=Question=

Compare and contrast the evolution of the default BSD/FreeBSD and Linux schedulers.

=Answer=

(This work belongs to --[[User:Mike Preston|Mike Preston]] 23:01, 13 October 2010 (UTC))
(modified by --[[User:AbsMechanik|AbsMechanik]] 03:00, 14 October 2010 (UTC))

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with, all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system (Jensen: 1985).

(This work was done by [[User:abondio2|Austin Bondio]] 22:27, 13 October 2010 (UTC))
(Modified by [[User:Mike Preston|Mike Preston]] and [[User:Sschnei1|Sschnei1]])

There are several different algorithms which are utilized in different schedulers, but a few key algorithms are outlined below[http://joshaas.net/linux/linux_cpu_scheduler.pdf][http://www.sci.csueastbay.edu/~billard/cs4560/node6.html][http://www.articles.assyriancafe.com/documents/CPU_Scheduling.pdf]: <ul>

<li>First-Come, First-Serve (also known as FIFO): No multi-tasking. Processes are queued in the order they are called. A process gets full, uninterrupted use of the CPU until it has finished running.</li>

<li>Shortest Job First (similar to Shortest Remaining Time and/or Shortest Process Next): Limited multi-tasking. The CPU handles the easiest tasks first, and complex, time-consuming tasks are handled last.</li>

<li>Fixed-Priority Preemptive Scheduling: Greater multi-tasking. Processes are assigned priority levels which are independent of their complexity. High-priority processes can be completed quickly, while low-priority processes can take a long time as new, higher-priority processes arrive and interrupt them.</li>

<li>Round-Robin Scheduling: Fair multi-tasking. This method is similar in concept to First-Come, First-Serve, but all processes are assigned the same priority level; that is, every running process is given an equal share of CPU time.</li>

<li>Multilevel Feedback Queue Scheduling: Rule-based multi-tasking. It is a combination of First-Come, First-Serve, Round-Robin & Fixed-Priority Preemptive Scheduling, but processes are associated with groups that help determine how high their priorities are. For example, all I/O tasks get low priority since much time is spent waiting for the user to interact with the system.</li>

</ul>
There is no one "best" algorithm and most schedulers utilize a combination of the different algorithms, such as the Multi-Level Feedback Queue, which in one way or another was utilized in Win XP/Vista, Linux 2.5-2.6, FreeBSD, Mac OSX, NetBSD and Solaris. One thing for certain is that as computer hardware increases in complexity, such as multiple core CPUs (parallelization), and with the advent of more powerful embedded/mobile devices, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers:
<ul>
<li>'''The default BSD/FreeBSD scheduler'''</li>
<li>'''The Linux scheduler'''</li>
</ul>

==BSD/Free BSD Schedulers==

===Overview & History===

(This work belongs to --[[User:Mike Preston|Mike Preston]] 23:41, 13 October 2010 (UTC))
(modified by --[[User:AbsMechanik|AbsMechanik]] 13:21, 14 October 2010 (UTC))

The FreeBSD kernel originally inherited its scheduler from 4.3BSD which itself is a version of the UNIX scheduler [http://dspace.hil.unb.ca:8080/bitstream/handle/1882/100/roberson.pdf?sequence=1].
In order to understand the evolution of the FreeBSD scheduler it is important to understand the original purpose and limitations of the BSD scheduler. Like most traditional UNIX based systems, the BSD scheduler was designed to work on a single core computer system (with limited I/O) and handle relatively small numbers of processes. As a result, managing resources with an O(n) scheduler did not raise any performance issues. To ensure fairness, the scheduler would switch between processes every 0.1 second (100 milliseconds) in a round-robin format [http://www.thehackademy.net/madchat/ebooks/sched/FreeBSD/the_FreeBSD_process_scheduler.pdf].

As computer systems increased in complexity with the advent of multi-core CPUs and various new I/O devices, computer programs, naturally, increased in size and complexity to accommodate and manage the new hardware. With CPUs becoming more powerful (derived from Moore's Law [http://www.intel.com/technology/mooreslaw/]), the time taken to complete a process decreased significantly. This additional complexity highlighted the problem of having an O(n) scheduler for managing processes, as more items were added to the scheduling algorithm, the performance decreased. With symmetric multiprocessing (SMP) becoming inevitable (multi-core CPU's) a better scheduler was required. This was the driving force behind the creation of ULE for the FreeBSD.

===Older Versions===

(This work belongs to --[[User:Mike Preston|Mike Preston]] 00:02, 14 October 2010 (UTC))

The FreeBSD kernel originally used an enhanced version of the BSD scheduler. Specifically, the FreeBSD scheduler included classes of threads, which was a drastic change from the round-robin scheduling used in BSD. Initially, there were two types of thread class, real-time and idle [https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf], and the scheduler would give processor time to real-time threads first and the idle threads had to wait until there were no real-time threads that needed access to the processor.

To manage the various threads, FreeBSD had data structures called runqueues, into which the threads were placed. The scheduler would evaluate the runqueues based on priority from highest to lowest and execute the first thread of a non-empty runqueue it found. Once a non-empty runqueue was found, each thread in the runqueue would be assigned an equal value time slice of 0.1 seconds, a value that has not changed in over 20 years [http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156].

Unfortunately, like the BSD scheduler it was based on, the original FreeBSD scheduler was not built to handle Symmetric Multiprocessing (SMP) or Symmetric Multithreading (SMT) on multi-core systems. The scheduler was still limited by an O(n) algorithm, which could not efficiently handle the loads required on ever increasingly powerful systems.
To allow FreeBSD to operate with more modern computer systems, it became clear that a new scheduler would be required, and thus, the ULE scheduler was created.

===Current Version===

(This work is owned by --[[User:Mike Preston|Mike Preston]] 00:23, 14 October 2010 (UTC))

ULE was first implemented as part of an "experimental" process (by Jeff Roberson) in FreeBSD v5.1, before being added to the FreeBSD v5.3 development cycle. It was designed with modern hardware and requirements in mind and had proper support for Symmetric Multi-Processing (SMP) (and HTT), Symmetric Multi-Threading (SMT) platforms and handle heavy workloads. Primarily being an event-driven scheduler, ULE utilized a double-queue mechanism (borrowed from Linux's O(1) scheduler) for ensuring fairness. This mechanism is briefly outlined as follows[http://dspace.hil.unb.ca:8080/bitstream/handle/1882/100/roberson.pdf?sequence=1]:
<ul>
<li>Process threads are assigned in 2 queues, 'current' and 'next'</li>
<li>Each thread is either assigned to 'current' or 'next'</li>
<li>Process execution first begins in the 'current' queue (priority based)</li>
<li>Once 'current' is empty, the 'next' and 'current' queues are switched and the threads are executed in a similar manner (priority based)</li>
<li>All idle threads are stored in a third queue, 'idle' and is run only when 'current' and 'next' are empty</li>

</ul>
It has been implemented as the default scheduler since v7.1 onwards. ULE works really well on both single or uni-processor environments as well as multi-core environments. It prevents unnecessary CPU migration, while making good use of CPU resources. However, 2 key practical problems arose due to the double-queue mechanism.

==Linux Schedulers==

===Overview & History===

(This work belongs to [[User:Wlawrenc|Wesley Lawrence]])

The Linux scheduler has a large history of improvement, always aiming towards having a fair and fast scheduler. Various methods and concepts have been tried over different versions to get this fair and fast scheduler, including round robin, iterations, and queues. A quick read through of the history of Linux implies that firstly, equal and balanced use of the system was the goal of the scheduler, and once that was in place, speed was soon improved. Early schedulers did their best to give processes equal time and resources, but used a bit of extra time (in computer terms) to accomplish this. By Linux 2.6, after experimenting with different concepts, the scheduler was able to provide fair access and time, as well as run as quickly as possible, with various features to allow personal tweaking by the system user, or even the processes themselves.

(This work was done by [[User:abondio2|Austin Bondio]], modified by [[User:Sschnei1|Sschnei1]] )

The Linux kernel has undergone many changes over the decades since its original release as the UNIX operating system in 1969 [http://www.unix.com/whats-your-mind/110099-unix-40th-birthday.html](Stallings: 2009). The early versions had relatively inefficient schedulers, which operated in linear time with respect to the number of tasks to schedule; currently the Linux scheduler is able to operate in constant time, independent of the number of tasks being scheduled.

===Older Versions===

(This work belongs to [[User:Sschnei1|Sschnei1]])

In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be efficient in adding and removing processes. When Linux 2.2 was introduced, the scheduler was changed. It now used the idea of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was the first scheduler that supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its predecessors, but it also has more features. The running time was O(n) because it iterated over each task during a scheduling event. The scheduler divided tasks into epochs, allowing each task to execute up to its time slice. If a task did not use up its entire time slice, the remaining time was added to the next time slice to allow the task to execute longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability, and did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware architectures, such as multi-core processors.

===Current Version===

(This work was done by [[User:Sschnei1|Sschnei1]])

As of the Linux 2.6.23 introduction, the CFS (Completely Fair Scheduler) took its place in the kernel. CFS uses the idea of maintaining fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor. When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual runtime.

The model of how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently. Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu are stored in the right side of the tree. To keep fairness, the scheduler takes the left-most node from the tree. The scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable, the task then is inserted into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side of the tree are migrated to the left side to maintain fairness.

(This work was done by [[User:abondio2|Austin Bondio]])

Under a recent Linux system (version 2.6.35 or later), scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice( ). Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature, which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.

==Tabulated Results==

(Once I read/see some history on the BSD section above, I'll do the best comparison I can. I'm balancing 3000/3004 and other courses (like most of you), so I don't think I can research/write BSD and write the comparison, but I will try to help out as much as I can)

-- [[User:Wlawrenc|Wesley Lawrence]]

I've got this. Hopefully most of the sections I created properly answer the question. I'm still going to go over everyone's answers and keep in mind that wikipedia cannot be cited as a resource. --[[User:AbsMechanik|AbsMechanik]] 02:29, 14 October 2010 (UTC)

==Current Challenges==

== Resources ==

1. Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

2. Stallings, William, Operating Systems: Internals and Design Principles, Pearson Prentice Hall, 2009.

3. McKusick, M. K. and Neville-Neil, G. V. 2004. Thread Scheduling in FreeBSD 5.2. Queue 2, 7 (Oct. 2004), 58-64. DOI= http://doi.acm.org/10.1145/1035594.1035622

Talk:COMP 3000 Essay 1 2010 Question 5

2010-10-13T22:27:28Z

Abondio2: /* Essay Preview */

=Resources=

I just moved the Resources section to our discussion page --[[User:AbsMechanik|AbsMechanik]] 18:19, 13 October 2010 (UTC)

I found some resources, which might be useful to answer this question. As far as I know, FreeBSD uses a Multilevel feeback queue and Linux uses in the current version the completly fair scheduler.
 
-Some text about FreeBSD-scheduling http://www.informit.com/articles/article.aspx?p=366888&seqNum=4
 
-ULE Thread Scheduler: http://www.scribd.com/doc/3299978/ULE-Thread-Scheduler-for-FreeBSD
 
-Completly Fair Scheduler: http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt
 
-Brain Fuck Scheduler: http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler
 
-Sebastian

Also found a nice link with regards to the new Linux Scheduler for those interested:
http://www.ibm.com/developerworks/linux/library/l-scheduler/
 It is also referred to as the O(1) scheduler in algorithmic terms (CFS is O(log(n)) scheduler). Both have been in development by Ingo Molnár.
-Abhinav

Some more resources; 
http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html (includes history of Linux scheduler from 1.2 to 2.6) 
http://my.opera.com/blu3c4t/blog/show.dml/1531517 
-Wes 

 
Information on changes to the O(1) scheduler: 
"Linux Kernel Documentation" 
http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt 
 
General information on Linux Job Scheduling: 
"Linux Job Scheduling | Linux Journal" 
http://www.linuxjournal.com/article/4087 
 
Scheduling on multi-core Linux machines: 
"Node affine NUMA scheduler for Linux" 
http://home.arcor.de/efocht/sched/ 
 
More on Linux process scheduling: 
"Understanding the Linux kernel" 
http://oreilly.com/catalog/linuxkernel/chapter/ch10.html 
 
FreeBSD thread scheduling: 
"InformIT: FreeBSD Process Management" 
http://www.informit.com/articles/article.aspx?p=366888&seqNum=4 
- Austin Bondio

=Discussion=

From what I have been reading the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not how it ran, the scheduler algorithm operated at O(n) time. As a result as more tasks were added, the scheduler would become slower. In addition to this, a single data structure was used to manage all processors of a system which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1), constant, time as well as addressing the multiprocessing issues.

It appears as though BSD also had issues with task management however for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes the ability to run in kernel mode at the same time advancing the success of symmetric multiprocessing.

--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

Hi Mike,
Can you give any names for the schedulers you are talking about? I think it is easier to distinguish by names and not by the algorithm. It is just a suggestion!

The O(1) scheduler was replaced in the linux kernel 2.6.23 with the CFS (completly fair scheduler) which runs in O(log n). Also, the schedulers before CFS were based on a Multilevel feedback queue algorithm, which was changed in 2.6.23. It is not based on a queue as most schedulers, but on a red-black-tree to implement a timeline to make future predictions. The aim of CFS is to maximize CPU utilization and maximizing the performance at the same time.

In FreeBSD 5, the ULE Scheduler was introduced but disabled by default in the early versions, which eventually changed later on. ULE has better support for SMP and SMT, thus allowing it to improve overall performance in uniprocessors and multiprocessors. And it has a constant execution time, regardless of the amount of threads.

More information can be found here:
 
http://lwn.net/Articles/230574/
 
http://lwn.net/Articles/240474/

[[User:Sschnei1|Sschnei1]] 16:33, 3 October 2010 (UTC) or Sebastian

Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:
1. requires a scheduler to be selected at the time the kernel is built.
2. all calls to scheduling code are resolved at compile time...this means that the overhead of indirect function calls for scheduling decisions is eliminated.
3. kernels up to FreeBSD 5.1 used this scheduler, but from 5.2 onward the ULE scheduler used.

Original FreeBSD Scheduler:
1. threads assigned a scheduling priority which determines which 'run queue' the thread is placed in.
2. the system scans the run queues in order of highest priority to lowest priority and executes the first thread of the first non-empty run queue it finds.
3. once a non-empty queue is found the system spends an equal time slice on each thread in the run queue. This time slice is 0.1 seconds and this value has not changed in over 20 years. A shorter time slice would cause overhead due to switching between threads too often thus reducing productivity.
4. the article then provides detailed formulae on how to determine thread priority which is out of our scope for this project.

ULE Scheduler
- overhaul of Original BSD scheduler to:
1. support symmetric multiprocessing (SMP)
2. support symmetric multithreading (SMT) on multi-core systems
3. improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.

Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
 
Some interesting pieces about the Linux scheduler include:
1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
2. Scheduler uses 2 priority queue arrays to achieve fairness. Does this by giving each thread a time slice and a priority and executes each thread in order of highest priority to lowest. Threads that exhaust their time slice are moved to the exhausted queue and threads with remaining time slices are kept in the active queue.
3. Time slices are DYNAMIC, larger time slices are given to higher priority tasks, smaller slices to lower priority tasks.
 
I thought the dynamic time slice piece was of particular interest as you would think this would lead to starvation situations if the priority was high enough on one or multiple threads.
--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/).
--[[User:AbsMechanik|AbsMechanik]] 02:32, 4 October 2010 (UTC)

I have seen this website and thought it is useful. Do you think this is enough on research to write an essay or are we going to do some more research?
--[[User:Sschnei1|Sschnei1]] 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --[[User:AbsMechanik|AbsMechanik]] 20:08, 4 October 2010 (UTC)

So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--[[User:Mike Preston|Mike Preston]] 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for LINUX. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- [[User:Sschnei1|Sschnei1]] 19:27, 9 October 2010 (UTC)

Maybe we should write down our contact emails and names to write down who would like to write what part.

Another suggestion is that someone should read over the text and compare it to the references posted in the "Sources" section and check if someone is doing plagiarism.

Sebastian Schneider - sebastian@gamersblog.ca

Hi, here's a little forward on schedulers in relation to types of threads I've composed based off of one of my sources, I'm not sure if its necessary since there is one Mike typed below, but here it just for you guys to examine:

Threads that perform a lot of I/O require a fast response time to keep input and output devices busy, but need little CPU time. On the other hand, compute-bound threads need to receive a lot of CPU time to finish their work, but have no requirement for fast response time. Other threads lie somewhere in between, with periods of I/O punctuated by periods of computation, and thus have requirements that vary over time. A well-designed scheduler should be able accommodate threads with all these requirements simultaneously.

Also: as Mike said earlier about BSD's issue with locking mechanisms, should I go into greater detail about that, or just include a little, few sentence description of the issue? I've found a source for what I think is what he was referring to: http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc
--[[User:CFaibish|CFaibish]] 17:54, 13 October 2010 (UTC)

= Essay Preview =

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--[[User:AbsMechanik|AbsMechanik]] 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--[[User:Mike Preston|Mike Preston]] 03:48, 7 October 2010 (UTC)

In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be
efficient in adding and removing processes. When Linux 2.2 was introduced, the scheduler was changed. It now used the idea
of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was
the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its
predecessors, but it also has more features. The running time was O(n) because it iterated over each task during a
scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task
did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute
longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and
did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware
architectures, such as multi-core processors.

Linux-2.6 introduced another scheduler up to Linux 2.6.23. Before Linux 2.6.23 an O(1) scheduler was used. It needed the
same amount of time for each task to execute, independent of how big the tasks were.It kept track of the tasks in a
running queue. The scheduler offered much more scalability. To determine if a task was I/O bound or processor bound the
scheduler used interactive metrics with numerous heuristics. Because the code was difficult to manage and the most part of
the code was to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which is the current
scheduler in the actual Linux versions.

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining
fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor.
When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep
fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual
runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing
and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently.
Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu
are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The
scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted
into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side
of the tree are migrated to the left side to maintain fairness. [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html]

-- [[User:Sschnei1|Sschnei1]] 19:26, 9 October 2010 (UTC)

I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice( ). Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.[http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt]

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.[http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726]

Here's something I put into the Linux: Overview section:

The Linux kernel has undergone many changes over the decades since its original release as the UNIX operating system in 1969.[http://www.unix.com/whats-your-mind/110099-unix-40th-birthday.html] The early versions had relatively inefficient schedulers which operated in linear time with respect to the number of tasks to schedule; currently the Linux scheduler is able to operate in constant time, independent of the number of tasks being scheduled.

There are five basic algorithms for allocating CPU time[http://en.wikipedia.org/wiki/Scheduling_(computing)#Scheduling_disciplines]: <ul>
<li>First-in, First-out: No multi-tasking. Processes are queued in the order they are called. A process gets full, uninterrupted use of the CPU until it has finished running.</li>
<li>Shortest Time Remaining: Limited multi-tasking. The CPU handles the easiest tasks first, and complex, time-consuming tasks are handled last.</li>
<li>Fixed-Priority Preemptive Scheduling: Greater multi-tasking. Processes are assigned priority levels which are independent of their complexity. High-priority processes can be completed quickly, while low-priority processes can take a long time as new, higher-priority processes arrive and interrupt them.</li>
<li>Round-Robin Scheduling: Fair multi-tasking. This method is similar in concept to Fixed-Priority Preemptive Scheduling, but all processes are assigned the same priority level; that is, every running process is given an equal share of CPU time.</li>
<li>Multilevel Queue Scheduling: Rule-based multi-taksing. This method is also similar to Fixed-Priority Preemptive Scheduling, but processes are associated with groups that help determine how high their priorities are. For example, all I/O tasks get low priority since much time is spent waiting for the user to interact with the system.</li>
</ul>

-- [[User:abondio2|Austin Bondio]] Last edit: 22:27, 13 October 2010 (UTC)

I'm writing on a contrast of the CFS scheduler right now, please don't edit it.

In contrast the the O(1) scheduler, CFS realizes the model of a scheduler which can execute precise on real multitasking on real hardware. Precise multitasking means that each process can run at equal speed. If 4 processes are running at the same time, CFS assigns 25% of the CPU time to each process. On real hardware, only one task can be executed at a time and other tasks have to wait, which gives the running tasks an unfair amount of CPU time.

To avoid an unfair balance over the processes, CFS has a wait run-time for each process. CFS tries to pick the process with the highest wait run-time value. To provide a real multitasking, CFS splits up the CPU time between running processes.

Processes are not stored in a run queue, such in the O(1) scheduler, but in a self-balancing red-black tree, where self-balancing means that the task with the highest need for CPU time is stored in the most left node. Tasks with a lower need for CPU time are stored on the right side of the Tree, where tasks with a higher need for CPU time are stored on the left side. The task on the left side is picked by the scheduler and put in a virtual runtime. If the process is ready to run, it is given CPU time to run. The tree re-balances itself and new tasks can be taken out by the CPU.

CFS is designed in a way that it does not need timeslicing and still provide most performance with as much cpu utilization. This is due to the nanosecond granularity, which removes the need for jiffies or other HZ details. [http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt]

-- [[User:Sschnei1|Sschnei1]] 16:32, 13 October 2010 (UTC)

Hey guys, sorry I've been non-existent for the past little bit, here's what I've done so far. I've been going through stuff on the 4BSD and ULE schedulers, here's what I have so far:

In order for FreeBSD to function, it requires a scheduler to be selected at the time the kernel is built. Also, all calls to scheduling code are resolved at compile time, meaning that the overhead of indirect function calls for scheduling decisions is eliminated.

[3] The 4BSD scheduler was a general-purpose scheduler. Its primary goal was to balance threads’ different scheduling requirements. FreeBSD's time-share-scheduling algorithm is based on multilevel feedback queues. The system adjusts the priority of a thread dynamically to reflect resource requirements and the amount consumed by the thread. Based on the thread's priority, it gets moved between run queues. When a new thread attains a higher priority than the currently running one, the system immediately switches to the new thread, if it's in user mode. Otherwise, the system switches as soon as the current thread leaves the kernel. The system scans the run queues in order of highest to lowest priority, and executes the first thread of the first non-empty run queue it finds. The system tailors it's short-term scheduling algorithm to favor user-interactive jobs by raising the priority of threads waiting for I/O for one or more seconds, and by lowering the priority of threads that hog up significant amounts of CPU time.

[1] In older BSD systems, (and I mean old, as in 20 or so years ago), a 1 second quantum was used for the round-robin scheduling algorithm. Later, in BSD 4.2, it did rescheduling every 0.1 seconds, and priority re-computation every second, and these values haven’t changed since. Round-robin scheduling is done by a timeout mechanism, which informs the clock interrupt driver to call a certain system routine after a specified interval. The subroutine to be called, in this case, causes the rescheduling and then resubmits a timeout to call itself again 0.1 sec later. The priority re-computation is also timed by a subroutine that resubmits a timeout for itself.

The ULE Scheduler was first introduced in FreeBSD 5, however disabled by default in favor of the default 4BSD scheduler. It was not until FreeBSD 7.1 that the ULE scheduler became the new default. The ULE scheduler was an overhaul of the original scheduler, and allowed it support for symmetric multiprocessing (SMP), support for symmetric multithreading (SMT) on multi-core systems, and improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.
<more to come>

1 = http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node46.html
2 = http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc
3 = McKusick, M. K. and Neville-Neil, G. V. 2004. Thread Scheduling in FreeBSD 5.2. Queue 2, 7 (Oct. 2004), 58-64. DOI= http://doi.acm.org/10.1145/1035594.1035622

Notes: Lots of this is just paraphrasing stuff you guys said in the discussion section. In terms of citations, should it be a superscripted citation next to the fact snippet we used, or should it just be a list of sources at the bottom?

--[[User:CFaibish|CFaibish]] 17:51, 13 October 2010 (UTC)

I would agree with putting superscripted citations that refer to the Sources section? How do they do it in the wikipedia?
-- [[User:Sschnei1|Sschnei1]] 18:52, 13 October 2010 (UTC)

Superscripted citations seems to be the best way to do it. If we cite URLs throughout the essay, it will be much harder to read. To put in a superscripted citation, enclose the URL of your source in square brackets.

Also, who here is actually good at writing, and can compile all these paragraphs into one nice essay for us? I think we have enough raw information here, it's just a matter of putting it all together now.

-- [[abondio2|Austin Bondio]] 20:39, 13 October 2010 (UTC)

Abhinav is putting something together right now on the main page.

-- [[User:Sschnei1|Sschnei1]] 20:56, 13 October 2010 (UTC)

= Sources =

[1] http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

[2] http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

[3] http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726

[4] http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt

Talk:COMP 3000 Essay 1 2010 Question 5

2010-10-13T22:19:59Z

Abondio2: /* Essay Preview */

=Resources=

I just moved the Resources section to our discussion page --[[User:AbsMechanik|AbsMechanik]] 18:19, 13 October 2010 (UTC)

I found some resources, which might be useful to answer this question. As far as I know, FreeBSD uses a Multilevel feeback queue and Linux uses in the current version the completly fair scheduler.
 
-Some text about FreeBSD-scheduling http://www.informit.com/articles/article.aspx?p=366888&seqNum=4
 
-ULE Thread Scheduler: http://www.scribd.com/doc/3299978/ULE-Thread-Scheduler-for-FreeBSD
 
-Completly Fair Scheduler: http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt
 
-Brain Fuck Scheduler: http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler
 
-Sebastian

Also found a nice link with regards to the new Linux Scheduler for those interested:
http://www.ibm.com/developerworks/linux/library/l-scheduler/
 It is also referred to as the O(1) scheduler in algorithmic terms (CFS is O(log(n)) scheduler). Both have been in development by Ingo Molnár.
-Abhinav

Some more resources; 
http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html (includes history of Linux scheduler from 1.2 to 2.6) 
http://my.opera.com/blu3c4t/blog/show.dml/1531517 
-Wes 

 
Information on changes to the O(1) scheduler: 
"Linux Kernel Documentation" 
http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt 
 
General information on Linux Job Scheduling: 
"Linux Job Scheduling | Linux Journal" 
http://www.linuxjournal.com/article/4087 
 
Scheduling on multi-core Linux machines: 
"Node affine NUMA scheduler for Linux" 
http://home.arcor.de/efocht/sched/ 
 
More on Linux process scheduling: 
"Understanding the Linux kernel" 
http://oreilly.com/catalog/linuxkernel/chapter/ch10.html 
 
FreeBSD thread scheduling: 
"InformIT: FreeBSD Process Management" 
http://www.informit.com/articles/article.aspx?p=366888&seqNum=4 
- Austin Bondio

=Discussion=

From what I have been reading the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not how it ran, the scheduler algorithm operated at O(n) time. As a result as more tasks were added, the scheduler would become slower. In addition to this, a single data structure was used to manage all processors of a system which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1), constant, time as well as addressing the multiprocessing issues.

It appears as though BSD also had issues with task management however for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes the ability to run in kernel mode at the same time advancing the success of symmetric multiprocessing.

--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

Hi Mike,
Can you give any names for the schedulers you are talking about? I think it is easier to distinguish by names and not by the algorithm. It is just a suggestion!

The O(1) scheduler was replaced in the linux kernel 2.6.23 with the CFS (completly fair scheduler) which runs in O(log n). Also, the schedulers before CFS were based on a Multilevel feedback queue algorithm, which was changed in 2.6.23. It is not based on a queue as most schedulers, but on a red-black-tree to implement a timeline to make future predictions. The aim of CFS is to maximize CPU utilization and maximizing the performance at the same time.

In FreeBSD 5, the ULE Scheduler was introduced but disabled by default in the early versions, which eventually changed later on. ULE has better support for SMP and SMT, thus allowing it to improve overall performance in uniprocessors and multiprocessors. And it has a constant execution time, regardless of the amount of threads.

More information can be found here:
 
http://lwn.net/Articles/230574/
 
http://lwn.net/Articles/240474/

[[User:Sschnei1|Sschnei1]] 16:33, 3 October 2010 (UTC) or Sebastian

Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:
1. requires a scheduler to be selected at the time the kernel is built.
2. all calls to scheduling code are resolved at compile time...this means that the overhead of indirect function calls for scheduling decisions is eliminated.
3. kernels up to FreeBSD 5.1 used this scheduler, but from 5.2 onward the ULE scheduler used.

Original FreeBSD Scheduler:
1. threads assigned a scheduling priority which determines which 'run queue' the thread is placed in.
2. the system scans the run queues in order of highest priority to lowest priority and executes the first thread of the first non-empty run queue it finds.
3. once a non-empty queue is found the system spends an equal time slice on each thread in the run queue. This time slice is 0.1 seconds and this value has not changed in over 20 years. A shorter time slice would cause overhead due to switching between threads too often thus reducing productivity.
4. the article then provides detailed formulae on how to determine thread priority which is out of our scope for this project.

ULE Scheduler
- overhaul of Original BSD scheduler to:
1. support symmetric multiprocessing (SMP)
2. support symmetric multithreading (SMT) on multi-core systems
3. improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.

Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
 
Some interesting pieces about the Linux scheduler include:
1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
2. Scheduler uses 2 priority queue arrays to achieve fairness. Does this by giving each thread a time slice and a priority and executes each thread in order of highest priority to lowest. Threads that exhaust their time slice are moved to the exhausted queue and threads with remaining time slices are kept in the active queue.
3. Time slices are DYNAMIC, larger time slices are given to higher priority tasks, smaller slices to lower priority tasks.
 
I thought the dynamic time slice piece was of particular interest as you would think this would lead to starvation situations if the priority was high enough on one or multiple threads.
--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/).
--[[User:AbsMechanik|AbsMechanik]] 02:32, 4 October 2010 (UTC)

I have seen this website and thought it is useful. Do you think this is enough on research to write an essay or are we going to do some more research?
--[[User:Sschnei1|Sschnei1]] 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --[[User:AbsMechanik|AbsMechanik]] 20:08, 4 October 2010 (UTC)

So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--[[User:Mike Preston|Mike Preston]] 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for LINUX. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- [[User:Sschnei1|Sschnei1]] 19:27, 9 October 2010 (UTC)

Maybe we should write down our contact emails and names to write down who would like to write what part.

Another suggestion is that someone should read over the text and compare it to the references posted in the "Sources" section and check if someone is doing plagiarism.

Sebastian Schneider - sebastian@gamersblog.ca

Hi, here's a little forward on schedulers in relation to types of threads I've composed based off of one of my sources, I'm not sure if its necessary since there is one Mike typed below, but here it just for you guys to examine:

Threads that perform a lot of I/O require a fast response time to keep input and output devices busy, but need little CPU time. On the other hand, compute-bound threads need to receive a lot of CPU time to finish their work, but have no requirement for fast response time. Other threads lie somewhere in between, with periods of I/O punctuated by periods of computation, and thus have requirements that vary over time. A well-designed scheduler should be able accommodate threads with all these requirements simultaneously.

Also: as Mike said earlier about BSD's issue with locking mechanisms, should I go into greater detail about that, or just include a little, few sentence description of the issue? I've found a source for what I think is what he was referring to: http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc
--[[User:CFaibish|CFaibish]] 17:54, 13 October 2010 (UTC)

= Essay Preview =

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--[[User:AbsMechanik|AbsMechanik]] 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--[[User:Mike Preston|Mike Preston]] 03:48, 7 October 2010 (UTC)

In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be
efficient in adding and removing processes. When Linux 2.2 was introduced, the scheduler was changed. It now used the idea
of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was
the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its
predecessors, but it also has more features. The running time was O(n) because it iterated over each task during a
scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task
did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute
longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and
did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware
architectures, such as multi-core processors.

Linux-2.6 introduced another scheduler up to Linux 2.6.23. Before Linux 2.6.23 an O(1) scheduler was used. It needed the
same amount of time for each task to execute, independent of how big the tasks were.It kept track of the tasks in a
running queue. The scheduler offered much more scalability. To determine if a task was I/O bound or processor bound the
scheduler used interactive metrics with numerous heuristics. Because the code was difficult to manage and the most part of
the code was to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which is the current
scheduler in the actual Linux versions.

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining
fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor.
When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep
fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual
runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing
and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently.
Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu
are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The
scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted
into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side
of the tree are migrated to the left side to maintain fairness. [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html]

-- [[User:Sschnei1|Sschnei1]] 19:26, 9 October 2010 (UTC)

I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice( ). Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.[http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt]

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.[http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726]

Here's something I put into the Linux: Overview section:
The Linux kernel has undergone many changes over the decades since its original release as the UNIX operating system in 1969.[http://www.unix.com/whats-your-mind/110099-unix-40th-birthday.html] The early versions had relatively inefficient schedulers which operated in linear time with respect to the number of tasks to schedule; currently the Linux scheduler is able to operate in constant time, independent of the number of tasks being scheduled.

There are five basic algorithms for allocating CPU time[http://en.wikipedia.org/wiki/Scheduling_(computing)#Scheduling_disciplines]: <ul>
<li>First-in, First-out: No multi-tasking. Processes are queued in the order they are called. A process gets full, uninterrupted use of the CPU until it has finished running.</li>
<li>Shortest Time Remaining: Limited multi-tasking. The CPU handles the easiest tasks first, and complex, time-consuming tasks are handled last.</li>
<li>Fixed-Priority Preemptive Scheduling: Greater multi-tasking. Processes are assigned priority levels which are independent of their complexity. High-priority processes can be completed quickly, while low-priority processes can take a long time as new, higher-priority processes arrive and interrupt them.</li>
<li>Round-Robin Scheduling: Fair multi-tasking. This method is similar in concept to Fixed-Priority Preemptive Scheduling, but all processes are assigned the same priority level; that is, every running process is given an equal share of CPU time.</li>
<li>Multilevel Queue Scheduling: Rule-based multi-taksing. This method is also similar to Fixed-Priority Preemptive Scheduling, but processes are associated with groups that help determine how high their priorities are. For example, all I/O tasks get low priority since much time is spent waiting for the user to interact with the system.</li>
</ul>

-- [[User:abondio2|Austin Bondio]] Last edit: 22:19, 13 October 2010 (UTC)

I'm writing on a contrast of the CFS scheduler right now, please don't edit it.

In contrast the the O(1) scheduler, CFS realizes the model of a scheduler which can execute precise on real multitasking on real hardware. Precise multitasking means that each process can run at equal speed. If 4 processes are running at the same time, CFS assigns 25% of the CPU time to each process. On real hardware, only one task can be executed at a time and other tasks have to wait, which gives the running tasks an unfair amount of CPU time.

To avoid an unfair balance over the processes, CFS has a wait run-time for each process. CFS tries to pick the process with the highest wait run-time value. To provide a real multitasking, CFS splits up the CPU time between running processes.

Processes are not stored in a run queue, such in the O(1) scheduler, but in a self-balancing red-black tree, where self-balancing means that the task with the highest need for CPU time is stored in the most left node. Tasks with a lower need for CPU time are stored on the right side of the Tree, where tasks with a higher need for CPU time are stored on the left side. The task on the left side is picked by the scheduler and put in a virtual runtime. If the process is ready to run, it is given CPU time to run. The tree re-balances itself and new tasks can be taken out by the CPU.

CFS is designed in a way that it does not need timeslicing and still provide most performance with as much cpu utilization. This is due to the nanosecond granularity, which removes the need for jiffies or other HZ details. [http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt]

-- [[User:Sschnei1|Sschnei1]] 16:32, 13 October 2010 (UTC)

Hey guys, sorry I've been non-existent for the past little bit, here's what I've done so far. I've been going through stuff on the 4BSD and ULE schedulers, here's what I have so far:

In order for FreeBSD to function, it requires a scheduler to be selected at the time the kernel is built. Also, all calls to scheduling code are resolved at compile time, meaning that the overhead of indirect function calls for scheduling decisions is eliminated.

[3] The 4BSD scheduler was a general-purpose scheduler. Its primary goal was to balance threads’ different scheduling requirements. FreeBSD's time-share-scheduling algorithm is based on multilevel feedback queues. The system adjusts the priority of a thread dynamically to reflect resource requirements and the amount consumed by the thread. Based on the thread's priority, it gets moved between run queues. When a new thread attains a higher priority than the currently running one, the system immediately switches to the new thread, if it's in user mode. Otherwise, the system switches as soon as the current thread leaves the kernel. The system scans the run queues in order of highest to lowest priority, and executes the first thread of the first non-empty run queue it finds. The system tailors it's short-term scheduling algorithm to favor user-interactive jobs by raising the priority of threads waiting for I/O for one or more seconds, and by lowering the priority of threads that hog up significant amounts of CPU time.

[1] In older BSD systems, (and I mean old, as in 20 or so years ago), a 1 second quantum was used for the round-robin scheduling algorithm. Later, in BSD 4.2, it did rescheduling every 0.1 seconds, and priority re-computation every second, and these values haven’t changed since. Round-robin scheduling is done by a timeout mechanism, which informs the clock interrupt driver to call a certain system routine after a specified interval. The subroutine to be called, in this case, causes the rescheduling and then resubmits a timeout to call itself again 0.1 sec later. The priority re-computation is also timed by a subroutine that resubmits a timeout for itself.

The ULE Scheduler was first introduced in FreeBSD 5, however disabled by default in favor of the default 4BSD scheduler. It was not until FreeBSD 7.1 that the ULE scheduler became the new default. The ULE scheduler was an overhaul of the original scheduler, and allowed it support for symmetric multiprocessing (SMP), support for symmetric multithreading (SMT) on multi-core systems, and improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.
<more to come>

1 = http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node46.html
2 = http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc
3 = McKusick, M. K. and Neville-Neil, G. V. 2004. Thread Scheduling in FreeBSD 5.2. Queue 2, 7 (Oct. 2004), 58-64. DOI= http://doi.acm.org/10.1145/1035594.1035622

Notes: Lots of this is just paraphrasing stuff you guys said in the discussion section. In terms of citations, should it be a superscripted citation next to the fact snippet we used, or should it just be a list of sources at the bottom?

--[[User:CFaibish|CFaibish]] 17:51, 13 October 2010 (UTC)

I would agree with putting superscripted citations that refer to the Sources section? How do they do it in the wikipedia?
-- [[User:Sschnei1|Sschnei1]] 18:52, 13 October 2010 (UTC)

Superscripted citations seems to be the best way to do it. If we cite URLs throughout the essay, it will be much harder to read. To put in a superscripted citation, enclose the URL of your source in square brackets.

Also, who here is actually good at writing, and can compile all these paragraphs into one nice essay for us? I think we have enough raw information here, it's just a matter of putting it all together now.

-- [[abondio2|Austin Bondio]] 20:39, 13 October 2010 (UTC)

Abhinav is putting something together right now on the main page.

-- [[User:Sschnei1|Sschnei1]] 20:56, 13 October 2010 (UTC)

= Sources =

[1] http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

[2] http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

[3] http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726

[4] http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt

COMP 3000 Essay 1 2010 Question 5

2010-10-13T22:17:19Z

Abondio2: /* Linux Schedulers */

=Question=

Compare and contrast the evolution of the default BSD/FreeBSD and Linux schedulers.

=Answer=

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system. As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

==BSD/Free BSD Schedulers==

===Overview & History===

===Older Versions===

===Current Version===

==Linux Schedulers==

(Note to the other group members: Feel free to modify or remove anything I post here. I'm just trying to piece together what you've all posted in the discussion section and turn it into a single paragraph. You know. Just to see how it looks.)

-- [[User:abondio2|Austin Bondio]] Last edit: 22:17, 13 October 2010 (UTC)

(Same for me, I'm trying to put together the overview/history and work on the comparison section of the essay, all based off the history you guys give. If I miss anything or get anything wrong, feel free to correct.)

-- [[User:Wlawrenc|Wesley Lawrence]]

===Overview & History===

(This work belongs to [[User:Wlawrenc|Wesley Lawrence]])

The Linux scheduler has a large history of improvement, always aiming towards having a fair and fast scheduler. Various methods and concepts have been tried over different versions to get this fair and fast scheduler, including round robins, iteration, and queues. A quick read through of the history of Linux implies that firstly, equal and balanced use of the system was the goal of the scheduler, and once that was in place, speed was soon improved. Early schedulers did their best to give processes equal time and resources, but used a bit of extra time (in computer terms) to accomplish this. By Linux 2.6, after experimenting with different concepts, the scheduler was able to provide fair access and time, as well as run as quickly as possible, with various features to allow personal tweaking by the system user, or even the processes themselves.

(This work was done by [[User:abondio2|Austin Bondio]])

The Linux kernel has undergone many changes over the decades since its original release as the UNIX operating system in 1969.[http://www.unix.com/whats-your-mind/110099-unix-40th-birthday.html] The early versions had relatively inefficient schedulers which operated in linear time with respect to the number of tasks to schedule; currently the Linux scheduler is able to operate in constant time, independent of the number of tasks being scheduled.

There are five basic algorithms for allocating CPU time[http://en.wikipedia.org/wiki/Scheduling_(computing)#Scheduling_disciplines]: <ul>
<li>First-in, First-out: No multi-tasking. Processes are queued in the order they are called. A process gets full, uninterrupted use of the CPU until it has finished running.</li>
<li>Shortest Time Remaining: Limited multi-tasking. The CPU handles the easiest tasks first, and complex, time-consuming tasks are handled last.</li>
<li>Fixed-Priority Preemptive Scheduling: Greater multi-tasking. Processes are assigned priority levels which are independent of their complexity. High-priority processes can be completed quickly, while low-priority processes can take a long time as new, higher-priority processes arrive and interrupt them.</li>
<li>Round-Robin Scheduling: Fair multi-tasking. This method is similar in concept to Fixed-Priority Preemptive Scheduling, but all processes are assigned the same priority level; that is, every running process is given an equal share of CPU time.</li>
<li>Multilevel Queue Scheduling: Rule-based multi-taksing. This method is also similar to Fixed-Priority Preemptive Scheduling, but processes are associated with groups that help determine how high their priorities are. For example, all I/O tasks get low priority since much time is spent waiting for the user to interact with the system.</li>
</ul>

===Older Versions===

(This work belongs to [[User:Sschnei1|Sschnei1]])

In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be efficient in adding and removing processes. When Linux 2.2 was introduced, the scheduler was changed. It now used the idea of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its predecessors, but it also has more features. The running time was O(n) because it iterated over each task during a scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware architectures, such as multi-core processors.

===Current Version===

(This work was done by [[User:Sschnei1|Sschnei1]])

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor. When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently. Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side of the tree are migrated to the left side to maintain fairness.

(This work was done by [[User:abondio2|Austin Bondio]])

Under a recent Linux system (version 2.6.35 or later), scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice( ). Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.

==Tabulated Results==

(Once I read/see some history on the BSD section above, I'll do the best comparison I can. I'm balancing 3000/3004 and other courses (like most of you), so I don't think I can research/write BSD and write the comparison, but I will try to help out as much as I can)

-- [[User:Wlawrenc|Wesley Lawrence]]

==Current Challenges==

COMP 3000 Essay 1 2010 Question 5

2010-10-13T22:16:04Z

Abondio2: /* Linux Schedulers */

=Question=

Compare and contrast the evolution of the default BSD/FreeBSD and Linux schedulers.

=Answer=

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system. As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

==BSD/Free BSD Schedulers==

===Overview & History===

===Older Versions===

===Current Version===

==Linux Schedulers==

(Note to the other group members: Feel free to modify or remove anything I post here. I'm just trying to piece together what you've all posted in the discussion section and turn it into a single paragraph. You know. Just to see how it looks.)

-- [[User:abondio2|Austin Bondio]] Last edit: 21:15, 13 October 2010 (UTC)

(Same for me, I'm trying to put together the overview/history and work on the comparison section of the essay, all based off the history you guys give. If I miss anything or get anything wrong, feel free to correct.)

-- [[User:Wlawrenc|Wesley Lawrence]]

===Overview & History===

(This work belongs to [[User:Wlawrenc|Wesley Lawrence]])

The Linux scheduler has a large history of improvement, always aiming towards having a fair and fast scheduler. Various methods and concepts have been tried over different versions to get this fair and fast scheduler, including round robins, iteration, and queues. A quick read through of the history of Linux implies that firstly, equal and balanced use of the system was the goal of the scheduler, and once that was in place, speed was soon improved. Early schedulers did their best to give processes equal time and resources, but used a bit of extra time (in computer terms) to accomplish this. By Linux 2.6, after experimenting with different concepts, the scheduler was able to provide fair access and time, as well as run as quickly as possible, with various features to allow personal tweaking by the system user, or even the processes themselves.

(This work was done by [[User:abondio2|Austin Bondio]])

The Linux kernel has undergone many changes over the decades since its original release as the UNIX operating system in 1969.[http://www.unix.com/whats-your-mind/110099-unix-40th-birthday.html] The early versions had relatively inefficient schedulers which operated in linear time with respect to the number of tasks to schedule; currently the Linux scheduler is able to operate in constant time, independent of the number of tasks being scheduled.

There are five basic algorithms for allocating CPU time[http://en.wikipedia.org/wiki/Scheduling_(computing)#Scheduling_disciplines]: <ul>
<li>First-in, First-out: No multi-tasking. Processes are queued in the order they are called. A process gets full, uninterrupted use of the CPU until it has finished running.</li>
<li>Shortest Time Remaining: Limited multi-tasking. The CPU handles the easiest tasks first, and complex, time-consuming tasks are handled last.</li>
<li>Fixed-Priority Preemptive Scheduling: Greater multi-tasking. Processes are assigned priority levels which are independent of their complexity. High-priority processes can be completed quickly, while low-priority processes can take a long time as new, higher-priority processes arrive and interrupt them.</li>
<li>Round-Robin Scheduling: Fair multi-tasking. This method is similar in concept to Fixed-Priority Preemptive Scheduling, but all processes are assigned the same priority level; that is, every running process is given an equal share of CPU time.</li>
<li>Multilevel Queue Scheduling: Rule-based multi-taksing. This method is also similar to Fixed-Priority Preemptive Scheduling, but processes are associated with groups that help determine how high their priorities are. For example, all I/O tasks get low priority since much time is spent waiting for the user to interact with the system.</li>
</ul>

===Older Versions===

(This work belongs to [[User:Sschnei1|Sschnei1]])

In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be efficient in adding and removing processes. When Linux 2.2 was introduced, the scheduler was changed. It now used the idea of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its predecessors, but it also has more features. The running time was O(n) because it iterated over each task during a scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware architectures, such as multi-core processors.

===Current Version===

(This work was done by [[User:Sschnei1|Sschnei1]])

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor. When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently. Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side of the tree are migrated to the left side to maintain fairness.

(This work was done by [[User:abondio2|Austin Bondio]])

Under a recent Linux system (version 2.6.35 or later), scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice( ). Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.

==Tabulated Results==

(Once I read/see some history on the BSD section above, I'll do the best comparison I can. I'm balancing 3000/3004 and other courses (like most of you), so I don't think I can research/write BSD and write the comparison, but I will try to help out as much as I can)

-- [[User:Wlawrenc|Wesley Lawrence]]

==Current Challenges==

COMP 3000 Essay 1 2010 Question 5

2010-10-13T21:15:44Z

Abondio2: /* Linux Schedulers */

COMP 3000 Essay 1 2010 Question 5

2010-10-13T21:13:48Z

Abondio2: /* Current Version */

COMP 3000 Essay 1 2010 Question 5

2010-10-13T21:06:35Z

Abondio2: /* Older Versions */

COMP 3000 Essay 1 2010 Question 5

2010-10-13T20:57:57Z

Abondio2: /* Linux Schedulers */

Talk:COMP 3000 Essay 1 2010 Question 5

2010-10-13T20:39:44Z

Abondio2: /* Essay Preview */

=Resources=

I just moved the Resources section to our discussion page --[[User:AbsMechanik|AbsMechanik]] 18:19, 13 October 2010 (UTC)

I found some resources, which might be useful to answer this question. As far as I know, FreeBSD uses a Multilevel feeback queue and Linux uses in the current version the completly fair scheduler.
 
-Some text about FreeBSD-scheduling http://www.informit.com/articles/article.aspx?p=366888&seqNum=4
 
-ULE Thread Scheduler: http://www.scribd.com/doc/3299978/ULE-Thread-Scheduler-for-FreeBSD
 
-Completly Fair Scheduler: http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt
 
-Brain Fuck Scheduler: http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler
 
-Sebastian

Also found a nice link with regards to the new Linux Scheduler for those interested:
http://www.ibm.com/developerworks/linux/library/l-scheduler/
 It is also referred to as the O(1) scheduler in algorithmic terms (CFS is O(log(n)) scheduler). Both have been in development by Ingo Molnár.
-Abhinav

Some more resources; 
http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html (includes history of Linux scheduler from 1.2 to 2.6) 
http://my.opera.com/blu3c4t/blog/show.dml/1531517 
-Wes 

 
Information on changes to the O(1) scheduler: 
"Linux Kernel Documentation" 
http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt 
 
General information on Linux Job Scheduling: 
"Linux Job Scheduling | Linux Journal" 
http://www.linuxjournal.com/article/4087 
 
Scheduling on multi-core Linux machines: 
"Node affine NUMA scheduler for Linux" 
http://home.arcor.de/efocht/sched/ 
 
More on Linux process scheduling: 
"Understanding the Linux kernel" 
http://oreilly.com/catalog/linuxkernel/chapter/ch10.html 
 
FreeBSD thread scheduling: 
"InformIT: FreeBSD Process Management" 
http://www.informit.com/articles/article.aspx?p=366888&seqNum=4 
- Austin Bondio

=Discussion=

From what I have been reading the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not how it ran, the scheduler algorithm operated at O(n) time. As a result as more tasks were added, the scheduler would become slower. In addition to this, a single data structure was used to manage all processors of a system which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1), constant, time as well as addressing the multiprocessing issues.

It appears as though BSD also had issues with task management however for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes the ability to run in kernel mode at the same time advancing the success of symmetric multiprocessing.

--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

Hi Mike,
Can you give any names for the schedulers you are talking about? I think it is easier to distinguish by names and not by the algorithm. It is just a suggestion!

The O(1) scheduler was replaced in the linux kernel 2.6.23 with the CFS (completly fair scheduler) which runs in O(log n). Also, the schedulers before CFS were based on a Multilevel feedback queue algorithm, which was changed in 2.6.23. It is not based on a queue as most schedulers, but on a red-black-tree to implement a timeline to make future predictions. The aim of CFS is to maximize CPU utilization and maximizing the performance at the same time.

In FreeBSD 5, the ULE Scheduler was introduced but disabled by default in the early versions, which eventually changed later on. ULE has better support for SMP and SMT, thus allowing it to improve overall performance in uniprocessors and multiprocessors. And it has a constant execution time, regardless of the amount of threads.

More information can be found here:
 
http://lwn.net/Articles/230574/
 
http://lwn.net/Articles/240474/

[[User:Sschnei1|Sschnei1]] 16:33, 3 October 2010 (UTC) or Sebastian

Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:
1. requires a scheduler to be selected at the time the kernel is built.
2. all calls to scheduling code are resolved at compile time...this means that the overhead of indirect function calls for scheduling decisions is eliminated.
3. kernels up to FreeBSD 5.1 used this scheduler, but from 5.2 onward the ULE scheduler used.

Original FreeBSD Scheduler:
1. threads assigned a scheduling priority which determines which 'run queue' the thread is placed in.
2. the system scans the run queues in order of highest priority to lowest priority and executes the first thread of the first non-empty run queue it finds.
3. once a non-empty queue is found the system spends an equal time slice on each thread in the run queue. This time slice is 0.1 seconds and this value has not changed in over 20 years. A shorter time slice would cause overhead due to switching between threads too often thus reducing productivity.
4. the article then provides detailed formulae on how to determine thread priority which is out of our scope for this project.

ULE Scheduler
- overhaul of Original BSD scheduler to:
1. support symmetric multiprocessing (SMP)
2. support symmetric multithreading (SMT) on multi-core systems
3. improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.

Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
 
Some interesting pieces about the Linux scheduler include:
1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
2. Scheduler uses 2 priority queue arrays to achieve fairness. Does this by giving each thread a time slice and a priority and executes each thread in order of highest priority to lowest. Threads that exhaust their time slice are moved to the exhausted queue and threads with remaining time slices are kept in the active queue.
3. Time slices are DYNAMIC, larger time slices are given to higher priority tasks, smaller slices to lower priority tasks.
 
I thought the dynamic time slice piece was of particular interest as you would think this would lead to starvation situations if the priority was high enough on one or multiple threads.
--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/).
--[[User:AbsMechanik|AbsMechanik]] 02:32, 4 October 2010 (UTC)

I have seen this website and thought it is useful. Do you think this is enough on research to write an essay or are we going to do some more research?
--[[User:Sschnei1|Sschnei1]] 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --[[User:AbsMechanik|AbsMechanik]] 20:08, 4 October 2010 (UTC)

So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--[[User:Mike Preston|Mike Preston]] 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for LINUX. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- [[User:Sschnei1|Sschnei1]] 19:27, 9 October 2010 (UTC)

Maybe we should write down our contact emails and names to write down who would like to write what part.

Another suggestion is that someone should read over the text and compare it to the references posted in the "Sources" section and check if someone is doing plagiarism.

Sebastian Schneider - sebastian@gamersblog.ca

Hi, here's a little forward on schedulers in relation to types of threads I've composed based off of one of my sources, I'm not sure if its necessary since there is one Mike typed below, but here it just for you guys to examine:

Threads that perform a lot of I/O require a fast response time to keep input and output devices busy, but need little CPU time. On the other hand, compute-bound threads need to receive a lot of CPU time to finish their work, but have no requirement for fast response time. Other threads lie somewhere in between, with periods of I/O punctuated by periods of computation, and thus have requirements that vary over time. A well-designed scheduler should be able accommodate threads with all these requirements simultaneously.

Also: as Mike said earlier about BSD's issue with locking mechanisms, should I go into greater detail about that, or just include a little, few sentence description of the issue? I've found a source for what I think is what he was referring to: http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc
--[[User:CFaibish|CFaibish]] 17:54, 13 October 2010 (UTC)

= Essay Preview =

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--[[User:AbsMechanik|AbsMechanik]] 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--[[User:Mike Preston|Mike Preston]] 03:48, 7 October 2010 (UTC)

In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be
efficient in adding and removing processes. When Linux 2.2 was introduced, the scheduler was changed. It now used the idea
of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was
the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its
predecessors, but it also has more features. The running time was O(n) because it iterated over each task during a
scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task
did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute
longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and
did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware
architectures, such as multi-core processors.

Linux-2.6 introduced another scheduler up to Linux 2.6.23. Before Linux 2.6.23 an O(1) scheduler was used. It needed the
same amount of time for each task to execute, independent of how big the tasks were.It kept track of the tasks in a
running queue. The scheduler offered much more scalability. To determine if a task was I/O bound or processor bound the
scheduler used interactive metrics with numerous heuristics. Because the code was difficult to manage and the most part of
the code was to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which is the current
scheduler in the actual Linux versions.

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining
fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor.
When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep
fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual
runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing
and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently.
Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu
are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The
scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted
into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side
of the tree are migrated to the left side to maintain fairness. [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html]

-- [[User:Sschnei1|Sschnei1]] 19:26, 9 October 2010 (UTC)

I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice( ). Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.[http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt]

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.[http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726]

-- [[User:abondio2|Austin Bondio]] Last edit: 18:39, 12 October 2010 (UTC)

I'm writing on a contrast of the CFS scheduler right now, please don't edit it.

In contrast the the O(1) scheduler, CFS realizes the model of a scheduler which can execute precise on real multitasking on real hardware. Precise multitasking means that each process can run at equal speed. If 4 processes are running at the same time, CFS assigns 25% of the CPU time to each process. On real hardware, only one task can be executed at a time and other tasks have to wait, which gives the running tasks an unfair amount of CPU time.

To avoid an unfair balance over the processes, CFS has a wait run-time for each process. CFS tries to pick the process with the highest wait run-time value. To provide a real multitasking, CFS splits up the CPU time between running processes.

Processes are not stored in a run queue, such in the O(1) scheduler, but in a self-balancing red-black tree, where self-balancing means that the task with the highest need for CPU time is stored in the most left node. Tasks with a lower need for CPU time are stored on the right side of the Tree, where tasks with a higher need for CPU time are stored on the left side. The task on the left side is picked by the scheduler and put in a virtual runtime. If the process is ready to run, it is given CPU time to run. The tree re-balances itself and new tasks can be taken out by the CPU.

CFS is designed in a way that it does not need timeslicing and still provide most performance with as much cpu utilization. This is due to the nanosecond granularity, which removes the need for jiffies or other HZ details. [http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt]

-- [[User:Sschnei1|Sschnei1]] 16:32, 13 October 2010 (UTC)

Hey guys, sorry I've been non-existent for the past little bit, here's what I've done so far. I've been going through stuff on the 4BSD and ULE schedulers, here's what I have so far:

In order for FreeBSD to function, it requires a scheduler to be selected at the time the kernel is built. Also, all calls to scheduling code are resolved at compile time, meaning that the overhead of indirect function calls for scheduling decisions is eliminated.

[3] The 4BSD scheduler was a general-purpose scheduler. Its primary goal was to balance threads’ different scheduling requirements. FreeBSD's time-share-scheduling algorithm is based on multilevel feedback queues. The system adjusts the priority of a thread dynamically to reflect resource requirements and the amount consumed by the thread. Based on the thread's priority, it gets moved between run queues. When a new thread attains a higher priority than the currently running one, the system immediately switches to the new thread, if it's in user mode. Otherwise, the system switches as soon as the current thread leaves the kernel. The system scans the run queues in order of highest to lowest priority, and executes the first thread of the first non-empty run queue it finds. The system tailors it's short-term scheduling algorithm to favor user-interactive jobs by raising the priority of threads waiting for I/O for one or more seconds, and by lowering the priority of threads that hog up significant amounts of CPU time.

[1] In older BSD systems, (and I mean old, as in 20 or so years ago), a 1 second quantum was used for the round-robin scheduling algorithm. Later, in BSD 4.2, it did rescheduling every 0.1 seconds, and priority re-computation every second, and these values haven’t changed since. Round-robin scheduling is done by a timeout mechanism, which informs the clock interrupt driver to call a certain system routine after a specified interval. The subroutine to be called, in this case, causes the rescheduling and then resubmits a timeout to call itself again 0.1 sec later. The priority re-computation is also timed by a subroutine that resubmits a timeout for itself.

The ULE Scheduler was first introduced in FreeBSD 5, however disabled by default in favor of the default 4BSD scheduler. It was not until FreeBSD 7.1 that the ULE scheduler became the new default. The ULE scheduler was an overhaul of the original scheduler, and allowed it support for symmetric multiprocessing (SMP), support for symmetric multithreading (SMT) on multi-core systems, and improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.
<more to come>

1 = http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node46.html
2 = http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc
3 = McKusick, M. K. and Neville-Neil, G. V. 2004. Thread Scheduling in FreeBSD 5.2. Queue 2, 7 (Oct. 2004), 58-64. DOI= http://doi.acm.org/10.1145/1035594.1035622

Notes: Lots of this is just paraphrasing stuff you guys said in the discussion section. In terms of citations, should it be a superscripted citation next to the fact snippet we used, or should it just be a list of sources at the bottom?

--[[User:CFaibish|CFaibish]] 17:51, 13 October 2010 (UTC)

I would agree with putting superscripted citations that refer to the Sources section? How do they do it in the wikipedia?
-- [[User:Sschnei1|Sschnei1]] 18:52, 13 October 2010 (UTC)

Superscripted citations seems to be the best way to do it. If we cite URLs throughout the essay, it will be much harder to read. To put in a superscripted citation, enclose the URL of your source in square brackets.

Also, who here is actually good at writing, and can compile all these paragraphs into one nice essay for us? I think we have enough raw information here, it's just a matter of putting it all together now.

-- [[abondio2|Austin Bondio]] 20:39, 13 October 2010 (UTC)

= Sources =

[1] http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

[2] http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

[3] http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726

[4] http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt

Talk:COMP 3000 Essay 1 2010 Question 5

2010-10-12T18:39:57Z

Abondio2: /* Essay Preview */

=Discussion=

From what I have been reading the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not how it ran, the scheduler algorithm operated at O(n) time. As a result as more tasks were added, the scheduler would become slower. In addition to this, a single data structure was used to manage all processors of a system which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1), constant, time as well as addressing the multiprocessing issues.

It appears as though BSD also had issues with task management however for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes the ability to run in kernel mode at the same time advancing the success of symmetric multiprocessing.

--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

Hi Mike,
Can you give any names for the schedulers you are talking about? I think it is easier to distinguish by names and not by the algorithm. It is just a suggestion!

The O(1) scheduler was replaced in the linux kernel 2.6.23 with the CFS (completly fair scheduler) which runs in O(log n). Also, the schedulers before CFS were based on a Multilevel feedback queue algorithm, which was changed in 2.6.23. It is not based on a queue as most schedulers, but on a red-black-tree to implement a timeline to make future predictions. The aim of CFS is to maximize CPU utilization and maximizing the performance at the same time.

In FreeBSD 5, the ULE Scheduler was introduced but disabled by default in the early versions, which eventually changed later on. ULE has better support for SMP and SMT, thus allowing it to improve overall performance in uniprocessors and multiprocessors. And it has a constant execution time, regardless of the amount of threads.

More information can be found here:
 
http://lwn.net/Articles/230574/
 
http://lwn.net/Articles/240474/

[[User:Sschnei1|Sschnei1]] 16:33, 3 October 2010 (UTC) or Sebastian

Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:
1. requires a scheduler to be selected at the time the kernel is built.
2. all calls to scheduling code are resolved at compile time...this means that the overhead of indirect function calls for scheduling decisions is eliminated.
3. kernels up to FreeBSD 5.1 used this scheduler, but from 5.2 onward the ULE scheduler used.

Original FreeBSD Scheduler:
1. threads assigned a scheduling priority which determines which 'run queue' the thread is placed in.
2. the system scans the run queues in order of highest priority to lowest priority and executes the first thread of the first non-empty run queue it finds.
3. once a non-empty queue is found the system spends an equal time slice on each thread in the run queue. This time slice is 0.1 seconds and this value has not changed in over 20 years. A shorter time slice would cause overhead due to switching between threads too often thus reducing productivity.
4. the article then provides detailed formulae on how to determine thread priority which is out of our scope for this project.

ULE Scheduler
- overhaul of Original BSD scheduler to:
1. support symmetric multiprocessing (SMP)
2. support symmetric multithreading (SMT) on multi-core systems
3. improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.

Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
 
Some interesting pieces about the Linux scheduler include:
1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
2. Scheduler uses 2 priority queue arrays to achieve fairness. Does this by giving each thread a time slice and a priority and executes each thread in order of highest priority to lowest. Threads that exhaust their time slice are moved to the exhausted queue and threads with remaining time slices are kept in the active queue.
3. Time slices are DYNAMIC, larger time slices are given to higher priority tasks, smaller slices to lower priority tasks.
 
I thought the dynamic time slice piece was of particular interest as you would think this would lead to starvation situations if the priority was high enough on one or multiple threads.
--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/).
--[[User:AbsMechanik|AbsMechanik]] 02:32, 4 October 2010 (UTC)

I have seen this website and thought it is useful. Do you think this is enough on research to write an essay or are we going to do some more research?
--[[User:Sschnei1|Sschnei1]] 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --[[User:AbsMechanik|AbsMechanik]] 20:08, 4 October 2010 (UTC)

So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--[[User:Mike Preston|Mike Preston]] 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for LINUX. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- [[User:Sschnei1|Sschnei1]] 19:27, 9 October 2010 (UTC)

= Essay Preview =

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--[[User:AbsMechanik|AbsMechanik]] 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--[[User:Mike Preston|Mike Preston]] 03:48, 7 October 2010 (UTC)

2 In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be
efficient in adding and removing processes.2 When Linux 2.2 was introduced, the scheduler was changed. It now used the idea
of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was
the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its
predecessors, but it also has more features.2 The running time was O(n) because it iterated over each task during a
scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task
did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute
longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and
did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware
architectures, such as multi-core processors.

Linux-2.6 introduced another scheduler up to Linux 2.6.23. Before Linux 2.6.23 an O(1) scheduler was used. It needed the
same amount of time for each task to execute, independent of how big the tasks were.2 It kept track of the tasks in a
running queue. The scheduler offered much more scalability. To determine if a task was I/O bound or processor bound the
scheduler used interactive metrics with numerous heuristics. Because the code was difficult to manage and the most part of
the code was to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which is the current
scheduler in the actual Linux versions.

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining
fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor.
When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep
fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual
runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing
and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently.
Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu
are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The
scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted
into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side
of the tree are migrated to the left side to maintain fairness.

2 M. Tim Jones, Consultant Engineer, Emulex

-- [[User:Sschnei1|Sschnei1]] 19:26, 9 October 2010 (UTC)

I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice( ). Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.[http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt]

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.[http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726]

Sources:

[1] http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

[2] http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726

-- [[User:abondio2|Austin Bondio]] Last edit: 14:39, 12 October 2010

Talk:COMP 3000 Essay 1 2010 Question 5

2010-10-12T18:16:09Z

Abondio2: /* Essay Preview */

=Discussion=

From what I have been reading the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not how it ran, the scheduler algorithm operated at O(n) time. As a result as more tasks were added, the scheduler would become slower. In addition to this, a single data structure was used to manage all processors of a system which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1), constant, time as well as addressing the multiprocessing issues.

It appears as though BSD also had issues with task management however for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes the ability to run in kernel mode at the same time advancing the success of symmetric multiprocessing.

--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

Hi Mike,
Can you give any names for the schedulers you are talking about? I think it is easier to distinguish by names and not by the algorithm. It is just a suggestion!

The O(1) scheduler was replaced in the linux kernel 2.6.23 with the CFS (completly fair scheduler) which runs in O(log n). Also, the schedulers before CFS were based on a Multilevel feedback queue algorithm, which was changed in 2.6.23. It is not based on a queue as most schedulers, but on a red-black-tree to implement a timeline to make future predictions. The aim of CFS is to maximize CPU utilization and maximizing the performance at the same time.

In FreeBSD 5, the ULE Scheduler was introduced but disabled by default in the early versions, which eventually changed later on. ULE has better support for SMP and SMT, thus allowing it to improve overall performance in uniprocessors and multiprocessors. And it has a constant execution time, regardless of the amount of threads.

More information can be found here:
 
http://lwn.net/Articles/230574/
 
http://lwn.net/Articles/240474/

[[User:Sschnei1|Sschnei1]] 16:33, 3 October 2010 (UTC) or Sebastian

Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:
1. requires a scheduler to be selected at the time the kernel is built.
2. all calls to scheduling code are resolved at compile time...this means that the overhead of indirect function calls for scheduling decisions is eliminated.
3. kernels up to FreeBSD 5.1 used this scheduler, but from 5.2 onward the ULE scheduler used.

Original FreeBSD Scheduler:
1. threads assigned a scheduling priority which determines which 'run queue' the thread is placed in.
2. the system scans the run queues in order of highest priority to lowest priority and executes the first thread of the first non-empty run queue it finds.
3. once a non-empty queue is found the system spends an equal time slice on each thread in the run queue. This time slice is 0.1 seconds and this value has not changed in over 20 years. A shorter time slice would cause overhead due to switching between threads too often thus reducing productivity.
4. the article then provides detailed formulae on how to determine thread priority which is out of our scope for this project.

ULE Scheduler
- overhaul of Original BSD scheduler to:
1. support symmetric multiprocessing (SMP)
2. support symmetric multithreading (SMT) on multi-core systems
3. improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.

Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
 
Some interesting pieces about the Linux scheduler include:
1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
2. Scheduler uses 2 priority queue arrays to achieve fairness. Does this by giving each thread a time slice and a priority and executes each thread in order of highest priority to lowest. Threads that exhaust their time slice are moved to the exhausted queue and threads with remaining time slices are kept in the active queue.
3. Time slices are DYNAMIC, larger time slices are given to higher priority tasks, smaller slices to lower priority tasks.
 
I thought the dynamic time slice piece was of particular interest as you would think this would lead to starvation situations if the priority was high enough on one or multiple threads.
--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/).
--[[User:AbsMechanik|AbsMechanik]] 02:32, 4 October 2010 (UTC)

I have seen this website and thought it is useful. Do you think this is enough on research to write an essay or are we going to do some more research?
--[[User:Sschnei1|Sschnei1]] 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --[[User:AbsMechanik|AbsMechanik]] 20:08, 4 October 2010 (UTC)

So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--[[User:Mike Preston|Mike Preston]] 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for LINUX. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- [[User:Sschnei1|Sschnei1]] 19:27, 9 October 2010 (UTC)

= Essay Preview =

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--[[User:AbsMechanik|AbsMechanik]] 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--[[User:Mike Preston|Mike Preston]] 03:48, 7 October 2010 (UTC)

2 In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be
efficient in adding and removing processes.2 When Linux 2.2 was introduced, the scheduler was changed. It now used the idea
of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was
the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its
predecessors, but it also has more features.2 The running time was O(n) because it iterated over each task during a
scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task
did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute
longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and
did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware
architectures, such as multi-core processors.

Linux-2.6 introduced another scheduler up to Linux 2.6.23. Before Linux 2.6.23 an O(1) scheduler was used. It needed the
same amount of time for each task to execute, independent of how big the tasks were.2 It kept track of the tasks in a
running queue. The scheduler offered much more scalability. To determine if a task was I/O bound or processor bound the
scheduler used interactive metrics with numerous heuristics. Because the code was difficult to manage and the most part of
the code was to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which is the current
scheduler in the actual Linux versions.

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining
fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor.
When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep
fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual
runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing
and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently.
Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu
are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The
scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted
into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side
of the tree are migrated to the left side to maintain fairness.

2 M. Tim Jones, Consultant Engineer, Emulex

-- [[User:Sschnei1|Sschnei1]] 19:26, 9 October 2010 (UTC)

I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices, which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.[http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt]

Sources:

[1] http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

-- [[User:abondio2|Austin Bondio]] Last edit: 14:07, 12 October 2010

Talk:COMP 3000 Essay 1 2010 Question 5

2010-10-12T18:07:19Z

Abondio2: /* Essay Preview */

=Discussion=

From what I have been reading the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not how it ran, the scheduler algorithm operated at O(n) time. As a result as more tasks were added, the scheduler would become slower. In addition to this, a single data structure was used to manage all processors of a system which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1), constant, time as well as addressing the multiprocessing issues.

It appears as though BSD also had issues with task management however for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes the ability to run in kernel mode at the same time advancing the success of symmetric multiprocessing.

--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

Hi Mike,
Can you give any names for the schedulers you are talking about? I think it is easier to distinguish by names and not by the algorithm. It is just a suggestion!

The O(1) scheduler was replaced in the linux kernel 2.6.23 with the CFS (completly fair scheduler) which runs in O(log n). Also, the schedulers before CFS were based on a Multilevel feedback queue algorithm, which was changed in 2.6.23. It is not based on a queue as most schedulers, but on a red-black-tree to implement a timeline to make future predictions. The aim of CFS is to maximize CPU utilization and maximizing the performance at the same time.

In FreeBSD 5, the ULE Scheduler was introduced but disabled by default in the early versions, which eventually changed later on. ULE has better support for SMP and SMT, thus allowing it to improve overall performance in uniprocessors and multiprocessors. And it has a constant execution time, regardless of the amount of threads.

More information can be found here:
 
http://lwn.net/Articles/230574/
 
http://lwn.net/Articles/240474/

[[User:Sschnei1|Sschnei1]] 16:33, 3 October 2010 (UTC) or Sebastian

Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:
1. requires a scheduler to be selected at the time the kernel is built.
2. all calls to scheduling code are resolved at compile time...this means that the overhead of indirect function calls for scheduling decisions is eliminated.
3. kernels up to FreeBSD 5.1 used this scheduler, but from 5.2 onward the ULE scheduler used.

Original FreeBSD Scheduler:
1. threads assigned a scheduling priority which determines which 'run queue' the thread is placed in.
2. the system scans the run queues in order of highest priority to lowest priority and executes the first thread of the first non-empty run queue it finds.
3. once a non-empty queue is found the system spends an equal time slice on each thread in the run queue. This time slice is 0.1 seconds and this value has not changed in over 20 years. A shorter time slice would cause overhead due to switching between threads too often thus reducing productivity.
4. the article then provides detailed formulae on how to determine thread priority which is out of our scope for this project.

ULE Scheduler
- overhaul of Original BSD scheduler to:
1. support symmetric multiprocessing (SMP)
2. support symmetric multithreading (SMT) on multi-core systems
3. improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.

Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
 
Some interesting pieces about the Linux scheduler include:
1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
2. Scheduler uses 2 priority queue arrays to achieve fairness. Does this by giving each thread a time slice and a priority and executes each thread in order of highest priority to lowest. Threads that exhaust their time slice are moved to the exhausted queue and threads with remaining time slices are kept in the active queue.
3. Time slices are DYNAMIC, larger time slices are given to higher priority tasks, smaller slices to lower priority tasks.
 
I thought the dynamic time slice piece was of particular interest as you would think this would lead to starvation situations if the priority was high enough on one or multiple threads.
--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/).
--[[User:AbsMechanik|AbsMechanik]] 02:32, 4 October 2010 (UTC)

I have seen this website and thought it is useful. Do you think this is enough on research to write an essay or are we going to do some more research?
--[[User:Sschnei1|Sschnei1]] 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --[[User:AbsMechanik|AbsMechanik]] 20:08, 4 October 2010 (UTC)

So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--[[User:Mike Preston|Mike Preston]] 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for LINUX. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- [[User:Sschnei1|Sschnei1]] 19:27, 9 October 2010 (UTC)

= Essay Preview =

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--[[User:AbsMechanik|AbsMechanik]] 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--[[User:Mike Preston|Mike Preston]] 03:48, 7 October 2010 (UTC)

2 In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be
efficient in adding and removing processes.2 When Linux 2.2 was introduced, the scheduler was changed. It now used the idea
of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was
the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its
predecessors, but it also has more features.2 The running time was O(n) because it iterated over each task during a
scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task
did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute
longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and
did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware
architectures, such as multi-core processors.

Linux-2.6 introduced another scheduler up to Linux 2.6.23. Before Linux 2.6.23 an O(1) scheduler was used. It needed the
same amount of time for each task to execute, independent of how big the tasks were.2 It kept track of the tasks in a
running queue. The scheduler offered much more scalability. To determine if a task was I/O bound or processor bound the
scheduler used interactive metrics with numerous heuristics. Because the code was difficult to manage and the most part of
the code was to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which is the current
scheduler in the actual Linux versions.

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining
fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor.
When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep
fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual
runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing
and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently.
Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu
are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The
scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted
into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side
of the tree are migrated to the left side to maintain fairness.

2 M. Tim Jones, Consultant Engineer, Emulex

-- [[User:Sschnei1|Sschnei1]] 19:26, 9 October 2010 (UTC)

I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices, which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.

Sources:

http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

-- [[User:abondio2|Austin Bondio]] Last edit: 14:07, 12 October 2010

Talk:COMP 3000 Essay 1 2010 Question 5

2010-10-12T15:03:41Z

Abondio2: /* Essay Preview */

=Discussion=

From what I have been reading the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not how it ran, the scheduler algorithm operated at O(n) time. As a result as more tasks were added, the scheduler would become slower. In addition to this, a single data structure was used to manage all processors of a system which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1), constant, time as well as addressing the multiprocessing issues.

It appears as though BSD also had issues with task management however for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes the ability to run in kernel mode at the same time advancing the success of symmetric multiprocessing.

--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

Hi Mike,
Can you give any names for the schedulers you are talking about? I think it is easier to distinguish by names and not by the algorithm. It is just a suggestion!

The O(1) scheduler was replaced in the linux kernel 2.6.23 with the CFS (completly fair scheduler) which runs in O(log n). Also, the schedulers before CFS were based on a Multilevel feedback queue algorithm, which was changed in 2.6.23. It is not based on a queue as most schedulers, but on a red-black-tree to implement a timeline to make future predictions. The aim of CFS is to maximize CPU utilization and maximizing the performance at the same time.

In FreeBSD 5, the ULE Scheduler was introduced but disabled by default in the early versions, which eventually changed later on. ULE has better support for SMP and SMT, thus allowing it to improve overall performance in uniprocessors and multiprocessors. And it has a constant execution time, regardless of the amount of threads.

More information can be found here:
 
http://lwn.net/Articles/230574/
 
http://lwn.net/Articles/240474/

[[User:Sschnei1|Sschnei1]] 16:33, 3 October 2010 (UTC) or Sebastian

Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:
1. requires a scheduler to be selected at the time the kernel is built.
2. all calls to scheduling code are resolved at compile time...this means that the overhead of indirect function calls for scheduling decisions is eliminated.
3. kernels up to FreeBSD 5.1 used this scheduler, but from 5.2 onward the ULE scheduler used.

Original FreeBSD Scheduler:
1. threads assigned a scheduling priority which determines which 'run queue' the thread is placed in.
2. the system scans the run queues in order of highest priority to lowest priority and executes the first thread of the first non-empty run queue it finds.
3. once a non-empty queue is found the system spends an equal time slice on each thread in the run queue. This time slice is 0.1 seconds and this value has not changed in over 20 years. A shorter time slice would cause overhead due to switching between threads too often thus reducing productivity.
4. the article then provides detailed formulae on how to determine thread priority which is out of our scope for this project.

ULE Scheduler
- overhaul of Original BSD scheduler to:
1. support symmetric multiprocessing (SMP)
2. support symmetric multithreading (SMT) on multi-core systems
3. improve the scheduler algorithm to ensure execution is no longer limited by the number of threads in the system.

Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
 
Some interesting pieces about the Linux scheduler include:
1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
2. Scheduler uses 2 priority queue arrays to achieve fairness. Does this by giving each thread a time slice and a priority and executes each thread in order of highest priority to lowest. Threads that exhaust their time slice are moved to the exhausted queue and threads with remaining time slices are kept in the active queue.
3. Time slices are DYNAMIC, larger time slices are given to higher priority tasks, smaller slices to lower priority tasks.
 
I thought the dynamic time slice piece was of particular interest as you would think this would lead to starvation situations if the priority was high enough on one or multiple threads.
--[[User:Mike Preston|Mike Preston]] 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/).
--[[User:AbsMechanik|AbsMechanik]] 02:32, 4 October 2010 (UTC)

I have seen this website and thought it is useful. Do you think this is enough on research to write an essay or are we going to do some more research?
--[[User:Sschnei1|Sschnei1]] 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --[[User:AbsMechanik|AbsMechanik]] 20:08, 4 October 2010 (UTC)

So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--[[User:Mike Preston|Mike Preston]] 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for LINUX. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- [[User:Sschnei1|Sschnei1]] 19:27, 9 October 2010 (UTC)

= Essay Preview =

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--[[User:AbsMechanik|AbsMechanik]] 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example multiple core CPUs, schedulers of operating systems have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers; the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--[[User:Mike Preston|Mike Preston]] 03:48, 7 October 2010 (UTC)

2 In Linux 1.2 a scheduler operated with a round robin policy using a circular queue, allowing the scheduler to be
efficient in adding and removing processes.2 When Linux 2.2 was introduced, the scheduler was changed. It now used the idea
of scheduling classes, thus allowing it to schedule real-time tasks, non real-time tasks, and non-preemptible tasks. It was
the first scheduler which supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. The scheduler started to be more complex than its
predecessors, but it also has more features.2 The running time was O(n) because it iterated over each task during a
scheduling event. The scheduler divided tasks into epochs, allowing each tasks to execute up to its time slice. If a task
did not use up all of its time slice, the remaining time was added to the next time slice to allow the task to execute
longer in its next epoch. The scheduler simply iterated over all tasks, which made it inefficient, low in scalability and
did not have a useful support for real-time systems. On top of that, it did not have features to exploit new hardware
architectures, such as multi-core processors.

Linux-2.6 introduced another scheduler up to Linux 2.6.23. Before Linux 2.6.23 an O(1) scheduler was used. It needed the
same amount of time for each task to execute, independent of how big the tasks were.2 It kept track of the tasks in a
running queue. The scheduler offered much more scalability. To determine if a task was I/O bound or processor bound the
scheduler used interactive metrics with numerous heuristics. Because the code was difficult to manage and the most part of
the code was to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which is the current
scheduler in the actual Linux versions.

As of the Linux 2.6.23 introduction the CFS scheduler took its place in the kernel. CFS uses the idea of maintaining
fairness in providing processor time to tasks, which means each tasks gets a fair amount of time to run on the processor.
When the time task is out of balance, it means the tasks has to be given more time because the scheduler has to keep
fairness. To determine the balance, the CFS maintains the amount of time given to a task, which is called a virtual
runtime.

The model how the CFS executes has changed, too. The scheduler now runs a time-ordered red-black tree. It is self-balancing
and runs in O(log n) where n is the amount of nodes in the tree, allowing the scheduler to add and erase tasks efficiently.
Tasks with the most need of processor are stored in the left side of the tree. Therefore, tasks with a lower need of cpu
are stored in the right side of the tree. To keep fairness the scheduler takes the left most node from the tree. The
scheduler then accounts execution time at the CPU and adds it to the virtual runtime. If runnable the task then is inserted
into the red-black tree. This means tasks on the left side are given time to execute, while the contents on the right side
of the tree are migrated to the left side to maintain fairness.

2 M. Tim Jones, Consultant Engineer, Emulex

-- [[User:Sschnei1|Sschnei1]] 19:26, 9 October 2010 (UTC)

I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be handled manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. The spectrum is not linear; programs with high negative nice levels run significantly faster than those with high positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices, which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor.

(I'm out for lunch. Don't modify my stuff please. I'll continue when I get back...)

Sources:
[http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt]

-[[User:abondio2|abondio2]]

COMP 3000 Essay 1 2010 Question 5

2010-10-11T17:41:11Z

Abondio2: /* Resources */

=Question=

Compare and contrast the evolution of the default BSD/FreeBSD and Linux schedulers.

=Answer=

=Resources=

I found some resources, which might be useful to answer this question. As far as I know, FreeBSD uses a Multilevel feeback queue and Linux uses in the current version the completly fair scheduler.
 
-Some text about FreeBSD-scheduling http://www.informit.com/articles/article.aspx?p=366888&seqNum=4
 
-ULE Thread Scheduler: http://www.scribd.com/doc/3299978/ULE-Thread-Scheduler-for-FreeBSD
 
-Completly Fair Scheduler: http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt
 
-Brain Fuck Scheduler: http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler
 
-Sebastian

Also found a nice link with regards to the new Linux Scheduler for those interested:
http://www.ibm.com/developerworks/linux/library/l-scheduler/
 It is also referred to as the O(1) scheduler in algorithmic terms (CFS is O(log(n)) scheduler). Both have been in development by Ingo Molnár.
-Abhinav

Some more resources; 
http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html (includes history of Linux scheduler from 1.2 to 2.6) 
http://my.opera.com/blu3c4t/blog/show.dml/1531517 
-Wes 

 
Information on changes to the O(1) scheduler: 
"Linux Kernel Documentation" 
http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt 
 
General information on Linux Job Scheduling: 
"Linux Job Scheduling | Linux Journal" 
http://www.linuxjournal.com/article/4087 
 
Scheduling on multi-core Linux machines: 
"Node affine NUMA scheduler for Linux" 
http://home.arcor.de/efocht/sched/ 
 
More on Linux process scheduling: 
"Understanding the Linux kernel" 
http://oreilly.com/catalog/linuxkernel/chapter/ch10.html 
 
FreeBSD thread scheduling: 
"InformIT: FreeBSD Process Management" 
http://www.informit.com/articles/article.aspx?p=366888&seqNum=4 
- Austin Bondio