COMP 3000 Essay 1 2010 Question 5

Question

Compare and contrast the evolution of the default BSD/FreeBSD and Linux schedulers.

Answer

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with, all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system’s scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system (Jensen: 1985).


Several different scheduling algorithms are used by different schedulers; a few key ones are outlined below [1][2][3] (a minimal round-robin sketch follows the list):

  • First-Come, First-Serve (also known as FIFO): No multi-tasking. Processes are queued in the order they arrive. A process gets full, uninterrupted use of the CPU until it has finished running.
  • Shortest Job First (similar to Shortest Remaining Time and/or Shortest Process Next): Limited multi-tasking. The CPU handles the shortest tasks first, and complex, time-consuming tasks are handled last.
  • Fixed-Priority Preemptive Scheduling: Greater multi-tasking. Processes are assigned priority levels which are independent of their complexity. High-priority processes can be completed quickly, while low-priority processes can take a long time as new, higher-priority processes arrive and interrupt them.
  • Round-Robin Scheduling: Fair multi-tasking. This method is similar in concept to First-Come, First-Serve, but all processes are assigned the same priority level; that is, every running process is given an equal share of CPU time.
  • Multilevel Feedback Queue Scheduling: Rule-based multi-tasking. It is a combination of First-Come, First-Serve, Round-Robin & Fixed-Priority Preemptive Scheduling, but processes are associated with groups (queues) that help determine how high their priorities are. For example, a CPU-bound task that repeatedly uses up its entire time slice is demoted to a lower-priority queue, while a task that spends most of its time blocked waiting for I/O or user interaction keeps a higher priority and so responds quickly.
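
As a concrete illustration of the round-robin policy above, the following is a minimal, self-contained sketch in C (hypothetical task names, quantum and workload; it is not taken from any kernel): each task in the ready list runs for at most one fixed quantum per turn, and the scheduler keeps cycling until every task has finished.

    #include <stdio.h>

    #define QUANTUM 2                     /* fixed time slice per turn */

    struct task {
        const char *name;
        int remaining;                    /* work left, in abstract time units */
    };

    int main(void) {
        /* hypothetical workload: three tasks of different lengths */
        struct task tasks[] = { {"A", 5}, {"B", 2}, {"C", 4} };
        int n = 3, finished = 0, clock = 0;

        /* keep cycling through the ready tasks until all are done */
        while (finished < n) {
            for (int i = 0; i < n; i++) {
                if (tasks[i].remaining == 0)
                    continue;             /* this task already finished */
                int slice = tasks[i].remaining < QUANTUM
                              ? tasks[i].remaining : QUANTUM;
                clock += slice;
                tasks[i].remaining -= slice;
                printf("t=%2d: ran %s for %d unit(s)\n", clock, tasks[i].name, slice);
                if (tasks[i].remaining == 0)
                    finished++;
            }
        }
        return 0;
    }

Every task is visited once per pass, so no task can starve; the cost of this fairness is the extra context switches compared with running each task to completion.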

There is no one "best" algorithm, and most schedulers use a combination of the different algorithms, such as the Multi-Level Feedback Queue, which in one form or another has been used in Windows XP/Vista, Linux 2.5-2.6, FreeBSD, Mac OS X, NetBSD and Solaris.
One thing is certain: as computer hardware increases in complexity, for example with multi-core CPUs (parallelization) and the advent of more powerful embedded/mobile devices, operating system schedulers have had to evolve to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers:

  • The default BSD/FreeBSD scheduler
  • The Linux scheduler


BSD/FreeBSD Schedulers

Overview & History

The FreeBSD kernel originally inherited its scheduler from 4.3BSD, which itself was a version of the traditional UNIX scheduler [4]. In order to understand the evolution of the FreeBSD scheduler it is important to understand the original purpose and limitations of the BSD scheduler. Like most traditional UNIX-based systems, the BSD scheduler was designed to work on a single-core computer system (with limited I/O) and to handle relatively small numbers of processes. As a result, managing resources with an O(n) scheduler did not raise any performance issues. To ensure fairness, the scheduler would switch between processes every 0.1 second (100 milliseconds) in a round-robin format [5].
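
To make the O(n) cost concrete, here is a rough sketch of what such a selection step looks like (hypothetical structure and field names, not the actual 4.3BSD source): every time the quantum expires, the scheduler scans the entire process list for the runnable process with the best priority, so the work per decision grows linearly with the number of processes.

    #include <stddef.h>

    /* hypothetical process descriptor; the real one has many more fields */
    struct proc {
        int          priority;    /* lower value = more urgent, as in BSD */
        int          runnable;
        struct proc *next;
    };

    /* Scan the whole process list and return the runnable process with the
     * best (numerically lowest) priority.  This linear scan is what makes
     * the scheduler O(n) in the number of processes. */
    struct proc *pick_next(struct proc *allproc) {
        struct proc *best = NULL;
        for (struct proc *p = allproc; p != NULL; p = p->next) {
            if (!p->runnable)
                continue;
            if (best == NULL || p->priority < best->priority)
                best = p;
        }
        return best;    /* run this process for the next 100 ms quantum */
    }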

As computer systems increased in complexity with the advent of multi-core CPUs and various new I/O devices, computer programs naturally increased in size and complexity to accommodate and manage the new hardware. With CPUs becoming more powerful (as predicted by Moore's Law [6]), the time taken to complete a process decreased significantly. This additional complexity highlighted the problem of having an O(n) scheduler for managing processes: as more items were added to the scheduling algorithm, performance decreased. With symmetric multiprocessing (SMP) on multi-core CPUs becoming inevitable, a better scheduler was required. This was the driving force behind the creation of ULE for FreeBSD.

Older Versions

The FreeBSD kernel originally used an enhanced version of the BSD scheduler. Specifically, the FreeBSD scheduler included classes of threads, which was a drastic change from the round-robin scheduling used in BSD. Initially, there were two thread classes, real-time and idle [7]; the scheduler would give processor time to real-time threads first, and idle threads had to wait until no real-time threads needed access to the processor.

To manage the various threads, FreeBSD had data structures called runqueues, into which the threads were placed. The scheduler would evaluate the runqueues by priority from highest to lowest and execute the first thread of the first non-empty runqueue it found. Each thread in that runqueue would be assigned an equal time slice of 0.1 seconds, a value that has not changed in over 20 years [8].
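
Roughly, that selection logic can be sketched as follows (a simplification with hypothetical names; the real kernel additionally keeps a bitmap of non-empty queues so it can find the first one without scanning them all):

    #include <stddef.h>

    #define NQUEUES   32         /* one runqueue per priority band (illustrative) */
    #define TIMESLICE 100        /* every selected thread runs for ~100 ms        */

    struct thread {
        struct thread *next;
        /* ... other per-thread state ... */
    };

    /* One FIFO list of runnable threads per priority level;
     * index 0 is the highest priority. */
    static struct thread *runq[NQUEUES];

    /* Walk the runqueues from highest to lowest priority and dequeue the
     * first thread of the first non-empty queue.  Every thread taken from
     * a queue receives the same fixed time slice. */
    struct thread *choose_thread(void) {
        for (int pri = 0; pri < NQUEUES; pri++) {
            if (runq[pri] != NULL) {
                struct thread *td = runq[pri];
                runq[pri] = td->next;    /* pop the head of the queue */
                return td;               /* run td for TIMESLICE ms   */
            }
        }
        return NULL;                     /* nothing runnable: idle    */
    }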

Unfortunately, like the BSD scheduler it was based on, the original FreeBSD scheduler was not built to handle Symmetric Multiprocessing (SMP) or Symmetric Multithreading (SMT) on multi-core systems. The scheduler was still limited by an O(n) algorithm, which could not efficiently handle the loads required on increasingly powerful systems. To allow FreeBSD to operate with more modern computer systems, it became clear that a new scheduler would be required, and thus the ULE scheduler was created.

Current Version

ULE was first implemented as an "experimental" scheduler (by Jeff Roberson) in FreeBSD 5.1, before being added to the FreeBSD 5.3 development cycle. It was designed with modern hardware and requirements in mind: it had proper support for Symmetric Multi-Processing (SMP), HyperThreading (HTT) and Symmetric Multi-Threading (SMT) platforms, and could handle heavy workloads. Primarily an event-driven scheduler, ULE used a double-queue mechanism (borrowed from Linux's O(1) scheduler) to ensure fairness. This mechanism is briefly outlined as follows [9] (a small sketch follows the list):

  • Runnable threads are organized into two queues, 'current' and 'next'
  • Each thread is assigned to either 'current' or 'next'
  • Execution first proceeds through the 'current' queue (in priority order)
  • Once 'current' is empty, the 'next' and 'current' queues are switched and the threads are executed in the same manner (in priority order)
  • All idle threads are stored in a third queue, 'idle', which is run only when 'current' and 'next' are empty
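
A minimal sketch of that queue-switching idea (illustrative structures and names only, not the actual ULE code):

    #include <stddef.h>

    struct thread { struct thread *next; };

    /* A simplified queue of runnable threads (assume it is kept in
     * priority order by the enqueue code, which is omitted here). */
    struct runq {
        struct thread *head;
    };

    static struct runq queue_a, queue_b, queue_idle;
    static struct runq *current = &queue_a;  /* threads being run in this pass    */
    static struct runq *next_q  = &queue_b;  /* threads waiting for the next pass */

    static struct thread *dequeue(struct runq *q) {
        struct thread *td = q->head;
        if (td != NULL)
            q->head = td->next;
        return td;
    }

    /* Pick the next thread to run.  When 'current' is exhausted, swap the
     * two queues so the threads parked on 'next' get their turn; fall back
     * to the idle queue only when both are empty. */
    struct thread *ule_choose(void) {
        struct thread *td = dequeue(current);
        if (td == NULL) {
            struct runq *tmp = current;      /* swap current and next */
            current = next_q;
            next_q  = tmp;
            td = dequeue(current);
        }
        if (td == NULL)
            td = dequeue(&queue_idle);       /* nothing else runnable */
        return td;
    }

Because a thread that has used its slice is re-queued on 'next' rather than 'current', it cannot run again until every other runnable thread has had its turn, which is what provides the fairness.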

ULE has been the default FreeBSD scheduler since version 7.1. It works well on both uniprocessor and multi-core systems: it prevents unnecessary CPU migration while making good use of CPU resources. However, two key practical problems arose from the double-queue mechanism [10]:

  • Maintaining two queues meant more memory usage and more cache misses
  • Having multiple threads on different queues caused problems for the rest of the system

These problems were resolved by implementing a circular queue, inspired by the calendar queue data structure. Since there is only one queue, there is little extra CPU overhead, and there is more flexibility in deciding how much runtime each thread gets relative to threads of lower priority. With these design changes, ULE is also slightly faster on uniprocessor systems than the 4.4BSD scheduler.
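
The circular arrangement can be pictured roughly as follows (a sketch assuming a fixed-size ring of priority buckets with a moving head; the actual ULE implementation differs in detail): threads inserted close to the head are serviced again sooner than threads inserted further ahead of it, which is how relative priority can be expressed with a single queue.

    #include <stddef.h>

    #define NBUCKETS 64                     /* size of the circular run queue */

    struct thread { struct thread *next; int priority; };

    static struct thread *bucket[NBUCKETS]; /* one list per slot of the ring  */
    static int head;                        /* slot currently being serviced  */

    /* Insert a thread some distance ahead of the head.  A higher-priority
     * (numerically lower) thread lands closer to the head, so the head
     * reaches it again sooner as it walks around the ring. */
    void circq_insert(struct thread *td) {
        int offset = td->priority % NBUCKETS;   /* illustrative mapping only */
        int slot = (head + offset) % NBUCKETS;
        td->next = bucket[slot];
        bucket[slot] = td;
    }

    /* Advance the head to the next non-empty slot and pop one thread. */
    struct thread *circq_choose(void) {
        for (int i = 0; i < NBUCKETS; i++) {
            int slot = (head + i) % NBUCKETS;
            if (bucket[slot] != NULL) {
                struct thread *td = bucket[slot];
                bucket[slot] = td->next;
                head = slot;                /* servicing continues from here */
                return td;
            }
        }
        return NULL;                        /* the ring is empty */
    }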

Linux Schedulers

Overview & History

Unlike its competitor FreeBSD, the Linux scheduler has undergone several different cycles of evolution [11], always aiming at a fair and fast scheduler. Various methods and concepts have been tried over different versions, such as round-robin rotation, iteration over the task list, and priority queues. A quick read-through of the history of Linux suggests that equal and balanced use of the system (fairness) was the scheduler's first goal; eventually, speed was also tackled to ensure that run times were not affected. Early schedulers did their best to give processes equal time and resources, but used a bit of extra time (in computer terms) to accomplish this. With the advent of more complicated hardware (multi-core CPUs), Linux faced the same new challenges as BSD. Various techniques were employed to accommodate these changes, ranging from the inefficient O(n) scheduler to the CFS (the current version) [12].

Older Versions

In Linux 1.2 the scheduler operated with a round-robin policy using a circular queue, allowing it to add and remove processes efficiently. When Linux 2.2 was introduced, the scheduler was changed to use the idea of scheduling classes, allowing it to schedule real-time tasks, non-real-time tasks, and non-preemptible tasks. It was the first Linux scheduler that supported SMP.

With the introduction of Linux 2.4, the scheduler was changed again. It was more complex than its predecessors, but it also provided more features. Its running time was O(n) because it iterated over every task during a scheduling event. The scheduler divided time into epochs, allowing each task to execute up to its time slice. If a task did not use up its entire time slice, the remaining time was added to its next time slice so the task could execute longer in the next epoch. Because the scheduler simply iterated over all tasks, it was inefficient and scaled poorly, and it did not have useful support for real-time systems. On top of that, it had no features to exploit new hardware architectures such as multi-core processors.
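
A rough sketch of the epoch mechanism (hypothetical names and constants; the real 2.4 scheduler's "goodness" calculation is more involved): the task with the most remaining slice is picked by a linear scan, and when every runnable task has used up its slice a new epoch begins, with part of any unused time carried forward.

    #include <stddef.h>

    #define BASE_SLICE 6                    /* nominal slice per epoch, in ticks */

    struct task {
        int counter;                        /* ticks left in the current epoch  */
        int runnable;
    };

    /* O(n) scan: pick the runnable task with the most slice remaining. */
    struct task *pick_next(struct task *tasks, int n) {
        struct task *best = NULL;
        for (int i = 0; i < n; i++)
            if (tasks[i].runnable && tasks[i].counter > 0 &&
                (best == NULL || tasks[i].counter > best->counter))
                best = &tasks[i];
        return best;
    }

    /* Start a new epoch once no runnable task has slice left: half of any
     * unused time is carried over, so tasks that slept (e.g. interactive
     * ones) keep an advantage over tasks that ran flat out. */
    void new_epoch(struct task *tasks, int n) {
        for (int i = 0; i < n; i++)
            tasks[i].counter = tasks[i].counter / 2 + BASE_SLICE;
    }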

Current Version

With the introduction of Linux 2.6.23, the CFS (Completely Fair Scheduler) took its place in the kernel. CFS is built around the idea of maintaining fairness in providing processor time to tasks: each task should get a fair share of time on the processor. When a task's share falls out of balance, it must be given more time, since the scheduler has to preserve fairness. To determine that balance, CFS keeps track of the amount of time given to each task, which is called its virtual runtime.

The model of how CFS executes has changed as well. The scheduler now maintains a time-ordered red-black tree. The tree is self-balancing, and insertion and removal run in O(log n), where n is the number of nodes in the tree, allowing the scheduler to add and remove tasks efficiently. Tasks with the greatest need of the processor (the smallest virtual runtime) are stored toward the left side of the tree, while tasks with less need of the CPU sit toward the right. To keep fairness, the scheduler picks the left-most node of the tree. It then accounts for the task's execution time on the CPU and adds it to the task's virtual runtime. If the task is still runnable, it is re-inserted into the red-black tree. In this way tasks on the left side are given time to execute, while tasks on the right side gradually migrate toward the left, maintaining fairness.
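
That accounting can be sketched as follows (hypothetical types and constants; an ordered linked list stands in for the red-black tree here, and real CFS tracks nanosecond-resolution, nice-weighted virtual runtimes): the entity with the smallest virtual runtime is the "leftmost" and runs next; after running, its virtual runtime grows and it is re-inserted further to the right.

    #include <stddef.h>

    /* Simplified CFS-style scheduling entity. */
    struct entity {
        unsigned long  vruntime;   /* virtual runtime accumulated so far      */
        unsigned long  weight;     /* larger weight => vruntime grows slower  */
        struct entity *next;
    };

    /* Sorted list ordered by vruntime; the head plays the role of the
     * red-black tree's leftmost node. */
    static struct entity *timeline;

    void enqueue(struct entity *se) {
        struct entity **pp = &timeline;
        while (*pp != NULL && (*pp)->vruntime <= se->vruntime)
            pp = &(*pp)->next;             /* keep the list sorted by vruntime */
        se->next = *pp;
        *pp = se;
    }

    /* Pick the "leftmost" entity: the one that has received the least
     * virtual runtime so far. */
    struct entity *pick_next(void) {
        struct entity *se = timeline;
        if (se != NULL)
            timeline = se->next;
        return se;
    }

    /* Charge the task for 'delta' units of CPU time and requeue it.
     * A heavier (higher-weight, i.e. lower-nice) task is charged less
     * virtual time, so it drifts right more slowly and runs more often.
     * (weight is assumed to be positive.) */
    void account_and_requeue(struct entity *se, unsigned long delta) {
        se->vruntime += delta * 1024 / se->weight;
        enqueue(se);
    }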

Under a recent Linux system (version 2.6.35 or later), scheduling can be influenced manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be more greedy, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with strongly negative nice levels receive significantly more CPU time than those with strongly positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods than programs with lower priority. Users can adjust the niceness of a program with the nice shell command, or programmatically via the nice() and setpriority() system calls (a small example follows). Nice values range from -20 to +19.
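
For illustration, niceness can be raised from the shell (for example, "nice -n 10 ./long_job") or adjusted from within a program. The fragment below is a small, self-contained example using the standard getpriority()/setpriority() calls; the chosen increment is arbitrary.

    #include <stdio.h>
    #include <errno.h>
    #include <sys/resource.h>

    int main(void) {
        /* Read this process's current nice value (-20 .. +19).  Because -1
         * is a legal return value, errno must be checked explicitly. */
        errno = 0;
        int prio = getpriority(PRIO_PROCESS, 0);
        if (prio == -1 && errno != 0) {
            perror("getpriority");
            return 1;
        }
        printf("current nice value: %d\n", prio);

        /* Make ourselves "nicer" by 5.  Lowering the value (becoming less
         * nice) generally requires elevated privileges. */
        if (setpriority(PRIO_PROCESS, 0, prio + 5) == -1) {
            perror("setpriority");
            return 1;
        }
        printf("new nice value: %d\n", getpriority(PRIO_PROCESS, 0));
        return 0;
    }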

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it difficult for the Linux developers to fine-tune the scheduler. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nicer programs from muscling in on the CPU time of less nice programs, and also stops the less nice programs from taking more time than they deserve.

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature: they monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it is given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it is temporarily given a lower priority (a rough sketch of this idea follows).
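
A rough sketch of such a dynamic adjustment (purely illustrative; the constants and the bonus rule are assumptions, not the actual Linux heuristic):

    /* Compute an effective priority from the static nice value plus a
     * dynamic bonus or penalty based on recent behaviour.  A task that has
     * waited a long time is boosted (more negative is "better"), while a
     * task that has been monopolising the CPU is penalised. */
    #define LONG_WAIT_TICKS 1000
    #define HOG_TICKS       1000

    int effective_priority(int nice, long waited_ticks, long ran_ticks) {
        int bonus = 0;
        if (waited_ticks > LONG_WAIT_TICKS)
            bonus = -5;                  /* starved: temporary boost    */
        else if (ran_ticks > HOG_TICKS)
            bonus = +5;                  /* CPU hog: temporary penalty  */
        return nice + bonus;             /* clamping omitted for brevity */
    }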

Conclusion

A few key differences between the Linux and FreeBSD schedulers are outlined below:

  • ULE runs in O(1) time whereas CFS runs in O(log(n)) time
  • ULE is based on run-queues whereas CFS utilizes a red-black tree
  • Context switching is done much faster by ULE than by the Linux scheduler [13]

Hardware innovation has primarily been the driving force behind the evolution of these schedulers, and with the ever-increasing complexity of parallel and distributed systems and of managing multi-threaded workloads, it is clear that even the current schedulers will undergo further radical changes to meet those new challenges. It remains to be seen what each project comes up with to address these growing challenges.

Resources

1. Jensen, E. Douglas, C. Douglass Locke, and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

2. Stallings, William, Operating Systems: Internals and Design Principles, Pearson Prentice Hall, 2009.

3. McKusick, M. K. and Neville-Neil, G. V. 2004. Thread Scheduling in FreeBSD 5.2. Queue 2, 7 (Oct. 2004), 58-64. DOI= http://doi.acm.org/10.1145/1035594.1035622