Talk:COMP 3000 Essay 1 2010 Question 5

From Soma-notes

Resources

I just moved the Resources section to our discussion page --AbsMechanik 18:19, 13 October 2010 (UTC)

I found some resources which might be useful for answering this question. As far as I know, FreeBSD uses a multilevel feedback queue and the current version of Linux uses the Completely Fair Scheduler.
-Some text about FreeBSD-scheduling http://www.informit.com/articles/article.aspx?p=366888&seqNum=4
-ULE Thread Scheduler: http://www.scribd.com/doc/3299978/ULE-Thread-Scheduler-for-FreeBSD
-Completely Fair Scheduler: http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt
-Brain Fuck Scheduler: http://en.wikipedia.org/wiki/Brain_Fuck_Scheduler
-Sebastian

Also found a nice link with regards to the new Linux Scheduler for those interested: http://www.ibm.com/developerworks/linux/library/l-scheduler/
It is also referred to as the O(1) scheduler in algorithmic terms (CFS is an O(log n) scheduler). Both were developed by Ingo Molnár. -Abhinav

Some more resources;
http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html (includes history of Linux scheduler from 1.2 to 2.6)
http://my.opera.com/blu3c4t/blog/show.dml/1531517
-Wes



Information on changes to the O(1) scheduler:
"Linux Kernel Documentation"
http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

General information on Linux Job Scheduling:
"Linux Job Scheduling | Linux Journal"
http://www.linuxjournal.com/article/4087

Scheduling on multi-core Linux machines:
"Node affine NUMA scheduler for Linux"
http://home.arcor.de/efocht/sched/

More on Linux process scheduling:
"Understanding the Linux kernel"
http://oreilly.com/catalog/linuxkernel/chapter/ch10.html

FreeBSD thread scheduling:
"InformIT: FreeBSD Process Management"
http://www.informit.com/articles/article.aspx?p=366888&seqNum=4
- Austin Bondio

Discussion

From what I have been reading, the early versions of the Linux scheduler had a very hard time managing high numbers of tasks at the same time. Although I do not know the details of how it ran, the scheduler algorithm operated in O(n) time, so as more tasks were added, the scheduler became slower. In addition, a single data structure was used to manage all processors of a system, which created a problem with managing cached memory between processors. The Linux 2.6 scheduler was built to resolve the task management issues in O(1) (constant) time, as well as to address the multiprocessing issues.

It appears as though BSD also had issues with task management; however, for BSD this was due to a locking mechanism that only allowed one process at a time to operate in kernel mode. FreeBSD 5 changed this locking mechanism to allow multiple processes to run in kernel mode at the same time, advancing the success of symmetric multiprocessing.

--Mike Preston 18:38, 3 October 2010 (UTC)


Hi Mike, can you give the names of the schedulers you are talking about? I think it is easier to distinguish them by name rather than by algorithm. It is just a suggestion!

The O(1) scheduler was replaced in Linux kernel 2.6.23 with CFS (the Completely Fair Scheduler), which runs in O(log n). The schedulers before CFS were based on a multilevel feedback queue algorithm; this changed in 2.6.23. CFS is not based on a run queue like most schedulers, but on a red-black tree that implements a timeline of future task execution. The aim of CFS is to maximize CPU utilization and performance at the same time.

In FreeBSD 5, the ULE scheduler was introduced but was disabled by default in the early versions; this eventually changed later on. ULE has better support for SMP and SMT, allowing it to improve overall performance on both uniprocessors and multiprocessors. It also has a constant execution time, regardless of the number of threads.

More information can be found here:
http://lwn.net/Articles/230574/
http://lwn.net/Articles/240474/

Sschnei1 16:33, 3 October 2010 (UTC) or Sebastian


Here is another article which essentially backs up what you are saying Sebastian: http://delivery.acm.org/10.1145/1040000/1035622/p58-mckusick.pdf?key1=1035622&key2=8828216821&coll=GUIDE&dl=GUIDE&CFID=104236685&CFTOKEN=84340156

Here are the highlights from the article:

General FreeBSD knowledge:

     1. FreeBSD requires a scheduler to be selected at the time the kernel is built.
     2. All calls to scheduling code are resolved at compile time, which means the overhead of indirect function calls for scheduling decisions is eliminated.
     3. Kernels up to FreeBSD 5.1 used the original scheduler, but from 5.2 onward the ULE scheduler is used.

Original FreeBSD Scheduler:

     1.  Threads are assigned a scheduling priority, which determines which run queue the thread is placed in.
     2.  The system scans the run queues in order from highest to lowest priority and executes the first thread of the first non-empty run queue it finds.
     3.  Once a non-empty queue is found, the system spends an equal time slice on each thread in that run queue. This time slice is 0.1 seconds, a value that has not changed in over 20 years; a shorter time slice would cause overhead from switching between threads too often, reducing productivity.
     4.  The article then provides detailed formulae for determining thread priority, which is out of scope for this project.
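The run-queue scan described in points 2 and 3 can be sketched in a few lines. Everything here is invented for illustration (thread names, the number of priority levels, the queue contents); only the scan-and-rotate logic comes from the text above:

```python
from collections import deque

# Hypothetical run-queue table: index 0 is the highest priority.
run_queues = [deque() for _ in range(8)]
run_queues[3].extend(["editor", "shell"])
run_queues[6].append("batch_job")

def pick_next_thread(queues):
    """Scan queues from highest to lowest priority and return the first
    thread of the first non-empty queue, rotating it to the back so the
    threads in that queue share equal 0.1 s time slices."""
    for q in queues:
        if q:
            thread = q.popleft()
            q.append(thread)   # thread goes to the back after its slice
            return thread
    return None                # no runnable threads: the system idles

print(pick_next_thread(run_queues))  # editor
print(pick_next_thread(run_queues))  # shell
```

Note that "batch_job" at priority 6 never runs while the priority-3 queue is non-empty, which is exactly why the real scheduler also recomputes priorities over time.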

ULE Scheduler - an overhaul of the original BSD scheduler to:

      1. support symmetric multiprocessing (SMP)
      2. support symmetric multithreading (SMT) on multi-core systems
      3. improve the scheduling algorithm so that execution is no longer limited by the number of threads in the system.



Here is another article which gives some great overview of a bunch of versions/the evolution of different schedulers: https://www.usenix.org/events/bsdcon03/tech/full_papers/roberson/roberson.pdf
Some interesting pieces about the Linux scheduler include:

     1. The Jan 2002 version included O(1) algorithm as well as additions for SMP.
     2. The scheduler uses two priority queue arrays to achieve fairness. It does this by giving each thread a time slice and a priority, and executing threads in order from highest to lowest priority. Threads that exhaust their time slice are moved to the expired queue, and threads with time remaining in their slices are kept in the active queue.
     3. Time slices are DYNAMIC: larger time slices are given to higher-priority tasks, and smaller slices to lower-priority tasks.
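A toy model of the two-array scheme in point 2. Task names, priorities, and slice lengths are invented, and the real kernel swaps array pointers rather than Python deques, but the active/expired mechanics are as described:

```python
from collections import deque

class O1Sketch:
    """Toy model of the O(1) scheduler's two run-queue arrays.
    Priority 0 is highest; slices are illustrative ticks, not real quanta."""

    def __init__(self, tasks):
        # tasks: list of (name, priority, timeslice)
        self.slice_of = {name: ts for name, prio, ts in tasks}
        self.active = deque(sorted(tasks, key=lambda t: t[1]))
        self.expired = deque()

    def tick(self):
        if not self.active:
            # Active array empty: swap arrays (an O(1) pointer swap in the kernel).
            self.active, self.expired = self.expired, self.active
        name, prio, left = self.active.popleft()
        left -= 1
        if left > 0:
            self.active.appendleft((name, prio, left))   # keep running
        else:
            # Slice exhausted: refill it and park the task in the expired array.
            self.expired.append((name, prio, self.slice_of[name]))
        return name

s = O1Sketch([("hi", 0, 2), ("lo", 1, 1)])
print([s.tick() for _ in range(4)])  # ['hi', 'hi', 'lo', 'hi']
```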


I thought the dynamic time slice piece was of particular interest, as you would think this would lead to starvation situations if the priority were high enough on one or more threads. --Mike Preston 18:38, 3 October 2010 (UTC)

This is essentially a summarized version of the aforementioned information regarding CFS (http://www.ibm.com/developerworks/linux/library/l-scheduler/). --AbsMechanik 02:32, 4 October 2010 (UTC)


I have seen this website and thought it was useful. Do you think this is enough research to write the essay, or are we going to do some more? --Sschnei1 09:38, 5 October 2010 (UTC)

I also stumbled upon this website: http://my.opera.com/blu3c4t/blog/show.dml/1531517. It explains a lot of stuff in layman's terms (I had a lot of trouble finding more info on the default BSD scheduler, but this link has some brief description included in it). I think we have enough resources/research done. We should start to formulate these results into an answer now. --AbsMechanik 20:08, 4 October 2010 (UTC)


So I thought I would take a first crack at an intro for our article, please tell me what you think of the following. Note that I have included the resource used as a footnote, the placement of which I indicate with the number 1, and I just tacked the details of the footnote on at the bottom:

See Essay preview section!

--Mike Preston 02:54, 6 October 2010 (UTC)

I added a part to introduce the several schedulers for Linux. We might need to change the reference, since I got it all from http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

-- Sschnei1 19:27, 9 October 2010 (UTC)

Maybe we should write down our contact emails and names to decide who would like to write which part.

Another suggestion is that someone should read over the text, compare it to the references posted in the "Sources" section, and check that no one is plagiarizing.

Sebastian Schneider - sebastian@gamersblog.ca

Essay Preview

So just a small, quick question. Are we going to follow a certain standard for citing resources (bibliography & footnotes) to maintain consistency, or do we just stick with what Mike's presented?--AbsMechanik 12:53, 7 October 2010 (UTC)

Maybe we should write the essay templates/prototypes here, to keep an overview of the discussion part.

Just relocating previous post with suggested intro paragraph:

One of the most difficult problems that operating systems must handle is process management. In order to ensure that a system will run efficiently, processes must be maintained, prioritized, categorized and communicated with, all without experiencing critical errors such as race conditions or process starvation. A critical component in the management of such issues is the operating system's scheduler. The goal of a scheduler is to ensure that all processes of a computer system get access to the system resources they require as efficiently as possible, while maintaining fairness for each process, limiting CPU wait times, and maximizing the throughput of the system.1 As computer hardware has increased in complexity, for example with multi-core CPUs, operating system schedulers have similarly evolved to handle these additional challenges. In this article we will compare and contrast the evolution of two such schedulers: the default BSD/FreeBSD and Linux schedulers.

1 Jensen, Douglas E., C. Douglass Locke and Hideyuki Tokuda, A Time-Driven Scheduling Model for Real-Time Operating Systems, Carnegie-Mellon University, 1985.

--Mike Preston 03:48, 7 October 2010 (UTC)


In Linux 1.2 the scheduler operated with a round-robin policy using a circular queue, allowing it to add and remove processes efficiently. When Linux 2.2 was introduced, the scheduler was changed: it now used the idea of scheduling classes, allowing it to schedule real-time tasks, non-real-time tasks, and non-preemptible tasks. It was the first Linux scheduler to support SMP.

With the introduction of Linux 2.4, the scheduler was changed again. It was more complex than its predecessors, but it also had more features. Its running time was O(n) because it iterated over every task during a scheduling event. The scheduler divided time into epochs, allowing each task to execute up to its time slice; if a task did not use all of its time slice, the remaining time was added to its next time slice, allowing the task to execute longer in its next epoch. Because the scheduler simply iterated over all tasks, it was inefficient and scaled poorly, and it did not have useful support for real-time systems. On top of that, it had no features to exploit new hardware architectures, such as multi-core processors.

Linux 2.6 introduced another scheduler, used up to Linux 2.6.23: the O(1) scheduler. It needed the same amount of time to select the next task regardless of how many tasks there were, and it kept track of tasks in a run queue. This scheduler offered much more scalability. To determine whether a task was I/O-bound or processor-bound, it used interactivity metrics with numerous heuristics. Because the code was difficult to maintain and a large part of it existed only to calculate heuristics, it was replaced in Linux 2.6.23 with the CFS scheduler, which remains the scheduler in current Linux versions.

As of Linux 2.6.23, the CFS scheduler took its place in the kernel. CFS is built around the idea of maintaining fairness in providing processor time to tasks: each task should get a fair amount of time to run on the processor. When the time a task has received is out of balance, the task must be given more time, because the scheduler has to maintain fairness. To determine the balance, CFS maintains the amount of time given to each task in a value called the virtual runtime.

The model of how CFS executes has changed, too. The scheduler now maintains a time-ordered red-black tree, which is self-balancing and whose operations run in O(log n), where n is the number of nodes in the tree, allowing the scheduler to add and remove tasks efficiently. Tasks with the greatest need for the processor are stored toward the left side of the tree, and tasks with a lower need for the CPU toward the right side. To maintain fairness, the scheduler takes the leftmost node from the tree, accounts for the task's execution time on the CPU, and adds it to the task's virtual runtime. If the task is still runnable, it is then reinserted into the red-black tree. This means tasks on the left side are given time to execute, while the contents of the right side of the tree migrate toward the left to maintain fairness. [1]
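As a rough sketch of the pick-leftmost policy, a binary heap can stand in for the kernel's red-black tree, since both yield the task with the smallest virtual runtime in O(log n). Task names and slice values here are invented:

```python
import heapq

# Each entry is (vruntime_ns, task_name); the heap keeps the task with the
# least virtual runtime at the front, like the leftmost node of CFS's tree.
timeline = []
for name in ("browser", "compiler", "daemon"):
    heapq.heappush(timeline, (0, name))

def schedule_once(slice_ns):
    """Run the task with the smallest vruntime for one slice."""
    vruntime, task = heapq.heappop(timeline)    # leftmost = least vruntime
    vruntime += slice_ns                        # charge the task for its run
    heapq.heappush(timeline, (vruntime, task))  # reinsert while still runnable
    return task

# Fairness emerges: with equal slices, each task runs once per round.
print([schedule_once(10) for _ in range(6)])
```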


-- Sschnei1 19:26, 9 October 2010 (UTC)



I've started writing a bit about the Linux O(1) scheduler:

Under a Linux system, scheduling can be influenced manually by the user by assigning programs different priority levels, called "nice levels." Put simply, the higher a program's nice level is, the nicer it will be about sharing system resources. A program with a lower nice level will be greedier, and a program with a higher nice level will more readily give up its CPU time to other, more important programs. This spectrum is not linear; programs with strongly negative nice levels run significantly faster than those with strongly positive nice levels. The Linux scheduler accomplishes this by sharing CPU usage in terms of time slices (also called quanta), which refer to the length of time a program can use the CPU before being forced to give it up. High-priority programs get much larger time slices, allowing them to use the CPU more often and for longer periods of time than programs with lower priority. Users can adjust the niceness of a program using the shell command nice. Nice values can range from -20 to +19.

In previous versions of Linux, the scheduler was dependent on the clock speed of the processor. While this dependency was an effective way of dividing up time slices, it made it impossible for the Linux developers to fine-tune their scheduler to perfection. In recent releases, specific nice levels are assigned fixed-size time slices instead. This keeps nice programs from trying to muscle in on the CPU time of less nice programs, and also stops the less nice programs from stealing more time than they deserve.[2]
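The fixed-slice idea can be illustrated with a toy mapping from nice level to slice length. The bounds and the linear scaling below are invented for illustration; the real kernel's tables differ by version:

```python
# Invented bounds: the least-nice program gets the largest slice.
MIN_SLICE_MS, MAX_SLICE_MS = 5, 200

def timeslice_ms(nice):
    """Map a nice level (-20..+19) to a fixed time slice in milliseconds,
    independent of processor clock speed."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in -20..19")
    span = MAX_SLICE_MS - MIN_SLICE_MS
    return MAX_SLICE_MS - round((nice + 20) * span / 39)

print(timeslice_ms(-20), timeslice_ms(0), timeslice_ms(19))  # 200 100 5
```

Because the table is fixed, a very nice program can never grow its slice at the expense of a less nice one, which is the behavior the paragraph above describes.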

In addition to this fixed style of time slice allocation, Linux schedulers also have a more dynamic feature which causes them to monitor all active programs. If a program has been waiting an abnormally long time to use the processor, it will be given a temporary increase in priority to compensate. Similarly, if a program has been hogging CPU time, it will temporarily be given a lower priority rating.[3]


Here's something I put into the Linux: Overview section:
I (Sschnei1) added some text to the Round-Robin Scheduling and the Multilevel Queue Scheduling.

Scheduling in UNIX-like systems has undergone many changes in the decades since the original release of UNIX in 1969.[4] Linux's early schedulers were relatively inefficient, operating in linear time with respect to the number of tasks to schedule; the 2.6 O(1) scheduler was able to operate in constant time, independent of the number of tasks being scheduled.

There are five basic algorithms for allocating CPU time[5]:

  • First-in, First-out: No multi-tasking. Processes are queued in the order they are called. A process gets full, uninterrupted use of the CPU until it has finished running.
  • Shortest Time Remaining: Limited multi-tasking. The CPU handles the easiest tasks first, and complex, time-consuming tasks are handled last.
  • Fixed-Priority Preemptive Scheduling: Greater multi-tasking. Processes are assigned priority levels which are independent of their complexity. High-priority processes can be completed quickly, while low-priority processes can take a long time as new, higher-priority processes arrive and interrupt them.
  • Round-Robin Scheduling: Fair multi-tasking. This method is similar in concept to Fixed-Priority Preemptive Scheduling, but all processes are assigned the same priority level; that is, every running process is given an equal share of CPU time. Round-robin scheduling was used in Linux 1.2.
  • Multilevel Queue Scheduling: Rule-based multi-tasking. This method is also similar to Fixed-Priority Preemptive Scheduling, but processes are associated with groups that help determine how high their priorities are. For example, all I/O tasks get low priority, since much time is spent waiting for the user to interact with the system. The O(1) algorithm in Linux 2.6 up to 2.6.23 is based on a multilevel queue.
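To make the contrast between the first two policies concrete, here is a tiny simulation on an invented workload (job names and burst times are made up):

```python
# Burst times, in arbitrary ticks, for three hypothetical jobs.
jobs = {"a": 3, "b": 1, "c": 2}

def fifo_order(jobs):
    """First-in, first-out: jobs run to completion in arrival order."""
    return list(jobs)

def shortest_remaining_order(jobs):
    """Shortest time remaining: the easiest jobs finish first."""
    return sorted(jobs, key=jobs.get)

print(fifo_order(jobs))                # ['a', 'b', 'c']
print(shortest_remaining_order(jobs))  # ['b', 'c', 'a']
```

Under FIFO, the short job "b" waits behind the long job "a"; under shortest-time-remaining it finishes first, which is exactly the trade-off the list above describes.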


-- Austin Bondio Last edit: 22:27, 13 October 2010 (UTC)

-- Sschnei1 00:24, 14 October 2010 (UTC)


I'm writing on a contrast of the CFS scheduler right now, please don't edit it.

In contrast to the O(1) scheduler, CFS realizes the model of a scheduler that executes precise multitasking on real hardware. Precise multitasking means that each process runs at equal speed: if 4 processes are running at the same time, CFS assigns 25% of the CPU time to each process. On real hardware, only one task can be executed at a time while the other tasks wait, which would otherwise give the running task an unfair amount of CPU time.

To avoid an unfair balance across processes, CFS maintains a wait run-time for each process and tries to pick the process with the highest wait run-time value. To approximate true multitasking, CFS splits up the CPU time between running processes, allowing multiple processes to make progress concurrently on a single CPU.

Processes are not stored in a run queue, as in the O(1) scheduler, but in a self-balancing red-black tree, where the task with the greatest need for CPU time is stored in the leftmost node. Tasks with a lower need for CPU time are stored on the right side of the tree, and tasks with a higher need on the left side. The scheduler picks the leftmost task and charges its execution time to its virtual runtime; if the process is still runnable, it is reinserted, the tree re-balances itself, and the next task can be picked for the CPU.

CFS is designed in such a way that it does not need to do timeslicing on the CPU, while still providing the best performance with as much CPU utilization as possible. This is due to its nanosecond granularity, which removes the need for jiffies or other HZ details. [6]


-- Sschnei1 16:32, 13 October 2010 (UTC)


Hey guys, sorry I've been non-existent for the past little bit, here's what I've done so far. I've been going through stuff on the 4BSD and ULE schedulers, here's what I have so far:

In order for FreeBSD to function, it requires a scheduler to be selected at the time the kernel is built. Also, all calls to scheduling code are resolved at compile time, meaning that the overhead of indirect function calls for scheduling decisions is eliminated.

[3] The 4BSD scheduler was a general-purpose scheduler. Its primary goal was to balance threads' different scheduling requirements. FreeBSD's time-share scheduling algorithm is based on multilevel feedback queues. The system adjusts the priority of a thread dynamically to reflect resource requirements and the amount of resources consumed by the thread. Based on the thread's priority, it gets moved between run queues. When a new thread attains a higher priority than the currently running one, the system immediately switches to the new thread if it is in user mode; otherwise, the system switches as soon as the current thread leaves the kernel. The system scans the run queues in order of highest to lowest priority, and executes the first thread of the first non-empty run queue it finds. The system tailors its short-term scheduling algorithm to favor interactive jobs by raising the priority of threads that have been waiting for I/O for one or more seconds, and by lowering the priority of threads that consume significant amounts of CPU time.

[1] In older BSD systems, (and I mean old, as in 20 or so years ago), a 1 second quantum was used for the round-robin scheduling algorithm. Later, in BSD 4.2, it did rescheduling every 0.1 seconds, and priority re-computation every second, and these values haven’t changed since. Round-robin scheduling is done by a timeout mechanism, which informs the clock interrupt driver to call a certain system routine after a specified interval. The subroutine to be called, in this case, causes the rescheduling and then resubmits a timeout to call itself again 0.1 sec later. The priority re-computation is also timed by a subroutine that resubmits a timeout for itself.
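The self-resubmitting timeout can be sketched as a simulated clock-interrupt queue. Times here are simulated seconds, not real delays, and the routine names are invented:

```python
import heapq

pending = []           # min-heap of (fire_time, callback), our "timeout" queue
reschedules = []       # record of when the rescheduling routine ran

def timeout(delay, callback, now):
    """Arrange for callback to be called after the given interval."""
    heapq.heappush(pending, (now + delay, callback))

def roundrobin(now):
    """The rescheduling routine: do its work, then resubmit itself
    0.1 simulated seconds later, as in 4.2BSD."""
    reschedules.append(round(now, 1))
    timeout(0.1, roundrobin, now)

timeout(0.1, roundrobin, 0.0)   # prime the first timeout
clock = 0.0
while pending and clock < 0.5:  # simulate half a second of clock interrupts
    clock, cb = heapq.heappop(pending)
    if clock < 0.5:
        cb(clock)

print(reschedules)   # [0.1, 0.2, 0.3, 0.4]
```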

The ULE scheduler was first introduced in FreeBSD 5, but was disabled by default in favor of the default 4BSD scheduler. It was not until FreeBSD 7.1 that the ULE scheduler became the new default. The ULE scheduler was an overhaul of the original scheduler, adding support for symmetric multiprocessing (SMP) and for symmetric multithreading (SMT) on multi-core systems, and improving the scheduling algorithm so that execution is no longer limited by the number of threads in the system.

ULE has a constant execution time of O(1), regardless of the number of threads. In addition, it is also careful to identify interactive tasks and give them the lowest latency response possible. The scheduling components include several queues, two CPU load-balancing algorithms, an interactivity scorer, a CPU usage estimator, a priority calculator, and a slice calculator.

Since ULE is an event driven scheduler, there is no periodic timer that adjusts thread priority to prevent starvation. Fairness is implemented by maintaining two queues: the current and the next queue. Each thread that is granted a CPU slice is assigned to either the current or next queue. Threads are then picked from the current queue in order of priority until it is empty, which is when the next and current queues are switched, and the process begins again. This guarantees that each thread will be given use of its slice once every two queue cycles, regardless of priority. Interrupt and real-time threads (and threads with these priority levels) are inserted onto the current queue, for they are of the highest priority. There is also an idle class of threads, which is checked only when there are no other runnable tasks.
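A sketch of the current/next queue switch described above. Thread names and priorities are invented (lower number means higher priority), and the real scheduler inserts by priority rather than appending:

```python
from collections import deque

# Start with all threads on the "next" queue, ordered by priority.
current = deque()
nxt = deque(sorted([("ui", 1), ("build", 5), ("cron", 9)], key=lambda t: t[1]))

def next_thread():
    """Pick threads from the current queue; when it empties, switch queues."""
    global current, nxt
    if not current:
        current, nxt = nxt, current   # queue switch: a new cycle begins
    name, prio = current.popleft()
    nxt.append((name, prio))          # the thread waits for the next cycle
    return name

# Every thread runs exactly once per queue cycle, regardless of priority:
print([next_thread() for _ in range(6)])
```

Even the low-priority "cron" thread is guaranteed a slice every cycle, which is the starvation-freedom property the paragraph above describes.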

In order to promptly discover when a thread changes from interactive to non-interactive, the ULE scheduler uses Interactivity Scoring, a key part of the scheduler affecting the responsiveness of the system, and subsequently, user experience. An interactivity score is computed from the relationship between run and sleep time, using a formula that is out of scope for this project. Interactive threads usually have high sleep times because they are often waiting for user input. This usually is followed by quick bursts in CPU activity from processing the user's request.

The ULE also implements a priority calculator, not to maintain fairness, but only to order the threads by priority. Only time-sharing threads use the priority calculator, the rest are allotted statically. ULE uses the interactivity score to determine the nice value of a thread, which will allow it to in turn decide upon its priority.

The way that ULE uses its nice values in combination with slices is by implementing a moving window of nice values that are allowed slices. The threads within the window are given a slice inversely proportional to the difference between their nice value and the lowest recorded nice value. This results in smaller slices for nicer threads, which in turn defines their amount of allotted CPU time. On x86, FreeBSD allows a minimum slice value of 10 ms and a maximum of 140 ms. Interactive tasks receive the smallest nice value, in order to discover more promptly when an interactive task is no longer interacting.

ULE uses a CPU usage estimator to show roughly how much CPU a thread is using. It operates on an event-driven basis: ULE keeps track of the number of clock ticks that occurred within a sliding window of the thread's execution time. The window slides up to one second past the threshold, and then back down, to regulate the ratio of run time to sleep time.

ULE enables SMP (symmetric multiprocessing) in order to better achieve CPU affinity, which means scheduling threads onto the last CPU they ran on so as to avoid unnecessary CPU migrations. Processors typically have large caches to aid the performance of threads and processes. CPU affinity is key because a thread may still have leftover data in the caches of its previous CPU; when a thread migrates, it not only has to load this data into the new CPU's cache, but must also clear the data from the previous CPU's cache. ULE has two methods for CPU load balancing: pull and push. Pull is when an idle CPU grabs a thread from a non-idle CPU to lend a hand; push is when a periodic task evaluates the current load situation of the CPUs and balances it out amongst them. These two function side by side, and allow for an optimally balanced CPU workload.
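The pull and push methods can be sketched as follows. Loads here are just counts of runnable threads, and all names and numbers are invented for illustration:

```python
# cpu id -> run queue of thread names
cpus = {0: ["a", "b", "c"], 1: []}

def pull(idle_cpu, cpus):
    """An idle CPU steals one thread from the most loaded CPU."""
    donor = max(cpus, key=lambda c: len(cpus[c]))
    if cpus[donor] and donor != idle_cpu:
        cpus[idle_cpu].append(cpus[donor].pop())

def push(cpus):
    """A periodic task moves threads from the busiest CPU to the least busy
    one until their loads differ by at most one."""
    while True:
        busy = max(cpus, key=lambda c: len(cpus[c]))
        idle = min(cpus, key=lambda c: len(cpus[c]))
        if len(cpus[busy]) - len(cpus[idle]) <= 1:
            break
        cpus[idle].append(cpus[busy].pop())

pull(1, cpus)
print(cpus)   # {0: ['a', 'b'], 1: ['c']}
```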

SMT (Symmetric Multithreading), a concept of non-uniform processors, is not fully present in ULE. The foundations are there, however, which can eventually be extended to support NUMA (Non-Uniform Memory Architecture). This involves expressing the penalties of CPU migration through separate queues, which could be extended to add a local and global load-balancing policy. As far as my sources go, FreeBSD does not at this point support NUMA, however the groundwork is there, and it is a real possibility for it to appear in a future version.


[1] http://www.cim.mcgill.ca/~franco/OpSys-304-427/lecture-notes/node46.html
[2] http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc
[3] McKusick, M. K. and Neville-Neil, G. V. 2004. Thread Scheduling in FreeBSD 5.2. Queue 2, 7 (Oct. 2004), 58-64. DOI= http://doi.acm.org/10.1145/1035594.1035622


Notes: Lots of this is just paraphrasing stuff you guys said in the discussion section. In terms of citations, should it be a superscripted citation next to the fact snippet we used, or should it just be a list of sources at the bottom?

--CFaibish 23:27, 13 October 2010 (UTC)


I would agree with putting superscripted citations that refer to the Sources section. How do they do it on Wikipedia? -- Sschnei1 18:52, 13 October 2010 (UTC)


Superscripted citations seems to be the best way to do it. If we cite URLs throughout the essay, it will be much harder to read. To put in a superscripted citation, enclose the URL of your source in square brackets.

Also, who here is actually good at writing, and can compile all these paragraphs into one nice essay for us? I think we have enough raw information here, it's just a matter of putting it all together now.

-- Austin Bondio 20:39, 13 October 2010 (UTC)


Abhinav is putting something together right now on the main page.

-- Sschnei1 20:56, 13 October 2010 (UTC)

Hi, here's a little foreword on schedulers in relation to types of threads that I've composed based on one of my sources. I'm not sure if it's necessary, since there is one Mike typed above, but here it is for you guys to examine:

Threads that perform a lot of I/O require a fast response time to keep input and output devices busy, but need little CPU time. On the other hand, compute-bound threads need to receive a lot of CPU time to finish their work, but have no requirement for fast response time. Other threads lie somewhere in between, with periods of I/O punctuated by periods of computation, and thus have requirements that vary over time. A well-designed scheduler should be able to accommodate threads with all these requirements simultaneously.


Also: as Mike said earlier about BSD's issue with locking mechanisms, should I go into greater detail about that, or just include a brief, few-sentence description of the issue? I've found a source for what I think he was referring to: http://security.freebsd.org/advisories/FreeBSD-EN-10:02.sched_ule.asc

I'll be posting more of what I've got on the BSD stuff under the hour.

--CFaibish 22:59, 13 October 2010 (UTC)


I'm reading through what's up there now looking for typos/rewording certain aspects [mostly typos tho, this is very well written]. Should I just go and edit it? Or should I compose a mirrored version and go from there?

--CFaibish 14:42, 14 October 2010 (UTC)

Since they're just minimal fixes like commas, pluralization, and apostrophes, I'll just go ahead and edit them in-essay. I've saved a backup as well in case you guys feel I ruined anything.

--CFaibish 14:52, 14 October 2010 (UTC)

Sources

[1] http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/index.html

[2] http://www.mjmwired.net/kernel/Documentation/scheduler/sched-nice-design.txt

[3] http://oreilly.com/catalog/linuxkernel/chapter/ch10.html#94726

[4] http://people.redhat.com/mingo/cfs-scheduler/sched-design-CFS.txt