Soma-notes - User contributions [en]

COMP 3000 Essay 2 2010 Question 3

2010-11-24T22:48:19Z

Sfangche:

3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls

== Paper ==
The Title of the paper we will be analyzing is named "FlexSC: Flexible System Call Scheduling with Exception-Less System Calls". The authors of this paper consist of Livio Stores and Michael Stumm, both of which are from the University of Toronto. The paper can be viewed here, [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details on specifics of the essay.
== Background Concepts: ==

In order to fully understand the FlexSC paper, it is essential to understand the key concepts that are discussed within the paper. Here listed below, are the main concepts required to fully comprehend the paper.

===System Call===
A System Call is the gateway between the User Space and the Kernel Space. The User Space is not given direct access to the Kernel's services, for several reasons (one being security), hence System calls are the messengers between the User and Kernel Space.[1][4] 

===Mode Switch===
Mode Switches speak of moving from one medium to another. Specifically moving from the User Space mode to the Kernel mode or Kernel mode to User Space. It does not matter which direction or which modes we are swtiching from, this is simply a general term. Crucial to mode switching is the mode switch time which is the time necessary to execute a system call instruction in user-mode, perform the kernel mode execution of the system call, and finally return the execution back to user-mode.[1] 

===Synchronous System Call===
Synchronous Execution Model(System call Interface) refers to the structure in which system calls specifically are managed in a serialized manner. Moreover, the synchronous model completes one system call at a time, and does not move onto the next system call until the previous system call is finished executing. This form of system call is blocking, meaning the process which initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous system calls.[1][2] 

===Asynchronous System Call===
An asynchronous system call is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event driven programming.[2][3] 

===System Call Pollution===
System Call Pollution is a more sophisticated manner of referring to wasteful or un-necessary delay in the system caused by system calls. This pollution is in direct correlation with the fact that the system call invokes a mode swtich which is not a costless task. The "pollution" involved takes the form of data over-written in critical processor structures like the TLB (translation lookaside buffer - table which reduces the frequency of main memory access for page table entries), branch prediction tables, and the cache (L1, L2, L3).[1][3] 

===Processor Exceptions===
Processor exceptions are situations which cause the processor to stop current execution unexpetedly in order to handle the issue. There are many situations which generate processor exceptions including undefined instructions and software interrupts(system calls).[5] 

===System Call Batching===
System Call Batching is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6] 

===Temporal and Spatial Locality===
Locality is the concept that during execution there will be a tendancy for the same set of data to be accessed repeatedly over a brief time period. There are two imprtant forms of locality; spatial locality and temporal locality. Spatial locality refers to the pattern that memory locations in close physical proximity will be referenced close together in a short period of time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8] 

===Instructions Per Cycle (IPC)===
Instructions per cycle is the amount of instructions a processor can execute in a single clock cycle.[9] 

== Research Problem: ==
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of using synchronous system calls comes from the easy to program nature of having sequential operation. However, this ease of use also comes with undesireable side effects which can slow down the instructions per cycle (IPC) of the processor.[9] In FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy to implement for application programmers.[1] 

The negative effects of synchronous system calls have been researched heavily, it is accepted that although easy to use, they are not optimal. Previous research includes work into system call batching such as multi-calls[6], locality of execution with multicore systems[7][8], and non-blocking execution. System call batching shares great similarity with FlexSC as multiple system calls are grouped together to reduce the amount of mode switches required of the system.[6] The difference is multi-calls do not make use of parallel execution of system calls nor do they manage the blocking aspect of synchronous system calls. FlexSC describes methods to handle both of these situations as described in the Contribution section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel solution and although they can dedicate certain execution to specific cores of a system, they can not dynamically adapt to the proportion of cores used by the kernel and the cores shared between the kernel and the user like FlexSC can.[1] Non-blocking execution research has focused on threading, event-based (non-blocking) and hybrid solutions. However, FlexSC provides a mechanism to separate system call execution from system call invocation. This is a key difference between FlexSC and previous research.[1] 

== Contribution: ==

===Exception-Less System Calls===
Exception-less system calls are the research team's attempt to provide an alternative to synchronous systems calls. The downside to synchronous system calls includes the cumulative mode switch time of multiple system calls each called independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially the most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through: 
1. System Call Batching: 
Instead of having each system call run as soon as it is called, FlexSC instead groups together system calls into batches. These batches can then be executed at one time thus minimizing the frequency of mode switches bewteen user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching as well as the indirect cost, pollution of critical processor structures, associated with switching modes.[1] System call batching works by first requesting as many system calls as possible, then switching to kernel mode, and then executing each of them. 
2. Core Specialization 
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in Decoupling Execution from Invocation section below.[1] 
3. Exception-less System Call Interface 
To provide an asynchronous interface to the kernel, FlexSC uses syscall pages. Syscall pages are a set of memory pages shared between user-mode and kernel-mode. User-space threads interact with syscall pages in order to make a request (system call) for kernel-mode procedures. A user-mode thread may make a system call request on a free entry of a syscall page, the syscall page will then run once the batch condition is met and store the return value on the syscall page. The user-mode thread can then return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor getting the return value from the syscall page generate a processor exception. Each syscall page is a table of syscall entries. These entries may have one of three states: Free - meaning a syscall can be added to the entry; Submitted - meaning the kernel can proceed to invoke the appropriate system call operations; and Done - meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve it.[1] 
4. Decoupling Execution from Invocation 
In order to separate a system call invocation from the execution of the system call, syscall threads were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1] 

===FlexSC Threads===
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accomodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page which in turn has a single kernel level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous system call interface.[1][2][3] 

== Critique: ==

===Moore's Law===
One interesting aspect of this paper is how the research relates to Moore's Law. Moore's Law states that the number of transistors on a chip doubles every 18 months.[10]. This has lead to very large increases in the performance potential of software but at the same time has opened a large gap between the actual performance of efficient and inefficient software. This paper claims that the gap is mainly caused by dispairity of accessing different processor resources such as registers, cache and memory.[1] In this manner, the FlexSC interface is not just an attempt to increase the efficiency of current system calls, but it is actually an attempt to change the way we view software. It is not simply enough to continue to build more powerful machines if the code we currently run will not speed up (become more efficient) along with the gain of power. Instead we need to focus on appropriate allocation and usage of the power as failure to do so is the origination of the gap between our potential and our performance. 

===Performance of FlexSC===
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than a synchronous call. The real benefit of FlexSC comes when there are many system calls which can be in turn batched before execution. In this situation the FlexSC system far outperformed the traditional synchronous system calls.[1] This is why the research paper's focus is on server-like applications as server must handle many user requests efficiently to be useful. Thus, for a general case it appears that a hybrid solution of synchronous calls below some threshold and execption-less system calls above the same threshold would be most efficient. 

== Related Work: ==

===System Call Batching===

Muti-calls is a concept which involves collecting multiple system calls and submitting them as a single system call. It is used both in operating systems and paravirtualized hypervisors. The Cassyopia compiler has a special technique name a looped multi-call, which is an additional process where the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls. Multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls like exception-less system calls. Multi-call system calls are executed sequentially, each one must complete before the next may start. On the other hand, exception-less system calls can be executed in parallel, and in the presence of blocking, the next call can execute immediately. 

===Locality of Execution and Multicores===

Several techniques addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver[14] Processing which try to tackle the issue of locality of execution by handling device interrupts. They both try to limit processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique name Computation Spreading[15] is most similar to the multicore execution of FlexSC. Processor modifications that allow hardware migration of threads and migration to specialized cores. However, they did not model TLBs on current hardware synchronous thread migration is a costly interprocessor interrupt. Another solution has 2 difference between FlexSC. They require a micro-kernel. Also FlexSC can dynamically adapt the proportion of cores used by the kernel or cores shared by user and kernel execution dynamically. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC could provide a more efficient, and flexible mechanism. 

===Non-blocking Execution===

Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically researchers used threading, event-based, which is non-blocking and hybrid systems to obtain high performance on server applications. The main difference between many of the proposals for non-blocking execution and FlexSC is that none of the non-blocking system calls have decoupled the system call invocation from its execution. 

== References: ==
[1] Soares, Livio and Michael Stumm, FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]

[2] Tanenbaum, Andrew S., Modern Operating Systems: 3rd Edition, Pearson/Prentice Hall, New Jersey, 2008.

[3] Stallings, William, Operating Systems: Internals and Design Principles - 6th Edition, Pearson/Prentice Hall, New Jersey, 2009.

[4] Garfinkel, Tim, Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&rep=rep1&type=pdf PDF]

[5] Yoo, Sunjoo et al., Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&rep=rep1&type=pdf PDF]

[6] Rajagopalan, Mohan et al., Cassyopia: Compiler Assisted System Optimization, Poceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]

[7] Kumar, Sanjeev and Christopher Wilkerson, Exploiting Spatial Locality in Data Caches using Spatial Footprints, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&rep=rep1&type=pdf PDF]

[8] Jin, Shudong and Azer Bestavros, Sources and Characteristics of Web Temporal Locality, Computer Science Depratment - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&rep=rep1&type=pdf PDF]

[9] Agarwal, Vikas et al., Clock Rate versus IPS: The End of the Road for Conventional Microarhitechtures, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&rep=rep1&type=pdf PDF]

[10] Tuomi, Ilkka, The Lives and Death of Moore's Law, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]

[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.

[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.

[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.

[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.

[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.

COMP 3000 Essay 2 2010 Question 3

2010-11-24T22:47:21Z

Sfangche:

3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls

== Paper ==
The Title of the paper we will be analyzing is named "FlexSC: Flexible System Call Scheduling with Exception-Less System Calls". The authors of this paper consist of Livio Stores and Michael Stumm, both of which are from the University of Toronto. The paper can be viewed here, [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details on specifics of the essay.
== Background Concepts: ==

In order to fully understand the FlexSC paper, it is essential to understand the key concepts that are discussed within the paper. Here listed below, are the main concepts required to fully comprehend the paper.

===System Call===
A System Call is the gateway between the User Space and the Kernel Space. The User Space is not given direct access to the Kernel's services, for several reasons (one being security), hence System calls are the messengers between the User and Kernel Space.[1][4] 

===Mode Switch===
Mode Switches speak of moving from one medium to another. Specifically moving from the User Space mode to the Kernel mode or Kernel mode to User Space. It does not matter which direction or which modes we are swtiching from, this is simply a general term. Crucial to mode switching is the mode switch time which is the time necessary to execute a system call instruction in user-mode, perform the kernel mode execution of the system call, and finally return the execution back to user-mode.[1] 

===Synchronous System Call===
Synchronous Execution Model(System call Interface) refers to the structure in which system calls specifically are managed in a serialized manner. Moreover, the synchronous model completes one system call at a time, and does not move onto the next system call until the previous system call is finished executing. This form of system call is blocking, meaning the process which initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous system calls.[1][2] 

===Asynchronous System Call===
An asynchronous system call is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event driven programming.[2][3] 

===System Call Pollution===
System Call Pollution is a more sophisticated manner of referring to wasteful or un-necessary delay in the system caused by system calls. This pollution is in direct correlation with the fact that the system call invokes a mode swtich which is not a costless task. The "pollution" involved takes the form of data over-written in critical processor structures like the TLB (translation lookaside buffer - table which reduces the frequency of main memory access for page table entries), branch prediction tables, and the cache (L1, L2, L3).[1][3] 

===Processor Exceptions===
Processor exceptions are situations which cause the processor to stop current execution unexpetedly in order to handle the issue. There are many situations which generate processor exceptions including undefined instructions and software interrupts(system calls).[5] 

===System Call Batching===
System Call Batching is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6] 

===Temporal and Spatial Locality===
Locality is the concept that during execution there will be a tendancy for the same set of data to be accessed repeatedly over a brief time period. There are two imprtant forms of locality; spatial locality and temporal locality. Spatial locality refers to the pattern that memory locations in close physical proximity will be referenced close together in a short period of time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8] 

===Instructions Per Cycle (IPC)===
Instructions per cycle is the amount of instructions a processor can execute in a single clock cycle.[9] 

== Research Problem: ==
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of using synchronous system calls comes from the easy to program nature of having sequential operation. However, this ease of use also comes with undesireable side effects which can slow down the instructions per cycle (IPC) of the processor.[9] In FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy to implement for application programmers.[1] 

The negative effects of synchronous system calls have been researched heavily, it is accepted that although easy to use, they are not optimal. Previous research includes work into system call batching such as multi-calls[6], locality of execution with multicore systems[7][8], and non-blocking execution. System call batching shares great similarity with FlexSC as multiple system calls are grouped together to reduce the amount of mode switches required of the system.[6] The difference is multi-calls do not make use of parallel execution of system calls nor do they manage the blocking aspect of synchronous system calls. FlexSC describes methods to handle both of these situations as described in the Contribution section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel solution and although they can dedicate certain execution to specific cores of a system, they can not dynamically adapt to the proportion of cores used by the kernel and the cores shared between the kernel and the user like FlexSC can.[1] Non-blocking execution research has focused on threading, event-based (non-blocking) and hybrid solutions. However, FlexSC provides a mechanism to separate system call execution from system call invocation. This is a key difference between FlexSC and previous research.[1] 

== Contribution: ==

===Exception-Less System Calls===
Exception-less system calls are the research team's attempt to provide an alternative to synchronous systems calls. The downside to synchronous system calls includes the cumulative mode switch time of multiple system calls each called independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially the most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through: 
1. System Call Batching: 
Instead of having each system call run as soon as it is called, FlexSC instead groups together system calls into batches. These batches can then be executed at one time thus minimizing the frequency of mode switches bewteen user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching as well as the indirect cost, pollution of critical processor structures, associated with switching modes.[1] System call batching works by first requesting as many system calls as possible, then switching to kernel mode, and then executing each of them. 
2. Core Specialization 
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in Decoupling Execution from Invocation section below.[1] 
3. Exception-less System Call Interface 
To provide an asynchronous interface to the kernel, FlexSC uses syscall pages. Syscall pages are a set of memory pages shared between user-mode and kernel-mode. User-space threads interact with syscall pages in order to make a request (system call) for kernel-mode procedures. A user-mode thread may make a system call request on a free entry of a syscall page, the syscall page will then run once the batch condition is met and store the return value on the syscall page. The user-mode thread can then return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor getting the return value from the syscall page generate a processor exception. Each syscall page is a table of syscall entries. These entries may have one of three states: Free - meaning a syscall can be added to the entry; Submitted - meaning the kernel can proceed to invoke the appropriate system call operations; and Done - meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve it.[1] 
4. Decoupling Execution from Invocation 
In order to separate a system call invocation from the execution of the system call, syscall threads were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1] 

===FlexSC Threads===
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accomodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page which in turn has a single kernel level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous system call interface.[1][2][3] 

== Critique: ==

===Moore's Law===
One interesting aspect of this paper is how the research relates to Moore's Law. Moore's Law states that the number of transistors on a chip doubles every 18 months.[10]. This has lead to very large increases in the performance potential of software but at the same time has opened a large gap between the actual performance of efficient and inefficient software. This paper claims that the gap is mainly caused by dispairity of accessing different processor resources such as registers, cache and memory.[1] In this manner, the FlexSC interface is not just an attempt to increase the efficiency of current system calls, but it is actually an attempt to change the way we view software. It is not simply enough to continue to build more powerful machines if the code we currently run will not speed up (become more efficient) along with the gain of power. Instead we need to focus on appropriate allocation and usage of the power as failure to do so is the origination of the gap between our potential and our performance. 

===Performance of FlexSC===
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than a synchronous call. The real benefit of FlexSC comes when there are many system calls which can be in turn batched before execution. In this situation the FlexSC system far outperformed the traditional synchronous system calls.[1] This is why the research paper's focus is on server-like applications as server must handle many user requests efficiently to be useful. Thus, for a general case it appears that a hybrid solution of synchronous calls below some threshold and execption-less system calls above the same threshold would be most efficient. 

== Related Work: ==

==System Call Batching==

Muti-calls is a concept which involves collecting multiple system calls and submitting them as a single system call. It is used both in operating systems and paravirtualized hypervisors. The Cassyopia compiler has a special technique name a looped multi-call, which is an additional process where the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls. Multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls like exception-less system calls. Multi-call system calls are executed sequentially, each one must complete before the next may start. On the other hand, exception-less system calls can be executed in parallel, and in the presence of blocking, the next call can execute immediately. 

==Locality of Execution and Multicores==

Several techniques addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver[14] Processing which try to tackle the issue of locality of execution by handling device interrupts. They both try to limit processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique name Computation Spreading[15] is most similar to the multicore execution of FlexSC. Processor modifications that allow hardware migration of threads and migration to specialized cores. However, they did not model TLBs on current hardware synchronous thread migration is a costly interprocessor interrupt. Another solution has 2 difference between FlexSC. They require a micro-kernel. Also FlexSC can dynamically adapt the proportion of cores used by the kernel or cores shared by user and kernel execution dynamically. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC could provide a more efficient, and flexible mechanism. 

==Non-blocking Execution==

Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically researchers used threading, event-based, which is non-blocking and hybrid systems to obtain high performance on server applications. The main difference between many of the proposals for non-blocking execution and FlexSC is that none of the non-blocking system calls have decoupled the system call invocation from its execution. 

== References: ==
[1] Soares, Livio and Michael Stumm, FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]

[2] Tanenbaum, Andrew S., Modern Operating Systems: 3rd Edition, Pearson/Prentice Hall, New Jersey, 2008.

[3] Stallings, William, Operating Systems: Internals and Design Principles - 6th Edition, Pearson/Prentice Hall, New Jersey, 2009.

[4] Garfinkel, Tim, Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&rep=rep1&type=pdf PDF]

[5] Yoo, Sunjoo et al., Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&rep=rep1&type=pdf PDF]

[6] Rajagopalan, Mohan et al., Cassyopia: Compiler Assisted System Optimization, Poceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]

[7] Kumar, Sanjeev and Christopher Wilkerson, Exploiting Spatial Locality in Data Caches using Spatial Footprints, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&rep=rep1&type=pdf PDF]

[8] Jin, Shudong and Azer Bestavros, Sources and Characteristics of Web Temporal Locality, Computer Science Depratment - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&rep=rep1&type=pdf PDF]

[9] Agarwal, Vikas et al., Clock Rate versus IPS: The End of the Road for Conventional Microarhitechtures, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&rep=rep1&type=pdf PDF]

[10] Tuomi, Ilkka, The Lives and Death of Moore's Law, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]

[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.

[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.

[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.

[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.

[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.

COMP 3000 Essay 2 2010 Question 3

2010-11-24T22:46:50Z

Sfangche: /* Related Work: */

3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls

== Paper ==
The Title of the paper we will be analyzing is named "FlexSC: Flexible System Call Scheduling with Exception-Less System Calls". The authors of this paper consist of Livio Stores and Michael Stumm, both of which are from the University of Toronto. The paper can be viewed here, [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details on specifics of the essay.
== Background Concepts: ==

In order to fully understand the FlexSC paper, it is essential to understand the key concepts that are discussed within the paper. Here listed below, are the main concepts required to fully comprehend the paper.

===System Call===
A System Call is the gateway between the User Space and the Kernel Space. The User Space is not given direct access to the Kernel's services, for several reasons (one being security), hence System calls are the messengers between the User and Kernel Space.[1][4] 

===Mode Switch===
Mode Switches speak of moving from one medium to another. Specifically moving from the User Space mode to the Kernel mode or Kernel mode to User Space. It does not matter which direction or which modes we are swtiching from, this is simply a general term. Crucial to mode switching is the mode switch time which is the time necessary to execute a system call instruction in user-mode, perform the kernel mode execution of the system call, and finally return the execution back to user-mode.[1] 

===Synchronous System Call===
Synchronous Execution Model(System call Interface) refers to the structure in which system calls specifically are managed in a serialized manner. Moreover, the synchronous model completes one system call at a time, and does not move onto the next system call until the previous system call is finished executing. This form of system call is blocking, meaning the process which initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous system calls.[1][2] 

===Asynchronous System Call===
An asynchronous system call is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event driven programming.[2][3] 

===System Call Pollution===
System Call Pollution is a more sophisticated manner of referring to wasteful or un-necessary delay in the system caused by system calls. This pollution is in direct correlation with the fact that the system call invokes a mode swtich which is not a costless task. The "pollution" involved takes the form of data over-written in critical processor structures like the TLB (translation lookaside buffer - table which reduces the frequency of main memory access for page table entries), branch prediction tables, and the cache (L1, L2, L3).[1][3] 

===Processor Exceptions===
Processor exceptions are situations which cause the processor to stop current execution unexpetedly in order to handle the issue. There are many situations which generate processor exceptions including undefined instructions and software interrupts(system calls).[5] 

===System Call Batching===
System Call Batching is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6] 

===Temporal and Spatial Locality===
Locality is the concept that during execution there will be a tendancy for the same set of data to be accessed repeatedly over a brief time period. There are two imprtant forms of locality; spatial locality and temporal locality. Spatial locality refers to the pattern that memory locations in close physical proximity will be referenced close together in a short period of time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8] 

===Instructions Per Cycle (IPC)===
Instructions per cycle is the amount of instructions a processor can execute in a single clock cycle.[9] 

== Research Problem: ==
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of using synchronous system calls comes from the easy to program nature of having sequential operation. However, this ease of use also comes with undesireable side effects which can slow down the instructions per cycle (IPC) of the processor.[9] In FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy to implement for application programmers.[1] 

The negative effects of synchronous system calls have been researched heavily, it is accepted that although easy to use, they are not optimal. Previous research includes work into system call batching such as multi-calls[6], locality of execution with multicore systems[7][8], and non-blocking execution. System call batching shares great similarity with FlexSC as multiple system calls are grouped together to reduce the amount of mode switches required of the system.[6] The difference is multi-calls do not make use of parallel execution of system calls nor do they manage the blocking aspect of synchronous system calls. FlexSC describes methods to handle both of these situations as described in the Contribution section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel solution and although they can dedicate certain execution to specific cores of a system, they can not dynamically adapt to the proportion of cores used by the kernel and the cores shared between the kernel and the user like FlexSC can.[1] Non-blocking execution research has focused on threading, event-based (non-blocking) and hybrid solutions. However, FlexSC provides a mechanism to separate system call execution from system call invocation. This is a key difference between FlexSC and previous research.[1] 

== Contribution: ==

===Exception-Less System Calls===
Exception-less system calls are the research team's attempt to provide an alternative to synchronous systems calls. The downside to synchronous system calls includes the cumulative mode switch time of multiple system calls each called independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially the most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through: 
1. System Call Batching: 
Instead of having each system call run as soon as it is called, FlexSC instead groups together system calls into batches. These batches can then be executed at one time thus minimizing the frequency of mode switches bewteen user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching as well as the indirect cost, pollution of critical processor structures, associated with switching modes.[1] System call batching works by first requesting as many system calls as possible, then switching to kernel mode, and then executing each of them. 
2. Core Specialization 
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in Decoupling Execution from Invocation section below.[1] 
3. Exception-less System Call Interface 
To provide an asynchronous interface to the kernel, FlexSC uses syscall pages. Syscall pages are a set of memory pages shared between user-mode and kernel-mode. User-space threads interact with syscall pages in order to make a request (system call) for kernel-mode procedures. A user-mode thread may make a system call request on a free entry of a syscall page, the syscall page will then run once the batch condition is met and store the return value on the syscall page. The user-mode thread can then return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor getting the return value from the syscall page generate a processor exception. Each syscall page is a table of syscall entries. These entries may have one of three states: Free - meaning a syscall can be added to the entry; Submitted - meaning the kernel can proceed to invoke the appropriate system call operations; and Done - meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve it.[1] 
4. Decoupling Execution from Invocation 
In order to separate a system call invocation from the execution of the system call, syscall threads were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1] 

===FlexSC Threads===
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accomodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page which in turn has a single kernel level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous system call interface.[1][2][3] 

== Critique: ==

===Moore's Law===
One interesting aspect of this paper is how the research relates to Moore's Law. Moore's Law states that the number of transistors on a chip doubles every 18 months.[10]. This has lead to very large increases in the performance potential of software but at the same time has opened a large gap between the actual performance of efficient and inefficient software. This paper claims that the gap is mainly caused by dispairity of accessing different processor resources such as registers, cache and memory.[1] In this manner, the FlexSC interface is not just an attempt to increase the efficiency of current system calls, but it is actually an attempt to change the way we view software. It is not simply enough to continue to build more powerful machines if the code we currently run will not speed up (become more efficient) along with the gain of power. Instead we need to focus on appropriate allocation and usage of the power as failure to do so is the origination of the gap between our potential and our performance. 

===Performance of FlexSC===
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than a synchronous call. The real benefit of FlexSC comes when there are many system calls which can be in turn batched before execution. In this situation the FlexSC system far outperformed the traditional synchronous system calls.[1] This is why the research paper's focus is on server-like applications as server must handle many user requests efficiently to be useful. Thus, for a general case it appears that a hybrid solution of synchronous calls below some threshold and execption-less system calls above the same threshold would be most efficient. 

== Related Work: ==

==System Call Batching==

Muti-calls is a concept which involves collecting multiple system calls and submitting them as a single system call. It is used both in operating systems and paravirtualized hypervisors. The Cassyopia compiler has a special technique name a looped multi-call, which is an additional process where the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls. Multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls like exception-less system calls. Multi-call system calls are executed sequentially, each one must complete before the next may start. On the other hand, exception-less system calls can be executed in parallel, and in the presence of blocking, the next call can execute immediately. 

==Locality of Execution and Multicores==

Several techniques addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver[14] Processing which try to tackle the issue of locality of execution by handling device interrupts. They both try to limit processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique name Computation Spreading[15] is most similar to the multicore execution of FlexSC. Processor modifications that allow hardware migration of threads and migration to specialized cores. However, they did not model TLBs on current hardware synchronous thread migration is a costly interprocessor interrupt. Another solution has 2 difference between FlexSC. They require a micro-kernel. Also FlexSC can dynamically adapt the proportion of cores used by the kernel or cores shared by user and kernel execution dynamically. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC could provide a more efficient, and flexible mechanism. 

==Non-blocking Execution==

Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically researchers used threading, event-based, which is non-blocking and hybrid systems to obtain high performance on server applications. The main difference between many of the proposals for non-blocking execution and FlexSC is that none of the non-blocking system calls have decoupled the system call invocation from its execution. 

==System Call Batching==

Muti-calls is a concept which involves collecting multiple system calls and submitting them as a single system call. It is used both in operating systems and paravirtualized hypervisors. The Cassyopia compiler has a special technique name a looped multi-call, which is an additional process where the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls. Multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls like exception-less system calls. Multi-call system calls are executed sequentially, each one must complete before the next may start. On the other hand, exception-less system calls can be executed in parallel, and in the presence of blocking, the next call can execute immediately. 

==Locality of Execution and Multicores==

Several techniques addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver[14] Processing which try to tackle the issue of locality of execution by handling device interrupts. They both try to limit processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique name Computation Spreading[15] is most similar to the multicore execution of FlexSC. Processor modifications that allow hardware migration of threads and migration to specialized cores. However, they did not model TLBs on current hardware synchronous thread migration is a costly interprocessor interrupt. Another solution has 2 difference between FlexSC. They require a micro-kernel. Also FlexSC can dynamically adapt the proportion of cores used by the kernel or cores shared by user and kernel execution dynamically. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC could provide a more efficient, and flexible mechanism. 

==Non-blocking Execution==

Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically researchers used threading, event-based, which is non-blocking, and hybrid systems to obtain high performance on server applications. The main difference between many of the proposals for non-blocking execution and FlexSC is that none of the non-blocking system calls have decoupled the system call invocation from its execution. 

== References: ==
[1] Soares, Livio and Michael Stumm, FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]

[2] Tanenbaum, Andrew S., Modern Operating Systems: 3rd Edition, Pearson/Prentice Hall, New Jersey, 2008.

[3] Stallings, William, Operating Systems: Internals and Design Principles - 6th Edition, Pearson/Prentice Hall, New Jersey, 2009.

[4] Garfinkel, Tim, Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&rep=rep1&type=pdf PDF]

[5] Yoo, Sunjoo et al., Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&rep=rep1&type=pdf PDF]

[6] Rajagopalan, Mohan et al., Cassyopia: Compiler Assisted System Optimization, Poceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]

[7] Kumar, Sanjeev and Christopher Wilkerson, Exploiting Spatial Locality in Data Caches using Spatial Footprints, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&rep=rep1&type=pdf PDF]

[8] Jin, Shudong and Azer Bestavros, Sources and Characteristics of Web Temporal Locality, Computer Science Depratment - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&rep=rep1&type=pdf PDF]

[9] Agarwal, Vikas et al., Clock Rate versus IPS: The End of the Road for Conventional Microarhitechtures, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&rep=rep1&type=pdf PDF]

[10] Tuomi, Ilkka, The Lives and Death of Moore's Law, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]

[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.

[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.

[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.

[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.

[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.

COMP 3000 Essay 2 2010 Question 3

2010-11-24T22:46:18Z

Sfangche:

3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls

== Paper ==
The Title of the paper we will be analyzing is named "FlexSC: Flexible System Call Scheduling with Exception-Less System Calls". The authors of this paper consist of Livio Stores and Michael Stumm, both of which are from the University of Toronto. The paper can be viewed here, [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details on specifics of the essay.
== Background Concepts: ==

In order to fully understand the FlexSC paper, it is essential to understand the key concepts that are discussed within the paper. Here listed below, are the main concepts required to fully comprehend the paper.

===System Call===
A System Call is the gateway between the User Space and the Kernel Space. The User Space is not given direct access to the Kernel's services, for several reasons (one being security), hence System calls are the messengers between the User and Kernel Space.[1][4] 

===Mode Switch===
Mode Switches speak of moving from one medium to another. Specifically moving from the User Space mode to the Kernel mode or Kernel mode to User Space. It does not matter which direction or which modes we are swtiching from, this is simply a general term. Crucial to mode switching is the mode switch time which is the time necessary to execute a system call instruction in user-mode, perform the kernel mode execution of the system call, and finally return the execution back to user-mode.[1] 

===Synchronous System Call===
Synchronous Execution Model(System call Interface) refers to the structure in which system calls specifically are managed in a serialized manner. Moreover, the synchronous model completes one system call at a time, and does not move onto the next system call until the previous system call is finished executing. This form of system call is blocking, meaning the process which initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous system calls.[1][2] 

===Asynchronous System Call===
An asynchronous system call is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event driven programming.[2][3] 

===System Call Pollution===
System Call Pollution is a more sophisticated manner of referring to wasteful or un-necessary delay in the system caused by system calls. This pollution is in direct correlation with the fact that the system call invokes a mode swtich which is not a costless task. The "pollution" involved takes the form of data over-written in critical processor structures like the TLB (translation lookaside buffer - table which reduces the frequency of main memory access for page table entries), branch prediction tables, and the cache (L1, L2, L3).[1][3] 

===Processor Exceptions===
Processor exceptions are situations which cause the processor to stop current execution unexpetedly in order to handle the issue. There are many situations which generate processor exceptions including undefined instructions and software interrupts(system calls).[5] 

===System Call Batching===
System Call Batching is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6] 

===Temporal and Spatial Locality===
Locality is the concept that during execution there will be a tendancy for the same set of data to be accessed repeatedly over a brief time period. There are two imprtant forms of locality; spatial locality and temporal locality. Spatial locality refers to the pattern that memory locations in close physical proximity will be referenced close together in a short period of time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8] 

===Instructions Per Cycle (IPC)===
Instructions per cycle is the amount of instructions a processor can execute in a single clock cycle.[9] 

== Research Problem: ==
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of using synchronous system calls comes from the easy to program nature of having sequential operation. However, this ease of use also comes with undesireable side effects which can slow down the instructions per cycle (IPC) of the processor.[9] In FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy to implement for application programmers.[1] 

The negative effects of synchronous system calls have been researched heavily, it is accepted that although easy to use, they are not optimal. Previous research includes work into system call batching such as multi-calls[6], locality of execution with multicore systems[7][8], and non-blocking execution. System call batching shares great similarity with FlexSC as multiple system calls are grouped together to reduce the amount of mode switches required of the system.[6] The difference is multi-calls do not make use of parallel execution of system calls nor do they manage the blocking aspect of synchronous system calls. FlexSC describes methods to handle both of these situations as described in the Contribution section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel solution and although they can dedicate certain execution to specific cores of a system, they can not dynamically adapt to the proportion of cores used by the kernel and the cores shared between the kernel and the user like FlexSC can.[1] Non-blocking execution research has focused on threading, event-based (non-blocking) and hybrid solutions. However, FlexSC provides a mechanism to separate system call execution from system call invocation. This is a key difference between FlexSC and previous research.[1] 

== Contribution: ==

===Exception-Less System Calls===
Exception-less system calls are the research team's attempt to provide an alternative to synchronous systems calls. The downside to synchronous system calls includes the cumulative mode switch time of multiple system calls each called independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially the most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through: 
1. System Call Batching: 
Instead of having each system call run as soon as it is called, FlexSC instead groups together system calls into batches. These batches can then be executed at one time thus minimizing the frequency of mode switches bewteen user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching as well as the indirect cost, pollution of critical processor structures, associated with switching modes.[1] System call batching works by first requesting as many system calls as possible, then switching to kernel mode, and then executing each of them. 
2. Core Specialization 
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in Decoupling Execution from Invocation section below.[1] 
3. Exception-less System Call Interface 
To provide an asynchronous interface to the kernel, FlexSC uses syscall pages. Syscall pages are a set of memory pages shared between user-mode and kernel-mode. User-space threads interact with syscall pages in order to make a request (system call) for kernel-mode procedures. A user-mode thread may make a system call request on a free entry of a syscall page, the syscall page will then run once the batch condition is met and store the return value on the syscall page. The user-mode thread can then return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor getting the return value from the syscall page generate a processor exception. Each syscall page is a table of syscall entries. These entries may have one of three states: Free - meaning a syscall can be added to the entry; Submitted - meaning the kernel can proceed to invoke the appropriate system call operations; and Done - meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve it.[1] 
4. Decoupling Execution from Invocation 
In order to separate a system call invocation from the execution of the system call, syscall threads were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1] 

===FlexSC Threads===
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accomodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page which in turn has a single kernel level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous system call interface.[1][2][3] 

== Critique: ==

===Moore's Law===
One interesting aspect of this paper is how the research relates to Moore's Law. Moore's Law states that the number of transistors on a chip doubles every 18 months.[10]. This has lead to very large increases in the performance potential of software but at the same time has opened a large gap between the actual performance of efficient and inefficient software. This paper claims that the gap is mainly caused by dispairity of accessing different processor resources such as registers, cache and memory.[1] In this manner, the FlexSC interface is not just an attempt to increase the efficiency of current system calls, but it is actually an attempt to change the way we view software. It is not simply enough to continue to build more powerful machines if the code we currently run will not speed up (become more efficient) along with the gain of power. Instead we need to focus on appropriate allocation and usage of the power as failure to do so is the origination of the gap between our potential and our performance. 

===Performance of FlexSC===
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than a synchronous call. The real benefit of FlexSC comes when there are many system calls which can be in turn batched before execution. In this situation the FlexSC system far outperformed the traditional synchronous system calls.[1] This is why the research paper's focus is on server-like applications as server must handle many user requests efficiently to be useful. Thus, for a general case it appears that a hybrid solution of synchronous calls below some threshold and execption-less system calls above the same threshold would be most efficient. 

== Related Work: ==

==System Call Batching==

Muti-calls is a concept which involves collecting multiple system calls and submitting them as a single system call. It is used both in operating systems and paravirtualized hypervisors. The Cassyopia compiler has a special technique name a looped multi-call, which is an additional process where the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls. Multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls like exception-less system calls. Multi-call system calls are executed sequentially, each one must complete before the next may start. On the other hand, exception-less system calls can be executed in parallel, and in the presence of blocking, the next call can execute immediately. 

==Locality of Execution and Multicores==

Several techniques addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver[14] Processing which try to tackle the issue of locality of execution by handling device interrupts. They both try to limit processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique name Computation Spreading[15] is most similar to the multicore execution of FlexSC. Processor modifications that allow hardware migration of threads and migration to specialized cores. However, they did not model TLBs on current hardware synchronous thread migration is a costly interprocessor interrupt. Another solution has 2 difference between FlexSC. They require a micro-kernel. Also FlexSC can dynamically adapt the proportion of cores used by the kernel or cores shared by user and kernel execution dynamically. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC could provide a more efficient, and flexible mechanism. 

==Non-blocking Execution==

Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically researchers used threading, event-based, which is non-blocking, and hybrid systems to obtain high performance on server applications. The main difference between many of the proposals for non-blocking execution and FlexSC is that none of the non-blocking system calls have decoupled the system call invocation from its execution. 

== References: ==
[1] Soares, Livio and Michael Stumm, FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]

[2] Tanenbaum, Andrew S., Modern Operating Systems: 3rd Edition, Pearson/Prentice Hall, New Jersey, 2008.

[3] Stallings, William, Operating Systems: Internals and Design Principles - 6th Edition, Pearson/Prentice Hall, New Jersey, 2009.

[4] Garfinkel, Tim, Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&rep=rep1&type=pdf PDF]

[5] Yoo, Sunjoo et al., Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&rep=rep1&type=pdf PDF]

[6] Rajagopalan, Mohan et al., Cassyopia: Compiler Assisted System Optimization, Poceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]

[7] Kumar, Sanjeev and Christopher Wilkerson, Exploiting Spatial Locality in Data Caches using Spatial Footprints, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&rep=rep1&type=pdf PDF]

[8] Jin, Shudong and Azer Bestavros, Sources and Characteristics of Web Temporal Locality, Computer Science Depratment - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&rep=rep1&type=pdf PDF]

[9] Agarwal, Vikas et al., Clock Rate versus IPS: The End of the Road for Conventional Microarhitechtures, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&rep=rep1&type=pdf PDF]

[10] Tuomi, Ilkka, The Lives and Death of Moore's Law, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]

[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.

[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.

[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.

[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.

[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.

Talk:COMP 3000 Essay 2 2010 Question 3

2010-11-24T22:41:33Z

Sfangche: /* Who is working on what ? */

=Group 3 Essay=

Hello everyone, please post your contact information here:

Ben Robson [mailto:brobson@connect.carleton.ca brobson@connect.carleton.ca]

Rey Arteaga: rarteaga@connect.carleton.ca

Corey Faibish: [mailto:corey.faibish@gmail.com corey.faibish@gmail.com]

Tawfic Abdul-Fatah: [mailto:tfatah@gmail.com tfatah@gmail.com]

Fangchen Sun: [mailto:sfangche@connect.carleton.ca sfangche@connect.carleton.ca]

Mike Preston: [mailto:michaelapreston@gmail.com michaelapreston@gmail.com]

Wesley L. Lawrence: [mailto:wlawrenc@connect.carleton.ca wlawrenc@connect.carleton.ca]

Can't access the video without a login as we found out in class, but you can listen to the speech and follow with the slides pretty easily, I just went through it and it's not too bad. Rarteaga

==Question 3 Group==
*Abdul-Fatah Tawfic tafatah
*Arteaga Reynaldo rarteaga
*Faibish Corey cfaibish
*Lawrence Wesley wlawrenc
*Preston Mike mpreston
*Robson Benjamin brobson
*Sun Fangchen sfangche

==Who is working on what ?==
Just to keep track of who's doing what --[[User:Tafatah|Tafatah]] 01:37, 15 November 2010 (UTC)

Hey everyone, I have taken the liberty of trying to provide a good first start on our paper. I have provided many resources and filled in information for all of the sections. This is not complete, but it should make the rest of the work a lot easier. Please go through and add in pieces that I am missing (specifically in the Critique section) and then we can put this essay to bed. Also, please note that below I have included my notes on the paper so that if anyone feels they do not have time to read the paper, they can read my notes instead and still find additional materials to contribute with.
--[[User:Mike Preston|Mike Preston]] 18:22, 20 November 2010 (UTC)

Man, Mike: you did a nice job! I'm reading through it now very thorough :) Since you pretty much turned all of your bulleted points from the discussion page into that on the main page, what else needs to be done? Just expanding on each topic and sub-topic? Or are there untouched concepts/topics that we should be addressing?
Oh and question two: Should we turn the Q&A from the end of the video of the presentation into information for the ''Critique'' section?
--[[User:CFaibish|CFaibish]] 20:34, 22 November 2010 (UTC)

Mike, thnx for the great job! I basically finished the part of related work based on your draft.
--[[User:sfangchen|Fangchen Sun]] 17:40, 24 November 2010 (UTC)

==Paper Summary==
I am not sure if everyone has taken the time to examine the paper closely, so I thought I would provide my notes on the paper so that anyone who has not read it could have a view of the high points.

Abstract:
- System calls are the accepted way to request services from the OS kernel, historical implementation.
- System calls almost always synchronous
- Aim to demonstrate how synchronous system calls negatively affect performance due mainly to pipeline flushing and pollution of key processor structures (TLB, data and instruction caches, etc.)
o TLB is translation lookaside buffer which is uses pages (data and code pages) to speed up virtual translation speed.
- Propose exception-less system calls to improve the current system call process.
o Improve processor efficiency via enabling flexible scheduling of OS work which in turn reduces size of execution both in kernel and user space thus reducing pollution effects on processor structures.
- Exception-less system calls especially effective on multi-core systems running multi-threaded applications.
- FlexSC is an implementation of exception-less system calls in the Linux kernel with accompanying user-mode threads from FlexSC-Threads package.
o Flex-SC-Threads convert legacy system calls into exception-less system calls.
Introduction:
- Synchronous system calls have a negative impact on system performance due to:
o Direct costs – mode switching
o Indirect costs – pollution of important processor structures
- Traditional system calls:
o Involve writing arguments to appropriate registers as well as issuing a special machine instruction which raises a synchronous exception.
o A processor exception is used to communicate with the kernel.
o Synchronous execution is enforced as the application expects the completion of the system call before user-mode execution resumes.
- Moore’s Law has provided large increases to performance potential of software while at the same time widening the gap between the performance of efficient and inefficient software.
o This gap is mainly caused by disparity of accessing different processor resources (registers, caches, memory)
- Server and system-intensive workloads are known to perform well below processor potential throughput.
o These are the items the researchers are mostly interested in.
o The cause is often described as due to the lack of locality.
o The researchers state this lack of locality is in part a result of the current synchronous system calls.
- When a synchronous system call, like pwrite, is called, the instruction per cycle level drops significantly and it takes many (in the example 14,000) cycles of execution for the instruction per cycle rate
to return to the level it was at before the system (pwrite) call.
- Exception-less System Call:
o Request for kernel services that does not require the use of synchronous processor exceptions.
o System calls are written to a reserved syscall page.
o Execution of system calls is performed asynchronously by special kernel level syscall threads. The result of the execution is stored on the syscall page after execution.
- By separating system call execution from system call invocation, the system can now have flexible system call scheduling.
o This allows system calls to be executed in batches, increasing the temporal locality of execution.
o Also provides a way to execute system calls on a separate core, in parallel to user-mode thread execution. This provides spatial per-core locality.
o An additional side effect is that now a multi-core system can have individual cores designated to run either user-mode or kernel mode execution dynamically depending on the current system load.
- In order to implement the exception-less system calls, the research team suggests adding a new M-on-N threading package.
o M user-mode threads executing on N kernel-visible threads.
o This would allow the threading package to harvest independent system calls, by switching threads, in user-mode, whenever a thread invokes a system call.
The (Real) Cost of System Calls
- Traditional way to measure the performance cost of system calls is the mode switch time. This is the time necessary to execute the system call instruction in user-mode, resume execution in kernel mode and
then return execution back to the user-mode.
- Mode switch in modern processors is a processor exception.
o Flush the user-mode pipeline, save registers onto the kernel stack, change the protection domain and redirect execution to the proper exception handler.
- Another measure of the performance of a system call is the state pollution caused by the system call.
o State pollution is the measure of how much user-mode data is overwritten in places like the TLB, cache (L1, L2, L3), branch prediction tables with kernel leel execution instructions for the system call.
o This data must be re-populated upon the return to user-mode.
- Potentially the most significant measure of cost of system calls is the performance impact on a running application.
o Ideally, user-mode instructions per cycle should not decrease as a result of a system call.
o Synchronous system calls do cause a drop in user-mode IPC due to; direct overhead - the processor exception associated with the system call which flushes the processor pipeline; and indirect overhead
– system call pollution on processors structures.
Exception-less System calls:
- System call batching
o By delaying a series of system calls and executing them in batches you can minimize the frequency of mode switches between user and kernel mode.
o Improves both the direct and indirect cost of system calls.
- Core specialization
o A system call can be scheduled on a different core then the core on which it was invoked, only for exception-less system calls.
o Provides ability to designate a core to run all system calls.
- Exception-less Syscall Interface
o Set of memory pages shared between user and kernel modes. Referred to as Syscall pages.
o User-space threads find a free entry in a syscall page and place a request for a system call. The user-space thread can then continue executing without interruption and must then return to the syscall
page to get the return value from the system call.
o Neither issuing the system call (via the syscall page) nor getting the return value generate an exception.
- Syscall pages
o Each page is a table of syscall entries.
o Each syscall entre has a state:
 Free – means a syscall can be added her
 Submitted – means the kernel can proceed to invoke the appropriate system call operations.
 Done – means the kernel is finished and has provided the return value to the syscall entry. User space thread must return and get the return value from the page.
- Decoupling Execution from Invocation
o To separate these two concepts a special kernel thread, syscall thread, is used.
o Sole purpose is to pull requests from syscall pages and execute them always in kernel mode.
o Syscall threads provide the ability to schedule the system calls on specific cores.
System Calls Galore – FlexSC-Threads
- Programming for exception-less system calls requires a different and more complex way of interacting with the kernel for OS functionality.
o The researchers describe working with exception-less system calls as being similar to event-driven programming in that you do not get the same sequential execution of code as you do with synchronous
system calls.
o In event-driven servers, the researchers suggest using a hybrid of both exception-less system calls (for performance critical paths) and regular synchronous system calls (for less critical system calls).
FlexSC-Threads
- Threading package which transforms synchronous system calls into exception-less system calls.
- Intended use is with server-type applications with which have many user-mode threads (like Apache or MySQL).
- Compatible with both POSIX threads and the default Linux thread library.
o As a result, multi-threaded Linux programs are immediately compatible with FlexSC threads without modification.
- For multi-core systems, a single kernel level thread is created for each core of the system. Multiple user-mode threads are multiplexed onto each kernel level thread via interactions with the syscall pages.
o The syscall pages are private to each kernel level thread, this means each core of a system has a syscall page from which it will receive system calls.
Overhead:
- When running a single exception-less system call against a single synchronous system call, the exception-less call was slower.
- When running a batch of exception-less system calls compared to a bunch of synchronous system calls, the exception-less system calls were much faster.
- The same is true for a remote server situation, one synchronous call is much faster than one exception-less system call but a batch of exception-less system calls is faster than the same number
of synchronous system calls.
Related Work:
- System Call Batching
o Operating systems have a concept called multi-calls which involves collecting multiple system calls and submitting them as a single system call.
o The Cassyopia compiler has an additional process called a looped multi-call where the result of one system call can be fed as an argument to another system call in the same multi-call.
o Multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls like exception-less system calls do.
 Multi-call system calls are executed sequentially, each one must complete before the next may start.
- Locality of Execution and Multicores
o Other techniques include Soft Timers and Lazy Receiver Processing which try to tackle the issue of locality of execution by handling device interrupts. They both try to
limit processor interference associated with interrupt handling without affecting the latency of servicing requests.
o Computation Spreading is another locality process which is similar to FlexSC.
 Processor modifications that allow hardware migration of threads and migration to specialized cores.
 Did not model TLBs and on current hardware synchronous thread migration is a costly interprocessor interrupt.
o Also have proposals for dedicating CPU cores to specific operating system functionality.
 These solutions require a microkernel system.
 Also FlexSC can dynamically adapt the proportion of cores used by the kernel or cores shared by user and kernel execution dynamically.
- Non-blocking Execution
o Past research on improving system call performance has focused on blocking versus non-blocking behaviour.
 Typically researchers used threading, event-based (non-blocking) and hybrid systems to obtain high performance on server applications.
o Main difference between past research and FlexSC is that none of the past proposals have decoupled system call execution from system call invocation.
--[[User:Mike Preston|Mike Preston]] 04:03, 20 November 2010 (UTC)

COMP 3000 Essay 2 2010 Question 3

2010-11-24T22:38:49Z

Sfangche:

3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls

== Paper ==
The Title of the paper we will be analyzing is named "FlexSC: Flexible System Call Scheduling with Exception-Less System Calls". The authors of this paper consist of Livio Stores and Michael Stumm, both of which are from the University of Toronto. The paper can be viewed here, [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details on specifics of the essay.
== Background Concepts: ==

In order to fully understand the FlexSC paper, it is essential to understand the key concepts that are discussed within the paper. Here listed below, are the main concepts required to fully comprehend the paper.

===System Call===
A System Call is the gateway between the User Space and the Kernel Space. The User Space is not given direct access to the Kernel's services, for several reasons (one being security), hence System calls are the messengers between the User and Kernel Space.[1][4] 

===Mode Switch===
Mode Switches speak of moving from one medium to another. Specifically moving from the User Space mode to the Kernel mode or Kernel mode to User Space. It does not matter which direction or which modes we are swtiching from, this is simply a general term. Crucial to mode switching is the mode switch time which is the time necessary to execute a system call instruction in user-mode, perform the kernel mode execution of the system call, and finally return the execution back to user-mode.[1] 

===Synchronous System Call===
Synchronous Execution Model(System call Interface) refers to the structure in which system calls specifically are managed in a serialized manner. Moreover, the synchronous model completes one system call at a time, and does not move onto the next system call until the previous system call is finished executing. This form of system call is blocking, meaning the process which initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous system calls.[1][2] 

===Asynchronous System Call===
An asynchronous system call is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event driven programming.[2][3] 

===System Call Pollution===
System Call Pollution is a more sophisticated manner of referring to wasteful or un-necessary delay in the system caused by system calls. This pollution is in direct correlation with the fact that the system call invokes a mode swtich which is not a costless task. The "pollution" involved takes the form of data over-written in critical processor structures like the TLB (translation lookaside buffer - table which reduces the frequency of main memory access for page table entries), branch prediction tables, and the cache (L1, L2, L3).[1][3] 

===Processor Exceptions===
Processor exceptions are situations which cause the processor to stop current execution unexpetedly in order to handle the issue. There are many situations which generate processor exceptions including undefined instructions and software interrupts(system calls).[5] 

===System Call Batching===
System Call Batching is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6] 

===Temporal and Spatial Locality===
Locality is the concept that during execution there will be a tendancy for the same set of data to be accessed repeatedly over a brief time period. There are two imprtant forms of locality; spatial locality and temporal locality. Spatial locality refers to the pattern that memory locations in close physical proximity will be referenced close together in a short period of time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8] 

===Instructions Per Cycle (IPC)===
Instructions per cycle is the amount of instructions a processor can execute in a single clock cycle.[9] 

== Research Problem: ==
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of using synchronous system calls comes from the easy to program nature of having sequential operation. However, this ease of use also comes with undesireable side effects which can slow down the instructions per cycle (IPC) of the processor.[9] In FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy to implement for application programmers.[1] 

The negative effects of synchronous system calls have been researched heavily, it is accepted that although easy to use, they are not optimal. Previous research includes work into system call batching such as multi-calls[6], locality of execution with multicore systems[7][8], and non-blocking execution. System call batching shares great similarity with FlexSC as multiple system calls are grouped together to reduce the amount of mode switches required of the system.[6] The difference is multi-calls do not make use of parallel execution of system calls nor do they manage the blocking aspect of synchronous system calls. FlexSC describes methods to handle both of these situations as described in the Contribution section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel solution and although they can dedicate certain execution to specific cores of a system, they can not dynamically adapt to the proportion of cores used by the kernel and the cores shared between the kernel and the user like FlexSC can.[1] Non-blocking execution research has focused on threading, event-based (non-blocking) and hybrid solutions. However, FlexSC provides a mechanism to separate system call execution from system call invocation. This is a key difference between FlexSC and previous research.[1] 

== Contribution: ==

===Exception-Less System Calls===
Exception-less system calls are the research team's attempt to provide an alternative to synchronous systems calls. The downside to synchronous system calls includes the cumulative mode switch time of multiple system calls each called independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially the most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through: 
1. System Call Batching: 
Instead of having each system call run as soon as it is called, FlexSC instead groups together system calls into batches. These batches can then be executed at one time thus minimizing the frequency of mode switches bewteen user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching as well as the indirect cost, pollution of critical processor structures, associated with switching modes.[1] System call batching works by first requesting as many system calls as possible, then switching to kernel mode, and then executing each of them. 
2. Core Specialization 
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in Decoupling Execution from Invocation section below.[1] 
3. Exception-less System Call Interface 
To provide an asynchronous interface to the kernel, FlexSC uses syscall pages. Syscall pages are a set of memory pages shared between user-mode and kernel-mode. User-space threads interact with syscall pages in order to make a request (system call) for kernel-mode procedures. A user-mode thread may make a system call request on a free entry of a syscall page, the syscall page will then run once the batch condition is met and store the return value on the syscall page. The user-mode thread can then return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor getting the return value from the syscall page generate a processor exception. Each syscall page is a table of syscall entries. These entries may have one of three states: Free - meaning a syscall can be added to the entry; Submitted - meaning the kernel can proceed to invoke the appropriate system call operations; and Done - meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve it.[1] 
4. Decoupling Execution from Invocation 
In order to separate a system call invocation from the execution of the system call, syscall threads were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1] 

===FlexSC Threads===
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accomodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page which in turn has a single kernel level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous system call interface.[1][2][3] 

== Critique: ==

===Moore's Law===
One interesting aspect of this paper is how the research relates to Moore's Law. Moore's Law states that the number of transistors on a chip doubles every 18 months.[10]. This has lead to very large increases in the performance potential of software but at the same time has opened a large gap between the actual performance of efficient and inefficient software. This paper claims that the gap is mainly caused by dispairity of accessing different processor resources such as registers, cache and memory.[1] In this manner, the FlexSC interface is not just an attempt to increase the efficiency of current system calls, but it is actually an attempt to change the way we view software. It is not simply enough to continue to build more powerful machines if the code we currently run will not speed up (become more efficient) along with the gain of power. Instead we need to focus on appropriate allocation and usage of the power as failure to do so is the origination of the gap between our potential and our performance. 

===Performance of FlexSC===
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than a synchronous call. The real benefit of FlexSC comes when there are many system calls which can be in turn batched before execution. In this situation the FlexSC system far outperformed the traditional synchronous system calls.[1] This is why the research paper's focus is on server-like applications as server must handle many user requests efficiently to be useful. Thus, for a general case it appears that a hybrid solution of synchronous calls below some threshold and execption-less system calls above the same threshold would be most efficient. 

== Related Work: ==

==System Call Batching==

Muti-calls is a concept which involves collecting multiple system calls and submitting them as a single system call. It is used both in operating systems and paravirtualized hypervisors. The Cassyopia compiler has a special technique name a looped multi-call, which is an additional process where the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls. Multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls like exception-less system calls. Multi-call system calls are executed sequentially, each one must complete before the next may start. On the other hand, exception-less system calls can be executed in parallel, and in the presence of blocking, the next call can execute immediately. 

==Locality of Execution and Multicores==

Several techniques addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver[14] Processing which try to tackle the issue of locality of execution by handling device interrupts. They both try to limit processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique name Computation Spreading[15] is most similar to the multicore execution of FlexSC. Processor modifications that allow hardware migration of threads and migration to specialized cores. However, they did not model TLBs on current hardware synchronous thread migration is a costly interprocessor interrupt. Another solution has 2 difference between FlexSC. They require a micro-kernel. Also FlexSC can dynamically adapt the proportion of cores used by the kernel or cores shared by user and kernel execution dynamically. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC could provide a more efficient, and flexible mechanism. 

==Non-blocking Execution==

== References: ==
[1] Soares, Livio and Michael Stumm, FlexSC: Flexible System Call Scheduling with Exception-Less System Calls, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]

[2] Tanenbaum, Andrew S., Modern Operating Systems: 3rd Edition, Pearson/Prentice Hall, New Jersey, 2008.

[3] Stallings, William, Operating Systems: Internals and Design Principles - 6th Edition, Pearson/Prentice Hall, New Jersey, 2009.

[4] Garfinkel, Tim, Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&rep=rep1&type=pdf PDF]

[5] Yoo, Sunjoo et al., Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&rep=rep1&type=pdf PDF]

[6] Rajagopalan, Mohan et al., Cassyopia: Compiler Assisted System Optimization, Poceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]

[7] Kumar, Sanjeev and Christopher Wilkerson, Exploiting Spatial Locality in Data Caches using Spatial Footprints, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&rep=rep1&type=pdf PDF]

[8] Jin, Shudong and Azer Bestavros, Sources and Characteristics of Web Temporal Locality, Computer Science Depratment - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&rep=rep1&type=pdf PDF]

[9] Agarwal, Vikas et al., Clock Rate versus IPS: The End of the Road for Conventional Microarhitechtures, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&rep=rep1&type=pdf PDF]

[10] Tuomi, Ilkka, The Lives and Death of Moore's Law, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]

[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.

[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.

[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.

[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.

[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.

Talk:COMP 3000 Essay 2 2010 Question 3

2010-11-15T16:59:53Z

Sfangche: /* Group 3 Essay */

Talk:COMP 3000 Essay 2 2010 Question 3

2010-11-15T16:59:24Z

Sfangche: /* Group 3 Essay */

COMP 3000 Essay 1 2010 Question 6

2010-10-15T04:08:03Z

Sfangche: /* References */

=Question=

What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?

=Answer=

=Overview=

A race condition occurs when two or more processes receive write access to shared data simultaneously. The end result might be lead to unpredictable results depending on the exact timing of those processes. Consequently a major system failure can occur.

=Introduction=

Race conditions are notorious in the history of software bugs. Examples range from a section of Java code causing an application to halt, the corruption of web services, or the failure of a life-critical system with fatal consequences. All of the system failures due to race conditions have common patterns and are caused by inadequate management of shared memory.

During development of these systems, programmers do not realize that their designs incorporate a race condition until they occur. They are unexpected, infrequent, and the specific failure conditions are difficult to duplicate. Therefore the origin of the failure may take weeks up to years to discover. This is also dependent on the complexity of the system. A lack of testing before deployment may also be responsible.

Race conditions occasionally reoccur in the same software. An example of this is when the race condition is mistaken as another problem. Another example is when a system contains multiple race conditions. Programming languages where memory management is an important aspect of development, such as Assembly and C/C++, are also common to all of the systems.

In this article, we will examine the most well known cases involving race conditions. For each of the cases we will explain why the race condition occurred, its significance and the aftermath of the failure.

=Examples=
== Therac-25 ==

The Therac-25 was an x-ray machine developed in Canada by Atomic Energy of Canada Limited (AECL). The machine was used to treat people using radiation therapy. Between 1985 and 1987 six patients were given overdoses of radiation by the machine. Half these patients died due to the accident. The incident is quite possibly the most infamous software bug relating to race conditions. The cause of the incidents has been traced back to a programming bug which caused a race-condition.
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of code were reused from software in the previous Therac-6 and Therac-20 machines.
The main portion of the code runs a function called “Treat” this function determins which of the programs 8 main subroutines it should be executing. The Keyboard handler task ran concurrently with “Treat”.

===Main Subroutines===

The Therac-25 had 8 main subroutines it made use of. The Datent had its own helper routine called magnet which prepared the x-rays magnets to administer the correct dosage of radiation.

#Reset
#Datent
##Magnet
#Set Up Done
#Set Up Test
#Patient Treatment
#Pause Treatment
#Terminate Treatment
#Date, Time, ID Changes

The Datent subroutine communicated with the keyboard hander task through a shared variable which signaled if the operator was finished entering the necessary data. Once the Datent subroutine sets the flag signifying the operator has entered the necessary information it allows the main program to move onto the next subroutine. If the flag was not set the “Treat” task reschedules itself in turn rescheduling the Datent subroutine. This continues until the shared data entry flag is set.

The Datent subroutine was also responsible for preparing the x-ray to administer the correct radiation dosage. The subroutine was setup so that before returning to “Treat” instructions to move on to the next of its 8 subroutines it would first call the “Magnet” subroutine. This subroutine parsed the operators input and moved the x-ray machines magnets into position to administer the prescribed radiation. This magnet subroutine took approximately 8 seconds to complete and while it ran the keyboard handler was also running. If the operator modified the data before the “magnet” subroutine returned their changes would not be register and the x-ray strength would already be set to its prior value ignoring the operator’s changes.

===Example Bug Situation===

The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.

#Operator types up data, presses return
#(Magnet subroutine is initiated)
#Operator realizes there is an extra 0 in the radiation intensity field
#Operator quickly moves cursor up and fixes the error and presses return again.
#Magnets are set to previous power level .subroutine returns
#Program moves on to next subroutine without registering changes
#Patient is administered a lethal overdose of radiation

===Root Causes & Outcomes===

A number of factors contributed to the failure of the Therac-25. The code was put together by a single programmer and no proper testing was conducted. In addition, code was reused from previous generation machines without verifying it was fully compatible with the new hardware. Previous Therac-6 and Therac-20 had hardware interrupts which prevent race conditions from occurring. It is clear that proper planning and forethought could have prevented this incident.

Six incidents involving the Therac-25 took place over the span 1985 and 1987. It took 2 years until the FDA took the machines out of service. The FDA forced AECL to make modifications to the Therac-25 before it was allowed back on the market. Software bugs were fixed to suspend all other operations while the magnets positioned themselves to administer the correct radiation strength. In addition, a dead mans switch was added the switch was a foot pedal which the operator must hold down to enable motion of the x-ray machine. This prevented the operator of being unaware of changes in the x-ray machines state.

After these changes were made the Therac-25 was reintroduced into the market in 1988. Some of the machines are still in service today.

== Black-out of 2003 ==

An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.

The incident occured on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy's control center in Akron, Ohio. Upon recieving this warning, power would be re-routed through other plants to isolate the failure.However, no warning was recieved, resulting in a domino effect causing ultimately over 100 power plants to go offline.

FirstEnergy at the time was using General Eletric's Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash.The energy management system crashed silently, so that the operators at the control center had no idea they were not receiving alerts the otherwise would be. Without any warnings, the operators had no idea the power plant went offline, and so took no measures to prevent the cascading effect leading to the black-out.

===Cause of Race Condition===

The XA/21 energy management system failed due to three sagging power lines being tripped simultaneously. These three seperate events then attempted to execute on a shared state, causing the main system to fail. A back-up server went online to attempt to handle the requests. By the time it kicked in the accumulation of events since the main system failure caused the back-up to fail as well.

===Aftermath===
With the system failure that ultimately led to 256 plants going offline, a massive black-out was experienced in North Eastern USA and Ontario. It is estimated that 55 million people were effected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy's part and revealed the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.

== The NASA Mars-Rover ==
The NASA Mars-Rover incident is another well known case of system failure due to race conditions. The Mars-Rover is a six wheeled driven, four wheeled steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples or and possible data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. The Rover was controlled on a daily basis by the NASA team on earth by sending messages and tasks. Each solar day in the life of the Rover is called a Sol.

===Hardware design and architecture===
The vehicle's main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM).

===Software design===
The Rover is controlled by a VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragements of code written in C++ and assembly.
The rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The system implements a time-multiplexing system, where all processes share and access resources on the single CPU. The Rover records progress through the use of three primary log-file systems: event reports (EVRs), engineering data (EH&A) and data products.

===System failures and vulnerabilities===
The first race-condition bug occured in the Spirit Rover Sol 131. The initilazation module (IM) process was preparing to increment a counter that keeps track of the number of times an initilazation occured, in order to do that,
the IM process must request permission and be granted access to write that counter to memory (critical section). While requesting the permission, another process was granted access to use that very same
piece of memory (critical section). This resulted in the IM process generating a fatal exception through its EVR log. The exception lead to loss and trouble in transmitting data to the NASA team on earth, which eventually led to
the Rover being in a halted state for a few days. In efforts to keep the Rover functioning, the NASA team attempted to avoid the problem by restricting another module from operating during that time-frame, allowing enough time for
the IM process to carry on its task. However, the NASA team were aware of the fact that the bug could actually resurface again. And it actually did later on in the Spirit Rover Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.

A similar type of error occurred on the Spirit Sol 136, this time the Imaging Services Module (IMG) was involved. Just as the NASA team requested data from the Rover to be transmitted, the IMG was beginning a deactivation state, the IMG reading cycles from memory were suddenly interrupted by the deactivation process which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover.

===Aftermath and current status===
While those race conditions errors were clearly due to a lack of memory management and proper co-ordination among processes, they were largely unexpected and unforeseen. In contrast to the other cases mentioned so far, the consequences that the NASA team had to deal with weren't life threatening. So it seems that their main concern was to keep the Rovers functioning in order to obtain as much information as possible. No effort was even made to alter the software. Also, one could imagine that the task of examining and debugging those errors was quite a challenge, since they couldn't deal with the Rovers physically, rather everything was done via transmission and messages. Another thing to note is the fact that the single CPU used in those Rovers had a lot to deal with beside the usual software implementation. Had NASA considered the possibility of implementing a multiple CPU design, it could have made a difference.

The Spirit Rover has experienced a number of problems since then. Most recent reports revealed that the Rover has been largely inactive, with no data being received from the Rover. The Opportunity Rover on the other hand continues to function successfully.

==Windows Blue-Screens-Of-Death==

When a problem in Windows forces the operation systems to fail, the computer often displays an error screen, know as Stop message, that describes the cause of the problem, most people called this a Blue Screen of Death (BSOD).

The error 0X0000001a, MEMORY_MANAGEMENT, occurs because of the race condition of memory management. It is a hardware error related to memory management. It is possible that the computer can not timely get enough power to the memory for the process.

The BSOD has surfaced on a number of Windows versions including Windows 7. It has also caused system failures in airports, ATM machines and street hoardings. However, the most notable public incident happened on the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed because of a BSOD bug.

=Conclusions=
The main challenge with race condition errors is that they're usually unpredictable and can be triggered in various ways depending on the processes involved, the implementation of software, the hardware design and the surrounding environment. However, the human element plays a huge part here as well, as far as applying the required amount of testing, anticipating certain schemes and coming up with different situations where an error might occur.

A handful of commercial software tools have been developed to address and detect race conditions errors. Most recently, a US software company that goes by the name of ReplaySolutions has been awarded a patent from the US government for developing an innovative kit for debugging race conditions found in software.

As the industry strives for faster and more efficient level of performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world.

=References=
* Nancy Leveson. July 1993. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25]
* Nancy Leveson and Clark Turner. July 1993. [http://www.stanford.edu/class/cs240/readings/therac-25.pdf An Investigation of the Therac-25 Accidents]
* Anne Marie Porrello. July 1993. [http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/THERAC25.html Death and Denial: The Failure of the THERAC-25, A Medical Linear Accelerator]
* Reeves and Snyder. 10 January 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1571113&userType=inst An Overview of the Mars Exploration Rovers' Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]
* Update: Spirit and Opportunity [http://marsrover.nasa.gov/mission/status.html]
* It's Never Done That Before: A Guide to Troubleshooting Windows XP, John Ross, No Starch Press, 2006
* John Chan. 12 August 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html]
* Dr. Dobb's Journal. 9 June 2010. Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]

COMP 3000 Essay 1 2010 Question 6

2010-10-13T01:43:49Z

Sfangche: /* Windows Blue-Screens-Of-Death */

=Question=

What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?

=Answer=

=Introduction=

Race conditions bugs have their fare share of notoriety in the history of software bugs. This may range from a piece of Java code causing the application to halt, to life-critical system failures that lead to fatal results. In this article, we will define race conditions, examine some of the most well known cases involving race conditions and explore some of the solution schemes and ways the industry have proposed to track and detect race conditions.

=Overview=

Race conditions is the term used in situations where two or more processes can access the same piece of data simutaneously and
the end result depends on the timing sequence of those processes. This end result can be quite hazardous leading to major system
failures.

The need to control those race conditions and maintain concurrency and safe sharing of resources among processes brings us to the concept of mutual exclusion (Mutex). Mutual exclusion is the idea of making sure processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used to establish mutual exclusion include locks, semaphores and monitors.
=Examples=
== Therac-25 ==
(This is still very rough and needs work. Thought I would lay it out there as a starting point)

The Therac-25 was an x-ray machine developed in Canada by Atomic Energy of Canada Limited (AECL). The machine was used to treat people using radiation therapy. Between 1985 and 1987 six patients were given overdoses of radiation by the machine. Half these patients died due to the accident. The cause of the incidents has been traced back to a programming bug which caused a race-condition.
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of code were reused from software in the previous Therac-6 and Therac-20 machines.
The main portion of the code runs a function called “Treat” this function determins which of the programs 8 main subroutines it should be executing. The Keyboard handler task ran concurrently with “Treat”

The 8 main subroutines were:

Reset

Datent

Set Up Done

Set Up Test

Patient Treatment

Pause Treatment

Terminate Treatment

Date, Time, ID Changes

The Datent subroutine communicated with the keyboard hander task through a shared variable which signaled if the operator was finished entering the necessary data. Once the Datent subroutine sets the flag signifying the operator has entered the necessary information it allows the main program to move onto the next subroutine. If the flag was not set the “Treat” task reschedules itself in turn rescheduling the Datent subroutine. This continues until the shared data entry flag is set.

The Datent subroutine was also responsible for preparing the x-ray to administer the correct radiation dosage. The subroutine was setup so that before returning to “Treat” instructions to move on to the next of its 8 subroutines it would first call the “Magnet” subroutine. This subroutine parsed the operators input and moved the x-ray machines magnets into position to administer the prescribed radiation. This magnet subroutine took approximately 8 seconds to complete and while it ran the keyboard handler was also running. If the operator modified the data before the “magnet” subroutine returned their changes would not be register and the x-ray strength would already be set to its prior value ignoring the operator’s changes.

Hypothetical example situation:

-Operator types up data, presses return

-(Magnet subroutine is initiated)

-Operator realizes there is an extra 0 in the radiation intensity field
-Operator moves cursor up and fixes the error and presses return again.

-Magnets are set to previous power level .subroutine returns

-Program moves on to next subroutine without registering changes

-Patient is administered a lethal overdose of radiation

== Black-out of 2003 ==

On August 14th, 2003, a massive power outage spread through the Northeastern and Midwestern United States and Canada. A generating plant in Eastlake, Ohio went offline, causing a domino affect ultimately leading to over 100 power plants shutting down.

There are several reasons that are attributed to this massive failure. One of the most prominent factors being a software bug in General Electric Energy's Unix-based XA/21 energy management system.

FirstEnergy's Akron, Ohio control center was responsible for monitoring the Eastlake plant. However, the software flaw caused the control center to be unable to receive any warning or alarm from the plants.

Because of this, the control center's ability to prevent the cascading effect after the Eastlake plant went offline.

The XA/21 bug was triggered through a unique combination of events and alarm conditions on the equipment it was monitoring. The main system failed, unable to handle the combination of requests. By the time the back-up server kicked in, the accumulation of events since the main system failure caused it to go down as well.

The system made no indication that it had failed, and the control center received no warnings about the fact that they were operating without an alarm system.

The combination which caused the first system failure itself was due to three sagging power lines being tripped simultaneously. The three separate events attempted to execute on a shared state, causing no alarm to be raised and the system to fail.

== The NASA Mars-Rover ==

The NASA Mars-Rover incident is another well known case of system failure due to race conditions. The Mars-Rover is a six wheeled driven, four wheeled steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples or any possible data about the planet.

===Hardware design and architecture===

The vehicle's main operating equipment consists of a set of wide and narrow angled cameras and a collection of specialized spectrometers. This set of equipment which also includes motors and the power bus is wired to an electronics card cage called the rover equipment module (REM). The main computer was built around a RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and EEPROM).

===Software design===

The autonomous operation of the flight software maintains the vehicle in the state needed to receive and act upon commands, execute sequences of commands when available, and collect and format data for transmission.

Other software modules handle certain engineering functions like power on/off of components, conducting communications, management of memory and resources; device health status and performance of sequence control. The more operational tasks include acquiring of images/videos, processing data, power instruments and carrying out the needed orders to drive the vehicle.

The main software records progress through the use of three primary log-file systems: event reports (EVRs), engineering data (EH&A) and data products.

===System failures===

==Windows Blue-Screens-Of-Death==

When a problem in Windows forces the operation systems to fail, the computer often displays an error screen, know as Stop message, that describes the cause of the problem, most people called this a Blue Screen of Death.

The error 0X0000001a, MEMORY_MANAGEMENT, occurs because of the race condition of memory management. It is a hardware error related to memory management. It is possible that the computer can not timely get enough power to the memory for the process.

=Conclusions=

=References=

Talk:COMP 3000 Essay 1 2010 Question 6

2010-10-13T01:40:19Z

Sfangche:

Hey guys, this is Munther. I'm one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.

Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Lets start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.

Race conditions: situations where one or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. Look at the text book in pages 117-118 for a detailed example of that.

Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.

Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta code) that examines race conditions and offers an example from the Unix/Linux operating systems, whether the example mentioned here is considered a "failure" we should check with the prof. Anyways, its a good starting point.
http://rosettacode.org/wiki/Race_condition

Heres also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes to explain the problem and offers a better solution. Its pretty easy to follow and understand, worth reading as well.
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf

-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)

----

Hey Andrew here another member of this group. Those are some good starting points. The Wikipedia page on race conditions have references to a few good examples http://en.wikipedia.org/wiki/Race_condition

Couple notable ones:

The Therac-25 x-ray machine which killed a bunch of people http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html

A blackout in 2003 was caused by a race condition in one of the power company's alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)

--Andrew

-----

Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by "systems", is any device based operating system. It doesn't necessarily has to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.

Other notable examples:

1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well.

Heres an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772

Heres a paper that examines the race conditions experienced on those rovers, discuses the Spirit Rover incident and even goes to
explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf

2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I'm sure we can find a lot of resources for this.

A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html

- - - - - - - - - - -

Heres also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :
http://www.google.ca/url?sa=t&source=web&cd=4&ved=0CCoQFjAD&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&rct=j&q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&ei=FTCtTOzRN8mVnAeL-OThDA&usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&sig2=u2Qo9kdemxdCWAlH10GNeQ

Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&coll=Portal&dl=GUIDE&CFID=104720795&CFTOKEN=13393160

If anyone can't access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the netwrk using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca
You will be prompted to enter your Student ID card barcode number, thats the number below your name on your student ID. And the password is your CarletonCentral password.

I think so far we have managed to gather a handful amount of cases. In the next couple of days, we should probably delve deeper into some of those cases.

PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.

-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)

--------------------

Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I'm ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you've rounded up quite a bit of info on the subject already, great job!

Introduction Paragraph: Introduces the question and gives some general background etc.
Paragraph 1: Gives first example in detail
Paragraph 2: Gives second example in detail
Paragraph 3: Gives third example in detail
Conclusion: Relates it all back together or something (never been good with conclusions)

I think each example paragraph should be broken down like this:

1. Introduction to the example
2. What they tried to use the Multi-Threading to do (or something like that)
3. Story of the system failing
4. The significance/involvement of race condition and mutual exclusion in the failure
5. Conclusion (how it was solved and stuff like that can go here too)

[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)

-----

Hey guys, I'm Fangchen. I am also in group 6. (So I might be the last member lol)
I found a chapter of a book from sun, which name of the chapter is Race Conditions and
Mutual Exclusion.There are some examples on race condition in Java programming which i think we could study for sure.

The link of the book chapter is here.

http://java.sun.com/developer/Books/performance2/chap3.pdf

On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.

--Fangchen

----
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant.

Note: This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken.

I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.

4.1 Blackout (pg. 5 – 6)

4.3 Therac-25 (pg. 7 – 8)

I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.

Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions.

Lastly, what is our plan on how divide the work for this essay? Also do we want to meet in person someday?

--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)

One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)

-----

Cool, its good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible.

What Julie mentioned is right. The prof said that 3 examples are alright. But he's really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today.

Also, if you guys wanna meet up a couple of days before the due date, thats ok by me. We can meet up in the Herzberg labs in the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or if you just want to continue doing this online, I know that each one of us has probably a different schedule and everything.

-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)

----

Alright, Seems we needed more than i originally thought :p so i tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel wierd for some reason...

But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.

If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)

Also is this due on tuesday or thursday?

[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit

Started using tildes now thanks julie

---
Ok everyone write in here when you are available before the 14th

Daniel: all day Monday, Tuesday, and Thursday
Munther: --
Fangchen: --
Andrew: After 12:30 Tues-Wed-Thurs
Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)
cha0s: monday in the afternoon, tuesday after 1, and all day wednesday
----

Hey Everyone. Awesome looks like we have a lot of information and resources here to work from. Daniels template structure looks good and we should follow that. We should come up with a plan for executing this, what topics we want to cover and who would like to focus on what. I think the 3 big examples we've found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he'd like to see some more exotic examples lets try and find some for examples 4/5.

Layout we can build on.

Introduction

Therac-25

Mars Rover

Blackout

Example 4

Example 5

Conclusion

I'm going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.

[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)
----
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.

Has the group member above (<strike>Could you please put your name? Was it Andrew?</strike>) decided on Therac-25 then?

Also I have noticed that everyone has not been using 4 tildes. I am not sure if this how the professor knows who wrote what but it would not hurt to use it (Less to type as well).

Any ideas on a deadline for all of our writing?

[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)

----

I tried writing up a bit about the Therac-25. Still pretty rough but its a start.

Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf

Pages 22-28 deal with the software bug

[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)

----

Yo, I'm guessing I'm the last member, putting us at 6. I'll post what I've got for my section later tonight. I'm good to meet monday in the afternoon, tuesday after 1, and all day wednesday.

[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)

-----

Looks like tuesday is a good day, wait to see for the rest to confirm?
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)

----

Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate the ability we have to work together more so than us doing a seperate paragraph each

[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)

------

Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the mean time, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We we will try as much as possible to deliver 4 cases.

-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)

----

I've been looking for a while now, and I can't find any major system failures related to the topic except the three we already have. I'll focus my research on the blackout case for now.

[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)

----

Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i'll try and find some more info on it.

[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)

-----

Hey guys. I've edited the article, provided an introduction and an overview piece. Plus, I've posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don't think theres any harm in doing that, I've found that this was a fairly common problem in some versions of Windows leading to a handful of system failures in airports, electronic hoardings, it even happened at the Beijing Summer Olympics of 2008 ! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.

Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)

----

I guess ill do Blue Screens then

[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)

----
Ok, so in today's lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be '''original'''. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader's understanding. So no copy and pasting will be tolerated. In fact, I'm going back to the Mars-Rover incident to do a re-edit and make sure theres no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one's words. It would be ok to use the exact scientific terms, though. Theres no escaping that I guess.

Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)

-----
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access. I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that's essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0. Now if you stick your own code at 0, you can now run your own code in the kernel ;)

--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)
-----
Hey guys, I saw that there might be some documentation lack of blue-screen-death. I found this article of how the problem of blue screen occurs. http://books.google.com/books?hl=zh-CN&lr=&id=2bGxMzOtUMsC&oi=fnd&pg=PR15&dq=Blue-Screens-of-Death&ots=aYecJYK84q&sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&q=Blue-Screens-of-Death&f=false

On page 54, it described the reason why that happened.

http://books.google.com/books?hl=zh-CN&lr=&id=cp0k20nfMBcC&oi=fnd&pg=PR6&dq=Blue-Screens-of-Death&ots=PDaXQZiTdu&sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&q=Blue-Screens-of-Death&f=false

And here is an example how blue-screen affects people's life. I think this book might be useful since it is related to software performance.

BTW,i'll be available the whole afternoon tomorrow.

---Fangchen
------
I found the only explain of BOSD is that error 0X0000001a occurs because of the race condition of memory usage, but there is no further explain. Have any one found something on that?

---Fangchen 21:40, 14 October 2010

Talk:COMP 3000 Essay 1 2010 Question 6

2010-10-13T01:16:47Z

Sfangche:

Talk:COMP 3000 Essay 1 2010 Question 6

2010-10-09T04:41:32Z

Sfangche: