<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Brobson</id>
	<title>Soma-notes - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Brobson"/>
	<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php/Special:Contributions/Brobson"/>
	<updated>2026-04-22T12:52:12Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6254</id>
		<title>COMP 3000 Essay 2 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6254"/>
		<updated>2010-12-02T09:58:13Z</updated>

		<summary type="html">&lt;p&gt;Brobson: /* Background Concepts: */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;3. FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper we will be analyzing is titled &amp;quot;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;quot;. Its authors are Livio Soares and Michael Stumm, both of whom are from the University of Toronto. The paper can be viewed here, [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details. It is essential to comprehend the basic vocabulary used in the paper in order to fully understand the ideas being discussed. The most important notions in the FlexSC paper, which are at the core of it all, are system calls[21] and synchronous execution. These base definitions, along with numerous other helpful ideas, are explained in the section to follow. &lt;br /&gt;
== Background Concepts: ==&lt;br /&gt;
&lt;br /&gt;
In order to fully understand the FlexSC paper, it is essential to understand the key concepts discussed within it. Listed below are the main concepts required to fully comprehend the paper. It is more vital for the reader to understand the core ideas of these definitions, along with the underlying motivation for their existence, than to understand the minute details of their processes. &lt;br /&gt;
&lt;br /&gt;
===System Call===&lt;br /&gt;
A &amp;lt;b&amp;gt;System Call&amp;lt;/b&amp;gt; is the gateway between the User Space and the Kernel Space. The User Space is not given direct access to the Kernel&#039;s services, for several reasons (one being security), hence System calls are the messengers between the User and Kernel Space.[1][4]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
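The gateway can be sketched in a few lines of Python (illustrative only, not from the paper): os.pipe, os.write, and os.read are thin wrappers around the pipe(2), write(2), and read(2) system calls, through which the user program asks the kernel to do work on its behalf.&lt;br /&gt;

```python
import os

# Minimal sketch (not from the paper): each os.* call below crosses the
# user/kernel boundary via a system call.
read_end, write_end = os.pipe()          # pipe(2): ask the kernel for a pipe
written = os.write(write_end, b"hello")  # write(2): kernel copies the bytes in
data = os.read(read_end, written)        # read(2): kernel hands them back
os.close(read_end)
os.close(write_end)
```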
&lt;br /&gt;
===Mode Switch===&lt;br /&gt;
&amp;lt;b&amp;gt;Mode Switches&amp;lt;/b&amp;gt; refer to moving from one processor mode to another: specifically, from user mode to kernel mode, or from kernel mode back to user mode. It does not matter which direction or which modes we are switching from; this is simply a general term. Crucial to mode switching is the &amp;lt;b&amp;gt;mode switch time&amp;lt;/b&amp;gt;, which is the time necessary to execute a system call instruction in user mode, perform the kernel-mode execution of the system call, and finally return execution back to user mode.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Synchronous System Call===&lt;br /&gt;
&amp;lt;b&amp;gt;Synchronous Execution Model (System Call Interface)&amp;lt;/b&amp;gt; refers to the structure in which system calls are managed in a serialized manner. The synchronous model completes one system call at a time and does not move on to the next system call until the previous one has finished executing. This form of system call is blocking, meaning the process which initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous.[1][2]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
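The blocking behaviour can be demonstrated with a small sketch (illustrative only, not from the paper): a read(2) on an empty pipe blocks the calling thread until another thread produces data.&lt;br /&gt;

```python
import os, threading, time

# Minimal sketch: a synchronous read(2) blocks the caller until the kernel
# has data for it; here a second thread supplies the data after a delay.
read_end, write_end = os.pipe()

def slow_writer():
    time.sleep(0.2)                      # simulate a slow producer
    os.write(write_end, b"done")

threading.Thread(target=slow_writer).start()
start = time.monotonic()
data = os.read(read_end, 4)              # blocks here until slow_writer runs
elapsed = time.monotonic() - start       # roughly 0.2s spent blocked
```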
&lt;br /&gt;
===Asynchronous System Call===&lt;br /&gt;
An &amp;lt;b&amp;gt;asynchronous system call&amp;lt;/b&amp;gt; is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event driven programming.[2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
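A minimal sketch of the non-blocking idea (illustrative only, not FlexSC itself): with a non-blocking file descriptor, the invocation returns control to the caller immediately instead of waiting for data.&lt;br /&gt;

```python
import os

# Minimal sketch: a non-blocking read returns control immediately rather
# than blocking the process until data is ready.
read_end, write_end = os.pipe()
os.set_blocking(read_end, False)         # mark the read end non-blocking

try:
    os.read(read_end, 4)                 # no data yet: returns at once...
    outcome = "data"
except BlockingIOError:                  # ...by raising instead of blocking
    outcome = "would-block"

os.write(write_end, b"now")              # once data exists, the read succeeds
data = os.read(read_end, 3)
```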
&lt;br /&gt;
===System Call Pollution===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Pollution&amp;lt;/b&amp;gt; is a more sophisticated way of referring to the wasteful or unnecessary delay caused by system calls. This pollution stems from the fact that a system call invokes a mode switch, which is not a costless task. The &amp;quot;pollution&amp;quot; involved takes the form of data overwritten in critical processor structures such as the TLB (translation look-aside buffer, a table which reduces the frequency of main-memory accesses for page table entries), branch prediction tables, and the caches (L1, L2, L3).[1][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Pipeline Flushing===&lt;br /&gt;
The regular operation of a CPU has multiple instructions being fetched, decoded and executed at the same time.  The parallel processing of instructions provides a significant speed advantage in processing.  During a mode switch, however, instructions in the  user-mode pipeline are flushed and removed from the processor registers.[1]  These lost instructions are part of the cost of a mode switch.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Processor Exceptions===&lt;br /&gt;
&amp;lt;b&amp;gt;Processor exceptions&amp;lt;/b&amp;gt; are situations which cause the processor to stop current execution unexpectedly in order to handle the issue. Many situations generate processor exceptions, including undefined instructions and software interrupts (system calls).[5]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Batching&amp;lt;/b&amp;gt; is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
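A miniature of the same idea exists in the standard kernel interface (illustrative only, not FlexSC): writev(2) hands the kernel several buffers in one system call instead of paying one mode switch per individual write.&lt;br /&gt;

```python
import os

# Minimal sketch of batching: writev(2) submits several buffers in a single
# system call, rather than one mode switch per write.
read_end, write_end = os.pipe()
buffers = [b"one ", b"two ", b"three"]
total = os.writev(write_end, buffers)    # a single syscall, three buffers
data = os.read(read_end, total)
```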
&lt;br /&gt;
===Temporal and Spatial Locality===&lt;br /&gt;
Locality is the concept that, during execution, there will be a tendency for the same set of data to be accessed repeatedly over a brief time period. There are two important forms of locality: &amp;lt;b&amp;gt;spatial locality&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;temporal locality&amp;lt;/b&amp;gt;. Spatial locality refers to the pattern that memory locations in close physical proximity tend to be referenced close together in a short period of time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Instructions Per Cycle (IPC)===&lt;br /&gt;
&amp;lt;b&amp;gt;Instructions per cycle&amp;lt;/b&amp;gt; is the number of instructions a processor can execute in a single clock cycle.[9]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Translation Look-Aside Buffer (TLB)===&lt;br /&gt;
A TLB is a table used in a virtual memory system that lists the physical address page number associated with each virtual address page number. A TLB is used in conjunction with a cache whose tags are based on virtual addresses. The virtual address is presented simultaneously to the TLB and to the cache so that cache access and the virtual-to-physical address translation can proceed in parallel. If the requested address is not cached then the physical address is used to locate the data in main memory. &lt;br /&gt;
&lt;br /&gt;
The TLB is the reason context switches can have such large performance penalties. Every time the OS switches context, the entire buffer is flushed. When the process resumes, it must be rebuilt from scratch. Too many context switches will therefore cause an increase in cache misses and degrade performance.[17]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Lack of Locality ===&lt;br /&gt;
As per the paper, locality refers to both types of locality defined above, i.e. temporal and spatial. A lack of locality thus means that the data and instructions needed most frequently by the application are continually displaced (from registers and caches) due to system calls, hence contributing to performance degradation.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Throughput ===&lt;br /&gt;
Throughput is an indication of how much work is done during a unit of time, e.g. n transactions per hour. The higher n is, the better. [2, p. 151]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Regular Store Instructions ===&lt;br /&gt;
A store instruction is a typical assembly-language instruction which usually takes two arguments: a value, and the memory location where that value should be stored.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Linux Application Binary Interface (ABI)===&lt;br /&gt;
In general, an ABI defines the low-level binary interface between applications and the operating system. The Linux ABI referenced here is a patch to the kernel that allows you to run SCO, Xenix, Solaris ix86, and other binaries on Linux.[18]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Native POSIX Thread Library (NPTL)===&lt;br /&gt;
NPTL is a software component that allows the Linux kernel to run applications optimized for POSIX Thread efficiency.[19]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Syscall Page ===&lt;br /&gt;
A syscall page is a collection of syscall entries. In turn, a sysentry is a 64-byte data structure which includes information such as the syscall number, the number of arguments, the arguments themselves, the status, and the return value [1].&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
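The fields listed above could be sketched as follows (hypothetical sketch only: the real FlexSC sysentry is a packed 64-byte C structure in shared memory, and these field names are guesses, not the paper&#039;s actual layout):&lt;br /&gt;

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a sysentry based on the description above; field
# names are illustrative, not taken from the FlexSC source.
@dataclass
class SysEntry:
    syscall_number: int = 0
    num_args: int = 0
    args: list = field(default_factory=list)
    status: str = "free"                 # free, submitted, or done
    return_value: int = 0

entry = SysEntry(syscall_number=1, num_args=3, args=[4, 0, 13],
                 status="submitted")
```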
&lt;br /&gt;
===Syscall Threads ===&lt;br /&gt;
Syscall threads are FlexSC&#039;s mechanism for allowing exception-less system calls. A syscall thread shares its process&#039;s virtual address space [1].&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Latency ===&lt;br /&gt;
Latency is a measure of the time delay between the start of an action and its completion in a system.[20]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Research Problem: ==&lt;br /&gt;
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of synchronous system calls comes from the easy-to-program nature of sequential operation. However, this ease of use also comes with undesirable side effects which can lower the instructions per cycle (IPC) achieved by the processor.[9] In &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy for application programmers to use.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The negative effects of synchronous system calls have been researched heavily; it is accepted that, although easy to use, they are not optimal. Previous research includes work on &amp;lt;b&amp;gt;system call batching&amp;lt;/b&amp;gt; such as multi-calls[6], &amp;lt;b&amp;gt;locality of execution with multicore systems&amp;lt;/b&amp;gt;[7][8], and &amp;lt;b&amp;gt;non-blocking execution&amp;lt;/b&amp;gt;. System call batching shares great similarity with FlexSC, as multiple system calls are grouped together to reduce the number of mode switches required of the system.[6] The difference is that multi-calls make no use of parallel execution of system calls, nor do they manage the blocking aspect of synchronous system calls; FlexSC handles both of these situations, as described in the &amp;lt;b&amp;gt;Contribution&amp;lt;/b&amp;gt; section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting the processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel, and although they can dedicate certain execution to specific cores of a system, they cannot dynamically adapt the proportion of cores used by the kernel and the cores shared between the kernel and the user as FlexSC can.[1] Non-blocking execution research has focused on threading, event-based (non-blocking), and hybrid solutions. However, FlexSC provides a mechanism to separate system call execution from system call invocation; this is a key difference between FlexSC and previous research.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Contribution: ==&lt;br /&gt;
&lt;br /&gt;
===Exception-Less System Calls===&lt;br /&gt;
Exception-less system calls are the research team&#039;s alternative to synchronous system calls. The downsides of synchronous system calls include the cumulative mode switch time of multiple system calls each invoked independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through:&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
1. &amp;lt;u&amp;gt;System Call Batching:&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
Instead of having each system call run as soon as it is invoked, FlexSC groups system calls together into batches. These batches can then be executed at one time, thus minimizing the frequency of mode switches between user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching and the indirect cost, the pollution of critical processor structures, associated with switching modes. System call batching works by first collecting as many system call requests as possible, then switching to kernel mode, and then executing each of them.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
2. &amp;lt;u&amp;gt;Core Specialization&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in &amp;lt;b&amp;gt;Decoupling Execution from Invocation&amp;lt;/b&amp;gt; section below.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
3. &amp;lt;u&amp;gt;Exception-less System Call Interface&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
To provide an asynchronous interface to the kernel, FlexSC uses &amp;lt;b&amp;gt;syscall pages&amp;lt;/b&amp;gt;. Syscall pages are a set of memory pages shared between user mode and kernel mode. User-space threads interact with syscall pages in order to make a request (system call) for kernel-mode procedures. A user-mode thread may write a system call request into a free entry of a syscall page; the request will then be executed once the batch condition is met, and the return value will be stored back on the syscall page. The user-mode thread can later return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor getting the return value from the syscall page generates a processor exception. Each syscall page is a table of syscall entries. These entries may have one of three states: &amp;lt;b&amp;gt;Free&amp;lt;/b&amp;gt;, meaning a syscall can be added to the entry; &amp;lt;b&amp;gt;Submitted&amp;lt;/b&amp;gt;, meaning the kernel can proceed to invoke the appropriate system call operations; and &amp;lt;b&amp;gt;Done&amp;lt;/b&amp;gt;, meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
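The lifecycle of one entry can be sketched as follows (hypothetical sketch only: real entries live in memory pages shared between user and kernel mode, and no processor exception is raised at any step):&lt;br /&gt;

```python
# Hypothetical sketch of one syscall-page entry moving through the three
# states described above; state names and fields are illustrative.
FREE, SUBMITTED, DONE = 0, 1, 2

entry = {"state": FREE, "syscall": None, "ret": None}

# User thread: claim a free entry and submit a request.
entry.update(syscall="getpid", state=SUBMITTED)

# Kernel side (later, as part of a batch): execute and publish the result.
if entry["state"] == SUBMITTED:
    entry.update(ret=4321, state=DONE)

# User thread: poll for completion and collect the return value.
result = entry["ret"] if entry["state"] == DONE else None
```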
4. &amp;lt;u&amp;gt;Decoupling Execution from Invocation&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
In order to separate a system call invocation from the execution of the system call, &amp;lt;b&amp;gt;syscall threads&amp;lt;/b&amp;gt; were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
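The decoupling can be sketched with ordinary user-space threads (illustrative analogy only: real FlexSC syscall threads run in kernel mode, share the process&#039;s address space, and pull requests from syscall pages rather than a queue):&lt;br /&gt;

```python
import threading, queue

# Illustrative analogy: a worker drains submitted requests and executes them
# while the submitting thread is free to keep running, possibly on another core.
requests = queue.Queue()
results = {}

def syscall_thread():
    while True:
        tag, func, args = requests.get()
        if func is None:                 # shutdown sentinel
            break
        results[tag] = func(*args)       # execute the call, store the result

worker = threading.Thread(target=syscall_thread)
worker.start()

# Submitting thread: invocation is just an enqueue; execution happens later,
# on the worker, decoupled from the invocation.
requests.put(("r1", sum, ([1, 2, 3],)))
requests.put(("r2", len, ("flexsc",)))
requests.put((None, None, ()))
worker.join()
```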
&lt;br /&gt;
===FlexSC Threads===&lt;br /&gt;
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accommodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page, which in turn has a single kernel-level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming, as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous interface.[1][2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Critique: ==&lt;br /&gt;
&lt;br /&gt;
===Moore&#039;s Law===&lt;br /&gt;
One interesting aspect of this paper is how the research relates to Moore&#039;s Law. Moore&#039;s Law states that the number of transistors on a chip doubles roughly every 18 months.[10] This has led to very large increases in the performance potential of software, but at the same time has opened a large gap between the actual performance of efficient and inefficient software. This paper claims that the gap is mainly caused by the disparity in cost of accessing different processor resources such as registers, cache and memory.[1] In this manner, the FlexSC interface is not just an attempt to increase the efficiency of current system calls; it is actually an attempt to change the way we view software. It is not enough to continue to build more powerful machines if the code we run does not become more efficient along with the gain in power. Instead we need to focus on appropriate allocation and usage of that power, as failure to do so is the origin of the gap between our potential and our performance.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Performance of FlexSC===&lt;br /&gt;
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than that of a synchronous call. The real benefit of FlexSC comes when there are many system calls which can in turn be batched before execution; in this situation the FlexSC system far outperformed traditional synchronous system calls.[1] This is why the research paper&#039;s focus is on server-like applications, as a server must handle many user requests efficiently to be useful. Thus, for the general case it appears that a hybrid solution, using synchronous calls below some threshold and exception-less system calls above it, would be most efficient.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Blocking Calls===&lt;br /&gt;
FlexSC relies on the fact that web and database servers have a lot of concurrency and independent parallelism. FlexSC can &#039;harvest&#039; enough independent work so that it doesn&#039;t need to track dependencies between system calls. However, this could be a problem in other situations. Since FlexSC system calls are &#039;inherently asynchronous&#039;, if they need to block, FlexSC would jump to the next system call and execute that one. This can cause a problem for system calls such as reading and writing, where the write call has an outstanding dependency on the read call. However, this could be resolved by using some kind of combined system call, that is, multiple system calls executed as one single call. Unfortunately, FlexSC does not have any current handling for such an implementation.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Core Scheduling Issues===&lt;br /&gt;
In a system with X cores, FlexSC needs to dedicate some subset of cores for system calls. Currently, FlexSC first wakes up core X to run a system call thread, and when another batch comes in, if core X is still busy, it will then try core X-1, and so on. Of all the algorithms they tested, it turned out that this, the simplest algorithm, was the most efficient algorithm for FlexSC scheduling. However, this was only tested with FlexSC running a single application at a time. FlexSC&#039;s scheduling algorithm would need to be fine-tuned for running multiple applications.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===When There Are Not More Threads Than Cores===&lt;br /&gt;
In situations where a single thread uses 100% of a CPU and acts primarily in user space, such as &#039;Scientific Programs&#039;, FlexSC causes more overhead than performance gain. As a result, FlexSC is not an optimal implementation for cases such as this.&lt;br /&gt;
&lt;br /&gt;
===IO===&lt;br /&gt;
FlexSC is not suited for data-intensive, IO-centric applications, as realized by Vijay Vasudevan [16], whose research aims to reduce the energy footprint of data centers. FlexSC was considered; it was found that FlexSC&#039;s reduction of mode switches, via the use of memory pages shared between user space and kernel space, is useful for reducing the impact of system calls. That technique, however, was not useful for IO-intensive work, since it did not remove the requirement of data copying and did not reduce the overheads associated with interrupts in IO-intensive tasks.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Some Kernel Changes Are Required===&lt;br /&gt;
Though most of the work is done transparently (i.e. there is no need to modify application code), a small kernel change (3 lines of code) remains necessary, as per section 3.2 of the paper [1].&amp;lt;br&amp;gt;&lt;br /&gt;
That means adopters would have to add or modify the referenced lines and then recompile the kernel, and do so again after each kernel update.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Multicore Systems ===&lt;br /&gt;
For a multicore system, the FlexSC scheduler will attempt to choose a subset of the available cores and specialize them for running system call threads. It is unclear how this dynamic allocation is done; it is mentioned that decisions are made based on the workload requirements, which does not exactly clarify the mechanism.&amp;lt;br&amp;gt;&lt;br /&gt;
Further, the paper mentions that a predefined, static list of cores is used for system call thread assignments. It is unclear when that list is created: at installation time, generated initially, or does the installer have to do any manual work? On a related note, scalability with increasing core counts is ambiguous; it is not clear how scalable the scheduler is. One gets the impression that it is very scalable, since each core spawns a system call thread; thus, as many threads as there are cores could be running concurrently, for one or more processes [1]. More explicit results, however, would have been beneficial. The paper also mentions that hyper-threading was turned off to ease the analysis of the results. This is understandable; however, it would be nice to know whether these hardware threads (2 per core) would be treated as cores when turned on. That is, would the scheduler then realize that it can use eight cores, and would the predefined static core list need to be modified to list eight instead of four?&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Along the same reasoning, and given the growing popularity of GPUs for general-purpose programming, it would have been useful to at least hypothesize about the possible performance outcome when using specialized GPUs, such as NVIDIA&#039;s Tesla GPUs. Would FlexSC&#039;s scheduler be able to take advantage of the additional cores, and hence use them for specialized purposes?&lt;br /&gt;
&lt;br /&gt;
== Related Work: ==&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&lt;br /&gt;
Multi-calls are a concept which involves collecting multiple system calls and submitting them as a single system call. They are used both in operating systems and in paravirtualized hypervisors. The Cassyopia compiler has a special technique named a looped multi-call, an additional mechanism whereby the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls: multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls as exception-less system calls do. The system calls in a multi-call are executed sequentially; each one must complete before the next may start. Exception-less system calls, on the other hand, can be executed in parallel, and when one blocks, the next call can execute immediately.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Locality of Execution and Multicores===&lt;br /&gt;
&lt;br /&gt;
Several techniques have addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques, such as Soft Timers[13] and Lazy Receiver Processing[14], tackle locality of execution by handling device interrupts; both try to limit the processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique, named Computation Spreading[15], is the most similar to the multicore execution of FlexSC: it proposes processor modifications that allow hardware migration of threads to specialized cores. However, it did not model TLBs, and on current hardware synchronous thread migration requires a costly interprocessor interrupt. These solutions also differ from FlexSC in two ways: they require a microkernel, and, unlike them, FlexSC can dynamically adapt the proportion of cores used by the kernel or shared between user and kernel execution. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC provides a more efficient and flexible mechanism.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Non-blocking Execution===&lt;br /&gt;
&lt;br /&gt;
Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically, researchers have used threading, event-based (non-blocking), and hybrid systems to obtain high performance in server applications. The main difference between many of the proposals for non-blocking execution and FlexSC is that none of the non-blocking system calls decouple the system call invocation from its execution.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References: ==&lt;br /&gt;
[1] Soares, Livio and Michael Stumm, &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[2] Tanenbaum, Andrew S., &amp;lt;i&amp;gt;Modern Operating Systems: 3rd Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2008.&lt;br /&gt;
&lt;br /&gt;
[3] Stallings, William, &amp;lt;i&amp;gt;Operating Systems: Internals and Design Principles - 6th Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2009.&lt;br /&gt;
&lt;br /&gt;
[4] Garfinkel, Tal, &amp;lt;i&amp;gt;Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools&amp;lt;/i&amp;gt;, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[5] Yoo, Sunjoo &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design&amp;lt;/i&amp;gt;, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[6] Rajagopalan, Mohan &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Cassyopia: Compiler Assisted System Optimization&amp;lt;/i&amp;gt;, Proceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[7] Kumar, Sanjeev and Christopher Wilkerson, &amp;lt;i&amp;gt;Exploiting Spatial Locality in Data Caches using Spatial Footprints&amp;lt;/i&amp;gt;, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[8] Jin, Shudong and Azer Bestavros, &amp;lt;i&amp;gt;Sources and Characteristics of Web Temporal Locality&amp;lt;/i&amp;gt;, Computer Science Department - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[9] Agarwal, Vikas &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures&amp;lt;/i&amp;gt;, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[10] Tuomi, Ilkka, &amp;lt;i&amp;gt;The Lives and Death of Moore&#039;s Law&amp;lt;/i&amp;gt;, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]&lt;br /&gt;
&lt;br /&gt;
[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.&lt;br /&gt;
&lt;br /&gt;
[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.&lt;br /&gt;
&lt;br /&gt;
[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.&lt;br /&gt;
&lt;br /&gt;
[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.&lt;br /&gt;
&lt;br /&gt;
[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.&lt;br /&gt;
&lt;br /&gt;
[16] Vasudevan, Vijay. &amp;lt;i&amp;gt;Improving Datacenter Energy Efficiency Using a Fast Array of Wimpy Nodes&amp;lt;/i&amp;gt;, Thesis Proposal, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, October 12, 2010.[http://www.cs.cmu.edu/~vrv/proposal/vijay_thesis_proposal.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[17] Patricia J. Teller &amp;lt;i&amp;gt;Translation-Lookaside Buffer Consistency&amp;lt;/i&amp;gt;, Journal Volume 23 Issue 6, IBM T. J. Watson Research Center, Yorktown Heights, NY, June 1990. [http://dx.doi.org/10.1109/2.55498 HTML]&lt;br /&gt;
&lt;br /&gt;
[18] Linux ABI sourceforge page. [http://linux-abi.sourceforge.net/ HTML] and Linux application page. [http://www.linux.org/apps/AppId_8088.html HTML]&lt;br /&gt;
&lt;br /&gt;
[19] DREPPER, U., AND MOLNAR , I. &amp;lt;i&amp;gt;The Native POSIX Thread Library for Linux&amp;lt;/i&amp;gt;. Tech. rep., RedHat Inc, 2003. [http://people.redhat.com/drepper/nptl-design.pdf HTML]&lt;br /&gt;
&lt;br /&gt;
[20] M. Brian Blake, &amp;lt;i&amp;gt;Coordinating Multiple Agents for Workflow-Oriented Process Orchestration&amp;lt;/i&amp;gt;. Information Systems and e-Business Management Journal, Springer-Verlag, December 2003. [http://www.cs.georgetown.edu/~blakeb/pubs/blake_ISEB2003.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[21] IBM developerWorks, &amp;lt;i&amp;gt;Kernel Command Using Linux System Calls&amp;lt;/i&amp;gt;, 2010. [http://www.ibm.com/developerworks/linux/library/l-system-calls/ HTML]&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6253</id>
		<title>COMP 3000 Essay 2 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6253"/>
		<updated>2010-12-02T09:57:26Z</updated>

		<summary type="html">&lt;p&gt;Brobson: /* Background Concepts: */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper we will be analyzing is titled &amp;quot;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;quot;. Its authors are Livio Soares and Michael Stumm, both of whom are from the University of Toronto. The paper itself can be viewed here [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details. To fully understand the ideas being discussed, it is essential to comprehend the basic vocabulary used in the paper. The most important notions in the FlexSC paper, the ones at the core of it all, are system calls[21] and synchronous execution. These base definitions, along with numerous other helpful ideas, are explained in the section that follows. &lt;br /&gt;
== Background Concepts: ==&lt;br /&gt;
&lt;br /&gt;
In order to fully understand the FlexSC paper, it is essential to understand the key concepts discussed within it. Listed below are the main concepts required to comprehend the paper. It is more important for the reader to grasp the core ideas behind these definitions, along with the underlying motivation for their existence, than to understand the minute details of their processes. &lt;br /&gt;
&lt;br /&gt;
===System Call===&lt;br /&gt;
A &amp;lt;b&amp;gt;System Call&amp;lt;/b&amp;gt; is the gateway between user space and kernel space. User space is not given direct access to the kernel&#039;s services, for several reasons (one being security); hence, system calls act as the messengers between user and kernel space.[1][4]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
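As a toy illustration of user space passing through this gateway, the following Python sketch calls getpid(2) via the C library. It assumes a Unix-like system where ctypes can reach the already-loaded libc; this is only an illustration, not the paper's mechanism.

```python
# Sketch: invoking a system call from user space through the C library.
# Assumes a Unix-like system (Linux/macOS) with a loadable libc.
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)  # the process's own libc

# getpid() is a thin wrapper: it traps into the kernel and returns the PID.
pid_via_libc = libc.getpid()

# os.getpid() goes through the same gateway, so the two results must agree.
assert pid_via_libc == os.getpid()
```

Both calls cross the user/kernel boundary; the point is that the application never touches kernel data structures directly, it only asks.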
&lt;br /&gt;
===Mode Switch===&lt;br /&gt;
A &amp;lt;b&amp;gt;Mode Switch&amp;lt;/b&amp;gt; is a transition between processor modes: specifically, from user mode to kernel mode, or from kernel mode back to user mode. The direction does not matter; this is simply a general term. Crucial to mode switching is the &amp;lt;b&amp;gt;mode switch time&amp;lt;/b&amp;gt;, which is the time necessary to issue a system call instruction in user mode, perform the kernel-mode execution of the system call, and finally return execution to user mode.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
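The direct cost of mode switching can be made visible with a rough micro-benchmark. The sketch below is illustrative only: it times a real system call (os.getpid, one mode switch per call) next to a function that never leaves user space; absolute numbers vary widely by machine.

```python
# Rough micro-benchmark sketch: a syscall loop pays a mode switch per
# iteration, a plain Python function does not. Numbers are machine-dependent.
import os
import timeit

def user_mode_only():
    return 42  # stays entirely in user space

n = 50000
t_user = timeit.timeit(user_mode_only, number=n)
t_syscall = timeit.timeit(os.getpid, number=n)  # one mode switch per call

print(f"user-mode calls: {t_user:.4f}s, syscalls: {t_syscall:.4f}s for {n} iterations")
```

On typical hardware the syscall loop is noticeably slower, and that gap is exactly the cumulative mode switch time the paper is concerned with.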
&lt;br /&gt;
===Synchronous System Call===&lt;br /&gt;
The &amp;lt;b&amp;gt;Synchronous Execution Model (System Call Interface)&amp;lt;/b&amp;gt; refers to a structure in which system calls are managed in a serialized manner: the model completes one system call at a time and does not move on to the next until the previous one has finished executing. This form of system call is blocking, meaning the process that initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous.[1][2]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Asynchronous System Call===&lt;br /&gt;
An &amp;lt;b&amp;gt;asynchronous system call&amp;lt;/b&amp;gt; is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event-driven programming.[2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
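The related notion of a call that returns control immediately can be sketched with non-blocking I/O on a pipe. This BlockingIOError pattern is a standard POSIX idiom, not FlexSC's mechanism; it is here only to show "return immediately instead of blocking" concretely.

```python
# Sketch: a read on an empty pipe normally blocks the caller. Marking the
# descriptor non-blocking makes the kernel hand control straight back.
import os

r, w = os.pipe()
os.set_blocking(r, False)   # ask the kernel never to block this descriptor

try:
    os.read(r, 1)           # empty pipe: would block, so it raises instead
    returned_immediately = False
except BlockingIOError:
    returned_immediately = True

os.close(r)
os.close(w)
assert returned_immediately
```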
&lt;br /&gt;
===System Call Pollution===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Pollution&amp;lt;/b&amp;gt; refers to the wasteful or unnecessary delay that system calls introduce into the system. This pollution stems from the fact that a system call triggers a mode switch, which is not a costless operation. The &amp;quot;pollution&amp;quot; takes the form of data over-written in critical processor structures such as the TLB (translation look-aside buffer, a table which reduces the frequency of main-memory accesses for page-table entries), branch prediction tables, and the caches (L1, L2, L3).[1][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Pipeline Flushing===&lt;br /&gt;
During regular operation, a CPU has multiple instructions being fetched, decoded, and executed at the same time. This parallel processing of instructions provides a significant speed advantage. During a mode switch, however, the instructions in the user-mode pipeline are flushed and removed from the processor.[1] These lost instructions are part of the cost of a mode switch.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Processor Exceptions===&lt;br /&gt;
&amp;lt;b&amp;gt;Processor exceptions&amp;lt;/b&amp;gt; are situations which cause the processor to stop current execution unexpectedly in order to handle the issue. Many situations generate processor exceptions, including undefined instructions and software interrupts (system calls).[5]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Batching&amp;lt;/b&amp;gt; is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
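A minimal sketch of the batching idea follows. The WriteBatch class is purely hypothetical, and plain os.write stands in for the real kernel-side machinery; the point is only the shape of "queue now, execute together later".

```python
# Hypothetical sketch of system call batching: queue write requests in user
# space, then submit them together instead of trapping once per call.
import os

class WriteBatch:
    def __init__(self):
        self.pending = []                    # queued (fd, data) requests

    def submit(self, fd, data):
        self.pending.append((fd, data))      # no kernel entry yet

    def flush(self):
        # A real implementation would enter the kernel once here and drain
        # every queued request before returning; os.write is a stand-in.
        results = [os.write(fd, data) for fd, data in self.pending]
        self.pending.clear()
        return results

r, w = os.pipe()
batch = WriteBatch()
batch.submit(w, b"hello ")   # queued, not yet executed
batch.submit(w, b"world")    # queued, not yet executed
written = batch.flush()      # both requests executed in one pass

assert written == [6, 5]
assert os.read(r, 11) == b"hello world"
os.close(r)
os.close(w)
```

The batched version does the same work as two immediate calls; the saving in the real design comes from paying for fewer user/kernel crossings.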
&lt;br /&gt;
===Temporal and Spatial Locality===&lt;br /&gt;
Locality is the tendency, during execution, for the same set of data to be accessed repeatedly over a brief time period. There are two important forms of locality: &amp;lt;b&amp;gt;spatial locality&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;temporal locality&amp;lt;/b&amp;gt;. Spatial locality refers to the pattern that memory locations in close physical proximity tend to be referenced close together in time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
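Why locality matters to caches can be sketched with a toy direct-mapped cache model. The parameters (4-word blocks, 8 lines) and the access patterns are illustrative only; real caches are larger and more complex.

```python
# Toy direct-mapped cache model: sequential (spatially local) access hits
# far more often than a large-stride pattern. Parameters are illustrative.
BLOCK, LINES = 4, 8     # 4 words per block, 8 cache lines

def hit_rate(addresses):
    cache = {}                       # line index -> cached block number
    hits = 0
    for addr in addresses:
        block = addr // BLOCK        # which memory block holds this word
        line = block % LINES         # direct-mapped placement
        if cache.get(line) == block:
            hits += 1                # locality pays off
        else:
            cache[line] = block      # miss: fetch the whole block
    return hits / len(addresses)

sequential = list(range(64))                  # walk memory word by word
strided = [i * 32 for i in range(64)]         # jump 32 words each access

assert hit_rate(sequential) == 0.75           # 3 of every 4 accesses hit
assert hit_rate(strided) == 0.0               # every access misses
```

The sequential walk reuses each fetched block three more times (spatial locality); the strided walk touches a fresh block every time and never hits.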
&lt;br /&gt;
===Instructions Per Cycle (IPC)===&lt;br /&gt;
&amp;lt;b&amp;gt;Instructions per cycle&amp;lt;/b&amp;gt; is the number of instructions a processor can execute in a single clock cycle.[9]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Translation Look-Aside Buffer (TLB)===&lt;br /&gt;
A TLB is a table used in a virtual memory system that lists the physical address page number associated with each virtual address page number. A TLB is used in conjunction with a cache whose tags are based on virtual addresses. The virtual address is presented simultaneously to the TLB and to the cache so that cache access and the virtual-to-physical address translation can proceed in parallel. If the requested address is not cached then the physical address is used to locate the data in main memory. &lt;br /&gt;
&lt;br /&gt;
The TLB is one reason context switches can carry such large performance penalties. Every time the OS switches context, the entire buffer is flushed, and when the process resumes it must be rebuilt from scratch. Too many context switches will therefore increase TLB misses and degrade performance.[17]&lt;br /&gt;
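This flush-on-switch behaviour can be sketched with a toy TLB model. The dict-based page table, the translations, and the miss counts below are all illustrative; the sketch only shows why flushing turns a warm workload cold.

```python
# Toy TLB model: a small virtual-to-physical mapping that is flushed on
# every context switch. Frequent switches inflate the miss count.
page_table = {vpn: vpn + 100 for vpn in range(16)}   # fake translations

class TLB:
    def __init__(self):
        self.entries = {}
        self.misses = 0

    def translate(self, vpn):
        if vpn not in self.entries:
            self.misses += 1                 # must walk the page table
            self.entries[vpn] = page_table[vpn]
        return self.entries[vpn]

    def flush(self):                         # what a context switch does
        self.entries = {}

workload = [0, 1, 2, 3] * 4                  # reuses the same four pages

tlb = TLB()
for vpn in workload:
    tlb.translate(vpn)
no_switch_misses = tlb.misses                # 4: one cold miss per page

tlb = TLB()
for vpn in workload:
    tlb.translate(vpn)
    tlb.flush()                              # context switch every access
switch_misses = tlb.misses                   # 16: every access misses

assert no_switch_misses == 4
assert switch_misses == 16
```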
&lt;br /&gt;
===Lack of Locality ===&lt;br /&gt;
As used in the paper, locality refers to both types defined above, temporal and spatial. A lack of locality thus means that the data and instructions the application needs most frequently keep being evicted from registers and caches by system calls, contributing to performance degradation.&lt;br /&gt;
&lt;br /&gt;
===Throughput ===&lt;br /&gt;
Throughput is an indication of how much work is done during a unit of time, e.g. n transactions per hour. The higher n is, the better. [2, p. 151]&lt;br /&gt;
&lt;br /&gt;
===Regular Store Instructions ===&lt;br /&gt;
A store instruction is a typical assembly-language instruction that usually takes two arguments: a value, and the memory location where that value should be stored.&lt;br /&gt;
&lt;br /&gt;
===Linux Application Binary Interface (ABI)===&lt;br /&gt;
The ABI is a patch to the kernel that allows you to run SCO, Xenix, Solaris ix86, and other binaries on Linux.[18]&lt;br /&gt;
&lt;br /&gt;
===Native POSIX Thread Library (NPTL)===&lt;br /&gt;
NPTL is a software component that allows the Linux kernel to run applications optimized for POSIX Thread efficiency.[19]&lt;br /&gt;
&lt;br /&gt;
===Syscall Page ===&lt;br /&gt;
A syscall page is a collection of syscall entries. Each entry (a sysentry) is a 64-byte data structure which includes information such as the syscall number, the number of arguments, the arguments themselves, a status field, and the return value [1].&lt;br /&gt;
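A hedged model of such an entry is sketched below. The Free/Submitted/Done states match the paper's description of the exception-less interface, but the field names and the state-transition methods are this sketch's own invention, not the paper's layout.

```python
# Illustrative model of a syscall entry: a small record holding a syscall
# number, arguments, a status field, and a return value. Field names and
# the transition methods are hypothetical; the three states are the paper's.
from dataclasses import dataclass

FREE, SUBMITTED, DONE = "free", "submitted", "done"

@dataclass
class SysEntry:
    status: str = FREE
    syscall_number: int = 0
    args: tuple = ()
    return_value: int = 0

    def submit(self, number, args):
        assert self.status == FREE
        self.syscall_number, self.args = number, args
        self.status = SUBMITTED      # kernel may now pick this entry up

    def complete(self, result):
        assert self.status == SUBMITTED
        self.return_value = result
        self.status = DONE           # user thread may now read the result

entry = SysEntry()
entry.submit(1, (4, b"data"))        # e.g. a hypothetical write request
entry.complete(4)                    # kernel side finishes the call
assert entry.status == DONE and entry.return_value == 4
```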
&lt;br /&gt;
===Syscall Threads ===&lt;br /&gt;
Syscall threads are FlexSC&#039;s mechanism for executing exception-less system calls. A syscall thread shares the virtual address space of its process [1].&lt;br /&gt;
&lt;br /&gt;
===Latency ===&lt;br /&gt;
Latency is a measure of the time delay between the start of an action and its completion in a system.[20]&lt;br /&gt;
&lt;br /&gt;
== Research Problem: ==&lt;br /&gt;
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of synchronous system calls comes from the easy-to-program nature of sequential operation. However, this ease of use also comes with undesirable side effects which can lower the instructions per cycle (IPC) achieved by the processor.[9] In &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy for application programmers to use.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The negative effects of synchronous system calls have been researched heavily; it is accepted that, although easy to use, they are not optimal. Previous research includes work into &amp;lt;b&amp;gt;system call batching&amp;lt;/b&amp;gt; such as multi-calls[6], &amp;lt;b&amp;gt;locality of execution with multicore systems&amp;lt;/b&amp;gt;[7][8], and &amp;lt;b&amp;gt;non-blocking execution&amp;lt;/b&amp;gt;. System call batching shares great similarity with FlexSC, as multiple system calls are grouped together to reduce the number of mode switches required of the system.[6] The difference is that multi-calls do not make use of parallel execution of system calls, nor do they manage the blocking aspect of synchronous system calls. FlexSC describes methods to handle both of these situations, as described in the &amp;lt;b&amp;gt;Contribution&amp;lt;/b&amp;gt; section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting the processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel, and although they can dedicate certain execution to specific cores of a system, they cannot dynamically adapt the proportion of cores used by the kernel and the cores shared between the kernel and the user like FlexSC can.[1] Non-blocking execution research has focused on threading, event-based (non-blocking), and hybrid solutions. However, FlexSC provides a mechanism to separate system call execution from system call invocation. This is a key difference between FlexSC and previous research.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Contribution: ==&lt;br /&gt;
&lt;br /&gt;
===Exception-Less System Calls===&lt;br /&gt;
Exception-less system calls are the research team&#039;s alternative to synchronous system calls. The downsides of synchronous system calls include the cumulative mode switch time of multiple system calls each invoked independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through:&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
1. &amp;lt;u&amp;gt;System Call Batching:&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
Instead of having each system call run as soon as it is called, FlexSC groups system calls together into batches. These batches can then be executed at one time, thus minimizing the frequency of mode switches between user and kernel modes. Batching reduces both the direct cost of mode switching and the indirect cost, the pollution of critical processor structures, associated with switching modes. System call batching works by first collecting as many system call requests as possible, then switching to kernel mode, and then executing each of them.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
2. &amp;lt;u&amp;gt;Core Specialization&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in &amp;lt;b&amp;gt;Decoupling Execution from Invocation&amp;lt;/b&amp;gt; section below.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
3. &amp;lt;u&amp;gt;Exception-less System Call Interface&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
To provide an asynchronous interface to the kernel, FlexSC uses &amp;lt;b&amp;gt;syscall pages&amp;lt;/b&amp;gt;. Syscall pages are a set of memory pages shared between user mode and kernel mode. User-space threads interact with syscall pages in order to request kernel-mode procedures (system calls). A user-mode thread may write a system call request into a free entry of a syscall page; the request will then be executed once the batch condition is met, and its return value will be stored back in the same entry. The user-mode thread can later return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor retrieving the return value generates a processor exception. Each syscall page is a table of syscall entries. These entries may be in one of three states: &amp;lt;b&amp;gt;Free&amp;lt;/b&amp;gt;, meaning a syscall can be added to the entry; &amp;lt;b&amp;gt;Submitted&amp;lt;/b&amp;gt;, meaning the kernel can proceed to invoke the appropriate system call operations; and &amp;lt;b&amp;gt;Done&amp;lt;/b&amp;gt;, meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
4. &amp;lt;u&amp;gt;Decoupling Execution from Invocation&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
In order to separate a system call invocation from the execution of the system call, &amp;lt;b&amp;gt;syscall threads&amp;lt;/b&amp;gt; were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
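The decoupling of invocation from execution can be sketched in miniature with ordinary threads: a queue stands in for the shared syscall page, a worker thread plays the syscall thread, and plain Python functions stand in for kernel work. Every name here is illustrative, not FlexSC's actual implementation.

```python
# Illustrative sketch of decoupled execution: the "user thread" posts
# requests and keeps running; a separate "syscall thread" drains them.
import queue
import threading

requests = queue.Queue()   # stand-in for the shared syscall page
results = {}

def syscall_thread():
    # Plays the role of the kernel-side worker: pulls submitted requests
    # and executes them, independently of whichever thread submitted them.
    while True:
        req_id, func, args = requests.get()
        if func is None:
            break                        # shutdown signal
        results[req_id] = func(*args)

worker = threading.Thread(target=syscall_thread)
worker.start()

# The "user thread" submits two requests and immediately continues running;
# it never waits on either request individually.
requests.put((1, sum, ([1, 2, 3],)))
requests.put((2, len, ("flexsc",)))

requests.put((0, None, ()))              # tell the worker to stop
worker.join()                            # results are now complete

assert results == {1: 6, 2: 6}
```

Because submission and execution are separate, the worker could just as well run on a different core, which is exactly the property core specialization exploits.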
&lt;br /&gt;
===FlexSC Threads===&lt;br /&gt;
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC threads can immediately run multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accommodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page, which in turn has a single kernel-level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming, as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface compared to the relatively simple synchronous system call interface.[1][2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Critique: ==&lt;br /&gt;
&lt;br /&gt;
===Moore&#039;s Law===&lt;br /&gt;
One interesting aspect of this paper is how the research relates to Moore&#039;s Law. Moore&#039;s Law states that the number of transistors on a chip doubles every 18 months.[10] This has led to very large increases in the performance potential of software, but at the same time has opened a large gap between the actual performance of efficient and inefficient software. The paper claims that this gap is mainly caused by the disparity in cost of accessing different processor resources such as registers, cache, and memory.[1] In this light, the FlexSC interface is not just an attempt to increase the efficiency of current system calls; it is an attempt to change the way we view software. It is not enough to keep building more powerful machines if the code we run does not become more efficient along with the gain in power. Instead, we need to focus on appropriate allocation and usage of that power, as failure to do so is the origin of the gap between our potential and our performance.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Performance of FlexSC===&lt;br /&gt;
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than that of a synchronous call. The real benefit of FlexSC comes when there are many system calls which can be batched before execution; in this situation, the FlexSC system far outperformed traditional synchronous system calls.[1] This is why the paper focuses on server-like applications, as servers must handle many user requests efficiently to be useful. Thus, for the general case, it appears that a hybrid solution of synchronous calls below some threshold and exception-less system calls above that threshold would be most efficient.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Blocking Calls===&lt;br /&gt;
FlexSC relies on the fact that web and database servers have a lot of concurrency and independent parallelism. FlexSC can &#039;harvest&#039; enough independent work so that it doesn&#039;t need to track dependencies between system calls. However, this could be a problem in other situations. Since FlexSC system calls are &#039;inherently asynchronous&#039;, if they need to block, FlexSC would jump to the next system call and execute that one. This can cause a problem for system calls such as reading and writing, where the write call has an outstanding dependency on the read call. However, this could be resolved by using some kind of combined system call, that is, multiple system calls executed as one single call. Unfortunately, FlexSC does not have any current handling for such an implementation.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Core Scheduling Issues===&lt;br /&gt;
In a system with X cores, FlexSC needs to dedicate some subset of cores for system calls. Currently, FlexSC first wakes up core X to run a system call thread, and when another batch comes in, if core X is still busy, it will then try core X-1, and so on. Of all the algorithms they tested, it turned out that this, the simplest algorithm, was the most efficient algorithm for FlexSC scheduling. However, this was only tested with FlexSC running a single application at a time. FlexSC&#039;s scheduling algorithm would need to be fine-tuned for running multiple applications.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===When There Are Not More Threads Than Cores===&lt;br /&gt;
In situations where a single thread uses 100% of a CPU and acts primarily in user space, such as in &#039;Scientific Programs&#039;, FlexSC causes more overhead than performance gain. As a result, FlexSC is not an optimal implementation for such cases.&lt;br /&gt;
&lt;br /&gt;
===IO ===&lt;br /&gt;
FlexSC is not suited for data-intensive, IO-centric applications, as realized by Vijay Vasudevan [16]. Vasudevan&#039;s research aims to reduce the energy footprint of data centers, and FlexSC was considered for it. It was found that FlexSC&#039;s reduction of mode switches, via the use of memory pages shared between user space and kernel space, is useful for reducing the impact of system calls. That technique, however, was not useful for IO-intensive work, since it neither removed the requirement of data copying nor reduced the overheads associated with interrupts in IO-intensive tasks.&lt;br /&gt;
&lt;br /&gt;
===Some Kernel Changes Are Required===&lt;br /&gt;
Though most of the work is done transparently, i.e. there is no need to modify application code, a small kernel change (3 lines of code) is still required, as per section 3.2 of the paper [1].&amp;lt;br&amp;gt;&lt;br /&gt;
That means adopters would have to add/modify the referenced lines and recompile the kernel, and do so again after each kernel update.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Multicore Systems ===&lt;br /&gt;
For a multicore system, the FlexSC scheduler will attempt to choose a subset of the available cores and specialize them for running system call threads. It is unclear how this dynamic allocation is done: it is mentioned that decisions are made based on workload requirements, which does not exactly clarify the mechanism. Further, the paper mentions that a predefined, static list of cores is used for system call thread assignments. It is unclear when that list is created: is it built at installation time, is it generated initially, or does the installer have to do manual work? On a related note, scalability with increased core counts is ambiguous; it is not clear how scalable the scheduler is. One gets the impression that it is very scalable, since each core spawns a system call thread and thus as many threads as there are cores could be running concurrently, for one or more processes [1]. More explicit results, however, would have been beneficial. Finally, the paper mentions that hyper-threading was turned off to ease the analysis of the results. That is understandable, but it would be nice to know whether these hardware threads (2 per core) would be treated as cores when turned on. That is, would the scheduler then realize that it can use eight cores? And would the predefined static core list need to be modified to list eight instead of four?&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt;&lt;br /&gt;
Along the same lines, and given the growing popularity of GPUs for general-purpose programming, it would have been useful to at least hypothesize about the possible performance outcome when using specialized GPUs, such as NVIDIA&#039;s Tesla GPUs. Would FlexSC&#039;s scheduler be able to take advantage of the additional cores, and hence use them for specialized purposes?&lt;br /&gt;
&lt;br /&gt;
== Related Work: ==&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&lt;br /&gt;
Multi-calls are a technique in which multiple system calls are collected and submitted as a single system call. They are used both in operating systems and in paravirtualized hypervisors. The Cassyopia compiler has a special technique named a looped multi-call, an additional mechanism whereby the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls: multi-calls do not investigate parallel execution of system calls, nor do they address the blocking of system calls the way exception-less system calls do. The system calls in a multi-call are executed sequentially; each one must complete before the next may start. Exception-less system calls, on the other hand, can be executed in parallel, and when one blocks, the next call can execute immediately.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Locality of Execution and Multicores===&lt;br /&gt;
&lt;br /&gt;
Several techniques have addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver Processing[14], which tackle locality of execution in the handling of device interrupts; both try to limit the processor interference associated with interrupt handling without affecting the latency of servicing requests. Another technique, Computation Spreading[15], is the most similar to the multicore execution of FlexSC. It proposes processor modifications that allow hardware migration of threads to specialized cores. However, it did not model TLBs, and on current hardware synchronous thread migration requires a costly inter-processor interrupt. Such solutions also differ from FlexSC in two ways: they require a microkernel, and unlike them, FlexSC can dynamically adapt the proportion of cores used by the kernel or shared between user and kernel execution. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC provides a more efficient and flexible mechanism.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Non-blocking Execution===&lt;br /&gt;
&lt;br /&gt;
Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically, researchers used threading, event-based (non-blocking), and hybrid systems to obtain high performance in server applications. The main difference between many of these proposals and FlexSC is that none of the non-blocking system call designs decoupled system call invocation from execution.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References: ==&lt;br /&gt;
[1] Soares, Livio and Michael Stumm, &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[2] Tanenbaum, Andrew S., &amp;lt;i&amp;gt;Modern Operating Systems: 3rd Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2008.&lt;br /&gt;
&lt;br /&gt;
[3] Stallings, William, &amp;lt;i&amp;gt;Operating Systems: Internals and Design Principles - 6th Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2009.&lt;br /&gt;
&lt;br /&gt;
[4] Garfinkel, Tal, &amp;lt;i&amp;gt;Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools&amp;lt;/i&amp;gt;, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[5] Yoo, Sunjoo &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design&amp;lt;/i&amp;gt;, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[6] Rajagopalan, Mohan &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Cassyopia: Compiler Assisted System Optimization&amp;lt;/i&amp;gt;, Proceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[7] Kumar, Sanjeev and Christopher Wilkerson, &amp;lt;i&amp;gt;Exploiting Spatial Locality in Data Caches using Spatial Footprints&amp;lt;/i&amp;gt;, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[8] Jin, Shudong and Azer Bestavros, &amp;lt;i&amp;gt;Sources and Characteristics of Web Temporal Locality&amp;lt;/i&amp;gt;, Computer Science Department - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[9] Agarwal, Vikas &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures&amp;lt;/i&amp;gt;, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[10] Tuomi, Ilkka, &amp;lt;i&amp;gt;The Lives and Death of Moore&#039;s Law&amp;lt;/i&amp;gt;, First Monday, Volume 7, Number 11, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]&lt;br /&gt;
&lt;br /&gt;
[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.&lt;br /&gt;
&lt;br /&gt;
[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.&lt;br /&gt;
&lt;br /&gt;
[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.&lt;br /&gt;
&lt;br /&gt;
[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.&lt;br /&gt;
&lt;br /&gt;
[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.&lt;br /&gt;
&lt;br /&gt;
[16] Vasudevan, Vijay. &amp;lt;i&amp;gt;Improving Datacenter Energy Efficiency Using a Fast Array of Wimpy Nodes&amp;lt;/i&amp;gt;, Thesis Proposal, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, October 12, 2010.[http://www.cs.cmu.edu/~vrv/proposal/vijay_thesis_proposal.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[17] Teller, Patricia J., &amp;lt;i&amp;gt;Translation-Lookaside Buffer Consistency&amp;lt;/i&amp;gt;, IEEE Computer, Volume 23, Issue 6, IBM T. J. Watson Research Center, Yorktown Heights, NY, June 1990. [http://dx.doi.org/10.1109/2.55498 HTML]&lt;br /&gt;
&lt;br /&gt;
[18] Linux ABI sourceforge page. [http://linux-abi.sourceforge.net/ HTML] and Linux application page. [http://www.linux.org/apps/AppId_8088.html HTML]&lt;br /&gt;
&lt;br /&gt;
[19] DREPPER, U., AND MOLNAR , I. &amp;lt;i&amp;gt;The Native POSIX Thread Library for Linux&amp;lt;/i&amp;gt;. Tech. rep., RedHat Inc, 2003. [http://people.redhat.com/drepper/nptl-design.pdf HTML]&lt;br /&gt;
&lt;br /&gt;
[20] M. Brian Blake, &amp;lt;i&amp;gt;Coordinating Multiple Agents for Workflow-Oriented Process Orchestration&amp;lt;/i&amp;gt;. Information Systems and e-Business Management Journal, Springer-Verlag, December 2003. [http://www.cs.georgetown.edu/~blakeb/pubs/blake_ISEB2003.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[21] IBM developerWorks, &amp;lt;i&amp;gt;Kernel Command Using Linux System Calls&amp;lt;/i&amp;gt;, 2010. [http://www.ibm.com/developerworks/linux/library/l-system-calls/ HTML]&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6252</id>
		<title>COMP 3000 Essay 2 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6252"/>
		<updated>2010-12-02T09:56:47Z</updated>

		<summary type="html">&lt;p&gt;Brobson: /* Pipeline Flushing */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper we will be analyzing is titled &amp;quot;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;quot;. Its authors are Livio Soares and Michael Stumm, both of the University of Toronto. The paper can be viewed here [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf] for further details. It is essential to comprehend the basic vocabulary used in the paper in order to fully understand the ideas being discussed. The most important notions at the core of the FlexSC paper are system calls[21] and synchronous systems. These base definitions, along with numerous other helpful ideas, are explained in the section that follows. &lt;br /&gt;
== Background Concepts: ==&lt;br /&gt;
&lt;br /&gt;
In order to fully understand the FlexSC paper, it is essential to understand the key concepts discussed within it. Listed below are the main concepts required to fully comprehend the paper. It is more important for the reader to understand the core ideas behind these definitions, along with the underlying motivation for their existence, than to understand the minute details of their processes. &lt;br /&gt;
&lt;br /&gt;
===System Call===&lt;br /&gt;
A &amp;lt;b&amp;gt;System Call&amp;lt;/b&amp;gt; is the gateway between user space and kernel space. User space is not given direct access to the kernel&#039;s services for several reasons (one being security); hence, system calls are the messengers between user and kernel space.[1][4]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Mode Switch===&lt;br /&gt;
&amp;lt;b&amp;gt;Mode Switches&amp;lt;/b&amp;gt; refer to moving from one processor mode to another: specifically, from user mode to kernel mode, or from kernel mode back to user mode. The direction does not matter; it is a general term. Crucial to mode switching is the &amp;lt;b&amp;gt;mode switch time&amp;lt;/b&amp;gt;, which is the time necessary to execute a system call instruction in user mode, perform the kernel-mode execution of the system call, and finally return execution to user mode.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Synchronous System Call===&lt;br /&gt;
The &amp;lt;b&amp;gt;Synchronous Execution Model (System Call Interface)&amp;lt;/b&amp;gt; refers to a structure in which system calls are managed serially: one system call completes at a time, and the next does not begin until the previous one has finished executing. This form of system call is blocking, meaning the process that initiates the system call is blocked until the system call returns. Traditionally, operating system calls are mostly synchronous.[1][2]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Asynchronous System Call===&lt;br /&gt;
An &amp;lt;b&amp;gt;asynchronous system call&amp;lt;/b&amp;gt; is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order, and can be compared to event-driven programming.[2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===System Call Pollution===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Pollution&amp;lt;/b&amp;gt; is a more formal way of referring to wasteful or unnecessary delay caused by system calls. This pollution is a direct consequence of the fact that a system call invokes a mode switch, which is not a costless operation. The &amp;quot;pollution&amp;quot; takes the form of data over-written in critical processor structures such as the TLB (translation look-aside buffer, a table which reduces the frequency of main-memory accesses for page table entries), branch prediction tables, and the caches (L1, L2, L3).[1][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Pipeline Flushing===&lt;br /&gt;
During regular operation, a CPU has multiple instructions being fetched, decoded, and executed at the same time. This parallel processing of instructions provides a significant speed advantage. During a mode switch, however, instructions in the user-mode pipeline are flushed and discarded by the processor.[1] These lost instructions are part of the cost of a mode switch.&lt;br /&gt;
&lt;br /&gt;
===Processor Exceptions===&lt;br /&gt;
&amp;lt;b&amp;gt;Processor exceptions&amp;lt;/b&amp;gt; are situations which cause the processor to stop current execution unexpectedly in order to handle the issue. There are many situations which generate processor exceptions, including undefined instructions and software interrupts (system calls).[5]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Batching&amp;lt;/b&amp;gt; is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
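The saving can be illustrated with a toy model. The sketch below (all names are invented for illustration; this is not FlexSC or kernel code) counts how many user/kernel mode switches a workload pays with and without batching:

```c
static int mode_switches;

/* stand-in for the hardware trap that a real system call would raise */
static void enter_kernel(void) { mode_switches++; }

/* synchronous model: one exception per system call */
int switches_synchronous(int ncalls) {
    mode_switches = 0;
    for (int i = 0; ncalls > i; i++)
        enter_kernel();
    return mode_switches;
}

/* batched model: a single switch drains a whole batch of requests */
int switches_batched(int ncalls, int batch) {
    mode_switches = 0;
    for (int i = 0; ncalls > i; i += batch)
        enter_kernel();
    return mode_switches;
}
```

For 32 calls with a batch size of 8, the batched model pays 4 switches instead of 32; the indirect savings (less cache and TLB pollution) come on top of that.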
&lt;br /&gt;
===Temporal and Spatial Locality===&lt;br /&gt;
Locality is the concept that during execution there will be a tendency for the same set of data to be accessed repeatedly over a brief time period. There are two important forms of locality: &amp;lt;b&amp;gt;spatial locality&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;temporal locality&amp;lt;/b&amp;gt;. Spatial locality refers to the pattern that memory locations in close physical proximity will be referenced close together in a short period of time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Instructions Per Cycle (IPC)===&lt;br /&gt;
&amp;lt;b&amp;gt;Instructions per cycle&amp;lt;/b&amp;gt; is the number of instructions a processor can execute in a single clock cycle.[9]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Translation Look-Aside Buffer (TLB)===&lt;br /&gt;
A TLB is a table used in a virtual memory system that lists the physical address page number associated with each virtual address page number. A TLB is used in conjunction with a cache whose tags are based on virtual addresses. The virtual address is presented simultaneously to the TLB and to the cache so that cache access and the virtual-to-physical address translation can proceed in parallel. If the requested address is not cached then the physical address is used to locate the data in main memory. &lt;br /&gt;
&lt;br /&gt;
The TLB is the reason context switches can have such large performance penalties. Every time the OS switches context, the entire buffer is flushed. When the process resumes, it must be rebuilt from scratch. Too many context switches will therefore cause an increase in cache misses and degrade performance.[17]&lt;br /&gt;
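A back-of-envelope model makes the penalty concrete. The function below is our own illustration (the timing numbers used with it are invented, not measurements): effective memory access time rises as the TLB hit rate falls, because every miss adds a page-table walk.

```c
/* hit:  TLB lookup, then memory access
 * miss: TLB lookup, page-table walk, then memory access */
double effective_access_ns(double hit_rate, double tlb_ns,
                           double mem_ns, double walk_ns) {
    return hit_rate * (tlb_ns + mem_ns)
         + (1.0 - hit_rate) * (tlb_ns + walk_ns + mem_ns);
}
```

With hypothetical costs of 1 ns for a TLB lookup, 100 ns for memory, and 200 ns for a walk, halving the hit rate from 100% to 50% roughly doubles the effective access time, which is exactly the effect a flushed TLB produces just after a context switch.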
&lt;br /&gt;
===Lack of Locality ===&lt;br /&gt;
As the paper uses the term, locality refers to both types defined above, temporal and spatial. A lack of locality thus means that the data and instructions needed most frequently by the application are repeatedly displaced from registers and caches by system calls, thereby contributing to performance degradation.&lt;br /&gt;
&lt;br /&gt;
===Throughput ===&lt;br /&gt;
Throughput is an indication of how much work is done during a unit of time, e.g. n transactions per hour; the higher n is, the better.[2, p. 151]&lt;br /&gt;
&lt;br /&gt;
===Regular Store Instructions ===&lt;br /&gt;
A store instruction refers to a typical assembly-language instruction that usually takes two arguments: a value, and the memory location where that value should be stored.&lt;br /&gt;
&lt;br /&gt;
===Linux Application Binary Interface (ABI)===&lt;br /&gt;
The ABI is a patch to the kernel that allows you to run SCO, Xenix, Solaris ix86, and other binaries on Linux.[18]&lt;br /&gt;
&lt;br /&gt;
===Native POSIX Thread Library (NPTL)===&lt;br /&gt;
NPTL is a software component that allows the Linux kernel to run applications optimized for POSIX Thread efficiency.[19]&lt;br /&gt;
&lt;br /&gt;
===Syscall Page ===&lt;br /&gt;
A syscall page is a collection of syscall entries. In turn, a sysentry is a 64-byte data structure, which includes information such as the syscall number, the number of arguments, the arguments themselves, a status, and a return value [1].&lt;br /&gt;
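This 64-byte entry can be sketched as a C structure. The field names and exact layout below are our own guesses based on the paper's description, not FlexSC's actual code:

```c
struct sc_entry {
    unsigned int       syscall_nr; /* which system call to run          */
    unsigned short     nargs;      /* how many of args[] are meaningful */
    unsigned short     status;     /* free, submitted, or done          */
    unsigned long long args[6];    /* the system call arguments         */
    long long          ret;        /* return value, valid once done     */
};

/* one 4096-byte page shared by user and kernel mode holds 64 entries */
enum { SC_ENTRIES_PER_PAGE = 4096 / sizeof(struct sc_entry) };
```

Keeping each entry at exactly 64 bytes lets a whole page be treated as a fixed-size table that both sides can index without any coordination beyond the status field.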
&lt;br /&gt;
===Syscall Threads ===&lt;br /&gt;
Syscall threads are FlexSC&#039;s mechanism for enabling exception-less system calls. A syscall thread shares the virtual address space of the process on whose behalf it executes system calls [1].&lt;br /&gt;
&lt;br /&gt;
===Latency ===&lt;br /&gt;
Latency is a measure of the time delay between the start of an action and its completion in a system.[20]&lt;br /&gt;
&lt;br /&gt;
== Research Problem: ==&lt;br /&gt;
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of synchronous system calls comes from the easy-to-program nature of sequential operation. However, this ease of use comes with undesirable side effects that can reduce the instructions per cycle (IPC) achieved by the processor.[9] In &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while remaining easy for application programmers to adopt.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The negative effects of synchronous system calls have been researched heavily; it is accepted that, although easy to use, they are not optimal. Previous research includes work on &amp;lt;b&amp;gt;system call batching&amp;lt;/b&amp;gt; such as multi-calls[6], &amp;lt;b&amp;gt;locality of execution with multicore systems&amp;lt;/b&amp;gt;[7][8], and &amp;lt;b&amp;gt;non-blocking execution&amp;lt;/b&amp;gt;. System call batching shares great similarity with FlexSC, as multiple system calls are grouped together to reduce the number of mode switches required of the system.[6] The difference is that multi-calls neither make use of parallel execution of system calls nor manage the blocking aspect of synchronous system calls; FlexSC describes methods to handle both of these situations, as described in the &amp;lt;b&amp;gt;Contribution&amp;lt;/b&amp;gt; section of this document.[1] Previous research into locality of execution on multicore systems has focused on managing device interrupts and limiting the processor interference associated with interrupt handling.[13][14] However, these solutions require a microkernel and, although they can dedicate certain execution to specific cores of a system, they cannot dynamically adapt the proportion of cores used by the kernel and the cores shared between the kernel and the user as FlexSC can.[1] Non-blocking execution research has focused on threaded, event-based (non-blocking), and hybrid solutions. FlexSC, in contrast, provides a mechanism to separate system call execution from system call invocation; this is a key difference between FlexSC and previous research.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Contribution: ==&lt;br /&gt;
&lt;br /&gt;
===Exception-Less System Calls===&lt;br /&gt;
Exception-less system calls are the research team&#039;s alternative to synchronous system calls. The downsides of synchronous system calls include the cumulative mode switch time of multiple system calls each invoked independently, state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through:&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
1. &amp;lt;u&amp;gt;System Call Batching:&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
Instead of having each system call run as soon as it is called, FlexSC groups system calls together into batches. These batches can then be executed at one time, thus minimizing the frequency of mode switches between user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching and the indirect cost, the pollution of critical processor structures associated with switching modes. System call batching works by first requesting as many system calls as possible, then switching to kernel mode, and then executing each of them.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
2. &amp;lt;u&amp;gt;Core Specialization&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in &amp;lt;b&amp;gt;Decoupling Execution from Invocation&amp;lt;/b&amp;gt; section below.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
3. &amp;lt;u&amp;gt;Exception-less System Call Interface&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
To provide an asynchronous interface to the kernel, FlexSC uses &amp;lt;b&amp;gt;syscall pages&amp;lt;/b&amp;gt;: a set of memory pages shared between user mode and kernel mode. User-space threads interact with syscall pages in order to request kernel-mode procedures (system calls). A user-mode thread may make a system call request on a free entry of a syscall page; the request will then be executed once the batch condition is met, and the return value will be stored on the syscall page. The user-mode thread can later return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor getting the return value from the syscall page generates a processor exception. Each syscall page is a table of syscall entries. These entries may be in one of three states: &amp;lt;b&amp;gt;Free&amp;lt;/b&amp;gt; - meaning a syscall can be added to the entry; &amp;lt;b&amp;gt;Submitted&amp;lt;/b&amp;gt; - meaning the kernel can proceed to invoke the appropriate system call operations; and &amp;lt;b&amp;gt;Done&amp;lt;/b&amp;gt; - meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
4. &amp;lt;u&amp;gt;Decoupling Execution from Invocation&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
In order to separate a system call&#039;s invocation from its execution, &amp;lt;b&amp;gt;syscall threads&amp;lt;/b&amp;gt; were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute them, always in kernel mode. This is the mechanism that allows exception-less system calls to let a user-mode thread issue a request and continue to run while the kernel-level system call is being executed. In addition, since system call invocation is separate from execution, a process running on one core may request a system call whose execution is completed on an entirely different core. This gives exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
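The Free/Submitted/Done life cycle and the split between user-side invocation and kernel-side execution can be sketched as follows. All names are hypothetical, and the real FlexSC implementation lives in the kernel and operates on shared syscall pages rather than this toy in-memory entry:

```c
enum { SC_FREE, SC_SUBMITTED, SC_DONE };

struct entry { int status; int nr; long ret; };

/* user side: claim a free entry and post a request (a plain store,
 * so no processor exception is raised) */
int sc_submit(struct entry *e, int syscall_nr) {
    if (e->status != SC_FREE) return -1;  /* entry busy */
    e->nr = syscall_nr;
    e->status = SC_SUBMITTED;
    return 0;
}

/* kernel-side syscall thread: execute one submitted entry */
void sc_kernel_step(struct entry *e) {
    if (e->status == SC_SUBMITTED) {
        e->ret = 1000 + e->nr;  /* placeholder for real syscall work */
        e->status = SC_DONE;
    }
}

/* user side: collect the result and recycle the entry */
int sc_harvest(struct entry *e, long *ret_out) {
    if (e->status != SC_DONE) return -1;  /* not ready yet */
    *ret_out = e->ret;
    e->status = SC_FREE;
    return 0;
}
```

Because the submitting and executing sides communicate only through the entry's status field, nothing forces them to run on the same core, which is precisely what enables the core specialization described above.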
&lt;br /&gt;
===FlexSC Threads===&lt;br /&gt;
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accommodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page, which in turn has a single kernel-level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming, as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous system call interface.[1][2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Critique: ==&lt;br /&gt;
&lt;br /&gt;
===Moore&#039;s Law===&lt;br /&gt;
One interesting aspect of this paper is how the research relates to Moore&#039;s Law. Moore&#039;s Law states that the number of transistors on a chip doubles every 18 months.[10] This has led to very large increases in the performance potential of software, but at the same time it has opened a large gap between the actual performance of efficient and inefficient software. This paper claims that the gap is mainly caused by the disparity in the cost of accessing different processor resources such as registers, cache, and memory.[1] In this manner, the FlexSC interface is not just an attempt to increase the efficiency of current system calls; it is actually an attempt to change the way we view software. It is not enough to continue to build more powerful machines if the code we currently run will not speed up (become more efficient) along with the gain in power. Instead we need to focus on appropriate allocation and usage of that power, as failure to do so is the origin of the gap between our potential and our performance.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Performance of FlexSC===&lt;br /&gt;
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface is greater than that of a synchronous call. The real benefit of FlexSC comes when there are many system calls that can be batched before execution; in this situation the FlexSC system far outperformed traditional synchronous system calls.[1] This is why the research paper&#039;s focus is on server-like applications, as servers must handle many user requests efficiently to be useful. Thus, for the general case, it appears that a hybrid solution of synchronous calls below some threshold and exception-less system calls above that threshold would be most efficient.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Blocking Calls===&lt;br /&gt;
FlexSC relies on the fact that web and database servers have a lot of concurrency and independent parallelism. FlexSC can &#039;harvest&#039; enough independent work so that it doesn&#039;t need to track dependencies between system calls. However, this could be a problem in other situations. Since FlexSC system calls are &#039;inherently asynchronous&#039;, if they need to block, FlexSC would jump to the next system call and execute that one. This can cause a problem for system calls such as reading and writing, where the write call has an outstanding dependency on the read call. However, this could be resolved by using some kind of combined system call, that is, multiple system calls executed as one single call. Unfortunately, FlexSC does not have any current handling for such an implementation.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Core Scheduling Issues===&lt;br /&gt;
In a system with X cores, FlexSC needs to dedicate some subset of the cores to system calls. Currently, FlexSC first wakes up core X to run a system call thread; when another batch comes in and core X is still busy, it tries core X-1, and so on. Of all the algorithms tested, this simplest one turned out to be the most efficient for FlexSC scheduling. However, it was only tested with FlexSC running a single application at a time; FlexSC&#039;s scheduling algorithm would need to be fine-tuned for running multiple applications.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
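The policy described above amounts to a simple downward scan over the syscall cores. A sketch (the function name is our own; this is not the authors' code):

```c
/* starting from the highest-numbered core dedicated to system calls,
 * wake the first one that is not busy; -1 means all are busy */
int pick_syscall_core(const int busy[], int ncores) {
    for (int c = ncores - 1; c >= 0; c--)
        if (!busy[c])
            return c;
    return -1;
}
```

The appeal of this policy is that it keeps work concentrated on as few cores as possible: lower-numbered cores are only woken when every higher-numbered one is already occupied.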
&lt;br /&gt;
===When There Are Not More Threads Than Cores===&lt;br /&gt;
In situations where there is a single thread using 100% of a CPU, acting primarily in user space, such as in &#039;scientific programs&#039;, FlexSC causes more overhead than performance gain. As a result, FlexSC is not an optimal implementation for such cases.&lt;br /&gt;
&lt;br /&gt;
===IO===&lt;br /&gt;
FlexSC is not suited for data-intensive, IO-centric applications, as observed by Vijay Vasudevan [16]. Vasudevan&#039;s research aims to reduce the energy footprint of data centers, and FlexSC was considered for it. It was found that FlexSC&#039;s reduction of mode switches, via the use of memory pages shared between user space and kernel space, is useful for reducing the impact of system calls. That technique, however, was not useful for IO-intensive work, since it neither removes the requirement of data copying nor reduces the overheads associated with interrupts in IO-intensive tasks.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Some Kernel Changes Are Required===&lt;br /&gt;
Though most of the work is done transparently, i.e. there is no need to modify application code, a small kernel change (3 lines of code) remains necessary, as per section 3.2 of the paper [1].&amp;lt;br&amp;gt;&lt;br /&gt;
That means adopters would have to add or modify the referenced lines, and recompile the kernel, after each kernel update.&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Multicore Systems===&lt;br /&gt;
For a multicore system, the FlexSC scheduler will attempt to choose a subset of the available cores and specialize them for running system call threads. It is unclear how the dynamic allocation is done; it is mentioned that decisions are made based on workload requirements, which does not exactly clarify the mechanism. Further, the paper mentions that a predefined, static list of cores is used for system call thread assignments. It is unclear when that list is created: at installation time, generated initially, or does the installer have to do manual work? On a related note, scalability with an increased number of cores is ambiguous; it is not clear how scalable the scheduler is. One gets the impression that it is very scalable, since each core spawns a system call thread, so as many threads as there are cores could be running concurrently, for one or more processes [1]. More explicit results, however, would have been beneficial. Further, the paper mentions that hyper-threading was turned off to ease the analysis of the results. That is understandable; however, it would be useful to know whether these hardware threads (two per core) would be treated as cores when turned on, i.e. would the scheduler then realize that it can use eight cores? Would the predefined static core list then need to be modified to list eight cores instead of four?&amp;lt;br&amp;gt;&lt;br /&gt;
&amp;lt;br&amp;gt; &lt;br /&gt;
Along the same reasoning, and given the growing popularity of GPUs for general-purpose programming, it would have been useful to at least hypothesize on the possible performance outcome when using specialized GPUs, such as NVIDIA&#039;s Tesla GPUs. Would FlexSC&#039;s scheduler be able to take advantage of the additional cores, and hence use them for specialized purposes?&lt;br /&gt;
&lt;br /&gt;
== Related Work: ==&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&lt;br /&gt;
Multi-calls are a concept that involves collecting multiple system calls and submitting them as a single system call. They are used both in operating systems and in paravirtualized hypervisors.[11] The Cassyopia compiler has a special technique named a looped multi-call, an extension in which the result of one system call can be fed as an argument to another system call in the same multi-call.[6] There is a significant difference between multi-calls and exception-less system calls: multi-calls do not investigate parallel execution of system calls, nor do they address blocking system calls the way exception-less system calls do. The system calls in a multi-call are executed sequentially; each one must complete before the next may start. Exception-less system calls, on the other hand, can be executed in parallel, and when one blocks, the next can execute immediately.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
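The looped multi-call idea can be sketched as follows. All names are invented, and the real Cassyopia mechanism is compiler-assisted; this toy version merely shows how results are fed forward inside one batch, with each "system call" body replaced by a placeholder that adds 100:

```c
struct mc_req {
    long arg;       /* literal argument ...                        */
    int  from_prev; /* ... or, if 0 or greater, index of an earlier
                       request whose result should be used instead */
};

/* run every request in one batch, feeding results forward */
void multicall(struct mc_req reqs[], long results[], int n) {
    for (int i = 0; n > i; i++) {
        long a = reqs[i].arg;
        if (reqs[i].from_prev >= 0)
            a = results[reqs[i].from_prev];
        results[i] = a + 100;  /* placeholder for the syscall body */
    }
}
```

A read-then-write pair would thus be expressed as two requests, the second naming the first via from_prev, and the whole chain would cross into the kernel once instead of twice.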
&lt;br /&gt;
===Locality of Execution and Multicores===&lt;br /&gt;
&lt;br /&gt;
Several techniques have addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver Processing[14], which tackle locality of execution in the context of device interrupts; both try to limit the processor interference associated with interrupt handling without increasing the latency of servicing requests. Another technique, named Computation Spreading[15], is the most similar to the multicore execution of FlexSC: it proposes processor modifications that allow hardware migration of threads to specialized cores. However, it did not model TLBs, and on current hardware synchronous thread migration requires a costly inter-processor interrupt. Other related solutions differ from FlexSC in two ways: they require a micro-kernel, and, unlike FlexSC, they cannot dynamically adapt the proportion of cores used exclusively by the kernel versus cores shared by user and kernel execution. While all these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC provides a more efficient and flexible mechanism.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Non-blocking Execution===&lt;br /&gt;
&lt;br /&gt;
Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Typically, researchers have used threaded, event-based (non-blocking), and hybrid systems to obtain high performance in server applications. The main difference between these proposals for non-blocking execution and FlexSC is that none of the non-blocking systems decouple the system call invocation from its execution.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References: ==&lt;br /&gt;
[1] Soares, Livio and Michael Stumm, &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[2] Tanenbaum, Andrew S., &amp;lt;i&amp;gt;Modern Operating Systems: 3rd Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2008.&lt;br /&gt;
&lt;br /&gt;
[3] Stallings, William, &amp;lt;i&amp;gt;Operating Systems: Internals and Design Principles - 6th Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2009.&lt;br /&gt;
&lt;br /&gt;
[4] Garfinkel, Tim, &amp;lt;i&amp;gt;Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools&amp;lt;/i&amp;gt;, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[5] Yoo, Sunjoo &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design&amp;lt;/i&amp;gt;, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[6] Rajagopalan, Mohan &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Cassyopia: Compiler Assisted System Optimization&amp;lt;/i&amp;gt;, Proceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[7] Kumar, Sanjeev and Christopher Wilkerson, &amp;lt;i&amp;gt;Exploiting Spatial Locality in Data Caches using Spatial Footprints&amp;lt;/i&amp;gt;, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[8] Jin, Shudong and Azer Bestavros, &amp;lt;i&amp;gt;Sources and Characteristics of Web Temporal Locality&amp;lt;/i&amp;gt;, Computer Science Department - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[9] Agarwal, Vikas &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures&amp;lt;/i&amp;gt;, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[10] Tuomi, Ilkka, &amp;lt;i&amp;gt;The Lives and Death of Moore&#039;s Law&amp;lt;/i&amp;gt;, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]&lt;br /&gt;
&lt;br /&gt;
[11] BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP) (2003), pp. 164–177.&lt;br /&gt;
&lt;br /&gt;
[12] LARUS, J., AND PARKES, M. Using Cohort-Scheduling to Enhance Server Performance. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC) (2002), pp. 103–114.&lt;br /&gt;
&lt;br /&gt;
[13] ARON, M., AND DRUSCHEL, P. Soft timers: efficient microsecond software timer support for network processing. ACM Trans. Comput. Syst. (TOCS) 18, 3 (2000), 197–228.&lt;br /&gt;
&lt;br /&gt;
[14] DRUSCHEL, P., AND BANGA, G. Lazy receiver processing (LRP): a network subsystem architecture for server systems. In Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI) (1996), pp. 261–275.&lt;br /&gt;
&lt;br /&gt;
[15] CHAKRABORTY, K., WELLS, P. M., AND SOHI, G. S. Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2006), pp. 283–292.&lt;br /&gt;
&lt;br /&gt;
[16] Vasudevan, Vijay. &amp;lt;i&amp;gt;Improving Datacenter Energy Efficiency Using a Fast Array of Wimpy Nodes&amp;lt;/i&amp;gt;, Thesis Proposal, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, October 12, 2010.[http://www.cs.cmu.edu/~vrv/proposal/vijay_thesis_proposal.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[17] Teller, Patricia J., &amp;lt;i&amp;gt;Translation-Lookaside Buffer Consistency&amp;lt;/i&amp;gt;, IEEE Computer, Volume 23, Issue 6, IBM T. J. Watson Research Center, Yorktown Heights, NY, June 1990. [http://dx.doi.org/10.1109/2.55498 HTML]&lt;br /&gt;
&lt;br /&gt;
[18] Linux ABI sourceforge page. [http://linux-abi.sourceforge.net/ HTML] and Linux application page. [http://www.linux.org/apps/AppId_8088.html HTML]&lt;br /&gt;
&lt;br /&gt;
[19] DREPPER, U., AND MOLNAR, I. &amp;lt;i&amp;gt;The Native POSIX Thread Library for Linux&amp;lt;/i&amp;gt;. Tech. rep., RedHat Inc, 2003. [http://people.redhat.com/drepper/nptl-design.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[20] M. Brian Blake, &amp;lt;i&amp;gt;Coordinating Multiple Agents for Workflow-Oriented Process Orchestration&amp;lt;/i&amp;gt;. Information Systems and e-Business Management Journal, Springer-Verlag, December 2003. [http://www.cs.georgetown.edu/~blakeb/pubs/blake_ISEB2003.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[21] IBM developerWorks, &amp;lt;i&amp;gt;Kernel Command using Linux System Calls&amp;lt;/i&amp;gt;, 2010.[http://www.ibm.com/developerworks/linux/library/l-system-calls/ HTML]&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6248</id>
		<title>COMP 3000 Essay 2 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=6248"/>
		<updated>2010-12-02T09:25:28Z</updated>

		<summary type="html">&lt;p&gt;Brobson: /* Background Concepts: */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Paper ==&lt;br /&gt;
The paper we will be analyzing is titled &amp;quot;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;quot;. Its authors are Livio Soares and Michael Stumm, both of whom are from the University of Toronto. The paper can be viewed here, [http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf], for further details. To fully understand the ideas being discussed, it is essential to comprehend the basic vocabulary used in the paper. The most important notions in the FlexSC paper, at the core of it all, are system calls[21] and synchronous execution. These base definitions, along with numerous other helpful ideas, are explained in the section that follows. &lt;br /&gt;
== Background Concepts: ==&lt;br /&gt;
&lt;br /&gt;
In order to fully understand the FlexSC paper, it is essential to understand the key concepts discussed within it. Listed below are the main concepts required to fully comprehend the paper. It is more vital for the reader to understand the core ideas of these definitions, along with the underlying motivation for their existence, than to understand the minute details of their processes. &lt;br /&gt;
&lt;br /&gt;
===System Call===&lt;br /&gt;
A &amp;lt;b&amp;gt;System Call&amp;lt;/b&amp;gt; is the gateway between user space and kernel space. User space is not given direct access to the kernel&#039;s services, for several reasons (one being security); system calls are therefore the messengers between user space and kernel space.[1][4]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Mode Switch===&lt;br /&gt;
&amp;lt;b&amp;gt;Mode Switches&amp;lt;/b&amp;gt; are transitions from one processor mode to another: specifically, from user mode to kernel mode or from kernel mode back to user mode. The direction of the switch does not matter; it is a general term. Crucial to mode switching is the &amp;lt;b&amp;gt;mode switch time&amp;lt;/b&amp;gt;, which is the time necessary to execute a system call instruction in user mode, perform the kernel-mode execution of the system call, and finally return execution back to user mode.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Synchronous System Call===&lt;br /&gt;
The &amp;lt;b&amp;gt;synchronous execution model (system call interface)&amp;lt;/b&amp;gt; refers to the structure in which system calls are managed in a serialized manner: the model completes one system call at a time and does not move on to the next system call until the previous one has finished executing. This form of system call is blocking, meaning the process that initiates the system call is blocked until the call returns. Traditionally, most operating system calls are synchronous.[1][2]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Asynchronous System Call===&lt;br /&gt;
An &amp;lt;b&amp;gt;asynchronous system call&amp;lt;/b&amp;gt; is a system call which does not block upon invocation; control of execution is returned to the calling process immediately. Asynchronous system calls do not necessarily execute in order and can be compared to event driven programming.[2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
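The contrast between the two models can be sketched with a small Python analogy, with ordinary threads standing in for kernel-level system calls; slow_call and every other name here are invented for the illustration.

```python
# Analogy for synchronous vs. asynchronous calls using threads.
# slow_call stands in for a blocking system call such as read(2).
import threading, time

def slow_call(x):
    time.sleep(0.05)   # pretend to wait on a device
    return x * 2

# Synchronous: the caller blocks until each call returns.
sync_results = [slow_call(i) for i in range(3)]

# Asynchronous: invocation returns immediately; results are
# collected once the work completes, like event-driven code.
results = {}
threads = []
for i in range(3):
    t = threading.Thread(target=lambda i=i: results.update({i: slow_call(i)}))
    t.start()            # returns right away, no blocking
    threads.append(t)
for t in threads:
    t.join()             # gather completions

print(sync_results)                    # [0, 2, 4]
print([results[i] for i in range(3)])  # [0, 2, 4]
```

Both models compute the same answers; the difference is that the asynchronous caller regains control immediately and must collect results out of band, which is the event-driven flavour the definition mentions.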
&lt;br /&gt;
===System Call Pollution===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Pollution&amp;lt;/b&amp;gt; is a more sophisticated way of referring to the wasteful or unnecessary delay caused in the system by system calls. This pollution stems directly from the fact that a system call invokes a mode switch, which is not a costless operation. The &amp;quot;pollution&amp;quot; takes the form of data over-written in critical processor structures such as the TLB (translation look-aside buffer, a table that reduces the frequency of main-memory accesses for page table entries), branch prediction tables, and the caches (L1, L2, L3).[1][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Pipeline Flushing===&lt;br /&gt;
&amp;lt;b&amp;gt;Pipeline flushing&amp;lt;/b&amp;gt; occurs when the processor must discard the partially completed instructions currently in its pipeline, for example when an exception such as a system call is raised. The work already performed on the flushed instructions is wasted, which contributes to the direct cost of a mode switch.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Processor Exceptions===&lt;br /&gt;
&amp;lt;b&amp;gt;Processor exceptions&amp;lt;/b&amp;gt; are situations that cause the processor to stop its current execution unexpectedly in order to handle the issue. Many situations generate processor exceptions, including undefined instructions and software interrupts (system calls).[5]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&amp;lt;b&amp;gt;System Call Batching&amp;lt;/b&amp;gt; is the concept of collecting system calls together to be executed in a group instead of executing them immediately after they are called.[6]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
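As a rough illustration of why batching pays off, the following Python sketch charges an arbitrary, made-up cost for each simulated mode switch; it models only the counting argument, not a real kernel.

```python
# Sketch of system call batching: requests are queued in user
# space and executed together, so the (simulated) mode-switch
# cost is paid once per batch instead of once per call.
MODE_SWITCH_COST = 100   # arbitrary units, for illustration only

def run_immediately(calls):
    cost = 0
    for call in calls:
        cost += MODE_SWITCH_COST   # one switch per call
        call()
    return cost

def run_batched(calls):
    cost = MODE_SWITCH_COST        # one switch for the whole batch
    for call in calls:
        call()
    return cost

calls = [lambda: None] * 8         # 8 no-op stand-in "system calls"
print(run_immediately(calls))      # 800
print(run_batched(calls))          # 100
```

The saving grows linearly with batch size, which is why the technique targets workloads that issue many calls.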
&lt;br /&gt;
===Temporal and Spatial Locality===&lt;br /&gt;
Locality is the concept that during execution there will be a tendency for the same set of data to be accessed repeatedly over a brief period of time. There are two important forms of locality: &amp;lt;b&amp;gt;spatial locality&amp;lt;/b&amp;gt; and &amp;lt;b&amp;gt;temporal locality&amp;lt;/b&amp;gt;. Spatial locality refers to the pattern that memory locations in close physical proximity tend to be referenced close together in time. Temporal locality, on the other hand, is the tendency of recently requested memory locations to be requested again.[7][8]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
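Both forms can be demonstrated with a toy cache model in Python; the line size and addresses are arbitrary, and nothing here models real hardware.

```python
# Toy cache illustrating spatial and temporal locality. A "line"
# holds 4 consecutive addresses, so touching address n pulls in
# n..n+3 (spatial locality); touching the same address again
# soon still hits (temporal locality).
LINE = 4
cache = set()
hits = 0
accesses = 0

def access(addr):
    global hits, accesses
    accesses += 1
    base = (addr // LINE) * LINE
    if base in cache:
        hits += 1
    else:
        cache.add(base)     # miss: fetch the whole line

for a in range(16):
    access(a)               # sequential scan: 4 misses, 12 hits
for _ in range(4):
    access(0)               # repeated reuse: 4 more hits

print(hits, accesses)       # 16 20
```

The sequential scan profits from spatial locality (three of every four accesses hit), while the repeated accesses to address 0 profit from temporal locality.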
&lt;br /&gt;
===Instructions Per Cycle (IPC)===&lt;br /&gt;
&amp;lt;b&amp;gt;Instructions per cycle&amp;lt;/b&amp;gt; is the number of instructions a processor can execute in a single clock cycle.[9]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Translation Look-Aside Buffer (TLB)===&lt;br /&gt;
A TLB is a table used in a virtual memory system that lists the physical address page number associated with each virtual address page number. A TLB is used in conjunction with a cache whose tags are based on virtual addresses. The virtual address is presented simultaneously to the TLB and to the cache so that cache access and the virtual-to-physical address translation can proceed in parallel. If the requested address is not cached then the physical address is used to locate the data in main memory. &lt;br /&gt;
&lt;br /&gt;
The TLB is the reason context switches can have such large performance penalties. Every time the OS switches context, the entire buffer is flushed. When the process resumes, it must be rebuilt from scratch. Too many context switches will therefore cause an increase in cache misses and degrade performance.[17]&lt;br /&gt;
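The penalty described above can be mimicked with a toy TLB in Python: a plain dictionary stands in for the hardware buffer, and the page-table contents are made up.

```python
# Toy model of a TLB: a small virtual-to-physical mapping cache
# that is flushed on every context switch, after which lookups
# miss until the entries are rebuilt from the page table.
page_table = {v: v + 1000 for v in range(16)}   # fake translations
tlb = {}
misses = 0

def translate(vpage):
    global misses
    if vpage not in tlb:
        misses += 1                  # TLB miss: walk the page table
        tlb[vpage] = page_table[vpage]
    return tlb[vpage]

def context_switch():
    tlb.clear()                      # the entire buffer is flushed

for v in (1, 2, 3):
    translate(v)                     # 3 compulsory misses
for v in (1, 2, 3):
    translate(v)                     # all hits now
context_switch()
for v in (1, 2, 3):
    translate(v)                     # 3 misses again after the flush
print(misses)                        # 6
```

The second round of lookups after the flush repeats all the misses of the first, which is exactly the rebuild-from-scratch cost the paragraph describes.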
&lt;br /&gt;
===Lack of Locality ===&lt;br /&gt;
As used in the paper, locality refers to both types defined above, temporal and spatial. A lack of locality thus means that the data and instructions needed most frequently by the application keep being displaced (from registers and caches) because of system calls, thereby contributing to performance degradation.&lt;br /&gt;
&lt;br /&gt;
===Throughput ===&lt;br /&gt;
Throughput is an indication of how much work is done during a unit of time, e.g. n transactions per hour. The higher n is, the better.[2, p. 151]&lt;br /&gt;
&lt;br /&gt;
===Regular Store Instructions ===&lt;br /&gt;
A store instruction is a typical assembly-language instruction that usually takes two arguments: a value, and the memory location where that value should be stored.&lt;br /&gt;
&lt;br /&gt;
===Linux Application Binary Interface (ABI)===&lt;br /&gt;
The ABI is a patch to the kernel that allows you to run SCO, Xenix, Solaris ix86, and other binaries on Linux.[18]&lt;br /&gt;
&lt;br /&gt;
===Native POSIX Thread Library (NPTL)===&lt;br /&gt;
NPTL is a software component that allows the Linux kernel to run applications optimized for POSIX Thread efficiency.[19]&lt;br /&gt;
&lt;br /&gt;
===Syscall Page ===&lt;br /&gt;
A syscall page is a collection of syscall entries. In turn, a sysentry is a 64-byte data structure that includes information such as the syscall number, the number of arguments, the arguments themselves, a status field, and the return value [1].&lt;br /&gt;
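A sysentry along these lines might be modeled as follows. This is a Python sketch: the field names follow the description above, but the packed layout of the real 64-byte structure is not reproduced.

```python
# Sketch of a syscall entry as described in the paper: a record
# holding the syscall number, argument count, arguments, status,
# and return value. Field names are illustrative; the real
# sysentry is a packed 64-byte in-memory structure.
from dataclasses import dataclass, field

FREE, SUBMITTED, DONE = "free", "submitted", "done"

@dataclass
class SysEntry:
    syscall_number: int = 0
    num_args: int = 0
    args: list = field(default_factory=list)
    status: str = FREE
    return_value: int = 0

entry = SysEntry()
entry.syscall_number = 1          # e.g. a write-style call
entry.args = [4, "buf", 10]       # fd, buffer, length (made up)
entry.num_args = len(entry.args)
entry.status = SUBMITTED          # hand the entry to the kernel
print(entry.status)               # submitted
```

A syscall page is then simply an array of such entries shared between user and kernel space.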
&lt;br /&gt;
===Syscall Threads ===&lt;br /&gt;
Syscall threads are FlexSC&#039;s mechanism for executing exception-less system calls. A syscall thread runs in kernel mode but shares the virtual address space of the process on whose behalf it executes [1].&lt;br /&gt;
&lt;br /&gt;
===Latency ===&lt;br /&gt;
Latency is a measure of the time delay between the start of an action and its completion in a system.[20]&lt;br /&gt;
&lt;br /&gt;
== Research Problem: ==&lt;br /&gt;
System calls provide an interface for user-mode applications to request services from the operating system. Traditionally, the system call interface has been implemented using synchronous system calls, which block the calling user-space process when the system call is initiated. The benefit of synchronous system calls comes from the easy-to-program nature of sequential operation. However, this ease of use also comes with undesirable side effects that can reduce the instructions per cycle (IPC) achieved by the processor.[9] In &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, Soares and Stumm attempt to provide a new form of system call which minimizes the negative effects of synchronous system calls while still remaining easy for application programmers to adopt.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The negative effects of synchronous system calls have been researched heavily; it is accepted that, although easy to use, they are not optimal. Previous research includes work on &amp;lt;b&amp;gt;system call batching&amp;lt;/b&amp;gt; such as multi-calls[6], &amp;lt;b&amp;gt;locality of execution with multicore systems&amp;lt;/b&amp;gt;[7][8], and &amp;lt;b&amp;gt;non-blocking execution&amp;lt;/b&amp;gt;. System call batching shares great similarity with FlexSC, as multiple system calls are grouped together to reduce the number of mode switches required of the system.[6] The difference is that multi-calls neither make use of parallel execution of system calls nor manage the blocking aspect of synchronous system calls; FlexSC handles both of these situations, as described in the &amp;lt;b&amp;gt;Contribution&amp;lt;/b&amp;gt; section of this document.[1] Previous research into locality of execution and multicore systems has focused on managing device interrupts and limiting the processor interference associated with interrupt handling.[7][8] However, these solutions require a microkernel, and although they can dedicate certain execution to specific cores of a system, they cannot dynamically adapt the proportion of cores used by the kernel and the cores shared between the kernel and the user, as FlexSC can.[1] Non-blocking execution research has focused on threaded, event-based (non-blocking), and hybrid solutions. FlexSC, in contrast, provides a mechanism to separate system call execution from system call invocation; this is the key difference between FlexSC and previous research.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Contribution: ==&lt;br /&gt;
&lt;br /&gt;
===Exception-Less System Calls===&lt;br /&gt;
Exception-less system calls are the research team&#039;s alternative to synchronous system calls. The downsides of synchronous system calls include the cumulative mode switch time of multiple system calls each invoked independently, the state pollution of key processor structures (TLB, cache, etc.)[1][3], and, potentially most crucial, the performance impact on the user-mode application during a system call. Exception-less system calls attempt to resolve these three issues through:&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
1. &amp;lt;u&amp;gt;System Call Batching:&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
Instead of having each system call run as soon as it is invoked, FlexSC groups system calls together into batches. These batches can then be executed at one time, thus minimizing the frequency of mode switches between user and kernel modes. Batching provides a benefit both in terms of the direct cost of mode switching and the indirect cost associated with switching modes, the pollution of critical processor structures. System call batching works by first collecting as many system call requests as possible, then switching to kernel mode, and then executing each of them.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
2. &amp;lt;u&amp;gt;Core Specialization&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
On a multi-core system, FlexSC can provide the ability to designate a single core to run all system calls. The reason this is possible is that for an exception-less system call, the system call execution is decoupled from the system call invocation. This is described further in &amp;lt;b&amp;gt;Decoupling Execution from Invocation&amp;lt;/b&amp;gt; section below.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
3. &amp;lt;u&amp;gt;Exception-less System Call Interface&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
To provide an asynchronous interface to the kernel, FlexSC uses &amp;lt;b&amp;gt;syscall pages&amp;lt;/b&amp;gt;. Syscall pages are a set of memory pages shared between user mode and kernel mode. User-space threads interact with syscall pages in order to request kernel-mode procedures (system calls). A user-mode thread may write a system call request into a free entry of a syscall page; the call will then be executed once the batch condition is met, and its return value will be stored back in the entry. The user-mode thread can later return to the syscall page to obtain the return value. Neither issuing the system call via the syscall page nor retrieving the return value from it generates a processor exception. Each syscall page is a table of syscall entries. An entry may be in one of three states: &amp;lt;b&amp;gt;Free&amp;lt;/b&amp;gt;, meaning a syscall can be added to the entry; &amp;lt;b&amp;gt;Submitted&amp;lt;/b&amp;gt;, meaning the kernel can proceed to invoke the appropriate system call operations; and &amp;lt;b&amp;gt;Done&amp;lt;/b&amp;gt;, meaning the kernel is finished and the return value is ready for the user-mode thread to retrieve.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
4. &amp;lt;u&amp;gt;Decoupling Execution from Invocation&amp;lt;/u&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
In order to separate a system call invocation from the execution of the system call, &amp;lt;b&amp;gt;syscall threads&amp;lt;/b&amp;gt; were created. The sole purpose of syscall threads is to pull requests from syscall pages and execute the request, always in kernel mode. This is the mechanic that allows exception-less system calls to provide the ability for a user-mode thread to issue a request and continue to run while the kernel level system call is being executed. In addition, since the system call invocation is separate from execution, a process running on one core may request a system call yet the execution of the system call may be completed on an entirely different core. This allows exception-less system calls the unique capability of having all system call execution delegated to a specific core while other cores maintain user-mode execution.[1]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
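The mechanisms described in points 3 and 4 can be combined into one toy model. This is illustrative Python only: an ordinary worker thread stands in for the kernel-side syscall thread, and no real mode switches or processor exceptions occur.

```python
# Toy model of exception-less system calls: a user thread posts
# a request on a shared "syscall page" entry, a separate worker
# (standing in for a kernel syscall thread) executes it, and the
# user thread later collects the result. This models only the
# free/submitted/done protocol, not a real kernel.
import threading, queue, time

FREE, SUBMITTED, DONE = 0, 1, 2

class Entry:
    def __init__(self):
        self.state = FREE
        self.syscall = None
        self.result = None

page = [Entry() for _ in range(4)]        # the shared syscall page
work = queue.Queue()

def syscall_thread():                     # kernel-side worker
    while True:
        entry = work.get()
        if entry is None:
            break
        entry.result = entry.syscall()    # execute the request
        entry.state = DONE

worker = threading.Thread(target=syscall_thread)
worker.start()

# User side: claim a free entry, submit, keep running, then reap.
entry = next(e for e in page if e.state == FREE)
entry.syscall = lambda: 42                # pretend getpid()-style call
entry.state = SUBMITTED
work.put(entry)

busy = sum(range(1000))                   # user code keeps executing
while entry.state != DONE:
    time.sleep(0.001)                     # poll the shared entry

print(entry.result)                       # 42
work.put(None)
worker.join()
```

Because invocation and execution are decoupled, the "user" line of work continues while the worker runs the call; on real hardware, the worker could even live on a different core.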
&lt;br /&gt;
===FlexSC Threads===&lt;br /&gt;
As mentioned above, FlexSC threads are a key component of the exception-less system call interface. FlexSC threads transform regular, synchronous system calls into exception-less system calls and are compatible with both the POSIX and default Linux thread libraries. This means that FlexSC Threads are immediately capable of running multi-threaded Linux applications with no modifications. The intended use of these threads is with server-type applications which contain many user-mode threads. In order to accommodate multiple user-mode threads, the FlexSC interface provides a syscall page for each core of a system. In this manner, multiple user-mode threads can be multiplexed onto a single syscall page, which in turn has a single kernel-level thread to facilitate execution of the system calls. Programming with FlexSC threads can be compared to event-driven programming, as interactions are not guaranteed to be sequential. This does increase the complexity of programming for an exception-less system call interface as compared to the relatively simple synchronous system call interface.[1][2][3]&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
== Critique: ==&lt;br /&gt;
&lt;br /&gt;
===Moore&#039;s Law===&lt;br /&gt;
One interesting aspect of this paper is how the research relates to Moore&#039;s Law. Moore&#039;s Law states that the number of transistors on a chip doubles every 18 months.[10] This has led to very large increases in the performance potential of software, but at the same time has opened a large gap between the actual performance of efficient and inefficient software. The paper claims that this gap is mainly caused by the disparity in the cost of accessing different processor resources such as registers, cache, and memory.[1] In this light, the FlexSC interface is not just an attempt to increase the efficiency of current system calls; it is an attempt to change the way we view software. It is not enough to continue building more powerful machines if the code we run does not become more efficient along with the gain in power. Instead, we need to focus on appropriate allocation and usage of that power, as failure to do so is the origin of the gap between our potential and our performance.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Performance of FlexSC===&lt;br /&gt;
It is of particular interest to note that exception-less system calls only outperformed synchronous system calls when the system was running multiple system calls. For an individual system call, the overhead of the FlexSC interface was greater than that of a synchronous call. The real benefit of FlexSC comes when there are many system calls that can be batched before execution; in that situation, the FlexSC system far outperformed traditional synchronous system calls.[1] This is why the research paper&#039;s focus is on server-like applications, as servers must handle many user requests efficiently to be useful. Thus, for the general case it appears that a hybrid solution, using synchronous calls below some threshold and exception-less system calls above it, would be most efficient.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Blocking Calls===&lt;br /&gt;
FlexSC relies on the fact that web and database servers have a lot of concurrency and independent parallelism. FlexSC can &#039;harvest&#039; enough independent work so that it doesn&#039;t need to track dependencies between system calls. However, this could be a problem in other situations. Since FlexSC system calls are &#039;inherently asynchronous&#039;, if they need to block, FlexSC would jump to the next system call and execute that one. This can cause a problem for system calls such as reading and writing, where the write call has an outstanding dependency on the read call. However, this could be resolved by using some kind of combined system call, that is, multiple system calls executed as one single call. Unfortunately, FlexSC does not have any current handling for such an implementation.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Core Scheduling Issues===&lt;br /&gt;
In a system with X cores, FlexSC needs to dedicate some subset of the cores to system calls. Currently, FlexSC first wakes up core X to run a system call thread; when another batch comes in and core X is still busy, it then tries core X-1, and so on. Of all the algorithms they tested, this, the simplest one, turned out to be the most efficient for FlexSC scheduling. However, this was only tested with FlexSC running a single application at a time; FlexSC&#039;s scheduling algorithm would need to be fine-tuned for running multiple applications.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===When There Are Not More Threads Than Cores===&lt;br /&gt;
In situations where there is a single thread using 100% of a CPU and acting primarily in user space, as in &#039;scientific programs&#039;, FlexSC causes more overhead than performance gain. As a result, FlexSC is not an optimal implementation for such cases.&lt;br /&gt;
&lt;br /&gt;
===IO===&lt;br /&gt;
FlexSC is not suited for data-intensive, IO-centric applications, as realized by Vijay Vasudevan.[16] Vijay&#039;s research aims to reduce the energy footprint of data centers, and FlexSC was considered for this purpose. It was found that FlexSC&#039;s reduction of mode switches, via the use of memory pages shared between user space and kernel space, is useful for reducing the impact of system calls. That technique, however, was not useful for IO-intensive work, since it neither removed the requirement of data copying nor reduced the overheads associated with interrupts in IO-intensive tasks.&lt;br /&gt;
&lt;br /&gt;
===Some Kernel Changes Are Required===&lt;br /&gt;
Though most of the work is done transparently, i.e. there is no need to modify application code, a small kernel change (3 lines of code) is still required, as per section 3.2 of the paper [1]. That means adopters would have to re-apply the referenced lines and recompile the kernel after each kernel update.&lt;br /&gt;
&lt;br /&gt;
===Multicore Systems ===&lt;br /&gt;
For a multicore system, the FlexSC scheduler will attempt to choose a subset of the available cores and specialize them for running system call threads. It is unclear how this dynamic allocation is done; it is mentioned that decisions are made based on workload requirements, which does not exactly clarify the mechanism. Further, the paper mentions that a predefined, static list of cores is used for system call thread assignment. It is unclear when that list is created: at installation time, generated at start-up, or does the installer have to do manual work? On a related note, scalability with increasing core counts is ambiguous; it is not clear how scalable the scheduler is. One gets the impression that it is very scalable, because each core spawns a system call thread, so as many threads as there are cores could be running concurrently for one or more processes [1]. More explicit results, however, would have been beneficial. Further, the paper mentions that hyper-threading was turned off to ease the analysis of the results. That is understandable, but it would be nice to know whether those hardware threads (two per core) would be treated as cores when turned on, i.e. would the scheduler then realize that it can use eight cores? Does that also mean the predefined static core list would need to be modified, to list eight cores instead of four?&lt;br /&gt;
&lt;br /&gt;
Along the same reasoning, and given the growing popularity of GPUs for general-purpose programming, it would have been useful to at least hypothesize on the possible performance outcome when using specialized GPUs, such as NVIDIA&#039;s Tesla GPUs. Would FlexSC&#039;s scheduler be able to take advantage of the additional cores and use them for specialized purposes?&lt;br /&gt;
&lt;br /&gt;
== Related Work: ==&lt;br /&gt;
&lt;br /&gt;
===System Call Batching===&lt;br /&gt;
&lt;br /&gt;
A multi-call is a concept that involves collecting multiple system calls and submitting them to the kernel as a single system call. It is used both in operating systems and in paravirtualized hypervisors. The Cassyopia compiler has a special technique named the looped multi-call, in which the result of one system call can be fed as an argument to another system call in the same multi-call.[11] There is a significant difference between multi-calls and exception-less system calls: multi-calls neither exploit parallel execution of system calls nor address their blocking behaviour. The system calls in a multi-call are executed sequentially; each must complete before the next may start. Exception-less system calls, on the other hand, can be executed in parallel, and when one call blocks, the next can execute immediately.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Locality of Execution and Multicores===&lt;br /&gt;
&lt;br /&gt;
Several techniques have addressed the issue of locality of execution. Larus and Parkes proposed Cohort Scheduling to efficiently execute staged computations.[12] Other techniques include Soft Timers[13] and Lazy Receiver Processing[14], which tackle locality of execution by changing how device interrupts are handled; both try to limit the processor interference associated with interrupt handling without affecting the latency of servicing requests. A technique named Computation Spreading[15] is the most similar to the multicore execution of FlexSC: it proposes processor modifications that allow hardware migration of threads to specialized cores. However, it did not model TLBs, and on current hardware such synchronous thread migration requires a costly inter-processor interrupt. Other proposed solutions differ from FlexSC in two ways: they require a micro-kernel, and, unlike FlexSC, they cannot dynamically adapt the proportion of cores used exclusively by the kernel versus cores shared by user and kernel execution. While all of these solutions rely on expensive inter-processor interrupts to offload system calls, FlexSC provides a more efficient and flexible mechanism.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Non-blocking Execution===&lt;br /&gt;
&lt;br /&gt;
Past research on improving system call performance has focused extensively on blocking versus non-blocking behavior. Researchers have typically used threaded, event-based (non-blocking), and hybrid systems to obtain high performance in server applications. The main difference between these non-blocking proposals and FlexSC is that none of them decouple the invocation of a system call from its execution.&amp;lt;br&amp;gt;&amp;lt;br&amp;gt;&lt;br /&gt;
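That decoupling can be sketched with ordinary threads: invocation posts a request into a shared entry and returns at once, a separate thread executes the call, and the user code polls for completion instead of trapping into the kernel (all names below are invented for illustration, not FlexSC itself):

```python
import os
import threading
import time

# One "syscall page" entry: the user thread posts a request, a separate
# syscall thread executes it, and completion is observed by polling.
entry = {"status": "free", "func": None, "args": None, "result": None}

def syscall_thread():
    # Stand-in for a kernel-side syscall thread: wait for a submitted
    # request, execute it, and publish the result before marking it done.
    while True:
        if entry["status"] == "submitted":
            entry["result"] = entry["func"](*entry["args"])
            entry["status"] = "done"
            return
        time.sleep(0.001)

worker = threading.Thread(target=syscall_thread)
worker.start()

# Invocation: post the request without blocking on its execution...
entry["func"], entry["args"] = os.getpid, ()
entry["status"] = "submitted"

# ...and keep doing user-mode work while the call runs elsewhere.
useful_work = sum(range(1000))

while entry["status"] != "done":   # poll for completion, never block
    time.sleep(0.001)
worker.join()
```

The point of the sketch is only the separation of concerns: posting the request and reaping the result are distinct, non-blocking steps.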
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== References: ==&lt;br /&gt;
[1] Soares, Livio and Michael Stumm, &amp;lt;i&amp;gt;FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;lt;/i&amp;gt;, University of Toronto, 2010.[http://www.usenix.org/events/osdi10/tech/full_papers/Soares.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[2] Tanenbaum, Andrew S., &amp;lt;i&amp;gt;Modern Operating Systems: 3rd Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2008.&lt;br /&gt;
&lt;br /&gt;
[3] Stallings, William, &amp;lt;i&amp;gt;Operating Systems: Internals and Design Principles - 6th Edition&amp;lt;/i&amp;gt;, Pearson/Prentice Hall, New Jersey, 2009.&lt;br /&gt;
&lt;br /&gt;
[4] Garfinkel, Tim, &amp;lt;i&amp;gt;Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools&amp;lt;/i&amp;gt;, Computer Science Department - Stanford University.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.2695&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[5] Yoo, Sunjoo &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Automatic Generation of Fast Timed Simulation Models for Operating Systems in SoC Design&amp;lt;/i&amp;gt;, SLS Group, TIMA Laboratory, Grenoble, 2002.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.1148&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[6] Rajagopalan, Mohan &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Cassyopia: Compiler Assisted System Optimization&amp;lt;/i&amp;gt;, Proceedings of HotOS IX: The 9th Workshop on Hot Topics in Operating Systems, Lihue, Hawaii, 2003.[https://www.usenix.org/events/hotos03/tech/full_papers/rajagopalan/rajagopalan.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[7] Kumar, Sanjeev and Christopher Wilkerson, &amp;lt;i&amp;gt;Exploiting Spatial Locality in Data Caches using Spatial Footprints&amp;lt;/i&amp;gt;, Princeton University and Microcomputer Research Labs (Oregon), 1998.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.1550&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[8] Jin, Shudong and Azer Bestavros, &amp;lt;i&amp;gt;Sources and Characteristics of Web Temporal Locality&amp;lt;/i&amp;gt;, Computer Science Department - Boston University, Boston. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5941&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[9] Agarwal, Vikas &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures&amp;lt;/i&amp;gt;, University of Texas, Austin, 2000.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.3694&amp;amp;rep=rep1&amp;amp;type=pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[10] Tuomi, Ilkka, &amp;lt;i&amp;gt;The Lives and Death of Moore&#039;s Law&amp;lt;/i&amp;gt;, 2002.[http://131.193.153.231/www/issues/issue7_11/tuomi/ HTML]&lt;br /&gt;
&lt;br /&gt;
[11] Barham, Paul &amp;lt;i&amp;gt;et al.&amp;lt;/i&amp;gt;, &amp;lt;i&amp;gt;Xen and the Art of Virtualization&amp;lt;/i&amp;gt;, Proceedings of the 19th ACM Symposium on Operating Systems Principles (SOSP), 2003, pp. 164–177.&lt;br /&gt;
&lt;br /&gt;
[12] Larus, James and Michael Parkes, &amp;lt;i&amp;gt;Using Cohort-Scheduling to Enhance Server Performance&amp;lt;/i&amp;gt;, Proceedings of the USENIX Annual Technical Conference (ATEC), 2002, pp. 103–114.&lt;br /&gt;
&lt;br /&gt;
[13] Aron, Mohit and Peter Druschel, &amp;lt;i&amp;gt;Soft Timers: Efficient Microsecond Software Timer Support for Network Processing&amp;lt;/i&amp;gt;, ACM Transactions on Computer Systems (TOCS) 18, 3, 2000, pp. 197–228.&lt;br /&gt;
&lt;br /&gt;
[14] Druschel, Peter and Gaurav Banga, &amp;lt;i&amp;gt;Lazy Receiver Processing (LRP): A Network Subsystem Architecture for Server Systems&amp;lt;/i&amp;gt;, Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1996, pp. 261–275.&lt;br /&gt;
&lt;br /&gt;
[15] Chakraborty, Koushik, Philip M. Wells, and Gurindar S. Sohi, &amp;lt;i&amp;gt;Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly&amp;lt;/i&amp;gt;, Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006, pp. 283–292.&lt;br /&gt;
&lt;br /&gt;
[16] Vasudevan, Vijay. &amp;lt;i&amp;gt;Improving Datacenter Energy Efficiency Using a Fast Array of Wimpy Nodes&amp;lt;/i&amp;gt;, Thesis Proposal, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, October 12, 2010.[http://www.cs.cmu.edu/~vrv/proposal/vijay_thesis_proposal.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[17] Teller, Patricia J., &amp;lt;i&amp;gt;Translation-Lookaside Buffer Consistency&amp;lt;/i&amp;gt;, Journal Volume 23, Issue 6, IBM T. J. Watson Research Center, Yorktown Heights, NY, June 1990. [http://dx.doi.org/10.1109/2.55498 HTML]&lt;br /&gt;
&lt;br /&gt;
[18] Linux ABI sourceforge page. [http://linux-abi.sourceforge.net/ HTML] and Linux application page. [http://www.linux.org/apps/AppId_8088.html HTML]&lt;br /&gt;
&lt;br /&gt;
[19] Drepper, Ulrich and Ingo Molnar, &amp;lt;i&amp;gt;The Native POSIX Thread Library for Linux&amp;lt;/i&amp;gt;, Tech. rep., Red Hat Inc, 2003. [http://people.redhat.com/drepper/nptl-design.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[20] Blake, M. Brian, &amp;lt;i&amp;gt;Coordinating Multiple Agents for Workflow-Oriented Process Orchestration&amp;lt;/i&amp;gt;, Information Systems and e-Business Management Journal, Springer-Verlag, December 2003. [http://www.cs.georgetown.edu/~blakeb/pubs/blake_ISEB2003.pdf PDF]&lt;br /&gt;
&lt;br /&gt;
[21] IBM developerWorks, &amp;lt;i&amp;gt;Kernel Command Using Linux System Calls&amp;lt;/i&amp;gt;, 2010. [http://www.ibm.com/developerworks/linux/library/l-system-calls/ HTML]&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_3&amp;diff=4912</id>
		<title>Talk:COMP 3000 Essay 2 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_3&amp;diff=4912"/>
		<updated>2010-11-11T17:00:12Z</updated>

		<summary type="html">&lt;p&gt;Brobson: Created page with &amp;quot;=Group 3 Essay=  Hello everyone, please post your contact information here:  Ben Robson [mailto:brobson@connect.carleton.ca brobson@connect.carleton.ca]   ==Question 3 Group== *A…&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Group 3 Essay=&lt;br /&gt;
&lt;br /&gt;
Hello everyone, please post your contact information here:&lt;br /&gt;
&lt;br /&gt;
Ben Robson [mailto:brobson@connect.carleton.ca brobson@connect.carleton.ca]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Question 3 Group==&lt;br /&gt;
*Abdul-Fatah Tawfic tafatah&lt;br /&gt;
*Arteaga Reynaldo rarteaga&lt;br /&gt;
*Faibish Corey   cfaibish&lt;br /&gt;
*Lawrence Wesley wlawrenc&lt;br /&gt;
*Preston Mike    mpreston&lt;br /&gt;
*Robson  Benjamin brobson&lt;br /&gt;
*Sun     Fangchen sfangche&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=4911</id>
		<title>COMP 3000 Essay 2 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_3&amp;diff=4911"/>
		<updated>2010-11-11T16:54:13Z</updated>

		<summary type="html">&lt;p&gt;Brobson: Created page with &amp;quot;3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;3.FlexSC: Flexible System Call Scheduling with Exception-Less System Calls&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_3&amp;diff=3909</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_3&amp;diff=3909"/>
		<updated>2010-10-14T18:25:04Z</updated>

		<summary type="html">&lt;p&gt;Brobson: /* Redundancy */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Group 3 == &lt;br /&gt;
Here&#039;s my email. I&#039;ll add some of the stuff I find soon; I&#039;m just saving the question for last.&lt;br /&gt;
Andrew Bown(abown2@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
I&#039;m not sure if this is totally relevant, oh well.&lt;br /&gt;
-The first time-sharing system, CTSS (Compatible Time Sharing System), was created at MIT in the early 1960s&lt;br /&gt;
http://www.kernelthread.com/publications/virtualization/&lt;br /&gt;
&lt;br /&gt;
-achamney@connect.carleton.ca&lt;br /&gt;
&lt;br /&gt;
Here&#039;s my contact info (qzhang13@connect.carleton.ca)&lt;br /&gt;
An article about the mainframe.&lt;br /&gt;
-Mainframe Migration http://www.microsoft.com/windowsserver/mainframe/migration.mspx&lt;br /&gt;
&lt;br /&gt;
-[[User:Zhangqi|Zhangqi]] 15:02, 7 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Here&#039;s my contact information, look forward to working with everyone. - Ben Robson (brobson@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
Hey, Here&#039;s my contact info, nshires@connect.carleton.ca, I&#039;ll have some sources posted by the weekend hopefully&lt;br /&gt;
&lt;br /&gt;
Hey guys i&#039;m not in your group but I found some useful information that could help you &lt;br /&gt;
http://en.wikipedia.org/wiki/Mainframe_computer I know we are not supposed to use wiki references, but it&#039;s a good place to start&lt;br /&gt;
&lt;br /&gt;
Okay, found a paper titled &amp;quot;Mainframe Scalability in the Windows Environment&amp;quot;&lt;br /&gt;
http://new.cmg.org/proceedings/2003/3023.pdf (registration required to access, but free) ~ Andrew (abown2@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
Folks, remember to do your discussions here.  Use four tildes to sign your entries, that adds time and date.  Email discussions won&#039;t count towards your participation grade...&lt;br /&gt;
[[User:Soma|Anil]] 15:43, 8 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Okay going to break the essay into points paragraphs on the main page which people can choose one paragraph to write. Then after all paragraphs are written we will communally edit it to have a cohesive voice. It is the only way I can viably think of to properly distribute the work. ~Andrew (abown2@connect.carleton.ca) 11:00 am, 10 October 2010.&lt;br /&gt;
&lt;br /&gt;
Link to IBMs info on their mainframes --[[User:Lmundt|Lmundt]] 19:58, 7 October 2010 (UTC)&lt;br /&gt;
http://publib.boulder.ibm.com/infocenter/zos/basics/index.jsp?topic=/com.ibm.zos.zmainframe/zconc_valueofmf.htm&lt;br /&gt;
&lt;br /&gt;
Just made the revelation that the Windows equivalent of a mainframe is referred to as &#039;&#039;&#039;clustering&#039;&#039;&#039;, which should help with finding information.&lt;br /&gt;
Here&#039;s the wiki article on the technology for an overview http://en.wikipedia.org/wiki/Microsoft_Cluster_Server ~ Andrew (abown2@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
hey, I agree with Andrew&#039;s idea. We should break the essay into several sections and work on it together. From my point of view, we should focus on how Windows provides the mainframe functionality, and VMware and EMC&#039;s storage should be our examples. As listed on the main page, there are many advantages and disadvantages of the mainframe. But where is Windows? I&#039;m confused... &lt;br /&gt;
In my opinion, the first paragraph can introduce the mainframe (such as its history, features, applications, etc.) and what mainframe-equivalent functionality Windows supports. Then we can use some paragraphs to discuss the functionality in detail, and VMware and EMC&#039;s storage solution can also be involved in this part. At last we make a conclusion for the whole essay. Do you think it&#039;s feasible? &lt;br /&gt;
&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 02:12, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Ah but the question isn&#039;t the pros and cons of each. It is how to get mainframe functionality from a Windows Operating System. How I split up the essay has each paragraph focusing on one aspect of mainframes and how it can be duplicated in windows either with windows tools or 3rd party software. You don&#039;t need to go into the history or applications of mainframes since that is not required by the phrasing of the question.&lt;br /&gt;
&lt;br /&gt;
~ Andrew Bown, 11:28 AM, October 11th 2010&lt;br /&gt;
&lt;br /&gt;
Okay, I think I catch your meaning. So what we should do now is edit the content of each paragraph as soon as possible. Time is limited.&lt;br /&gt;
&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 19:57, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
If you guys are looking for an authoritative source on how Windows works, I *highly* recommend checking out &amp;quot;Windows Internals 4th Edition&amp;quot; or &amp;quot;Windows Internals 5th Edition&amp;quot; by Mark Russinovich and David Solomon.&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 18:59, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
OLD VERSION - Here for the time being while optimizing some sections --[[User:Dkrutsko|Dkrutsko]] 00:20, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
added introduction points and sections for each paragraph so you guys can edit one paragraph at a time instead of the whole document. If you want to claim a certain paragraph, just put your name into the section first. ~ Andrew (abown2@connect.carleton.ca) 12:00 10th of October 2010&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
Main Aspects of mainframes:&lt;br /&gt;
* redundancy which enables high reliability and security&lt;br /&gt;
* high input/output&lt;br /&gt;
* backwards-compatibility with legacy software&lt;br /&gt;
* support massive throughput&lt;br /&gt;
* Systems run constantly so they can be hot upgraded&lt;br /&gt;
http://www.exforsys.com/tutorials/mainframe/mainframe-features.html&lt;br /&gt;
&lt;br /&gt;
Linking sentence about how windows can duplicate mainframe functionality.&lt;br /&gt;
&lt;br /&gt;
here&#039;s the introduction ~ Abown (11:12 pm, October 12th 2010) &amp;lt;br&amp;gt;&lt;br /&gt;
Thanks Abown, just tweaked a couple of the sentences to improve flow [[User:Achamney|Achamney]] 01:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Also, i removed this statement &amp;quot;Unfortunately, computers are only able to process data as fast as they can receive it&amp;quot;. I couldn&#039;t find a good place to plug it in.&lt;br /&gt;
&lt;br /&gt;
Mainframes have always been used by large corporations to process thousands of small transactions, but what strengths make mainframes so well suited to this purpose? Mainframes are extremely useful in business because they are designed to run without downtime. This is achieved through tremendous redundancy, which makes mainframes extremely reliable and guards against data loss due to downtime. Mainframes can be upgraded without taking the system down for repairs, which further increases reliability. After upgrading a mainframe, however, the software does not change, so mainframes can offer backwards compatibility through virtualization; software never needs to be replaced. Mainframes support high input/output so that the machine is always being utilized, and to make sure they are utilized to their fullest, they provide powerful schedulers which ensure the fastest possible throughput for processing transactions. [http://www.exforsys.com/tutorials/mainframe/mainframe-features.html] With so many features, how are Windows-based systems supposed to compete with a mainframe? The fact of the matter is that there are features in Windows, and software solutions, which can duplicate these features in a Windows environment, be it redundancy, real-time upgrading, virtualization, high input/output or utilizing resources.&lt;br /&gt;
&lt;br /&gt;
Using this paragraph and my solution on the assignment I was able to expand on this topic. It is in the main page at the moment, see if you like it, add anything you think I missed --[[User:Dkrutsko|Dkrutsko]] 05:17, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
Before comparing Windows systems and mainframes, the history of what mainframes were used for and where they came from must be understood. The first official mainframe computer was the UNIVAC I. [http://www.vikingwaters.com/htmlpages/MFHistory.htm] It was designed for the U.S. Census Bureau by J. Presper Eckert and John Mauchly. [http://www.thocp.net/hardware/univac.htm] At this point in history there were no personal computers, and the only organizations that could afford a computer were massive businesses. The main functionality of these mainframes was to calculate company payrolls, keep sales records, analyze sales performance, and store all company information.&amp;lt;br&amp;gt;&lt;br /&gt;
[[User:Achamney|Achamney]] 01:30, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
This doesn&#039;t seem to actually be pertinent to the question at hand. Question does not have any indication of the need to provide a history. [[User:Abown|Andrew Bown]] 11:16, 12 October 2010&lt;br /&gt;
&lt;br /&gt;
I have to agree this doesn&#039;t seem relevant to the question. --[[User:Dkrutsko|Dkrutsko]] 00:10, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== Redundancy ==&lt;br /&gt;
[[User:Nshires|Nshires]] 04:10, 13 October 2010 (UTC)&lt;br /&gt;
A large feature of mainframes is their capacity for redundancy. Mainframes provide redundancy through the provider&#039;s off-site redundancy feature, which lets the customer move all of their processes and applications onto the provider&#039;s mainframe while the provider makes repairs on the customer&#039;s system. Another way mainframes create redundancy is through multi-processors that share the same memory: if one processor dies, the remaining processors still keep all of the cache. There are multiple ways Windows systems can recreate this redundancy. The first is by creating a Windows cluster server, which uses the same approach as the mainframe&#039;s multi-processor system. Another way is to use virtual machines: VMware supports Microsoft Cluster Service, which allows users to create a cluster of virtual machines on one physical Windows system (or multiple physical machines). The virtual machines set up two different networks: a private network for communication between the virtual machines, and a public network to handle I/O services. The virtual machines also share storage so that if one fails, the other still has all of the data.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
(this is what I&#039;ve gotten out of my research so far; comments and any edits/suggestions on whether I&#039;m on the right track are greatly appreciated :) ) &lt;br /&gt;
*note: This is the second time I have written this, make sure to save whatever you edit in notepad or whatever first so that you don&#039;t lose everything*&lt;br /&gt;
&lt;br /&gt;
link to VMWare&#039;s cluster virtualization http://www.vmware.com/pdf/vsphere4/r40/vsp_40_mscs.pdf&lt;br /&gt;
&lt;br /&gt;
[[User:Nshires|Nshires]] 04:10, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:I&#039;ll attempt to re-write this paragraph for clarity and accuracy:&lt;br /&gt;
&lt;br /&gt;
:A feature provided by mainframes is their ability to create redundancy in terms of data storage and parallel processing. Windows can mimic expandable storage and storage redundancy through out-sourced storage solutions.&lt;br /&gt;
&lt;br /&gt;
:Processing redundancy for Windows can be created through the Microsoft Cluster Service (MSCS).  This service allows multiple Windows machines to be connected as nodes in a cluster; where each node has the same applications and only one node is online at any point in time.  If a node in the cluster fails, another will take over. The failing node can then be restarted or replaced without serious downtime.  However this service does not offer fault tolerance to the same extent as actual mainframes.&lt;br /&gt;
&lt;br /&gt;
:Source: http://msdn.microsoft.com/en-us/library/ms952401.aspx&lt;br /&gt;
&lt;br /&gt;
:Virtual machine nodes can be used in place of physical machine nodes in a cluster, providing redundant application services to end-users.  If a virtual machine fails, other virtual machines can take over; if the failure is on the Windows host machine, then they will all fail.  The virtual cluster can be maintained across multiple machines, allowing multiple users to have the reliability of clusters on fewer machines.&lt;br /&gt;
&lt;br /&gt;
:Let me know what you think.&lt;br /&gt;
:[[User:Brobson|Brobson]] 18:25, 14 October 2010 (UTC)&lt;br /&gt;
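As a toy illustration of the active/passive failover described above (all names invented; this is not the MSCS API, just the idea of a standby node taking over):

```python
# Toy active/passive failover in the spirit of a two-node cluster: every
# node runs the same application, only one is online at a time, and a
# standby takes over when the active node fails.

class Node:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def serve(self, request):
        if not self.healthy:
            raise RuntimeError(self.name + " is down")
        return self.name + " handled " + request

def cluster_serve(nodes, request):
    for node in nodes:             # the first healthy node is "online"
        try:
            return node.serve(request)
        except RuntimeError:
            continue               # fail over to the next node
    raise RuntimeError("no healthy nodes")

nodes = [Node("node-a"), Node("node-b")]
first = cluster_serve(nodes, "req1")    # node-a is online
nodes[0].healthy = False                # node-a fails...
second = cluster_serve(nodes, "req2")   # ...node-b takes over
```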
&lt;br /&gt;
== hot swapping ==&lt;br /&gt;
[[User:Nshires|Nshires]] 16:47, 13 October 2010 (UTC)&lt;br /&gt;
Another useful feature of mainframes is the ability to hot-swap. Hot-swapping occurs when there is faulty hardware in one of the components inside the mainframe and technicians are able to swap out that component without the mainframe being turned off or crashing. Hot-swapping is also used when upgrading processors inside the mainframe. With the right software and setup (redundancy), operators are able to upgrade and/or repair a mainframe as they see fit. Using VMware on a Windows system allows users to hot-add RAM, and hot-plugging adds a new virtual CPU to the virtualized system. Using these hot-adding and hot-plugging techniques, the virtual computer can grow in size to accept loads of varying size. In non-virtual systems, Windows coupled with the program Go-HotSwap can hot-plug CompactPCI components. CompactPCI slots accept many different devices (e.g. multiple SATA hard drives), which makes a Windows system with these technologies very modular.&lt;br /&gt;
&lt;br /&gt;
These are the concepts I&#039;ve been able to figure out so far about hot-swapping/hot-upgrading, feel free to add/edit and what-not!  &lt;br /&gt;
&lt;br /&gt;
Sources:&lt;br /&gt;
http://searchvmware.techtarget.com/tip/0,289483,sid179_gci1367631,00.html&lt;br /&gt;
http://www.jungo.com/st/hotswap_windows.html&lt;br /&gt;
[[User:Nshires|Nshires]] 16:47, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
:According to your searchvmware.techtarget.com source, a processor cannot be hot-plugged in the truest sense of the word in that the hardware needs to be rebooted to recognize the added hardware.  Hot-swapping demands zero downtime.  &lt;br /&gt;
:If you don&#039;t mind me suggesting, I don&#039;t think this section should be referring to the hot-swapping/hot-adding/hot-plugging of virtual machines or client machines of the mainframe.  I think for hot-swapping we should focus on the hot-swapping of hardware components.  As such, we can point out that Windows does support mainframe-level hot-swapping with its Windows Server 2008 R2 Datacenter OS&lt;br /&gt;
:&amp;lt;blockquote&amp;gt;&amp;quot;Hot Add/Replace Memory and Processors with supporting hardware&amp;quot;&amp;lt;/blockquote&amp;gt; http://www.microsoft.com/windowsserver2008/en/us/2008-dc.aspx&lt;br /&gt;
&lt;br /&gt;
:If we only consider the capabilities of the PC OS, then Windows only supports plug-and-play devices, such as external hard drives, and does not support RAM or CPU hot-swapping.&lt;br /&gt;
&lt;br /&gt;
:I&#039;m also wondering if this should tie into the scalability of a mainframe or if scalability should have its own section.&lt;br /&gt;
:[[User:Brobson|Brobson]] 17:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== backwards-compatibility ==&lt;br /&gt;
Backwards compatibility means that a newer software version can recognize what the old version wrote and how it worked; it is a relationship between the two versions. If the new component provides all the functionality of the old one, we say that the new component is backwards compatible. In the mainframe era, many applications were backwards compatible. For example, code written 20 years ago for the IBM System/360 can run on the latest mainframes (the zSeries, System/390 family, System z9, etc.), because mainframe models provide a combination of special hardware, special microcode, and an emulation program to simulate the target system. (The IBM 7080 transistorized computer was backwards compatible with all models of the IBM 705 vacuum tube computer.) Sometimes the mainframe also requires customers to halt the computer and load the emulation program.&lt;br /&gt;
&lt;br /&gt;
In Windows, one method of implementing backwards compatibility is to add applications such as the Microsoft Windows Application Compatibility Toolkit, which can make the platform compatible with most software from earlier versions. A second method relies on the subsystems that Windows operating systems usually provide: software originally designed for older versions or for other OSs can be run in a subsystem, as with Windows NT&#039;s MS-DOS and Win16 subsystems. Windows 7&#039;s backwards compatibility, however, is not very good; if the kernel is different, the OSs can&#039;t be fully compatible with each other. That doesn&#039;t mean older programs won&#039;t run, though: virtualization can be used to run them. A third method is to use shims. Shims are small libraries that intercept API calls, change the parameters passed, and handle or redirect the operations; in Windows, shims can be used to simulate the behaviour of an older OS version for legacy software. &lt;br /&gt;
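The shim idea can be illustrated in miniature: a wrapper intercepts a call, rewrites the parameters, and redirects it to the real implementation (both functions below are invented for illustration and have nothing to do with the actual Windows shim engine):

```python
# A shim in miniature: a wrapper intercepts calls aimed at a new API and
# rewrites arguments so a legacy caller keeps working unchanged.

def new_open(path):
    # the "new OS" API: only understands forward-slash paths
    return "opened " + path

def legacy_path_shim(func):
    def wrapper(path):
        # intercept the call and translate old-style backslash paths
        return func(path.replace("\\", "/"))
    return wrapper

open_file = legacy_path_shim(new_open)
result = open_file("C:\\legacy\\data.txt")   # legacy-style call, unchanged
```

The legacy program never changes; only the call path through the shim does, which is the whole point of the technique.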
&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 08:34, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
ps. I didn&#039;t find perfect resources,just these.If you guys think any opinion is not correct,plz edit it or give suggestions :)&lt;br /&gt;
&lt;br /&gt;
http://www.windows7news.com/2008/05/23/windows-7-to-break-backwards-compatibility/&lt;br /&gt;
 &lt;br /&gt;
http://computersight.com/computers/mainframe-computers/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, this sounds really good, I&#039;d add an example where you say &#039;one method to implement backward-compatibility is to add applications&#039;.&lt;br /&gt;
And I did a little research and I found another way to create backwards compatibility using shims: http://en.wikipedia.org/wiki/Shim_%28computing%29&lt;br /&gt;
it pretty much intercepts the calls and changes them so that the old program can run on a new system.&lt;br /&gt;
Good Work, [[User:Nshires|Nshires]] 16:56, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Thanks for your suggestions. I have added some information to the paragraph. :)&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 00:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== High input/output ==&lt;br /&gt;
~Andrew Bown (October 13 2:08) I&#039;ll write this paragraph.&lt;br /&gt;
I don&#039;t have time to write this before work(12-5) but I can put out the information i got already with research so if someone could help me complete this that it would be awesome since I have to finish up my 3004 document as well tonight.&lt;br /&gt;
~[[User:Abown|Andrew Bown]] (October 14th 11:12am)&lt;br /&gt;
Mainframes are able to achieve high input/output rates with their specialized Message Passing Interfaces (MPIs), which allow for fast intercommunication by sharing memory between the different cores. https://www.mpitech.com/mpitech.nsf/pages/mainframe-&amp;amp;-AS400-printing_en.html&lt;br /&gt;
&lt;br /&gt;
The latest versions of Windows clusters support a Microsoft created MPI surprisingly called Microsoft MPI[http://msdn.microsoft.com/en-us/library/bb524831(VS.85).aspx]. &lt;br /&gt;
&lt;br /&gt;
Microsoft&#039;s MPI is based off the MPICH2 explanation here:http://www.springerlink.com/content/hc4nyva6dvg6vdpp/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Looking at the details, Microsoft MPI only runs if a process is put into the Microsoft Job Scheduler, so we may want to combine input/output and throughput.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Massive Throughput ==&lt;br /&gt;
[[User:Achamney|Achamney]] 01:09, 14 October 2010 (UTC) &amp;lt;br&amp;gt;&lt;br /&gt;
I can grab this section.&lt;br /&gt;
&lt;br /&gt;
Throughput, unlike input and output, is the measurement of the number of calculations per second that a machine can perform, usually measured in FLOPS (floating-point operations per second). It is impossible for a single Windows machine to compete with a mainframe&#039;s throughput: not only do mainframe processors have extremely high frequencies, but they also have a considerable number of cores. This all changes, however, when computer clustering is introduced. In recent years, IBM has constructed a clustered system called Roadrunner that ranks third on the TOP500 supercomputer list as of June 2010. It has a total of 60 connected units, over a thousand processors, and the capability of computing at a rate of 1.7 petaflops. The question is, with such complex hardware, how is it possible for any sort of software to use this clustered system? Luckily, Microsoft has introduced an OS called Windows Compute Cluster Server, which provides the necessary software to allow the main computer to utilize the computing power of its cluster nodes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://webcache.googleusercontent.com/search?q=cache:EPlDExBxmDYJ:download.microsoft.com/download/9/e/d/9edcdeab-f1fb-4670-8914-c08c5c6f22a5/HPC_Overview.doc+Windows+Compute+Cluster+Server&amp;amp;cd=1&amp;amp;hl=en&amp;amp;ct=clnk&amp;amp;gl=ca&amp;amp;client=firefox-a]&lt;br /&gt;
[http://hubpages.com/hub/Most-Powerful-Computers-In-The-World]&lt;br /&gt;
[http://publib.boulder.ibm.com/infocenter/tpfhelp/current/index.jsp?topic=/com.ibm.ztpf-ztpfdf.doc_put.cur/gtpc3/c3thru.html]&lt;br /&gt;
[http://searchcio-midmarket.techtarget.com/sDefinition/0,,sid183_gci213140,00.html]&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_3&amp;diff=3856</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_3&amp;diff=3856"/>
		<updated>2010-10-14T17:12:48Z</updated>

		<summary type="html">&lt;p&gt;Brobson: /* hot swapping */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Group 3 == &lt;br /&gt;
Here&#039;s my email. I&#039;ll add some of the stuff I find soon; I&#039;m just saving the question for last.&lt;br /&gt;
Andrew Bown(abown2@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
I&#039;m not sure if this is totally relevant, oh well.&lt;br /&gt;
-The first time-sharing system, CTSS (Compatible Time Sharing System), was created at MIT in the early 1960s&lt;br /&gt;
http://www.kernelthread.com/publications/virtualization/&lt;br /&gt;
&lt;br /&gt;
-achamney@connect.carleton.ca&lt;br /&gt;
&lt;br /&gt;
Here&#039;s my contact info (qzhang13@connect.carleton.ca)&lt;br /&gt;
An article about the mainframe.&lt;br /&gt;
-Mainframe Migration http://www.microsoft.com/windowsserver/mainframe/migration.mspx&lt;br /&gt;
&lt;br /&gt;
-[[User:Zhangqi|Zhangqi]] 15:02, 7 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Here&#039;s my contact information, look forward to working with everyone. - Ben Robson (brobson@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
Hey, Here&#039;s my contact info, nshires@connect.carleton.ca, I&#039;ll have some sources posted by the weekend hopefully&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m not in your group but I found some useful information that could help you: &lt;br /&gt;
http://en.wikipedia.org/wiki/Mainframe_computer (I know we are not supposed to use wiki references, but it&#039;s a good place to start)&lt;br /&gt;
&lt;br /&gt;
Okay, I found a paper titled &amp;quot;Mainframe Scalability in the Windows Environment&amp;quot;:&lt;br /&gt;
http://new.cmg.org/proceedings/2003/3023.pdf (requires registration to access but is free) ~ Andrew (abown2@connect.carleton.ca), sometime Friday.&lt;br /&gt;
&lt;br /&gt;
Folks, remember to do your discussions here.  Use four tildes to sign your entries, that adds time and date.  Email discussions won&#039;t count towards your participation grade...&lt;br /&gt;
[[User:Soma|Anil]] 15:43, 8 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Okay, I&#039;m going to break the essay into paragraphs on the main page, so people can each choose one paragraph to write. Then, after all paragraphs are written, we will communally edit it to have a cohesive voice. It is the only way I can think of to viably distribute the work. ~Andrew (abown2@connect.carleton.ca) 11:00 am, 10 October 2010.&lt;br /&gt;
&lt;br /&gt;
Link to IBMs info on their mainframes --[[User:Lmundt|Lmundt]] 19:58, 7 October 2010 (UTC)&lt;br /&gt;
http://publib.boulder.ibm.com/infocenter/zos/basics/index.jsp?topic=/com.ibm.zos.zmainframe/zconc_valueofmf.htm&lt;br /&gt;
&lt;br /&gt;
Just made the revelation that, when trying to find information, the Windows equivalent to a mainframe is referred to as &#039;&#039;&#039;clustering&#039;&#039;&#039;, which should help in finding information.&lt;br /&gt;
Here&#039;s the wiki article on the technology for an overview: http://en.wikipedia.org/wiki/Microsoft_Cluster_Server ~ Andrew (abown2@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
Hey, I agree with Andrew&#039;s idea. We should break the essay into several sections and work on it together. From my point of view, I think we should focus on how Windows provides the mainframe functionality, and VMware and EMC&#039;s storage should be our examples. As listed on the main page, there are many advantages and disadvantages of the mainframe. But where is Windows? I&#039;m confused... &lt;br /&gt;
In my opinion, the first paragraph can introduce the mainframe (its history, features, applications, etc.) and the mainframe-equivalent functionality Windows supports. Then we can use some paragraphs to discuss the functionalities in detail, and VMware and EMC&#039;s storage solution can also be involved in this part. At last we make a conclusion of the whole essay. Do you think it&#039;s feasible? &lt;br /&gt;
&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 02:12, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Ah, but the question isn&#039;t the pros and cons of each; it is how to get mainframe functionality from a Windows operating system. The way I split up the essay, each paragraph focuses on one aspect of mainframes and how it can be duplicated in Windows, either with Windows tools or third-party software. You don&#039;t need to go into the history or applications of mainframes, since that is not required by the phrasing of the question.&lt;br /&gt;
&lt;br /&gt;
~ Andrew Bown, 11:28 AM, October 11th 2010&lt;br /&gt;
&lt;br /&gt;
Okay, I think I catch your meaning. So what we should do now is edit the content of each paragraph as soon as possible. Time is limited.&lt;br /&gt;
&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 19:57, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
If you guys are looking for an authoritative source on how Windows works, I *highly* recommend checking out &amp;quot;Windows Internals 4th Edition&amp;quot; or &amp;quot;Windows Internals 5th Edition&amp;quot; by Mark Russinovich and David Solomon.&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 18:59, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
OLD VERSION - Here for the time being while optimizing some sections --[[User:Dkrutsko|Dkrutsko]] 00:20, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
added introduction points and sections for each paragraph so you guys can edit one paragraph at a time instead of the whole document. If you want to claim a certain paragraph, just put your name into the section first. ~ Andrew (abown2@connect.carleton.ca) 12:00 10th of October 2010&lt;br /&gt;
&lt;br /&gt;
== Introduction ==&lt;br /&gt;
Main Aspects of mainframes:&lt;br /&gt;
* redundancy, which enables high reliability and security&lt;br /&gt;
* high input/output&lt;br /&gt;
* backwards compatibility with legacy software&lt;br /&gt;
* support for massive throughput&lt;br /&gt;
* constant operation, which allows systems to be hot-upgraded&lt;br /&gt;
http://www.exforsys.com/tutorials/mainframe/mainframe-features.html&lt;br /&gt;
&lt;br /&gt;
Linking sentence about how Windows can duplicate mainframe functionality.&lt;br /&gt;
&lt;br /&gt;
here&#039;s the introduction ~ Abown (11:12 pm, October 12th 2010) &amp;lt;br&amp;gt;&lt;br /&gt;
Thanks Abown, just tweaked a couple of the sentences to improve flow [[User:Achamney|Achamney]] 01:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Also, I removed this statement: &amp;quot;Unfortunately, computers are only able to process data as fast as they can receive it&amp;quot;. I couldn&#039;t find a good place to plug it in.&lt;br /&gt;
&lt;br /&gt;
Mainframes have always been used by large corporations to process thousands of small transactions, but what strengths allow mainframes to fulfil this purpose? Mainframes are extremely useful in business because they are designed to run without downtime. This is achieved through tremendous redundancy, which makes mainframes extremely reliable and also protects against data loss due to downtime. Mainframes can be upgraded without taking the system down for repairs, which further increases reliability. After upgrading a mainframe, however, the software does not change, so mainframes can offer backwards compatibility through virtualization; software never needs to be replaced. Mainframes support high input/output so that the machine is always being utilized. To make sure mainframes are utilized to their fullest, they provide powerful schedulers which ensure high throughput, processing transactions as fast as possible. [http://www.exforsys.com/tutorials/mainframe/mainframe-features.html] With so many features, how are Windows-based systems supposed to compete with a mainframe? The fact of the matter is that there are features in Windows, and software solutions, which can duplicate these features in a Windows environment, be it redundancy, real-time upgrading, virtualization, high input/output, or resource utilization.&lt;br /&gt;
&lt;br /&gt;
Using this paragraph and my solution on the assignment I was able to expand on this topic. It is in the main page at the moment, see if you like it, add anything you think I missed --[[User:Dkrutsko|Dkrutsko]] 05:17, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== History ==&lt;br /&gt;
Before comparing Windows systems and mainframes, the history of what mainframes were used for and where they came from must be understood. The first official mainframe computer was the UNIVAC I. [http://www.vikingwaters.com/htmlpages/MFHistory.htm] It was designed for the U.S. Census Bureau by J. Presper Eckert and John Mauchly. [http://www.thocp.net/hardware/univac.htm] At this point in history there were no personal computers, and the only organizations that could afford a computer were massive businesses. The main functionality of these mainframes was to calculate company payrolls, keep sales records, analyze sales performance, and store all company information.&amp;lt;br&amp;gt;&lt;br /&gt;
[[User:Achamney|Achamney]] 01:30, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
This doesn&#039;t seem to actually be pertinent to the question at hand. Question does not have any indication of the need to provide a history. [[User:Abown|Andrew Bown]] 11:16, 12 October 2010&lt;br /&gt;
&lt;br /&gt;
I have to agree this doesn&#039;t seem relevant to the question. --[[User:Dkrutsko|Dkrutsko]] 00:10, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== Redundancy ==&lt;br /&gt;
[[User:Nshires|Nshires]] 04:10, 13 October 2010 (UTC)&lt;br /&gt;
A large feature of mainframes is their capacity for redundancy. Mainframes provide redundancy through the provider&#039;s off-site redundancy feature, which lets the customer move all of their processes and applications onto the provider&#039;s mainframe while the provider makes repairs on the customer&#039;s system. Another way that mainframes create redundancy is their use of multiple processors that share the same memory: if one processor dies, the remaining processors still retain all of the cached data. There are multiple ways Windows systems can replicate this redundancy feature. The first is by creating a Windows cluster server, which uses the same approach as the mainframe&#039;s multi-processor system. Another way Windows systems can create redundancy is by using virtual machines. VMware supports a Microsoft feature called Microsoft Cluster Service, which allows users to create a cluster of virtual machines on one physical Windows system (or multiple physical machines). The virtual machines set up two different networks: a private network for communication between the virtual machines, and a public network to handle I/O services. The virtual machines also share storage so that if one fails, the other still has all of the data.&lt;br /&gt;
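The write-everywhere idea described above can be sketched as a toy model in Python (a minimal illustration with invented names, not how Microsoft Cluster Service or a mainframe actually implements it): every write is mirrored to all replicas, so the data survives the failure of any single node.

```python
# Toy model of redundancy: writes go to every live replica, so data
# survives the failure of any single node (illustrative only).

class RedundantStore:
    def __init__(self, replicas=2):
        self.nodes = [dict() for _ in range(replicas)]

    def write(self, key, value):
        for node in self.nodes:          # mirror the write to all replicas
            if node is not None:
                node[key] = value

    def fail(self, index):
        self.nodes[index] = None         # simulate a crashed node

    def read(self, key):
        for node in self.nodes:          # first surviving replica answers
            if node is not None and key in node:
                return node[key]
        raise KeyError(key)
```

With two replicas, killing either node still leaves every previously written key readable, which is the property the paragraph above attributes to shared-memory multiprocessor mainframes and clustered Windows servers alike.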
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
(this is what I&#039;ve gotten out of some researching so far; comments and any edits/suggestions on whether I&#039;m on the right track or not are greatly appreciated :) ) &lt;br /&gt;
*note: This is the second time I have written this, make sure to save whatever you edit in notepad or whatever first so that you don&#039;t lose everything*&lt;br /&gt;
&lt;br /&gt;
link to VMWare&#039;s cluster virtualization http://www.vmware.com/pdf/vsphere4/r40/vsp_40_mscs.pdf&lt;br /&gt;
&lt;br /&gt;
[[User:Nshires|Nshires]] 04:10, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== hot swapping ==&lt;br /&gt;
[[User:Nshires|Nshires]] 16:47, 13 October 2010 (UTC)&lt;br /&gt;
Another useful feature of mainframes is the ability to hot-swap. Hot-swapping occurs when there is faulty hardware in one of the processors inside the mainframe and technicians are able to swap out the component without the mainframe being turned off or crashing. Hot-swapping is also used when upgrading processors inside the mainframe. With the right software and setup (redundancy), operators are able to upgrade and/or repair the mainframe as they see fit. Using VMware on a Windows system allows users to hot-add RAM and hot-plug new virtual CPUs into the virtualized system. Using these hot-adding and hot-plugging techniques, the virtual computer can grow to accept loads of varying size. In non-virtual systems, Windows coupled with the program Go-HotSwap can hot-plug CompactPCI components. CompactPCI slots accept many different devices (e.g. multiple SATA hard drives), which makes a Windows system with these technologies very modular.&lt;br /&gt;
&lt;br /&gt;
These are the concepts I&#039;ve been able to figure out so far about hot-swapping/hot-upgrading, feel free to add/edit and what-not!  &lt;br /&gt;
&lt;br /&gt;
Sources:&lt;br /&gt;
http://searchvmware.techtarget.com/tip/0,289483,sid179_gci1367631,00.html&lt;br /&gt;
http://www.jungo.com/st/hotswap_windows.html&lt;br /&gt;
[[User:Nshires|Nshires]] 16:47, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
:According to your searchvmware.techtarget.com source, a processor cannot be hot-plugged in the truest sense of the word, in that the machine needs to be rebooted to recognize the added hardware. Hot-swapping demands zero downtime.  &lt;br /&gt;
:If you don&#039;t mind me suggesting, I don&#039;t think this section should be referring to the hot-swapping/hot-adding/hot-plugging of virtual machines or client machines of the mainframe. I think for hot-swapping we should focus on the hot-swapping of hardware components. As such we can point out that Windows does support mainframe-level hot-swapping with its Windows Server 2008 R2 Datacenter OS&lt;br /&gt;
:&amp;lt;blockquote&amp;gt;&amp;quot;Hot Add/Replace Memory and Processors with supporting hardware&amp;quot;&amp;lt;/blockquote&amp;gt; http://www.microsoft.com/windowsserver2008/en/us/2008-dc.aspx&lt;br /&gt;
&lt;br /&gt;
:If we only consider the capabilities of the PC OS, then Windows only supports plug-and-play devices, such as external hard drives, and does not support RAM or CPU hot-swapping.&lt;br /&gt;
&lt;br /&gt;
:I&#039;m also wondering if this should tie into the scalability of a mainframe or if scalability should have its own section.&lt;br /&gt;
:[[User:Brobson|Brobson]] 17:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== backwards-compatibility ==&lt;br /&gt;
Backwards compatibility means that a newer software version can recognize what the old version wrote and how it worked; it is a relationship between the two versions. If the new component provides all the functionality of the old one, we say that the new component is backwards compatible. In the mainframe era, many applications were backwards compatible. For example, code written 20 years ago for the IBM System/360 can run on the latest mainframes (such as the System/390 family, zSeries, and System z9). This is because mainframe models provide a combination of special hardware, special microcode, and an emulation program to simulate the target system. (The IBM 7080 transistorized computer was backward compatible with all models of the IBM 705 vacuum tube computer.) Sometimes the mainframe also requires customers to halt the computer and load the emulation program.&lt;br /&gt;
&lt;br /&gt;
In Windows, one method of implementing backwards compatibility is to add applications, like the Microsoft Windows Application Compatibility Toolkit, which can make the platform compatible with most software from earlier versions. A second method is the various subsystems that Windows operating systems usually have: software originally designed for older versions or other OSs can run in these subsystems. Windows NT, for example, has MS-DOS and Win16 subsystems. Windows 7&#039;s backwards compatibility, however, is not very good; if the kernel is different, the OSs can&#039;t be compatible with each other. But this doesn&#039;t mean that older programs won&#039;t run: virtualization can be used to make them run. A third method is to use shims to create backwards compatibility. Shims are like small libraries that intercept API calls, change the parameters passed, and handle or redirect the operations. In Windows, we can use shims to simulate the behaviour of an old OS version for legacy software. &lt;br /&gt;
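To make the shim idea concrete, here is a minimal sketch in Python (all names are invented, and Python stands in for the native libraries that real Windows shims are): a wrapper intercepts calls to a newer API, fills in the parameter that legacy callers never passed, and redirects to the real call.

```python
# Toy compatibility shim (hypothetical names; real Windows shims are
# native libraries that intercept Win32 API calls at load time).

def new_read_file(path, encoding):
    """The 'new' API: callers must now pass an explicit encoding."""
    return f"read {path} as {encoding}"

def make_legacy_shim(real_func):
    """Build a shim so old callers can keep omitting 'encoding'."""
    def shim(path, encoding=None):
        if encoding is None:
            encoding = "ascii"            # default the legacy API assumed
        return real_func(path, encoding)  # redirect to the real call
    return shim

# Legacy code keeps calling the old one-argument form through the shim.
read_file = make_legacy_shim(new_read_file)
```

The shim changes only the parameters in flight, exactly the intercept-adjust-redirect behaviour described above; the underlying implementation is untouched.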
&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 08:34, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
ps. I didn&#039;t find perfect resources,just these.If you guys think any opinion is not correct,plz edit it or give suggestions :)&lt;br /&gt;
&lt;br /&gt;
http://www.windows7news.com/2008/05/23/windows-7-to-break-backwards-compatibility/&lt;br /&gt;
 &lt;br /&gt;
http://computersight.com/computers/mainframe-computers/&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, this sounds really good. I&#039;d add an example where you say &#039;one method to implement backwards compatibility is to add applications&#039;.&lt;br /&gt;
I also did a little research and found another way to create backwards compatibility, using shims: http://en.wikipedia.org/wiki/Shim_%28computing%29&lt;br /&gt;
It pretty much intercepts the calls and changes them so that the old program can run on a new system.&lt;br /&gt;
Good Work, [[User:Nshires|Nshires]] 16:56, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Thanks for your suggestions. I have added some information to the paragraph. :)&lt;br /&gt;
--[[User:Zhangqi|Zhangqi]] 00:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
== High input/output ==&lt;br /&gt;
~Andrew Bown (October 13 2:08) I&#039;ll write this paragraph.&lt;br /&gt;
I don&#039;t have time to write this before work (12-5), but I can put out the information I&#039;ve already gathered, so if someone could help me complete this that would be awesome, since I have to finish up my 3004 document tonight as well.&lt;br /&gt;
~[[User:Abown|Andrew Bown]] (October 14th 11:12am)&lt;br /&gt;
Mainframes are able to achieve high input/output rates with their specialized Message Passing Interfaces (MPIs), which allow for fast intercommunication by sharing memory between the different cores. https://www.mpitech.com/mpitech.nsf/pages/mainframe-&amp;amp;-AS400-printing_en.html&lt;br /&gt;
&lt;br /&gt;
The latest versions of Windows clusters support a Microsoft-created MPI, aptly called Microsoft MPI [http://msdn.microsoft.com/en-us/library/bb524831(VS.85).aspx]. &lt;br /&gt;
&lt;br /&gt;
Microsoft&#039;s MPI is based on MPICH2; explanation here: http://www.springerlink.com/content/hc4nyva6dvg6vdpp/&lt;br /&gt;
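The message-passing pattern itself can be sketched with Python&#039;s standard multiprocessing module (a stand-in for a real MPI library, with invented names): one process ships a chunk of work to a worker over a pipe and blocks until the reply comes back, the same request/reply exchange MPI ranks perform.

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # Receive a chunk of work, compute, and send the result back,
    # mirroring the send/recv exchange between MPI ranks.
    data = conn.recv()
    conn.send(sum(x * x for x in data))
    conn.close()

def run_job(data):
    parent_end, child_end = Pipe()
    proc = Process(target=worker, args=(child_end,))
    proc.start()
    parent_end.send(data)       # "message passing": ship the work out
    result = parent_end.recv()  # block until the worker replies
    proc.join()
    return result
```

A real MPI job generalizes this to many ranks with collective operations (broadcast, scatter/gather, reduce), but the primitive is the same: explicit messages rather than shared state.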
&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Looking at the details, Microsoft MPI only runs if a process is put into the Microsoft Job Scheduler, so we may want to combine input/output and throughput.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== Massive Throughput ==&lt;br /&gt;
[[User:Achamney|Achamney]] 01:09, 14 October 2010 (UTC) &amp;lt;br&amp;gt;&lt;br /&gt;
I can grab this section.&lt;br /&gt;
&lt;br /&gt;
Throughput, unlike input and output, is a measure of the number of calculations per second that a machine can perform. This is usually measured in FLOPS (floating-point operations per second). It is impossible for a single Windows machine to compete with a mainframe&#039;s throughput: not only do mainframe processors have extremely high frequencies, but they also have a considerable number of cores. This all changes, however, when computer clustering is introduced. In recent years, IBM has constructed a clustered system called Roadrunner, which ranks third on the TOP500 supercomputer list as of June 2010. It has a total of 60 connected units, over a thousand processors, and the capability of computing at a rate of 1.7 petaflops. The question is, with such complex hardware, how is it possible for any software to use this clustered system? Microsoft has introduced an OS called Windows Compute Cluster Server, which provides the necessary software to allow the main computer to utilize the computing power of its cluster nodes. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[http://webcache.googleusercontent.com/search?q=cache:EPlDExBxmDYJ:download.microsoft.com/download/9/e/d/9edcdeab-f1fb-4670-8914-c08c5c6f22a5/HPC_Overview.doc+Windows+Compute+Cluster+Server&amp;amp;cd=1&amp;amp;hl=en&amp;amp;ct=clnk&amp;amp;gl=ca&amp;amp;client=firefox-a]&lt;br /&gt;
[http://hubpages.com/hub/Most-Powerful-Computers-In-The-World]&lt;br /&gt;
[http://publib.boulder.ibm.com/infocenter/tpfhelp/current/index.jsp?topic=/com.ibm.ztpf-ztpfdf.doc_put.cur/gtpc3/c3thru.html]&lt;br /&gt;
[http://searchcio-midmarket.techtarget.com/sDefinition/0,,sid183_gci213140,00.html]&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_3&amp;diff=2489</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 3</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_3&amp;diff=2489"/>
		<updated>2010-10-07T15:51:19Z</updated>

		<summary type="html">&lt;p&gt;Brobson: /* Group 3 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Group 3 == &lt;br /&gt;
Here&#039;s my email; I&#039;ll add some of the stuff I find soon. I&#039;m just saving the question for last.&lt;br /&gt;
Andrew Bown (abown2@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
I&#039;m not sure if this is totally relevant, oh well.&lt;br /&gt;
-The first time-sharing system, CTSS (Compatible Time Sharing System), was created at MIT in the early 1960s.&lt;br /&gt;
http://www.kernelthread.com/publications/virtualization/&lt;br /&gt;
&lt;br /&gt;
-achamney@connect.carleton.ca&lt;br /&gt;
&lt;br /&gt;
An article about the mainframe.&lt;br /&gt;
-Mainframe Migration http://www.microsoft.com/windowsserver/mainframe/migration.mspx&lt;br /&gt;
&lt;br /&gt;
-Qi Zhang (qzhang13@connect.carleton.ca)&lt;br /&gt;
&lt;br /&gt;
Here&#039;s my contact information, look forward to working with everyone. - Ben Robson (brobson@connect.carleton.ca)&lt;/div&gt;</summary>
		<author><name>Brobson</name></author>
	</entry>
</feed>