<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J+powers</id>
	<title>Soma-notes - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J+powers"/>
	<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php/Special:Contributions/J_powers"/>
	<updated>2026-05-12T20:52:29Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6980</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6980"/>
		<updated>2010-12-03T12:44:55Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Why is it any better than what came before? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is often not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. Even once a race is found, a proper fix can take a long time to reach users. To bridge that gap, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system for working around race conditions in deployed applications: once a race has been identified, LOOM lets developers bypass it without waiting for a full fix. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which makes it safe to install execution filters that fix race conditions at runtime. An execution filter is a short declaration of the synchronization the developer wants, for example that two code regions must be mutually exclusive. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and remains highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked, and vice versa. As a result, each thread waits for the other to release its variable, so neither can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; An algorithm for proactively moving threads out of a code region that is likely to lead to a race condition. One &#039;&#039;evacuates&#039;&#039; threads from an unsafe region so that an execution filter can be installed there safely. Threads are &#039;&#039;evacuated&#039;&#039; by either pausing them before they enter the region or allowing them to quickly leave if they are already inside.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Short declarations, written by the developer, of the synchronization that a piece of code needs. In this paper they are mainly mutual exclusion filters: a filter names code regions that must not run at the same time. In effect, you filter what runs when, pausing one piece of code while another works on an area, thereby preventing a race condition.&lt;br /&gt;
&lt;br /&gt;
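&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; A safety condition used by earlier live-update systems: a function is updated only once no thread is executing it. This avoids overlap between old and new code, but the update may have to wait a long time for the function to become idle.&lt;br /&gt;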
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. It simply denies access when it is locked, and allows access when not locked. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Mutex (Mutual Exclusion) is an algorithm which prevents race conditions on a resource. Essentially, it forces any threads that are trying to access the resource to wait until the current thread accessing the resource has completed using it.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is an integer counter with two operations, down and up, which generalize sleep and wakeup. The down operation checks whether the value is greater than 0 and, if so, decrements the value (using up one stored wakeup) and continues. If the value is 0, the process is put to sleep. The up operation increments the value or, if processes are sleeping on the semaphore, wakes one of them. Each operation is a single indivisible atomic action: once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems. [[#References | [5]]]&lt;br /&gt;
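&lt;br /&gt;
The lock and race condition definitions above can be illustrated with a minimal sketch. It assumes Python and its standard threading module; the names counter, counter_lock, add_many and run_workers are hypothetical and are not taken from the paper. Several threads increment one shared variable, and the lock makes each read-modify-write step a critical section.&lt;br /&gt;

```python
import threading

counter = 0                      # shared variable touched by every worker
counter_lock = threading.Lock()  # mutex guarding the shared variable


def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        # Only one thread at a time may run this read-modify-write sequence.
        with counter_lock:
            counter += 1


def run_workers(num_threads, times):
    """Start the workers, wait for them all, and return the final count."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter


if __name__ == "__main__":
    # With the lock in place, no increment can be lost.
    print(run_workers(4, 10000))
```

Removing the lock would leave the increment unprotected, so two threads could read the same old value and one update could be lost, which is the kind of race the definitions describe.&lt;br /&gt;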
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
&lt;br /&gt;
With the rise of multiple core systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Many systems are designed to detect, reproduce and diagnose race conditions, but these do not directly address the actual race. A race is normally dealt with via a software update, which often requires a software restart and potentially introduces new bugs.[[#References | [7]]] Patches require you to be aware of the cause of the problem, and they take time to produce, test and install, leaving users of the software waiting, potentially for months. Because current race detectors are immature and fixes are slow to arrive, this paper presents a new approach to working around races through the use of LOOM. The goal is to quickly address the actual symptom, the race condition, in real time on live systems. This is in opposition to the conventional approach, which focuses on the cause of the race condition, an unknown bug in the software. By targeting the race condition itself, the system is able to keep running without a software update or even a reset.&lt;br /&gt;
&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, but developers often resort to more efficient fixes rather than placing proper locks in the application due to performance or work pressure.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can be used to &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; the suspended paths can later be reactivated and run as normal. This is not efficient for fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this provides a certain degree of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated as being better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
Another system similar to LOOM is STUMP[[#References | [8]]]. STUMP is a system for releasing live updates for multi-threaded or single-threaded programs written in C. It has the ability to provide arbitrary patches to source code in running systems without requiring a reset. These patches require considerable annotation and preparation as source code modifications are considered to be unsafe. Unlike STUMP, LOOM does not operate on the source code and is considered safer because of this.&lt;br /&gt;
&lt;br /&gt;
The most recent system for live updates to operating system kernels is Ksplice [[#References | [9]]]. It allows users to update the Linux kernel without rebooting. This can be done either completely automatically, if the update does not change any data structures, or with on average 17 lines of additional code for an update that would otherwise require a reboot. [[#References | [9]]] It does this by operating on the object layer.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
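&lt;br /&gt;
The idea of a mutual exclusion filter that is applied while the program keeps running can be sketched as follows. This is only an illustration in Python, not the mechanism LOOM uses (LOOM works on the compiled application binary), and it ignores the evacuation of threads that are already inside a region. The names install_mutex_filter, region, run_demo, rotate_log and append_log are all hypothetical.&lt;br /&gt;

```python
import threading
from contextlib import contextmanager

# Hypothetical registry: region name -> lock installed by a filter (if any).
_filters = {}
_registry_guard = threading.Lock()


def install_mutex_filter(*regions):
    """Make the named regions mutually exclusive by giving them one shared lock."""
    # RLock, so a thread that nests one filtered region inside another
    # does not block on itself.
    shared = threading.RLock()
    with _registry_guard:
        for name in regions:
            _filters[name] = shared


@contextmanager
def region(name):
    """Mark a code region; it only synchronizes if a filter names it."""
    with _registry_guard:
        lock = _filters.get(name)
    if lock is None:
        yield            # no filter installed: run unsynchronized
    else:
        with lock:       # filter installed: serialize with peer regions
            yield


def run_demo(steps):
    """Run two workers in filtered regions and return the event log."""
    events = []

    def worker(tag, region_name):
        for _ in range(steps):
            with region(region_name):
                events.append((tag, "enter"))
                events.append((tag, "exit"))

    install_mutex_filter("rotate_log", "append_log")
    threads = [threading.Thread(target=worker, args=("A", "rotate_log")),
               threading.Thread(target=worker, args=("B", "append_log"))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


if __name__ == "__main__":
    print(len(run_demo(100)))
```

In this sketch the trade-off described above becomes a choice of which regions share a lock: naming only the two racy regions keeps the rest of the program parallel, while naming every region in a single filter would approximate single threaded mode at a significant cost in performance.&lt;br /&gt;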
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. After the evacuation, with the threads paused at safe locations for the live update, the execution filter is installed and the threads are then resumed.&lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests the advantages of LOOM were demonstrated. Overhead was measured by comparing the applications with and without LOOM during normal operation. The effects of LOOM on Apache and MySQL were minimal (~1.83% and ~4% respectively), making it viable as a runtime fix for race errors. In the scalability test, with 32 server threads, the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate availability, LOOM was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on throughput. Lastly, the timeliness of the installation of LOOM&#039;s fixes was demonstrated in a simple example. The LOOM based fix completed in 368ms, whereas the function quiescence fix had not finished when the maximum test time (1 hour) expired.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, rebooting a system is acceptable. However, servers often cannot reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time consuming process which, until the patch is developed and deployed, leaves the race condition exposed. The paper chronicles a real world Mozilla race condition whose hot patch took nearly 8 years of development to correct. All the while, the vulnerability was exposed to Mozilla users.&lt;br /&gt;
&lt;br /&gt;
Flaws common to both system updates and hot patches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. &lt;br /&gt;
&lt;br /&gt;
Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of this paper effectively convey their findings by staying focused on the thesis as well as supporting their topics with relevant examples and data. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified to avoid confusing the reader with too much unnecessary information. The references throughout back up the reliability of the paper and let the reader verify information from the sources.&lt;br /&gt;
&lt;br /&gt;
The paper flows well and the information is delivered in a well organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The conclusion summarizes the paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
In terms of the technology that this paper serves to introduce, the offering is very strong. The authors make aggressive assertions stating the strengths of LOOM and back these assertions with unequivocal results through extensive testing. Specifically, the assertions made are that LOOM can work around race conditions on live systems in real time, in a scalable, flexible, easy to use, safe to use, and fast to implement manner. The testing presented to support these assertions covered nine real world race conditions, each of which was successfully corrected with little overhead.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that some of the examples are oversimplified. For example, Figure 9 attempts to represent the evacuation process. Unfortunately, this causes the problem to seem trivial.&lt;br /&gt;
&lt;br /&gt;
The writers are also biased towards LOOM. Although they do admit the limitations of LOOM, they do not elaborate any further. They promote LOOM without discussing possible problems with it. For example, clients running LOOM may decide not to fix the race conditions and let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] &amp;quot;Introduction to Hotpatching&amp;quot;.&#039;&#039; Microsoft TechNet &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 1st 2010. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx &amp;lt;http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx&amp;gt;].&lt;br /&gt;
&lt;br /&gt;
[2] &amp;quot;Introduction to Instrumentation and Tracing&amp;quot;. &#039;&#039; MSDN &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 2nd 2010. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx &amp;lt;http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx&amp;gt;] &lt;br /&gt;
&lt;br /&gt;
[3] Marshall, A. D. &amp;quot;Further Threads Programming: Synchronization&amp;quot;. &#039;&#039; Cardiff School of Comp. Sci. and Info. &#039;&#039;. Cardiff University, 1999. Web. Accessed: Dec 2nd 2010. [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 &amp;lt;http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[4] &amp;quot;Description of race conditions and deadlocks&amp;quot;. &#039;&#039; Microsoft Support &#039;&#039;. Microsoft Corporation, December 6, 2006. Revision: 2.3. Web. Accessed: Dec 2nd 2010. [http://support.microsoft.com/kb/317723 &amp;lt;http://support.microsoft.com/kb/317723&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[5] Tanenbaum, A. S. Modern Operating Systems (3rd Edition), page 128, 2008. Print. &lt;br /&gt;
&lt;br /&gt;
[6] &amp;quot;QUIESCE Function&amp;quot;. &#039;&#039;IBM&#039;&#039;. IBM Corporation, 2008. Web. Accessed: Dec 2nd 2010. [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm &amp;lt;http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[7] Lu, Shan; Park, Soyeon; Seo, Eunsoo; Zhou, Yuanyuan. &amp;quot;Learning from Mistakes — A Comprehensive Study on Real World Concurrency Bug Characteristics&amp;quot;. &#039;&#039; CiteSeerX &#039;&#039;. Dept. of Comp. Sci. at Univ. of Illinois, 2008. Web. Accessed Dec 2nd 2010. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203 &amp;lt;http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[8] Neamtiu, Iulian; Hicks, Michael. &amp;quot;Safe and Timely Dynamic Updates for Multi-threaded Programs&amp;quot;. &#039;&#039;ACM Digital Library&#039;&#039;. Association for Computing Machinery, 2009. Web. Accessed: Dec 2nd 2010. [http://portal.acm.org/citation.cfm?id=1542479 &amp;lt;http://portal.acm.org/citation.cfm?id=1542479&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[9] &amp;quot;Ksplice: Automatic Rebootless Kernel Updates&amp;quot;. &#039;&#039;Ksplice&#039;&#039;. Massachusetts Institute of Technology, April 2009. Web. Accessed Dec 2nd 2010. [http://www.ksplice.com/doc/ksplice.pdf &amp;lt;http://www.ksplice.com/doc/ksplice.pdf&amp;gt;]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=6617</id>
		<title>Talk:COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=6617"/>
		<updated>2010-12-03T01:26:25Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Maybe we can all add our names below so we know who&#039;s still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Group members:&lt;br /&gt;
&lt;br /&gt;
* Michael Yagi&lt;br /&gt;
* Nicolas Lessard&lt;br /&gt;
* Julie Powers&lt;br /&gt;
* Derek Langlois&lt;br /&gt;
* Dustin Martin&lt;br /&gt;
&lt;br /&gt;
Jeffrey Francom contacted me earlier so I know he is also still in the course. &amp;lt;strike&amp;gt;Now we are only waiting on Dustin Martin.&amp;lt;/strike&amp;gt; Everyone has been accounted for. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Just kicking things off. Feel free to make suggestions or change anything. --[[User:Myagi|Myagi]] 11:36, 17 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Edited and filled out the critique section. Edited a little bit here and there. --[[User:Afranco2|Afranco2]] 17:41, 22 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Moved stuff to the front page and cleaned up references. Still waiting for people to expand if possible. Also, spellcheck ;) --[[User:Myagi|Myagi]] 10:37, 24 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I worked on revising our report last night. There are still sections that have to be explained more clearly. I will work on those and try to remove our direct quotations as the Professor said that we should avoid using them. I also noticed a few sentences were taken directly from the textbook (slight modifications). Those will need to be changed. &lt;br /&gt;
&lt;br /&gt;
I have a question related to what someone wrote. Does LOOM also find race conditions (if it does where is it mentioned in the paper)? [[User:J powers|J powers]] 19:54, 2 December 2010 (UTC)  &lt;br /&gt;
&lt;br /&gt;
-I can only see references to tools that find Race Conditions, not LOOM finding actual race conditions. It seems to be focused on fixing not finding them. I don&#039;t know if that&#039;s worth mentioning or not. --[[User:Nlessard|Nlessard]] 00:37, 3 December 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
I haven&#039;t done any work on the essay yet :( I&#039;m going to proofread the essay. I will also attempt to elaborate on some of the ideas as I see fit. -Dlangloi&lt;br /&gt;
&lt;br /&gt;
That is the impression I received from the paper as well Nicolas. Someone has implied that it does in our report. &lt;br /&gt;
&lt;br /&gt;
Unfortunately I do not think that I can edit what I planned to tonight. I have an in-class final tomorrow morning that I need to study for instead. I have some suggestions if anyone is interested. I do not think that the definitions for evacuation, execution filters and function quiescence are clear. The sentences that are copied from the book are in the semaphore definition (Very good idea to change this before it is marked). [[User:J powers|J powers]] 01:26, 3 December 2010 (UTC)   &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Essay==&lt;br /&gt;
===Paper===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;The paper&#039;s title, authors, and their affiliations. Include a link to the paper and any particularly helpful supplementary information.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
* Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
* Authors: Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
* Affiliations: Computer Science Department, Columbia University&lt;br /&gt;
* Supplementary Information: [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 Video], [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf Slides]&lt;br /&gt;
&lt;br /&gt;
===Background Concepts===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
-------------&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui, Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.  &lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner.&lt;br /&gt;
-------------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
* Race Condition: &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [http://support.microsoft.com/kb/317723 Race Condition]&lt;br /&gt;
* Execution Filters: Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
* Hot Patches: &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx Hot Patching]&lt;br /&gt;
* Hybrid Instrumentation Engine: &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx Instrumentation] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
* Lock: A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 Mutex Locks]&lt;br /&gt;
* Mutex: Short for mutual exclusion. Two threads are unable to hold the same mutex, and so be inside the region it protects, at the same time.&lt;br /&gt;
* Semaphore: &amp;quot;A semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.&amp;quot; [http://en.wikipedia.org/wiki/Semaphore_%28programming%29 Semaphore]&lt;br /&gt;
&lt;br /&gt;
===Research problem===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What is the research problem being addressed by the paper? How does this problem relate to past related work?&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Problem being addressed==== &lt;br /&gt;
With the rise of multiple core systems, multithreaded programs are often prone to race conditions. Races are hard to detect, test and debug. Due to the immaturity of current race detectors, this paper explains a new approach to race detection and workarounds through the use of LOOM.&lt;br /&gt;
====Past related work====&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, but developers often resort to more efficient fixes rather than placing proper locks in the application due to performance or work pressure. &lt;br /&gt;
===Contribution===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Current solution expressed====&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM update into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The code declaration used is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive even when accessing different objects or, as an extreme measure, making them run in single threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. Once the evacuation completes, the threads are paused at safe locations, the execution filter is installed, and the threads are then resumed. &lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
-------------&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, this can sometimes be considered acceptable. However, server applications often do not have the luxury of being able to reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until a patch is developed and deployed the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct, all the while the vulnerability was exposed to all Mozilla users. &lt;br /&gt;
&lt;br /&gt;
Flaws common to both the system update and hot patch approaches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
-------------&lt;br /&gt;
&lt;br /&gt;
===Critique===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Good====&lt;br /&gt;
The authors of the paper deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their topics with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with too much unnecessary information. References throughout the writing back up the reliability of the paper and let the reader check information against the sources.&lt;br /&gt;
&lt;br /&gt;
The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The paper ends with a conclusion that does a good job of wrapping up the whole paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
====Not-So-Good====&lt;br /&gt;
One of the problems with this paper is that although many of the examples are simplified in order to expedite the reader&#039;s understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.&lt;br /&gt;
&lt;br /&gt;
The writers are also a little bit one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. LOOM is played up without much discussion of possible problems with it, such as the possibility that clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.&amp;lt;/blockquote&amp;gt;&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6499</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6499"/>
		<updated>2010-12-02T20:13:37Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Problem being addressed */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The Bypassing Races in Live Applications with Execution Filters paper uses several terms with which the reader should be familiar. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked and vice versa. The result of this is that each thread is waiting for the other to release the variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing and changing states of code sections so that those sections can be filtered for proper processing.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states, in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]]  Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
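&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion. Two things are mutually exclusive when they cannot both hold at the same time; here, two threads cannot both be inside the same protected code region at once.&lt;br /&gt;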
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is an integer variable that counts stored wakeups, with two operations, down and up (generalizations of sleep and wakeup). The down operation checks to see if the value is greater than 0 and if so, decrements the value and uses up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done in a single indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has been completed or blocked. Semaphores are an essential part of solving synchronization problems.  [[#References | [5]]]&lt;br /&gt;
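&lt;br /&gt;
The lock and semaphore definitions above can be illustrated with a minimal Python sketch. It is not taken from the paper and has nothing to do with how LOOM is implemented; the names add_many, run_counter and run_semaphore are made up for this illustration, and the standard threading module is assumed. The first part protects a shared counter with a lock so that concurrent updates are not lost, and the second part uses a semaphore to limit how many threads may be inside a region at once.&lt;br /&gt;

```python
import threading
import time

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        # Lock: only one thread at a time may run this critical section.
        with counter_lock:
            counter += 1


def run_counter(num_threads, times):
    """Start num_threads workers and return the final counter value."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,)) for _ in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter


def run_semaphore(num_threads, limit):
    """Return the largest number of threads seen inside the guarded region at once."""
    slots = threading.Semaphore(limit)
    stats_lock = threading.Lock()
    stats = {"inside": 0, "peak": 0}

    def worker():
        # Entering the block is the down operation: it blocks while the value is 0.
        with slots:
            with stats_lock:
                stats["inside"] += 1
                stats["peak"] = max(stats["peak"], stats["inside"])
            time.sleep(0.01)
            with stats_lock:
                stats["inside"] -= 1
        # Leaving the block is the up operation: it frees one slot for a waiting thread.

    workers = [threading.Thread(target=worker) for _ in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return stats["peak"]


if __name__ == "__main__":
    print(run_counter(4, 10000))
    print(run_semaphore(6, 2))
```

With the lock in place the counter always ends at num_threads times the per-thread count, and the peak reported by run_semaphore never exceeds the chosen limit.&lt;br /&gt;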
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
&lt;br /&gt;
With the rise of multi-core systems, multi-threaded programs are often prone to race conditions. Races are hard to detect, test and debug. Many systems are designed to detect, reproduce and diagnose race conditions, but these do not directly address the actual race. This is normally dealt with via a software update, which often requires a software restart and potentially introduces new bugs.[[#References | [7]]] Patches require you to be aware of the cause of the problem, and take time to produce, test, and install, leaving users of the software waiting, potentially for months. Due to the immaturity of current race detectors, this paper explains a new approach to race detection and workarounds through the use of LOOM. The goal is to quickly address the actual symptom, the race condition, as opposed to the cause, an unknown bug in the software, allowing the system to keep running without a software update or a reset.&lt;br /&gt;
&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply fixes to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, and developers often resort to more efficient fixes rather than placing proper locks in the application due to performance or work pressure.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not efficient for fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this does allow for a certain extent of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated as being better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
Another system similar to LOOM is STUMP[[#References | [8]]]. STUMP is a system for releasing live updates for multi-threaded or single-threaded programs written in C. It has the ability to provide arbitrary patches to source code in running systems without requiring a reset. These patches require considerable annotation and preparation as source code modifications are considered to be unsafe. Unlike STUMP, LOOM does not operate on the source code and is considered safer because of this. &lt;br /&gt;
&lt;br /&gt;
The most recent system for live updates to operating system kernels is called Ksplice [[#References | [9]]]. It allows users to update the Linux kernel without resetting. This can be done either completely automatically if the code does not change any data structures, or with on average 17 lines of code for an update that would otherwise require a reset. [[#References | [9]]] It does this by operating on the object layer.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. Once the evacuation completes, the threads are paused at safe locations, the execution filter is installed, and the threads are then resumed. &lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests the advantages of LOOM were demonstrated. Overhead was measured with LOOM in place during normal runtime. The effects of LOOM on Apache and MySQL were minimal (~1.83% and ~4% respectively), making it viable as a runtime fix for race errors. In the scalability test, with 32 server threads, the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate LOOM&#039;s availability, it was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on throughput. Lastly, the timeliness of the installation of LOOM&#039;s fixes was demonstrated in a simple example. The LOOM-based fix completed in 368ms, whereas the function quiescence fix took the maximum test time (1 hour) and was not finished.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, rebooting a system is acceptable. However, servers often cannot reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until a patch is developed and deployed the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct. All the while the vulnerability was exposed to Mozilla users. &lt;br /&gt;
&lt;br /&gt;
Flaws common to both system updates and hot patches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. &lt;br /&gt;
&lt;br /&gt;
Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of the paper effectively convey their findings by staying focused on the thesis as well as supporting their topics with relevant examples and data. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified to avoid confusing the reader with too much unnecessary information. The references throughout back up the reliability of the paper and let the reader verify information from the sources.&lt;br /&gt;
&lt;br /&gt;
The paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The conclusion summarizes the paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that some of the examples are oversimplified. For example, Figure 9 attempts to represent the evacuation process. Unfortunately, this causes the problem to seem trivial.&lt;br /&gt;
&lt;br /&gt;
The writers are also biased towards LOOM. Although they do admit the limitations of LOOM, they do not elaborate any further. They promote LOOM without discussing possible problems with it, such as the possibility that clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] &amp;quot;Introduction to Hotpatching&amp;quot;.&#039;&#039; Microsoft TechNet &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 1st 2010. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx &amp;lt;http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx&amp;gt;].&lt;br /&gt;
&lt;br /&gt;
[2] &amp;quot;Introduction to Instrumentation and Tracing&amp;quot;. &#039;&#039; MSDN &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 2nd 2010. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx &amp;lt;http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx&amp;gt;] &lt;br /&gt;
&lt;br /&gt;
[3] Marshall, A. D. &amp;quot;Further Threads Programming:Synchronization.&amp;quot;. &#039;&#039; Cardiff School of Comp. Sci. and Info. &#039;&#039;. Cardiff University, 1999. Web. Accessed: Dec 2nd 2010. [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 &amp;lt;http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[4] &amp;quot;Description of race conditions and deadlocks&amp;quot;. &#039;&#039; Microsoft Support &#039;&#039;. Microsoft Corporation, December 6, 2006. Revision: 2.3. Web. Accessed: Dec 2nd 2010. [http://support.microsoft.com/kb/317723 &amp;lt;http://support.microsoft.com/kb/317723&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[5] Tanenbaum, A. S. Modern Operating Systems (3rd Edition), page 128, 2008. Print. &lt;br /&gt;
&lt;br /&gt;
[6] &amp;quot;QUIESCE Function&amp;quot;. &#039;&#039;IBM&#039;&#039;. IBM Corporation, 2008. Web. Accessed: Dec 2nd 2010. [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm &amp;lt;http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[7] Lu, Shan; Park, Soyeon; Seo, Eunsoo; Zhou, Yuanyuan. &amp;quot;Learning from Mistakes — A Comprehensive Study on Real World Concurrency Bug Characteristics&amp;quot;. &#039;&#039; CiteSeerX &#039;&#039;. Dept. of Comp. Sci. at Univ. of Illinois, 2008. Web. Accessed Dec 2nd 2010. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203 &amp;lt;http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[8] Neamtiu, Iulian; Hicks, Michael. &amp;quot;Safe and Timely Dynamic Updates for Multi-threaded Programs&amp;quot;. &#039;&#039;ACM Digital Library&#039;&#039;. Association for Computing Machinery, 2009. Web. Accessed: Dec 2nd 2010. [http://portal.acm.org/citation.cfm?id=1542479 &amp;lt;http://portal.acm.org/citation.cfm?id=1542479&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[9] &amp;quot;Ksplice: Automatic Rebootless Kernel Updates&amp;quot;. &#039;&#039;Ksplice&#039;&#039;. Massachusetts Institute of Technology, April 2009. Web. Accessed Dec 2nd 2010. [http://www.ksplice.com/doc/ksplice.pdf &amp;lt;http://www.ksplice.com/doc/ksplice.pdf&amp;gt;]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6496</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6496"/>
		<updated>2010-12-02T20:11:37Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Related work */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The Bypassing Races in Live Applications with Execution Filters paper uses several terms with which the reader should be familiar. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked and vice versa. The result of this is that each thread is waiting for the other to release the variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing and changing states of code sections so that those sections can be filtered for proper processing.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states, in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]]  Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
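&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion. Two things are mutually exclusive when they cannot both hold at the same time; here, two threads cannot both be inside the same protected code region at once.&lt;br /&gt;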
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is an integer variable that counts stored wakeups, with two operations, down and up (generalizations of sleep and wakeup). The down operation checks to see if the value is greater than 0 and if so, decrements the value and uses up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done in a single indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has been completed or blocked. Semaphores are an essential part of solving synchronization problems.  [[#References | [5]]]&lt;br /&gt;
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
&lt;br /&gt;
With the rise of multi-core systems, multithreaded programs are often prone to race conditions. Races are hard to detect, test and debug. Many systems exist to detect, reproduce and diagnose race conditions, but these do not directly address the actual race. This is normally dealt with via a software update, which often requires a software restart and potentially introduces new bugs.[[#References | [7]]] Patches require you to be aware of the cause of the problem, and take time to produce, test, and install, leaving users of the software waiting, potentially for months. Due to the immaturity of current race detectors, this paper explains a new approach to race detection and workarounds through the use of LOOM. The goal is to quickly address the actual symptom, the race condition, as opposed to the cause, an unknown bug in the software, allowing the system to keep running without a software update or a reset. &lt;br /&gt;
&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply fixes to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, and developers often resort to more efficient fixes rather than placing proper locks in the application due to performance or work pressure.&lt;br /&gt;
&lt;br /&gt;
Using a QUIESCE function to &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]], these paths can later be reactivated and run as normal. This is not efficient for fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this does allow for a certain extent of safety it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated as being better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
Another similar system to LOOM is STUMP[[#References | [8]]]. STUMP is a system for releasing live updates for multi-threaded or single-threaded programs written in C. It has the ability to provide arbitrary patches to source code in running systems without requiring a reset. These patches require considerable annotation and preparation as source code modifications are considered to be unsafe. Unlike STUMP, LOOM does not operate on the source code and is considered more safe because of this. &lt;br /&gt;
&lt;br /&gt;
The most recent system for live updates to the kernels of Operating Systems is called Ksplice [[#References | [9]]]. It allows users to update the Linux kernel without resetting. This can be done either completely automatically if the code does not chance any data structures, or with on average 17 lines of code for an update that would otherwise require a reset. [[#References | [9]]] It does this by operating on the object layer.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
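&lt;br /&gt;
The idea of making code regions mutually exclusive without touching the application&#039;s own locking can be sketched roughly as below. This is not LOOM&#039;s filter language or API: FilterTable, install_mutex_filter and region are hypothetical names chosen for illustration, and Python context managers stand in for hooks that would be placed in the binary ahead of time. A region runs unchanged until a filter naming it is installed at runtime; after that, all regions named in the same filter share one lock.&lt;br /&gt;
&lt;br /&gt;
```python
import threading
from contextlib import contextmanager


class FilterTable:
    """Hypothetical stand-in for an update engine: maps region names to locks."""

    def __init__(self):
        self._locks = {}                       # region name -> shared lock
        self._table_guard = threading.Lock()   # protects the mapping itself

    def install_mutex_filter(self, *regions):
        """Make all named regions mutually exclusive with one another."""
        shared = threading.RLock()
        with self._table_guard:
            for name in regions:
                self._locks[name] = shared

    @contextmanager
    def region(self, name):
        """Entry and exit hook wrapped around a code region ahead of time."""
        with self._table_guard:
            lock = self._locks.get(name)
        if lock is None:
            yield                              # no filter installed: run unchanged
        else:
            with lock:                         # filter installed: serialize the region
                yield
```
&lt;br /&gt;
In this sketch the application code only ever writes &amp;quot;with filters.region(name)&amp;quot; around a region; whether that region is serialized is decided later by whoever calls install_mutex_filter, which mirrors the separation between filter and source code described above.&lt;br /&gt;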
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used so that installing a filter does not introduce new errors. Static analysis marks the critical region that an update would make unsafe, and all threads inside that region are evacuated. After the evacuation, threads are paused at safe locations, the execution filter is installed, and the threads are then resumed.&lt;br /&gt;
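&lt;br /&gt;
A heavily simplified sketch of the drain, install and resume sequence is given below. It is an assumption-laden illustration rather than the paper&#039;s algorithm: UpdateGate and its methods are hypothetical names, the only safe location modelled is the entrance to the region, and a single updater thread is assumed. New arrivals are paused at the entrance, the updater waits until the region is empty, installs the filter, and then wakes the paused threads.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


class UpdateGate:
    """Hypothetical gate: tracks threads inside a region and pauses entries during an update."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inside = 0          # threads currently inside the critical region
        self._updating = False    # True while an update is being installed
        self.installed = []       # names of filters installed so far

    def enter(self):
        with self._cond:
            while self._updating:          # safe location: wait here during an update
                self._cond.wait()
            self._inside += 1

    def leave(self):
        with self._cond:
            self._inside -= 1
            self._cond.notify_all()        # let a waiting updater re-check the count

    def install(self, name):
        with self._cond:
            self._updating = True          # stop new threads from entering
            while self._inside != 0:       # evacuate: wait for the region to drain
                self._cond.wait()
            self.installed.append(name)    # region is empty, so installing is safe
            self._updating = False
            self._cond.notify_all()        # resume threads paused at the entrance
```
&lt;br /&gt;
Application threads would call enter and leave around the region, while the updater calls install; the update only takes effect once no thread is left inside.&lt;br /&gt;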
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
The tests demonstrated several advantages of LOOM. Overhead was measured by running the applications with LOOM during normal operation. The effects on Apache and MySQL were minimal (~1.83% and ~4% respectively), making LOOM viable as a runtime fix for race errors. For scalability, the team found that with 32 server threads the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate availability, LOOM was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM&#039;s live update had almost no effect on throughput. Lastly, the timeliness of installing LOOM&#039;s fixes was demonstrated in a simple example: the LOOM-based fix completed in 368ms, whereas the function quiescence fix had still not finished when the maximum test time (1 hour) elapsed.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, rebooting a system is acceptable. However, servers often cannot reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they have their own specific weaknesses. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when one race condition is corrected via a hot patch, others appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until the patch is developed and deployed the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct. All the while, the vulnerability was exposed to Mozilla users.&lt;br /&gt;
&lt;br /&gt;
Flaws common to both system updates and hot patches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. &lt;br /&gt;
&lt;br /&gt;
Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of this paper effectively convey their findings by staying focused on the thesis and supporting their points with relevant examples and data. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified to avoid confusing the reader with unnecessary information. The references throughout back up the reliability of the paper and let the reader verify information from the sources.&lt;br /&gt;
&lt;br /&gt;
The paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The conclusion summarizes the paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that some of the examples are oversimplified. For example, Figure 9 attempts to represent the evacuation process. Unfortunately, this causes the problem to seem trivial.&lt;br /&gt;
&lt;br /&gt;
The writers are also biased towards LOOM. Although they do admit that LOOM has limitations, they do not elaborate on them. They promote LOOM without discussing possible problems with it, such as the risk that clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] &amp;quot;Introduction to Hotpatching&amp;quot;.&#039;&#039; Microsoft TechNet &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 1st 2010. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx &amp;lt;http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx&amp;gt;].&lt;br /&gt;
&lt;br /&gt;
[2] &amp;quot;Introduction to Instrumentation and Tracing&amp;quot;. &#039;&#039; MSDN &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 2nd 2010. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx &amp;lt;http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx&amp;gt;] &lt;br /&gt;
&lt;br /&gt;
[3] Marshall, A. D. &amp;quot;Further Threads Programming: Synchronization&amp;quot;. &#039;&#039; Cardiff School of Comp. Sci. and Info. &#039;&#039;. Cardiff University, 1999. Web. Accessed: Dec 2nd 2010. [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 &amp;lt;http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[4] &amp;quot;Description of race conditions and deadlocks&amp;quot;. &#039;&#039; Microsoft Support &#039;&#039;. Microsoft Corporation, December 6, 2006. Revision: 2.3. Web. Accessed: Dec 2nd 2010. [http://support.microsoft.com/kb/317723 &amp;lt;http://support.microsoft.com/kb/317723&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[5] Tanenbaum, A. S. Modern Operating Systems (3rd Edition), page 128, 2008. Print. &lt;br /&gt;
&lt;br /&gt;
[6] &amp;quot;QUIESCE Function&amp;quot;. &#039;&#039;IBM&#039;&#039;. IBM Corporation, 2008. Web. Accessed: Dec 2nd 2010. [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm &amp;lt;http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[7] Lu, Shan; Park, Soyeon; Seo, Eunsoo; Zhou, Yuanyuan. &amp;quot;Learning from Mistakes — A Comprehensive Study on Real World Concurrency Bug Characteristics&amp;quot;. &#039;&#039; CiteSeerX &#039;&#039;. Dept. of Comp. Sci. at Univ. of Illinois, 2008. Web. Accessed Dec 2nd 2010. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203 &amp;lt;http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[8] Neamtiu, Iulian; Hicks, Michael. &amp;quot;Safe and Timely Dynamic Updates for Multi-threaded Programs&amp;quot;. &#039;&#039;ACM Digital Library&#039;&#039;. Association for Computing Machinery, 2009. Web. Accessed: Dec 2nd 2010. [http://portal.acm.org/citation.cfm?id=1542479 &amp;lt;http://portal.acm.org/citation.cfm?id=1542479&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[9] &amp;quot;Ksplice: Automatic Rebootless Kernel Updates&amp;quot;. &#039;&#039;Ksplice&#039;&#039;. Massachusetts Institute of Technology, April 2009. Web. Accessed Dec 2nd 2010. [http://www.ksplice.com/doc/ksplice.pdf &amp;lt;http://www.ksplice.com/doc/ksplice.pdf&amp;gt;]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6493</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6493"/>
		<updated>2010-12-02T20:09:37Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Related work */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and remains highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked, and vice versa. The result is that each thread is waiting for the other thread to release its variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing and changing states of code sections so that those sections can be filtered for proper processing.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states, in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion: two conditions that cannot both be true, or two code regions that cannot both execute, at the same time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is an integer counter of stored wakeups with two operations, down and up (generalizations of sleep and wakeup). The down operation checks whether the value is greater than 0 and, if so, decrements the value, using up one stored wakeup. If the value is 0, the process is put to sleep. The up operation increments the value and, if any processes are sleeping on the semaphore, allows one of them to complete its down. Each operation is done as a single indivisible atomic action: once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems. [[#References | [5]]]&lt;br /&gt;
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
&lt;br /&gt;
With the rise of multi-core systems, multithreaded programs are increasingly prone to race conditions. Races are hard to detect, test and debug. Many systems exist to detect, reproduce and diagnose race conditions, but these do not directly address the actual race. A race is normally dealt with via a software update, which often requires a software restart and potentially introduces new bugs.[[#References | [7]]] Patches require the developer to be aware of the cause of the problem, and they take time to produce, test, and install, leaving users of the software waiting, potentially for months. Given the immaturity of current race detectors, this paper explains a new approach to working around races through the use of LOOM. The goal is to quickly address the actual symptom, the race condition, as opposed to the cause, an unknown bug in the software, allowing the system to keep running without a software update or a reset.&lt;br /&gt;
&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, and under performance or work pressure developers often resort to more efficient fixes rather than placing proper locks in the application.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can be used to &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not efficient for fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this does allow for a certain extent of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated to be better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
Another system similar to LOOM is STUMP[[#References | [8]]]. STUMP is a system for releasing live updates for multi-threaded or single-threaded programs written in C. It can apply arbitrary patches to source code in running systems without requiring a reset. These patches require considerable annotation and preparation, as source code modifications are considered to be unsafe. Unlike STUMP, LOOM does not operate on the source code and is considered safer because of this.&lt;br /&gt;
&lt;br /&gt;
The most recent system for live updates to operating system kernels is Ksplice [[#References | [9]]]. It allows users to update the Linux kernel without rebooting. This can be done either completely automatically, if the update does not change any data structures, or with on average 17 lines of additional code for an update that would otherwise require a reboot. [[#References | [9]]] It does this by operating on the object code layer.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used so that installing a filter does not introduce new errors. Static analysis marks the critical region that an update would make unsafe, and all threads inside that region are evacuated. After the evacuation, threads are paused at safe locations, the execution filter is installed, and the threads are then resumed.&lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
The tests demonstrated several advantages of LOOM. Overhead was measured by running the applications with LOOM during normal operation. The effects on Apache and MySQL were minimal (~1.83% and ~4% respectively), making LOOM viable as a runtime fix for race errors. For scalability, the team found that with 32 server threads the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate availability, LOOM was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM&#039;s live update had almost no effect on throughput. Lastly, the timeliness of installing LOOM&#039;s fixes was demonstrated in a simple example: the LOOM-based fix completed in 368ms, whereas the function quiescence fix had still not finished when the maximum test time (1 hour) elapsed.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, rebooting a system is acceptable. However, servers often cannot reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they have their own specific weaknesses. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when one race condition is corrected via a hot patch, others appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until the patch is developed and deployed the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct. All the while, the vulnerability was exposed to Mozilla users.&lt;br /&gt;
&lt;br /&gt;
Flaws common to both system updates and hot patches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. &lt;br /&gt;
&lt;br /&gt;
Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of this paper effectively convey their findings by staying focused on the thesis and supporting their points with relevant examples and data. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified to avoid confusing the reader with unnecessary information. The references throughout back up the reliability of the paper and let the reader verify information from the sources.&lt;br /&gt;
&lt;br /&gt;
The paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The conclusion summarizes the paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that some of the examples are oversimplified. For example, Figure 9 attempts to represent the evacuation process. Unfortunately, this causes the problem to seem trivial.&lt;br /&gt;
&lt;br /&gt;
The writers are also biased towards LOOM. Although they do admit that LOOM has limitations, they do not elaborate on them. They promote LOOM without discussing possible problems with it, such as the risk that clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] &amp;quot;Introduction to Hotpatching&amp;quot;.&#039;&#039; Microsoft TechNet &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 1st 2010. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx &amp;lt;http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx&amp;gt;].&lt;br /&gt;
&lt;br /&gt;
[2] &amp;quot;Introduction to Instrumentation and Tracing&amp;quot;. &#039;&#039; MSDN &#039;&#039;. Microsoft Corporation, 2010. Web. Accessed: Dec 2nd 2010. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx &amp;lt;http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx&amp;gt;] &lt;br /&gt;
&lt;br /&gt;
[3] Marshall, A. D. &amp;quot;Further Threads Programming: Synchronization&amp;quot;. &#039;&#039; Cardiff School of Comp. Sci. and Info. &#039;&#039;. Cardiff University, 1999. Web. Accessed: Dec 2nd 2010. [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 &amp;lt;http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[4] &amp;quot;Description of race conditions and deadlocks&amp;quot;. &#039;&#039; Microsoft Support &#039;&#039;. Microsoft Corporation, December 6, 2006. Revision: 2.3. Web. Accessed: Dec 2nd 2010. [http://support.microsoft.com/kb/317723 &amp;lt;http://support.microsoft.com/kb/317723&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[5] Tanenbaum, A. S. Modern Operating Systems (3rd Edition), page 128, 2008. Print. &lt;br /&gt;
&lt;br /&gt;
[6] &amp;quot;QUIESCE Function&amp;quot;. &#039;&#039;IBM&#039;&#039;. IBM Corporation, 2008. Web. Accessed: Dec 2nd 2010. [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm &amp;lt;http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[7] Lu, Shan; Park, Soyeon; Seo, Eunsoo; Zhou, Yuanyuan. &amp;quot;Learning from Mistakes — A Comprehensive Study on Real World Concurrency Bug Characteristics&amp;quot;. &#039;&#039; CiteSeerX &#039;&#039;. Dept. of Comp. Sci. at Univ. of Illinois, 2008. Web. Accessed Dec 2nd 2010. [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203 &amp;lt;http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.1203&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[8] Neamtiu, Iulian; Hicks, Michael. &amp;quot;Safe and Timely Dynamic Updates for Multi-threaded Programs&amp;quot;. &#039;&#039;ACM Digital Library&#039;&#039;. Association for Computing Machinery, 2009. Web. Accessed: Dec 2nd 2010. [http://portal.acm.org/citation.cfm?id=1542479 &amp;lt;http://portal.acm.org/citation.cfm?id=1542479&amp;gt;]&lt;br /&gt;
&lt;br /&gt;
[9] &amp;quot;Ksplice: Automatic Rebootless Kernel Updates&amp;quot;. &#039;&#039;Ksplice&#039;&#039;. Massachusetts Institute of Technology, April 2009. Web. Accessed Dec 2nd 2010. [http://www.ksplice.com/doc/ksplice.pdf &amp;lt;http://www.ksplice.com/doc/ksplice.pdf&amp;gt;]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=6491</id>
		<title>Talk:COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=6491"/>
		<updated>2010-12-02T19:59:30Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Maybe we can all add our names below so we know who&#039;s still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Group members:&lt;br /&gt;
&lt;br /&gt;
* Michael Yagi&lt;br /&gt;
* Nicolas Lessard&lt;br /&gt;
* Julie Powers&lt;br /&gt;
* Derek Langlois&lt;br /&gt;
* Dustin Martin&lt;br /&gt;
&lt;br /&gt;
Jeffrey Francom contacted me earlier so I know he is also still in the course. &amp;lt;strike&amp;gt;Now we are only waiting on Dustin Martin.&amp;lt;/strike&amp;gt; Everyone has been accounted for. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Just kicking things off. Feel free to make suggestions or change anything. --[[User:Myagi|Myagi]] 11:36, 17 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Edited and filled out the critique section. Edited a little bit here and there. --[[User:Afranco2|Afranco2]] 17:41, 22 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Moved stuff to the front page and cleaned up references. Still waiting for people to expand if possible. Also, spellcheck ;) --[[User:Myagi|Myagi]] 10:37, 24 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Expanded --[[User:Afranco2|Afranco2]] 19:04, 27 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
I worked on revising our report last night. There are still sections that have to be explained more clearly. I will work on those and try to remove our direct quotations, as the Professor said that we should avoid using them. I also noticed a few sentences were taken directly from the textbook (with slight modifications). Those will need to be changed.&lt;br /&gt;
&lt;br /&gt;
I have a question related to what someone wrote. Does LOOM also find race conditions (if it does where is it mentioned in the paper)? [[User:J powers|J powers]] 19:54, 2 December 2010 (UTC)  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Essay==&lt;br /&gt;
===Paper===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;The paper&#039;s title, authors, and their affiliations. Include a link to the paper and any particularly helpful supplementary information.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
* Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
* Authors: Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
* Affiliations: Computer Science Department, Columbia University&lt;br /&gt;
* Supplementary Information: [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 Video], [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf Slides]&lt;br /&gt;
&lt;br /&gt;
===Background Concepts===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
-------------&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.  &lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner.&lt;br /&gt;
-------------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
* Race Condition: &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [http://support.microsoft.com/kb/317723 Race Condition]&lt;br /&gt;
* Execution Filters: Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
* Hot Patches: &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx Hot Patching]&lt;br /&gt;
* Hybrid Instrumentation Engine: &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx Instrumentation]  Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
* Lock: A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 Mutex Locks]&lt;br /&gt;
* Mutex: Short for mutual exclusion: two threads are unable to both be inside the same protected region at the same time.&lt;br /&gt;
* Semaphore: &amp;quot;A semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.&amp;quot; [http://en.wikipedia.org/wiki/Semaphore_%28programming%29 Semaphore]&lt;br /&gt;
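&lt;br /&gt;
As a rough illustration of the Race Condition and Lock entries above, the following Python sketch has several threads increment a shared counter, with and without a mutual exclusion lock. It is not taken from the paper; the function name run_workers and its parameters are made up for this example.&lt;br /&gt;

```python
import threading


def run_workers(use_lock, workers=4, rounds=20000):
    """Increment a shared counter from several threads and return the final value."""
    state = {"count": 0}
    lock = threading.Lock()

    def work():
        for _ in range(rounds):
            if use_lock:
                # Critical section: only one thread at a time updates the counter.
                with lock:
                    state["count"] += 1
            else:
                # Unsynchronized read-modify-write on shared data (a potential race).
                state["count"] += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print("with lock:   ", run_workers(True))
    print("without lock:", run_workers(False))
```

With use_lock=True the final count always equals workers times rounds. Without the lock, increments can be lost, although whether that actually shows up in a given run depends on the interpreter and on timing.&lt;br /&gt;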
&lt;br /&gt;
===Research problem===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What is the research problem being addressed by the paper? How does this problem relate to past related work?&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Problem being addressed==== &lt;br /&gt;
With the rise of multicore systems, multithreaded programs are often prone to race conditions. Races are hard to detect, test and debug. Due to the immaturity of current race detectors, this paper explains a new approach to race detection and workarounds through the use of LOOM.&lt;br /&gt;
====Past related work====&lt;br /&gt;
Two common solutions for fixing deployed races are software updates and hot patches. Software updates require restarts whereas hot patches apply fixes to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, but developers often resort to more efficient fixes rather than placing proper locks in the application due to performance or work pressure. &lt;br /&gt;
===Contribution===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Current solution expressed====&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM update into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The code declaration used is easy to understand and can be inserted in any code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application in conjunction with LOOM. These can include making two code regions mutually exclusive even when accessing different objects or, as an extreme measure, making them run in single-threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. Threads are paused at safe locations for the live update; once the evacuation completes, the execution filter is installed and the threads are resumed. &lt;br /&gt;
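&lt;br /&gt;
The evacuate, install, resume sequence described above can be sketched in Python as follows. This is only a loose analogy under simplifying assumptions: the Region class and its methods are hypothetical, a single updater is assumed, and the region entry stands in for the safe location. LOOM itself works on the application binary and uses static analysis, which this sketch does not attempt to reproduce.&lt;br /&gt;

```python
import threading


class Region:
    """Hypothetical stand-in for a code region that can receive a filter at runtime."""

    def __init__(self):
        self._cond = threading.Condition()
        self._inside = 0        # number of threads currently executing the region
        self._updating = False  # True while an update is being installed
        self._filter = None     # lock installed by the update, if any

    def run(self, body):
        with self._cond:
            # Assumed safe location: threads pause here while an update is in progress.
            while self._updating:
                self._cond.wait()
            self._inside += 1
            active_filter = self._filter
        try:
            if active_filter is None:
                body()              # no filter installed yet
            else:
                with active_filter:  # filter makes the region mutually exclusive
                    body()
        finally:
            with self._cond:
                self._inside -= 1
                self._cond.notify_all()

    def install_filter(self):
        """Assumes one updater at a time."""
        with self._cond:
            self._updating = True
            # Evacuation: wait until no thread remains inside the region.
            while self._inside != 0:
                self._cond.wait()
            self._filter = threading.Lock()
            self._updating = False
            # Resume the threads that were paused at the region entry.
            self._cond.notify_all()
```

The point of waiting for the region to empty before installing the lock is that no thread ends up running the region half under the old behaviour and half under the new one.&lt;br /&gt;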
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
-------------&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, this can sometimes be considered acceptable. However, server applications often do not have the luxury of being able to reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time consuming process, and until the patch is developed and deployed the race condition remains exposed. The paper chronicles a real world Mozilla race condition whose hot patch took nearly 8 years of development to correct, all the while the vulnerability was exposed to all Mozilla users. &lt;br /&gt;
&lt;br /&gt;
Flaws common to both the system update and hot patch approaches are that they are very difficult to develop properly, slow to implement, and can result in potentially unsafe ad hoc solutions that are not scalable. Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
-------------&lt;br /&gt;
&lt;br /&gt;
===Critique===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Good====&lt;br /&gt;
The authors of this paper deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with too much unnecessary information. References throughout the writing back up the reliability of the paper and let the reader keep track of the sources in order to check the information.&lt;br /&gt;
&lt;br /&gt;
The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being informed about the next related subject. The paper ends with a conclusion that does a good job of wrapping up the whole paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
====Not-So-Good====&lt;br /&gt;
One of the problems with this paper is that although many of the examples are simplified in order to expedite the understanding of the reader, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.&lt;br /&gt;
&lt;br /&gt;
The writers are also a little bit one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing any problems later. LOOM is played up heavily without much discussion of the possible problems with it. For example, clients running LOOM may decide not to fix the race conditions and instead just let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.&amp;lt;/blockquote&amp;gt;&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=6490</id>
		<title>Talk:COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=6490"/>
		<updated>2010-12-02T19:54:50Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Maybe we can all add our names below so we know who&#039;s still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Group members:&lt;br /&gt;
&lt;br /&gt;
* Michael Yagi&lt;br /&gt;
* Nicolas Lessard&lt;br /&gt;
* Julie Powers&lt;br /&gt;
* Derek Langlois&lt;br /&gt;
* Dustin Martin&lt;br /&gt;
&lt;br /&gt;
Jeffrey Francom contacted me earlier so I know he is also still in the course. &amp;lt;strike&amp;gt;Now we are only waiting on Dustin Martin.&amp;lt;/strike&amp;gt; Everyone has been accounted for. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Just kicking things off. Feel free to make suggestions or change anything. --[[User:Myagi|Myagi]] 11:36, 17 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Edited and filled out the critique section. Edited a little bit here and there. --[[User:Afranco2|Afranco2]] 17:41, 22 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Moved stuff to the front page and cleaned up references. Still waiting for people to expand if possible. Also, spellcheck ;) --[[User:Myagi|Myagi]] 10:37, 24 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Expanded --[[User:Afranco2|Afranco2]] 19:04, 27 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
I worked on revising our report last night. There are still sections that need to be explained more clearly. I will work on those and also try to remove our direct quotations as the Professor said that we should avoid using them. &lt;br /&gt;
&lt;br /&gt;
I have a question related to what someone wrote. Does LOOM also find race conditions (if it does where is it mentioned in the paper)? [[User:J powers|J powers]] 19:54, 2 December 2010 (UTC)  &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Essay==&lt;br /&gt;
===Paper===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;The paper&#039;s title, authors, and their affiliations. Include a link to the paper and any particularly helpful supplementary information.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
* Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
* Authors: Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
* Affiliations: Computer Science Department, Columbia University&lt;br /&gt;
* Supplementary Information: [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 Video], [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf Slides]&lt;br /&gt;
&lt;br /&gt;
===Background Concepts===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;Explain briefly the background concepts and ideas that your fellow classmates will need to know first in order to understand your assigned paper.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
-------------&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.  &lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner.&lt;br /&gt;
-------------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
This paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
* Race Condition: &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [http://support.microsoft.com/kb/317723 Race Condition]&lt;br /&gt;
* Execution Filters: Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
* Hot Patches: &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx Hot Patching]&lt;br /&gt;
* Hybrid Instrumentation Engine: &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx Instrumentation]  Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
* Lock: A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 Mutex Locks]&lt;br /&gt;
* Mutex: Short for mutual exclusion: two threads are unable to both be inside the same protected region at the same time.&lt;br /&gt;
* Semaphore: &amp;quot;A semaphore is a protected variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment.&amp;quot; [http://en.wikipedia.org/wiki/Semaphore_%28programming%29 Semaphore]&lt;br /&gt;
&lt;br /&gt;
===Research problem===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What is the research problem being addressed by the paper? How does this problem relate to past related work?&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Problem being addressed==== &lt;br /&gt;
With the rise of multicore systems, multithreaded programs are often prone to race conditions. Races are hard to detect, test and debug. Due to the immaturity of current race detectors, this paper explains a new approach to race detection and workarounds through the use of LOOM.&lt;br /&gt;
====Past related work====&lt;br /&gt;
Two common solutions for fixing deployed races are software updates and hot patches. Software updates require restarts whereas hot patches apply fixes to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, but developers often resort to more efficient fixes rather than placing proper locks in the application due to performance or work pressure. &lt;br /&gt;
===Contribution===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Current solution expressed====&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM update into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The code declaration used is easy to understand and can be inserted in any code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application in conjunction with LOOM. These can include making two code regions mutually exclusive even when accessing different objects or, as an extreme measure, making them run in single-threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. Threads are paused at safe locations for the live update; once the evacuation completes, the execution filter is installed and the threads are resumed. &lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
-------------&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, this can sometimes be considered acceptable. However, server applications often do not have the luxury of being able to reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time consuming process, and until the patch is developed and deployed the race condition remains exposed. The paper chronicles a real world Mozilla race condition whose hot patch took nearly 8 years of development to correct, all the while the vulnerability was exposed to all Mozilla users. &lt;br /&gt;
&lt;br /&gt;
Flaws common to both the system update and hot patch approaches are that they are very difficult to develop properly, slow to implement, and can result in potentially unsafe ad hoc solutions that are not scalable. Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
-------------&lt;br /&gt;
&lt;br /&gt;
===Critique===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
====Good====&lt;br /&gt;
The authors of this paper deliver the information surrounding their thesis efficiently, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with too much unnecessary information. References throughout the writing back up the reliability of the paper and let the reader keep track of the sources in order to check the information.&lt;br /&gt;
&lt;br /&gt;
The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being informed about the next related subject. The paper ends with a conclusion that does a good job of wrapping up the whole paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
====Not-So-Good====&lt;br /&gt;
One of the problems with this paper is that although many of the examples are simplified in order to expedite the understanding of the reader, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.&lt;br /&gt;
&lt;br /&gt;
The writers are also a little bit one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing any problems later. LOOM is played up heavily without much discussion of the possible problems with it. For example, clients running LOOM may decide not to fix the race conditions and instead just let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
===References===&lt;br /&gt;
&amp;lt;blockquote&amp;gt;You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.&amp;lt;/blockquote&amp;gt;&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6062</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6062"/>
		<updated>2010-12-02T01:21:09Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Contribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
This paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked and vice versa. The result of this is that each thread is waiting for the other thread to release its variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing and changing states of code sections so that those sections can be filtered for proper processing.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence&#039;&#039;&#039; The process of pausing and altering states, in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. Hybrid instrumentation combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
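&lt;br /&gt;
As an illustration of the lock definition above, the following is a minimal Python sketch and is not taken from the paper. The function name and its parameters are hypothetical. Several threads increment a shared counter, and the lock makes each increment a critical section so that no update is lost.&lt;br /&gt;

```python
import threading


def locked_increments(num_threads=4, per_thread=10000):
    """Hypothetical demo: count with several threads under one lock."""
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(per_thread):
            # Lock at the start of the critical section, unlock at the end.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

With the lock in place the returned total always equals num_threads multiplied by per_thread. Without it, the read-modify-write steps of two threads could interleave and lose updates.&lt;br /&gt;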
&lt;br /&gt;
&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion: two threads are unable to execute the protected code regions at the same time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is a special type of counter with two operations, down and up (generalizations of sleep and wakeup). The down operation checks to see if the value is greater than 0 and, if so, decrements the value and uses up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done in a single indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has been completed or blocked. Semaphores are an essential part of solving synchronization problems. [[#References | [5]]]&lt;br /&gt;
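&lt;br /&gt;
The down and up operations described above can be sketched in Python as follows. This is only an illustrative sketch: the class name CountingSemaphore and its methods are hypothetical, and a condition variable stands in for the atomic hardware or kernel support a real semaphore would rely on.&lt;br /&gt;

```python
import threading


class CountingSemaphore:
    """Illustrative semaphore; class and method names are hypothetical."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    def down(self):
        # The check and the decrement happen while holding the condition
        # lock, so together they behave as one indivisible step.
        with self._cond:
            while self._value == 0:
                self._cond.wait()  # no stored wakeups: sleep
            self._value -= 1       # use up one stored wakeup

    def up(self):
        with self._cond:
            self._value += 1       # store one wakeup
            self._cond.notify()    # wake one sleeper, if any

    def value(self):
        with self._cond:
            return self._value
```

A thread that calls down on a semaphore whose value is 0 sleeps until another thread calls up.&lt;br /&gt;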
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
With the rise of multi-core systems, multithreaded programs are often prone to race conditions. Races are hard to detect, test and debug. Due to the immaturity of current race detectors, this paper explains a new approach to race detection and workarounds through the use of LOOM.&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, but developers often resort to quicker fixes rather than placing proper locks in the application due to performance or work pressure.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not efficient for fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this provides a certain degree of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated as being better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
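&lt;br /&gt;
The trade-off described above can be illustrated with a small Python sketch. It is not the filter language used by LOOM. The class RegionGuard and its per_object flag are hypothetical names chosen for this example. With per_object set to True, regions exclude each other only when they touch the same object. With per_object set to False, every guarded region shares one lock, which is safer but serializes more work.&lt;br /&gt;

```python
import threading
from collections import defaultdict


class RegionGuard:
    """Hypothetical helper showing a performance/reliability trade-off."""

    def __init__(self, per_object=True):
        self._per_object = per_object
        self._global_lock = threading.Lock()
        self._locks = defaultdict(threading.Lock)
        self._table_lock = threading.Lock()

    def lock_for(self, obj):
        # Conservative mode: one lock shared by every guarded region.
        if not self._per_object:
            return self._global_lock
        # Finer-grained mode: one lock per object (keyed by identity,
        # which is only meaningful while the object is alive).
        with self._table_lock:
            return self._locks[id(obj)]
```

A guarded region would then be written as a with statement over guard.lock_for(obj).&lt;br /&gt;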
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. After the evacuation is complete, the execution filter is installed, and the threads are then resumed after a live update pause is done at a safe location.&lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests the advantages of LOOM were proven. Overhead was measured with LOOM running during normal operation. The effects of LOOM on Apache and MySQL were minimal (~1.83% and ~4% respectively), making it viable as a runtime fix for race errors. In the scalability test, the team found that with 32 server threads the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate LOOM&#039;s availability, it was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on throughput. Lastly, the timeliness of the installation of LOOM&#039;s fixes was demonstrated in a simple example. The LOOM-based fix completed in 368 ms, whereas the function quiescence fix had not finished when the maximum test time (1 hour) elapsed.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both of these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, rebooting a system is acceptable. However, servers often cannot reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time-consuming process; until a patch is developed and deployed, the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct. All the while, the vulnerability was exposed to Mozilla users.&lt;br /&gt;
&lt;br /&gt;
Flaws common to both system updates and hot patches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. &lt;br /&gt;
&lt;br /&gt;
Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of the paper effectively convey their findings by staying focused on the thesis as well as supporting their topics with relevant examples and data. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified to avoid confusing the reader with too much unnecessary information. The references throughout back up the reliability of the paper and let the reader verify information from the sources.&lt;br /&gt;
&lt;br /&gt;
The paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The conclusion summarizes the paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that some of the examples are oversimplified. For example, Figure 9 attempts to represent the evacuation process. Unfortunately, this causes the problem to seem trivial.&lt;br /&gt;
&lt;br /&gt;
The writers are also biased towards LOOM. Although they do admit the limitations of LOOM, they do not elaborate any further. They promote LOOM without discussing possible problems with it, such as the possibility that clients running LOOM may decide not to fix the race conditions and let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].&lt;br /&gt;
&lt;br /&gt;
[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx] &lt;br /&gt;
&lt;br /&gt;
[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]&lt;br /&gt;
&lt;br /&gt;
[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]&lt;br /&gt;
&lt;br /&gt;
[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008&lt;br /&gt;
&lt;br /&gt;
[6] QUIESCE Function. IBM [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6056</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6056"/>
		<updated>2010-12-02T01:12:16Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Critique */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui, Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked and vice versa. The result is that each thread waits for the other to release its variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing and changing states of code sections so that those sections can be filtered for proper processing.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states, in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. Hybrid instrumentation combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion: two threads are unable to execute the protected code regions at the same time.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is a special type of counter with two operations, down and up (generalizations of sleep and wakeup). The down operation checks to see if the value is greater than 0 and, if so, decrements the value and uses up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done in a single indivisible atomic action. It is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has been completed or blocked. Semaphores are an essential part of solving synchronization problems. [[#References | [5]]]&lt;br /&gt;
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
With the rise of multi-core systems, multithreaded programs are often prone to race conditions. Races are hard to detect, test and debug. Due to the immaturity of current race detectors, this paper explains a new approach to race detection and workarounds through the use of LOOM.&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, but developers often resort to quicker fixes rather than placing proper locks in the application due to performance or work pressure.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not efficient for fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this provides a certain degree of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated as being better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to quickly develop safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. After the evacuation is complete, the execution filter is installed, and the threads are then resumed after a live update pause is done at a safe location.&lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests LOOM proved its worth. Overhead was measured with LOOM running during normal operation. The effects of LOOM on Apache and MySQL were minimal (~1.83% and ~4% respectively), making it viable as a runtime fix for race errors. In the scalability test, the team found that with 32 server threads the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate LOOM&#039;s availability, it was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on throughput. Lastly, the timeliness of the installation of LOOM&#039;s fixes was demonstrated in a simple example. The LOOM-based fix completed in 368 ms, whereas the function quiescence fix had not finished when the maximum test time (1 hour) elapsed.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both of these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, rebooting a system is acceptable. However, servers often cannot reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time-consuming process; until a patch is developed and deployed, the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct. All the while, the vulnerability was exposed to Mozilla users.&lt;br /&gt;
&lt;br /&gt;
Flaws common to both system updates and hot patches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. &lt;br /&gt;
&lt;br /&gt;
Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of the paper effectively convey their findings by staying focused on the thesis as well as supporting their topics with relevant examples and data. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified to avoid confusing the reader with too much unnecessary information. The references throughout back up the reliability of the paper and let the reader verify information from the sources.&lt;br /&gt;
&lt;br /&gt;
The paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before being introduced to the next related subject. The conclusion summarizes the paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that some of the examples are oversimplified. For example, Figure 9 attempts to represent the evacuation process. Unfortunately, this causes the problem to seem trivial.&lt;br /&gt;
&lt;br /&gt;
The writers are also biased towards LOOM. Although they do admit the limitations of LOOM, they do not elaborate any further. They promote LOOM without discussing possible problems with it, such as the possibility that clients running LOOM may decide not to fix the race conditions and let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].&lt;br /&gt;
&lt;br /&gt;
[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx] &lt;br /&gt;
&lt;br /&gt;
[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]&lt;br /&gt;
&lt;br /&gt;
[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]&lt;br /&gt;
&lt;br /&gt;
[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008&lt;br /&gt;
&lt;br /&gt;
[6] QUIESCE Function. IBM [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6023</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6023"/>
		<updated>2010-12-02T00:21:44Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Contribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui, Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked and vice versa. The result is that each thread waits for the other to release its variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing and changing states of code sections so that those sections can be filtered for proper processing.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states, in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]] Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. Hybrid instrumentation combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
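&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true, or both happen, at the same time; a mutex lock applies this to threads by letting only one thread at a time execute a critical section.&lt;br /&gt;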
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is a special type of integer flag with two operations, down and up, which generalize sleep and wakeup. The down operation checks whether the value is greater than 0 and, if so, decrements the value, using up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done as a single indivisible atomic action: once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems.  [[#References | [5]]]&lt;br /&gt;
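&lt;br /&gt;
As a rough illustration of the down and up operations described in the Semaphore entry, the following Python sketch uses the standard library&#039;s threading.Semaphore, whose acquire() and release() methods play the roles of down and up. The worker function, the choice of two slots and the counters are assumptions made for this example only; none of them come from the paper.&lt;br /&gt;

```python
import threading

# Illustrative assumption: a counting semaphore starting at 2,
# so at most two workers may hold a slot at the same time.
slots = threading.Semaphore(2)
state_lock = threading.Lock()   # protects the two counters below
active = 0                      # workers currently holding a slot
peak = 0                        # highest value of active seen so far

def worker():
    """Hypothetical worker: take a slot, record occupancy, give the slot back."""
    global active, peak
    slots.acquire()             # "down": blocks while the count is 0
    try:
        with state_lock:
            active += 1
            peak = max(peak, active)
        with state_lock:
            active -= 1
    finally:
        slots.release()         # "up": increments the count, waking one sleeper

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("peak concurrent holders:", peak)
```

Because the semaphore starts at 2, the recorded peak can never exceed 2, however the eight threads happen to be scheduled.&lt;br /&gt;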
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
With the rise of multicore systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are still immature, this paper explains a new approach to race detection and workarounds through the use of LOOM.&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply fixes to live systems. However, relying on conventional patches can introduce new errors and can be unsafe because of a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, and under performance or work pressure developers often resort to quicker fixes rather than placing proper locks in the application.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can be used to &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not an efficient way to fix a race condition because it only delays the problem in an attempt to avoid conflict. Although it does provide a certain degree of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated to be better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to let developers quickly deploy safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin injects the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. After the evacuation is executed, the execution filter is installed during a live-update pause at a safe location, and the threads are then resumed. &lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests LOOM proved its worth. Overhead was measured by running the applications with LOOM during normal operation. The effects of LOOM on Apache and MySQL were minimal (~1.83% and ~4% respectively), making it viable as a runtime fix for race errors. In the scalability test, the team found that with 32 server threads the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate LOOM&#039;s availability, it was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on throughput. Lastly, the timeliness of installing LOOM&#039;s fixes was demonstrated in a simple example: the LOOM-based fix completed in 368 ms, whereas the function quiescence fix had not finished when the maximum test time (1 hour) expired.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both of these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, rebooting a system is acceptable. However, servers often cannot reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often, when one race condition is corrected via a hot patch, others appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until the patch is developed and deployed the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct. All the while, the vulnerability was exposed to Mozilla users. &lt;br /&gt;
&lt;br /&gt;
Flaws common to both system updates and hot patches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. &lt;br /&gt;
&lt;br /&gt;
Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of the paper are efficient at delivering the information surrounding their thesis, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the writing back up the reliability of the paper and let the reader trace and check its sources.&lt;br /&gt;
&lt;br /&gt;
The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The paper ends with a conclusion that does a good job of wrapping up the whole paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that although many of the examples are simplified in order to speed up the reader&#039;s understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.&lt;br /&gt;
&lt;br /&gt;
The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. LOOM is played up a great deal without much discussion of its possible problems; for example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].&lt;br /&gt;
&lt;br /&gt;
[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx] &lt;br /&gt;
&lt;br /&gt;
[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]&lt;br /&gt;
&lt;br /&gt;
[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]&lt;br /&gt;
&lt;br /&gt;
[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008&lt;br /&gt;
&lt;br /&gt;
[6] QUIESCE Function. IBM [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6001</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=6001"/>
		<updated>2010-12-01T23:31:43Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Contribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The paper uses several terms that the reader must be familiar with in order to follow Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked, and vice versa. The result is that each thread is waiting for the other to release its variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing threads and moving them out of a code section so that an execution filter can be safely installed on that section.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]]  Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
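&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true, or both happen, at the same time; a mutex lock applies this to threads by letting only one thread at a time execute a critical section.&lt;br /&gt;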
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is a special type of integer flag with two operations, down and up, which generalize sleep and wakeup. The down operation checks whether the value is greater than 0 and, if so, decrements the value, using up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done as a single indivisible atomic action: once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems.  [[#References | [5]]]&lt;br /&gt;
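&lt;br /&gt;
To make the Lock and Race Condition entries more concrete, here is a minimal Python sketch in which several threads update one shared variable inside a critical section guarded by a mutex. The names counter, counter_lock and add_many are hypothetical, and the sketch shows an ordinary hand-written lock from the standard threading module, not LOOM&#039;s execution filters.&lt;br /&gt;

```python
import threading

counter = 0                       # shared variable touched by every thread
counter_lock = threading.Lock()   # mutex guarding the shared variable

def add_many(n):
    """Hypothetical worker: increment the shared counter n times."""
    global counter
    for _ in range(n):
        # The with-block calls acquire() on entry and release() on exit,
        # so the read-modify-write below forms a critical section that
        # only one thread can execute at a time.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, two threads could read the same old value of counter and both write back old + 1, losing an update; that lost update is exactly the kind of race the definitions describe.&lt;br /&gt;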
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
With the rise of multicore systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Because current race detectors are still immature, this paper explains a new approach to race detection and workarounds through the use of LOOM.&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply fixes to live systems. However, relying on conventional patches can introduce new errors and can be unsafe because of a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, and under performance or work pressure developers often resort to quicker fixes rather than placing proper locks in the application.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can be used to &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not an efficient way to fix a race condition because it only delays the problem in an attempt to avoid conflict. Although it does provide a certain degree of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated to be better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed to let developers quickly deploy safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin injects the LOOM Update Engine into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The filter&#039;s code declaration is easy to understand and can be inserted in a code region that needs to be mutually exclusive. The developer does not need to deal with low level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application. These can include making two code regions mutually exclusive when accessing different objects; or, with a significant decrease in performance, making them run in single threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. After the evacuation is executed, the execution filter is installed during a live-update pause at a safe location, and the threads are then resumed. &lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
The evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests LOOM proved its worth. Overhead was measured by running the applications with LOOM during normal operation. The effects of LOOM on Apache and MySQL were minimal (~1.83% and ~4% respectively), making it viable as a runtime fix for race errors. In the scalability test, the team found that with 32 server threads the overhead was still low: under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system, as it fixed all of the race conditions studied. To demonstrate LOOM&#039;s availability, it was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on throughput. Lastly, the timeliness of installing LOOM&#039;s fixes was demonstrated in a simple example: the LOOM-based fix completed in 368 ms, whereas the function quiescence fix had not finished when the maximum test time (1 hour) expired.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, this can sometimes be considered acceptable. However, server applications often do not have the luxury of being able to reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific vulnerabilities. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often, when one race condition is corrected via a hot patch, others appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until the patch is developed and deployed the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct; all the while, the vulnerability was exposed to all Mozilla users. &lt;br /&gt;
&lt;br /&gt;
Flaws common to both the system update and hot patch approaches are that they are very difficult to properly develop, slow to implement, and result in potentially unsafe ad hoc solutions that are not scalable. Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of the paper are efficient at delivering the information surrounding their thesis, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the writing back up the reliability of the paper and let the reader trace and check its sources.&lt;br /&gt;
&lt;br /&gt;
The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The paper ends with a conclusion that does a good job of wrapping up the whole paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that although many of the examples are simplified in order to speed up the reader&#039;s understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.&lt;br /&gt;
&lt;br /&gt;
The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. LOOM is played up a great deal without much discussion of its possible problems; for example, clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].&lt;br /&gt;
&lt;br /&gt;
[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx] &lt;br /&gt;
&lt;br /&gt;
[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]&lt;br /&gt;
&lt;br /&gt;
[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]&lt;br /&gt;
&lt;br /&gt;
[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008&lt;br /&gt;
&lt;br /&gt;
[6] QUIESCE Function. IBM [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=5993</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=5993"/>
		<updated>2010-12-01T19:59:15Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Research problem */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time-consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and is highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
The paper uses several terms that the reader must be familiar with in order to follow Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked, and vice versa. The result is that each thread is waiting for the other to release its variable. Thus a deadlock occurs and neither thread can make progress.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing threads and moving them out of a code section so that an execution filter can be safely installed on that section.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]]  Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
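&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true, or both happen, at the same time; a mutex lock applies this to threads by letting only one thread at a time execute a critical section.&lt;br /&gt;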
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is a special type of integer flag with two operations, down and up, which generalize sleep and wakeup. The down operation checks whether the value is greater than 0 and, if so, decrements the value, using up one stored wakeup. If the value is 0, the process is put to sleep. These steps are all done as a single indivisible atomic action: once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems.  [[#References | [5]]]&lt;br /&gt;
&lt;br /&gt;
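The down and up operations can be sketched with the semaphore from the Python standard library. This is an illustrative example only, not code from the paper or the textbook; the function name max_concurrent and the sleep duration are assumptions made for the sketch:&lt;br /&gt;
&lt;pre&gt;
```python
import threading
import time


def max_concurrent(permits, num_threads):
    """Return the highest number of threads seen inside the guarded region."""
    sem = threading.Semaphore(permits)  # the counter starts at `permits`
    state_lock = threading.Lock()
    state = {"inside": 0, "peak": 0}

    def worker():
        sem.acquire()  # "down": decrement, or sleep while the value is 0
        try:
            with state_lock:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            time.sleep(0.01)  # simulated work inside the guarded region
            with state_lock:
                state["inside"] -= 1
        finally:
            sem.release()  # "up": increment and wake one sleeping thread

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    # Six threads compete for two permits.
    print(max_concurrent(2, 6))
```
&lt;/pre&gt;
The reported peak never exceeds the number of permits, because any thread that calls down while the value is 0 sleeps until another thread calls up.&lt;br /&gt;
&lt;br /&gt;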
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
With the rise of multicore systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Given the immaturity of current race detectors, this paper presents a new approach to working around races in deployed applications through the use of LOOM.&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, and because of performance or work pressure developers often resort to more efficient fixes rather than placing proper locks in the application.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can be used to &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not an efficient way of fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this does allow for a certain extent of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated as being better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed for quickly developing safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM update into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The code declaration used is easy to understand and can be inserted in any code region that needs to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application in conjunction with LOOM. These can include making two code regions mutually exclusive even when accessing different objects or, as an extreme measure, making them run in single-threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. Once the evacuation is complete, the application is paused at a safe location for the live update, the execution filter is installed, and the threads are resumed. &lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests LOOM proves its worth. Overhead was measured by running the applications under LOOM during normal operation. The effects of LOOM on Apache and MySQL are minimal (~1.83% and ~4% respectively), making it an attractive choice as a runtime fix for race errors. In the scalability tests, the team found that with 32 server threads the overhead was still under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system: it fixed all of the race condition errors that it was tested against. To assess availability, LOOM was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on the throughput. Lastly, the timeliness of LOOM was demonstrated using a simple example, in which a LOOM-based fix was installed in 368 ms whereas the function quiescence fix took the maximum test time (1 hour) and still did not finish.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, this can sometimes be considered acceptable. However, server applications often do not have the luxury of being able to reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific weaknesses. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until the patch is developed and deployed, the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct, all the while the vulnerability was exposed to all Mozilla users. &lt;br /&gt;
&lt;br /&gt;
The flaws common to both the system update and hot patch approaches are that they are very difficult to develop properly, are slow to implement, and can result in potentially unsafe ad hoc solutions that are not scalable. Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of the paper deliver the information surrounding their thesis effectively, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the writing back up the reliability of the paper and let the reader check the sources.&lt;br /&gt;
&lt;br /&gt;
The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The paper ends with a conclusion that wraps up the whole paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that although many of the examples are simplified to aid the reader&#039;s understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.&lt;br /&gt;
&lt;br /&gt;
The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. LOOM is played up considerably without much discussion of the possible problems with it, such as the possibility that clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].&lt;br /&gt;
&lt;br /&gt;
[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx] &lt;br /&gt;
&lt;br /&gt;
[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]&lt;br /&gt;
&lt;br /&gt;
[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]&lt;br /&gt;
&lt;br /&gt;
[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008&lt;br /&gt;
&lt;br /&gt;
[6] QUIESCE Function. IBM [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=5987</id>
		<title>COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_2_2010_Question_5&amp;diff=5987"/>
		<updated>2010-12-01T19:51:22Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Background Concepts */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Paper==&lt;br /&gt;
&#039;&#039;&#039;Title:&#039;&#039;&#039; [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Authors:&#039;&#039;&#039; Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Affiliations:&#039;&#039;&#039; Computer Science Department, Columbia University&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Supplementary Information:&#039;&#039;&#039; Video available [http://homeostasis.scs.carleton.ca/osdi/video/wu.mp4 here] as well as [http://homeostasis.scs.carleton.ca/osdi/slides/wu.pdf slides]&lt;br /&gt;
&lt;br /&gt;
==Background Concepts==&lt;br /&gt;
A race condition is a system flaw that &amp;quot;occurs when two threads access a shared variable at the same time.&amp;quot; Race conditions can be very complex, time consuming and expensive to fix. Unfortunately, the most challenging part of a race condition is not fixing it, but rather finding it. Race conditions are notorious for being extremely difficult to find, isolate and recreate. To help ease this process, the authors of this paper, Jingyue Wu, Heming Cui and Junfeng Yang, propose the adoption of LOOM.&lt;br /&gt;
&lt;br /&gt;
LOOM is a system which dynamically locates and corrects areas which may be susceptible to race condition errors. The power of LOOM rests in its ability to operate on live applications in real time. This is possible thanks to its evacuation algorithm, which injects execution filters to fix race conditions at runtime. Execution filters, otherwise known as request filtering, allow you to inspect the request before and after the main logic is executed. By leveraging execution filters as the means for correcting race conditions, LOOM is able to operate with very little performance overhead and remains highly scalable as the number of application threads increases.&lt;br /&gt;
&lt;br /&gt;
The authors tested LOOM on existing real world race conditions found in common applications. The tests found that all tested race conditions were solved, with little performance overhead, in a scalable and easy to implement manner. &lt;br /&gt;
&lt;br /&gt;
This paper uses several terms that the reader should be familiar with before reading Bypassing Races in Live Applications with Execution Filters. These terms are listed and explained below:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Deadlock:&#039;&#039;&#039; Deadlocks usually occur within the context of two threads. One thread tries to lock a variable that the other thread has already locked and vice versa. The result of this is that each thread is waiting for each other&#039;s thread to release the variable. Thus a deadlock occurs and nothing can happen.&lt;br /&gt;
&lt;br /&gt;
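One common way to avoid the two-thread deadlock described above is to always acquire locks in the same global order. The following is a hedged sketch in Python, not something taken from the paper; the Account class and the transfer and run_transfers functions are hypothetical names, and transfer assumes its two accounts are distinct:&lt;br /&gt;
&lt;pre&gt;
```python
import threading


class Account:
    """A balance protected by its own lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Take both locks in one fixed order (here: by id()), so two opposite
    # transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_transfers(rounds):
    """Run opposite transfers in two threads and return the final balances."""
    a, b = Account(100), Account(100)

    def move(src, dst):
        for _ in range(rounds):
            transfer(src, dst, 1)

    t1 = threading.Thread(target=move, args=(a, b))
    t2 = threading.Thread(target=move, args=(b, a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return (a.balance, b.balance)


if __name__ == "__main__":
    print(run_transfers(500))
```
&lt;/pre&gt;
If each thread instead locked its own source account first, one thread could hold the lock on the first account while the other held the lock on the second, and neither could proceed.&lt;br /&gt;
&lt;br /&gt;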
&#039;&#039;&#039;Evacuation:&#039;&#039;&#039; The process of proactively pausing and changing states of code sections so that those sections can be filtered for proper processing.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Execution Filters:&#039;&#039;&#039; Otherwise known as request filtering. Request filters allow you to inspect the request before and after the main logic is executed. These are mutual exclusion filters in the context of this paper.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Function Quiescence:&#039;&#039;&#039; The process of pausing and altering states, in order to avoid race conditions and overlapping between code segments.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hot Patches:&#039;&#039;&#039; &amp;quot;Hot patching provides a mechanism to update system files without rebooting or stopping services and processes.&amp;quot;[[#References | [1]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Hybrid Instrumentation Engine:&#039;&#039;&#039; &amp;quot;Instrumentation refers to an ability to monitor or measure the level of a product&#039;s performance, to diagnose errors and writing trace information.&amp;quot; [[#References | [2]]]  Statically instrumented programs can have low runtime overhead, but the instrumentation has to be done at compile time. Dynamic instrumentation can update programs at runtime but incurs high overhead. A hybrid instrumentation engine combines static and dynamic instrumentation.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Lock:&#039;&#039;&#039; A lock is a way of limiting access to a common resource when using multiple threads. Lock and unlock methods are usually called at the beginning and end of a target method, respectively. &amp;quot;Mutual exclusion locks (mutexes) are a common method of serializing thread execution. Mutual exclusion locks synchronize threads, usually by ensuring that only one thread at a time executes a critical section of code. Mutex locks can also preserve single-threaded code.&amp;quot; [[#References | [3]]]&lt;br /&gt;
&lt;br /&gt;
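&#039;&#039;&#039;Mutex:&#039;&#039;&#039; Short for mutual exclusion. Two things are mutually exclusive when they cannot both be true at the same time; in threading, a mutex is a lock that allows only one thread at a time into a critical section.&lt;br /&gt;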
&lt;br /&gt;
&#039;&#039;&#039;Race Condition:&#039;&#039;&#039; &amp;quot;A race condition occurs when two threads access a shared variable at the same time.&amp;quot; [[#References | [4]]]&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Semaphore:&#039;&#039;&#039; A semaphore is an integer variable that counts stored wakeups, with two operations, down and up, that generalize sleep and wakeup. The down operation checks whether the value is greater than 0 and, if so, decrements the value, using up one stored wakeup. If the value is 0, the process is put to sleep. The up operation increments the value or, if processes are sleeping on the semaphore, wakes one of them. Each operation is done as a single indivisible atomic action: it is guaranteed that once a semaphore operation has started, no other process can access the semaphore until the operation has completed or blocked. Semaphores are an essential part of solving synchronization problems.  [[#References | [5]]]&lt;br /&gt;
&lt;br /&gt;
==Research problem==&lt;br /&gt;
===Problem being addressed=== &lt;br /&gt;
With the rise of multicore systems, multithreaded programs have become common, and they are often prone to race conditions. Races are hard to detect, test and debug. Given the immaturity of current race detectors, this paper presents a new approach to working around races in deployed applications through the use of LOOM.&lt;br /&gt;
===Related work===&lt;br /&gt;
Two common solutions to fixing deployed races are software updates and hot patches. Software updates require restarts, whereas hot patches apply patches to live systems. However, relying on conventional patches can lead to new errors and could be unsafe, due to a multithreaded application&#039;s complexity. Releasing a reliable patch takes time, and because of performance or work pressure developers often resort to more efficient fixes rather than placing proper locks in the application.&lt;br /&gt;
&lt;br /&gt;
A QUIESCE function can be used to &amp;quot;temporarily suspend...incoming messages on an IUCV path&amp;quot;[[#References | [6]]]; these paths can later be reactivated and run as normal. This is not an efficient way of fixing a race condition because it only delays the problem in an attempt to avoid conflict. Although this does allow for a certain extent of safety, it does not come near the reliability and flexibility of LOOM. Speed, reliability, flexibility and ease of use are all areas in which LOOM is demonstrated as being better than a QUIESCE function.&lt;br /&gt;
&lt;br /&gt;
==Contribution==&lt;br /&gt;
===Current solution expressed===&lt;br /&gt;
Compared to traditional solutions, LOOM differs in its approach to race fixes. It is designed for quickly developing safe, optimized, temporary workarounds while a concrete solution is developed. LOOM is also very easy to use. LOOM is compiled with a developer&#039;s application as a plugin and kept separate from the source code. The plugin will inject the LOOM update into the application binary. &lt;br /&gt;
&lt;br /&gt;
Mutual exclusion filters are written by the developer and synced with the source code to filter out any racy threads. The code declaration used is easy to understand and can be inserted in any code region that needs to be mutually exclusive. The developer does not need to deal with low-level operations such as lock, unlock and semaphore operations. Users can then download the filter and apply it to the application while it is still live. &lt;br /&gt;
&lt;br /&gt;
LOOM is flexible in that developers can make trade-offs between performance and reliability in their application in conjunction with LOOM. These can include making two code regions mutually exclusive even when accessing different objects or, as an extreme measure, making them run in single-threaded mode. &lt;br /&gt;
&lt;br /&gt;
An evacuation algorithm is used for safety, so as not to introduce new errors. A critical region is marked using static analysis. All threads in the critical region are then evacuated. Once the evacuation is complete, the application is paused at a safe location for the live update, the execution filter is installed, and the threads are resumed. &lt;br /&gt;
&lt;br /&gt;
LOOM&#039;s hybrid instrumentation engine is used to reduce its overhead. The engine statically changes an application&#039;s binary to anticipate dynamic updates.&lt;br /&gt;
&lt;br /&gt;
Evaluation of LOOM was based on overhead, scalability, reliability, availability and timeliness. These were demonstrated using Apache and MySQL in conjunction with the multithreaded ApacheBench and SysBench, respectively.&lt;br /&gt;
&lt;br /&gt;
Through multiple tests LOOM proves its worth. Overhead was measured by running the applications under LOOM during normal operation. The effects of LOOM on Apache and MySQL are minimal (~1.83% and ~4% respectively), making it an attractive choice as a runtime fix for race errors. In the scalability tests, the team found that with 32 server threads the overhead was still under 3% and 12% respectively. Reliability is one of the strongest facets of the LOOM system: it fixed all of the race condition errors that it was tested against. To assess availability, LOOM was compared against a conventional restart-based software update. In this test the software update was clearly slower, requiring time to restart, whereas LOOM running a live update had almost no effect on the throughput. Lastly, the timeliness of LOOM was demonstrated using a simple example, in which a LOOM-based fix was installed in 368 ms whereas the function quiescence fix took the maximum test time (1 hour) and still did not finish.&lt;br /&gt;
&lt;br /&gt;
====Why is it any better than what came before?====&lt;br /&gt;
Previously, the two standard ways of fixing deployed race conditions were system updates and hot patches. LOOM is a superior choice to both these options for a number of reasons.&lt;br /&gt;
&lt;br /&gt;
Unlike LOOM, the system update approach requires that the system be rebooted before the fix can be implemented. With desktop applications, this can sometimes be considered acceptable. However, server applications often do not have the luxury of being able to reboot because requests are coming from external sources and are expected to be processed.&lt;br /&gt;
&lt;br /&gt;
While hot patches do not require a system reboot, they do have their own specific weaknesses. Namely, it is very difficult to apply a patch that corrects the error, or errors, but leaves the rest of the system unaffected. Often when correcting a race condition via a hot patch, others can appear. The main concern with hot patches, however, is that their development is a time-consuming process, and until the patch is developed and deployed, the race condition remains exposed. The paper chronicles a real-world Mozilla race condition whose hot patch took nearly 8 years of development to correct, all the while the vulnerability was exposed to all Mozilla users. &lt;br /&gt;
&lt;br /&gt;
The flaws common to both the system update and hot patch approaches are that they are very difficult to develop properly, are slow to implement, and can result in potentially unsafe ad hoc solutions that are not scalable. Conversely, LOOM is easy to use, fast to implement, highly flexible, scalable, and safe to use.&lt;br /&gt;
&lt;br /&gt;
==Critique==&lt;br /&gt;
===Good===&lt;br /&gt;
The authors of the paper deliver the information surrounding their thesis effectively, both by staying focused on the main thesis and by backing up their points with relevant examples and data. This helps to keep the thesis paramount throughout the paper. Examples throughout the paper, particularly the MySQL example, ensure that the use of execution filters is clear to the reader. All of the examples are well documented and some (e.g. Figure 2) are simplified so as not to confuse the reader with unnecessary information. References throughout the writing back up the reliability of the paper and let the reader check the sources.&lt;br /&gt;
&lt;br /&gt;
The whole paper flows well and the information is delivered in a well-organized order, allowing the reader to learn enough about LOOM (or any of the sub-topics involved in the explanation) before moving on to the next related subject. The paper ends with a conclusion that wraps up the whole paper in a clear and concise manner.&lt;br /&gt;
&lt;br /&gt;
===Not-So-Good===&lt;br /&gt;
One of the problems with this paper is that although many of the examples are simplified to aid the reader&#039;s understanding, some are a little oversimplified. For example, Figure 9 is a graphic that attempts to represent the evacuation process in a visual manner. Unfortunately, this ends up making the problem seem almost trivial and does little more than water down the information.&lt;br /&gt;
&lt;br /&gt;
The writers are also a little one-sided (with understandable reason) on the topic. Although they do admit the limitations of LOOM, they do not spend much time discussing them. LOOM is played up considerably without much discussion of the possible problems with it, such as the possibility that clients running LOOM may decide not to fix the race conditions and instead let the program continue to run with LOOM as a permanent fix. This may cause further errors over the long-term life of the program.&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
[1] Introduction to Hotpatching. [http://technet.microsoft.com/en-us/library/cc781109%28WS.10%29.aspx http://technet.microsoft.com/en-us/library/cc781109(WS.10).aspx].&lt;br /&gt;
&lt;br /&gt;
[2] Introduction to Instrumentation and Tracing. [http://msdn.microsoft.com/en-us/library/aa983649%28VS.71%29.aspx http://msdn.microsoft.com/en-us/library/aa983649(VS.71).aspx] &lt;br /&gt;
&lt;br /&gt;
[3] A. D. Marshall. Further Threads Programming:Synchronization. Cardiff University, 1999 [http://www.cs.cf.ac.uk/Dave/C/node31.html#SECTION003110000000000000000 HTML]&lt;br /&gt;
&lt;br /&gt;
[4] Description of race conditions and deadlocks. [http://support.microsoft.com/kb/317723 http://support.microsoft.com/kb/317723]&lt;br /&gt;
&lt;br /&gt;
[5] A. S. Tanenbaum. Modern Operating Systems (3rd Edition), page 128, 2008&lt;br /&gt;
&lt;br /&gt;
[6] QUIESCE Function. IBM [http://publib.boulder.ibm.com/infocenter/zvm/v5r3/index.jsp?topic=/com.ibm.zvm.v53.hcpb4/hcse5b21270.htm]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=5026</id>
		<title>Talk:COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=5026"/>
		<updated>2010-11-15T19:13:25Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Maybe we can all add our names below so we know who&#039;s still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Group members:&lt;br /&gt;
&lt;br /&gt;
* Michael Yagi&lt;br /&gt;
* Nicolas Lessard&lt;br /&gt;
* Julie Powers&lt;br /&gt;
* Derek Langlois&lt;br /&gt;
&lt;br /&gt;
Jeffrey Francom contacted me earlier so I know he is also still in the course. &amp;lt;strike&amp;gt;Now we are only waiting on Dustin Martin.&amp;lt;/strike&amp;gt; Everyone has been accounted for. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Essay==&lt;br /&gt;
* Paper&lt;br /&gt;
** Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
** Authors: Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
** Affiliations: Computer Science Department, Columbia University&lt;br /&gt;
** Supplementary Information:&lt;br /&gt;
* Background Concepts&lt;br /&gt;
* Research problem&lt;br /&gt;
* Contribution&lt;br /&gt;
* Critique&lt;br /&gt;
* References&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=4984</id>
		<title>Talk:COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=4984"/>
		<updated>2010-11-15T18:07:59Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Maybe we can all add our names below so we know who&#039;s still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Group members:&lt;br /&gt;
&lt;br /&gt;
* Michael Yagi&lt;br /&gt;
* Nicolas Lessard&lt;br /&gt;
* Julie Powers&lt;br /&gt;
* Derek Langlois&lt;br /&gt;
&lt;br /&gt;
Jeffrey Francom contacted me earlier so I know he is also still in the course. Now we are only waiting on Dustin Martin. [[User:J powers|J powers]] 18:07, 15 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
==Essay==&lt;br /&gt;
* Paper&lt;br /&gt;
** Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
** Authors: Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
** Affiliations: Computer Science Department, Columbia University&lt;br /&gt;
** Supplementary Information:&lt;br /&gt;
* Background Concepts&lt;br /&gt;
* Research problem&lt;br /&gt;
* Contribution&lt;br /&gt;
* Critique&lt;br /&gt;
* References&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=4970</id>
		<title>Talk:COMP 3000 Essay 2 2010 Question 5</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_2_2010_Question_5&amp;diff=4970"/>
		<updated>2010-11-15T12:18:54Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Maybe we can all add our names below so we know who&#039;s still in this course? --[[User:Myagi|Myagi]] 12:38, 14 November 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Group members:&lt;br /&gt;
&lt;br /&gt;
* Michael Yagi&lt;br /&gt;
* Nicolas Lessard&lt;br /&gt;
* Julie Powers&lt;br /&gt;
&lt;br /&gt;
==Essay==&lt;br /&gt;
* Paper&lt;br /&gt;
** Title: [http://www.usenix.org/events/osdi10/tech/full_papers/Wu.pdf Bypassing Races in Live Applications with Execution Filters]&lt;br /&gt;
** Authors: Jingyue Wu, Heming Cui, Junfeng Yang&lt;br /&gt;
** Affiliations: Computer Science Department, Columbia University&lt;br /&gt;
** Supplementary Information:&lt;br /&gt;
* Background Concepts&lt;br /&gt;
* Research problem&lt;br /&gt;
* Contribution&lt;br /&gt;
* Critique&lt;br /&gt;
* References&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4266</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4266"/>
		<updated>2010-10-15T00:18:34Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. Look at pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; we should check with the prof. Anyway, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992 that examines the excessive cost and resources that older versions of Unix spent on implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
A couple of notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave massive overdoses that killed several patients: http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare, unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well.&lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system-based type of race condition involves an older version of the Unix operating system, in which user mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This may actually be a bit more approachable, as far as understanding the Unix kernel and stuff like that goes; I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is the first example of a race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible.&lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online if you prefer, since I know each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also, is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also, I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use it (less to type as well).&lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day; wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have come with a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it reflects our own understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the Blue Screen of Death. I found this book on how the blue screen problem occurs: http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
On page 54, it describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like I mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah, I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done; we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys, please add the resources that were used; we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Alright, so I edited the whole thing and expanded the Mars-Rover case. I edited the introduction and the conclusion. It didn&#039;t make a lot of sense to me to talk about mutual exclusion while none of the cases mention anything about that, so I just removed it. It&#039;s good to see that there&#039;s almost a theme going through each one of those cases, and we made it clear through our writing, whether it&#039;s the error pattern, the consequences, what led to those exceptions in the first place, who to blame, etc.&lt;br /&gt;
So that&#039;s good, I feel that our essay is quite readable and easy to follow.&lt;br /&gt;
&lt;br /&gt;
cha0s, if you&#039;re reading this, please try to provide resources for the Blackout case. I know there&#039;s one resource floating around the net that has a report and everything, but I don&#039;t wanna add it myself, because I&#039;m not sure whether it&#039;s the one you were using. Please, if you used anything, include it in the resources. I will do one final edit late at midnight, maybe organize it a bit and link the resources to the article sections. If anyone wants to do an edit, please just organize and do a grammar-vocab check; don&#039;t modify the content of the cases.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 23:56, 14 October 2010 (UTC)&lt;br /&gt;
----&lt;br /&gt;
I attempted to write a more complete introduction and it looks like everything is starting to come together. Good work everyone!&lt;br /&gt;
[[User:J powers|J powers]] 00:16, 15 October 2010 (UTC)&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language (C/C++, Assembly)&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
*Race conditions are a common problem&lt;br /&gt;
*Software Design&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4260</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4260"/>
		<updated>2010-10-15T00:16:07Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. For instance, if process A is executing the code that uses a particular shared data structure (that code is called a critical section), then no other process, such as B, is allowed to execute its own critical section on that same data structure until process A leaves its critical section. Common techniques used to implement mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned there counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyway, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992 that examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal: http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare, unexpected error due to a race condition fault. For some reason, this seems to be a fairly common problem for those Mars rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well.&lt;br /&gt;
&lt;br /&gt;
Heres an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, its good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, thats ok by me. We can meet up in the Herzberg labs in the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or if you just want to continue doing this online, I know that each one of us has probably a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome looks like we have a lot of information and resources here to work from. Daniels template structure looks good and we should follow that. We should come up with a plan for executing this, what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples lets try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that everyone has not been using 4 tildes. I am not sure if this how the professor knows who wrote what but it would not hurt to use it (Less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but its a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate the ability we have to work together more so than us doing a separate paragraph each.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think theres any harm in doing that, I&#039;ve found that this was a fairly common problem in some versions of Windows leading to a handful of system failures in airports, electronic hoardings, it even happened at the Beijing Summer Olympics of 2008 ! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess ill do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it conveys the writer&#039;s own understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the blue screen of death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how blue screens affect people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyway, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like I mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Heres some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race conditions errors ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter) ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Alright, so I edited the whole thing and expanded the Mars-Rover case. I edited the introduction and the conclusion. It didn&#039;t make a lot of sense to me to talk about mutual exclusion while none of the cases mention anything about it, so I just removed it. It&#039;s good to see that there&#039;s almost a theme going through each one of those cases, and we made it clear through our writing, whether it&#039;s the error pattern, the consequences, what led to those exceptions in the first place, who is to blame, etc.&lt;br /&gt;
So that&#039;s good, I feel that our essay is quite readable and easy to follow. &lt;br /&gt;
&lt;br /&gt;
cha0s, if you&#039;re reading this, please try to provide resources for the Blackout case. I know there&#039;s one resource floating around the net that has a report and everything, but I don&#039;t wanna add it myself, because I&#039;m not sure whether it&#039;s the one you were using. If you used anything, please include it in the resources. I will do one final edit late at midnight, maybe organize it a bit and link the resources to the article sections. If anyone wants to do an edit, please just organize and do a grammar-vocab check; don&#039;t modify the content of the cases.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 23:56, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language (C/C++, Assembly)&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
*Race conditions are a common problem&lt;br /&gt;
*Software Design&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I attempted to write a more complete introduction and it looks like everything is starting to come together. Good work everyone!&lt;br /&gt;
[[User:J powers|J powers]] 00:16, 15 October 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4240</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4240"/>
		<updated>2010-10-15T00:08:54Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes receive write access to shared data simultaneously. The end result then depends on the exact timing of those processes and can be unpredictable. Consequently, a major system failure can occur. &lt;br /&gt;
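&lt;br /&gt;
The lost update that this definition describes can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not code from any of the systems discussed: the function name run_interleaving and the processes A and B are hypothetical, and the interleaving is replayed from a fixed schedule so that the result is repeatable.&lt;br /&gt;
&lt;br /&gt;
```python
def run_interleaving(schedule):
    """Replay a fixed schedule of steps taken by two simulated processes.

    Each step is (process_name, action). Every process increments the shared
    counter in two separate steps: "read", then "write".
    """
    shared = {"counter": 0}
    local = {}  # private copy each process holds between its two steps
    for process, action in schedule:
        if action == "read":
            local[process] = shared["counter"]
        else:  # "write": store the private copy plus one
            shared["counter"] = local[process] + 1
    return shared["counter"]


# One process finishes before the other starts: both increments survive.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
# Both processes read before either writes: one increment is lost.
racy = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run_interleaving(serial))  # 2
print(run_interleaving(racy))    # 1
```
&lt;br /&gt;
With the racy schedule, both processes read the counter before either writes it back, so one of the two increments is overwritten.&lt;br /&gt;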
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions are notorious in the history of software bugs. Examples range from a section of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. The system failures examined here share common patterns and were caused by inadequate management of shared memory. &lt;br /&gt;
&lt;br /&gt;
During development of these systems, programmers often do not realize that their designs contain a race condition until it occurs. Such failures are unexpected and infrequent, and the specific conditions that trigger them are difficult to duplicate. As a result, finding the origin of a failure may take anywhere from weeks to years, depending on the complexity of the system. A lack of testing before deployment may also be responsible. &lt;br /&gt;
&lt;br /&gt;
Race conditions occasionally recur in the same software. This can happen when the race condition is mistaken for another problem, or when a system contains multiple race conditions. Programming languages in which memory management is an important aspect of development, such as Assembly and C/C++, are also common to all of the systems. &lt;br /&gt;
   &lt;br /&gt;
In this article, we will examine the best-known cases involving race conditions. For each case we will explain why the race condition occurred, its significance and the aftermath of the failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a medical linear accelerator developed in Canada by Atomic Energy of Canada Limited (AECL) and used to treat patients with radiation therapy. Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of them died as a result. The case is quite possibly the most infamous software failure involving race conditions: the overdoses were traced back to a programming bug that caused a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of code were reused from software in the previous Therac-6 and Therac-20 machines. &lt;br /&gt;
The main portion of the code ran a task called “Treat”, which determined which of the program&#039;s 8 main subroutines should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 software made use of 8 main subroutines. Datent had its own helper routine, Magnet, which prepared the machine&#039;s magnets to administer the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable, a flag that signaled whether the operator had finished entering the necessary data. Once the flag was set, Datent allowed the main program to move on to the next subroutine. If the flag was not set, the “Treat” task rescheduled itself, which in turn rescheduled the Datent subroutine. This continued until the shared data entry flag was set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. It was set up so that, before signaling “Treat” to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine parsed the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The “Magnet” subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before “Magnet” returned, the changes would not be registered: the beam strength would already be set to its prior value, ignoring the operator&#039;s changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves cursor up and fixes the error and presses return again.&lt;br /&gt;
#Magnets are set to the previous power level; the subroutine returns &lt;br /&gt;
#Program moves on to next subroutine without registering changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
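&lt;br /&gt;
The chain of events above can be sketched as a toy Python model. This is not the Therac-25 code, which was written in PDP-11 assembly; the function simulate_datent, its parameters and the numbers 250 and 25 are hypothetical and chosen only to mirror the extra-zero example. The recheck option shows one possible guard and is an assumption, not the fix AECL actually applied.&lt;br /&gt;
&lt;br /&gt;
```python
def simulate_datent(first_entry, edits_during_setup, magnet_ticks=8, recheck=False):
    """Toy model of data entry racing with magnet set-up.

    first_entry: value the operator typed before pressing return.
    edits_during_setup: maps a tick number to a corrected value typed at that tick.
    Returns (value_applied_to_machine, value_shown_on_screen).
    """
    screen = first_entry  # shared variable written by the keyboard handler
    pending = dict(edits_during_setup)
    while True:
        applied = screen  # the set-up routine copies the value only once
        for tick in range(magnet_ticks):
            if tick in pending:  # the keyboard handler keeps running meanwhile
                screen = pending.pop(tick)
        if not recheck or applied == screen:
            return applied, screen
        # recheck=True: an edit arrived during set-up, so set up again


# The operator corrects 250 to 25 while the magnets are still moving.
print(simulate_datent(250, {3: 25}))                # (250, 25)
print(simulate_datent(250, {3: 25}, recheck=True))  # (25, 25)
```
&lt;br /&gt;
Without the recheck, the machine is configured with the old value while the screen shows the corrected one, which is the mismatch the list describes.&lt;br /&gt;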
&lt;br /&gt;
&lt;br /&gt;
===Root Causes &amp;amp; Outcomes===&lt;br /&gt;
&lt;br /&gt;
A number of factors contributed to the failure of the Therac-25. The code was put together by a single programmer and no proper testing was conducted. In addition, code was reused from previous-generation machines without verifying that it was fully compatible with the new hardware. The earlier Therac-6 and Therac-20 had hardware interlocks that kept software faults of this kind from causing harm. It is clear that proper planning and forethought could have prevented these incidents.&lt;br /&gt;
&lt;br /&gt;
Six incidents involving the Therac-25 took place between 1985 and 1987. It took 2 years until the FDA took the machines out of service. The FDA forced AECL to make modifications to the Therac-25 before it was allowed back on the market. The software was fixed to suspend all other operations while the magnets positioned themselves to administer the correct radiation strength. In addition, a dead man&#039;s switch was added: a foot pedal that the operator must hold down to enable motion of the machine. This kept the operator from being unaware of changes in the machine&#039;s state.&lt;br /&gt;
&lt;br /&gt;
After these changes were made the Therac-25 was reintroduced into the market in 1988. Some of the machines are still in service today. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
At the time, FirstEnergy was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. It crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect that led to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped simultaneously. These three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server went online to attempt to handle the requests. By the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
The system failure ultimately led to 256 plants going offline and a massive black-out across the northeastern United States and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and other data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. The Rovers were controlled on a daily basis by the NASA team on Earth, which sent them messages and tasks. Each solar day in the life of a Rover is called a Sol. &lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly. &lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, where all processes share and access resources on the single CPU. The Rover records progress through the use of three primary log-file systems: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (a critical section). While it was requesting permission, another process was granted access to that very same piece of memory. This resulted in the IM process generating a fatal exception through its EVR log. The exception led to lost data and trouble transmitting to the NASA team on Earth, which eventually left the Rover in a halted state for a few days. In an effort to keep the Rover functioning, the NASA team attempted to avoid the problem by restricting another module from operating during that time-frame, allowing enough time for the IM process to carry out its task. However, the NASA team was aware that the bug could resurface, and it did: later on Spirit on Sol 209, and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
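&lt;br /&gt;
The idea of a counter that may only be written inside a critical section can be sketched in Python with a lock. This is an illustration of mutual exclusion in general, not the Rover flight software, which ran on VxWorks and was written mostly in C; the class InitCounter and the function run are hypothetical names.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


class InitCounter:
    """Hypothetical counter whose updates are guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self._value += 1

    def value(self):
        with self._lock:
            return self._value


def run(workers=4, increments=1000):
    """Start several threads that all increment the same counter."""
    counter = InitCounter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value()


print(run())  # 4000
```
&lt;br /&gt;
Because every increment happens while the lock is held, no update can be interleaved with another, and the final count always equals workers times increments.&lt;br /&gt;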
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested that data be transmitted from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover. &lt;br /&gt;
&lt;br /&gt;
===Aftermath and current status===&lt;br /&gt;
While those race condition errors were clearly due to a lack of memory management and proper co-ordination among processes, they were largely unexpected and unforeseen. In contrast to the other cases mentioned so far, the consequences that the NASA team had to deal with weren&#039;t life-threatening, so their main concern seems to have been keeping the Rovers functioning in order to obtain as much information as possible. No effort was even made to alter the software. Also, one could imagine that the task of examining and debugging those errors was quite a challenge, since the team couldn&#039;t deal with the Rovers physically; everything was done via transmission and messages. Another thing to note is that the single CPU used in those Rovers had a lot to deal with besides the usual software implementation. Had NASA considered the possibility of implementing a multiple-CPU design, it could have made a difference.&lt;br /&gt;
&lt;br /&gt;
The Spirit Rover has experienced a number of problems since then. The most recent reports revealed that the Rover has been largely inactive, with no data being received. The Opportunity Rover, on the other hand, continues to function successfully.&lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The error 0x0000001A, MEMORY_MANAGEMENT, occurs because of a race condition in memory management. It is a hardware error related to memory management: possibly, the computer cannot get enough power to the memory in time for the process. &lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also caused system failures in airports, ATMs and street hoardings. However, the most notable public incident happened during the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed because of a BSOD bug.  &lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The main challenge with race condition errors is that they&#039;re usually unpredictable and can be triggered in various ways depending on the processes involved, the implementation of the software, the hardware design and the surrounding environment. However, the human element plays a huge part here as well, in applying the required amount of testing and anticipating the situations where an error might occur.&lt;br /&gt;
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race condition errors. More recently, a US software company by the name of ReplaySolutions was awarded a patent by the US government for developing an innovative kit for debugging race conditions found in software.  &lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient levels of performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world. &lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. July 1993. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25] &lt;br /&gt;
* Nancy Leveson and Clark Turner. July 1993. [http://www.stanford.edu/class/cs240/readings/therac-25.pdf An Investigation of the Therac-25 Accidents]  &lt;br /&gt;
* Anne Marie Porrello. July 1993. [http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/THERAC25.html Death and Denial: The Failure of the THERAC-25, A Medical Linear Accelerator]  &lt;br /&gt;
* Reeves and Snyder. 10 January 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* Update: Spirit and Opportunity [http://marsrover.nasa.gov/mission/status.html]&lt;br /&gt;
* John Chan. 12 August 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] &lt;br /&gt;
* Dr. Dobb&#039;s Journal. 9 June 2010. Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4237</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4237"/>
		<updated>2010-10-15T00:07:30Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions are notorious in the history of software bugs. Examples range from a section of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. The system failures examined here share common patterns and were caused by inadequate management of shared memory. &lt;br /&gt;
&lt;br /&gt;
During development of these systems, programmers often do not realize that their designs contain a race condition until it occurs. Such failures are unexpected and infrequent, and the specific conditions that trigger them are difficult to duplicate. As a result, finding the origin of a failure may take anywhere from weeks to years, depending on the complexity of the system. A lack of testing before deployment may also be responsible. &lt;br /&gt;
&lt;br /&gt;
Race conditions occasionally recur in the same software. This can happen when the race condition is mistaken for another problem, or when a system contains multiple race conditions. Programming languages in which memory management is an important aspect of development, such as Assembly and C/C++, are also common to all of the systems. &lt;br /&gt;
   &lt;br /&gt;
In this article, we will examine the best-known cases involving race conditions. For each case we will explain why the race condition occurred, its significance and the aftermath of the failure.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes receive write access to shared data simultaneously. The end result then depends on the exact timing of those processes and can be unpredictable. Consequently, a major system failure can occur. &lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a medical linear accelerator developed in Canada by Atomic Energy of Canada Limited (AECL) and used to treat patients with radiation therapy. Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of them died as a result. The case is quite possibly the most infamous software failure involving race conditions: the overdoses were traced back to a programming bug that caused a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of code were reused from software in the previous Therac-6 and Therac-20 machines. &lt;br /&gt;
The main portion of the code ran a task called “Treat”, which determined which of the program&#039;s 8 main subroutines should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 software made use of 8 main subroutines. Datent had its own helper routine, Magnet, which prepared the machine&#039;s magnets to administer the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable, a flag that signaled whether the operator had finished entering the necessary data. Once the flag was set, Datent allowed the main program to move on to the next subroutine. If the flag was not set, the “Treat” task rescheduled itself, which in turn rescheduled the Datent subroutine. This continued until the shared data entry flag was set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. It was set up so that, before signaling “Treat” to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine parsed the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The “Magnet” subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before “Magnet” returned, the changes would not be registered: the beam strength would already be set to its prior value, ignoring the operator&#039;s changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves cursor up and fixes the error and presses return again.&lt;br /&gt;
#Magnets are set to the previous power level; the subroutine returns &lt;br /&gt;
#Program moves on to next subroutine without registering changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Root Causes &amp;amp; Outcomes===&lt;br /&gt;
&lt;br /&gt;
A number of factors contributed to the failure of the Therac-25. The code was put together by a single programmer and no proper testing was conducted. In addition, code was reused from previous-generation machines without verifying that it was fully compatible with the new hardware. The earlier Therac-6 and Therac-20 had hardware interlocks that kept software faults of this kind from causing harm. It is clear that proper planning and forethought could have prevented these incidents.&lt;br /&gt;
&lt;br /&gt;
Six incidents involving the Therac-25 took place between 1985 and 1987. It took 2 years until the FDA took the machines out of service. The FDA forced AECL to make modifications to the Therac-25 before it was allowed back on the market. The software was fixed to suspend all other operations while the magnets positioned themselves to administer the correct radiation strength. In addition, a dead man&#039;s switch was added: a foot pedal that the operator must hold down to enable motion of the machine. This kept the operator from being unaware of changes in the machine&#039;s state.&lt;br /&gt;
&lt;br /&gt;
After these changes were made the Therac-25 was reintroduced into the market in 1988. Some of the machines are still in service today. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
At the time, FirstEnergy was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. It crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect that led to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped simultaneously. These three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server went online to attempt to handle the requests. By the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
The system failure ultimately led to 256 plants going offline and a massive black-out across the northeastern United States and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and other data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. The Rovers were controlled on a daily basis by the NASA team on Earth, which sent them messages and tasks. Each solar day in the life of a Rover is called a Sol. &lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly. &lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, where all processes share and access resources on the single CPU. The Rover records progress through the use of three primary log-file systems: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (a critical section). While it was requesting permission, another process was granted access to that very same piece of memory. This resulted in the IM process generating a fatal exception through its EVR log. The exception led to lost data and trouble transmitting to the NASA team on Earth, which eventually left the Rover in a halted state for a few days. In an effort to keep the Rover functioning, the NASA team attempted to avoid the problem by restricting another module from operating during that time-frame, allowing enough time for the IM process to carry out its task. However, the NASA team was aware that the bug could resurface, and it did: later on Spirit on Sol 209, and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested that data be transmitted from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover. &lt;br /&gt;
&lt;br /&gt;
===Aftermath and current status===&lt;br /&gt;
While those race conditions errors were clearly due to a lack of memory management and proper co-ordination among processes, they were largely unexpected and unforeseen. In contrast to the other cases mentioned so far, the consequences that the NASA team had to deal with weren&#039;t life threatening. So it seems that their main concern was to keep the Rovers functioning in order to obtain as much information as possible. No effort was even made to alter the software. Also, one could imagine that the task of examining and debugging those errors was quite a challenge, since they couldn&#039;t deal with the Rovers physically, rather everything was done via transmission and messages. Another thing to note is the fact that the single CPU used in those Rovers had a lot to deal with beside the usual software implementation. Had NASA considered the possibility of implementing a multiple CPU design, it could have made a difference.&lt;br /&gt;
&lt;br /&gt;
The Spirit Rover has experienced a number of problems since then. The most recent reports reveal that the Rover has been largely inactive, with no data being received. The Opportunity Rover, on the other hand, continues to function successfully.&lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The Stop error 0x0000001A, MEMORY_MANAGEMENT, indicates that a severe memory management error has occurred. It can be triggered by a race condition in memory management, and it can also be related to hardware, for example when the memory does not receive enough power in time to serve a process. &lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also caused system failures in airports, ATMs and street hoardings. However, the most notable public incident happened at the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed because of a BSOD bug.&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The main challenge with race condition errors is that they&#039;re usually unpredictable and can be triggered in various ways depending on the processes involved, the implementation of the software, the hardware design and the surrounding environment. The human element plays a huge part as well: it determines how much testing is applied and how well the situations in which an error might occur are anticipated.&lt;br /&gt;
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race condition errors. More recently, a US software company named ReplaySolutions was awarded a US patent for developing a kit for debugging race conditions in software.&lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world. &lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. July 1993. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25] &lt;br /&gt;
* Nancy Leveson and Clark Turner. July 1993. [http://www.stanford.edu/class/cs240/readings/therac-25.pdf An Investigation of the Therac-25 Accidents]  &lt;br /&gt;
* Anne Marie Porrello. July 1993. [http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/THERAC25.html Death and Denial: The Failure of the THERAC-25, A Medical Linear Accelerator]  &lt;br /&gt;
* Reeves and Snyder. 10 January 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* Update: Spirit and Opportunity [http://marsrover.nasa.gov/mission/status.html]&lt;br /&gt;
* John Chan. 12 August 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] &lt;br /&gt;
* Dr. Dobb&#039;s Journal. 9 June 2010. Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4134</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4134"/>
		<updated>2010-10-14T22:22:37Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Overview */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their effects range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we will define race conditions and examine some of the most well-known cases involving them. We will also take a look at some of the solutions and techniques the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data simultaneously and at least one of them writes to it. The end result may be incorrect&lt;br /&gt;
depending on the exact timing of those processes. Consequently, a major system failure can occur. The main challenge with race condition errors is that they&#039;re usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, making it a nightmare for programmers to track down and debug the error.&lt;br /&gt;
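&lt;br /&gt;
As a minimal sketch of the definition above (not taken from any of the systems discussed in this article), the following Python snippet replays two hand-picked orderings of a read-modify-write on a shared counter. The names run, serial and interleaved are our own and purely illustrative; the ordering is fixed by hand rather than left to a real scheduler so that the outcome is repeatable.&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical illustration: two "processes", A and B, each add 1 to a
# shared counter. Every increment is split into a read step and a write
# step so that the interleaving can be chosen explicitly.

def run(schedule):
    """Replay a list of (process, step) pairs and return the final counter."""
    shared = {"counter": 0}
    local = {}  # private copy of the value each process last read
    for process, step in schedule:
        if step == "read":
            local[process] = shared["counter"]
        else:  # step == "write"
            shared["counter"] = local[process] + 1
    return shared["counter"]

# A finishes before B starts: both increments are kept.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# B reads before A has written: the write by B overwrites the update by A.
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run(serial))       # 2
print(run(interleaved))  # 1
```
&lt;br /&gt;
The two runs execute exactly the same steps; only their timing differs, which is why such a bug may appear only occasionally in a real system.&lt;br /&gt;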
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incident is quite possibly the most infamous software bug relating to race conditions: the cause of the overdoses has been traced back to a programming bug that produced a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of the code were reused from the software of the earlier Therac-6 and Therac-20 machines. &lt;br /&gt;
The main portion of the code ran a task called “Treat”, which determined which of the program&#039;s 8 main subroutines should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 software made use of 8 main subroutines. The Datent subroutine had its own helper routine, called Magnet, which prepared the x-ray machine&#039;s magnets to administer the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable that signaled whether the operator had finished entering the necessary data. Once the flag signifying that the operator had entered the necessary information was set, Datent allowed the main program to move on to the next subroutine. If the flag was not set, the “Treat” task rescheduled itself, in turn rescheduling the Datent subroutine. This continued until the shared data entry flag was set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the x-ray machine to administer the correct radiation dosage. The subroutine was set up so that, before returning to “Treat” with instructions to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine parsed the operator&#039;s input and moved the x-ray machine&#039;s magnets into position to administer the prescribed radiation. The “Magnet” subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before “Magnet” returned, the changes would not be registered: the x-ray strength would already be set to its prior value, ignoring the operator&#039;s changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves cursor up and fixes the error and presses return again.&lt;br /&gt;
#Magnets are set to the previous power level; the subroutine returns&lt;br /&gt;
#Program moves on to next subroutine without registering changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
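&lt;br /&gt;
The chain of events above can be replayed with a toy model. This is only a sketch under our own assumptions, not the actual Therac-25 code (which was written in PDP-11 assembly): the function name datent, the dictionary screen and the dose values are all hypothetical. It shows how a value captured when the magnet routine starts goes stale if the operator edits the screen while that routine is still running.&lt;br /&gt;
&lt;br /&gt;
```python
# Toy model of the stale-value problem described above.
# All names and numbers are hypothetical, chosen only for illustration.

def datent(entered_value, edits_during_magnet):
    """Return (value shown on screen, value actually applied to the hardware)."""
    screen = {"dose": entered_value}   # what the operator sees and edits
    hardware_setting = screen["dose"]  # captured once, when Magnet starts

    # Magnet takes several seconds; the keyboard handler keeps accepting edits.
    for new_value in edits_during_magnet:
        screen["dose"] = new_value     # the screen is updated...
        # ...but nothing re-reads it, so hardware_setting stays stale.

    return screen["dose"], hardware_setting

# The operator first types 2000 by mistake, then corrects it to 200
# while the magnets are still being positioned.
shown, applied = datent(2000, [200])
print(shown, applied)  # 200 2000
```
&lt;br /&gt;
In this sketch the screen and the hardware disagree after the edit, which mirrors the last three steps of the list above. One way to remove the mismatch would be to refuse edits until the magnet routine has returned.&lt;br /&gt;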
&lt;br /&gt;
&lt;br /&gt;
===Root Causes &amp;amp; Outcomes===&lt;br /&gt;
&lt;br /&gt;
A number of factors contributed to the failure of the Therac-25. The code was put together by a single programmer and no proper testing was conducted. In addition, code was reused from previous-generation machines without verifying that it was fully compatible with the new hardware. The earlier Therac-6 and Therac-20 had hardware interlocks that kept such software errors from harming patients. It is clear that proper planning and forethought could have prevented this incident.&lt;br /&gt;
&lt;br /&gt;
Six incidents involving the Therac-25 took place between 1985 and 1987, and it took 2 years until the FDA took the machines out of service. The FDA forced AECL to make modifications to the Therac-25 before it was allowed back on the market. The software was fixed to suspend all other operations while the magnets positioned themselves to administer the correct radiation strength. In addition, a dead man&#039;s switch was added: a foot pedal that the operator must hold down to enable motion of the x-ray machine. This prevented the operator from being unaware of changes in the x-ray machine&#039;s state.&lt;br /&gt;
&lt;br /&gt;
After these changes were made the Therac-25 was reintroduced into the market in 1988. Some of the machines are still in service today. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that, if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receipt of this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
At the time, FirstEnergy was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. The energy management system crashed silently, so the operators at the control center had no idea that they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know that the power plant had gone offline, and so took no measures to prevent the cascading effect that led to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped simultaneously. The three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server came online to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
With the system failure that ultimately led to 256 plants going offline, a massive black-out was experienced in the northeastern United States and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and any other possible data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. Each Rover was controlled on a daily basis by the NASA team on Earth, which sent it messages and tasks. Each solar day in the life of a Rover is called a Sol. &lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly. &lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The system uses time multiplexing, where all processes share and access resources on the single CPU. The Rover records its progress through three primary logging systems: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (critical section). While it was requesting permission, another process was granted access to that very same piece of memory (critical section). This resulted in the IM process generating a fatal exception through its EVR log. The exception led to loss of, and trouble in transmitting, data to the NASA team on Earth, which eventually led to&lt;br /&gt;
the Rover being in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time-frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested that data be transmitted from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover. &lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The Stop error 0x0000001A, MEMORY_MANAGEMENT, indicates that a severe memory management error has occurred. It can be triggered by a race condition in memory management, and it can also be related to hardware, for example when the memory does not receive enough power in time to serve a process. &lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also caused system failures in airports, ATMs and street hoardings. However, the most notable public incident happened at the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed because of a BSOD bug.&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions and to maintain concurrency and safe sharing of resources among &lt;br /&gt;
processes brings us to the concept of mutual exclusion (mutex). Mutual exclusion is the idea of making sure that &lt;br /&gt;
processes access shared data in a serialized way. That is, if process A, for instance, is executing or &lt;br /&gt;
using a particular data structure (called a critical section), then no other process, such as B, is allowed&lt;br /&gt;
to execute or use that very same data structure (critical section) until process A finishes executing or decides&lt;br /&gt;
to leave the data structure. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
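&lt;br /&gt;
As a small, hedged illustration of the lock technique mentioned above, the Python sketch below protects a shared counter with threading.Lock from the standard library. The names counter, counter_lock, increment_many and run_workers are our own; the example is generic and is not drawn from any of the systems discussed in this article.&lt;br /&gt;
&lt;br /&gt;
```python
import threading

counter = 0
counter_lock = threading.Lock()  # guards every access to counter


def increment_many(times):
    """Add 1 to the shared counter the given number of times."""
    global counter
    for _ in range(times):
        # Critical section: only one thread at a time may hold the lock,
        # so the read-modify-write below cannot be interleaved.
        with counter_lock:
            counter += 1


def run_workers(worker_count, times):
    """Start the workers, wait for all of them, and return the final count."""
    global counter
    counter = 0
    workers = [
        threading.Thread(target=increment_many, args=(times,))
        for _ in range(worker_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter


print(run_workers(4, 10000))  # 40000
```
&lt;br /&gt;
Because every increment happens while the lock is held, the final count does not depend on how the threads are scheduled. The price is that the threads take turns inside the critical section instead of running it in parallel.&lt;br /&gt;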
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race condition errors as well. More recently, a US software company named ReplaySolutions was awarded a US patent for developing a kit for debugging race conditions in software.&lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. July 1993. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25] &lt;br /&gt;
* Nancy Leveson and Clark Turner. July 1993. [http://www.stanford.edu/class/cs240/readings/therac-25.pdf An Investigation of the Therac-25 Accidents]  &lt;br /&gt;
* Anne Marie Porrello. July 1993. [http://users.csc.calpoly.edu/~jdalbey/SWE/Papers/THERAC25.html Death and Denial: The Failure of the THERAC-25, A Medical Linear Accelerator]  &lt;br /&gt;
* Reeves and Snyder. 10 January 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* John Chan. 12 August 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] &lt;br /&gt;
* Dr. Dobb&#039;s Journal. 9 June 2010. Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4128</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4128"/>
		<updated>2010-10-14T22:08:23Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. That is, if process A, for instance, happens to be executing or using a particular data structure (called a critical section), then no other process, such as B, is allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive amount of resources that older versions of the Unix system spent on implementing mutual exclusion. The paper explains the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 x-ray machine which killed a bunch of people http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to &lt;br /&gt;
explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and&lt;br /&gt;
Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for how to divide the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, its good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online if you want, since I know each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also, I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use them (less to type as well).&lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have are well documented. This will also demonstrate our ability to work together more than each of us writing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then.&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it conveys the writer&#039;s own understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the blue screen of death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like I mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah, I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done; we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him. Also, guys, please add the resources that were used; we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
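&lt;br /&gt;
A rough sketch of how a semaphore from the list of techniques above behaves, assuming Python threads stand in for processes. The helper name peak_concurrency and the sleep that stands in for real work are made up for illustration; this is not how Linux or Windows implement it.&lt;br /&gt;
&lt;br /&gt;
```python
import threading
import time


def peak_concurrency(limit, workers):
    # Hypothetical helper: returns the largest number of threads that were
    # inside the guarded region at the same time.
    slots = threading.Semaphore(limit)  # at most `limit` holders at once
    guard = threading.Lock()            # protects the two counters below
    inside = 0
    peak = 0

    def worker():
        nonlocal inside, peak
        with slots:                     # acquire; blocks while no slot is free
            with guard:
                inside += 1
                peak = max(peak, inside)
            time.sleep(0.01)            # pretend to use the shared resource
            with guard:
                inside -= 1
        # leaving the with-block releases the slot

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```
&lt;br /&gt;
With a limit of 1 the semaphore acts like a plain lock, so peak_concurrency(1, 4) never reports more than one thread inside at once.&lt;br /&gt;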
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language (C/C++, Assembly)&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
*Race conditions are a common problem&lt;br /&gt;
*Software Design&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4127</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4127"/>
		<updated>2010-10-14T22:07:57Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are reading or writing the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A is executing the code that uses a particular shared data structure (this code is called a critical section), then no other process, such as B, is allowed to enter a critical section for that same data structure until process A leaves its critical section. Common techniques used to achieve mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
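&lt;br /&gt;
A minimal sketch of the two definitions above, assuming Python threads in place of processes. SharedCounter and run_threads are made-up names for illustration. The read-then-write inside increment is the critical section, and the lock is what keeps two threads from interleaving there.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


class SharedCounter:
    # Hypothetical shared data structure used only for this illustration.
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # Critical section: the read-modify-write must not interleave with
        # the same steps in another thread, so it runs while holding the lock.
        with self.lock:
            current = self.value
            self.value = current + 1


def run_threads(counter, workers, rounds):
    # Start `workers` threads that each increment the counter `rounds` times.
    def work():
        for _ in range(rounds):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```
&lt;br /&gt;
With the lock in place, run_threads(SharedCounter(), 4, 1000) always returns 4000. Without it, the result could depend on who runs precisely when, which is the race condition described above.&lt;br /&gt;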
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which basically examines the excessive cost and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which killed several people http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well.&lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode restrictions can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable as far as understanding the Unix kernel and stuff like that goes. I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
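&lt;br /&gt;
The post above does not say which Unix bug is meant, so the following is only a generic sketch of the check-then-use pattern that file-system races usually involve, written in Python. The function names are hypothetical. The point is that checking for a file and then creating it are two separate steps that another process can slip between, whereas asking the kernel to do both in one call leaves no gap.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile


def create_exclusive(path):
    # O_CREAT | O_EXCL asks the kernel to check that the file is absent and
    # create it as one atomic step, so no other process can act in between.
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def demo_in_temp_dir():
    # Hypothetical demo: the first call creates the file, the second call
    # finds it already there and refuses to reuse it.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.txt")
        return create_exclusive(path), create_exclusive(path)
```
&lt;br /&gt;
The racy alternative would be to call os.path.exists first and open the file afterwards, which leaves a window between the check and the use.&lt;br /&gt;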
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible.&lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well).&lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more so than each of us doing a separate paragraph.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article and provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be OK to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the blue screen of death. I found this book, which explains how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, since that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done; we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys, please add the resources that were used; we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
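&lt;br /&gt;
As a rough illustration for the second question, here is a minimal sketch of a counting semaphore, assuming the threading module in Python rather than anything specific to Windows or Linux. The names slots, guard, worker and run are made up for this example. The semaphore lets at most two threads hold a slot at once, and a separate lock protects the bookkeeping counters.&lt;br /&gt;

```python
import threading

slots = threading.Semaphore(2)   # at most 2 threads may hold a slot
guard = threading.Lock()         # protects the two counters below
active = 0                       # threads currently holding a slot
peak = 0                         # highest value that active ever reached


def worker():
    """Take a slot, record how many threads hold one, then release it."""
    global active, peak
    with slots:                  # blocks while both slots are taken
        with guard:
            active += 1
            peak = max(peak, active)
        # ... the shared resource would be used here ...
        with guard:
            active -= 1


def run(num_threads=8):
    """Start the workers and report the peak number of slot holders."""
    global active, peak
    active = 0
    peak = 0
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print(run())                 # never more than 2
```

A semaphore created with a count of 1 behaves much like a plain lock, which is one way to relate the two techniques in the answer.&lt;br /&gt;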
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language (C/C++, Assembly)&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
*Race conditions are a common problem&lt;br /&gt;
*Software Design (poor)&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4090</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4090"/>
		<updated>2010-10-14T21:06:57Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to define what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are reading or writing the same piece of shared data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A is executing the code that uses a particular shared data structure (its critical section), then no other process, such as B, is allowed to execute code that uses that same data structure until A leaves its critical section. Common techniques used to achieve mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
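&lt;br /&gt;
To make the two definitions above concrete, here is a minimal Python sketch (not taken from the textbook; the names lost_update, deposit, run_with_lock, counter and lock are made up for illustration). The first function replays one unlucky interleaving by hand to show how an update gets lost. The rest uses a lock so that only one thread at a time is inside the critical section.&lt;br /&gt;

```python
import threading


def lost_update():
    """Replay one bad interleaving of two read-modify-write sequences."""
    shared = 0
    a = shared        # process A reads 0
    b = shared        # process B reads 0 before A has written back
    shared = a + 1    # A writes 1
    shared = b + 1    # B also writes 1, so the update made by A is lost
    return shared


counter = 0
lock = threading.Lock()


def deposit(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with lock:    # critical section: one thread at a time
            counter += 1


def run_with_lock(num_threads=4, times=10000):
    """Start several threads that all update the same counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=deposit, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(lost_update())     # prints 1 although two increments ran
    print(run_with_lock())   # prints 40000
```

With the lock held around each update, the final count equals the number of increments requested, whichever thread happens to run when.&lt;br /&gt;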
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive cost and resources that older versions of the Unix system spent on implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any other information about that surface.) The rover experienced a rare, unexpected error due to a race condition. This seems to be a fairly common problem for those Mars rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well.&lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the PDF files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a good number of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a book chapter from Sun titled &amp;quot;Race Conditions and Mutual Exclusion&amp;quot;. It has some examples of race conditions in Java programming which I think we could study.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
Page 2 of the PDF file has the first example of a race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay?  Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible.&lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s OK by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know each one of us probably has a different schedule.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also, is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome looks like we have a lot of information and resources here to work from. Daniels template structure looks good and we should follow that. We should come up with a plan for executing this, what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples lets try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that everyone has not been using 4 tildes. I am not sure if this how the professor knows who wrote what but it would not hurt to use it (Less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but its a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate the ability we have to work together more so than us doing a separate paragraph each&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the mean time, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think theres any harm in doing that, I&#039;ve found that this was a fairly common problem in some versions of Windows leading to a handful of system failures in airports, electronic hoardings, it even happened at the Beijing Summer Olympics of 2008 ! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess ill do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure theres no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. Theres no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the Blue Screen of Death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
On page 54, it described the reason why that happened.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example how blue-screen affects people&#039;s life. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW,i&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs in the third and fourth floor to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think theres much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Heres some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race conditions errors ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How does Windows and Linux differ in terms of handling race conditions and applying mutual exclusion ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter) ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signal to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are in the 4th floor of Herzberg labs, its the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language (C/C++, Assembly)&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
*Race conditions are a common problem&lt;br /&gt;
*Software Design&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4082</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4082"/>
		<updated>2010-10-14T20:50:22Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their effects range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we will define race conditions and examine some of the most well-known cases involving them. We will also take a look at some of the solution schemes and tools the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data at the same time and at least one of them writes to it. The end result may be incorrect &lt;br /&gt;
depending on the exact timing of those processes. Consequently, a major system failure can occur. The main challenge with race condition errors is &lt;br /&gt;
that they&#039;re usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, making them a nightmare for&lt;br /&gt;
programmers to track down and debug.&lt;br /&gt;
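&lt;br /&gt;
The timing dependence described above can be illustrated with a small sketch. It is a deterministic simulation in Python rather than real concurrent code, and the names used (increment, run, the schedule lists) are hypothetical ones chosen for this illustration. Two simulated processes each read a shared counter and then write back the value plus one; only the order of the steps differs between the two runs.&lt;br /&gt;

```python
# Deterministic simulation of a lost update: two "processes" each perform
# a read-modify-write on a shared counter, and a schedule picks the order.

def increment(shared):
    """One process: read the counter, pause, then write back value + 1."""
    local = shared["counter"]      # read
    yield                          # the scheduler may switch processes here
    shared["counter"] = local + 1  # write based on the (possibly stale) read


def run(schedule):
    """Run two increment processes, one step per name in `schedule`."""
    shared = {"counter": 0}
    procs = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(procs[name], None)    # advance that process by one step
    return shared["counter"]


serial_result = run(["A", "A", "B", "B"])       # A finishes before B starts
interleaved_result = run(["A", "B", "A", "B"])  # both read before either writes
print(serial_result, interleaved_result)
```

With the serial schedule both increments are kept. With the interleaved schedule both processes read the same starting value, so one of the two updates is lost.&lt;br /&gt;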
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a computer-controlled radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incidents are quite possibly the most infamous failures attributed to a race condition: their cause has been traced back to a programming bug that let operator input race against the machine&#039;s setup.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of the code were reused from the software of the earlier Therac-6 and Therac-20 machines. &lt;br /&gt;
The main portion of the code runs a task called “Treat”, which determines which of the program&#039;s 8 main subroutines should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 made use of 8 main subroutines. Datent had its own helper routine, Magnet, which positioned the machine&#039;s magnets to administer the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable that signaled whether the operator had finished entering the necessary data. Once Datent found the flag set, signifying that the operator had entered the necessary information, it allowed the main program to move on to the next subroutine. If the flag was not set, the “Treat” task rescheduled itself, in turn rescheduling the Datent subroutine. This continued until the shared data entry flag was set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. It was set up so that, before signaling “Treat” to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine parsed the operator’s input and moved the machine’s magnets into position to administer the prescribed radiation. The Magnet subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before Magnet returned, the changes would not be registered: the beam strength would stay at its earlier value, ignoring the operator’s corrections.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves cursor up and fixes the error and presses return again.&lt;br /&gt;
#Magnets are set to the previous power level; the subroutine returns &lt;br /&gt;
#Program moves on to next subroutine without registering changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
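&lt;br /&gt;
The chain of events above can be sketched as a toy model. This is not the real Therac-25 code, which was written in PDP-11 assembly. It is a simplified Python illustration of the stale-input problem as described in this section, and every name in it (Console, operator_enters, datent, the dose numbers) is a hypothetical one made up for the example.&lt;br /&gt;

```python
# Toy model of the sequence described above; NOT the real Therac-25 code.
# All names and numbers here are hypothetical and for illustration only.

class Console:
    """Holds the value currently shown on the operator's screen."""

    def __init__(self):
        self.prescribed = None

    def operator_enters(self, dose):
        self.prescribed = dose


def datent(console, edits_during_setup):
    """Snapshot the input, run the slow magnet setup, return the dose used."""
    snapshot = console.prescribed        # input is parsed once, up front
    for dose in edits_during_setup:      # edits arriving during the slow setup
        console.operator_enters(dose)    # the screen changes, the snapshot does not
    return snapshot                      # the machine proceeds with the old value


console = Console()
console.operator_enters(2000)            # typo: one zero too many
delivered = datent(console, edits_during_setup=[200])
print("screen shows:", console.prescribed, "machine uses:", delivered)
```

The screen and the machine end up disagreeing: the operator sees the corrected value, while the setup routine keeps the value it read before the correction.&lt;br /&gt;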
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
FirstEnergy at the time was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. The energy management system crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect leading to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed after three sagging power lines tripped simultaneously. These three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server went online to attempt to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
With the system failure ultimately leading to 256 plants going offline, a massive black-out was experienced in the northeastern USA and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and other data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. Each Rover was controlled on a daily basis by the NASA team on Earth, which sent it messages and tasks. Each solar day in the life of a Rover is called a Sol. &lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system.  The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly. &lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, where all processes share and access resources on the single CPU. The Rover records progress through the use of three primary log-file systems: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (its critical section). While it was requesting permission, another process was granted access to that very same piece of memory. This resulted in the IM process generating a fatal exception, reported through its EVR log. The exception led to data loss and trouble transmitting data to the NASA team on Earth, which eventually left&lt;br /&gt;
the Rover in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time-frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested data from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover. &lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
Stop error 0x0000001A, MEMORY_MANAGEMENT, has been attributed to a race condition in memory management. It can also be a hardware-related error: the computer may be unable to deliver enough power to the memory in time for the process that needs it. &lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also caused system failures in airports, ATMs and street hoardings. However, the most notable public incident happened during the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed with a BSOD.  &lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions and keep resource sharing among concurrent &lt;br /&gt;
processes safe brings us to the concept of mutual exclusion (mutex). Mutual exclusion is the idea of making sure &lt;br /&gt;
processes access shared data in a serialized way. If process A, for instance, happens to be executing code that &lt;br /&gt;
uses a particular shared data structure (a critical section), then no other process, such as B, is allowed&lt;br /&gt;
to execute code that uses that very same data structure until process A finishes or decides&lt;br /&gt;
to leave its critical section. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
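&lt;br /&gt;
As a minimal sketch of the lock-based approach, the Python example below uses the standard library&#039;s threading.Lock to serialize updates to a shared counter. It uses threads rather than separate processes for brevity, and the names counter, counter_lock and worker are hypothetical ones chosen for this illustration.&lt;br /&gt;

```python
import threading

counter = 0
counter_lock = threading.Lock()   # guards every access to `counter`


def worker(iterations):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(iterations):
        with counter_lock:        # enter the critical section
            counter += 1          # the read-modify-write is now serialized


threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

Because only one thread at a time can hold the lock, no increment can be interleaved with another, and the final count always equals the total number of increments requested.&lt;br /&gt;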
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race condition errors as well. More recently, a US software company that goes by the name of ReplaySolutions has been awarded a patent from the US government for developing an innovative kit for debugging race conditions found in software.  &lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient levels of performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. July 1993. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25]  &lt;br /&gt;
* Reeves and Snyder. 10 January 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* John Chan. 12 August 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] &lt;br /&gt;
* Dr. Dobb&#039;s Journal. 9 June 2010. Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4081</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4081"/>
		<updated>2010-10-14T20:49:31Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their effects range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we define race conditions and examine some of the most well-known cases involving them. We also look at some of the solutions and tools the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data at the same time and at least one of them writes to it, so that the end result depends on the exact timing of those processes and may be incorrect. A major system failure can follow. The main challenge with race condition errors is that they are usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, which makes them a nightmare for programmers to track down and debug.&lt;br /&gt;
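&lt;br /&gt;
As a minimal sketch of this timing dependence (not taken from any of the systems discussed in this article), the following Python snippet replays two hand-picked interleavings of a read, add one, write sequence on a shared counter. The names shared, read and write are hypothetical and chosen only for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
# Deterministic illustration of a lost update. Two "processes", A and B, each
# read a shared counter, add one, and write the result back. The interleaving
# is fixed by hand so the outcome is reproducible.
shared = {"counter": 0}

def read(state):
    return state["counter"]

def write(state, value):
    state["counter"] = value

# Interleaving 1: A finishes its read-add-write before B starts (serialized).
write(shared, read(shared) + 1)   # process A
write(shared, read(shared) + 1)   # process B
serialized_result = shared["counter"]

# Interleaving 2: both processes read before either one writes.
shared["counter"] = 0
a_seen = read(shared)             # A reads 0
b_seen = read(shared)             # B also reads 0
write(shared, a_seen + 1)         # A writes 1
write(shared, b_seen + 1)         # B overwrites it with 1
racy_result = shared["counter"]

print(serialized_result, racy_result)
```
&lt;br /&gt;
In the second interleaving one increment is lost, because both processes read the counter before either wrote it back. Which interleaving occurs in a real system depends on scheduling, which is why such bugs are hard to reproduce.&lt;br /&gt;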
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incidents are quite possibly the most infamous case of a software bug involving race conditions: their cause has been traced back to a programming error that created a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of the code were reused from the software of the earlier Therac-6 and Therac-20 machines.&lt;br /&gt;
The main portion of the code ran a task called “Treat”, which determined which of the program&#039;s 8 main subroutines should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 software made use of 8 main subroutines. Datent had its own helper routine called Magnet, which set the machine&#039;s magnets to deliver the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable that signaled whether the operator had finished entering the necessary data. Once Datent saw the flag set, signifying that the operator had entered the necessary information, it allowed the main program to move on to the next subroutine. If the flag was not set, the “Treat” task rescheduled itself, which in turn rescheduled the Datent subroutine. This continued until the shared data entry flag was set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. It was set up so that, before returning control to “Treat” to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine took the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The Magnet subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before “Magnet” returned, the changes would not be registered: the beam strength remained set to its prior value, ignoring the operator’s edits.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves the cursor up, fixes the error and presses return again&lt;br /&gt;
#Magnets are set to the previous power level and the subroutine returns&lt;br /&gt;
#Program moves on to the next subroutine without registering the changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
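&lt;br /&gt;
The chain of events above can be pictured with the simplified Python model below. It is only a sketch of the sequence as described in this article, not the actual Therac-25 code (which was PDP-11 assembly); the Console class, the magnet function and the dose values are hypothetical and have no units attached.&lt;br /&gt;
&lt;br /&gt;
```python
# Simplified, hypothetical model of the sequence described in the text.
# It is not the Therac-25 code; all names and values are illustrative.
class Console:
    def __init__(self, dose):
        self.dose = dose            # value shown on the operator's screen

def magnet(console, edits_during_setup):
    configured = console.dose       # value read once, when setup starts
    for new_dose in edits_during_setup:
        console.dose = new_dose     # keyboard handler accepts edits meanwhile
    return configured               # setup finishes with the value it read

console = Console(dose=2000)        # operator typed an extra 0
configured = magnet(console, edits_during_setup=[200])
print(console.dose, configured)     # screen value and machine value disagree
```
&lt;br /&gt;
The screen shows the corrected value while the machine keeps the value it read before the edit, which is the mismatch the steps above describe.&lt;br /&gt;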
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, operators would re-route power through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
FirstEnergy at the time was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. It crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect that led to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped at the same time. These three separate events then attempted to operate on shared state simultaneously, causing the main system to fail. A back-up server came online to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
The system failure ultimately led to 256 plants going offline and a massive black-out across the northeastern USA and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and any other possible data about the planet. NASA landed two Rover vehicles, Spirit and Opportunity, on January 4 and January 25, 2004, respectively. Each Rover was controlled on a daily basis by the NASA team on Earth, which sent it messages and tasks. Each solar day in the life of a Rover is called a Sol.&lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly.&lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, where all processes share the single CPU and its resources. The Rover records progress through three primary kinds of logs: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (its critical section). While it was requesting permission, another process was granted access to that very same piece of memory. This resulted in the IM process generating a fatal exception through its EVR log. The exception led to the loss of data and trouble transmitting data to the NASA team on Earth, which eventually left the Rover in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested data from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover.&lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The error 0x0000001A, MEMORY_MANAGEMENT, can occur because of a race condition in memory management. It can also be triggered by hardware problems related to memory, for example when the memory does not receive enough power in time for the process.&lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also appeared during system failures at airports, on ATMs and on street billboards. The most notable public incident, however, happened at the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed with a BSOD.&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions while still allowing processes to run concurrently and share resources safely brings us to the concept of mutual exclusion (often implemented with a mutex). Mutual exclusion means that processes access shared data in a serialized way: if process A is executing the code that uses a particular shared data structure (its critical section), then no other process, such as B, is allowed to execute code that uses that same data structure until process A leaves its critical section. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
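&lt;br /&gt;
One way to picture mutual exclusion is the sketch below, which uses the Lock class from Python&#039;s standard threading module. The counter, the worker function and the thread and iteration counts are illustrative assumptions, not part of any system described in this article. Each thread may update the shared counter only while holding the lock, so the updates are serialized.&lt;br /&gt;
&lt;br /&gt;
```python
import threading

counter = 0
counter_lock = threading.Lock()     # guards every access to counter

def worker(iterations):
    global counter
    for _ in range(iterations):
        # Critical section: only one thread at a time may run this update.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                        # wait until every worker has finished

print(counter)
```
&lt;br /&gt;
Because every increment happens inside the lock, no update can be lost, and the final count equals the number of threads times the number of iterations regardless of how the threads are scheduled.&lt;br /&gt;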
&lt;br /&gt;
A handful of commercial software tools have also been developed to detect and address race condition errors. More recently, a US software company by the name of ReplaySolutions was awarded a US patent for a tool for debugging race conditions in software.&lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through multi-processor systems and multi-core chips, this area remains an active field for research and innovation within the computing world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. July 1993. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25]  &lt;br /&gt;
* Reeves and Snyder. 10 January 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* John Chan. 12 August 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] &lt;br /&gt;
* Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4080</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4080"/>
		<updated>2010-10-14T20:49:04Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their effects range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we define race conditions and examine some of the most well-known cases involving them. We also look at some of the solutions and tools the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data at the same time and at least one of them writes to it, so that the end result depends on the exact timing of those processes and may be incorrect. A major system failure can follow. The main challenge with race condition errors is that they are usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, which makes them a nightmare for programmers to track down and debug.&lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incidents are quite possibly the most infamous case of a software bug involving race conditions: their cause has been traced back to a programming error that created a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of the code were reused from the software of the earlier Therac-6 and Therac-20 machines.&lt;br /&gt;
The main portion of the code ran a task called “Treat”, which determined which of the program&#039;s 8 main subroutines should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 software made use of 8 main subroutines. Datent had its own helper routine called Magnet, which set the machine&#039;s magnets to deliver the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable that signaled whether the operator had finished entering the necessary data. Once Datent saw the flag set, signifying that the operator had entered the necessary information, it allowed the main program to move on to the next subroutine. If the flag was not set, the “Treat” task rescheduled itself, which in turn rescheduled the Datent subroutine. This continued until the shared data entry flag was set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. It was set up so that, before returning control to “Treat” to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine took the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The Magnet subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before “Magnet” returned, the changes would not be registered: the beam strength remained set to its prior value, ignoring the operator’s edits.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves the cursor up, fixes the error and presses return again&lt;br /&gt;
#Magnets are set to the previous power level and the subroutine returns&lt;br /&gt;
#Program moves on to the next subroutine without registering the changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, operators would re-route power through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
FirstEnergy at the time was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. It crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect that led to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped at the same time. These three separate events then attempted to operate on shared state simultaneously, causing the main system to fail. A back-up server came online to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
The system failure ultimately led to 256 plants going offline and a massive black-out across the northeastern USA and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and any other possible data about the planet. NASA landed two Rover vehicles, Spirit and Opportunity, on January 4 and January 25, 2004, respectively. Each Rover was controlled on a daily basis by the NASA team on Earth, which sent it messages and tasks. Each solar day in the life of a Rover is called a Sol.&lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly.&lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, where all processes share the single CPU and its resources. The Rover records progress through three primary kinds of logs: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (its critical section). While it was requesting permission, another process was granted access to that very same piece of memory. This resulted in the IM process generating a fatal exception through its EVR log. The exception led to the loss of data and trouble transmitting data to the NASA team on Earth, which eventually left the Rover in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested data from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover.&lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The error 0x0000001A, MEMORY_MANAGEMENT, can occur because of a race condition in memory management. It can also be triggered by hardware problems related to memory, for example when the memory does not receive enough power in time for the process.&lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also appeared during system failures at airports, on ATMs and on street billboards. The most notable public incident, however, happened at the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed with a BSOD.&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions while still allowing processes to run concurrently and share resources safely brings us to the concept of mutual exclusion (often implemented with a mutex). Mutual exclusion means that processes access shared data in a serialized way: if process A is executing the code that uses a particular shared data structure (its critical section), then no other process, such as B, is allowed to execute code that uses that same data structure until process A leaves its critical section. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
A handful of commercial software tools have also been developed to detect and address race condition errors. More recently, a US software company by the name of ReplaySolutions was awarded a US patent for a tool for debugging race conditions in software.&lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through multi-processor systems and multi-core chips, this area remains an active field for research and innovation within the computing world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. July 1993. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25]  &lt;br /&gt;
* Reeves and Snyder. 10 January 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* John Chan. 12 August 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] &lt;br /&gt;
* Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4079</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4079"/>
		<updated>2010-10-14T20:48:49Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their effects range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we define race conditions and examine some of the most well-known cases involving them. We also look at some of the solutions and tools the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data at the same time and at least one of them writes to it, so that the end result depends on the exact timing of those processes and may be incorrect. A major system failure can follow. The main challenge with race condition errors is that they are usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, which makes them a nightmare for programmers to track down and debug.&lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incidents are quite possibly the most infamous case of a software bug involving race conditions: their cause has been traced back to a programming error that created a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of the code were reused from the software of the earlier Therac-6 and Therac-20 machines.&lt;br /&gt;
The main portion of the code ran a task called “Treat”, which determined which of the program&#039;s 8 main subroutines should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 software made use of 8 main subroutines. Datent had its own helper routine called Magnet, which set the machine&#039;s magnets to deliver the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable which signaled whether the operator had finished entering the necessary data. Once the Datent subroutine finds the flag set, signifying that the operator has entered the necessary information, it allows the main program to move on to the next subroutine. If the flag is not set, the “Treat” task reschedules itself, in turn rescheduling the Datent subroutine. This continues until the shared data entry flag is set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. The subroutine was set up so that, before returning to “Treat” with instructions to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine took the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The Magnet subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before the “Magnet” subroutine returned, the changes would not be registered: the beam strength would already be set to its prior value, ignoring the operator’s changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves cursor up and fixes the error and presses return again.&lt;br /&gt;
#Magnets are set to the previous power level; the Magnet subroutine returns&lt;br /&gt;
#Program moves on to next subroutine without registering changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
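&lt;br /&gt;
The chain of events above can be imitated with a short Python sketch. It is a simplified illustration of the description in this article, not the actual Therac-25 code (which was PDP-11 assembly), and the names Console, set_magnets and datent are hypothetical. The slow magnet setup works from the operator&#039;s entry as it stood when the setup started, and an edit made while it runs is never looked at again unless the caller re-checks.&lt;br /&gt;

```python
class Console:
    """Shared state written by the keyboard handler."""

    def __init__(self, dose):
        self.dose = dose


def set_magnets(console, edits_during_setup):
    """Position magnets for the dose seen at the start of the slow setup."""
    dose_at_start = console.dose
    # The keyboard handler keeps running while the magnets move.
    for new_dose in edits_during_setup:
        console.dose = new_dose
    return dose_at_start


def datent(console, edits_during_setup, recheck=False):
    """Return the dose the machine ends up configured for."""
    configured = set_magnets(console, edits_during_setup)
    if recheck:
        # One possible guard: redo the setup until the entry stopped changing.
        while configured != console.dose:
            configured = set_magnets(console, [])
    return configured


# Operator typed 2000, then corrected it to 200 during the setup.
print(datent(Console(2000), [200]))                # 2000: the edit is lost
print(datent(Console(2000), [200], recheck=True))  # 200
```

The recheck branch is only one assumed way to close the window; it is shown to make the missing check visible, not as the fix that was actually deployed.&lt;br /&gt;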
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
FirstEnergy at the time was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. The energy management system crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect leading to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped simultaneously. These three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server went online to attempt to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
With the system failure ultimately leading to 256 plants going offline, a massive black-out was experienced in the northeastern USA and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and other possible data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. The Rovers were controlled on a daily basis by the NASA team on Earth, which sent them messages and tasks. Each solar day in the life of a Rover is called a Sol. &lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system.  The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly. &lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, where all processes share and access resources on the single CPU. The Rover records progress through the use of three primary log-file systems: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. In order to do that, the IM process must request permission and be granted access to write that counter to memory (a critical section). While it was requesting the permission, another process was granted access to that very same piece of memory (critical section). This resulted in the IM process generating a fatal exception through its EVR log. The exception led to loss of data and trouble in transmitting data to the NASA team on Earth, which eventually led to&lt;br /&gt;
the Rover being in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time-frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested data to be transmitted from the Rover, the IMG was entering a deactivation state. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover. &lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The error 0x0000001A, MEMORY_MANAGEMENT, can occur because of a race condition in memory management. It may also indicate a hardware fault related to memory; for instance, it is possible that the memory could not be supplied with enough power in time for the process. &lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also been seen during system failures in airports, on ATMs and on street hoardings. However, the most notable public incident happened at the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed with a BSOD.  &lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions while maintaining concurrency and safe sharing of resources among &lt;br /&gt;
processes brings us to the concept of mutual exclusion (mutex). Mutual exclusion is the idea of making sure &lt;br /&gt;
processes access shared data in a serialized way. That is, if process A happens to be executing code that &lt;br /&gt;
uses a particular data structure (called a critical section), then no other process, such as B, is allowed&lt;br /&gt;
to execute code that uses that very same data structure until process A finishes executing or otherwise&lt;br /&gt;
leaves the critical section. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
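&lt;br /&gt;
As a minimal sketch of the lock technique, the Python below guards a shared counter with threading.Lock from the standard library. The counter, the worker function, and the thread and iteration counts are hypothetical choices made for illustration.&lt;br /&gt;

```python
import threading

counter = 0
counter_lock = threading.Lock()


def worker(iterations):
    """Increment the shared counter, one critical section per increment."""
    global counter
    for _ in range(iterations):
        with counter_lock:  # only one thread at a time may hold the lock
            counter += 1


threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000
```

Without the lock, the read and write halves of the increment in different threads could interleave, and some updates could be lost.&lt;br /&gt;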
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race condition errors as well. More recently, a US software company that goes by the name of ReplaySolutions has been awarded a patent from the US government for developing an innovative kit for debugging race conditions in software.  &lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25] July 1993. &lt;br /&gt;
* Reeves and Snyder. January 10, 2006. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* John Chan. August 12, 2008. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] &lt;br /&gt;
* Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4077</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4077"/>
		<updated>2010-10-14T20:47:53Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their consequences range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we define race conditions and examine some of the best-known cases involving them. We also look at some of the solutions and tools the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data concurrently and at least one of them writes to it. The end result may be incorrect &lt;br /&gt;
depending on the exact timing of those processes, and a major system failure can follow. The main challenge with race condition errors is &lt;br /&gt;
that they are usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, which makes them a nightmare for&lt;br /&gt;
programmers to track down and debug.&lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a computer-controlled radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987 six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incidents are quite possibly the most infamous software failure relating to race conditions: their cause has been traced back to a programming bug that produced a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of the code were reused from the software of the earlier Therac-6 and Therac-20 machines. &lt;br /&gt;
The main portion of the code runs a task called “Treat”, which determines which of the program&#039;s 8 main subroutines it should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 made use of 8 main subroutines. The Datent subroutine had its own helper routine called Magnet, which prepared the machine&#039;s magnets to administer the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable which signaled whether the operator had finished entering the necessary data. Once the Datent subroutine finds the flag set, signifying that the operator has entered the necessary information, it allows the main program to move on to the next subroutine. If the flag is not set, the “Treat” task reschedules itself, in turn rescheduling the Datent subroutine. This continues until the shared data entry flag is set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. The subroutine was set up so that, before returning to “Treat” with instructions to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine took the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The Magnet subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before the “Magnet” subroutine returned, the changes would not be registered: the beam strength would already be set to its prior value, ignoring the operator’s changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves cursor up and fixes the error and presses return again.&lt;br /&gt;
#Magnets are set to the previous power level; the Magnet subroutine returns&lt;br /&gt;
#Program moves on to next subroutine without registering changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
FirstEnergy at the time was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. The energy management system crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect leading to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped simultaneously. These three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server went online to attempt to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
With the system failure ultimately leading to 256 plants going offline, a massive black-out was experienced in the northeastern USA and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and the deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and other possible data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. The Rovers were controlled on a daily basis by the NASA team on Earth, which sent them messages and tasks. Each solar day in the life of a Rover is called a Sol. &lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system.  The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly. &lt;br /&gt;
The Rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, where all processes share and access resources on the single CPU. The Rover records progress through the use of three primary log-file systems: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. In order to do that, the IM process must request permission and be granted access to write that counter to memory (a critical section). While it was requesting the permission, another process was granted access to that very same piece of memory (critical section). This resulted in the IM process generating a fatal exception through its EVR log. The exception led to loss of data and trouble in transmitting data to the NASA team on Earth, which eventually led to&lt;br /&gt;
the Rover being in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time-frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested data to be transmitted from the Rover, the IMG was entering a deactivation state. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover. &lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The error 0x0000001A, MEMORY_MANAGEMENT, can occur because of a race condition in memory management. It may also indicate a hardware fault related to memory; for instance, it is possible that the memory could not be supplied with enough power in time for the process. &lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also been seen during system failures in airports, on ATMs and on street hoardings. However, the most notable public incident happened at the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed with a BSOD.  &lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions while maintaining concurrency and safe sharing of resources among &lt;br /&gt;
processes brings us to the concept of mutual exclusion (mutex). Mutual exclusion is the idea of making sure &lt;br /&gt;
processes access shared data in a serialized way. That is, if process A happens to be executing code that &lt;br /&gt;
uses a particular data structure (called a critical section), then no other process, such as B, is allowed&lt;br /&gt;
to execute code that uses that very same data structure until process A finishes executing or otherwise&lt;br /&gt;
leaves the critical section. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race condition errors as well. More recently, a US software company that goes by the name of ReplaySolutions has been awarded a patent from the US government for developing an innovative kit for debugging race conditions in software.  &lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25] July 1993. &lt;br /&gt;
* Reeves and Snyder. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software]. January 10, 2006.  [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* John Chan. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] August 12, 2008.&lt;br /&gt;
* Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4074</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4074"/>
		<updated>2010-10-14T20:44:06Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their consequences range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we define race conditions and examine some of the best-known cases involving them. We also look at some of the solutions and tools the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data concurrently and at least one of them writes to it. The end result may be incorrect &lt;br /&gt;
depending on the exact timing of those processes, and a major system failure can follow. The main challenge with race condition errors is &lt;br /&gt;
that they are usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, which makes them a nightmare for&lt;br /&gt;
programmers to track down and debug.&lt;br /&gt;
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a computer-controlled radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987 six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incidents are quite possibly the most infamous software failure relating to race conditions: their cause has been traced back to a programming bug that produced a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of the code were reused from the software of the earlier Therac-6 and Therac-20 machines. &lt;br /&gt;
The main portion of the code runs a task called “Treat”, which determines which of the program&#039;s 8 main subroutines it should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 made use of 8 main subroutines. The Datent subroutine had its own helper routine called Magnet, which prepared the machine&#039;s magnets to administer the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable which signaled whether the operator had finished entering the necessary data. Once the Datent subroutine finds the flag set, signifying that the operator has entered the necessary information, it allows the main program to move on to the next subroutine. If the flag is not set, the “Treat” task reschedules itself, in turn rescheduling the Datent subroutine. This continues until the shared data entry flag is set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. The subroutine was set up so that, before returning to “Treat” with instructions to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine took the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The Magnet subroutine took approximately 8 seconds to complete, and the keyboard handler kept running while it did. If the operator modified the data before the “Magnet” subroutine returned, the changes would not be registered: the beam strength would already be set to its prior value, ignoring the operator’s changes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types up data, presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves cursor up and fixes the error and presses return again.&lt;br /&gt;
#Magnets are set to the previous power level; the Magnet subroutine returns&lt;br /&gt;
#Program moves on to next subroutine without registering changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
FirstEnergy at the time was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. The energy management system crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect leading to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped simultaneously. These three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server came online to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
With the system failure that ultimately led to 256 plants going offline, a massive black-out was experienced in the northeastern United States and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and a deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and any other possible data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. The Rover was controlled on a daily basis by the NASA team on Earth, which sent it messages and tasks. Each solar day in the life of the Rover is called a Sol.&lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly.&lt;br /&gt;
The rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, so that all processes share and access resources on the single CPU. The Rover records progress through the use of three primary logging mechanisms: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (a critical section). While the IM process was requesting permission, another process was granted access to that very same piece of memory. This resulted in the IM process generating a fatal exception through its EVR log. The exception led to data loss and trouble transmitting data to the NASA team on Earth, which eventually left the Rover in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time-frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested that data be transmitted from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover.&lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The Stop error 0x0000001A, MEMORY_MANAGEMENT, can occur because of a race condition in memory management. It is also associated with hardware errors related to memory; one possible cause is that the computer cannot supply enough power to the memory in time for the process.&lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also caused system failures in airports, ATMs and street hoardings. However, the most notable public incident happened during the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed because of a BSOD bug.&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions while maintaining concurrency and safe sharing of resources among processes brings us to the concept of mutual exclusion (mutex). Mutual exclusion is the idea of making sure processes access data in a serialized way. That is, if process A happens to be executing in or using a particular data structure (called a critical section), then no other process, such as B, is allowed to execute in or use that very same data structure until process A finishes executing or leaves the data structure. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
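&lt;br /&gt;
As a minimal sketch of the lock-based approach, assuming Python threads stand in for processes A and B (the function name and its parameters are illustrative and do not come from any of the systems discussed here), a lock can serialize updates to a shared counter:&lt;br /&gt;
&lt;br /&gt;
```python
import threading


def count_with_lock(workers=4, increments=1000):
    """Increment a shared counter from several threads, one at a time."""
    counter = 0
    lock = threading.Lock()  # guards the critical section below

    def work():
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread may hold the lock at once
                counter += 1  # critical section: read, add, write back

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```
&lt;br /&gt;
With the lock in place, count_with_lock(4, 1000) returns 4000 on every run, because no increment can interleave with another.&lt;br /&gt;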
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race-condition errors as well. More recently, a US software company called ReplaySolutions was awarded a US patent for a toolkit for debugging race conditions found in software.&lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25] July 1993. &lt;br /&gt;
* Reeves and Snyder. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software],  [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* John Chan. Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html] August 12, 2008.&lt;br /&gt;
* Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4073</id>
		<title>COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=COMP_3000_Essay_1_2010_Question_6&amp;diff=4073"/>
		<updated>2010-10-14T20:42:56Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Question=&lt;br /&gt;
&lt;br /&gt;
What are some examples of notable systems that have failed due to flawed efforts at mutual exclusion and/or race conditions? How significant was the failure in each case?&lt;br /&gt;
&lt;br /&gt;
=Answer=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
Race conditions have their fair share of notoriety in the history of software bugs. Their effects may range from a piece of Java code causing an application to halt, to the corruption of web services, to the failure of a life-critical system with fatal consequences. In this article, we will define race conditions and examine some of the most well-known cases involving them. We will also take a look at some of the solutions and tools the industry has proposed to track and detect race conditions.&lt;br /&gt;
&lt;br /&gt;
=Overview=&lt;br /&gt;
&lt;br /&gt;
A race condition occurs when two or more processes access shared data at the same time and at least one of them writes to it. The end result may be incorrect, depending on the exact timing of those processes, and consequently a major system failure can occur. The main challenge with race-condition errors is that they&#039;re usually unpredictable and can be triggered in various ways depending on the processes involved and the surrounding environment, making it a nightmare for programmers to track down and debug the error.&lt;br /&gt;
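&lt;br /&gt;
The lost-update pattern behind many race conditions can be sketched without any real concurrency. The snippet below is an illustrative assumption, not code from any of the systems discussed: it replays by hand one unlucky interleaving of two processes that each read, increment and write back a shared counter, and compares it with a serialized run.&lt;br /&gt;
&lt;br /&gt;
```python
def replay_lost_update():
    """Replay one bad interleaving of two read-increment-write sequences."""
    counter = 0

    read_by_a = counter      # process A reads 0
    read_by_b = counter      # process B also reads 0, before A writes
    counter = read_by_a + 1  # A writes back 1
    counter = read_by_b + 1  # B writes back 1, overwriting A's update

    return counter


def replay_serialized():
    """The same two increments, run one after the other."""
    counter = 0
    counter = counter + 1  # A completes its whole update first
    counter = counter + 1  # B then sees A's result
    return counter
```
&lt;br /&gt;
Here replay_lost_update() returns 1 although two increments ran, while replay_serialized() returns 2. Which outcome a real system produces depends on timing, which is why such bugs are hard to reproduce.&lt;br /&gt;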
&lt;br /&gt;
=Examples=&lt;br /&gt;
== Therac-25 ==&lt;br /&gt;
&lt;br /&gt;
The Therac-25 was a radiation therapy machine developed in Canada by Atomic Energy of Canada Limited (AECL). Between 1985 and 1987, six patients were given overdoses of radiation by the machine, and half of these patients died as a result. The incident is quite possibly the most infamous software bug relating to race conditions: the cause of the accidents has been traced back to a programming bug which caused a race condition.&lt;br /&gt;
The Therac-25 software was written by a single programmer in PDP-11 assembly language. Portions of code were reused from software in the previous Therac-6 and Therac-20 machines.&lt;br /&gt;
The main portion of the code runs a function called “Treat”, which determines which of the program&#039;s 8 main subroutines it should be executing. The keyboard handler task ran concurrently with “Treat”.&lt;br /&gt;
&lt;br /&gt;
===Main Subroutines===&lt;br /&gt;
&lt;br /&gt;
The Therac-25 made use of 8 main subroutines. The Datent subroutine had its own helper routine called Magnet, which prepared the machine&#039;s magnets to administer the correct dosage of radiation.&lt;br /&gt;
&lt;br /&gt;
#Reset&lt;br /&gt;
#Datent&lt;br /&gt;
##Magnet&lt;br /&gt;
#Set Up Done&lt;br /&gt;
#Set Up Test&lt;br /&gt;
#Patient Treatment&lt;br /&gt;
#Pause Treatment&lt;br /&gt;
#Terminate Treatment&lt;br /&gt;
#Date, Time, ID Changes&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine communicated with the keyboard handler task through a shared variable which signaled whether the operator had finished entering the necessary data. Once the flag indicated that the operator had entered the necessary information, Datent allowed the main program to move on to the next subroutine. If the flag was not set, the “Treat” task rescheduled itself, in turn rescheduling the Datent subroutine. This continued until the shared data entry flag was set.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The Datent subroutine was also responsible for preparing the machine to administer the correct radiation dosage. The subroutine was set up so that, before returning to “Treat” with instructions to move on to the next of its 8 subroutines, it would first call the “Magnet” subroutine. This subroutine parsed the operator&#039;s input and moved the machine&#039;s magnets into position to administer the prescribed radiation. The “Magnet” subroutine took approximately 8 seconds to complete, and while it ran the keyboard handler was also running. If the operator modified the data before the “Magnet” subroutine returned, their changes would not be registered, and the beam strength would remain set to its prior value, ignoring the operator&#039;s changes.&lt;br /&gt;
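&lt;br /&gt;
The effect described above can be sketched as a stale read. This is a hypothetical Python model for illustration only: the real software was written in PDP-11 assembly, and the class, function names and dose values used here are assumptions. The first function reads the entry once before the slow magnet step; the second re-reads the entry afterwards and repeats the setup if it changed.&lt;br /&gt;
&lt;br /&gt;
```python
class Console:
    """Holds the value currently shown in the operator's entry field."""

    def __init__(self, entered_value):
        self.entered_value = entered_value


def datent_stale(console, edits_during_magnet):
    """Buggy pattern: use the value read before the slow magnet step."""
    value_for_magnets = console.entered_value  # read once
    for new_value in edits_during_magnet:  # operator edits while magnets move
        console.entered_value = new_value
    return value_for_magnets  # the edits are never looked at again


def datent_recheck(console, edits_during_magnet):
    """Safer pattern: redo the setup until the entry stops changing."""
    pending = list(edits_during_magnet)
    while True:
        value_for_magnets = console.entered_value
        if pending:  # one edit arrives during this magnet pass
            console.entered_value = pending.pop(0)
        if console.entered_value == value_for_magnets:
            return value_for_magnets  # nothing changed during the pass
```
&lt;br /&gt;
With an initial entry of 2000 and one correction to 200 during the magnet step, datent_stale returns 2000 while datent_recheck returns 200.&lt;br /&gt;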
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Example Bug Situation===&lt;br /&gt;
&lt;br /&gt;
The situation below illustrates a chain of events that would result in an unintended dose of radiation being administered.&lt;br /&gt;
&lt;br /&gt;
#Operator types in the data and presses return&lt;br /&gt;
#(Magnet subroutine is initiated)&lt;br /&gt;
#Operator realizes there is an extra 0 in the radiation intensity field&lt;br /&gt;
#Operator quickly moves the cursor up, fixes the error and presses return again&lt;br /&gt;
#Magnets are set to the previous power level and the subroutine returns&lt;br /&gt;
#Program moves on to the next subroutine without registering the changes&lt;br /&gt;
#Patient is administered a lethal overdose of radiation&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Black-out of 2003 ==&lt;br /&gt;
&lt;br /&gt;
An energy management system failed due to a race condition, ultimately leading to Ontario and parts of the United States experiencing a black-out.&lt;br /&gt;
&lt;br /&gt;
The incident occurred on August 14th, 2003, when a power plant located in Eastlake, Ohio went offline. The system was set up so that if this were to occur, a warning would be sent to FirstEnergy&#039;s control center in Akron, Ohio. Upon receiving this warning, power would be re-routed through other plants to isolate the failure. However, no warning was received, resulting in a domino effect that ultimately caused over 100 power plants to go offline.&lt;br /&gt;
&lt;br /&gt;
At the time, FirstEnergy was using General Electric&#039;s Unix-based XA/21 energy management system. This system was responsible for alerting the operators of the control center whenever there was a problem. Unfortunately, a flaw in the software caused the system to crash. The energy management system crashed silently, so the operators at the control center had no idea they were not receiving the alerts they otherwise would have. Without any warnings, the operators did not know the power plant had gone offline, and so took no measures to prevent the cascading effect that led to the black-out.&lt;br /&gt;
 &lt;br /&gt;
===Cause of Race Condition===&lt;br /&gt;
&lt;br /&gt;
The XA/21 energy management system failed when three sagging power lines tripped simultaneously. These three separate events then attempted to operate on shared state at the same time, causing the main system to fail. A back-up server came online to handle the requests, but by the time it kicked in, the events that had accumulated since the main system failure caused the back-up to fail as well.&lt;br /&gt;
&lt;br /&gt;
===Aftermath===&lt;br /&gt;
With the system failure that ultimately led to 256 plants going offline, a massive black-out was experienced in the northeastern United States and Ontario. It is estimated that 55 million people were affected by the black-out. Investigations in the aftermath revealed both negligence on FirstEnergy&#039;s part and a deeply embedded bug within the XA/21 energy management system. The bug has since been fixed with a patch.&lt;br /&gt;
&lt;br /&gt;
== The NASA Mars-Rover ==&lt;br /&gt;
The NASA Mars-Rover incident is another well-known case of system failure due to race conditions. The Mars-Rover is a six-wheel-drive, four-wheel-steered vehicle designed by NASA to navigate the surface of Mars in order to gather videos, images, samples and any other possible data about the planet. NASA landed two Rover vehicles, the Spirit and Opportunity Rovers, on January 4 and January 25, 2004, respectively. The Rover was controlled on a daily basis by the NASA team on Earth, which sent it messages and tasks. Each solar day in the life of the Rover is called a Sol.&lt;br /&gt;
&lt;br /&gt;
===Hardware design and architecture===&lt;br /&gt;
The vehicle&#039;s main operating equipment consists of a set of high-resolution cameras, a collection of specialized spectrometers and a set of radio antennas for transmitting and receiving data. The main computer was built around a BAE RAD-6000 CPU (Rad6k), RAM and non-volatile memory (a combination of FLASH and ROM). &lt;br /&gt;
&lt;br /&gt;
===Software design===&lt;br /&gt;
The Rover is controlled by the VxWorks real-time operating system. The Rover flight software was mostly implemented in ANSI C, with some fragments of code written in C++ and assembly.&lt;br /&gt;
The rover relied on an autonomous system that enabled it to drive itself and carry out a number of self-maintenance operations. The software uses time multiplexing, so that all processes share and access resources on the single CPU. The Rover records progress through the use of three primary logging mechanisms: event reports (EVRs), engineering data (EH&amp;amp;A) and data products.&lt;br /&gt;
&lt;br /&gt;
===System failures and vulnerabilities===&lt;br /&gt;
The first race-condition bug occurred on the Spirit Rover on Sol 131. The initialization module (IM) process was preparing to increment a counter that keeps track of the number of times an initialization has occurred. To do that, the IM process must request permission and be granted access to write that counter to memory (a critical section). While the IM process was requesting permission, another process was granted access to that very same piece of memory. This resulted in the IM process generating a fatal exception through its EVR log. The exception led to data loss and trouble transmitting data to the NASA team on Earth, which eventually left the Rover in a halted state for a few days. The NASA team attempted to solve the problem by rebooting the Rover and restricting another module from operating during that time-frame. However, the same bug recurred on the Spirit Rover on Sol 209 and then on the Opportunity Rover on Sol 596 and Sol 622.&lt;br /&gt;
&lt;br /&gt;
A similar type of error occurred on Spirit on Sol 136, this time involving the Imaging Services Module (IMG). Just as the NASA team requested that data be transmitted from the Rover, the IMG was beginning to deactivate. The IMG&#039;s read cycles from memory were suddenly interrupted by the deactivation process, which was attempting to power off the piece of memory associated with the IMG reading task. This resulted in a failure to return the requested data from the Rover.&lt;br /&gt;
&lt;br /&gt;
==Windows Blue-Screens-Of-Death==&lt;br /&gt;
&lt;br /&gt;
When a problem in Windows forces the operating system to fail, the computer often displays an error screen, known as a Stop message, that describes the cause of the problem. Most people call this a Blue Screen of Death (BSOD).&lt;br /&gt;
&lt;br /&gt;
The Stop error 0x0000001A, MEMORY_MANAGEMENT, can occur because of a race condition in memory management. It is also associated with hardware errors related to memory; one possible cause is that the computer cannot supply enough power to the memory in time for the process.&lt;br /&gt;
&lt;br /&gt;
The BSOD has surfaced on a number of Windows versions, including Windows 7. It has also caused system failures in airports, ATMs and street hoardings. However, the most notable public incident happened during the opening ceremony of the 2008 Beijing Summer Olympics in China, when one of the projectors crashed because of a BSOD bug.&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
The need to control race conditions while maintaining concurrency and safe sharing of resources among processes brings us to the concept of mutual exclusion (mutex). Mutual exclusion is the idea of making sure processes access data in a serialized way. That is, if process A happens to be executing in or using a particular data structure (called a critical section), then no other process, such as B, is allowed to execute in or use that very same data structure until process A finishes executing or leaves the data structure. Common techniques used to establish mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
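&lt;br /&gt;
Semaphores generalize locks by admitting a fixed number of holders at once. The following Python sketch is illustrative only (the function and its parameters are assumptions, not part of any system discussed above); it records the largest number of threads that were ever inside the guarded region at the same time.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


def peak_holders(workers=8, permits=1):
    """Return the most threads ever inside the guarded region at once."""
    semaphore = threading.Semaphore(permits)
    bookkeeping = threading.Lock()  # protects the two counters below
    inside = 0
    peak = 0

    def work():
        nonlocal inside, peak
        with semaphore:  # blocks while all permits are taken
            with bookkeeping:
                inside += 1
                peak = max(peak, inside)
            with bookkeeping:
                inside -= 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return peak
```
&lt;br /&gt;
With permits=1 the semaphore behaves like a mutex and peak_holders returns 1; with a larger permit count the result never exceeds that count.&lt;br /&gt;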
&lt;br /&gt;
A handful of commercial software tools have been developed to detect and address race-condition errors as well. More recently, a US software company called ReplaySolutions was awarded a US patent for a toolkit for debugging race conditions found in software.&lt;br /&gt;
&lt;br /&gt;
As the industry strives for faster and more efficient performance through the use of multi-processor systems and multi-core chips, this area continues to be a vast field for research and innovation within the computing world.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
* Nancy Leveson. [http://sunnyday.mit.edu/papers/therac.pdf Medical Devices: The Therac-25] July 1993. &lt;br /&gt;
* Reeves and Snyder. [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;amp;arnumber=1571113&amp;amp;userType=inst An Overview of the Mars Exploration Rovers&#039; Flight Software],  [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/37499/1/05-0539.pdf another source]&lt;br /&gt;
* Matijevic and E. Dewell. 2006 [http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf Anomaly Recovery and the Mars Exploration Rovers]&lt;br /&gt;
* Dreaded Blue Screen of Death strikes Olympics [http://news.cnet.com/8301-17938_105-10015872-1.html]&lt;br /&gt;
* Patent Awarded for Debugging Race Conditions [http://www.drdobbs.com/tools/225600068]&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4057</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4057"/>
		<updated>2010-10-14T20:19:16Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. Look at pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 x-ray machine which killed a bunch of people http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well.&lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a book chapter from Sun titled Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for how to divide the work for this essay?  Also do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or if you just want to continue doing this online, that works too; I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now, thanks Julie&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (Less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the Blue Screen of Death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
On page 54, it describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs, it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language (C/C++, Assembly)&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
*Common problem&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4054</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4054"/>
		<updated>2010-10-14T20:14:44Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. Meaning that if process A, for instance, is executing the code that uses a particular shared data structure (that code is called a critical section), then no other process like B is allowed to enter its own critical section for that same data structure until process A finishes and leaves its critical section. Common techniques used for mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
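&lt;br /&gt;
A rough sketch of that idea, assuming Python threads and a made-up shared counter (the names add_many and run are just for illustration, they are not from the textbook). The lock turns the read-modify-write on the counter into a critical section, so only one thread is in it at a time:&lt;br /&gt;

```python
import threading

counter = 0                      # shared data that every thread updates
counter_lock = threading.Lock()  # guards the critical section below


def add_many(steps):
    # Hypothetical worker: bump the shared counter steps times.
    global counter
    for _ in range(steps):
        with counter_lock:       # enter the critical section
            counter += 1         # read-modify-write, now serialized


def run(workers, steps):
    # Start the workers, wait for all of them, return the final count.
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(steps,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With the lock in place, run(4, 10000) comes back as 40000 every time. Take the lock away and the total can come up short, depending on who runs precisely when.&lt;br /&gt;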
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to &lt;br /&gt;
explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a good handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and&lt;br /&gt;
Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of a race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible.&lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online; I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyway)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use them (less to type as well).&lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess ill do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it conveys the writer&#039;s own understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the blue screen of death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs in the third and fourth floor to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Heres some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race conditions errors ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter) ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language (C/C++, Assembly)&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4053</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4053"/>
		<updated>2010-10-14T20:14:17Z</updated>

		<summary type="html">&lt;p&gt;J powers: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Lets start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A is executing the piece of code that uses a particular shared data structure (that piece of code is called a critical section), then no other process, such as B, is allowed to enter its own critical section for that data structure until process A leaves. Common techniques used to enforce mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
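&lt;br /&gt;
A minimal sketch of the two definitions above, in Python. It is not taken from the textbook or from any of the cases; the function name run_workers and the shared counter are made up for illustration, and it uses threads rather than processes to keep it short. With the lock, the update of the shared counter is a critical section; without it, the read and the write can interleave and some increments may be lost.&lt;br /&gt;

```python
import threading

def run_workers(use_lock, workers=4, steps=10000):
    """Increment a shared counter from several threads and return the final total."""
    state = {"count": 0}          # shared data that every thread touches
    lock = threading.Lock()       # guards the critical section when use_lock is True

    def work():
        for _ in range(steps):
            if use_lock:
                with lock:        # only one thread at a time may run the update
                    state["count"] += 1
            else:
                # unprotected read-modify-write: updates from other threads may be lost
                current = state["count"]
                state["count"] = current + 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]

if __name__ == "__main__":
    print("with lock:", run_workers(True))
    print("without lock:", run_workers(False))
```

With the lock the total always equals workers times steps. Without it the total can come out lower, although on any given run it may also happen to be correct, which is part of why these bugs are so hard to reproduce.&lt;br /&gt;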
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyway, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which basically examines the excessive amount of expense and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which killed a bunch of people http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well.&lt;br /&gt;
&lt;br /&gt;
Heres an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the PDF files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, thats the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the PDF file, there is a first example of a race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now, thanks Julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use them (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have are well documented. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and electronic hoardings. It even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, if you guys want more things to talk about, the Linux kernel has suffered many race condition bugs that led to security vulnerabilities allowing root / kernel level access.  I remember one from a while ago that hit Slashdot, where a local user could trigger a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced, resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the Blue Screen of Death. I found this source on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found anything on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs, it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 20:14, 14 October 2010 (UTC)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4051</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4051"/>
		<updated>2010-10-14T20:13:52Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* BSOD */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes (or threads) are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. Meaning that, if process A, for instance, is executing the code that uses a particular shared data structure (that code is called a critical section), then no other process, like B, is allowed to execute its own critical section on that same data structure until process A finishes and leaves its critical section. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
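&lt;br /&gt;
To make the two definitions above concrete, here is a minimal Python sketch (not taken from the textbook or any of our sources). The helper name run_counter and the thread and iteration counts are made up for illustration. Several threads increment one shared counter. With the lock, the increment is a critical section and the total always matches. Without it, some updates may be lost, depending on how the threads interleave.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


def run_counter(num_threads, increments, use_lock):
    """Hypothetical helper: several threads increment one shared counter."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Critical section: only one thread at a time may do the
                # read-modify-write on the shared counter.
                with lock:
                    state["count"] += 1
            else:
                # Unprotected read-modify-write: updates may be lost if
                # two threads interleave here.
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    expected = 4 * 50000
    print("expected:    ", expected)
    print("with lock:   ", run_counter(4, 50000, use_lock=True))
    print("without lock:", run_counter(4, 50000, use_lock=False), "(may be lower)")
```
&lt;br /&gt;
The unlocked run is not guaranteed to go wrong on every execution, which is part of what makes race conditions so hard to reproduce.&lt;br /&gt;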
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper explains the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, whose overdoses killed several patients http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any other information about that surface.) The rover experienced a rare unexpected error due to a race condition fault. This seems to be a fairly common problem for those Mars rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a book chapter from Sun called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of a race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay?  Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online if you prefer; I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also, I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use it (less to type as well).&lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have come with a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac-25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that documentation on the Blue Screen of Death might be lacking. I found this article on how the blue screen problem occurs: http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions. We didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah, I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done; we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys, please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
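&lt;br /&gt;
For the semaphore answer, a rough sketch in Python might help as a talking point. It is not taken from the textbook; the names slots, use_resource, active and peak are made up for illustration, and the limit of 2 is an arbitrary assumption. A counting semaphore lets at most that many threads into the guarded region at once.&lt;br /&gt;

```python
import threading
import time

slots = threading.Semaphore(2)   # assumption: at most 2 threads may use the resource at once
state_lock = threading.Lock()    # protects the two bookkeeping counters below
active = 0                       # threads currently inside the guarded region
peak = 0                         # highest value that active ever reached

def use_resource():
    global active, peak
    with slots:                  # acquire a slot; blocks while both slots are taken
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)         # stand-in for real work on the shared resource
        with state_lock:
            active -= 1

workers = [threading.Thread(target=use_resource) for _ in range(6)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(peak)  # never more than 2, because the semaphore caps concurrent access
```

With the limit set to 1 the semaphore behaves like a plain lock, which is the mutual exclusion case.&lt;br /&gt;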
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4050</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4050"/>
		<updated>2010-10-14T20:13:46Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Mars Rover */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
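&lt;br /&gt;
A minimal sketch of that definition in Python (not the textbook example): two threads each deposit 10 into a shared balance. The names balance, deposit and both_have_read are made up for illustration, and the barrier is only there to force the unlucky timing so the lost update shows up on every run instead of once in a while.&lt;br /&gt;

```python
import threading

balance = 100  # shared data that both threads read and then write

# Hypothetical helper: the barrier makes both threads finish reading
# before either one writes, so the bad interleaving is repeatable.
both_have_read = threading.Barrier(2)

def deposit(amount):
    global balance
    seen = balance            # read the shared value
    both_have_read.wait()     # the other thread reads the same old value here
    balance = seen + amount   # write back; the later write overwrites the earlier one

workers = [threading.Thread(target=deposit, args=(10,)) for _ in range(2)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(balance)  # 110 rather than 120: one deposit was lost
```

Without the barrier the same code would usually print 120, which is what makes real race conditions so hard to reproduce.&lt;br /&gt;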
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A happens to be executing the code that uses a particular shared data structure (its critical section), then no other process, such as B, is allowed to execute its own critical section on that same data structure until process A finishes and leaves it. Common techniques used to enforce mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
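&lt;br /&gt;
A rough sketch of mutual exclusion with a lock, in Python. The names counter, counter_lock and add_many are hypothetical; threading.Lock is the standard library lock. Only the thread holding the lock can be inside the critical section, so every increment is applied.&lt;br /&gt;

```python
import threading

counter = 0                        # shared data
counter_lock = threading.Lock()    # guards every access to counter

def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:         # enter the critical section
            counter += 1           # read-modify-write happens with no other thread inside
        # leaving the with block releases the lock for the next thread

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000: no increment is lost while the lock is used
```

The trade-off is that threads now wait for each other, so critical sections are normally kept as short as possible.&lt;br /&gt;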
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, here is a programming wiki page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned there counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyway, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here is also a paper from 1992, which examines the excessive cost and resources that older versions of the Unix system spent on implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the post above is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here is an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here is a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here is also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here is another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible.&lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online if you prefer; I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but its a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think theres any harm in doing that, I&#039;ve found that this was a fairly common problem in some versions of Windows leading to a handful of system failures in airports, electronic hoardings, it even happened at the Beijing Summer Olympics of 2008 ! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess ill do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the Blue Screen of Death. I found this book, which explains how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs in the third and fourth floor to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Heres some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race conditions errors ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter) ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are in the 4th floor of Herzberg labs, its the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4049</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4049"/>
		<updated>2010-10-14T20:13:37Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Blackout */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Lets start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. Look at the text book in pages 117-118 for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
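&lt;br /&gt;
To make the two definitions above concrete, here is a minimal sketch (not taken from the textbook; the names run_workers, worker and counter are made up for illustration). Several threads increment one shared counter, and a lock turns the read-modify-write step into a critical section so only one thread runs it at a time.&lt;br /&gt;

```python
import threading


def run_workers(num_threads, increments_per_thread):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = {"value": 0}   # shared data touched by every thread
    lock = threading.Lock()  # mutual exclusion primitive (hypothetical setup)

    def worker():
        for _ in range(increments_per_thread):
            # Critical section: only one thread at a time may run the
            # read-modify-write below, so no update can be lost.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


print(run_workers(4, 10000))
```

Without the lock, two threads could read the same old value and both write back the same new value, so the final count would depend on who runs precisely when. That is the race condition described above.&lt;br /&gt;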
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. We should check with the prof whether the example mentioned here counts as a &amp;quot;failure&amp;quot;. Anyway, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive cost and resources that older versions of the Unix system spent on implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which killed several patients http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Heres an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know each of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use them (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have come with a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess ill do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the Blue Screen of Death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
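&lt;br /&gt;
To make the semaphore answers above a bit more concrete, here is a small sketch in Python (this is not Linux kernel code; the name max_concurrent and the thread and permit counts are invented for illustration). A counting semaphore with N permits lets at most N threads into the guarded region at once, and with a single permit it behaves like a mutex.&lt;br /&gt;

```python
import threading
import time


def max_concurrent(num_threads, permits):
    """Return the largest number of threads seen inside the guarded region at once."""
    sem = threading.Semaphore(permits)   # counting semaphore with a fixed number of permits
    guard = threading.Lock()             # protects the bookkeeping dictionary itself
    stats = {"inside": 0, "peak": 0}

    def worker():
        with sem:                        # blocks while all permits are taken
            with guard:
                stats["inside"] += 1
                stats["peak"] = max(stats["peak"], stats["inside"])
            time.sleep(0.01)             # pretend to use the shared resource
            with guard:
                stats["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"]


if __name__ == "__main__":
    print(max_concurrent(8, 1))  # one permit: behaves like a mutex
    print(max_concurrent(8, 3))  # never more than three threads inside
```

With one permit the peak is always 1; with three permits it can never go above 3, however the threads get scheduled.&lt;br /&gt;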
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4048</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4048"/>
		<updated>2010-10-14T20:13:27Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Therac-25 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Lets start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. This means that if process A, for instance, is executing the code that uses a particular shared data structure (that code is called a critical section), then no other process like B is allowed to execute its critical section for that same data structure until process A finishes and leaves its own critical section. Common techniques used to implement mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
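&lt;br /&gt;
A minimal sketch of the two definitions above, for illustration only: the function name run_counter and the thread and increment counts are made up here, and it uses Python threads rather than processes. Without the lock, two threads can read the same old value and one of the updates gets lost, so the total may come out too low. With the lock, the read-modify-write step becomes a critical section and the total is always exact.&lt;br /&gt;

```python
import threading


def run_counter(num_threads, increments, use_lock):
    """Increment a shared counter from several threads and return the final value."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Critical section: only one thread at a time may update the counter.
                with lock:
                    state["count"] += 1
            else:
                # Unprotected read-modify-write: another thread may run between
                # the read and the write, and its update is then overwritten.
                current = state["count"]
                state["count"] = current + 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print("with lock:   ", run_counter(4, 100000, True))
    print("without lock:", run_counter(4, 100000, False))
```

The unlocked run may or may not lose updates on any given execution, which matches the point that the result depends on who runs precisely when.&lt;br /&gt;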
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which basically examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a good handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay?  Also do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day; wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have come with a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the Blue Screen of Death. I found this source on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like I mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate a signal to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of Herzberg labs, it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4047</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4047"/>
		<updated>2010-10-14T20:12:38Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. Meaning that if process A, for instance, happens to be executing the code that uses a particular shared data structure (that code is called a critical section), then no other process like B would be allowed to enter a critical section for that very same data structure until process A finishes or leaves it. Common techniques used for mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
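&lt;br /&gt;
To make the two definitions above concrete, here is a minimal sketch in Python of a lock guarding a shared counter. The Counter class and the run function are hypothetical names made up for illustration; they do not come from the textbook or from any of the linked sources.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


class Counter:
    """Shared data; increment() is the critical section (hypothetical example)."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run this read-modify-write sequence.
        with self._lock:
            current = self.value
            self.value = current + 1


def run(workers=4, steps=1000):
    """Start several threads that all increment the same counter."""
    counter = Counter()

    def work():
        for _ in range(steps):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```
&lt;br /&gt;
Without the lock, two threads could both read the same old value and one of the updates would be lost, so the final count would depend on who runs precisely when.&lt;br /&gt;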
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
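&lt;br /&gt;
One common shape of a file-system race is check-then-use: a program checks a path and then opens it, and the file can be swapped in between the two steps. Whether this is exactly the Unix case the article covers is an assumption; the sketch below, in Python with made-up function names, only illustrates the general pattern.&lt;br /&gt;
&lt;br /&gt;
```python
import os
import tempfile


def read_check_then_open(path):
    # Racy pattern: the file behind "path" can be replaced (for example by a
    # symlink to a protected file) after the check and before the open.
    if os.access(path, os.R_OK):
        with open(path) as handle:
            return handle.read()
    return None


def read_open_directly(path):
    # Safer pattern: skip the separate check and just attempt the open, so
    # the decision is made at the moment of use. (A privileged program would
    # also need to drop its extra privileges first; not shown here.)
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return None


def demo():
    # Single-threaded demo: with no attacker, both functions behave the same.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "notes.txt")
        with open(path, "w") as handle:
            handle.write("hello")
        missing = os.path.join(folder, "missing.txt")
        return (
            read_check_then_open(path),
            read_open_directly(path),
            read_check_then_open(missing),
            read_open_directly(missing),
        )
```
&lt;br /&gt;
The demo cannot show the attack itself, since that needs a second process racing the first; it only shows that the two versions agree when nothing interferes.&lt;br /&gt;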
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is the first example of a race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan on how to divide the work for this essay?  Also do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s OK by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online if you prefer; I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate the ability we have to work together more so than us doing a separate paragraph each.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the blue screen of death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him. Also, guys, please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
*At least 1 recurrence (except for the blackout)&lt;br /&gt;
*Human error (especially in Therac-25 and the blackout; preventable)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Lack of communication between developer and users&lt;br /&gt;
*Lack of testing &lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4037</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4037"/>
		<updated>2010-10-14T20:03:21Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Lets start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
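&lt;br /&gt;
A minimal sketch of the two definitions above, assuming Python threads stand in for processes. The names Counter and run_workers are made up for this example and do not come from the textbook or any of the linked papers. The lock turns the read-modify-write on the shared value into a critical section; without it, the final total could depend on how the threads interleave.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


class Counter:
    # Hypothetical shared data structure used by several threads.
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        # Critical section: only the thread holding the lock may run
        # this read-modify-write, so no update is lost.
        with self.lock:
            current = self.value
            self.value = current + 1


def run_workers(counter, workers=4, iterations=10000):
    # Each worker thread increments the same counter many times.
    def work():
        for _ in range(iterations):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```
&lt;br /&gt;
With the lock in place, run_workers(Counter()) should return 40000 (4 workers times 10000 increments each).&lt;br /&gt;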
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a good number of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan on how to divide the work for this essay?  Also do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now, thanks Julie.&lt;br /&gt;
&lt;br /&gt;
----
OK everyone, write in here when you are available before the 14th.&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have come with a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac-25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then.&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
OK, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the Blue-Screens-of-Death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article, then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be OK to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, if you guys want more things to talk about, the Linux kernel has suffered many race condition failures leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could trigger a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced, resulting in the kernel trying to execute at address 0.  Now if you stick your own code at address 0, you can run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the Blue Screen of Death. I found this book that explains how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
On page 54, it describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
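&lt;br /&gt;
For the semaphore answers above, here is a minimal Python sketch of how a counting semaphore limits the number of threads inside a guarded region. It is only an illustration and not how Linux or Windows implement it; the names peak_concurrency, slots and guard are made up for this example.&lt;br /&gt;

```python
import threading
import time


def peak_concurrency(num_threads, permits):
    """Return the largest number of threads seen inside the guarded region."""
    slots = threading.Semaphore(permits)   # counting semaphore
    guard = threading.Lock()               # protects the two counters below
    stats = {"inside": 0, "peak": 0}

    def worker():
        with slots:                        # acquire a permit, release on exit
            with guard:
                stats["inside"] += 1
                stats["peak"] = max(stats["peak"], stats["inside"])
            time.sleep(0.01)               # pretend to use the shared resource
            with guard:
                stats["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"]
```

With a single permit the semaphore behaves like a plain lock, so peak_concurrency(8, 1) returns 1; with more permits the result never exceeds the number of permits.&lt;br /&gt;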
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
*No ideas about the root of the failure (each case required varied amounts of time to find it)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Lack of communication between developer and users&lt;br /&gt;
*Lack of testing &lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4034</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4034"/>
		<updated>2010-10-14T19:59:11Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. Look at pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A happens to be executing the code that uses a particular shared data structure (that code is called a critical section), then no other process such as B is allowed to execute its own critical section for that same data structure until process A finishes or leaves it. Common techniques used to enforce mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
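&lt;br /&gt;
As a rough illustration of the two definitions above, here is a minimal Python sketch (not taken from the textbook; the names run_workers and bump are made up for this example). Several threads update a shared counter. Holding a lock around the read-modify-write step makes it a critical section, so the final total is always the expected one. Without the lock, updates can be lost depending on how the threads happen to be scheduled.&lt;br /&gt;

```python
import threading


def run_workers(num_threads, increments, use_lock):
    """Increment a shared counter from several threads and return the total."""
    state = {"count": 0}
    lock = threading.Lock()

    def bump():
        for _ in range(increments):
            if use_lock:
                # Critical section: only one thread at a time may run this.
                with lock:
                    state["count"] = state["count"] + 1
            else:
                # Unprotected read-modify-write: another thread may update
                # the counter between the read and the write.
                state["count"] = state["count"] + 1

    threads = [threading.Thread(target=bump) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


if __name__ == "__main__":
    print(run_workers(4, 10000, use_lock=True))   # always 40000
    print(run_workers(4, 10000, use_lock=False))  # may come out lower
```

The unlocked run is not guaranteed to go wrong on any given attempt, which is exactly what makes this kind of bug hard to reproduce.&lt;br /&gt;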
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal: http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
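&lt;br /&gt;
The linked article is not reproduced here, so the following is only an assumption about the general kind of bug involved: a check-then-use race on the file system, where a program tests a file and then acts on it in two separate steps, and another process can change the file in between. A minimal Python sketch of the usual remedy, which is to let the operating system do the check and the creation in one atomic call (the function name claim_file and the file name are made up for this example):&lt;br /&gt;

```python
import os
import tempfile


def claim_file(path):
    """Create path only if it does not exist yet; return True on success."""
    # Racy version (do not use): os.path.exists(path) followed by open(path, "w").
    # Another process could create or replace the file between the two calls.
    try:
        # O_CREAT | O_EXCL makes "check that it is absent" and "create it"
        # a single atomic operation performed by the kernel.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    except FileExistsError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "claim.lock")
        print(claim_file(target))  # True: this call created the file
        print(claim_file(target))  # False: the file already exists
```

The point of the sketch is that there is no gap between the test and the action for another process to exploit.&lt;br /&gt;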
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a book chapter from Sun called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s OK by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways).&lt;br /&gt;
&lt;br /&gt;
Also, is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this, what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that everyone has not been using 4 tildes. I am not sure if this how the professor knows who wrote what but it would not hurt to use it (Less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but its a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have are well documented. This will also demonstrate our ability to work together more than each of us writing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think theres any harm in doing that, I&#039;ve found that this was a fairly common problem in some versions of Windows leading to a handful of system failures in airports, electronic hoardings, it even happened at the Beijing Summer Olympics of 2008 ! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess ill do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that documentation on the Blue Screen of Death might be lacking. I found this book on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW,i&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs in the third and fourth floor to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Heres some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race conditions errors ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter) ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are in the 4th floor of Herzberg labs, its the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
*Inability to test for all real-life situations (before release)&lt;br /&gt;
*Type of programming language&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Lack of communication between developer and users&lt;br /&gt;
*Lack of testing &lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4019</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4019"/>
		<updated>2010-10-14T19:50:35Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Therac-25 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Lets start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A is executing the code that uses a particular shared data structure (this code is called a critical section), then no other process, such as B, is allowed to enter a critical section that uses that same data structure until process A leaves its own. Common techniques used to enforce mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
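A minimal sketch of the two definitions above in Python, using threads in place of processes. It is not taken from the textbook or the linked pages; the names run_workers, use_lock, workers and increments are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import threading

def run_workers(use_lock, workers=4, increments=10000):
    """Have several threads increment one shared counter and return its final value."""
    state = {"counter": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Critical section: only one thread at a time may run this.
                with lock:
                    state["counter"] += 1
            else:
                # Unprotected read-modify-write: updates from other threads
                # can be lost, depending on how the threads interleave.
                current = state["counter"]
                state["counter"] = current + 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]

expected = 4 * 10000
locked_total = run_workers(use_lock=True)
unlocked_total = run_workers(use_lock=False)
print("with lock:", locked_total)
print("without lock:", unlocked_total, "(may fall short of", expected, ")")
```
&lt;br /&gt;
With the lock the total always matches the expected value. Without it, the total may come out lower on some runs and correct on others, depending on the interpreter and on timing, which is part of why such bugs are hard to reproduce.&lt;br /&gt;
&lt;br /&gt;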
Our question asks for examples of systems that have failed due to flawed efforts. For starters, here is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. We should check with the prof whether the example counts as a &amp;quot;failure&amp;quot;. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes to explain the problem and offers a better solution. Its pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive overdoses, some of them fatal: http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Heres an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the PDF files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, thats the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming, which I think we could study.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work on this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s OK by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know each of us probably has a different schedule.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now, thanks Julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also, I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use them (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have are well documented. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings. It even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the blue screen of death. I found this book that explains how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
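&lt;br /&gt;
To make one of the techniques named in the answers above concrete, here is a rough sketch of a counting semaphore in Python. It is only an illustration under assumptions: run_with_semaphore, worker and the state dictionary are made-up names, threads stand in for processes, and nothing here comes from the textbook or from Linux itself.&lt;br /&gt;

```python
import threading


def run_with_semaphore(num_workers, max_inside):
    """Start num_workers threads and return the peak number seen inside the guarded region."""
    # Hypothetical names throughout; this is an illustration, not a real system.
    slots = threading.Semaphore(max_inside)  # at most max_inside holders at a time
    state_lock = threading.Lock()            # protects the bookkeeping dictionary
    state = {"inside": 0, "peak": 0}

    def worker():
        with slots:  # blocks while max_inside workers already hold a slot
            with state_lock:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            # Work on the shared resource would happen here.
            with state_lock:
                state["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

With max_inside set to 1 the semaphore behaves like a plain lock, which is the mutual exclusion case; larger values allow that many workers in at once.&lt;br /&gt;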
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Lack of communication between developer and users&lt;br /&gt;
*Lack of testing &lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4015</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4015"/>
		<updated>2010-10-14T19:49:59Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Therac-25 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read, write or otherwise access the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
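&lt;br /&gt;
As a rough illustration of the two definitions above, here is a small Python sketch. It is only a sketch under assumptions: run_counter, worker and counter are made-up names, and threads stand in for the processes A and B described above. The unprotected branch is the race condition (a read followed by a write that another thread can interleave with), and the locked branch is the critical section.&lt;br /&gt;

```python
import threading


def run_counter(num_threads, increments, use_lock):
    """Increment a shared counter from several threads and return the final value."""
    # Hypothetical names throughout; this is an illustration, not a real system.
    counter = {"value": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Critical section: only one thread at a time may read-modify-write.
                with lock:
                    counter["value"] += 1
            else:
                # Unprotected read-modify-write: the result depends on thread timing.
                current = counter["value"]
                counter["value"] = current + 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]
```

With use_lock set to True the total always equals num_threads times increments. Without the lock, updates can be lost, so the total may come out lower depending on who runs precisely when.&lt;br /&gt;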
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned there counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which killed a bunch of people http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a good number of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work on this essay? Also, do we want to meet in person sometime?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s OK by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since each of us probably has a different schedule.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also, is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use them (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more so than each of us doing a separate paragraph.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and electronic hoardings. It even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the Blue Screen of Death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
On page 54, it describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, since that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
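&lt;br /&gt;
For the semaphore answers above, here is a rough sketch of how a counting semaphore limits how many threads are inside a section at once. It is written in Python only because it is short. The function name, the number of workers and the limit of two are all made up for illustration and are not from the textbook or from how Linux does it internally.&lt;br /&gt;

```python
import threading

def run_with_semaphore(num_workers=6, limit=2):
    # Hypothetical example: at most `limit` workers may be inside at once.
    sem = threading.Semaphore(limit)
    state_lock = threading.Lock()          # protects the bookkeeping below
    state = {"inside": 0, "peak": 0}

    def worker():
        with sem:                          # blocks while `limit` workers are inside
            with state_lock:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            # the real work on the shared resource would go here
            with state_lock:
                state["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]                   # highest number of workers seen inside
```

The value returned should never be larger than the limit, which is the whole point of the semaphore. With a limit of one it behaves like a plain lock.&lt;br /&gt;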
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs, it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Lack of communication&lt;br /&gt;
*Lack of testing&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4013</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=4013"/>
		<updated>2010-10-14T19:49:25Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Thesis */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. Meaning that, if process A, for instance, happens to be executing the code that uses a particular shared data structure (that code is called a critical section), then no other process like B would be allowed to enter its own critical section for that very same data structure until process A finishes or leaves its critical section. Common techniques used to enforce mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
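&lt;br /&gt;
A rough sketch of both definitions, in Python only because it is short. Nothing here comes from the textbook, and the function names and the balance and counter values are made up for illustration. The first function replays a lost update by hand (two readers both see the old value before either writes), and the second uses a lock so that only one thread at a time is in the critical section.&lt;br /&gt;

```python
import threading

def lost_update_demo():
    # Hypothetical shared data: both "processes" read before either writes.
    balance = 100
    read_by_a = balance         # A reads 100
    read_by_b = balance         # B reads 100 before A has written
    balance = read_by_a + 10    # A writes 110
    balance = read_by_b + 10    # B overwrites with 110, so the update from A is lost
    return balance              # 110, not the 120 a serialized run would give

def locked_counter(num_threads=4, increments=10000):
    # Mutual exclusion: the lock serializes access to the shared counter.
    counter = [0]
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:           # enter the critical section
                counter[0] += 1  # only one thread at a time runs this line

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling locked_counter() should always return num_threads times increments, because the lock keeps the read, add and write steps of one thread from interleaving with those of another.&lt;br /&gt;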
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive cost and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which killed several patients with radiation overdoses http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a book chapter from Sun titled Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay?  Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now, thanks Julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone, write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Wait to see if the rest confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more so than each of us doing a separate paragraph.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the Blue Screen of Death. I found this book on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with cha0s. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done; we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
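&lt;br /&gt;
A rough sketch of the semaphore answer above, assuming Python and its standard threading module rather than anything Linux-specific. The name max_concurrent_holders and its parameters are made up for illustration:&lt;br /&gt;
&lt;br /&gt;
```python
import threading


def max_concurrent_holders(permits, num_threads):
    """Return the largest number of threads seen inside the guarded region at once."""
    sem = threading.Semaphore(permits)   # counting semaphore with `permits` slots
    guard = threading.Lock()             # protects the bookkeeping in `stats`
    stats = {"inside": 0, "peak": 0}

    def worker():
        with sem:  # blocks while `permits` threads are already inside
            with guard:
                stats["inside"] += 1
                stats["peak"] = max(stats["peak"], stats["inside"])
            with guard:
                stats["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["peak"]


# Hypothetical usage: 8 threads competing for 2 permits.
peak = max_concurrent_holders(2, 8)
```
&lt;br /&gt;
With one permit the semaphore behaves like a plain lock; with more permits it only caps how many threads can be in the region together, so the peak never goes above the permit count.&lt;br /&gt;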
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
Common:&lt;br /&gt;
*Unexpected cases (infrequent occurrences and hard to duplicate conditions that caused the failure)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Other Therac-25 users were unaware of the accidents (until much later)-&amp;gt;Lack of communication&lt;br /&gt;
*Lack of testing&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3986</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3986"/>
		<updated>2010-10-14T19:22:24Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Therac-25 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A happens to be executing the code that uses a particular shared data structure (its critical section), then no other process like B is allowed to execute code that uses that same data structure until process A leaves its critical section. Common techniques used for mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
&lt;br /&gt;
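A minimal sketch of the mutual exclusion idea above, assuming Python and its standard threading module. The names run_counter and state are hypothetical, made up for illustration, and threads stand in for processes A and B:&lt;br /&gt;
&lt;br /&gt;
```python
import threading


def run_counter(num_threads, increments, use_lock):
    """Increment a shared counter from several threads and return the final value."""
    state = {"count": 0}          # the shared data
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Critical section: only one thread at a time may read-modify-write.
                with lock:
                    state["count"] += 1
            else:
                # Unprotected read-modify-write: updates from other threads may be lost.
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


# Hypothetical usage: 4 threads, 10000 increments each, with the lock held.
safe_total = run_counter(4, 10000, use_lock=True)
```
&lt;br /&gt;
Without the lock, the two steps of the update (read the old value, write the new one) can interleave between threads, so some increments may be lost. With the lock held, the updates are serialized and the total matches the number of increments.&lt;br /&gt;
&lt;br /&gt;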
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal: http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a good handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun titled Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of a race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know each of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also, is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also, I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use them (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac-25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it shows the writer&#039;s own understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be short on documentation for the Blue Screen of Death. I found this source on how the blue screen problem occurs: http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like I mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
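&lt;br /&gt;
For the question about techniques, a rough Python sketch of a counting semaphore might help as a talking point. This is only an illustration, not taken from the textbook and not how Linux or Windows implement it; the names run_with_semaphore, slots and worker are made up for the example. It lets at most a fixed number of threads use a pretend shared resource at once and records the highest number seen inside at the same time.&lt;br /&gt;
&lt;br /&gt;
```python
import threading
import time


def run_with_semaphore(n_workers=6, slots=2):
    # Counting semaphore: at most `slots` threads may hold it at once.
    sem = threading.Semaphore(slots)
    # Plain lock protecting the bookkeeping counters below.
    state_lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def worker():
        # Entering the block waits on the semaphore; leaving signals it.
        with sem:
            with state_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            # Pretend to use the shared resource for a moment.
            time.sleep(0.01)
            with state_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Highest number of threads that were inside the section together.
    return state["peak"]
```
&lt;br /&gt;
With slots set to 1 the semaphore behaves like a plain lock, so the returned peak is 1; with more slots the peak never goes above that number.&lt;br /&gt;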
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Other Therac-25 users were unaware of the accidents (until much later)-&amp;gt;Lack of communication&lt;br /&gt;
*Lack of testing&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3985</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3985"/>
		<updated>2010-10-14T19:20:04Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Therac-25 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to define what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
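&lt;br /&gt;
To make the two definitions above concrete, here is a rough Python sketch (an illustration only, not from the textbook; the names lost_update_demo, Counter and run_workers are made up). The first function replays one bad interleaving by hand to show an update getting lost; the class then wraps the same read-then-write step in a lock so that only one thread at a time can be inside it.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


def lost_update_demo():
    # Replay one bad interleaving by hand: both sides read before either writes.
    balance = 100
    a_read = balance        # process A reads 100
    b_read = balance        # process B also reads 100, before A writes back
    balance = a_read + 10   # A writes 110
    balance = b_read + 10   # B overwrites with 110, so the update from A is lost
    return balance


class Counter:
    # Shared counter whose increment is guarded by a lock.
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-then-write below.
        with self._lock:
            current = self.value
            self.value = current + 1


def run_workers(n_threads=4, n_increments=10000):
    counter = Counter()

    def work():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```
&lt;br /&gt;
The hand-replayed interleaving ends at 110 rather than the 120 that two separate deposits of 10 should give. With the lock in place, run_workers(4, 10000) always returns 40000, because no increment can slip in between another thread&#039;s read and write.&lt;br /&gt;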
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number; that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a good number of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a book chapter from Sun titled Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work on this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online if you prefer; I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3. I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have are well documented. This will also demonstrate our ability to work together more so than each of us doing a separate paragraph.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac-25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic billboards. It even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (cha0s) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it reflects our own understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the blue screen of death. I found this book that explains how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how blue screens affect people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs in the third and fourth floor to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like I mentioned before, I will make sure that the Mars-Rover section is finished today. cha0s is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit awkward I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)?&lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
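&lt;br /&gt;
To make the semaphore answer a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it does not show how Linux or Windows implement semaphores internally, and the function name peak_concurrency and its parameters are hypothetical. A counting semaphore with a given number of permits caps how many threads can be inside the guarded region at the same time.&lt;br /&gt;

```python
import threading

def peak_concurrency(num_threads, permits):
    # Counting semaphore: at most permits threads may hold it at once.
    sem = threading.Semaphore(permits)
    # Separate lock protecting the bookkeeping variables below.
    stats_lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with sem:
            with stats_lock:
                active += 1
                peak = max(peak, active)
            # The guarded work on the shared resource would go here.
            with stats_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Highest number of threads observed inside the guarded region.
    return peak
```

With permits set to 1 the semaphore behaves like a plain lock, so the returned peak is 1. With more permits the peak can never exceed the number of permits, although how high it actually gets depends on scheduling.&lt;br /&gt;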
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs, in the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Other Therac-25 users were unaware of the accidents (until much later)&lt;br /&gt;
*Lack of testing&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3982</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3982"/>
		<updated>2010-10-14T19:18:53Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Therac-25 */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start by defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A happens to be executing the code that uses a particular shared data structure (called a critical section), then no other process, such as B, is allowed to execute its own critical section on that same data structure until process A leaves its critical section. Common techniques used to enforce mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
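&lt;br /&gt;
As a rough illustration of the two definitions above, here is a hypothetical sketch in Python. It is not taken from the textbook or from any of the sources listed here, and the function name run_counter and its parameters are made up for this example. Several threads increment one shared counter. With use_lock set to True, the lock turns the update into a critical section. With use_lock set to False, the read-modify-write steps of different threads can interleave and some updates may be lost, depending on the interpreter and on timing.&lt;br /&gt;

```python
import threading

def run_counter(num_threads, increments, use_lock):
    # Shared state that every thread reads and then writes back.
    counter = [0]
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            if use_lock:
                # Critical section: only one thread at a time updates the counter.
                with lock:
                    counter[0] += 1
            else:
                # Unprotected read-modify-write: updates can be lost
                # when threads interleave.
                counter[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling run_counter(4, 10000, True) always returns 40000, because the lock serializes the updates. Calling it with use_lock set to False may return the same value or a smaller one, and that dependence on who runs precisely when is exactly what makes it a race condition.&lt;br /&gt;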
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine which killed a bunch of people http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system-based race condition in an older version of the Unix operating system, in which user-mode restrictions can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that. I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
The first example of a race condition is on page 2 of the pdf file. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work on this essay?  Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s ok by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since I know that each one of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now, thanks Julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
OK everyone, write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also, I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings. It even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the blue screen of death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
On page 54, it describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
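&lt;br /&gt;
For the semaphore answers above, here is a minimal sketch in Python (not Linux kernel code). threading.Semaphore is from the standard library; the function run_with_semaphore and its parameters are hypothetical and only for illustration. It lets at most two threads into the guarded region at once and records the highest number seen inside.&lt;br /&gt;
&lt;br /&gt;
```python
import threading
import time


def run_with_semaphore(limit=2, num_threads=6):
    """Return the peak number of threads seen inside the guarded region."""
    sem = threading.Semaphore(limit)   # counting semaphore with `limit` slots
    state_lock = threading.Lock()      # protects the bookkeeping below
    state = {"inside": 0, "peak": 0}

    def worker():
        with sem:                      # blocks while all slots are taken
            with state_lock:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            time.sleep(0.01)           # pretend to use the shared resource
            with state_lock:
                state["inside"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


peak = run_with_semaphore()
print(peak)  # never more than the limit of 2
```
&lt;br /&gt;
A semaphore with a limit of 1 behaves like a plain lock, which is one way to relate the first two techniques in the list.&lt;br /&gt;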
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs, it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
*Other Therac-25 users were unaware of the accidents (until much later)&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3980</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3980"/>
		<updated>2010-10-14T19:16:32Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Blackout */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to write, read or access the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
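&lt;br /&gt;
A minimal sketch of that definition, in Python. It does not use real threads; it just replays the read and write steps of two hypothetical processes A and B in a chosen order, so the effect of who runs when is repeatable. The names run_schedule, serial and interleaved are made up for illustration and are not from the textbook.&lt;br /&gt;
&lt;br /&gt;
```python
def run_schedule(schedule):
    """Replay read/write steps of two processes on one shared counter."""
    shared_counter = 0
    private_copy = {}  # the value each process read before writing back
    for process, step in schedule:
        if step == "read":
            private_copy[process] = shared_counter
        else:  # "write": store the private copy plus one
            shared_counter = private_copy[process] + 1
    return shared_counter


# A finishes its read and write before B starts: both increments survive.
serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# B reads before A writes back, so B overwrites A's update (a lost update).
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run_schedule(serial))       # 2
print(run_schedule(interleaved))  # 1
```
&lt;br /&gt;
Both schedules run the same four steps; only the order differs, and that alone changes the final value.&lt;br /&gt;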
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
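&lt;br /&gt;
A minimal sketch of mutual exclusion with a lock, in Python. threading.Lock is from the standard library; the function count_with_lock and its parameters are hypothetical. Only one thread at a time can be inside the with-block, which plays the role of the critical section here.&lt;br /&gt;
&lt;br /&gt;
```python
import threading


def count_with_lock(num_threads=4, increments=10000):
    """Increment one shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:        # enter the critical section (others wait here)
                counter += 1  # read-modify-write on the shared data
            # leaving the with-block releases the lock

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = count_with_lock()
print(total)  # 40000 with the defaults
```
&lt;br /&gt;
Because every increment happens while holding the lock, no update can be lost, so the total always equals num_threads times increments.&lt;br /&gt;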
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned here is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyways, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which killed several patients http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race condition fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, that&#039;s the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is named Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work on this essay? Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s OK by me. We can meet up in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online, since each of us probably has a different schedule.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also, is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00　[[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: monday in the afternoon, tuesday after 1, and all day wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey Everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use them (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but its a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet monday in the afternoon, tuesday after 1, and all day wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like tuesday is a good day, wait to see for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we have currently have a lot of documentation. This will also demonstrate our ability to work together more so than each of us doing a separate paragraph.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac 25 section later tonight. If anyone has found a fourth topic, post it and i&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think theres any harm in doing that, I&#039;ve found that this was a fairly common problem in some versions of Windows leading to a handful of system failures in airports, electronic hoardings, it even happened at the Beijing Summer Olympics of 2008 ! So this could be a potential case as well. I will try to consult the prof regarding this today, he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess ill do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article and then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that there might be a lack of documentation on the blue screen of death. I found this article on how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0X0000001a occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs in the third and fourth floor to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25, that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25, if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours, he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solution throughout the cases, so why mention them there ? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and thats what I&#039;m planning on asking him. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Heres some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race conditions errors ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter) ? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event has occurred in a particular data structure.&lt;br /&gt;
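&lt;br /&gt;
As a possible illustration for the second question, here is a minimal Python sketch of a counting semaphore. The names run, slots, state_lock and worker are made up for this example, and it only shows the general technique, not how Windows or Linux implement it. The semaphore lets at most max_inside threads into the guarded region at once, and a plain lock protects the bookkeeping that records how many threads were inside at the same time.&lt;br /&gt;

```python
import threading


def run(num_workers=6, max_inside=2):
    """Return the largest number of workers seen inside the guarded region."""
    slots = threading.Semaphore(max_inside)  # counting semaphore with max_inside permits
    state_lock = threading.Lock()            # plain lock protecting the two counters
    inside = 0
    peak = 0

    def worker():
        nonlocal inside, peak
        with slots:                  # blocks while all permits are taken
            with state_lock:
                inside += 1
                peak = max(peak, inside)
            with state_lock:
                inside -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With max_inside set to 1 the semaphore behaves like an ordinary lock, so the returned peak can never exceed 1.&lt;br /&gt;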
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything i find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are in the 4th floor of Herzberg labs, its the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
*Ignored warning calls (reason why is stated above)&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3979</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3979"/>
		<updated>2010-10-14T19:15:04Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Blackout */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work thats supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question. As a starting point, I figured it would be appropriate to start defining what mutual exclusion (mutex) and race conditions mean. Lets start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are reading or writing the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. That is, if process A is executing the code that uses a particular shared data structure (that code is called a critical section), then no other process, such as B, is allowed to enter a critical section for that same data structure until A leaves its own. Common techniques used for mutual exclusion include locks, semaphores and monitors.&lt;br /&gt;
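&lt;br /&gt;
A rough sketch of the two definitions above, in Python. The Counter class and the run function are made-up names for illustration and do not come from the textbook or any of the linked papers. Several threads increment one shared counter; the lock turns the read-modify-write into a critical section so that only one thread executes it at a time. Without the lock, two threads could read the same old value and one of the updates could be lost.&lt;br /&gt;

```python
import threading


class Counter:
    """Shared counter whose increment is a critical section."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            current = self.value
            self.value = current + 1


def run(num_threads=4, increments=10000):
    """Start num_threads workers that each increment the counter."""
    counter = Counter()

    def worker():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, run(4, 10000) should return 40000, since every increment is applied exactly once.&lt;br /&gt;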
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki-programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned there is considered a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyway, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that goes back to 1992, which basically examines the excessive amount of expenses and resources used in older versions of the Unix system when implementing mutual exclusion. The paper goes to explain the problem and offers a better solution. Its pretty easy to follow and understand, worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey Andrew here another member of this group. Those are some good starting points. The Wikipedia page on race conditions have references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which gave several patients massive overdoses, some of them fatal http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of things we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.). The rover experienced a rare unexpected error due to a race-conditions fault. For some reason, this seems to be a fairly common problem for those Mars-Rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Heres an overview of the Opportunity 1116 incident from MarsToday : http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Heres also a paper that examines Race Conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions :&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Heres another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the pdf files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your Student ID card barcode number, thats the number below your name on your student ID. And the password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (Am i the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan on how divide the work for this essay?  Also do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, its good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, thats ok by me. We can meet up in the Herzberg labs in the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or if you just want to continue doing this online, I know that each one of us has probably a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up lets put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, its just where i hang out anyways)&lt;br /&gt;
&lt;br /&gt;
Also is this due on tuesday or thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now thanks julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
Ok everyone write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have are well documented. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Blackout. Will add citations and contribute to the Therac-25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article and provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version; the article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and on electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
OK, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it conveys our own understanding to the reader. So no copy and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article, then put it away and try to phrase and deliver the concepts and notions in one&#039;s own words. It would be OK to use the exact scientific terms, though. There&#039;s no escaping that I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, If you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could cause a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can now run your own code in the kernel ;)&lt;br /&gt;
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the blue-screen-of-death. I found this book that explains how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why that happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how blue screens affect people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyway, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11;57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like I mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section is a bit jarring I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah, I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done; we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys, please add the resources that were used; we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
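&lt;br /&gt;
As a rough illustration of the semaphore answers above, here is a small Python sketch. It is only an assumption-laden toy, not how the Linux kernel does it (that is C), and the names MAX_INSIDE, worker and run are made up for this example. A counting semaphore initialized to 2 lets at most two threads into a section at once; initialized to 1 it would behave like a plain lock.&lt;br /&gt;
&lt;br /&gt;
```python
import threading
import time

MAX_INSIDE = 2  # hypothetical limit on concurrent users of the resource
slots = threading.Semaphore(MAX_INSIDE)
state_lock = threading.Lock()  # protects the two bookkeeping counters
inside = 0
peak = 0


def worker():
    """Enter the limited section, record occupancy, then leave."""
    global inside, peak
    with slots:  # blocks while MAX_INSIDE threads are already inside
        with state_lock:
            inside += 1
            peak = max(peak, inside)
        time.sleep(0.01)  # pretend to use the shared resource
        with state_lock:
            inside -= 1


def run(num_threads=6):
    """Run the workers and return the highest occupancy observed."""
    global inside, peak
    inside = 0
    peak = 0
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```
&lt;br /&gt;
Calling run() starts six workers, but the value it returns should never exceed MAX_INSIDE, because the semaphore only hands out that many slots at a time.&lt;br /&gt;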
&lt;br /&gt;
I might add this section today prior to midnight if I end up with some potential talking points. I will also edit  the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the blackout section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
*Operators relied on visual alerts and assumed the system was working correctly&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3978</id>
		<title>Talk:COMP 3000 Essay 1 2010 Question 6</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:COMP_3000_Essay_1_2010_Question_6&amp;diff=3978"/>
		<updated>2010-10-14T19:13:08Z</updated>

		<summary type="html">&lt;p&gt;J powers: /* Blackout */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Hey guys, this is Munther. I&#039;m one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that&#039;s supposed to include contributions from each member of the group, let us all assume the role of the editor. So we will all contribute and help edit the final version of the article.&lt;br /&gt;
&lt;br /&gt;
Regarding our question: as a starting point, I figured it would be appropriate to define what mutual exclusion (mutex) and race conditions mean. Let&#039;s start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.&lt;br /&gt;
&lt;br /&gt;
Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. Look at pages 117-118 of the textbook for a detailed example of that.&lt;br /&gt;
&lt;br /&gt;
Mutual exclusion (mutex): the idea of making sure that processes access data in a serialized way. Meaning that, if process A for instance, happens to be executing or using a particular data structure (called a critical section), then no other process like B would be allowed to execute or use that very same data structure (critical section) until process A finishes executing or decides to leave the data structure. Common algorithms and techniques used in mutual exclusion include: locks, semaphores and monitors.&lt;br /&gt;
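&lt;br /&gt;
To make these two definitions concrete, here is a minimal Python sketch. It is not taken from the textbook, and the names counter_lock, locked_increment and run_workers are made up for illustration. Several threads do a read-modify-write on a shared counter; the lock turns that update into a critical section, so no update can be lost.&lt;br /&gt;
&lt;br /&gt;
```python
import threading

THREADS = 4
INCREMENTS = 10000

counter = 0
counter_lock = threading.Lock()


def locked_increment():
    """Increment the shared counter inside a critical section."""
    global counter
    for _ in range(INCREMENTS):
        # Only one thread at a time may hold the lock, so the
        # read-modify-write below cannot interleave with another thread.
        with counter_lock:
            counter += 1


def run_workers():
    """Start THREADS workers, wait for them, and return the final count."""
    global counter
    counter = 0
    workers = [threading.Thread(target=locked_increment) for _ in range(THREADS)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter
```
&lt;br /&gt;
With the lock in place, run_workers() should always return THREADS * INCREMENTS. Without the lock, the final count may depend on how the threads happen to interleave, which is the &amp;quot;who runs precisely when&amp;quot; problem from the race condition definition.&lt;br /&gt;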
&lt;br /&gt;
Our question asks for examples of systems that have failed due to flawed efforts at mutual exclusion. For starters, here is a programming wiki page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned there counts as a &amp;quot;failure&amp;quot; is something we should check with the prof. Anyway, it&#039;s a good starting point.&lt;br /&gt;
http://rosettacode.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper from 1992, which examines the excessive cost and resources used by older versions of the Unix system when implementing mutual exclusion. The paper goes on to explain the problem and offers a better solution. It&#039;s pretty easy to follow and understand, and worth reading as well.&lt;br /&gt;
http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples http://en.wikipedia.org/wiki/Race_condition&lt;br /&gt;
&lt;br /&gt;
Couple notable ones:&lt;br /&gt;
&lt;br /&gt;
The Therac-25 radiation therapy machine, which killed several patients http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html&lt;br /&gt;
&lt;br /&gt;
A blackout in 2003 was caused by a race condition in one of the power company&#039;s alarm systems http://www.securityfocus.com/news/8412 (really awful block of text)&lt;br /&gt;
&lt;br /&gt;
--Andrew&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by &amp;quot;systems&amp;quot; is any device-based operating system. It doesn&#039;t necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.&lt;br /&gt;
&lt;br /&gt;
Other notable examples:&lt;br /&gt;
&lt;br /&gt;
1. The Opportunity Mars-Rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or any possible information about that particular surface.) The rover experienced a rare unexpected error due to a race-condition fault. For some reason, this seems to be a fairly common problem for those Mars rovers, since the same kind of error was experienced on the Spirit Mars-Rover as well. &lt;br /&gt;
&lt;br /&gt;
Here&#039;s an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772&lt;br /&gt;
&lt;br /&gt;
Here&#039;s a paper that examines the race conditions experienced on those rovers, discusses the Spirit Rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. A file-system based type of race condition involves an older version of the Unix operating system, in which the user-mode can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This actually may be a bit more approachable, as far as understanding the Unix kernel and stuff like that, I&#039;m sure we can find a lot of resources for this.&lt;br /&gt;
&lt;br /&gt;
A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html&lt;br /&gt;
&lt;br /&gt;
- - - - - - - - - - -&lt;br /&gt;
&lt;br /&gt;
Here&#039;s also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions:&lt;br /&gt;
http://www.google.ca/url?sa=t&amp;amp;source=web&amp;amp;cd=4&amp;amp;ved=0CCoQFjAD&amp;amp;url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&amp;amp;rct=j&amp;amp;q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&amp;amp;ei=FTCtTOzRN8mVnAeL-OThDA&amp;amp;usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&amp;amp;sig2=u2Qo9kdemxdCWAlH10GNeQ&lt;br /&gt;
&lt;br /&gt;
Here&#039;s another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&amp;amp;coll=Portal&amp;amp;dl=GUIDE&amp;amp;CFID=104720795&amp;amp;CFTOKEN=13393160&lt;br /&gt;
&lt;br /&gt;
If anyone can&#039;t access the PDF files on the ACM Portal or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca&lt;br /&gt;
You will be prompted to enter your student ID card barcode number; that&#039;s the number below your name on your student ID. The password is your CarletonCentral password.&lt;br /&gt;
&lt;br /&gt;
I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.&lt;br /&gt;
&lt;br /&gt;
PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
--------------------&lt;br /&gt;
&lt;br /&gt;
Hey guys, I am Daniel. I am also in group 6 (am I the final group member?). I&#039;m ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you&#039;ve rounded up quite a bit of info on the subject already, great job!&lt;br /&gt;
&lt;br /&gt;
 Introduction Paragraph: Introduces the question and gives some general background etc.&lt;br /&gt;
 Paragraph 1: Gives first example in detail&lt;br /&gt;
 Paragraph 2: Gives second example in detail&lt;br /&gt;
 Paragraph 3: Gives third example in detail&lt;br /&gt;
 Conclusion: Relates it all back together or something (never been good with conclusions) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I think each example paragraph should be broken down like this:&lt;br /&gt;
&lt;br /&gt;
 1. Introduction to the example&lt;br /&gt;
 2. What they tried to use the Multi-Threading to do (or something like that)&lt;br /&gt;
 3. Story of the system failing&lt;br /&gt;
 4. The significance/involvement of race condition and mutual exclusion in the failure&lt;br /&gt;
 5. Conclusion (how it was solved and stuff like that can go here too)&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:05, 11 October 2010 (UTC) (this date is wrong for this edit)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys, I&#039;m Fangchen. I am also in group 6. (So I might be the last member lol) &lt;br /&gt;
I found a chapter of a book from Sun; the chapter is called Race Conditions and Mutual Exclusion. There are some examples of race conditions in Java programming which I think we could study for sure.&lt;br /&gt;
&lt;br /&gt;
The link of the book chapter is here.&lt;br /&gt;
&lt;br /&gt;
http://java.sun.com/developer/Books/performance2/chap3.pdf&lt;br /&gt;
&lt;br /&gt;
On page 2 of the pdf file, there is a first example of race condition. I think this might be useful in our essay as a case study.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
--Fangchen&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members.&lt;br /&gt;
It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant. &lt;br /&gt;
&lt;br /&gt;
Note:  This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken. &lt;br /&gt;
&lt;br /&gt;
I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.&lt;br /&gt;
&lt;br /&gt;
4.1 Blackout (pg. 5 – 6)&lt;br /&gt;
&lt;br /&gt;
4.3 Therac-25 (pg. 7 – 8)&lt;br /&gt;
&lt;br /&gt;
I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.&lt;br /&gt;
&lt;br /&gt;
Are the series of Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions. &lt;br /&gt;
&lt;br /&gt;
Lastly, what is our plan for dividing the work for this essay?  Also, do we want to meet in person someday?&lt;br /&gt;
&lt;br /&gt;
--[[User:J powers|J powers]] 16:08, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster.&lt;br /&gt;
--[[User:J powers|J powers]] 16:50, 9 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Cool, it&#039;s good to have the other members of the group on board. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible. &lt;br /&gt;
&lt;br /&gt;
What Julie mentioned is right. The prof said that 3 examples are alright. But he&#039;s really looking for 4-5 cases. We need to impress him a little bit here. The other case he mentioned was the Blue-Screens-Of-Death incidents. I believe a mail man was killed because of that. I will try to find some information on that later on today. &lt;br /&gt;
&lt;br /&gt;
Also, if you guys wanna meet up a couple of days before the due date, that&#039;s OK by me. We can meet in the Herzberg labs on the 4th floor, not the undergrad ones, the ones at the end of the hall. Or I can reserve a room for us in the library. Or we can just continue doing this online if you prefer; I know each of us probably has a different schedule and everything.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Alright, seems we needed more than I originally thought :p so I tweaked the other page to have 5 of them instead of 3.  I would absolutely like to meet up :D. Doing this online thing makes me feel weird for some reason...&lt;br /&gt;
&lt;br /&gt;
But if we do meet up, let&#039;s put all our discussion and decisions on the page here so it can get reviewed etc.&lt;br /&gt;
&lt;br /&gt;
If we are gonna meet up I would prefer Herzberg (not that it really matters, it&#039;s just where I hang out anyway)&lt;br /&gt;
&lt;br /&gt;
Also, is this due on Tuesday or Thursday?&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 03:06, 11 October 2010 (UTC) this date is wrong for this edit&lt;br /&gt;
&lt;br /&gt;
Started using tildes now, thanks Julie&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
OK everyone, write in here when you are available before the 14th&lt;br /&gt;
&lt;br /&gt;
 Daniel: all day Monday, Tuesday, and Thursday&lt;br /&gt;
 Munther: --&lt;br /&gt;
 Fangchen: --&lt;br /&gt;
 Andrew: After 12:30 Tues-Wed-Thurs&lt;br /&gt;
 Julie: Tuesday after 2:30, and Wednesday/Thursday after 1:00 [[User:J powers|J powers]] 19:32, 10 October 2010 (UTC)&lt;br /&gt;
 cha0s: Monday in the afternoon, Tuesday after 1, and all day Wednesday&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Hey everyone. Awesome, looks like we have a lot of information and resources here to work from. Daniel&#039;s template structure looks good and we should follow that. We should come up with a plan for executing this: what topics we want to cover and who would like to focus on what. I think the 3 big examples we&#039;ve found lots of resources for are the Therac-25, Mars Rover and the Blackout. The professor mentioned he&#039;d like to see some more exotic examples, so let&#039;s try and find some for examples 4/5.&lt;br /&gt;
&lt;br /&gt;
Layout we can build on.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Introduction&lt;br /&gt;
&lt;br /&gt;
Therac-25&lt;br /&gt;
&lt;br /&gt;
Mars Rover&lt;br /&gt;
&lt;br /&gt;
Blackout&lt;br /&gt;
&lt;br /&gt;
Example 4&lt;br /&gt;
&lt;br /&gt;
Example 5&lt;br /&gt;
&lt;br /&gt;
Conclusion&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
I&#039;m going to try and read up a bit more on the Therac-25 and put in a few paragraphs today.&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 21:55, 10 October 2010 (UTC) (did not know about the 4 tildes thing, thanks for sharing)&lt;br /&gt;
----&lt;br /&gt;
I do not mind which topic I write about but I feel a personal connection with the blackout. My hometown was affected for a long time and there were concerns about chemical plants nearby. Therefore I have an interest in writing/researching about it.&lt;br /&gt;
&lt;br /&gt;
Has the group member above (&amp;lt;strike&amp;gt;Could you please put your name? Was it Andrew?&amp;lt;/strike&amp;gt;) decided on Therac-25 then? &lt;br /&gt;
&lt;br /&gt;
Also I have noticed that not everyone has been using 4 tildes. I am not sure if this is how the professor knows who wrote what, but it would not hurt to use it (less to type as well). &lt;br /&gt;
&lt;br /&gt;
Any ideas on a deadline for all of our writing?&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:05, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I tried writing up a bit about the Therac-25. Still pretty rough but it&#039;s a start.&lt;br /&gt;
&lt;br /&gt;
Good information in this paper http://sunnyday.mit.edu/papers/therac.pdf&lt;br /&gt;
&lt;br /&gt;
Pages 22-28 deal with the software bug&lt;br /&gt;
&lt;br /&gt;
[[User:Atubman|Atubman]] 23:27, 10 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, I&#039;m guessing I&#039;m the last member, putting us at 6. I&#039;ll post what I&#039;ve got for my section later tonight. I&#039;m good to meet Monday in the afternoon, Tuesday after 1, and all day Wednesday.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 20:00, 10 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Looks like Tuesday is a good day. Should we wait for the rest to confirm?&lt;br /&gt;
[[User:Dsont|Dsont]] 03:08, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Yo, after looking around a bit, it seems like it might be better to just cover three topics in greater depth, as the three we currently have come with a lot of documentation. This will also demonstrate our ability to work together more than each of us doing a separate paragraph would.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 3:02, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
------&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Hey guys. Like I mentioned before, I will handle the editing, introductory paragraph, conclusions and the Mars-Rover incidents case. In the meantime, I strongly urge other members of the group to look into the Blackout case and try to find us another case like the Blue-Screens-of-Death which the prof mentioned in class. Most of the cases I found were all software related. Nothing major. So it would be great to have someone help with the research. We will try as much as possible to deliver 4 cases.&lt;br /&gt;
&lt;br /&gt;
-- Munther --[[User:Hesperus|Hesperus]] 16:21, 11 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ve been looking for a while now, and I can&#039;t find any major system failures related to the topic except the three we already have. I&#039;ll focus my research on the blackout case for now. &lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 16:34, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
Posted a rough section for the 2003 Black-Out. Will add citations and contribute to the Therac-25 section later tonight. If anyone has found a fourth topic, post it and I&#039;ll try and find some more info on it.&lt;br /&gt;
&lt;br /&gt;
[[User:cha0s|cha0s]] 18:54, 11 October 2010 (EDT)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
Hey guys. I&#039;ve edited the article, provided an introduction and an overview piece. Plus, I&#039;ve posted the first part of the Mars-Rover incident. This is just a rough version. The article of course needs further editing. I will keep editing and updating the Mars-Rover case in the next 24 hours. I also started a section for the Blue-Screens-Of-Death incidents. I don&#039;t think there&#039;s any harm in doing that. I&#039;ve found that this was a fairly common problem in some versions of Windows, leading to a handful of system failures in airports and electronic hoardings; it even happened at the Beijing Summer Olympics of 2008! So this could be a potential case as well. I will try to consult the prof regarding this today; he might provide us with some hints or crucial talking points.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 06:20, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I guess I&#039;ll do Blue Screens then.&lt;br /&gt;
&lt;br /&gt;
[[User:Dsont|Dsont]] 13:36, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
Ok, so in today&#039;s lecture, Thomas (chaOs) inquired about the essay and the prof mentioned that three cases would be enough. But if we wanna go fancy, a fourth case might be a good idea. I think it would be a lot better if we focus on the three cases at hand and leave the blue-screens-of-death to the end. The prof also talked about plagiarism and emphasized the need to be &#039;&#039;&#039;original&#039;&#039;&#039;. Even if we cite the resources, the article itself has to be original in the sense that it carries through the reader&#039;s understanding. So no copying and pasting will be tolerated. In fact, I&#039;m going back to the Mars-Rover incident to do a re-edit and make sure there&#039;s no direct phrasing or imitation of style. He suggested that it would be a good idea to read and understand the article, then put it away and try to phrase and deliver the concepts and notions using one&#039;s own words. It would be ok to use the exact scientific terms, though. There&#039;s no escaping that, I guess.&lt;br /&gt;
 &lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:35, 12 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Hey, if you guys want more things to talk about, the Linux kernel has suffered many a race condition failure leading to security vulnerabilities that allow root / kernel level access.  I remember one from a while ago that hit Slashdot where a local user could trigger a race condition that caused a null pointer (a pointer that&#039;s essentially set to 0x00000000) to be dereferenced, resulting in the kernel trying to execute at address 0.  Now if you stick your own code at 0, you can run your own code in the kernel ;)&lt;br /&gt;
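&lt;br /&gt;
A rough sketch of the general pattern described above, not the actual kernel bug: one thread checks that a shared value is valid, a second thread nulls it out, and the first thread then uses the stale check. It assumes Python with the standard threading module. The names handler, check_then_use, clear_slot and run_race are made up for illustration, and the two events exist only to force the unlucky ordering so the result is repeatable.&lt;br /&gt;

```python
import threading

# Shared slot that stands in for a kernel pointer (hypothetical example data).
handler = {"callback": None}
checked = threading.Event()   # set once the check has passed
cleared = threading.Event()   # set once the slot has been cleared

def check_then_use(result):
    """Check the slot, get delayed, then use it without checking again."""
    if handler["callback"] is not None:      # time of check: looks valid
        checked.set()
        cleared.wait()                        # forces the unlucky interleaving
        callback = handler["callback"]        # time of use: value has changed
        if callback is None:
            result.append("null dereference")
        else:
            result.append(callback())

def clear_slot():
    """Second thread: wait for the check, then null out the slot."""
    checked.wait()
    handler["callback"] = None
    cleared.set()

def run_race():
    """Run both threads once and report what the first one ended up using."""
    handler["callback"] = lambda: "ok"
    checked.clear()
    cleared.clear()
    result = []
    user = threading.Thread(target=check_then_use, args=(result,))
    clearer = threading.Thread(target=clear_slot)
    user.start()
    clearer.start()
    user.join()
    clearer.join()
    return result[0]

if __name__ == "__main__":
    print(run_race())
```

In real code the window between the check and the use is tiny and nothing forces the ordering, which is part of why these bugs are so hard to reproduce.&lt;br /&gt;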
&lt;br /&gt;
--[[User:3maisons|3maisons]] 19:19, 12 October 2010 (UTC)&lt;br /&gt;
-----&lt;br /&gt;
Hey guys, I saw that we might be lacking documentation on the blue screen of death. I found this book that explains how the blue screen problem occurs. http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=2bGxMzOtUMsC&amp;amp;oi=fnd&amp;amp;pg=PR15&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=aYecJYK84q&amp;amp;sig=vXttqNmGEONz3K8Txt3PkLsJze4#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false &lt;br /&gt;
&lt;br /&gt;
Page 54 describes the reason why it happens.&lt;br /&gt;
&lt;br /&gt;
http://books.google.com/books?hl=zh-CN&amp;amp;lr=&amp;amp;id=cp0k20nfMBcC&amp;amp;oi=fnd&amp;amp;pg=PR6&amp;amp;dq=Blue-Screens-of-Death&amp;amp;ots=PDaXQZiTdu&amp;amp;sig=AGmADvRIu1VTdBjMI1csIFWmn9o#v=onepage&amp;amp;q=Blue-Screens-of-Death&amp;amp;f=false&lt;br /&gt;
&lt;br /&gt;
And here is an example of how the blue screen affects people&#039;s lives. I think this book might be useful since it is related to software performance.&lt;br /&gt;
&lt;br /&gt;
BTW, I&#039;ll be available the whole afternoon tomorrow.&lt;br /&gt;
&lt;br /&gt;
---Fangchen&lt;br /&gt;
------&lt;br /&gt;
The only explanation of the BSOD I have found is that error 0x0000001A occurs because of a race condition in memory usage, but there is no further explanation. Has anyone found something on that?&lt;br /&gt;
&lt;br /&gt;
---Fangchen 21:40, 14 October 2010&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
Yo, I&#039;ll be at Herzberg around 12-12:30 tomorrow if you guys want to meet up.&lt;br /&gt;
&lt;br /&gt;
--[[User: cha0s|cha0s]] 3:40, 13 October 2010&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m currently having office hours in HP 1175 from 10 am - 12 pm. I will try to drop by the labs on the third and fourth floors to meet up with chaOs. Anyways, I will be finishing the Mars-Rovers part today and I will re-edit the overview and the introduction as well. Other members of the group should probably help with the Therac-25; that case is supposed to be the most important one in the whole essay.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:01, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Just re-edited the Mars Rover and BSOD sections (just added a few examples to the incident, didn&#039;t alter the main content). Provided resources as well.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 15:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;m in the lounge right now.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 11:57, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry dude. I had to leave. Best chance for us is to meet tomorrow after the lecture. Like mentioned before, I will make sure that the Mars-Rover section is finished today. chaOs is doing the Blackout. I don&#039;t think there&#039;s much to add to the BSOD. Atubman wrote the first blurb about the Therac-25; if you could go back and refine it a little bit and provide the resources, that would be great. Other members should help as well. I&#039;ll try to do the conclusions today if I can. I&#039;m also thinking about seeing the prof tomorrow in his office hours; he might give us some tips as far as presenting the cases and all.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:44, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Sorry I have not been participating lately. I had a group presentation today but now I am free to work on this essay. I will gladly meet after class tomorrow and help until 3007. After 3007, I can work for the rest of the day. Tonight I will try to read about Therac-25 and write more in that section. I also have ideas to contribute to the blackout section.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 21:02, 13 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey guys. Just did another edit. The Rover case is now finished. I can also see that Atubman refined the Therac-25 case. I added a single line to that section, again, I didn&#039;t alter the main content at all.&lt;br /&gt;
&lt;br /&gt;
Wrote a little something for the conclusions and moved the mutual exclusion paragraph from the overview to the conclusions, since we didn&#039;t really talk about any mutual exclusion techniques or solutions throughout the cases, so why mention them there? However, having them in the conclusions section at the end is a bit jerky I guess, because we&#039;re introducing this whole concept at the end of the article. Also, the resources used throughout the article must be mentioned in the resources section.&lt;br /&gt;
&lt;br /&gt;
If anyone wants to help with the editing as far as grammar or vocab goes, please do so. I will be seeing the prof in his office hours tomorrow, if anyone wants to join me, that would be great. After our lecture, I have a class from 11:30 to 1:00 pm and then another one from 4:30 pm to 5:30 pm, in case you guys wanna meet up.&lt;br /&gt;
&lt;br /&gt;
I think we&#039;re pretty much set to go. The prof wanted three cases, we did four, so this has to mean something.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 05:34, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am currently in HP4115 if anyone is around. Or is everyone meeting somewhere else? Munther, I can come with you after 3007 to talk to Anil. I need to ask him about what I am planning to contribute. &lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Hey Julie. Yeah I&#039;m definitely seeing the prof today at 1:00 pm, so I&#039;ll see you there. I think the essay is pretty much done, we just need to refine the conclusion a little bit, and that&#039;s what I&#039;m planning on asking him about. Also, guys please add the resources that were used, we don&#039;t wanna get into any trouble.&lt;br /&gt;
&lt;br /&gt;
Also, I&#039;m currently thinking of some potential questions that we might add to the end of the essay, like the prof suggested today. &lt;br /&gt;
Here are some ideas:&lt;br /&gt;
&lt;br /&gt;
* What is the main idea behind race condition errors? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; more like a definition.&lt;br /&gt;
* What are some of the techniques used to establish mutual exclusion and how do they work? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; locks, semaphores, busy waiting &amp;amp; monitors. Refer to the textbook for the details.&lt;br /&gt;
* How do Windows and Linux differ in terms of handling race conditions and applying mutual exclusion? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; I honestly have no idea, but I&#039;m pretty sure Linux uses semaphores. I will discuss this with the prof today.&lt;br /&gt;
* What are the mechanisms that Linux uses to apply mutual exclusion (or even synchronization for that matter)? &lt;br /&gt;
&#039;&#039;&#039;Answer:&#039;&#039;&#039; Semaphores, pipes, signals. Processes can generate signals to notify other processes that a specific event is occurring in a particular data structure.&lt;br /&gt;
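&lt;br /&gt;
A minimal sketch of the lock technique from the second question, assuming Python and its standard threading module. The names counter_lock, add_many and run_workers are hypothetical, and this is only an illustration of mutual exclusion in general, not how Windows or Linux implement it internally.&lt;br /&gt;

```python
import threading

counter = 0                      # shared state touched by every worker
counter_lock = threading.Lock()  # hypothetical name; guards `counter`

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        # Only one thread at a time may run this read-modify-write.
        with counter_lock:
            counter += 1

def run_workers(worker_count, times):
    """Start the workers, wait for all of them, and return the final count."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,))
               for _ in range(worker_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter

if __name__ == "__main__":
    print(run_workers(4, 10000))
```

Swapping threading.Lock() for threading.Semaphore(1) should behave the same way here, since a semaphore with a count of one also admits a single thread at a time. Without the lock, two threads can read the same old value and one of the increments can be lost.&lt;br /&gt;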
&lt;br /&gt;
I might add this section today before midnight if I end up with some potential talking points. I will also edit the overview and the conclusion.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 14:48, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I am working on revising at the moment. I read through and revised the introduction.  &lt;br /&gt;
&lt;br /&gt;
The first question is fine but I do not see how the last two (possibly three; we do talk about techniques and Windows briefly) questions relate to our essay specifically. They relate more to the classroom material. Maybe we should have something like &amp;quot;Describe (at least? or three?) two famous system failures caused by race conditions. Why did they occur and what were the consequences of their failures?&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
[[User:J powers|J powers]] 15:12, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m going to see the prof right now. Yeah, the questions somehow relate more to the class material.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 16:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
I&#039;ll be on later tonight. I&#039;ll expand the black-out section and contribute anything I find to the other sections then.&lt;br /&gt;
&lt;br /&gt;
--[[User:cha0s|cha0s]] 14:24, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
I&#039;m in the library, 4th floor, near the computers if anyone wants to join me. If you&#039;re on the lower floors, just post something here and I&#039;ll come down to see you. I&#039;ll be here for the next 2 or 3 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:28, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Julie and I are on the 4th floor of the Herzberg labs; it&#039;s the graduate lab at the end of the hall. We will be here for the next 3 or 4 hours.&lt;br /&gt;
&lt;br /&gt;
Munther --[[User:Hesperus|Hesperus]] 18:52, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
Brainstorming-Patterns&lt;br /&gt;
=Thesis=&lt;br /&gt;
Everyone, we need to agree on a thesis ASAP. Our cases are not connected. The Professor told us to look for patterns that are common to each case. We should incorporate these into each section and form a thesis around them as well. [[User:J powers|J powers]] 18:58, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Therac-25=&lt;br /&gt;
*Believed that there was nothing wrong with the software (suspected hardware)&lt;br /&gt;
*Both the operators and the developer trusted the machine &lt;br /&gt;
*Programmed in Assembly&lt;br /&gt;
[[User:J powers|J powers]] 19:06, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Blackout=&lt;br /&gt;
*Spent weeks finding the race condition which implies that they did not understand why their system failed&lt;br /&gt;
*Programmed in C/C++&lt;br /&gt;
[[User:J powers|J powers]] 19:13, 14 October 2010 (UTC)&lt;br /&gt;
&lt;br /&gt;
=Mars Rover=&lt;br /&gt;
&lt;br /&gt;
=BSOD=&lt;/div&gt;</summary>
		<author><name>J powers</name></author>
	</entry>
</feed>