Talk:COMP 3000 Essay 1 2010 Question 6

From Soma-notes
Revision as of 15:44, 9 October 2010 by Hesperus (talk | contribs)

Hey guys, this is Munther. I'm one of the members of the group assigned to this question. Before we start, let me just say that since this is a collective piece of work that's supposed to include contributions from each member of the group, let's all take on the role of editor: we will all contribute and help edit the final version of the article.

Regarding our question: as a starting point, I figured it would be appropriate to define what mutual exclusion (mutex) and race conditions mean. Let's start with race conditions, since mutual exclusion basically came to life because of the need to control race conditions.

Race conditions: situations where two or more processes are trying to read or write the same piece of data, and the final result depends on who runs precisely when. See pages 117-118 of the textbook for a detailed example.
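To make "depends on who runs precisely when" concrete, here is a minimal sketch in Python. It is not the textbook's example: the two "processes" are simulated with generators so that the interleaving can be chosen by hand, and the names `increment` and `run` are made up for illustration.

```python
# Two simulated "processes" each add 1 to a shared value.
# Every yield is a point where the scheduler may switch to the other process.

def increment(shared):
    local = shared["value"]       # step 1: read the shared data
    yield
    shared["value"] = local + 1   # step 2: write back the updated copy
    yield

def run(schedule):
    """Run processes A and B one step at a time in the given order, e.g. "AABB"."""
    shared = {"value": 0}
    procs = {"A": increment(shared), "B": increment(shared)}
    for name in schedule:
        next(procs[name])         # advance that process by one step
    return shared["value"]

print(run("AABB"))  # A finishes before B starts: both updates are kept
print(run("ABAB"))  # both read the old value first: one update is lost
```

Same code, same input, but the result changes with the schedule. That is the whole problem in miniature.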

Mutual exclusion (mutex): the idea of making sure that processes access shared data in a serialized way. If process A, for instance, is executing the code that uses a particular shared data structure (that code is called a critical section), then no other process, like B, is allowed to execute its own critical section for that same data structure until process A leaves its critical section. Common techniques used to implement mutual exclusion include locks, semaphores and monitors.
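As a rough illustration of the lock approach, here is a small Python sketch using the standard `threading.Lock`. The counter, `worker` and `run_workers` are hypothetical names chosen for the example; semaphores and monitors would look different.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def worker(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:   # enter the critical section (blocks if another thread holds the lock)
            counter += 1     # only one thread at a time can be here
        # leaving the with-block releases the lock

def run_workers(num_threads=4, times=10000):
    """Start the workers, wait for all of them, and return the final counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(times,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run_workers())
```

Because every update happens inside the lock, the total always comes out to the number of threads times the number of increments, no matter how the threads are scheduled.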

Our question asks for examples of systems that have failed due to flawed efforts. For starters, this is a wiki programming page (Rosetta Code) that examines race conditions and offers an example from the Unix/Linux operating systems. Whether the example mentioned there counts as a "failure" is something we should check with the prof. Anyway, it's a good starting point. http://rosettacode.org/wiki/Race_condition

Here's also a paper from 1992 that examines how expensive mutual exclusion was, in cost and resources, in older versions of the Unix system. The paper goes on to explain the problem and offers a better solution. It's pretty easy to follow and understand, and worth reading as well. http://www.usenix.org/publications/library/proceedings/sa92/moran.pdf

-- Munther



Hey, Andrew here, another member of this group. Those are some good starting points. The Wikipedia page on race conditions has references to a few good examples: http://en.wikipedia.org/wiki/Race_condition

Couple notable ones:

The Therac-25 radiation therapy machine, which gave several patients massive radiation overdoses, some of them fatal: http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Side_bar_1.html

The 2003 Northeast blackout, where a race condition silently stalled one of the power company's alarm systems: http://www.securityfocus.com/news/8412 (really awful block of text)

--Andrew



Alright, so the things that the prof mentioned in our last lecture proved to be super helpful. Basically, what he means by "systems" is any device-based operating system. It doesn't necessarily have to be a PC-based operating system (Windows, Linux, etc.). So the Therac-25 story mentioned by Andrew in the above post is a prime example of the type of thing we might be looking for.

Other notable examples:

1. The Opportunity Mars rover 1116 incident. (A rover is basically a space exploration vehicle designed to navigate the surface of a planet in order to gather images, samples or other information about that surface.) The rover experienced a rare, unexpected error caused by a race condition. This seems to be a fairly common problem for the Mars rovers, since the same kind of error was experienced on the Spirit rover as well.

Here's an overview of the Opportunity 1116 incident from MarsToday: http://www.marstoday.com/news/viewsr.html?pid=23772

Here's a paper that examines the race conditions experienced on those rovers, discusses the Spirit rover incident and even goes on to explain the underlying architecture of the rover hardware: http://trs-new.jpl.nasa.gov/dspace/bitstream/2014/39897/1/06-0922.pdf


2. A file-system-based race condition in an older version of the Unix operating system, in which user-mode restrictions can actually be bypassed, allowing the user to access the entire system. I can see this being considered an error or a case of failure as well. This one may actually be a bit more approachable as far as understanding the Unix kernel goes, and I'm sure we can find a lot of resources for it.

A small article exploring the issue: http://www.osdata.com/holistic/security/attacks/racecond.html
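I can't say this is the exact Unix bug the article describes, but file-system races usually follow a check-then-use pattern: the program checks a path, and the file behind that path changes before the program opens it. Here is a minimal Python sketch of that general pattern. The `between_check_and_use` hook and the file names are hypothetical and only stand in for another process running at the wrong moment.

```python
import os
import tempfile

def read_if_allowed(path, between_check_and_use=lambda: None):
    """Vulnerable pattern: check the path first, then open it in a separate step."""
    if not os.access(path, os.R_OK):      # time of check
        return None
    between_check_and_use()               # hypothetical hook: another process runs here
    with open(path) as f:                 # time of use: the path may now name a different file
        return f.read()

def demo():
    """Swap the file behind the path between the check and the open."""
    with tempfile.TemporaryDirectory() as workdir:
        harmless = os.path.join(workdir, "harmless.txt")
        secret = os.path.join(workdir, "secret.txt")
        with open(harmless, "w") as f:
            f.write("harmless")
        with open(secret, "w") as f:
            f.write("secret")

        def attacker():
            # Replace the checked file with a different one under the same name.
            os.replace(secret, harmless)

        return read_if_allowed(harmless, attacker)

print(demo())
```

The check passed for one file, but the read returned the contents of another. In a real attack the victim would be a privileged program and the swap would typically point the path at a protected file, which is how user-level restrictions end up bypassed.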

- - - - - - - - - - -

Here's also a paper that examines race conditions in depth, talks about the importance of mutual exclusion and provides a number of solutions: http://www.google.ca/url?sa=t&source=web&cd=4&ved=0CCoQFjAD&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.1.5897%26rep%3Drep1%26type%3Dpdf&rct=j&q=race%20conditions%20case%20study%20steve%20carr%2010.1.1.1&ei=FTCtTOzRN8mVnAeL-OThDA&usg=AFQjCNHdyHdeFSpES0nMjzb7lPkFxKwC2g&sig2=u2Qo9kdemxdCWAlH10GNeQ

Here's another paper from the ACM Portal: http://portal.acm.org/citation.cfm?id=130616.130623&coll=Portal&dl=GUIDE&CFID=104720795&CFTOKEN=13393160

If anyone can't access the PDF files on the ACM Portal, or even CiteSeer for that matter, you need to log in to the network using your Carleton library account. Go to the following: http://portal.acm.org.proxy.library.carleton.ca You will be prompted to enter your student ID card barcode number, which is the number below your name on your student ID. The password is your CarletonCentral password.

I think so far we have managed to gather a handful of cases. In the next couple of days, we should probably delve deeper into some of those cases.

PS: If you wanna contact me, go to my profile in the history tab. Click on Hesperus.

-- Munther


Hey guys, I am Daniel. I am also in group 6 (am I the final group member?). I'm ready to help get this show on the road! I am going to set up a basic essay structure on the other page so that we know what to aim for. You guys look like you've rounded up quite a bit of info on the subject already, great job!

Introduction Paragraph: Introduces the question and gives some general background etc.
Paragraph 1: Gives first example in detail
Paragraph 2: Gives second example in detail
Paragraph 3: Gives third example in detail
Conclusion: Relates it all back together or something (never been good with conclusions) 


I think each example paragraph should be broken down like this:

1. Introduction to the example
2. What they tried to use the Multi-Threading to do (or something like that)
3. Story of the system failing
4. The significance/involvement of race condition and mutual exclusion in the failure
5. Conclusion (how it was solved and stuff like that can go here too)


Hey guys, I'm Fangchen. I am also in group 6. (So I might be the last member lol) I found a book chapter from Sun called Race Conditions and Mutual Exclusion. It has some examples of race conditions in Java programming, which I think we could study for sure.

Here is the link to the book chapter:

http://java.sun.com/developer/Books/performance2/chap3.pdf

On page 2 of the PDF file, there is a first example of a race condition. I think this might be useful in our essay as a case study.


--Fangchen


My name is Julie and I believe that I am the last group member. Our professor said that every group has 5 to 6 members. It appears that we have quite the list of resources. Are we planning to use them all? It might be a good idea to list the resources we believe are the most relevant.

Note: This link, http://www.osdata.com/holistic/security/attacks/racecond.html, is broken.

I only have one resource to add. I found a paper that summarizes information about Therac-25 and the blackout of 2003: http://x4.6times7.org/downloads/software_catastrophes.pdf.

4.1 Blackout (pg. 5 – 6)

4.3 Therac-25 (pg. 7 – 8)

I think we should agree on a thesis soon. Currently the examples in our essay are not connected by a central argument. If we have time, I think we should try to find another example (assuming we have agreed to write about Therac-25, the blackout of 2003 and the Mars rovers). Prof. Anil said that he was expecting four to five examples. Three examples is a minimum. I have been trying to search for one that is not as well known (as encouraged in class) but I have not had any luck.

Are the Mars rovers (Opportunity and Spirit from 2004-2005) the most recent examples? I have not found any that are more recent so far. I wonder if systems programmers have learned from these past failures. I noticed, while searching for resources, that researchers have developed/are now developing tools and strategies to detect race conditions.

Lastly, what is our plan for dividing the work for this essay? Also, do we want to meet in person someday?

--J powers 16:08, 9 October 2010 (UTC)

One suggestion I have for dividing the work is for everyone to write a paragraph of the essay or about a specific disaster. --J powers 16:50, 9 October 2010 (UTC)



Cool, it's good to have the other members of the group on board as well. I will handle the editing and the introductory paragraph. I will try to make it as academic as possible.

-- Munther