Soma-notes - User contributions [en]

COMP 3000 Essay 2 2010 Question 2

2010-12-03T01:43:38Z

Vviveka2:

==Paper==
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Background Concepts==
The web is everywhere today, and it runs on many different operating systems and browsers. The Illinois Browser Operating System (IBOS) is not just a new browser with improved security; it is a complete operating system built for browsing. It was developed by three researchers at the University of Illinois. Its main goal is to expose browser-level abstractions at the lowest possible software layer, thereby reducing the trusted computing base (TCB) for web browsers. Websites and web applications have become major targets for attackers, who keep finding new ways to exploit even the most secure systems. Cross-site scripting (XSS) has recently overtaken the age-old buffer overflow as the most common security vulnerability [[#References | [1]]].

Plenty of research has gone into improving the security of the web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. A compromised Ethernet driver can send sensitive HTTP packets to third parties, a compromised storage module can leak persistent data to unwanted viewers, and a compromised window manager can overlay the fake interfaces common in phishing attacks [[#References | [1]]]. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, all of which enlarge the trusted computing base (TCB). IBOS addresses this by exposing browser-level abstractions rather than only general-purpose ones: concepts such as cookies, HTTP connections and tabs for displaying pages are provided directly by the operating system. In this way, IBOS aims to shrink the TCB of the whole system.

===Monolithic vs Modular===

The internet has become an important part of everyday life. Early on, users could only read and browse content; today they can also publish their own content and reach the rest of the world. The arrival of Web 2.0 applications changed the web considerably and brought new security concerns with it. To cope with these applications, modern browsers are increasingly designed around a modular architecture.

A modular browser such as Chrome assigns each browsing instance to its own operating system process. This design aims to improve fault tolerance, accountability, security, memory management and performance. Chrome can keep running even when one web program crashes. Memory is managed per process, so when a web program closes, its memory is released and can be reused by another program. Scheduling is handled at the OS level, so web programs can run in parallel.

A monolithic browser such as Firefox runs all of its components in a single process and a single address space, which makes it easier to exploit. A further disadvantage is that if a single web program crashes, the whole browser crashes with it, and any data held in memory by JavaScript is lost. Memory management is also considered weaker in this architecture: memory is allocated when a web program starts, and any leaks accumulate, so the browser's memory footprint can still be large after the program has finished.
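
The fault-tolerance argument for the modular design can be illustrated with a minimal Python sketch. It is not taken from Chrome or IBOS: the "pages" are hypothetical one-line programs, and the helper name <code>render_in_own_process</code> is an assumption made for this example. Each page runs in its own operating system process, so a crash in one page shows up only as a non-zero exit code and does not take the others down.

```python
import subprocess
import sys


def render_in_own_process(page_code: str) -> int:
    """Run one hypothetical 'page' in a separate OS process; return its exit code."""
    result = subprocess.run([sys.executable, "-c", page_code])
    return result.returncode


# Two stand-in pages: one finishes normally, one crashes abruptly.
pages = {
    "good_page": "pass",
    "crashing_page": "import os; os._exit(3)",
}

# The parent (playing the role of the browser) survives the crash and
# simply records what happened to each child process.
exit_codes = {name: render_in_own_process(code) for name, code in pages.items()}
```

In a monolithic design the equivalent of <code>os._exit</code> would run inside the one shared process, so every page would be lost together.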

===TCB===

The TCB is the combination of kernel and trusted processes, made up of hardware, firmware and software, that is critical to a computer's security [[#References | [2]]]. Lampson et al. define the TCB as:

''"a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security."'' [[#References | [3]]].

Modern operating system and browser combinations have massive TCBs that can run to several million lines of code. By moving components such as device drivers out of the kernel, one can shrink a system's TCB considerably. If a driver outside the TCB is compromised, the damage is contained, but if the driver is part of the TCB, the results can be catastrophic. Removing elements from the TCB makes it smaller and so reduces the amount of code in which a single flaw can compromise the whole system.

==Research Problem==
Modern browsers such as Google Chrome and Mozilla Firefox are continuously revised and updated to keep ahead of the latest attacks, yet they still contain hundreds of security vulnerabilities. Most attacks are simple, mildly harmful assaults on web applications, but many target the browser itself or even the operating system and its libraries. Because the browser sits below every web application in the software stack, a successful attack on it can have severe repercussions: it exposes all of the browser data for all web applications, and it gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous, because the attacker can access arbitrary state and events and so gain full control over the system. The focus of this research is to prevent or contain attacks on the browser, libraries, operating system and system services.

==Contribution==

===Architecture and Design===
The authors developed IBOS to reduce security risks by shrinking the TCB, without compromising speed and efficiency. One of the ways they achieve this is through process creation. There are essentially two types of processes: web page instances and traditional processes. Any time the user opens a new tab, clicks on a link, or enters a web address in the uniform resource locator (URL) bar, the IBOS kernel creates a new process. When it creates a web page instance, the kernel labels the process with the originating address of the HTTP request. If a website such as ''facebook.com'' embeds content from another website, for example in an iframe, the kernel creates a new process for the embedded content and labels it accordingly. Traditional processes are all other processes created for the local machine; these are simply labeled ''localhost''.

An IBOS label contains a protocol, domain and port. By creating a label for each web page instance, the kernel can isolate instances from one another, which prevents a compromised component from taking control of other processes. By recording where requests originate, the IBOS kernel can also check that the data it receives does in fact come from the expected origin.
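
The labeling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the IBOS kernel's actual data structures: the names <code>Label</code>, <code>label_from_url</code> and <code>may_deliver</code> are hypothetical, and the default ports are the usual ones for HTTP and HTTPS. A label is the triple of protocol, domain and port, and data is delivered to a web page instance only when its source carries the same label.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

# Conventional default ports, used when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


@dataclass(frozen=True)
class Label:
    """Hypothetical stand-in for an IBOS label: protocol, domain and port."""
    protocol: str
    domain: str
    port: int


def label_from_url(url: str) -> Label:
    """Derive a label from the address of an HTTP request."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if parts.hostname is None or scheme not in DEFAULT_PORTS:
        raise ValueError(f"cannot label {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return Label(scheme, parts.hostname, port)


def may_deliver(process_label: Label, response_url: str) -> bool:
    """Allow delivery only if the data comes from the process's own origin."""
    return process_label == label_from_url(response_url)


facebook = label_from_url("http://facebook.com/home")
```

Because the dataclass is frozen, a label cannot be altered after the process has been created, which mirrors the idea that the kernel, not the page, decides which origin a process belongs to.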

IBOS has a considerably smaller TCB than other modern browsers. Where Chrome and Firefox each have more than 4 million lines of code in their trusted computing base, IBOS has only about 42,000. Because IBOS isolates each process, it was also able to contain between 75% and 100% of the vulnerabilities in affected components. The researchers tested IBOS against 175 known Chrome issues, ranging from memory exploits to interface spoofing. IBOS was able to prevent 135 of them (77%), whereas Chrome was only able to contain 83. Chrome can contain exploits in its rendering engine because the engine runs in a sandbox, but it cannot stop exploits that target the browser kernel. This is less of a problem for IBOS, because many of the browser components that sit inside Chrome's TCB have been moved outside the IBOS TCB, limiting what an exploit can achieve. IBOS also provides a tab abstraction that multiplexes the display between web page instances. Input devices are routed only to the visible tab, which prevents a keylogger from hijacking another instance's session.
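
The input-routing rule for tabs can be modeled with a small Python class. This is a toy model written for this essay, not code from IBOS; the class name <code>TabManager</code> and its methods are assumptions. Key presses are appended only to the inbox of the currently visible tab, so a hidden tab never sees them.

```python
class TabManager:
    """Toy model of a tab abstraction that routes input to the visible tab only."""

    def __init__(self):
        self._inboxes = {}    # tab id -> key events delivered to that tab
        self._visible = None  # id of the tab currently shown to the user

    def open_tab(self, tab_id):
        """Create a tab and bring it to the front."""
        self._inboxes[tab_id] = []
        self._visible = tab_id

    def switch_to(self, tab_id):
        """Make an existing tab the visible one."""
        if tab_id not in self._inboxes:
            raise KeyError(tab_id)
        self._visible = tab_id

    def key_press(self, key):
        """Deliver a key event to the visible tab and to no other tab."""
        if self._visible is not None:
            self._inboxes[self._visible].append(key)

    def received(self, tab_id):
        """Return a copy of the key events a tab has received."""
        return list(self._inboxes[tab_id])


tabs = TabManager()
tabs.open_tab("bank")
tabs.open_tab("ads")    # a second, possibly malicious, page
tabs.switch_to("bank")  # the user returns to the bank tab
for key in "pw":
    tabs.key_press(key)
```

After these calls the bank tab holds the two key events and the hidden tab holds none.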

===Performance===
In terms of performance, IBOS is comparable to the two best-performing web browsers available at the time, Firefox and Chrome. For websites such as Google Maps and Facebook, IBOS actually loads pages much faster than Firefox. This may be partly because IBOS was built on the WebKit engine, which has been optimized for Google Maps. For Facebook and Wikipedia, sites that issue many HTTP requests, IBOS is slightly slower than the other two browsers; for the remaining sites, which issue only a few HTTP requests, it runs just as quickly.

IBOS offers functionality comparable to that of Firefox and Chrome. Like Chrome, it uses the standard WebKit engine, and its performance remains competitive.

==Critique==

===Structure===
This paper is very well organized and executed. It flows naturally and explains its ideas in order, so the reader never needs to flip back to an earlier part of the paper. It begins with the motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level material. This felt like a natural progression of ideas, with each section supplying what is needed to understand the next.

===Evaluation===
The evaluation of IBOS's security has some flaws: it is not very thorough, and the data set used for testing is a potential confound.

The authors' own testing shows that IBOS is able to resist 77% of attacks from a set of 175 security bugs, whereas Chrome is only able to prevent 46%. The improvement sounds impressive; however, the set of security bugs was taken from Google Chrome's own bug tracker. Because the comparison pits Chrome against its own known security flaws, the 31 percentage point improvement is far less impressive than it first appears.

In addition, the initial test set contained 217 bugs after duplicates were removed, and 42 of these were omitted because they were denial-of-service attacks, which IBOS does not protect against. It is understandable that this is outside the scope of the research, but it leaves a large set of flaws unaddressed.
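
The figures quoted in this essay can be checked with a short calculation. The Python sketch below uses only numbers that appear in the essay (217, 42, 135 and 83); the variable names are chosen for this example. Note that 83 of 175 works out to roughly 47%, slightly above the 46% quoted above, so one of the two reported figures is presumably rounded or transcribed imprecisely.

```python
initial_bugs = 217        # bugs after duplicates were removed
dos_bugs_omitted = 42     # denial-of-service bugs left out of the test
tested = initial_bugs - dos_bugs_omitted

ibos_contained = 135      # issues IBOS prevented, as reported in this essay
chrome_contained = 83     # issues Chrome contained, as reported in this essay

# Containment rates as percentages of the tested set, to one decimal place.
ibos_rate = round(100 * ibos_contained / tested, 1)
chrome_rate = round(100 * chrome_contained / tested, 1)

# Share of the initial set that was excluded as denial of service.
omitted_share = round(100 * dos_bugs_omitted / initial_bugs, 1)
```

The excluded denial-of-service bugs make up a little under a fifth of the initial set, which supports the point that the omission is not negligible.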

Furthermore, the researchers only compared their results against the Chrome web browser. A comparison that also included other browsers such as Mozilla Firefox, Internet Explorer and Safari would be much more compelling.

==References==

[1] CVE - Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org.

[2] Rushby, John (1981). "Design and Verification of Secure Systems". 8th ACM Symposium on Operating System Principles. Pacific Grove, California, US. pp. 12–21.

[3] B. Lampson, M. Abadi, M. Burrows and E. Wobber, Authentication in Distributed Systems: Theory and Practice, ACM Transactions on Computer Systems 1992, on page 6.

COMP 3000 Essay 2 2010 Question 2

2010-12-03T01:22:22Z

Vviveka2:

COMP 3000 Essay 2 2010 Question 2

2010-12-03T01:17:01Z

Vviveka2:

==Paper==
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaig''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Background Concepts==
In the world we are in, the web is everywhere and runs in different operating systems and browsers. The Illinois Browser Operating System (IBOS) is not just a new browser to improve security, it is also a full operating system. It was developed by three graduate students at the University of Illinois. It’s main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers and hackers. These attackers are always finding new ways of exploiting even the most secure systems. Just recently, cross-site scripting (XSS) has become the most common security vulnerability over the age old buffer overflow [[#References | [1]]].

Plenty of research has gone in to improving security among the various web browsers on the market today but all browsers still remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers and compromised window managers can overlay fake interfaces common in phishing attacks [[#References | [1]]]. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, increasing the trusted computing base(TCB). IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. By using all of these methods, the IBOS system ultimately aims to reduce the computer's TCB.

===Monolithic vs Modular===

The internet today has become a important part in our life, it used everywhere and in everyday life. It all started as if the user is allowed to read and browse the content but today the expansion has led to have ability to write our content and have impact on rest of the world. The introduction of web 2.0 application brought greater change to the web and top of that bringing security concerns. Modern web applications are mostly designed to meet the new modular architecture.

Modular designed architecture browser such as Chrome is designed in a way that each browsing instance is assigned to a one operating system process. It is designed to better in terms of fault-tolerance, accountability, security, memory management and performance. Chrome has the ability to be able to run even when other web program crashes. Memory management in modular architecture is designed to handle each process at a time and when the program closes it is ready to be used in a another program. Scheduling for modular browsers handled at the OS level and web programs are able to run parallel.

Monolithic architectures browser such as Firefox, are monolithic and easy to exploit, since they run in a single address space. It is designed in a way that all the components for the web run in a single process. Disadvantage about monolithic architectures are if a single web program crashes all other web browsers crashes which causes the data to be lost which is saved in JavaScript in memory. Memory management is also considered poor in the architecture because it allocates memory at the beginning of the web program and might contain leaks which at the end of the program still be huge.

===TCB===

The TCB is the hardware and software that is critical to the computer's security. It's a combination of kernel and trusted processes made of hardware, firmware, and software that are critical to a computer's security [[#References | [2]]]. In the words of Lampson et al., TCB is defined as:

''"a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security."'' [[#References | [3]]].

Modern operating system-browser combinations have massive TCBs that may have several millions of lines of code. By extracting components such as device drivers from the kernel, one can lower a systems TCB considerably. If a device driver is outside of the TCB and becomes corrupted, the effects would not be too severe, but if the driver is left in the TCB, then the results could be cataclysmic. By removing elements from the TCB, you make it smaller, thereby reducing the risk of having an attack get inside.

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are continuously being revised and updated to keep ahead of the latest attacks, but continuously have hundreds of security vulnerabilities. Most of these attacks are simple, slightly harmful assaults on web applications, but many attacks are on the browser or even the operating system and its libraries. Since the browser runs lower on the shared storage stack, a successful attack on a browser can have horrible repercussions because it gives access to all of the browser data for all of the web application. It also provides the attacker with access to other resources on the system which is being exploited. An attack on the operating system can be disastrous if it is successful and may cause serious damage to the entire system because the attackers can access arbitrary states and events, allowing them to have full control over the system. The focus of this research is to prevent and decrease the attacks on the browser, libraries, operating systems and system services.

==Contribution==

===Architecture and Design===
The authors have developed IBOS to reduce security risks, without compromising speed and efficiency. One of the ways they have achieved this is through the use of process creation. Essentially there are two types of processes. A web page instance and a traditional process. Any time the user opens a new tab, clicks on a link, or enters a web address in the uniform resource locator(URL) bar, the IBOS kernel creates a new process. Upon creating a web page instance process, the kernel labels it with the originating address of the HTTP request. If a web site such as ''facebook.com'' decides to host an outside script, also known as an iframe, from another website, the kernel creates a new process for the embedded script and labels it appropriately. Traditional processes are every other process that is created for the local machine. These processes are simply labeled as ''localhost''.

By creating unique labels for each web page instance, the kernel can isolate them from one another. This prevents a compromised component from taking control of other processes. Also by labeling where requests come from, the IBOS kernel can ensure that the data it is receiving is in fact from the expected origin.

IBOS has considerably smaller TCB compared to other modern browsers. Where both Chrome and Firefox come in at over 4 million plus lines of code in their trusted computing base, IBOS has only about 42,000. Since IBOS isolates each process, it was also able to prevent between 75-100% of vulnerabilities from affected components on a machine. Using Chrome, the researchers tested 175 known issues on the IBOS kernel which ranged from memory exploits to interface spoofing. Out of all the known issues, IBOS was able to prevent 135 or 77% of the issues whereas Chrome was only able to contain 83 of them. The issue is that Chrome is able to catch exploits in its rendering engine since it is in a sandbox but any exploits that took advantage of the browser kernel could not be prevented. This is not a problem for IBOS because many of the browser components inside the trusted computing base in Chrome have been brought outside of the IBOS TCB limiting what can be done with exploitation.

===Performance===
In terms of performance, IBOS is comparable to the two best performing web browsers currently released: Firefox and Chrome. For websites such as Google Maps and Facebook, IBOS actually performs much better than Firefox while loading pages. This may be due partly to the fact that IBOS was developed with the WebKit engine, which has been optimized to run Google Maps. For Facebook and Wikipedia, sites that use many HTTP requests, IBOS performs slightly slower than the other two browsers, but for the others, where there are only a few HTTP requests, IBOS runs just as quickly as the others.

==Critique==

===Structure===
This paper was very well organized and executed. It naturally flows and keeps order in what it is trying to explain without the need to flip back and reference another piece of content in the paper. Starting with the core mechanics of why it is needed to how the kernel is organized and working its way up to many high-level pieces of information it felt like a natural progression of ideas, giving you the information you need to understand upcoming concepts.

===Evaluation===
The evaluation of IBOS's security has some flaws: it is not very thorough, and the data set used for testing is potentially confounding.

The authors show through their own testing that IBOS is able to resist 77% of attacks from a set of 175 security bugs, whereas Chrome is able to prevent only 46%. The improvement sounds impressive; however, the set of security bugs they tested against was obtained from Google Chrome's bug tracker. The fact that they are comparing known security flaws in Chrome against the new IBOS makes their improvement of 31 percentage points far less impressive.

In addition, their initial test set contained 217 bugs with duplicates removed, and 42 of these were omitted because they were denial-of-service attacks, a form of attack that IBOS does not protect against. It is understandable that this is outside the scope of the research. However, that is a large set of flaws left unaddressed.

Furthermore, the researchers compared their results only against the Chrome web browser. A comparison that also included other browsers such as Mozilla Firefox, Internet Explorer and Safari would be much more compelling.
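
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts given in this essay (217, 42, 175, 135 and 83); the variable and function names are illustrative. Note that 83 out of 175 works out to roughly 47%, slightly above the 46% quoted in this section, so the exact percentage is worth confirming against the paper.

```python
# Counts taken from the essay; nothing here comes from the paper directly.
initial_bugs = 217        # initial test set, duplicates removed
dos_bugs = 42             # denial-of-service bugs that were omitted
tested_bugs = initial_bugs - dos_bugs

ibos_prevented = 135      # bugs IBOS was able to prevent
chrome_contained = 83     # bugs Chrome was able to contain


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


ibos_rate = percent(ibos_prevented, tested_bugs)
chrome_rate = percent(chrome_contained, tested_bugs)
omitted_share = percent(dos_bugs, initial_bugs)

print(f"bugs actually tested: {tested_bugs}")
print(f"IBOS prevented:       {ibos_rate}%")
print(f"Chrome contained:     {chrome_rate}%")
print(f"omitted as DoS:       {omitted_share}% of the initial set")
```

Seen this way, the omitted denial-of-service bugs make up close to a fifth of the initial set, which supports the concern that a sizeable class of flaws is left out of the comparison.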

==References==

[1] CVE - Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org.

[2] Rushby, John (1981). "Design and Verification of Secure Systems". 8th ACM Symposium on Operating System Principles. Pacific Grove, California, US. pp. 12–21.

[3] B. Lampson, M. Abadi, M. Burrows and E. Wobber, Authentication in Distributed Systems: Theory and Practice, ACM Transactions on Computer Systems 1992, on page 6.

COMP 3000 Essay 2 2010 Question 2

2010-12-03T01:10:50Z

Vviveka2:

==Paper==
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Background Concepts==
The web is everywhere today and is accessed from many different operating systems and browsers. The Illinois Browser Operating System (IBOS) is not just a new browser designed to improve security; it is also a full operating system. It was developed by three researchers at the University of Illinois. Its main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers, who are always finding new ways of exploiting even the most secure systems. Recently, cross-site scripting (XSS) overtook the age-old buffer overflow as the most common security vulnerability [[#References | [1]]].

Plenty of research has gone into improving the security of the various web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers, and compromised window managers can overlay the fake interfaces common in phishing attacks [[#References | [1]]]. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, increasing the trusted computing base (TCB). IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. Through these methods, IBOS ultimately aims to reduce the computer's TCB.

===Monolithic vs Modular===

The internet has become an important part of our lives and is used everywhere, every day. At first users could only read and browse content, but the web has since expanded to let us write our own content and have an impact on the rest of the world. The introduction of Web 2.0 applications brought great change to the web and, on top of that, new security concerns. Modern web applications are mostly designed to fit the new modular architecture.

A browser with a modular architecture, such as Chrome, assigns each browsing instance to its own operating system process. This design aims to do better in terms of fault tolerance, accountability, security, memory management and performance. Chrome can keep running even when one web program crashes. Memory is managed per process, so when a web program closes, its memory becomes available to another program. Scheduling for modular browsers is handled at the OS level, and web programs are able to run in parallel.

Browsers with a monolithic architecture, such as Firefox, are easy to exploit because they run in a single address space: all the components of the browser run in a single process. The disadvantage of a monolithic architecture is that if a single web program crashes, every other web program in the browser crashes with it, losing any JavaScript state held in memory.
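
The fault-tolerance difference between the two designs can be illustrated with ordinary operating system processes. The following is a minimal sketch, not browser code: each "tab" is a tiny Python program run in its own process, the tab names and their contents are invented for the example, and a non-zero exit stands in for a crash.

```python
import subprocess
import sys

# Hypothetical "tabs": each value is a small program run in its own process.
TABS = {
    "tab-1": "print('rendered page 1')",
    "tab-2": "raise SystemExit(3)",      # simulated crash (abnormal exit)
    "tab-3": "print('rendered page 3')",
}


def run_tabs_in_separate_processes(tabs):
    """Run every tab in a separate OS process and collect the exit codes."""
    exit_codes = {}
    for name, source in tabs.items():
        finished = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
        )
        exit_codes[name] = finished.returncode
    return exit_codes


results = run_tabs_in_separate_processes(TABS)
for name, code in results.items():
    status = "ok" if code == 0 else f"crashed (exit code {code})"
    print(f"{name}: {status}")
```

Because the failing tab lives in its own process, the parent program and the other tabs carry on. In a single-process design, the same failure would take down everything sharing that address space.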

===TCB===

The TCB is the combination of kernel and trusted processes, made up of hardware, firmware and software, that is critical to a computer's security [[#References | [2]]]. In the words of Lampson et al., the TCB is defined as:

''"a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security."'' [[#References | [3]]].

Modern operating system-browser combinations have massive TCBs that may contain several million lines of code. By extracting components such as device drivers from the kernel, one can shrink a system's TCB considerably. If a device driver outside the TCB becomes corrupted, the effects are limited, but if the driver is left in the TCB, the results could be cataclysmic. Removing elements from the TCB makes it smaller, thereby reducing the risk of an attack getting inside.

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are continuously revised and updated to keep ahead of the latest attacks, yet they still have hundreds of security vulnerabilities. Most attacks are simple, mildly harmful assaults on web applications, but many target the browser itself or even the operating system and its libraries. Since the browser sits lower in the stack than the web applications it hosts, a successful attack on a browser can have severe repercussions because it gives access to all of the browser data for every web application. It also gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous and may seriously damage the entire system, because the attackers can access arbitrary state and events, giving them full control over the system. The focus of this research is to prevent and reduce attacks on the browser, libraries, operating system and system services.

==Contribution==

===Architecture and Design===
The authors developed IBOS to reduce security risks without compromising speed and efficiency. One of the ways they achieve this is through how processes are created. There are essentially two types of process: web page instances and traditional processes. Any time the user opens a new tab, clicks on a link, or enters a web address in the uniform resource locator (URL) bar, the IBOS kernel creates a new process. Upon creating a web page instance, the kernel labels it with the originating address of the HTTP request. If a web site such as ''facebook.com'' embeds content from another website in an iframe, the kernel creates a new process for the embedded content and labels it appropriately. Traditional processes are all other processes created for the local machine. These processes are simply labeled ''localhost''.

By creating unique labels for each web page instance, the kernel can isolate them from one another. This prevents a compromised component from taking control of other processes. Also, by labeling where requests come from, the IBOS kernel can ensure that the data it receives does in fact come from the expected origin.
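
The labeling idea can be sketched in a few lines. This is a toy model under stated assumptions, not IBOS code: the <code>ToyKernel</code> class, its method names, the <code>scheme://host</code> label format and the example URLs are all hypothetical, chosen only to show how an origin label lets a kernel decide which process may receive which data.

```python
from dataclasses import dataclass
from itertools import count
from urllib.parse import urlsplit

LOCALHOST = "localhost"  # label the essay gives to traditional processes


def origin_label(url: str) -> str:
    """Reduce a URL to a scheme://host[:port] label (an assumed format)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


@dataclass(frozen=True)
class Process:
    pid: int
    label: str


class ToyKernel:
    """Hypothetical stand-in for the kernel's labeling step."""

    def __init__(self) -> None:
        self._pids = count(1)

    def spawn_web_page_instance(self, url: str) -> Process:
        # A new tab, link click, URL-bar entry or iframe gets its own process.
        return Process(next(self._pids), origin_label(url))

    def spawn_traditional(self) -> Process:
        return Process(next(self._pids), LOCALHOST)

    def may_receive(self, proc: Process, response_url: str) -> bool:
        """Deliver a response only to a process labeled with the same origin."""
        return proc.label == origin_label(response_url)


kernel = ToyKernel()
page = kernel.spawn_web_page_instance("http://facebook.com/home")
frame = kernel.spawn_web_page_instance("http://ads.example.com/widget")
helper = kernel.spawn_traditional()

print(page.label, frame.label, helper.label)
print(kernel.may_receive(page, "http://facebook.com/photo.php"))
print(kernel.may_receive(page, "http://ads.example.com/widget"))
```

In this model the embedded frame gets a different label from the page that hosts it, so a check as simple as a label comparison is enough to keep their data apart.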

IBOS has a considerably smaller TCB than other modern browsers. Where Chrome and Firefox each have over 4 million lines of code in their trusted computing base, IBOS has only about 42,000. Because IBOS isolates each process, it was also able to contain between 75% and 100% of vulnerabilities in affected components on a machine. Using Chrome's known issues, the researchers evaluated 175 vulnerabilities against IBOS, ranging from memory exploits to interface spoofing. Of these, IBOS was able to prevent 135, or 77%, whereas Chrome was able to contain only 83. Chrome can contain exploits in its rendering engine, which runs in a sandbox, but it cannot prevent exploits that take advantage of the browser kernel. This is less of a problem for IBOS because many of the browser components that sit inside Chrome's trusted computing base have been moved outside the IBOS TCB, limiting what an exploit can achieve.

===Performance===
In terms of performance, IBOS is comparable to the two best-performing web browsers currently released: Firefox and Chrome. For websites such as Google Maps and Facebook, IBOS actually loads pages much faster than Firefox. This may be partly because IBOS was built on the WebKit engine, which has been optimized to run Google Maps. For Facebook and Wikipedia, sites that issue many HTTP requests, IBOS is slightly slower than the other two browsers; for the remaining sites, which issue only a few HTTP requests, IBOS runs just as quickly.

==Critique==

===Structure===
This paper was very well organized and executed. It flows naturally and keeps its explanations in order, so the reader never needs to flip back to an earlier part of the paper. It starts with the core motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level material. This felt like a natural progression of ideas, giving the reader the information needed to understand upcoming concepts.

===Evaluation===
The evaluation of IBOS's security has some flaws: it is not very thorough, and the data set used for testing is potentially confounding.

The authors show through their own testing that IBOS is able to resist 77% of attacks from a set of 175 security bugs, whereas Chrome is able to prevent only 46%. The improvement sounds impressive; however, the set of security bugs they tested against was obtained from Google Chrome's bug tracker. The fact that they are comparing known security flaws in Chrome against the new IBOS makes their improvement of 31 percentage points far less impressive.

In addition, their initial test set contained 217 bugs with duplicates removed, and 42 of these were omitted because they were denial-of-service attacks, a form of attack that IBOS does not protect against. It is understandable that this is outside the scope of the research. However, that is a large set of flaws left unaddressed.

Furthermore, the researchers compared their results only against the Chrome web browser. A comparison that also included other browsers such as Mozilla Firefox, Internet Explorer and Safari would be much more compelling.

==References==

[1] CVE - Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org.

[2] Rushby, John (1981). "Design and Verification of Secure Systems". 8th ACM Symposium on Operating System Principles. Pacific Grove, California, US. pp. 12–21.

[3] B. Lampson, M. Abadi, M. Burrows and E. Wobber, Authentication in Distributed Systems: Theory and Practice, ACM Transactions on Computer Systems 1992, on page 6.

COMP 3000 Essay 2 2010 Question 2

2010-12-03T00:48:12Z

Vviveka2:

==Paper==
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Background Concepts==
The web is everywhere today and is accessed from many different operating systems and browsers. The Illinois Browser Operating System (IBOS) is not just a new browser designed to improve security; it is also a full operating system. It was developed by three researchers at the University of Illinois. Its main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers, who are always finding new ways of exploiting even the most secure systems. Recently, cross-site scripting (XSS) overtook the age-old buffer overflow as the most common security vulnerability [[#References | [1]]].

Plenty of research has gone into improving the security of the various web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers, and compromised window managers can overlay the fake interfaces common in phishing attacks [[#References | [1]]]. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, increasing the trusted computing base (TCB). IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. Through these methods, IBOS ultimately aims to reduce the computer's TCB.

===Monolithic vs Modular===

The internet has become an important part of our lives and is used everywhere, every day. At first users could only read and browse content, but the web has since expanded to let us write our own content and have an impact on the rest of the world. The introduction of Web 2.0 applications brought great change to the web and, on top of that, new security concerns. Modern web applications are mostly designed to fit the new modular architecture.

A browser with a modular architecture, such as Chrome, assigns each browsing instance to its own operating system process. This design aims to do better in terms of fault tolerance, accountability, security, memory management and performance. Chrome can keep running even when one web program crashes. Memory is managed per process, so when a web program closes, its memory becomes available to another program. Scheduling for modular browsers is handled at the OS level, and web programs are able to run in parallel.

Web browsers such as Firefox are monolithic and easy to exploit, since they run in a single address space. "Secure" web browsers are not enough, since they still have a huge TCB (including the TCP stack, X server, file system, drivers, etc.). A microkernel would be better, but one would still need to trust all of the system components.

The TCB is the combination of kernel and trusted processes, made up of hardware, firmware and software, that is critical to a computer's security [[#References | [2]]]. In the words of Lampson et al., the TCB is defined as:

''"a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security."'' [[#References | [3]]].

Modern operating system-browser combinations have massive TCBs that may contain several million lines of code. By extracting components such as device drivers from the kernel, one can shrink a system's TCB considerably. If a device driver outside the TCB becomes corrupted, the effects are limited, but if the driver is left in the TCB, the results could be cataclysmic. Removing elements from the TCB makes it smaller, thereby reducing the risk of an attack getting inside.

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are continuously revised and updated to keep ahead of the latest attacks, yet they still have hundreds of security vulnerabilities. Most attacks are simple, mildly harmful assaults on web applications, but many target the browser itself or even the operating system and its libraries. Since the browser sits lower in the stack than the web applications it hosts, a successful attack on a browser can have severe repercussions because it gives access to all of the browser data for every web application. It also gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous and may seriously damage the entire system, because the attackers can access arbitrary state and events, giving them full control over the system. The focus of this research is to prevent and reduce attacks on the browser, libraries, operating system and system services.

==Contribution==

===Architecture and Design===
The authors developed IBOS to reduce security risks without compromising speed and efficiency. One of the ways they achieve this is through how processes are created. There are essentially two types of process: web page instances and traditional processes. Any time the user opens a new tab, clicks on a link, or enters a web address in the uniform resource locator (URL) bar, the IBOS kernel creates a new process. Upon creating a web page instance, the kernel labels it with the originating address of the HTTP request. If a web site such as ''facebook.com'' embeds content from another website in an iframe, the kernel creates a new process for the embedded content and labels it appropriately. Traditional processes are all other processes created for the local machine. These processes are simply labeled ''localhost''.

By creating unique labels for each web page instance, the kernel can isolate them from one another. This prevents a compromised component from taking control of other processes. Also, by labeling where requests come from, the IBOS kernel can ensure that the data it receives does in fact come from the expected origin.

IBOS has a considerably smaller TCB than other modern browsers. Where Chrome and Firefox each have over 4 million lines of code in their trusted computing base, IBOS has only about 42,000. Because IBOS isolates each process, it was also able to contain between 75% and 100% of vulnerabilities in affected components on a machine. Using Chrome's known issues, the researchers evaluated 175 vulnerabilities against IBOS, ranging from memory exploits to interface spoofing. Of these, IBOS was able to prevent 135, or 77%, whereas Chrome was able to contain only 83. Chrome can contain exploits in its rendering engine, which runs in a sandbox, but it cannot prevent exploits that take advantage of the browser kernel. This is less of a problem for IBOS because many of the browser components that sit inside Chrome's trusted computing base have been moved outside the IBOS TCB, limiting what an exploit can achieve.

===Performance===
In terms of performance, IBOS is comparable to the two best-performing web browsers currently released: Firefox and Chrome. For websites such as Google Maps and Facebook, IBOS actually loads pages much faster than Firefox. This may be partly because IBOS was built on the WebKit engine, which has been optimized to run Google Maps. For Facebook and Wikipedia, sites that issue many HTTP requests, IBOS is slightly slower than the other two browsers; for the remaining sites, which issue only a few HTTP requests, IBOS runs just as quickly.

==Critique==

===Structure===
This paper was very well organized and executed. It flows naturally and keeps its explanations in order, so the reader never needs to flip back to an earlier part of the paper. It starts with the core motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level material. This felt like a natural progression of ideas, giving the reader the information needed to understand upcoming concepts.

===Evaluation===
The evaluation of IBOS's security has some flaws: it is not very thorough, and the data set used for testing is potentially confounding.

The authors show through their own testing that IBOS is able to resist 77% of attacks from a set of 175 security bugs, whereas Chrome is able to prevent only 46%. The improvement sounds impressive; however, the set of security bugs they tested against was obtained from Google Chrome's bug tracker. The fact that they are comparing known security flaws in Chrome against the new IBOS makes their improvement of 31 percentage points far less impressive.

In addition, their initial test set contained 217 bugs with duplicates removed, and 42 of these were omitted because they were denial-of-service attacks, a form of attack that IBOS does not protect against. It is understandable that this is outside the scope of the research. However, that is a large set of flaws left unaddressed.

Furthermore, the researchers compared their results only against the Chrome web browser. A comparison that also included other browsers such as Mozilla Firefox, Internet Explorer and Safari would be much more compelling.

==References==

[1] CVE - Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org.

[2] Rushby, John (1981). "Design and Verification of Secure Systems". 8th ACM Symposium on Operating System Principles. Pacific Grove, California, US. pp. 12–21.

[3] B. Lampson, M. Abadi, M. Burrows and E. Wobber, Authentication in Distributed Systems: Theory and Practice, ACM Transactions on Computer Systems 1992, on page 6.

COMP 3000 Essay 2 2010 Question 2

2010-12-02T23:42:24Z

Vviveka2:

==Paper==
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Background Concepts==
The web is everywhere today and is accessed from many different operating systems and browsers. The Illinois Browser Operating System (IBOS) is not just a new browser designed to improve security; it is also a full operating system. It was developed by three researchers at the University of Illinois. Its main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers, who are always finding new ways of exploiting even the most secure systems. Recently, cross-site scripting (XSS) overtook the age-old buffer overflow as the most common security vulnerability [[#References | [1]]].

Plenty of research has gone into improving the security of the various web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers, and compromised window managers can overlay the fake interfaces common in phishing attacks [[#References | [1]]]. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, increasing the trusted computing base (TCB). IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. Through these methods, IBOS ultimately aims to reduce the computer's TCB.

===Monolithic vs Modular===

The internet has become an important part of our lives and is used everywhere, every day. At first users could only read and browse content, but the web has since expanded to let us write our own content and have an impact on the rest of the world. The introduction of Web 2.0 applications brought great change to the web and, on top of that, new security concerns. Modern web applications are mostly designed to fit the new modular architecture.

Web browsers such as Firefox are monolithic and easy to exploit, since they run in a single address space. "Secure" web browsers are not enough, since they still have a huge TCB (including the TCP stack, X server, file system, drivers, etc.). A microkernel would be better, but one would still need to trust all of the system components.

The TCB is the combination of kernel and trusted processes, made up of hardware, firmware and software, that is critical to a computer's security [[#References | [2]]]. In the words of Lampson et al., the TCB is defined as:

''"a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security."'' [[#References | [3]]].

Modern operating system-browser combinations have massive TCBs that may contain several million lines of code. By extracting components such as device drivers from the kernel, one can shrink a system's TCB considerably. If a device driver outside the TCB becomes corrupted, the effects are limited, but if the driver is left in the TCB, the results could be cataclysmic. Removing elements from the TCB makes it smaller, thereby reducing the risk of an attack getting inside.

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are continuously revised and updated to keep ahead of the latest attacks, yet they still have hundreds of security vulnerabilities. Most attacks are simple, mildly harmful assaults on web applications, but many target the browser itself or even the operating system and its libraries. Since the browser sits lower in the stack than the web applications it hosts, a successful attack on a browser can have severe repercussions because it gives access to all of the browser data for every web application. It also gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous and may seriously damage the entire system, because the attackers can access arbitrary state and events, giving them full control over the system. The focus of this research is to prevent and reduce attacks on the browser, libraries, operating system and system services.

==Contribution==

===Architecture and Design===
The authors developed IBOS to reduce security risks without compromising speed and efficiency. One of the ways they achieve this is through how processes are created. There are essentially two types of process: web page instances and traditional processes. Any time the user opens a new tab, clicks on a link, or enters a web address in the uniform resource locator (URL) bar, the IBOS kernel creates a new process. Upon creating a web page instance, the kernel labels it with the originating address of the HTTP request. If a web site such as ''facebook.com'' embeds content from another website in an iframe, the kernel creates a new process for the embedded content and labels it appropriately. Traditional processes are all other processes created for the local machine. These processes are simply labeled ''localhost''.

By creating unique labels for each web page instance, the kernel can isolate them from one another. This prevents a compromised component from taking control of other processes. Also, by labeling where requests come from, the IBOS kernel can ensure that the data it receives does in fact come from the expected origin.

IBOS has a considerably smaller TCB than other modern browsers. Where Chrome and Firefox each have over 4 million lines of code in their trusted computing base, IBOS has only about 42,000. Because IBOS isolates each process, it was also able to contain between 75% and 100% of vulnerabilities in affected components on a machine. Using Chrome's known issues, the researchers evaluated 175 vulnerabilities against IBOS, ranging from memory exploits to interface spoofing. Of these, IBOS was able to prevent 135, or 77%, whereas Chrome was able to contain only 83. Chrome can contain exploits in its rendering engine, which runs in a sandbox, but it cannot prevent exploits that take advantage of the browser kernel. This is less of a problem for IBOS because many of the browser components that sit inside Chrome's trusted computing base have been moved outside the IBOS TCB, limiting what an exploit can achieve.

===Performance===
In terms of performance, IBOS is comparable to the two best-performing web browsers currently released: Firefox and Chrome. For websites such as Google Maps and Facebook, IBOS actually loads pages much faster than Firefox. This may be partly because IBOS was built on the WebKit engine, which has been optimized to run Google Maps. For Facebook and Wikipedia, sites that issue many HTTP requests, IBOS is slightly slower than the other two browsers; for the remaining sites, which issue only a few HTTP requests, IBOS runs just as quickly.

===Security===

==Critique==

===Structure===
This paper was very well organized and executed. It flows naturally and keeps its explanations in order, so the reader never needs to flip back to an earlier part of the paper. It starts with the core motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level material. This felt like a natural progression of ideas, giving the reader the information needed to understand upcoming concepts.

===Evaluation===
The evaluation of IBOS's security has some flaws: it is not very thorough, and the data set used for testing is potentially confounding.

The authors show through their own testing that IBOS is able to resist 77% of attacks from a set of 175 security bugs, whereas Chrome is able to prevent only 46%. The improvement sounds impressive; however, the set of security bugs they tested against was obtained from Google Chrome's bug tracker. The fact that they are comparing known security flaws in Chrome against the new IBOS makes their improvement of 31 percentage points far less impressive.

In addition, their initial test set contained 217 bugs with duplicates removed, and 42 of these were omitted because they were denial-of-service attacks, a form of attack that IBOS does not protect against. It is understandable that this is outside the scope of the research. However, that is a large set of flaws left unaddressed.

Furthermore, the researchers compared their results only against the Chrome web browser. A comparison that also included other browsers such as Mozilla Firefox, Internet Explorer and Safari would be much more compelling.

==References==

[1] CVE - Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org.

[2] Rushby, John (1981). "Design and Verification of Secure Systems". 8th ACM Symposium on Operating System Principles. Pacific Grove, California, US. pp. 12–21.

[3] B. Lampson, M. Abadi, M. Burrows and E. Wobber, Authentication in Distributed Systems: Theory and Practice, ACM Transactions on Computer Systems 1992, on page 6.

COMP 3000 Essay 2 2010 Question 2

2010-12-02T23:39:39Z

Vviveka2:

==Paper==
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Background Concepts==
The web is now everywhere and runs on many different operating systems and browsers. The Illinois Browser Operating System (IBOS) is not just a new browser designed to improve security; it is also a full operating system. It was developed by three researchers at the University of Illinois. Its main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers, who are always finding new ways of exploiting even the most secure systems. Recently, cross-site scripting (XSS) has overtaken the age-old buffer overflow as the most common security vulnerability [[#References | [1]]].

Plenty of research has gone into improving the security of the various web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers, and compromised window managers can overlay the fake interfaces common in phishing attacks [[#References | [1]]]. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, which increases the trusted computing base (TCB). IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. Through these methods, IBOS ultimately aims to reduce the computer's TCB.

===Monolithic vs Modular===

The internet has become an important part of everyday life and is used everywhere. At first, users could only read and browse content, but the web has since expanded so that users can write their own content and have an impact on the rest of the world. The introduction of Web 2.0 applications brought great change to the web and, on top of that, new security concerns.

Web browsers such as Firefox are monolithic and easy to exploit, since they run in a single address space. "Secure" web browsers are not enough, since they still have a huge TCB (including the TCP stack, X server, file system, drivers, etc.). A microkernel would be better, but one would still need to trust all of the system components.
The TCB is the combination of hardware, firmware and software, including the kernel and trusted processes, that is critical to a computer's security [[#References | [2]]]. In the words of Lampson et al., the TCB is defined as:

''"a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security."'' [[#References | [3]]].

Modern operating system and browser combinations have massive TCBs that may contain several million lines of code. By extracting components such as device drivers from the kernel, one can shrink a system's TCB considerably. If a device driver outside the TCB becomes corrupted, the effects are not too severe, but if the driver is left in the TCB, the results can be cataclysmic. Removing elements from the TCB makes it smaller, thereby reducing the risk of an attack getting inside.

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are continuously revised and updated to keep ahead of the latest attacks, yet they continue to have hundreds of security vulnerabilities. Most attacks are simple, slightly harmful assaults on web applications, but many target the browser or even the operating system and its libraries. Since the browser runs lower on the shared storage stack, a successful attack on a browser can have severe repercussions because it gives access to all of the browser data for all web applications. It also gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous and may cause serious damage to the entire system, because the attackers can access arbitrary states and events, allowing them to take full control of the system. The focus of this research is to prevent or reduce attacks on the browser, libraries, operating system and system services.

==Contribution==

===Architecture and Design===
The authors developed IBOS to reduce security risks without compromising speed and efficiency. One of the ways they achieve this is through the way processes are created. There are essentially two types of processes: web page instances and traditional processes. Any time the user opens a new tab, clicks on a link, or enters a web address in the uniform resource locator (URL) bar, the IBOS kernel creates a new process. Upon creating a web page instance, the kernel labels it with the originating address of the HTTP request. If a web site such as ''facebook.com'' embeds outside content from another website, for example in an iframe, the kernel creates a new process for the embedded content and labels it appropriately. Traditional processes are all other processes, created for the local machine. These processes are simply labeled ''localhost''.

By labeling each web page instance, the kernel can isolate instances from one another. This prevents a compromised component from taking control of other processes. In addition, by labeling where requests come from, the IBOS kernel can ensure that the data it receives does in fact come from the expected origin.
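The labeling scheme described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the IBOS implementation: the class <code>ToyKernel</code>, the helper <code>origin_label</code>, and the choice of a scheme, host and port string as the label format are all hypothetical names and conventions introduced here for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


def origin_label(url: str) -> str:
    """Hypothetical helper: derive a 'scheme://host:port' label from a URL."""
    parts = urlsplit(url)
    # Fall back to the well-known port when the URL does not name one.
    port = parts.port or {"http": 80, "https": 443}.get(parts.scheme, 0)
    return f"{parts.scheme}://{parts.hostname}:{port}"


@dataclass(frozen=True)
class Process:
    pid: int
    label: str  # an origin for web page instances, "localhost" otherwise


class ToyKernel:
    """Toy stand-in for a kernel that labels processes (assumption, not IBOS code)."""

    def __init__(self) -> None:
        self._next_pid = 1
        self.processes = {}

    def _spawn(self, label: str) -> Process:
        proc = Process(self._next_pid, label)
        self.processes[proc.pid] = proc
        self._next_pid += 1
        return proc

    def spawn_web_page_instance(self, url: str) -> Process:
        # New tab, clicked link, typed URL, or embedded frame: one process each.
        return self._spawn(origin_label(url))

    def spawn_traditional(self) -> Process:
        return self._spawn("localhost")

    def may_deliver(self, proc: Process, response_url: str) -> bool:
        # Data reaches a process only if it comes from the origin in its label.
        return proc.label == origin_label(response_url)


kernel = ToyKernel()
page = kernel.spawn_web_page_instance("http://facebook.com/home")
frame = kernel.spawn_web_page_instance("http://ads.example.com/widget")
print(page.label, frame.label)
print(kernel.may_deliver(page, "http://ads.example.com/x.js"))
```

In this sketch the embedded frame gets its own process and its own label, so a response from the frame's origin is never delivered to the process of the embedding page.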

IBOS has a considerably smaller TCB than other modern browsers. Where Chrome and Firefox each have over 4 million lines of code in their trusted computing base, IBOS has only about 42,000. Since IBOS isolates each process, it was also able to prevent between 75% and 100% of vulnerabilities in affected components on a machine. Using known issues from Chrome, the researchers tested 175 bugs against IBOS, ranging from memory exploits to interface spoofing. Of these, IBOS was able to prevent 135 (77%), whereas Chrome was only able to contain 83. Chrome is able to catch exploits in its rendering engine, since the engine runs in a sandbox, but it cannot prevent exploits that take advantage of the browser kernel. This is not a problem for IBOS, because many of the browser components that sit inside Chrome's trusted computing base have been moved outside the IBOS TCB, limiting what an exploit can do.
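As a quick sanity check, the figures quoted in this essay can be recomputed directly. The sketch below uses only numbers that appear in the essay (217 initial bugs, 42 excluded denial-of-service bugs, 135 and 83 contained, and the approximate TCB sizes); the variable names are illustrative. Note that 83 out of 175 works out to roughly 47%, slightly different from the 46% quoted in the Evaluation section; which figure matches the paper has not been verified here.

```python
# Bug counts as quoted in the essay.
initial_bugs = 217          # after removing duplicates
dos_bugs = 42               # denial-of-service bugs, excluded from testing
tested = initial_bugs - dos_bugs

ibos_contained = 135
chrome_contained = 83


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


ibos_pct = pct(ibos_contained, tested)
chrome_pct = pct(chrome_contained, tested)
gap_points = ibos_pct - chrome_pct  # difference in percentage points

# Approximate TCB sizes in lines of code, as quoted in the essay.
chrome_tcb = 4_000_000
ibos_tcb = 42_000
tcb_ratio = chrome_tcb / ibos_tcb

print(f"bugs tested: {tested}")
print(f"IBOS: {ibos_pct:.1f}%  Chrome: {chrome_pct:.1f}%  gap: {gap_points:.1f} points")
print(f"TCB size ratio: about {tcb_ratio:.0f}x")
```

The gap between the two browsers is a difference in percentage points, not a relative improvement, which is worth keeping in mind when reading the comparison.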

===Performance===
In terms of performance, IBOS is comparable to Firefox and Chrome, two of the best-performing web browsers currently available. For websites such as Google Maps and Facebook, IBOS actually loads pages much faster than Firefox. This may be due in part to the fact that IBOS was built on the WebKit engine, which has been optimized to run Google Maps. For Facebook and Wikipedia, sites that issue many HTTP requests, IBOS is slightly slower than the other two browsers; for the remaining sites, which issue only a few HTTP requests, IBOS runs just as quickly.

===Security===

==Critique==

===Structure===
This paper was very well organized and executed. It flows naturally and keeps its explanation in order, so the reader never needs to flip back to an earlier part of the paper. It starts with the core motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level details. This feels like a natural progression of ideas that gives the reader the information needed to understand upcoming concepts.

===Evaluation===
The evaluation of IBOS's security has some flaws: it is not very thorough, and the data set used for testing is potentially confounding.

The authors' internal testing shows that IBOS is able to resist 77% of attacks from a set of 175 security bugs, whereas Chrome is only able to prevent 46%. The improvement sounds impressive; however, the set of security bugs they tested against was obtained from Google Chrome's bug tracker. The fact that they are comparing known security flaws in Chrome against the new IBOS makes their improvement of 31 percentage points far less impressive.

In addition, their initial test set contained 217 bugs with duplicates removed, of which 42 were omitted because they were denial-of-service attacks, a form of attack that IBOS does not protect against. It is understandable that this is outside the scope of the research. However, that is a large set of flaws left unaddressed.

Furthermore, the researchers only compared their results against the Chrome web browser. A comparison that also included other browsers, such as Mozilla Firefox, Internet Explorer and Safari, would be much more compelling.

==References==
[1] CVE - Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org.

[2] Rushby, John (1981). "Design and Verification of Secure Systems". 8th ACM Symposium on Operating System Principles. Pacific Grove, California, US. pp. 12–21.

[3] B. Lampson, M. Abadi, M. Burrows and E. Wobber, Authentication in Distributed Systems: Theory and Practice, ACM Transactions on Computer Systems 1992, on page 6.

COMP 3000 Essay 2 2010 Question 2

2010-12-02T23:20:28Z

Vviveka2:

==Paper==
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Background Concepts==
The web is now everywhere and runs on many different operating systems and browsers. The Illinois Browser Operating System (IBOS) is not just a new browser designed to improve security; it is also a full operating system. It was developed by three researchers at the University of Illinois. Its main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers, who are always finding new ways of exploiting even the most secure systems. Recently, cross-site scripting (XSS) has overtaken the age-old buffer overflow as the most common security vulnerability [[#References | [1]]].

Plenty of research has gone into improving the security of the various web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers, and compromised window managers can overlay the fake interfaces common in phishing attacks [[#References | [1]]]. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, which increases the trusted computing base (TCB). IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. Through these methods, IBOS ultimately aims to reduce the computer's TCB.

===TCB===
The TCB is the combination of hardware, firmware and software, including the kernel and trusted processes, that is critical to a computer's security [[#References | [2]]]. In the words of Lampson et al., the TCB is defined as:

''"a small amount of software and hardware that security depends on and that we distinguish from a much larger amount that can misbehave without affecting security."'' [[#References | [3]]].

Modern operating system and browser combinations have massive TCBs that may contain several million lines of code. By extracting components such as device drivers from the kernel, one can shrink a system's TCB considerably. If a device driver outside the TCB becomes corrupted, the effects are not too severe, but if the driver is left in the TCB, the results can be cataclysmic. Removing elements from the TCB makes it smaller, thereby reducing the risk of an attack getting inside.

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are continuously revised and updated to keep ahead of the latest attacks, yet they continue to have hundreds of security vulnerabilities. Most attacks are simple, slightly harmful assaults on web applications, but many target the browser or even the operating system and its libraries. Since the browser runs lower on the shared storage stack, a successful attack on a browser can have severe repercussions because it gives access to all of the browser data for all web applications. It also gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous and may cause serious damage to the entire system, because the attackers can access arbitrary states and events, allowing them to take full control of the system. The focus of this research is to prevent or reduce attacks on the browser, libraries, operating system and system services.

==Contribution==

===Architecture and Design===
The authors developed IBOS to reduce security risks without compromising speed and efficiency. One of the ways they achieve this is through the way processes are created. There are essentially two types of processes: web page instances and traditional processes. Any time the user opens a new tab, clicks on a link, or enters a web address in the uniform resource locator (URL) bar, the IBOS kernel creates a new process. Upon creating a web page instance, the kernel labels it with the originating address of the HTTP request. If a web site such as ''facebook.com'' embeds outside content from another website, for example in an iframe, the kernel creates a new process for the embedded content and labels it appropriately. Traditional processes are all other processes, created for the local machine. These processes are simply labeled ''localhost''.

By labeling each web page instance, the kernel can isolate instances from one another. This prevents a compromised component from taking control of other processes. In addition, by labeling where requests come from, the IBOS kernel can ensure that the data it receives does in fact come from the expected origin.

IBOS has a considerably smaller TCB than other modern browsers. Where Chrome and Firefox each have over 4 million lines of code in their trusted computing base, IBOS has only about 42,000. Since IBOS isolates each process, it was also able to prevent between 75% and 100% of vulnerabilities in affected components on a machine. Using known issues from Chrome, the researchers tested 175 bugs against IBOS, ranging from memory exploits to interface spoofing. Of these, IBOS was able to prevent 135 (77%), whereas Chrome was only able to contain 83. Chrome is able to catch exploits in its rendering engine, since the engine runs in a sandbox, but it cannot prevent exploits that take advantage of the browser kernel. This is not a problem for IBOS, because many of the browser components that sit inside Chrome's trusted computing base have been moved outside the IBOS TCB, limiting what an exploit can do.

===Performance===
In terms of performance, IBOS is comparable to Firefox and Chrome, two of the best-performing web browsers currently available. For websites such as Google Maps and Facebook, IBOS actually loads pages much faster than Firefox. This may be due in part to the fact that IBOS was built on the WebKit engine, which has been optimized to run Google Maps. For Facebook and Wikipedia, sites that issue many HTTP requests, IBOS is slightly slower than the other two browsers; for the remaining sites, which issue only a few HTTP requests, IBOS runs just as quickly.

===Security===

==Critique==

===Structure===
This paper was very well organized and executed. It flows naturally and keeps its explanation in order, so the reader never needs to flip back to an earlier part of the paper. It starts with the core motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level details. This feels like a natural progression of ideas that gives the reader the information needed to understand upcoming concepts.

===Evaluation===
The evaluation of IBOS's security has some flaws: it is not very thorough, and the data set used for testing is potentially confounding.

The authors' internal testing shows that IBOS is able to resist 77% of attacks from a set of 175 security bugs, whereas Chrome is only able to prevent 46%. The improvement sounds impressive; however, the set of security bugs they tested against was obtained from Google Chrome's bug tracker. The fact that they are comparing known security flaws in Chrome against the new IBOS makes their improvement of 31 percentage points far less impressive.

In addition, their initial test set contained 217 bugs with duplicates removed, of which 42 were omitted because they were denial-of-service attacks, a form of attack that IBOS does not protect against. It is understandable that this is outside the scope of the research. However, that is a large set of flaws left unaddressed.

Furthermore, the researchers only compared their results against the Chrome web browser. A comparison that also included other browsers, such as Mozilla Firefox, Internet Explorer and Safari, would be much more compelling.

==References==
[1] CVE - Common Vulnerabilities and Exposures (CVE). http://cve.mitre.org.

[2] Rushby, John (1981). "Design and Verification of Secure Systems". 8th ACM Symposium on Operating System Principles. Pacific Grove, California, US. pp. 12–21.

[3] B. Lampson, M. Abadi, M. Burrows and E. Wobber, Authentication in Distributed Systems: Theory and Practice, ACM Transactions on Computer Systems 1992, on page 6.

COMP 3000 Essay 2 2010 Question 2

2010-12-02T08:00:52Z

Vviveka2:

=Paper=
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are constantly being revised and updated to keep up with the latest attacks, yet they continue to have hundreds of security vulnerabilities. Most of these attacks are simple, slightly harmful assaults on web applications, but many target the browser or even the operating system and its libraries. A successful attack on a browser can have severe repercussions because it occurs lower on the shared storage stack than attacks on applications: it gives access to all of the browser data for all web applications and also gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous and may cause serious damage to the entire system, because the attackers can access arbitrary states and events, allowing them to take full control of the system. The focus of this research is to prevent or reduce attacks on the browser, libraries, operating system and system services.

=Background Concepts=
The Illinois Browser Operating System (IBOS) is not just a new browser designed to improve security; it is also a full operating system. Its main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers. Recently, cross-site scripting (XSS), essentially a form of script injection into a web application, has overtaken the age-old buffer overflow as the most common security vulnerability.

Plenty of research has gone into improving the security of the various web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers, and compromised window managers can overlay the fake interfaces common in phishing attacks. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, which increases the trusted computing base. IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. Through these methods, IBOS ultimately aims to reduce the computer's trusted computing base (TCB).

===TCB===
The TCB is the hardware and software that is critical to the computer's security. Modern operating system and browser combinations have massive TCBs that may contain several million lines of code. By extracting components such as device drivers from the kernel, one can shrink a system's TCB considerably. If a device driver outside the TCB becomes corrupted, the effects are not too severe, but if the driver is left in the TCB, the results can be cataclysmic. Removing elements from the TCB greatly reduces the risk of an attack getting inside.

=Contribution=
What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

=Critique=
This paper was very well organized and executed. It flows naturally and keeps its explanation in order, so the reader never needs to flip back to an earlier part of the paper. It starts with the core motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level details. This feels like a natural progression of ideas that gives the reader the information needed to understand upcoming concepts.

'''!! Don't forget to erase this !!''' ''What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.''

=References=
You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.

COMP 3000 Essay 2 2010 Question 2

2010-12-02T07:39:08Z

Vviveka2:

=Paper=
'''Trust and Protection in the Illinois Browser Operating System'''

http://www.usenix.org/events/osdi10/tech/full_papers/Tang.pdf

Shuo Tang, Haohui Mai, Samuel T. King

''University of Illinois at Urbana-Champaign''

Presentation slides to go along with the paper: Trust and Protection in the Illinois Browser Operating System. http://www.cs.uiuc.edu/homes/stang6/ibos.html#slide1

==Research Problem==
Modern browsers, such as Google Chrome and Mozilla Firefox, are constantly being revised and updated to keep up with the latest attacks, yet they continue to have hundreds of security vulnerabilities. Most of these attacks are simple, slightly harmful assaults on web applications, but many target the browser or even the operating system and its libraries. A successful attack on a browser can have severe repercussions because it occurs lower on the shared storage stack than attacks on applications: it gives access to all of the browser data for all web applications and also gives the attacker access to other resources on the exploited system. A successful attack on the operating system can be disastrous and may cause serious damage to the entire system, because the attackers can access arbitrary states and events, allowing them to take full control of the system. The focus of this research is to prevent or reduce attacks on the browser, libraries, operating system and system services.

=Background Concepts=
The Illinois Browser Operating System (IBOS) is not just a new browser designed to improve security; it is also a full operating system. Its main goal is to expose browser-level abstractions at the lowest possible software layer, reducing the trusted computing base for web browsers. Many websites and web applications have become major targets for attackers. Recently, cross-site scripting has overtaken the age-old buffer overflow as the most common security vulnerability.

Plenty of research has gone into improving the security of the various web browsers on the market today, but all browsers remain susceptible to attacks on the lower layers. Compromised Ethernet drivers can send sensitive HTTP packets to third parties, compromised storage modules can send persistent data to unwanted viewers, and compromised window managers can overlay the fake interfaces common in phishing attacks. Common web browsers run on top of commodity operating systems with shared system services and user-mode libraries, which increases the trusted computing base. IBOS looks to solve this issue by exposing browser-level abstractions rather than just general-purpose abstractions. Important concepts such as cookies, HTTP connections and tabs for displaying pages are all brought into the browser abstraction layer. Through these methods, IBOS ultimately aims to reduce the computer's trusted computing base (TCB).

===TCB===
The TCB is the hardware and software that is critical to the computer's security. Modern operating system and browser combinations have massive TCBs that may contain several million lines of code. By extracting components such as device drivers from the kernel, one can shrink a system's TCB considerably. If a device driver outside the TCB becomes corrupted, the effects are not too severe, but if the driver is left in the TCB, the results can be cataclysmic. Removing elements from the TCB greatly reduces the risk of an attack getting inside.

=Contribution=
What are the research contribution(s) of this work? Specifically, what are the key research results, and what do they mean? (What was implemented? Why is it any better than what came before?)

=Critique=
This paper was very well organized and executed. It flows naturally and keeps its explanation in order, so the reader never needs to flip back to an earlier part of the paper. It starts with the core motivation for the system, moves on to how the kernel is organized, and works its way up to higher-level details. This feels like a natural progression of ideas that gives the reader the information needed to understand upcoming concepts.

'''!! Don't forget to erase this !!''' ''What is good and not-so-good about this paper? You may discuss both the style and content; be sure to ground your discussion with specific references. Simple assertions that something is good or bad is not enough - you must explain why.''

=References=
You will almost certainly have to refer to other resources; please cite these resources in the style of citation of the papers assigned (inlined numbered references). Place your bibliographic entries in this section.

Talk:COMP 3000 Essay 2 2010 Question 2

2010-11-22T17:37:51Z

Vviveka2:

=Comments & Discussion=

It seems we only have 5/7 members. We should start splitting up the tasks and assign who gets what. So if everybody writes what section they would like to work on that would be great.

--[[User:Ymoussou|Youcef M.]] 15:19, 20 November 2010 (UTC)

=Group Members=

Leave your name and e-mail address if you are assigned to this question.

[[User:Ymoussou|Youcef M.]] moussoud@gmail.com

I am alive and still in the class, selliot3@connect.carleton.ca

--[[User:Selliot3|Selliot3]] 18:12, 15 November 2010 (UTC)

Still in the class, andrewtubman84@gmail.com

[[User:Atubman|Atubman]]

I'm here. I have received an email reply from John Vanden Heuvel as well (he may not see this) gsmith0413@gmail.com
--[[User:Gsmith6|Gsmith6]] 22:31, 15 November 2010 (UTC)

[[User:vviveka2|vG]]

I am here... and replied to the email

=Question 2 members=

Elliott Charles selliot3

Moussoud Youcef ymoussou

Pharand Alexandre apharan2

Smith Geoffrey gsmith6

Tubman Andrew atubman

Vanden Heuvel John jvheuvel

Vivekanandarajah Vijitharan vviveka2

The web is ubiquitous: people use it for communication, banking, business, social networking and many other purposes. There are several types of vulnerabilities: web application, browser, OS and library vulnerabilities. Insecure web browsers are monolithic and easy to exploit. Secure web browsers such as Chrome isolate web applications but still contain a huge trusted computing base (TCB). By treating browser abstractions as first-class OS abstractions, IBOS reduces the TCB for web browsers and can withstand attacks on most of its components. [[User:vviveka2|vG]]

Talk:COMP 3000 Essay 2 2010 Question 2

2010-11-16T04:15:22Z

Vviveka2:

=Group Members=

Leave your name and e-mail address if you are assigned to this question.

[[User:Ymoussou|Youcef M.]] moussoud@gmail.com

I am alive and still in the class, selliot3@connect.carleton.ca

--[[User:Selliot3|Selliot3]] 18:12, 15 November 2010 (UTC)

Still in the class, andrewtubman84@gmail.com

[[User:Atubman|Atubman]]

I'm here. I have received an email reply from John Vanden Heuvel as well (he may not see this) gsmith0413@gmail.com
--[[User:Gsmith6|Gsmith6]] 22:31, 15 November 2010 (UTC)

[[User:vviveka2|vG]]

I am here... and replied to the email

=Question 2 members=

Elliott Charles selliot3

Moussoud Youcef ymoussou

Pharand Alexandre apharan2

Smith Geoffrey gsmith6

Tubman Andrew atubman

Vanden Heuvel John jvheuvel

Vivekanandarajah Vijitharan vviveka2

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-15T13:35:51Z

Vviveka2:

== To Do ==
# Grab your references for the Essay proper, set your info to refer to the references, leave out any references we didn't use.
# Remove signatures from the Essay Proper by 10:00 (this is an arbitrary time)

== Log ==
'''Suggestion:''' Let us maintain our edits here instead of littering the main page with our names. Also, please do not edit without writing to the log, so that we know who has done what and when.

Please maintain a log of your activities in the Log section so that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity. Everyone should post their interpretation of the question in the simplest possible English so we're on the same page (as someone, maybe me, seems to have the wrong idea about what we're trying to talk about).
More moving for clarity. Added an essay outline at the bottom (feel free to change).
Filled in the outline somewhat and added questions to the outline for everyone to think on. --[[User:Rannath|Rannath]]

First Draft for essay. Please modify and add on. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

Edited Scheduling Priorities and rewrote some areas to provide a better paragraph structure. --[[User:Spanke|Shane]] 15:25, 13 October 2010 (UTC)

Added to the memory management section. --[[User:Hirving|Hirving]] 21:42, 13 October 2010 (UTC)

Edited Scalable Threads Problems. Also did a little re-arrangement. --[[User:Gautam|Gautam]] 01:03, 14 October 2010 (UTC)

Answered Essay Questions in Discussion. --[[User:Spanke|Shane]] 01:25, 13 October 2010 (UTC)

I posted Main point 2. It is nearing completion, --[[User:Praubic|Praubic]] 17:43, 14 October 2010 (UTC)

Added introduction and edited design and models [[vG]]

Minor edits in Scheduler part. --[[User:Gautam|Gautam]] 19:09, 14 October 2010 (UTC)

Added a paragraph about locks to memory section. --[[User:Hirving|Hirving]] 19:36, 14 October 2010 (UTC)

Proof read and edited article for clarity and grammar. (commas are nice) --[[User:Hirving|Hirving]] 19:57, 14 October 2010 (UTC)

Proof read once more. Seems good to go. And yes commas are nice :) --[[User:Gautam|Gautam]] 07:46, 15 October 2010 (UTC)

 <Add your future activities here>

== The Question ==
'''Original: '''
How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about number and scalability of threads not the gross mechanics.

To be more clear: we can narrow the focus from thread implementations to thread scalability... ignore the stuff that is required for all threads, unless it's required for many threads. (I didn't find any implementations that required hardware)

I would also argue that since OSs have to run on multiple hardware platforms, one cannot guarantee that unique/rare hardware bits will be there. While we can talk about hardware, we should limit it to a mention at most. OR we could mention prospective hardware that could help out, but is not yet standard. It depends on whether we want to do "as it is" or "as it might be".

utility of such massively scalable thread implementations. I took this as: what functionality (of single strings) does one have to give up to make threads scalable.

'''Gautam: '''
I think the hardware is as relevant as the software. Not all things can be done in software and hardware support is an important factor in most of the solutions to many problems that OS face. My take.

'''Henry: '''
Since the question is about the system as a whole, I think the answer should include both software and hardware support for large numbers of threads. The question revolves around how a system can handle millions of threads and what the major factors are that allow the system to do it. Also, the last part of the question seems to ask what this number of threads allows a process to do.

'''Shane: '''
In response to the idea above about the last part of the question, I would argue that it would enable fast execution, because all threads that receive a cache miss would be picked up by the other threads so long as there were enough resources. Also, the use of more threads would help synchronize the cache (through sharing) so that it would not miss. Of course, this would only be if they were assigned to the same task; you cannot sync threads running different applications, it just wouldn't make sense. The only issue with this idea is that the software must support this number.

'''vG: '''
We should talk about the types of relationship models (1:1, N:M, N:N and so on) and also about application vs hardware multi-threading within a single processor.

'''Paul: '''
I discussed Main Point 2 and how UMS threading is stretched onto multiple cores. A design that involves multiple processors differs from single-processor computers, so hardware definitely plays a significant role here.

== Group 7 ==

Let us start out by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek <support.tamiltreasure@gmail.com>

Shane Panke <shanepanke@msn.com>

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log section so that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Multi-threading is a term used to describe:

* A facility provided by the operating system that enables an application to create threads of execution within a process
* Applications whose architecture takes advantage of the multi-threading provided by the operating system
[[vG]]
----
These are all related ideas.

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user-mode threads) provide very quick and efficient switching because there is no need for a system call and the kernel is oblivious to the switch, which allows for millions of user-mode threads. ISSUES: A blocking system call blocks all other fibers.
On the other hand, managing threads through the kernel requires a context switch (between user and kernel mode) on creation and removal of a thread, so programs with a prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

One implementation of UMS is: combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5'' -[[Rannath]]

----

I would scrap the first two below, at most mention them...

#time-division multiplexing
#threads vs processes
#I/O Scheduling -[[vG]]

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. The processor switches between different threads so fast that the user perceives them as running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
This is a '''need''' section 4 below is not '''needed''' 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes
4. Examples of systems that require many threads such as mainframe servers or banking client processing.--[[User:Praubic|Praubic]] 17:34, 11 October 2010 (UTC)

Here is an example of a design: (the topic asks for key design choices here is one)

Capriccio is a specific design for scalable user-level threads. It is distinct from most designs in being independent of event-based mechanisms as well as kernel thread models. Its threads are a very good choice for internet servers, and this implementation could easily support 100,000 threads. They are characterized by high scalability, efficient stack management and scheduling based on resource usage; however, the performance is not comparable to event-based systems.--[[User:Praubic|Praubic]] 13:32, 12 October 2010 (UTC)

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step, or the last step, we can always present our info and make our thesis match the info.
#List all questions and points we have about the topic

Questions:
# What makes threads non-scalable? List the problems
# What utility do some scalable implementations lack? Why?
# Just how scalable does a full utility implementation get?

Answers:
# Memory Usage, Context Switching. Consider using a thread pool.
# Signals, portability(maybe) both add overhead which would slow down threads
# If using thread pools, the scalability is then limited to the number of threads in the pool
----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point 1 -[[Rannath]] 
- efficient thread creation/destruction is more scalable 
-- NPTL's improvements over LinuxThreads- primarily due to lower overhead of creation/destruction ''1''

Main Point 2 -[[Rannath]] 
- UMS & user-space threads are more scalable - maybe 
-- context switches are costly ''From class'' 
-- blocking locks have lower latency when twinned with a user space scheduler ''8''

Ok for point 2 -> I posted a draft on the essay page, but I'm not certain whether I should talk about fibers, since they also function in user space but they're not UMS. --[[User:Praubic|Praubic]] 00:18, 14 October 2010 (UTC)

Main Point 3 
- Certain bottlenecks appear in scaled implementations; removing these improves scalability. 
-- "False cache-line sharing" ''14'' 
-- xtime lock to a lockless lock ''14''

Main Point 3.5 
Fine-grained over coarse-grained 
-- "Big Kernel Lock" ''14'' 
-- dcache_lock ''14''

Link the Main points to the thesis

----

Conclusion
# restate info
# affirmation of thesis

Here is the first paragraph that I attempted. Please feel free to change or even delete it from here.

A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require fewer system resources than concurrent cooperating processes and start much more easily; therefore, millions of them may exist in a single process. The two major types of threads are kernel and user-mode. Kernel threads are usually considered heavier, and designs that involve them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads, and there are a few designs that incorporate them, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources; however, the ability to switch in user mode makes them more efficient (depending on the application) than Thread Pools, which are yet another mechanism that allows for high scalability.
--[[User:Praubic|Praubic]] 19:04, 12 October 2010 (UTC)

we can add this for intro paragraph:

How is it possible for systems to support millions of threads or more within a single process?

It is possible for systems to support millions of threads or more within a single processor: it has the ability to switch execution resources between threads, thus making execution concurrent. Concurrency is when multiple threads stay on the queues for switching and are incapable of running at the same time, but appear to run at the same time because of the speed at which they switch. [[vG]] You stated it is possible, but you did not state how, or rather did not make it clear. The below should be a better interpretation. --[[User:Spanke|Shane]]

Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on the queues when they cannot all run at the same time. The speed at which they are switched gives the impression that they are executing at the same time.

Added more == vG

A process is an instance of a program running on a computer, with its own resources such as an address space, files, and I/O devices. A thread, on the other hand, is similar to a process, but it performs a single operation within the process. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on the queues when they cannot all run at the same time. The speed at which they are switched gives the impression that they are executing at the same time. [[vG]]

----
I suggest that we start filling out the main points of the essay. We can discuss the intricacies as we go along. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6590&rep=rep1&type=pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# [http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt Interesting presentation: really worth checking out] [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)
# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83 Scalability in linux]
# [http://hillside.net/plop/2007/papers/PLoP2007_Ahluwalia.pdf This has something to do with our question...]
# [http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx Scheduling Priorities (Windows)], Microsoft (23 September 2010) --[[User:Spanke|Shane]]
# [http://www.novell.com/coolsolutions/feature/14878.html Linux Scheduling Priorities Explained], Novell (11 October 2005) --[[User:Spanke|Shane]]
# [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/ Inside the Linux 2.6 Completely Fair Scheduler], IBM (15 December 2009) --[[User:Spanke|Shane]]
#http://www.megaupload.com/?d=R4VMK3A1 (PDF Document on Multithreading) [[vG]]
# [http://www.linuxjournal.com/article/1363 what is multithreading?] [[vG]]
# [http://en.wikipedia.org/wiki/Thread_%28computer_science%29 type of threadings and multithreading in general] [[vG]]
#On the design of Chant: a talking threads package [http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=344298 http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=344298]

COMP 3000 Essay 1 2010 Question 7

2010-10-15T01:11:40Z

Vviveka2:

=Question=

How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=

== The Background ==

A 'process' is defined to be "an address-space and a group of resources dedicated to running the program". On the other hand, a 'thread' is an independent sequential unit of computation that executes within the context of a kernel-supported entity like a 'process'. Threads are often classified by their “weight” (or overhead), which corresponds to the amount of context that must be saved when a thread is removed from the processor and restored when it is reinstated on a processor, that is, on a context switch. The context for a process usually includes the hardware registers, kernel stack, user-level stack, interrupt vectors, page tables, and more. Threads require fewer system resources than concurrent cooperating processes and start much more easily; therefore, millions of them may exist in a single process. Loosely based on this, there are two major types of threads: kernel and user-mode. Kernel threads are usually considered heavier, and designs that involve them are not very scalable. User threads, on the other hand, are mapped to kernel threads and are lightweight. The ratio of user threads to kernel threads is an important factor when designing scalable systems.

There are a few designs, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than Thread Pools, which are yet another mechanism that allows for high scalability. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on the queues when they cannot all run at the same time. The speed at which they are switched gives the impression that they are executing at the same time.

== Scalable Threads: The Problems ==

One of the basic challenges is to create code which is stable and at the same time scalable. Furthermore, the challenge in making an existing code base scalable is the identification and elimination of bottlenecks once scaled. Ray Bryant and John Hawkes found the following bottlenecks when porting Linux to a 64-core NUMA system. Each of these bottlenecks is an example of a type of bottleneck that can appear in any program.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83]

One type of bottleneck appears when expensive operations are '''needlessly called'''. In Linux, misplaced information in the cache can in some instances cause a "cache-coherency operation" to be called. This operation is expensive compared to what would happen if the information were in the 'right place'. Once the misplaced information that repeatedly causes this problem is identified, it can be moved to limit the problem. This bottleneck can appear anywhere expensive operations are called a needless number of times (the problem is not inherent, but is a result of bad design).

Another type of bottleneck is from '''starvation.''' An example of one such bottleneck is the xtime_lock in Linux. Because reads took the lock, they could prevent the timer value from being written, causing the kernel to waste CPU time retrying. This problem was solved by using a lockless read. This kind of bottleneck can appear anywhere a thread must keep trying to execute but cannot, leading to wasted CPU cycles.
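One way to picture a lockless read is a sequence counter. The following is only a minimal sketch under stated assumptions, not the kernel's code: the class name <code>SeqCounterClock</code> and its methods are hypothetical, and a real implementation would also need memory barriers. The writer makes the counter odd while it updates the value; a reader retries if it saw an odd counter or if the counter changed during its read, so readers never hold a lock that could keep the writer out.

```python
import threading


class SeqCounterClock:
    """Hypothetical sequence-counter ("seqlock"-style) time value."""

    def __init__(self):
        self._seq = 0                        # even: stable, odd: write in progress
        self._seconds = 0
        self._nanos = 0
        self._write_lock = threading.Lock()  # serializes writers only

    def write(self, seconds, nanos):
        with self._write_lock:
            self._seq += 1                   # now odd: readers will retry
            self._seconds = seconds
            self._nanos = nanos
            self._seq += 1                   # even again: value is consistent

    def read(self):
        while True:
            start = self._seq
            if start % 2:                    # writer is mid-update, try again
                continue
            value = (self._seconds, self._nanos)
            if self._seq == start:           # no write happened during the read
                return value


clock = SeqCounterClock()
clock.write(12, 500)
```

Readers pay with an occasional retry instead of blocking the writer, which suits a value that is read far more often than it is written.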

The next type of bottleneck is from '''coarse-grained''' operations. Granularity refers to the execution time of a code segment: the closer a segment is to the speed of an atomic action, the finer its granularity. Both examples below eat a lot of CPU time where a finer-grained implementation would eat less. One coarse-grained bottleneck was the dcache_lock. It ate up some time in normal use, but it was also taken in the much more frequently called dnotify_parent() function. This was deemed an unacceptable state of affairs, so the dcache_lock strategy was replaced with a finer-grained strategy from a later implementation of Linux. Another big coarse-grained bottleneck in the system is the "Big Kernel Lock" (BKL), Linux's kernel synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred method, on Linux NUMA systems, was to limit the BKL's usage. The ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck. Both of these examples are the result of coarse granularity.
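The difference between coarse and fine granularity can be illustrated with a small counter table. This is a sketch only and has nothing to do with the actual dcache or BKL code; the class names, the <code>hammer</code> helper and the stripe count of 16 are all assumptions made for the example. The coarse version takes one lock for every key, while the striped version takes a lock that covers only a fraction of the keys.

```python
import threading


class CoarseCounterTable:
    """One lock guards every key (coarse-grained)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        with self._lock:                     # all keys contend on the same lock
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


class StripedCounterTable:
    """Each stripe of keys has its own lock (finer-grained)."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._tables = [{} for _ in range(stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._stripe(key)
        with self._locks[i]:                 # only keys in this stripe contend
            table = self._tables[i]
            table[key] = table.get(key, 0) + 1

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def hammer(table, key, times):
    """Increment one key repeatedly from a worker thread."""
    for _ in range(times):
        table.increment(key)


striped = StripedCounterTable()
workers = [threading.Thread(target=hammer, args=(striped, k, 1000))
           for k in ("a", "b", "c", "d")]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Both tables give the same answers; the striped one simply lets threads that touch unrelated keys proceed without waiting on each other.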

Bottlenecks can be from '''multiple problems.''' One example is the multiqueue scheduler from Linux 2.4. Altogether, the multiqueue scheduler ate up 25% of the CPU time. It had two problems: its coarse-grained spinlock ate up the majority of that CPU time, while the rest went into computing and recomputing information in the cache, a needless expensive operation. The scheduler also had O(n) time complexity, which meant that it had scalability issues and would become inefficient beyond a certain number of processes. These problems were fixed by replacing the scheduler. (That scheduler was in turn replaced by a more efficient scheduler with O(1) time complexity, which meant that the cost of a scheduling decision did not grow with the number of threads/processes.)
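To see why a scheduling decision can cost the same no matter how many tasks are waiting, consider a run queue with one FIFO per priority level plus a bitmap of the non-empty levels. This is a hedged sketch, not the Linux implementation: <code>ConstantTimeRunQueue</code>, the 140 levels and the "smaller number wins" convention are assumptions chosen for illustration. Picking the next task only inspects the bitmap and one queue, never the full list of tasks.

```python
from collections import deque


class ConstantTimeRunQueue:
    """Hypothetical run queue: one FIFO per priority plus a bitmap of non-empty levels."""

    def __init__(self, levels=140):
        self._queues = [deque() for _ in range(levels)]
        self._bitmap = 0                     # bit p set <=> queue p is non-empty

    def enqueue(self, task, priority):
        self._queues[priority].append(task)
        self._bitmap |= 1 << priority

    def pick_next(self):
        if self._bitmap == 0:
            return None
        # Index of the lowest set bit = best (numerically smallest) priority.
        priority = (self._bitmap & -self._bitmap).bit_length() - 1
        queue = self._queues[priority]
        task = queue.popleft()
        if not queue:
            self._bitmap &= ~(1 << priority)
        return task


rq = ConstantTimeRunQueue()
rq.enqueue("editor", 120)
rq.enqueue("audio", 10)
rq.enqueue("backup", 139)
order = [rq.pick_next() for _ in range(4)]
```

An O(n) scheduler would instead walk every runnable task on each decision, which is what stops scaling once the number of tasks grows.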

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

The introduction of Windows NT and OS/2 brought about an innovation: cheap threads alongside expensive processes. UMS, which reflects such a design, is a recommended mechanism for applications with high performance requirements that handle many threads on multicore systems. A scheduler has to be implemented to manage the UMS threads and decide when they should be run or stopped. This implementation is not desirable for moderate-performance systems, because concurrent execution of this sort naturally allows for non-intuitive outcomes or behaviors such as race conditions, which require careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the final desired utility. For instance, a UMS scheduler can be assigned to each logical processor, thereby creating affinity for related threads to function around one scheduler. This could turn out to be inefficient if there are many related threads, which could end up starving other processes.

Fibers embrace essentially the same abstraction as coroutines. The distinction is that fibers exist at the system level while coroutines execute at the language level. Unlike UMS, fibers do not utilize multiprocessor machines; however, they require less operating system support. The Symbian operating system presents an example of fiber usage in its Active Scheduler. An active scheduler object contains a single fiber that is scheduled when an asynchronous call returns and blocks lower-priority fibers until all higher-priority ones are finished.
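Since fibers and coroutines share the same abstraction, the cooperative switching they rely on can be sketched with language-level coroutines. The example below uses Python generators as a stand-in for fibers; the names <code>fiber</code> and <code>run_round_robin</code> are hypothetical, and this is an illustration of the idea rather than any operating system's fiber API. Each task runs until it voluntarily yields, and the switch happens entirely in user space.

```python
from collections import deque


def fiber(name, steps, log):
    """A cooperative task: runs one step, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                                # user-space switch point, no system call


def run_round_robin(fibers):
    """Hypothetical user-space scheduler: resume each fiber in turn until all finish."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)                    # resume until its next yield
        except StopIteration:
            continue                         # fiber finished, drop it
        ready.append(current)


log = []
run_round_robin([fiber("a", 2, log), fiber("b", 3, log)])
```

Because switching is voluntary, a task that makes a blocking call without yielding stalls every other task in the same scheduler, which is the main drawback of this design.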

Thread Pools consist of queues of threads that stay open and wait for new tasks to be assigned to them. If there are no new tasks to be completed, they sleep or wait. This pattern eliminates the overhead of creating and destroying threads, which results in better system stability and improved performance. The long-lived threads can, for instance, handle multiple transaction requests arriving over socket connections from other machines in a short time frame, while avoiding the millions of cycles needed to drop and re-establish a thread. Thread pools often operate on server farms, and therefore thread safety has to be carefully implemented.
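As a rough illustration of the pattern, the sketch below uses Python's standard <code>concurrent.futures.ThreadPoolExecutor</code>. The function <code>handle_request</code>, the pool size of 4 and the 100 requests are assumptions made for the example; it only shows that a few long-lived workers can serve many work items.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def handle_request(request_id):
    """Hypothetical work item; returns the id and the worker thread that ran it."""
    return request_id, threading.current_thread().name


# At most four long-lived workers serve all 100 requests;
# no thread is created or destroyed per request.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="worker") as pool:
    results = list(pool.map(handle_request, range(100)))

worker_names = {name for _, name in results}
```

The set <code>worker_names</code> never holds more than four entries, however many requests are submitted.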

== Design Choices ==
'''(A) Kernel Threads and User Threads (1:1 vs M:N) '''
The 1:1 model is the most basic design: each user thread maps to one kernel thread. It boasts a slim, clean library interface on top of the kernel functions. Management and scheduling are done through thread management. [[vG]] Although the M:N model requires a complicated library, it would offer advantages in areas such as signal handling. A general consensus was that the M:N design was not compatible with the Linux kernel because of its high implementation cost. This gave birth to the 1:1 model. Thread-aware operating systems include Windows XP, Windows 2000, Windows Vista and any recent operating system. [[vG]]

''' (B)Signal Handling '''
The kernel implements the POSIX signal handling for use with the multitude of signal masks. Since the signal will only be sent to a thread if it is unblocked, no unnecessary interruptions through signals occur. The kernel is also in a much better situation to judge which is the best thread to receive the signal. This only holds true if the 1-on-1 model is used.

''' (C)Synchronization '''
The implementation of synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option since threads can have different priorities (besides wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability, caused by spurious wakeups and degradation of the quality of signal handling in the application. Fortunately, new functionality was added to the kernel to implement all kinds of synchronization.

Explaining the four types of synchronization:

*A mutex lock admits only one thread at a time, so only that thread has access to the protected part of the code
*With read/write synchronization, threads can gain shared read access or exclusive write access to a protected resource; to edit the content a thread must hold the exclusive write lock, which is only granted once all read locks are released
*Condition variable synchronization blocks the thread until the condition becomes true
*Counting semaphores grant access to multiple threads. A semaphore has a count that keeps track of the number of threads that can have concurrent access to the data. Once the limit is reached, other threads are blocked until a thread releases the semaphore.
[[vG]]
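Two of the primitives listed above can be seen working together in a short sketch. It uses Python's standard <code>threading.Semaphore</code> and <code>threading.Lock</code>; the limit of 3, the worker function <code>use_resource</code> and the 10 threads are assumptions for illustration only. The counting semaphore caps how many threads are inside at once, and the mutex protects the two shared counters.

```python
import threading
import time

MAX_CONCURRENT = 3                           # assumed limit for this illustration
slots = threading.Semaphore(MAX_CONCURRENT)
state_lock = threading.Lock()                # mutex guarding the two counters below
active = 0
peak = 0


def use_resource():
    global active, peak
    with slots:                              # blocks while MAX_CONCURRENT threads are inside
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)                     # stand-in for work on the shared data
        with state_lock:
            active -= 1


threads = [threading.Thread(target=use_resource) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all threads finish, <code>peak</code> records the largest number of threads that were ever inside at the same time, which cannot exceed the semaphore's count.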
''' (D)Memory Management '''
Thread memory management, from creation to maintenance and deallocation, is an important design choice when attempting to create a large number of threads in a single process. A thread's data structure is made up of a program counter, a stack and a control block. A control block of a thread is needed for thread management as it contains the state data of a thread. The optimization of this data structure can greatly increase performance when there is a large number of threads.

A thread can be created before the process actually requires it to run, and then wait until an idle processor becomes available to run it. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) is a problem that can arise with this creation process, since it front-loads the work. Another problem is that the thread must allocate the memory required for its stack at creation, because it is expensive to allocate the stack memory dynamically. A way to optimize this creation process for large numbers of threads is to copy the arguments of the thread into its control block; this allows the thread's stack to be allocated at the thread's startup (when the thread starts being used) and not when the thread is created. When the thread enters startup, it can copy its arguments out of its control block and allocate its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads), and, if thread memory management is done serially, these two factors combine to create a maximum rate of thread creation.
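The deferred allocation described above can be sketched in a few lines. This is a simplified model and not a real thread library: <code>ControlBlock</code>, <code>create_thread</code>, <code>start_thread</code> and the 64 KiB stack size are hypothetical, and a <code>bytearray</code> stands in for stack memory. Creation only fills in the control block; the stack is allocated when the thread actually starts.

```python
class ControlBlock:
    """Hypothetical thread control block: holds state and a copy of the arguments."""

    def __init__(self, func, args):
        self.func = func
        self.args = tuple(args)              # arguments copied in at creation
        self.state = "created"
        self.stack = None                    # no stack memory reserved yet


def create_thread(func, *args):
    """Creation is cheap: only the control block is allocated."""
    return ControlBlock(func, args)


def start_thread(tcb, stack_size=64 * 1024):
    """Startup: allocate the stack now, then copy the arguments out and run."""
    tcb.stack = bytearray(stack_size)        # stand-in for a real stack allocation
    tcb.state = "running"
    result = tcb.func(*tcb.args)
    tcb.state = "finished"
    return result


pending = [create_thread(pow, 2, n) for n in range(1000)]
stacks_before_start = sum(t.stack is not None for t in pending)
first_result = start_thread(pending[10])
```

A thousand threads exist after the list comprehension, yet none of them holds a stack until <code>start_thread</code> is called on it.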

Locks are an important part of the performance of threads, and there are multiple ways of creating and controlling locks in order to support a large number of threads. A single lock (all the data structures under one lock) has the advantage that once the processor has acquired the lock it can modify any of the stored data. Using the single-lock method means only one lock is needed per thread, decreasing the thread overhead, but it also limits the throughput of the system. Multiple locks (each data structure has its own lock) have the advantage that each action on a data structure involves only its own locking/unlocking operations. Multiple locks have greater thread overhead (because there are more locks), but the thread throughput is much higher, allowing for fast creation of threads. Another downside of multiple-lock systems is deadlock: a deadlock happens when two different threads are each waiting for data that the other holds. Single- and multiple-lock systems have opposite trade-offs, and using both depending on the situation can greatly increase the performance of a system.

Thread deallocation can also be optimized to increase scalability. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into list operations. Without a free list, thread overhead would include searching for a free region of memory of the correct size to hold the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]
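
The free-list idea can be illustrated with a minimal sketch. The names <code>StackPool</code> and <code>STACK_SIZE</code> are hypothetical and do not come from the cited paper; the sketch also assumes every stack has the same fixed size, which is what lets reuse avoid any search for a suitably sized block.

```python
# Hypothetical sketch of a free list for thread stacks.
# StackPool and STACK_SIZE are illustrative names, not from the cited paper.
STACK_SIZE = 64 * 1024  # assumed fixed stack size in bytes


class StackPool:
    """Recycle fixed-size stacks through a free list."""

    def __init__(self):
        self._free = []       # deallocated stacks awaiting reuse
        self.allocations = 0  # how many fresh allocations were needed

    def acquire(self):
        if self._free:
            return self._free.pop()   # fast path: reuse, a list operation
        self.allocations += 1
        return bytearray(STACK_SIZE)  # slow path: fresh allocation

    def release(self, stack):
        self._free.append(stack)      # deallocation is also a list operation


pool = StackPool()
first = pool.acquire()
pool.release(first)
second = pool.acquire()  # reuses the stack released above
```

Here the second <code>acquire()</code> call returns the same buffer that was just released, so only one real allocation takes place.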

''' (E)Scheduling Priorities '''
A thread is an entity that can be scheduled according to its scheduling priority. In Windows, this priority is a number ranging from 0 to 31, while in Linux the CFS (Completely Fair Scheduler) orders runnable tasks in a red-black tree. Threads are executed in time slices assigned to them in round-robin fashion, and lower-priority threads wait until higher-priority ones finish their work. A thread's context consists of a set of machine registers, the kernel stack, and the user stack, all tied to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and a thread of equal (or higher) priority becomes available; an efficient implementation of the switch is essential for high scalability. For example, fibers, which execute entirely in user space, do not require a system call during a switch, which greatly increases efficiency.[http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx][http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/], Microsoft (23 September 2010) --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)

== References ==
Linux Symposium, pg83 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83] 
PicoThreads: Lightweight Threads in Java [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf]

COMP 3000 Essay 1 2010 Question 7

2010-10-15T00:35:32Z

Vviveka2: /* Design Choices */

=Question=

How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=

== The Background ==

A 'process' is defined to be "an address-space and a group of resources dedicated to running the program". A 'thread', on the other hand, is an independent sequential unit of computation that executes within the context of a kernel-supported entity such as a process. Threads are often classified by their “weight” (or overhead), which corresponds to the amount of context that must be saved when a thread is removed from the processor and restored when it is reinstated, that is, during a context switch. The context for a process usually includes the hardware registers, kernel stack, user-level stack, interrupt vectors, page tables, and more. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. There are two major types of threads: kernel and user-mode. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are lightweight and are mapped onto kernel threads. The ratio of user threads to kernel threads is an important factor when designing scalable systems.

There are a few designs, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than Thread Pools, which are yet another mechanism that allows for high scalability. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency arises when multiple threads wait on the run queues but the system cannot run them all at the same time. Switching between them quickly gives the impression that they are executing simultaneously. [[vG]] && [[Paul]] && [[Shane]] && [[Gautam]]

== Scalable Threads: The Problems ==

One of the basic challenges is to create code that is both stable and scalable. Furthermore, the challenge in making an existing code base scalable lies in identifying and eliminating the bottlenecks that appear once it is scaled. Ray Bryant and John Hawkes found the following bottlenecks when porting Linux to a 64-core NUMA system. Each of these bottlenecks is an example of a type of bottleneck that can appear in any program.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83]

One type of bottleneck appears when expensive operations are '''needlessly called'''. In Linux, data that is poorly placed in the cache can trigger a "cache-coherency operation". This operation is expensive compared to what would happen if the data were in the 'right place'. Once the data that repeatedly causes the problem is identified, it can be moved to limit the problem. This bottleneck can appear anywhere an expensive operation is called more often than necessary (the problem is not inherent, but is the result of bad design).

Another type of bottleneck comes from '''starvation.''' An example is the xtime_lock in Linux. Readers holding the lock prevented the timer value from being written, so the kernel wasted CPU time retrying. This problem was solved by using a lockless read. The same problem appears anywhere a thread must keep trying to execute but cannot, leading to wasted CPU cycles.

The next type of bottleneck comes from '''coarse-grained''' operations. Granularity refers to the execution time of a code segment: the closer a segment is to the speed of an atomic action, the finer its granularity. Both examples in this paragraph consume a lot of CPU time where a finer-grained implementation would consume less. One coarse-grained bottleneck was the dcache_lock. It took up some time in normal use, but it was also acquired in the much more frequently called dnotify_parent() function. This was deemed unacceptable, so the dcache_lock strategy was replaced with a finer-grained strategy from a later version of Linux. Another major coarse-grained bottleneck is the "Big Kernel Lock" (BKL), Linux's kernel-wide synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred approach on Linux NUMA systems was to limit the BKL's usage. The ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck.

Bottlenecks can also arise from '''multiple problems.''' One example is the multiqueue scheduler from Linux 2.4. Altogether, the multiqueue scheduler consumed 25% of the CPU time. It had two problems. First, its spinlock was coarse-grained and consumed the majority of that time. Second, the rest went into computing and recomputing information in the cache, a needlessly expensive operation. The scheduler also had O(n) time complexity, which meant that it had scalability issues and became inefficient beyond a certain number of processes. These problems were fixed by replacing the scheduler with a more efficient one with O(1) time complexity, meaning that the cost of a scheduling decision does not grow with the number of threads or processes.
--[[Rannath]] A few additions--[[Gautam]] --Cache-coherency is not the important part --[[Rannath]]
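
To illustrate why an O(1) scheduler scales better than an O(n) one, here is a small sketch of constant-time task selection. It is not the Linux implementation: <code>RunQueue</code>, <code>NUM_PRIORITIES</code>, and the convention that priority 0 is the highest are assumptions made for this example. The idea is to keep one queue per priority level plus a bitmap of non-empty levels, so that picking the next task never requires scanning all runnable tasks.

```python
from collections import deque

NUM_PRIORITIES = 8  # assumed number of priority levels; 0 is the highest


class RunQueue:
    """Hypothetical run queue with constant-time selection."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.bitmap = 0  # bit p is set while queues[p] is non-empty

    def enqueue(self, task, priority):
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        if self.bitmap == 0:
            return None  # nothing is runnable
        # Isolate the lowest set bit: the highest-priority non-empty level.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[priority]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)
        return task


rq = RunQueue()
rq.enqueue("low", 5)
rq.enqueue("high", 1)
rq.enqueue("high2", 1)
order = [rq.pick_next() for _ in range(4)]
```

The work done by <code>pick_next()</code> depends only on the fixed number of priority levels, not on how many tasks are queued.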

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

The introduction of Windows NT and OS/2 brought an innovation: cheap threads alongside expensive processes. UMS, which reflects this design, is a recommended mechanism for high-performance applications that handle many threads on multicore systems. A scheduler has to be implemented to manage the UMS threads and decide when they should run or stop. This approach is not desirable for systems with moderate performance requirements, because concurrent execution of this sort allows non-intuitive behaviors such as race conditions, which demand careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the desired utility. For instance, a UMS scheduler can be assigned to each logical processor, creating affinity so that related threads run around one scheduler. This can turn out to be inefficient if many related threads end up starving other processes.

Fibers embrace essentially the same abstraction as coroutines. The distinction is that fibers exist at the system level while coroutines exist at the language level. Unlike UMS, fibers do not take advantage of multiprocessor machines; however, they require less operating system support. The Symbian operating system provides an example of fiber usage in its Active Scheduler. An active scheduler object contains a single fiber that is scheduled when an asynchronous call returns, and it blocks lower-priority fibers until all higher-priority ones have finished.
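
Cooperative switching of this kind can be sketched with language-level coroutines. The example uses Python generators as stand-ins for fibers; <code>fiber</code> and <code>run_round_robin</code> are hypothetical names, and real fibers are provided by the operating system rather than by the language. Each task runs until it voluntarily yields, and the switch happens entirely in user code with no system call.

```python
from collections import deque


def fiber(name, steps, log):
    """A cooperative task: do one unit of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntary switch point


def run_round_robin(fibers):
    """Resume each task in turn until all of them have finished."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume until the next yield
            ready.append(current)  # still has work: back of the queue
        except StopIteration:
            pass                   # finished: drop it


log = []
run_round_robin([fiber("a", 2, log), fiber("b", 3, log)])
```

After the run, <code>log</code> shows the two tasks interleaved even though only one of them executes at any moment.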

Thread Pools consist of queues of threads that stay alive and wait for new tasks to be assigned to them. If there are no new tasks to complete, they sleep or wait. This pattern eliminates the overhead of creating and destroying threads, which results in better system stability and improved performance. The long-lived threads can, for instance, handle multiple transaction requests arriving over socket connections from other machines within a short time frame, while avoiding the millions of cycles needed to destroy and re-create a thread. Thread pools often operate on server farms, and therefore thread safety has to be carefully implemented.
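
A minimal sketch of the thread pool pattern, using Python's standard <code>concurrent.futures.ThreadPoolExecutor</code>. The function <code>handle_request</code> and the pool size of 4 are assumptions chosen for illustration; the function stands in for real work such as serving a socket request.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(n):
    """Hypothetical stand-in for handling one request."""
    return n * n, threading.current_thread().name


# Four long-lived workers serve 100 tasks; no thread is created per task.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, range(100)))

squares = [value for value, _ in results]
workers_used = {name for _, name in results}
```

All 100 tasks are served by at most four worker threads, so the cost of creating and destroying a thread is paid a handful of times rather than once per task.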

== Design Choices ==
--[[User:Gautam|Gautam]] 00:29, 14 October 2010 (UTC) 
'''(A) Kernel Threads and User Threads (1:1 vs M:N) '''
This is the most basic design choice: how user threads map onto kernel threads (lightweight processes). The 1:1 model offers a slim, clean library interface on top of the kernel functions. Management and scheduling are done through thread management. Although the M:N model would require a complicated library, it would offer advantages in areas such as signal handling. The general consensus was that the M:N design was not compatible with the Linux kernel because of its high implementation cost. This gave rise to the 1:1 model. Thread-aware operating systems include Windows 2000, Windows XP, Windows Vista, and any recent operating system.

''' (B)Signal Handling '''
The kernel implements POSIX signal handling for use with the multitude of signal masks. Since a signal is only delivered to a thread that has not blocked it, no unnecessary interruptions through signals occur. The kernel is also in a much better position to judge which thread is best suited to receive the signal. This only holds true if the 1:1 model is used.

''' (C)Synchronization '''
Implementing synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option since threads can have different priorities (besides wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability, caused by spurious wakeups and degradation of the quality of signal handling in the application. Fortunately, new functionality was added to the kernel to implement all kinds of synchronization.

Explaining the four types of synchronization:

*A mutex lock can be held by only one thread at a time, so only that thread can execute the protected part of the code
*With read/write synchronization, threads can gain shared read access or exclusive write access to a protected resource, but editing the content requires the exclusive write lock. The exclusive write lock is only granted once all read locks have been released
*Condition variable synchronization blocks the thread until the condition becomes true
*A counting semaphore grants access to multiple threads. Its count tracks the number of threads that can have concurrent access to the data. Once the limit is reached, other threads are blocked until a thread releases the semaphore.
[[vG]]
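
As a small illustration of the counting semaphore described in the list above, the following sketch uses Python's standard <code>threading.Semaphore</code>. The limit of 3, the ten workers, and the names <code>slots</code>, <code>guard</code>, <code>inside</code>, and <code>peak</code> are all assumptions made for the example. A separate mutex protects the bookkeeping counters.

```python
import threading
import time

LIMIT = 3                           # assumed number of concurrent holders
slots = threading.Semaphore(LIMIT)  # counting semaphore
guard = threading.Lock()            # mutex protecting the counters below
inside = 0                          # threads currently holding a slot
peak = 0                            # highest value of `inside` observed


def worker():
    global inside, peak
    with slots:                     # blocks while LIMIT threads are inside
        with guard:
            inside += 1
            peak = max(peak, inside)
        time.sleep(0.01)            # simulate work on the shared data
        with guard:
            inside -= 1


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However the ten threads are scheduled, <code>peak</code> never exceeds <code>LIMIT</code>.
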
''' (D)Memory Management '''
Thread memory management, from creation through maintenance to deallocation, is an important design choice when attempting to create a large number of threads in a single process. A thread's data structure is made up of a program counter, a stack, and a control block. The control block is needed for thread management, as it contains the thread's state data. Optimizing this data structure can greatly increase performance when there are large numbers of threads.

A thread can be created before the process actually needs it to run; it then waits until an idle processor becomes available. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) can become a problem with this approach, since it frontloads the cost onto the process. Another problem is that the thread must allocate the memory for its stack at creation, because allocating stack memory dynamically is expensive. One way to optimize creation for large numbers of threads is to copy the thread's arguments into its control block. This allows the thread's stack to be allocated at startup (when the thread first runs) rather than when the thread is created. When the thread enters startup, it copies its arguments out of its control block and allocates its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads). If thread memory management is done serially, these two factors combine to set a maximum rate of thread creation.

Locks are an important part of thread performance, and there are multiple ways of organizing them when creating a large number of threads. A single lock (one lock protecting all the data structures) has the advantage that once a processor has acquired the lock, it can modify any of the stored data. Using a single lock means only one lock is needed per thread, which decreases thread overhead but also limits the throughput of the system. Multiple locks (each data structure has its own lock) have the advantage that each operation on a data structure performs its own locking and unlocking. Multiple locks carry greater thread overhead (because there are more locks), but thread throughput is much higher, allowing threads to be created quickly. Another downside of multiple-lock systems is deadlock, which happens when two threads each wait for data that the other holds. Single-lock and multiple-lock designs have opposite trade-offs, and choosing between them depending on the situation can greatly increase the performance of a system.
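
The deadlock risk of per-structure locks is commonly handled by always acquiring locks in one global order. The sketch below shows that technique; it is a general practice and not something taken from the cited paper, and <code>Account</code>, <code>transfer</code>, and <code>shuffle</code> are hypothetical names. Two threads move money in opposite directions, which would risk deadlock if each simply locked its source first.

```python
import threading


class Account:
    """A data structure with its own lock (the multiple-lock design)."""
    _next_id = 0

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()
        self.order = Account._next_id  # fixed position in the lock order
        Account._next_id += 1


def transfer(src, dst, amount):
    # Always take the lower-ordered lock first, whatever the direction,
    # so two opposite transfers can never wait on each other forever.
    first, second = sorted((src, dst), key=lambda acct: acct.order)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def shuffle(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


a, b = Account(1000), Account(1000)
t1 = threading.Thread(target=shuffle, args=(a, b))
t2 = threading.Thread(target=shuffle, args=(b, a))
t1.start()
t2.start()
t1.join()
t2.join()
```

Both threads finish and the balances return to their starting values, since each transfer in one direction is matched by one in the other.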

Thread deallocation can also be optimized to increase scalability. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into list operations. Without a free list, thread overhead would include searching for a free region of memory of the correct size to hold the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]

''' (E)Scheduling Priorities '''
A thread is an entity that can be scheduled according to its scheduling priority. In Windows, this priority is a number ranging from 0 to 31, while in Linux the CFS (Completely Fair Scheduler) orders runnable tasks in a red-black tree. Threads are executed in time slices assigned to them in round-robin fashion, and lower-priority threads wait until higher-priority ones finish their work. A thread's context consists of a set of machine registers, the kernel stack, and the user stack, all tied to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and a thread of equal (or higher) priority becomes available; an efficient implementation of the switch is essential for high scalability. For example, fibers, which execute entirely in user space, do not require a system call during a switch, which greatly increases efficiency.[http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx][http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/], Microsoft (23 September 2010) --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)

== References ==
Linux Symposium, pg83 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83] 
PicoThreads: Lightweight Threads in Java [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf]

COMP 3000 Essay 1 2010 Question 7

2010-10-14T23:41:16Z

Vviveka2:

=Question=

How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=

== The Background ==

A 'process' is defined to be "an address-space and a group of resources dedicated to running the program". A 'thread', on the other hand, is an independent sequential unit of computation that executes within the context of a kernel-supported entity such as a process. Threads are often classified by their “weight” (or overhead), which corresponds to the amount of context that must be saved when a thread is removed from the processor and restored when it is reinstated, that is, during a context switch. The context for a process usually includes the hardware registers, kernel stack, user-level stack, interrupt vectors, page tables, and more. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. There are two major types of threads: kernel and user-mode. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are lightweight and are mapped onto kernel threads. The ratio of user threads to kernel threads is an important factor when designing scalable systems.

There are a few designs, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than Thread Pools, which are yet another mechanism that allows for high scalability. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency arises when multiple threads wait on the run queues but the system cannot run them all at the same time. Switching between them quickly gives the impression that they are executing simultaneously. [[vG]] && [[Paul]] && [[Shane]] && [[Gautam]]

== Scalable Threads: The Problems ==

One of the basic challenges is to create code that is both stable and scalable. Furthermore, the challenge in making an existing code base scalable lies in identifying and eliminating the bottlenecks that appear once it is scaled. Ray Bryant and John Hawkes found the following bottlenecks when porting Linux to a 64-core NUMA system. Each of these bottlenecks is an example of a type of bottleneck that can appear in any program.[http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83]

One type of bottleneck appears when expensive operations are '''needlessly called'''. In Linux, data that is poorly placed in the cache can trigger a "cache-coherency operation". This operation is expensive compared to what would happen if the data were in the 'right place'. Once the data that repeatedly causes the problem is identified, it can be moved to limit the problem. This bottleneck can appear anywhere an expensive operation is called more often than necessary (the problem is not inherent, but is the result of bad design).

Another type of bottleneck comes from '''starvation.''' An example is the xtime_lock in Linux. Readers holding the lock prevented the timer value from being written, so the kernel wasted CPU time retrying. This problem was solved by using a lockless read. The same problem appears anywhere a thread must keep trying to execute but cannot, leading to wasted CPU cycles.

The next type of bottleneck comes from '''coarse-grained''' operations. Granularity refers to the execution time of a code segment: the closer a segment is to the speed of an atomic action, the finer its granularity. Both examples in this paragraph consume a lot of CPU time where a finer-grained implementation would consume less. One coarse-grained bottleneck was the dcache_lock. It took up some time in normal use, but it was also acquired in the much more frequently called dnotify_parent() function. This was deemed unacceptable, so the dcache_lock strategy was replaced with a finer-grained strategy from a later version of Linux. Another major coarse-grained bottleneck is the "Big Kernel Lock" (BKL), Linux's kernel-wide synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred approach on Linux NUMA systems was to limit the BKL's usage. The ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck.

Bottlenecks can also arise from '''multiple problems.''' One example is the multiqueue scheduler from Linux 2.4. Altogether, the multiqueue scheduler consumed 25% of the CPU time. It had two problems. First, its spinlock was coarse-grained and consumed the majority of that time. Second, the rest went into computing and recomputing information in the cache, a needlessly expensive operation. The scheduler also had O(n) time complexity, which meant that it had scalability issues and became inefficient beyond a certain number of processes. These problems were fixed by replacing the scheduler with a more efficient one with O(1) time complexity, meaning that the cost of a scheduling decision does not grow with the number of threads or processes.
--[[Rannath]] A few additions--[[Gautam]] --Cache-coherency is not the important part --[[Rannath]]

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

The introduction of Windows NT and OS/2 brought an innovation: cheap threads alongside expensive processes. UMS, which reflects this design, is a recommended mechanism for high-performance applications that handle many threads on multicore systems. A scheduler has to be implemented to manage the UMS threads and decide when they should run or stop. This approach is not desirable for systems with moderate performance requirements, because concurrent execution of this sort allows non-intuitive behaviors such as race conditions, which demand careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the desired utility. For instance, a UMS scheduler can be assigned to each logical processor, creating affinity so that related threads run around one scheduler. This can turn out to be inefficient if many related threads end up starving other processes.

Fibers embrace essentially the same abstraction as coroutines. The distinction is that fibers exist at the system level while coroutines exist at the language level. Unlike UMS, fibers do not take advantage of multiprocessor machines; however, they require less operating system support. The Symbian operating system provides an example of fiber usage in its Active Scheduler. An active scheduler object contains a single fiber that is scheduled when an asynchronous call returns, and it blocks lower-priority fibers until all higher-priority ones have finished.

Thread Pools consist of queues of threads that stay alive and wait for new tasks to be assigned to them. If there are no new tasks to complete, they sleep or wait. This pattern eliminates the overhead of creating and destroying threads, which results in better system stability and improved performance. The long-lived threads can, for instance, handle multiple transaction requests arriving over socket connections from other machines within a short time frame, while avoiding the millions of cycles needed to destroy and re-create a thread. Thread pools often operate on server farms, and therefore thread safety has to be carefully implemented.

== Design Choices ==
--[[User:Gautam|Gautam]] 00:29, 14 October 2010 (UTC) 
'''(A) Kernel Threads and User Threads (1:1 vs M:N) '''
This is the most basic design choice: how user threads map onto kernel threads (lightweight processes). The 1:1 model offers a slim, clean library interface on top of the kernel functions. Although the M:N model would require a complicated library, it would offer advantages in areas such as signal handling. The general consensus was that the M:N design was not compatible with the Linux kernel because of its high implementation cost. This gave rise to the 1:1 model. Thread-aware operating systems include Windows 2000, Windows XP, Windows Vista, and any recent operating system.

''' (B)Signal Handling '''
The kernel implements POSIX signal handling for use with the multitude of signal masks. Since a signal is only delivered to a thread that has not blocked it, no unnecessary interruptions through signals occur. The kernel is also in a much better position to judge which thread is best suited to receive the signal. This only holds true if the 1:1 model is used.

''' (C)Synchronization '''
Implementing synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option since threads can have different priorities (besides wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability, caused by spurious wakeups and degradation of the quality of signal handling in the application. Fortunately, new functionality was added to the kernel to implement all kinds of synchronization.

Explaining the four types of synchronization:

*A mutex lock can be held by only one thread at a time, so only that thread can execute the protected part of the code
*With read/write synchronization, threads can gain shared read access or exclusive write access to a protected resource, but editing the content requires the exclusive write lock. The exclusive write lock is only granted once all read locks have been released
*Condition variable synchronization blocks the thread until the condition becomes true
*A counting semaphore grants access to multiple threads. Its count tracks the number of threads that can have concurrent access to the data. Once the limit is reached, other threads are blocked until a thread releases the semaphore.
[[vG]]
''' (D)Memory Management '''
Thread memory management, from creation through maintenance to deallocation, is an important design choice when attempting to create a large number of threads in a single process. A thread's data structure is made up of a program counter, a stack, and a control block. The control block is needed for thread management, as it contains the thread's state data. Optimizing this data structure can greatly increase performance when there are large numbers of threads.

A thread can be created before the process actually needs it to run; it then waits until an idle processor becomes available. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) can become a problem with this approach, since it frontloads the cost onto the process. Another problem is that the thread must allocate the memory for its stack at creation, because allocating stack memory dynamically is expensive. One way to optimize creation for large numbers of threads is to copy the thread's arguments into its control block. This allows the thread's stack to be allocated at startup (when the thread first runs) rather than when the thread is created. When the thread enters startup, it copies its arguments out of its control block and allocates its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads). If thread memory management is done serially, these two factors combine to set a maximum rate of thread creation.

Locks are an important part of thread performance, and there are multiple ways of organizing them when creating a large number of threads. A single lock (one lock protecting all the data structures) has the advantage that once a processor has acquired the lock, it can modify any of the stored data. Using a single lock means only one lock is needed per thread, which decreases thread overhead but also limits the throughput of the system. Multiple locks (each data structure has its own lock) have the advantage that each operation on a data structure performs its own locking and unlocking. Multiple locks carry greater thread overhead (because there are more locks), but thread throughput is much higher, allowing threads to be created quickly. Another downside of multiple-lock systems is deadlock, which happens when two threads each wait for data that the other holds. Single-lock and multiple-lock designs have opposite trade-offs, and choosing between them depending on the situation can greatly increase the performance of a system.

Thread deallocation can also be optimized to increase scalability. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into list operations. Without a free list, thread overhead would include searching for a free region of memory of the correct size to hold the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]

''' (E)Scheduling Priorities '''
A thread is an entity that can be scheduled according to its scheduling priority. In Windows, this priority is a number ranging from 0 to 31, while in Linux the CFS (Completely Fair Scheduler) orders runnable tasks in a red-black tree. Threads are executed in time slices assigned to them in round-robin fashion, and lower-priority threads wait until higher-priority ones finish their work. A thread's context consists of a set of machine registers, the kernel stack, and the user stack, all tied to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and a thread of equal (or higher) priority becomes available; an efficient implementation of the switch is essential for high scalability. For example, fibers, which execute entirely in user space, do not require a system call during a switch, which greatly increases efficiency.[http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx][http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/], Microsoft (23 September 2010) --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)

== References ==
Linux Symposium, pg83 [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83] 
PicoThreads: Lightweight Threads in Java [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf]

COMP 3000 Essay 1 2010 Question 7

2010-10-14T15:41:09Z

Vviveka2:

=Question=

How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=

A process is an instance of a running program that has its own resources, such as an address space, files, and I/O devices. A thread, on the other hand, is an independent task that executes in the same address space as the other threads of its process and shares data with them in a synchronized manner. Threads can execute the same code or different code within the same application because each thread has its own state, run-time stack, and execution context. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel threads and user-mode threads. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthread. A few designs build on user-mode threads, mainly fibers and UMS (User-Mode Scheduling), which allow very high scalability. UMS threads have their own context and resources; however, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows high scalability. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on run queues when the system cannot run them all at the same time; the speed at which it switches between them gives the impression that they are executing simultaneously. [[vG]] && [[Paul]]

'''Taken the liberty of adding Praubic's tentative first paragraph.''' and '''I have added my version to Paul's and modified it [[vG]]'''

----

== Scalable Threads: The Problems ==

One of the challenges in making an existing code base scalable is identifying and eliminating the bottlenecks that appear once it is scaled up. When porting Linux to a 64-core NUMA system, Ray Bryant and John Hawkes described the following bottlenecks in their paper. Each of these bottlenecks is an example of a type of bottleneck that can appear in any program.

One type of bottleneck appears when expensive operations are '''needlessly called'''. In Linux, poorly placed data can cause a "cache-coherency operation" to be triggered. This operation is expensive compared with what would happen if the data were in the 'right place'. Once data that repeatedly causes this problem has been identified, it can be moved to limit the problem. This bottleneck can appear anywhere an expensive operation is called more often than necessary (the problem is not inherent but is the result of bad design).

Another type of bottleneck comes from '''starvation'''. One such bottleneck was the xtime_lock in Linux. Readers holding the lock prevented the writer from updating the timer value, causing the kernel to waste CPU time retrying. This problem was solved by using a lockless read. The same problem can appear anywhere a thread must keep trying to execute but cannot, leading to wasted CPU cycles.
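
The text only says that the xtime_lock problem was solved with a lockless read. One common way to build such a read is a sequence counter: the writer bumps the counter before and after each update, and a reader retries if it sees an odd counter or a counter that changed while it was reading. The sketch below assumes a single writer and uses hypothetical names (<code>SeqClock</code>, <code>write</code>, <code>read</code>); it illustrates the retry logic only, does not model memory barriers, and may differ from the mechanism actually used in the kernel.

```python
class SeqClock:
    """Timer value protected by a sequence counter instead of a reader lock."""

    def __init__(self):
        self._seq = 0       # even: stable, odd: update in progress
        self._seconds = 0
        self._nanos = 0

    def write(self, seconds, nanos):
        # Single writer assumed: it never waits for readers.
        self._seq += 1      # becomes odd, readers will retry
        self._seconds = seconds
        self._nanos = nanos
        self._seq += 1      # becomes even again, value is consistent

    def read(self):
        # Readers take no lock, so they can never starve the writer.
        while True:
            start = self._seq
            if start % 2 == 0:
                value = (self._seconds, self._nanos)
                if self._seq == start:
                    return value  # no write overlapped this read
```

The design choice is that readers pay the cost of an occasional retry, while the writer, which must not be delayed, always makes progress.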

The next type of bottleneck comes from '''coarse-grained''' operations. Granularity refers to the execution time of a code segment: the closer a segment is to the speed of an atomic action, the finer its granularity. Both examples here consumed a lot of CPU time where a finer-grained implementation would consume less. One coarse-grained bottleneck was the dcache_lock. It took up some time in normal use, but it was also acquired in the much more frequently called dnotify_parent() function, which was an unacceptable state of affairs. The dcache_lock strategy was therefore replaced with a finer-grained strategy from a later version of Linux. Another big coarse-grained bottleneck was the "Big Kernel Lock" (BKL), Linux's kernel synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred approach on Linux NUMA systems was to limit the BKL's usage: the ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck. Both of these examples are the result of coarse granularity.

Bottlenecks can also result from '''multiple problems'''. One example is the multiqueue scheduler from Linux 2.4, which consumed 25% of the CPU time altogether. It had two problems: most of that time went to a coarse-grained spinlock, while the rest went into computing and recomputing information in the cache, a needlessly expensive operation. These problems were fixed by replacing the scheduler (that scheduler was itself later replaced by the more efficient O(1) scheduler).

--[[Rannath]]

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

The introduction of Windows NT and OS/2 brought about a design in which threads are cheap while processes are expensive. UMS, which reflects this design, is a recommended mechanism for high-performance applications that handle many threads on multicore systems. A scheduler has to be implemented to manage the UMS threads and decide when they should run or stop. This is not desirable for applications with moderate performance requirements, because concurrent execution of this sort allows non-intuitive outcomes such as race conditions, which demand careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the desired utility. For instance, a UMS scheduler can be assigned to each logical processor, creating affinity so that related threads run around one scheduler. This could turn out to be inefficient if there are many related threads, which could end up starving other processes.

Fibers embody essentially the same abstraction as coroutines. The distinction is that fibers are a system-level construct while coroutines are a language-level construct. Unlike UMS threads, fibers do not take advantage of multiprocessor machines; however, they require less operating system support. The Symbian operating system provides an example of fiber usage in its Active Scheduler. An active scheduler object contains a single fiber that is scheduled when an asynchronous call returns, and it blocks lower-priority fibers until all higher-priority ones have finished.
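
Since fibers and coroutines share the same abstraction, a language-level sketch can illustrate how cooperative switching works without any kernel involvement. The example below uses Python generators as stand-ins for fibers; the names <code>fiber</code>, <code>run_round_robin</code> and <code>demo</code> are hypothetical, and this is not how Windows fibers or Symbian active objects are implemented.

```python
from collections import deque


def fiber(name, steps, log):
    """A generator used as a fiber: each `yield` hands control back voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point, no system call involved


def run_round_robin(fibers):
    """Run all fibers on a single OS thread until every one has finished."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the fiber until its next yield
            ready.append(current)  # still has work: back of the queue
        except StopIteration:
            pass                   # fiber finished; drop it


def demo():
    log = []
    run_round_robin([fiber("a", 2, log), fiber("b", 3, log)])
    return log  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Because each fiber here is only a small heap object, a single OS thread can hold a very large number of them, which is the property that makes user-level scheduling attractive for massive thread counts. As the text notes, the cost is that one scheduler loop like this cannot use more than one processor.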

Thread pools consist of queues of threads that stay alive and wait for new tasks to be assigned to them. If there are no new tasks to be completed, they sleep or wait. This pattern eliminates the overhead of creating and destroying threads, which results in better system stability and improved performance. The long-lived threads can, for instance, handle multiple transaction requests arriving over socket connections from other machines within a short time frame while avoiding the millions of cycles needed to destroy and re-create a thread. Thread pools often operate on server farms, and therefore thread safety has to be carefully implemented.
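
A minimal sketch of the thread pool pattern, using Python's standard <code>concurrent.futures.ThreadPoolExecutor</code>. The functions <code>handle_request</code> and <code>serve</code> are hypothetical placeholders for real request handling; the point is only that a small, fixed set of worker threads is reused for many tasks.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def handle_request(request_id):
    """Stand-in for real work; returns the request id and the worker's name."""
    return request_id, threading.current_thread().name


def serve(num_requests, num_workers=4):
    # The pool creates at most num_workers threads and reuses them for
    # every submitted task instead of spawning one thread per request.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(handle_request, range(num_requests)))
```

Calling <code>serve(100)</code> processes one hundred requests while never creating more than four worker threads, so the creation and destruction cost is paid a handful of times rather than once per request.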

== Design Choices ==
'''(A) Kernel Threads and User Threads (1:1 vs M:N) ''' --[[User:Gautam|Gautam]] 00:29, 14 October 2010 (UTC) 
This is the most basic design choice. The 1:1 model boasts a slim, clean library interface on top of the kernel functions. Although the M:N model would require a complicated library, it would offer advantages in areas such as signal handling. The general consensus was that the M:N design was not a good fit for the Linux kernel because of its high implementation cost. This gave birth to the 1:1 model.

'''(B) Signal Handling'''
The kernel implements POSIX signal handling, including the multitude of signal masks. Since a signal is only delivered to a thread that has it unblocked, no unnecessary interruptions through signals occur. The kernel is also in a much better position to judge which thread is best suited to receive the signal. This only holds true if the 1:1 model is used.

'''(C) Synchronization'''
The implementation of synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option, since threads can have different priorities (besides, it wastes CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability, caused by spurious wakeups and degradation of the quality of signal handling in the application. Fortunately, new functionality (futexes) was added to the kernel to implement all kinds of synchronization.

The four types of synchronization are:

*Mutex locks allow only one thread at a time to execute a given section of code.
*Read/write synchronization grants either shared read access or exclusive write access to a protected resource; to modify the content, a thread must hold the exclusive write lock. The exclusive write lock is granted only when all read locks have been released.
*Condition variable synchronization blocks a thread until a condition becomes true.
*Counting semaphores grant access to multiple threads. A semaphore has a count that tracks the number of threads that can have concurrent access to the data. Once the limit is reached, other threads are blocked until a thread releases the semaphore.
[[vG]]
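
Three of the primitives listed above can be seen working together in a short Python sketch (Python's standard library has no read/write lock, so that one is left out). The function name <code>run_with_limit</code> and the counters are hypothetical; the example only shows the roles of the primitives and says nothing about how a threads library implements them.

```python
import threading


def run_with_limit(num_threads, limit):
    """Start num_threads workers but let at most `limit` be inside at once."""
    slots = threading.Semaphore(limit)      # counting semaphore
    mutex = threading.Lock()                # mutex guarding the counters
    all_done = threading.Condition(mutex)   # condition variable tied to the mutex
    state = {"inside": 0, "peak": 0, "done": 0}

    def worker():
        with slots:                         # blocks while `limit` threads are inside
            with mutex:
                state["inside"] += 1
                state["peak"] = max(state["peak"], state["inside"])
            with mutex:
                state["inside"] -= 1
                state["done"] += 1
                all_done.notify_all()       # wake anyone waiting on the condition

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    with all_done:
        # wait_for re-checks the predicate after every wakeup
        all_done.wait_for(lambda: state["done"] == num_threads)
    for t in threads:
        t.join()
    return state["peak"]
```

The value returned by <code>run_with_limit(20, 3)</code> is the highest number of workers that were inside the guarded region at the same time, which the semaphore keeps at 3 or below.
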

'''(D) Memory Management'''
Thread memory management is an important design choice when attempting to create a large number of threads in a single process, from creation through maintenance to deallocation. A thread's data structure is made up of a program counter, a stack, and a control block. The control block is needed for thread management, as it contains the thread's state data. Optimizing this data structure can greatly increase performance when there are large numbers of threads.

A thread can be created before the process actually requires it to run, and then wait until an idle processor becomes available to run it. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) is a problem that can arise with this approach, since it front-loads the work. Another problem is that the thread must allocate the memory required for its stack at creation, because dynamically allocating stack memory is expensive. One way to optimize creation for large numbers of threads is to copy the thread's arguments into its control block; this allows the thread's stack to be allocated at startup (when the thread starts being used) rather than when the thread is created. When the thread enters startup, it can copy its arguments out of its control block and allocate its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads that are in contention), and if thread memory management is done serially, these two factors combine to set a maximum rate of thread creation.
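
The deferred stack allocation described above can be sketched as follows. The class name <code>ControlBlock</code>, its fields, and the 64 KiB stack size are illustrative assumptions; a <code>bytearray</code> stands in for real stack memory.

```python
class ControlBlock:
    """Per-thread record; the stack is left unallocated until startup."""

    STACK_SIZE = 64 * 1024  # illustrative size, not taken from the source

    def __init__(self, func, args):
        self.func = func
        self.args = tuple(args)  # arguments copied into the control block
        self.stack = None        # no stack memory committed at creation
        self.state = "created"

    def startup(self):
        # Stack allocation is deferred to the moment the thread first runs.
        self.stack = bytearray(self.STACK_SIZE)
        self.state = "running"
        result = self.func(*self.args)  # arguments copied back out
        self.state = "finished"
        return result
```

Creating a thousand <code>ControlBlock</code> objects costs only a thousand small records; stack memory is committed one thread at a time as each thread actually starts.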

The deallocation of a thread can also be optimized to improve thread scalability. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into simple list operations; if they are not stored in a free list, the thread overhead includes finding a free block of memory of the correct size for the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]
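
A free list of this kind can be sketched in a few lines. The class name <code>StackPool</code> and the use of <code>bytearray</code> as stack memory are assumptions made for illustration; a real implementation would also need a lock (or a per-processor list) to make the pool safe for concurrent use.

```python
class StackPool:
    """Free list of fixed-size stacks: allocate and release are list operations."""

    def __init__(self, stack_size=64 * 1024):
        self.stack_size = stack_size
        self._free = []              # released stacks awaiting reuse
        self.fresh_allocations = 0   # how often new memory was really needed

    def allocate(self):
        if self._free:
            return self._free.pop()  # O(1) reuse, no search for free memory
        self.fresh_allocations += 1
        return bytearray(self.stack_size)

    def release(self, stack):
        self._free.append(stack)     # O(1) return to the free list
```

After a stack has been released once, the next <code>allocate</code> call returns the same object without touching the general-purpose allocator.
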

'''(E) Scheduling Priorities'''
A thread is a schedulable entity. On Windows, each thread has a scheduling priority, a number ranging from 0 to 31; on Linux, the CFS (Completely Fair Scheduler) keeps runnable tasks in a red-black tree. Under a priority scheduler, threads of equal priority are executed in time slices assigned to them in round-robin fashion, and lower-priority threads wait until the ones above them finish performing their tasks. A thread is composed of a thread context, which consists of a set of machine registers and the kernel and user stacks, all linked to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and another thread of equal priority is ready, or when a higher-priority thread becomes available; efficient context switching is essential for high scalability. For example, fibers, which are scheduled entirely in user space, do not require a system call during a switch, which greatly increases efficiency. --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)
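
The priority-plus-round-robin policy described above can be modelled with one queue per priority level. This is a simplified sketch under stated assumptions: the class <code>PriorityScheduler</code> is hypothetical, every task is always ready to run, and the real Windows and Linux schedulers are considerably more elaborate.

```python
from collections import deque

NUM_LEVELS = 32  # priorities 0..31, higher number means higher priority


class PriorityScheduler:
    def __init__(self):
        self.queues = [deque() for _ in range(NUM_LEVELS)]

    def add(self, name, priority, slices_needed):
        self.queues[priority].append([name, slices_needed])

    def run(self):
        """Return the order in which time slices are handed out."""
        order = []
        while True:
            # The highest non-empty priority level always wins.
            level = next((q for q in reversed(self.queues) if q), None)
            if level is None:
                return order
            task = level.popleft()
            order.append(task[0])    # the task runs for one time slice
            task[1] -= 1
            if task[1] > 0:
                level.append(task)   # round robin within the same level
```

For example, with tasks <code>hi-a</code> (priority 8, two slices), <code>hi-b</code> (priority 8, one slice) and <code>low</code> (priority 1, two slices), the two priority-8 tasks alternate until both are finished, and only then does <code>low</code> receive its slices.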

== References ==

COMP 3000 Essay 1 2010 Question 7

2010-10-14T15:17:51Z

Vviveka2:


COMP 3000 Essay 1 2010 Question 7

2010-10-14T15:13:00Z

Vviveka2:

=Question=

How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=

A process is an instance of a running program that has its own resources, such as an address space, files, and I/O devices. A thread, on the other hand, is an independent task that executes in the same address space as the other threads of its process and shares data with them in a synchronized manner. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel threads and user-mode threads. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthread. A few designs build on user-mode threads, mainly fibers and UMS (User-Mode Scheduling), which allow very high scalability. UMS threads have their own context and resources; however, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows high scalability. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on run queues when the system cannot run them all at the same time; the speed at which it switches between them gives the impression that they are executing simultaneously. [[vG]] && [[Paul]]

'''Taken the liberty of adding Praubic's tentative first paragraph. No changes made as of yet.''' and '''I have added my version to Paul's and modified it [[vG]]'''

----

== Scalable Threads: The Problems ==

One of the challenges in making an existing code base scalable is identifying and eliminating the bottlenecks that appear once it is scaled up. When porting Linux to a 64-core NUMA system, Ray Bryant and John Hawkes described the following bottlenecks in their paper. Each of these bottlenecks is an example of a type of bottleneck that can appear in any program.

One type of bottleneck appears when expensive operations are '''needlessly called'''. In Linux, poorly placed data can cause a "cache-coherency operation" to be triggered. This operation is expensive compared with what would happen if the data were in the 'right place'. Once data that repeatedly causes this problem has been identified, it can be moved to limit the problem. This bottleneck can appear anywhere an expensive operation is called more often than necessary (the problem is not inherent but is the result of bad design).

Another type of bottleneck comes from '''starvation'''. One such bottleneck was the xtime_lock in Linux. Readers holding the lock prevented the writer from updating the timer value, causing the kernel to waste CPU time retrying. This problem was solved by using a lockless read. The same problem can appear anywhere a thread must keep trying to execute but cannot, leading to wasted CPU cycles.

The next type of bottleneck comes from '''coarse-grained''' operations. Granularity refers to the execution time of a code segment: the closer a segment is to the speed of an atomic action, the finer its granularity. Both examples here consumed a lot of CPU time where a finer-grained implementation would consume less. One coarse-grained bottleneck was the dcache_lock. It took up some time in normal use, but it was also acquired in the much more frequently called dnotify_parent() function, which was an unacceptable state of affairs. The dcache_lock strategy was therefore replaced with a finer-grained strategy from a later version of Linux. Another big coarse-grained bottleneck was the "Big Kernel Lock" (BKL), Linux's kernel synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred approach on Linux NUMA systems was to limit the BKL's usage: the ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck. Both of these examples are the result of coarse granularity.

Bottlenecks can also result from '''multiple problems'''. One example is the multiqueue scheduler from Linux 2.4, which consumed 25% of the CPU time altogether. It had two problems: most of that time went to a coarse-grained spinlock, while the rest went into computing and recomputing information in the cache, a needlessly expensive operation. These problems were fixed by replacing the scheduler (that scheduler was itself later replaced by the more efficient O(1) scheduler).

--[[Rannath]]

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

Introduction of Windows NT and OS/2 brought about innovation that provides cheap threading while having expensive processing. UMS which reflects such design is a recommended mechanism for high performance requirements which handle many threads on multicore systems. A scheduler has to be implemented to manage the UMS threads and decide when they should be run or stopped. This implementation is not desirable for moderate performance systems because concurrent execution of this sort naturally allows for non-intuitive outcomes or behaviors such as race condition which requires careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the final desired utility. For instance, UMS scheduling can be assigned to each logical processor and thereby creating affinity for related threads to function around one scheduler. This could turn out inefficient depending whether there are many related threads that could end up starving other processes.

Fibers embrace essentially the same abstraction as coroutines. The distinction is that fibers are a system-level construct while coroutines are a language-level construct. Unlike UMS, fibers do not take advantage of multiprocessor machines; however, they require less operating system support. The Symbian operating system provides an example of fiber usage in its Active Scheduler. An active scheduler object contains a single fiber that is scheduled when an asynchronous call returns, and it blocks lower-priority fibers until all those above are finished.

Thread pools consist of queues of threads that stay alive and wait for new tasks to be assigned to them. If there are no new tasks to complete, they sleep or wait. This pattern eliminates the overhead of creating and destroying threads, which results in better system stability and improved performance. The long-lived threads can, for instance, handle multiple transaction requests arriving over socket connections from other machines within a short time frame, while avoiding the millions of cycles needed to tear down and re-create a thread. Thread pools often operate on server farms, and therefore thread safety has to be carefully implemented.

== Design Choices ==
'''(A) Kernel Threads and User Threads (1:1 vs M:N) ''' --[[User:Gautam|Gautam]] 00:29, 14 October 2010 (UTC) 
This is the most basic design choice. The 1:1 model boasts a slim, clean library interface on top of the kernel functions. Although the M:N model requires a more complicated library, it offers advantages in areas such as signal handling. The general consensus was that the M:N design was not a good fit for the Linux kernel because of its high implementation cost. This gave birth to the 1:1 model.

''' (B) Signal Handling '''
The kernel implements POSIX signal handling for use with the multitude of signal masks. Since a signal will only be sent to a thread that has it unblocked, no unnecessary interruptions through signals occur. The kernel is also in a much better position to judge which thread is best to receive the signal. This only holds true if the 1:1 model is used.

''' (C) Synchronization '''
Implementing synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option, since threads can have different priorities (besides, it wastes CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability, caused by spurious wakeups and degradation of the quality of signal handling in the application. Fortunately, new functionality (the futex) was added to the kernel to implement all kinds of synchronization.

Explaining the four types of synchronization:

*A mutex lock admits only one thread at a time, restricting access to a certain part of the code
*With read/write synchronization, threads can share read access to a protected resource, but a thread must hold the exclusive write lock to edit the content. The exclusive write lock is only granted when all the read locks are released
*Condition variable synchronization blocks the thread until the condition becomes true
*Counting semaphores give access to multiple threads. A semaphore has a count which keeps track of the number of threads that can have concurrent access to the data. Once the limit is reached, other threads are blocked until the count changes.
[[vG]]
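
The counting semaphore described in the list above can be illustrated with a short Python sketch. This is only an illustration built on Python's standard threading module, not the kernel-level primitive the essay discusses; the names LIMIT, slots, state_lock, and worker are made up for the example. A mutex (state_lock) protects the bookkeeping counters, while the semaphore caps how many threads may be inside the protected region at once.

```python
import threading
import time

LIMIT = 3
slots = threading.Semaphore(LIMIT)   # at most LIMIT threads inside at once
state_lock = threading.Lock()        # mutex protecting the two counters below
inside = 0
peak = 0

def worker():
    global inside, peak
    with slots:                      # blocks while LIMIT threads hold a slot
        with state_lock:
            inside += 1
            peak = max(peak, inside)
        time.sleep(0.01)             # simulated work on the shared data
        with state_lock:
            inside -= 1

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrency:", peak)
```

Ten threads are started, but the recorded peak never exceeds LIMIT, since the remaining threads block on the semaphore until a slot is released.
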
''' (D)Memory Management '''
Thread memory management is an important design choice when attempting to create a large number of threads in a single process, from creation through maintenance to deallocation. A thread's data structure is made up of a program counter, a stack and a control block. The control block is needed for thread management, as it contains the thread's state data. Optimizing this data structure can greatly increase performance when there are large numbers of threads.

A thread can be created before the process actually needs it to run, and then wait until an idle processor becomes available to run it. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) is a problem that can arise with this creation process, since it front-loads the work. Another problem is that the thread must allocate the memory required for its stack at creation, because it is expensive to allocate stack memory dynamically. A way to optimize creation for large numbers of threads is to copy the thread's arguments into its control block; this allows the thread's stack to be allocated at startup (when the thread starts being used) rather than when the thread is created. When the thread enters startup, it can copy its arguments out of its control block and allocate its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads that are in contention), and, if thread memory management is done serially, these two factors combine to set a maximum rate of thread creation.
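
The deferred stack allocation described above can be sketched roughly as follows. This is a simplified model in Python, not code from the cited paper: the class ThreadControlBlock, its fields, and the 64 KiB stack size are all assumptions chosen for illustration, and a bytearray stands in for real stack memory.

```python
class ThreadControlBlock:
    """Hypothetical control block: holds state and the saved start arguments."""

    def __init__(self, func, args, stack_size=64 * 1024):
        self.func = func
        self.args = args              # arguments copied in at creation
        self.stack_size = stack_size
        self.stack = None             # not allocated yet: creation stays cheap
        self.state = "created"

    def start(self):
        """Allocate the stack only now, then run with the saved arguments."""
        self.stack = bytearray(self.stack_size)
        self.state = "running"
        result = self.func(*self.args)
        self.state = "finished"
        return result


# Creating many control blocks costs little memory: no stacks exist yet.
blocks = [ThreadControlBlock(pow, (2, n)) for n in range(1000)]
allocated = sum(1 for b in blocks if b.stack is not None)
first = blocks[10].start()            # stack is allocated at startup
print(allocated, first)
```

The point of the design is that a thread which has been created but not yet started costs only a control block, so a process can hold far more pending threads than it could if each one reserved a stack up front.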

The deallocation of a thread can also be optimized to increase the scalability of threads. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into list operations; if they are not stored in a free list, the thread overhead includes finding a free region of memory of the correct size to store the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]
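
A free list of this kind might look like the sketch below. It assumes all stacks have the same size so that any released stack can satisfy any later request; the class StackFreeList and its counter are hypothetical names, and a bytearray again stands in for stack memory.

```python
class StackFreeList:
    """Hypothetical pool that recycles fixed-size thread stacks."""

    def __init__(self, stack_size=64 * 1024):
        self.stack_size = stack_size
        self._free = []               # deallocated stacks awaiting reuse
        self.fresh_allocations = 0

    def allocate(self):
        if self._free:
            return self._free.pop()   # O(1) list operation, no search
        self.fresh_allocations += 1
        return bytearray(self.stack_size)

    def release(self, stack):
        self._free.append(stack)      # O(1): keep it for the next thread


pool = StackFreeList()
first = pool.allocate()
pool.release(first)
second = pool.allocate()              # reuses the stack released above
print(second is first, pool.fresh_allocations)
```

Only the first request reaches the underlying allocator; the second is served from the list, which is the saving the paragraph above describes.
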
''' (E)Scheduling Priorities '''
A thread is an entity that is scheduled according to its scheduling priority. On Windows, the priority is a number ranging from 0 to 31; on Linux, the CFS (Completely Fair Scheduler) keeps runnable threads in a red-black tree. All threads are executed in a time slice assigned to them in round-robin fashion, and lower-priority threads wait until the ones above finish performing their tasks. A thread is composed of a thread context, which breaks down into a set of machine registers and the kernel and user stacks, all linked to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and a thread of equal (or higher) priority becomes available; implementing it efficiently is key to high scalability. For example, fibers, which execute entirely in user space, do not require a system call during a switch, which greatly increases efficiency. --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)
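
The priority and time-slice behaviour described above can be modelled with a toy simulation. This is a deliberate simplification and an assumption on our part: it does not model CFS, preemption by newly arriving threads, or real context switches, and the function schedule and its task tuples are invented for the example. It only shows round-robin among equal priorities, with lower priorities waiting.

```python
from collections import deque

def schedule(tasks, time_slice=2):
    """tasks: list of (name, priority, work_units); higher priority runs first.

    Returns the order in which time slices were handed out.
    """
    queues = {}
    for name, priority, work in tasks:
        queues.setdefault(priority, deque()).append([name, work])

    timeline = []
    for priority in sorted(queues, reverse=True):   # highest priority first
        ready = queues[priority]
        while ready:
            task = ready.popleft()
            timeline.append(task[0])                # task runs for one slice
            task[1] -= time_slice
            if task[1] > 0:
                ready.append(task)                  # slice elapsed: requeue
    return timeline


order = schedule([("low", 1, 2), ("hi-a", 8, 4), ("hi-b", 8, 2)])
print(order)  # ['hi-a', 'hi-b', 'hi-a', 'low']
```

The two priority-8 tasks alternate slices until both finish, and only then does the priority-1 task get the processor.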

== References ==

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-14T15:09:23Z

Vviveka2:

== Log ==
'''Suggestion:''' Let us maintain our edits here instead of littering the main page with our names. Also, please do not edit without writing to the log, so that we know who has done what and when.

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity. Everyone should post their interpretation of the question in the simplest possible English so we're on the same page (as someone, maybe me, seems to have the wrong idea about what we're trying to talk about)
More moving for clarity. added an essay outline at bottom (feel free to change)
filled in the outline somewhat added questions to the outline for everyone to think on.--[[User:Rannath|Rannath]]

First Draft for essay. Please modify and add on. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

Edited Scheduling Priorities and rewrote some areas to provide a better paragraph structure. --[[User:Spanke|Shane]] 15:25, 13 October 2010 (UTC)

Added to the memory management section. --[[User:Hirving|Hirving]] 21:42, 13 October 2010 (UTC)

Edited Scalable Threads Problems. Also did a little re-arrangement. --[[User:Gautam|Gautam]] 01:03, 14 October 2010 (UTC)

Answered Essay Questions in Discussion. --[[User:Spanke|Shane]] 01:25, 13 October 2010 (UTC)

 <Add your future activities here>

== The Question ==
'''Original: '''
How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about number and scalability of threads not the gross mechanics.

To be more clear: we can limit ourselves from the thread implementations to the thread scalability... ignore the stuff that's required for all threads, unless it's required for many threads. (I didn't find any implementations that required hardware)

I would also argue that since OSs have to run on multiple hardwares one cannot guarantee that unique/rare hardware bits will be there. While we can talk about hardware we should limit it to a mention at most. OR we could mention prospective hardware that could help out, but is not yet standard. It depends on whether we want to do "as it is" or "as it might be"

utility of such massively scalable thread implementations. I took this as: what functionality (of single strings) does one have to give up to make threads scalable.

'''Gautam: '''
I think the hardware is as relevant as the software. Not all things can be done in software and hardware support is an important factor in most of the solutions to many problems that OS face. My take.

'''Henry: '''
Since the question is about the system as a whole, I think the answer should include both software and hardware support for large numbers of threads. The question revolves around how a system can handle millions of threads and what the major factors are that allow the system to do it. Also, the last part of the question seems to ask what this number of threads allows a process to do.

'''Shane: '''
In response to the above's idea on the last part of the question, I would argue that it would enable fast execution because all threads that receive a cache miss would be picked up by the other threads so long as there was enough resources. Also the use of more threads would help synchronize the cache (through sharing) so that it would not miss. Of course this would be if they were assigned to the same task, you cannot sync threads running different applications it just wouldn't make sense. The only issue with this idea is the software must support this number.

'''vG: '''
We should talk about type of relationship models (1:1 N:M N:N and so on) also talk about the application vs hardware multi-threading within single processor.

'''Paul: '''
I discussed Main Point 2 and how UMS threading is stretched onto multiple cores. Design that involves multiple processors differs from single proc comps so hardware definitely plays significant role here.

== Group 7 ==

Let us start out by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek <support.tamiltreasure@gmail.com>

Shane Panke <shanepanke@msn.com>

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Multi-threading is a term used to describe:

* A facility provided by the operating system that enables an application to create threads of execution within a process
* Applications whose architecture takes advantage of the multi-threading provided by the operating system
[[vG]]
----
These are all related ideas.

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user mode threads) provide very quick and efficient switching because there is no need for a system call and the kernel is oblivious to a switch, which allows for millions of user mode threads. ISSUES: A blocking system call stalls all other fibers.
On the other hand, managing threads through the kernel requires a context switch (between user and kernel mode) on creation and removal of a thread; therefore programs with a prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

One implementation of UMS is: combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5'' -[[Rannath]]

----

I would scrap the first two below, at most mention them...

#time-division multiplexing
#threads vs processes
#I/O Scheduling -[[vG]]

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. It makes it possible for the processor to switch between different threads but it happens so fast that the user sees it as it is running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
This is a '''need''' section 4 below is not '''needed''' 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes
4. Examples of systems that require many threads such as mainframe servers or banking client processing.--[[User:Praubic|Praubic]] 17:34, 11 October 2010 (UTC)

Here is an example of a design: (the topic asks for key design choices here is one)

Capriccio is a specific design for scalable user-level threads. It is distinct from most designs in being independent of event-based mechanisms as well as kernel thread models. Capriccio threads are a very good choice for internet servers, and the implementation could easily support 100,000 threads. They are characterized by high scalability, efficient stack management and scheduling based on resource usage; however, the performance is not comparable to event-based systems.--[[User:Praubic|Praubic]] 13:32, 12 October 2010 (UTC)

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step, or the last step, we can always present our info and make our thesis match the info.
#List all questions and points we have about the topic

Questions:
# What makes threads non-scalable? List the problems
# What utility do some scalable implementations lack? Why?
# Just how scalable does a full utility implementation get?

Answers:
# Memory Usage, Context Switching. Consider using a thread pool.
# Signals, portability(maybe) both add overhead which would slow down threads
# If using thread pools, the scalability is then limited to the number of threads in the pool
----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point 1 -[[Rannath]] 
- efficient thread creation/destruction is more scalable 
-- NPTL's improvements over LinuxThreads- primarily due to lower overhead of creation/destruction ''1''

Main Point 2 -[[Rannath]] 
- UMS & user-space threads are more scalable - maybe 
-- context switches are costly ''From class'' 
-- blocking locks have lower latency when twinned with a user space scheduler ''8''

Ok for point 2 -> I posted a draft on the essay page but Im not certain as to whether i should talk about fibers since they are also functioning on user space but theyre not UMS. --[[User:Praubic|Praubic]] 00:18, 14 October 2010 (UTC)

Main Point 3 
- Certain bottlenecks appear in scaled implementations; removing these improves scalability. 
-- "False cache-line sharing" ''14'' 
-- xtime lock to a lockless lock ''14''

Main Point 3.5 
Fine-Grain over coarse-grain 
-- "Big Kernel Lock" ''14'' 
-- dcache_lock ''14''

Link the Main points to the thesis

----

Conclusion
# restate info
# affirmation of thesis

Here is the first paragraph that I attempted. Please feel free to change or even delete it from here.

A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require fewer system resources than concurrent cooperating processes and start much more easily; therefore, millions of them may exist in a single process. The two major types of threads are kernel and user-mode. Kernel threads are usually considered heavier, and designs that involve them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads, and there are a few designs that incorporate them, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources; however, the ability to switch in user mode makes them more efficient (depending on the application) than Thread Pools, which are yet another mechanism that allows for high scalability.
--[[User:Praubic|Praubic]] 19:04, 12 October 2010 (UTC)

we can add this for intro paragraph:

How is it possible for systems to support millions of threads or more within a single process?

It is possible for systems to support millions of threads or more within a single processor; it has the ability to switch execution resources between threads, thus making a concurrent execution. Concurrency is when multiple threads stay on the queues for switching but are incapable of running at the same time, yet it looks like they are running at the same time due to the speed at which they switch. [[vG]] You stated it is possible; you did not state how, or rather did not make it clear. The below should be a better interpretation. --[[User:Spanke|Shane]]

Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on the queues while the system is incapable of running them all at the same time. It gives the impression that they are executing at the same time because of the speed at which they are switched.

Added more == vG

A process is an instance of a program running on a computer, with its own resources such as an address space, files, I/O devices, and threads. A thread, on the other hand, is similar to a process, but it performs a single operation within the process. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on the queues while the system is incapable of running them all at the same time. It gives the impression that they are executing at the same time because of the speed at which they are switched. [[vG]]

----
I suggest that we start filling out the main points of the essay. We can discuss the intricacies as we go along. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6590&rep=rep1&type=pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# [http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt Interesting presentation: really worth checking out] [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)
# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83 Scalability in linux]
# [http://hillside.net/plop/2007/papers/PLoP2007_Ahluwalia.pdf This has something to do with our question...]
# [http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx Scheduling Priorities (Windows)], Microsoft (23 September 2010) --[[User:Spanke|Shane]]
# [http://www.novell.com/coolsolutions/feature/14878.html Linux Scheduling Priorities Explained], Novell (11 October 2005) --[[User:Spanke|Shane]]
# [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/ Inside the Linux 2.6 Completely Fair Scheduler], IBM (15 December 2009) --[[User:Spanke|Shane]]
#http://www.megaupload.com/?d=R4VMK3A1 (PDF Document on Multithreading) [[vG]]
# [http://www.linuxjournal.com/article/1363 what is multithreading?] [[vG]]
# [http://en.wikipedia.org/wiki/Thread_%28computer_science%29 type of threadings and multithreading in general] [[vG]]

COMP 3000 Essay 1 2010 Question 7

2010-10-14T15:05:49Z

Vviveka2:

=Question=

How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=
A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require fewer system resources than concurrent cooperating processes and start much more easily; therefore, millions of them may exist in a single process. The two major types of threads are kernel and user-mode. Kernel threads are usually considered heavier, and designs that involve them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads. There are a few designs that incorporate them, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than Thread Pools, which are yet another mechanism that allows for high scalability. 
'''Taken the liberty to add Praubic's tentative first para. No changes made as of yet.'''

I have the following... feel free to change :)

A process is an instance of a program running on a computer, with its own resources such as an address space, files, I/O devices, and threads. A thread, on the other hand, is similar to a process, but it performs a single operation within the process. Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency is the result of multiple threads waiting on the queues while the system is incapable of running them all at the same time. It gives the impression that they are executing at the same time because of the speed at which they are switched.
[[vG]] Edited by : [[User:Spanke|Shane]]

----

== Scalable Threads: The Problems ==

One of the challenges in making an existing code base scalable is identifying and eliminating the bottlenecks that appear once it is scaled. When porting Linux to a 64-core NUMA system, Ray Bryant and John Hawkes found the following bottlenecks (or just wrote a paper about them). Each of these bottlenecks is an example of a type of bottleneck that can appear in any program.

One type of bottleneck appears when expensive operations are '''needlessly called'''. In Linux, information that is misplaced in the cache can cause a "cache-coherency operation" to be called. This operation is expensive compared to what would happen if the information were in the 'right place'. Once the misplaced information that keeps causing this problem is identified, it can be moved to limit the problem. This bottleneck can appear anywhere expensive operations are called a needless number of times (the problem is not inherent, but is a result of bad design).

Another type of bottleneck comes from '''starvation.''' One such bottleneck is the xtime_lock in Linux. Readers holding the lock prevented the timer value from being written, causing the kernel to waste CPU time retrying. This problem was solved by using a lockless read. The same problem can appear anywhere a thread must keep trying to execute but cannot, leading to wasted CPU cycles.
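
One common way to get a lockless read is a sequence counter, where the writer bumps a counter before and after each update and readers simply retry if they see the counter change. The Python sketch below illustrates that idea only; whether it matches the exact change made to xtime_lock is an assumption, and the class SeqCounterClock and its methods are hypothetical names. Readers never take a lock, so they cannot hold the writer back.

```python
import threading

class SeqCounterClock:
    """Illustrative sequence-counter protected value (hypothetical names)."""

    def __init__(self, value=0):
        self._seq = 0                        # even: stable, odd: write in progress
        self._value = value
        self._write_lock = threading.Lock()  # serializes writers only

    def write(self, value):
        with self._write_lock:
            self._seq += 1                   # now odd: readers will retry
            self._value = value
            self._seq += 1                   # even again: value is stable

    def read(self):
        while True:
            start = self._seq
            if start % 2:                    # writer active, try again
                continue
            value = self._value
            if self._seq == start:           # no write happened meanwhile
                return value


clock = SeqCounterClock()
clock.write(42)
print(clock.read())
```

The trade-off is that a reader may have to repeat its read, which is acceptable when writes are rare and short, as with a timer value.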

The next type of bottleneck comes from '''coarse-grained''' operations. Granularity refers to the execution time of a code segment: the closer a segment is to the speed of an atomic action, the finer its granularity. Both examples below eat a lot of CPU time, where a finer-grained implementation would eat less. One coarse-grained bottleneck was the dcache_lock. It ate up some time in normal use, but it was also taken in the much more frequently called dnotify_parent() function. That was an unacceptable state of affairs, so the dcache_lock strategy was replaced with a finer-grained strategy from a later version of Linux. Another big coarse-grained bottleneck in the system is the "Big Kernel Lock" (BKL), Linux's kernel synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred method, on Linux NUMA systems, was to limit the BKL's usage: the ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck. Both of those examples are the result of coarse granularity.
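
The difference between one big lock and finer-grained locking can be sketched in Python as follows. This is an analogy rather than kernel code: the two classes and the hammer helper are hypothetical, and in CPython the global interpreter lock means the striped version will not actually run faster; the sketch only shows how the locking structure differs.

```python
import threading

class CoarseCounterTable:
    """One lock guards every key: all threads contend on it."""
    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


class StripedCounterTable:
    """Keys are spread over several locks, so unrelated keys rarely contend."""
    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._tables = [{} for _ in range(stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            self._tables[i][key] = self._tables[i].get(key, 0) + 1

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._tables[i].get(key, 0)


def hammer(table, key, times):
    for _ in range(times):
        table.increment(key)


table = StripedCounterTable()
keys = ("a", "b", "c", "d")
workers = [threading.Thread(target=hammer, args=(table, k, 1000)) for k in keys]
for w in workers:
    w.start()
for w in workers:
    w.join()
print([table.get(k) for k in keys])
```

In the coarse table every increment waits on the same lock, much as every BKL user waited on the BKL; in the striped table a thread only waits for others that happen to touch the same stripe.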

Bottlenecks can also stem from '''multiple problems.''' One example is the multiqueue scheduler from Linux 2.4, which altogether ate up 25% of the CPU time. It had two problems. Its coarse-grained spinlock ate up a fair majority of that CPU time, while the rest went into computing and recomputing information in the cache, a needlessly expensive operation. These problems were fixed by replacing the scheduler (that replacement was itself later superseded by the more efficient O(1) scheduler).

--[[Rannath]]

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

The introduction of Windows NT and OS/2 brought about a design in which threads are cheap while processes are expensive. UMS, which reflects this design, is a recommended mechanism for high-performance applications that handle many threads on multicore systems. The application has to implement a scheduler to manage the UMS threads and decide when they should run or stop. This approach is not desirable for systems with moderate performance requirements, because concurrent execution of this sort naturally allows for non-intuitive outcomes such as race conditions, which require careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the desired utility. For instance, a UMS scheduler can be assigned to each logical processor, creating affinity so that related threads run around one scheduler. This could turn out to be inefficient if there are many related threads, which could end up starving other processes.

Fibers embrace essentially the same abstraction as coroutines. The distinction is that fibers are a system-level construct while coroutines are a language-level construct. Unlike UMS, fibers do not take advantage of multiprocessor machines; however, they require less operating system support. The Symbian operating system provides an example of fiber usage in its Active Scheduler. An active scheduler object contains a single fiber that is scheduled when an asynchronous call returns, and it blocks lower-priority fibers until all those above are finished.
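
Since fibers and coroutines share the same abstraction, cooperative switching can be illustrated with Python generators, which are a language-level coroutine mechanism. The sketch below is not how Windows fibers or the Symbian Active Scheduler are implemented; the functions fiber and run_round_robin are hypothetical, and the round-robin policy is an assumption chosen for simplicity. It shows the key property: every switch happens in user code at a yield, with no system call.

```python
from collections import deque

def fiber(name, steps, log):
    """A cooperative task: runs one step, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # switch point: no system call involved

def run_round_robin(fibers):
    """Run fibers on a single thread until all of them have finished."""
    ready = deque(fibers)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume until the next yield
            ready.append(current)  # still alive: back of the queue
        except StopIteration:
            pass                   # finished: drop it

log = []
run_round_robin([fiber("A", 2, log), fiber("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Everything runs on one thread, which is why this style does not use additional processors, and a fiber that blocks instead of yielding would stall all the others.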

Thread pools consist of queues of threads that stay alive and wait for new tasks to be assigned to them. If there are no new tasks to complete, they sleep or wait. This pattern eliminates the overhead of creating and destroying threads, which results in better system stability and improved performance. The long-lived threads can, for instance, handle multiple transaction requests arriving over socket connections from other machines within a short time frame, while avoiding the millions of cycles needed to tear down and re-create a thread. Thread pools often operate on server farms, and therefore thread safety has to be carefully implemented.
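
As a small illustration of the pattern, Python's standard concurrent.futures.ThreadPoolExecutor keeps a fixed set of worker threads alive and feeds them tasks. In the sketch below the function handle_request and the choice of four workers are assumptions made for the example; the requests are simulated by squaring a number rather than reading from a socket.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def handle_request(n):
    """Pretend to serve a request; report which worker thread ran it."""
    return n * n, threading.current_thread().name

# Four long-lived workers serve all one hundred requests.
with ThreadPoolExecutor(max_workers=4, thread_name_prefix="worker") as pool:
    outcomes = list(pool.map(handle_request, range(100)))

results = [value for value, _ in outcomes]
workers_used = {name for _, name in outcomes}
print(len(results), "requests served by", len(workers_used), "threads")
```

One hundred requests are handled without creating one hundred threads: at most four threads ever exist, and each is reused for many requests.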

== Design Choices ==
'''(A) Kernel Threads and User Threads (1:1 vs M:N) ''' --[[User:Gautam|Gautam]] 00:29, 14 October 2010 (UTC) 
This is the most basic design choice. The 1:1 boasts of a slim clean library interface on top of the kernel functions. Although, the M:N would implement a complicated library, it would offer advantages in areas of signal handling. A general consensus was that the M:N design was not compatible with the Linux kernel due to such a high cost for implementation. This gave birth to the 1:1 model.
''' (B)Signal Handling '''
The kernel implements POSIX signal handling, including the use of per-thread signal masks. Since a signal is only delivered to a thread that has it unblocked, no unnecessary interruptions through signals occur. The kernel is also in a much better position to judge which thread is best suited to receive the signal. This only holds true if the 1:1 model is used.

''' (C)Synchronization '''
The implementation of synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option, since threads can have different priorities (besides wasting CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability, caused by spurious wakeups and degradation of the quality of signal handling in the application. Fortunately, new functionality was added to the kernel to implement all kinds of synchronization.

Explaining the four types of synchronization:

*A mutex lock admits only one thread at a time, giving that thread exclusive access to the protected part of the code.
*With read/write synchronization, threads can gain shared read access or exclusive write access to a protected resource, but to edit the content a thread must hold the exclusive write lock. The exclusive write lock is only granted once all read locks have been released.
*Condition variable synchronization blocks the thread until the condition becomes true.
*Counting semaphores grant access to multiple threads. A semaphore has a count that tracks how many threads can have concurrent access to the data. Once the limit is reached, other threads are blocked until a slot is released.
[[vG]]
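
Two of the primitives listed above, the mutex and the counting semaphore, can be sketched with Python's standard <code>threading</code> module. The limit of 3, the worker count of 10 and the <code>state</code> dictionary are assumptions made for the example; they do not describe how any particular kernel implements these primitives.

```python
import threading
import time

LIMIT = 3
slots = threading.Semaphore(LIMIT)  # counting semaphore: at most LIMIT holders
lock = threading.Lock()             # mutex: one thread at a time updates state
state = {"active": 0, "peak": 0, "total": 0}

def worker():
    with slots:                     # blocks while LIMIT threads are inside
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)            # simulated use of the shared resource
        with lock:
            state["active"] -= 1
            state["total"] += 1

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After all threads have joined, <code>state["peak"]</code> never exceeds <code>LIMIT</code>, and the mutex guarantees that no update to the counters is lost.
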
''' (D)Memory Management '''
Thread memory management, from creation to maintenance and deallocation, is an important design choice when attempting to create a large number of threads in a single process. A thread's data structure is made up of a program counter, a stack and a control block. The control block is needed for thread management, as it contains the state data of the thread. Optimizing this data structure can greatly increase performance when there are large numbers of threads.

A thread can be created before the process actually requires it to run, and then wait until an idle processor becomes available to run it. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) is a problem that can arise with this creation process, since it frontloads the work. Another problem is that the thread must allocate the memory required for its stack at creation, because it is expensive to allocate the stack memory dynamically. One way to optimize creation for large numbers of threads is to copy the arguments of the thread into its control block; this allows the thread's stack to be allocated at the thread's startup (when the thread starts being used) and not when the thread is created. When the thread enters startup, it can copy its arguments out of its control block and allocate its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads that are in contention), and, if thread memory management is done serially, these two factors combine to set a maximum rate of thread creation.

The deallocation of a thread can also be optimized to increase the scalability of threads. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into list operations; if they are not stored in a free list, the thread overhead includes finding a block of free memory of the correct size to store the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]
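
The free-list idea can be sketched as follows. <code>StackPool</code> is a hypothetical class written for this illustration, with a <code>bytearray</code> standing in for a thread stack; it is not taken from the cited paper.

```python
class StackPool:
    """Hypothetical free list of fixed-size thread stacks."""

    def __init__(self, stack_size):
        self.stack_size = stack_size
        self.free = []          # stacks released by finished threads
        self.allocations = 0    # how often fresh memory had to be allocated

    def acquire(self):
        if self.free:
            return self.free.pop()            # reuse: a cheap list operation
        self.allocations += 1
        return bytearray(self.stack_size)     # slow path: allocate new memory

    def release(self, stack):
        self.free.append(stack)               # keep the stack for the next thread

pool = StackPool(stack_size=4096)
for _ in range(1000):
    stack = pool.acquire()   # a thread starts and takes a stack
    pool.release(stack)      # the thread finishes and returns it
```

In this run, a thousand simulated thread lifetimes trigger a single real allocation, because every later <code>acquire</code> is served from the free list.
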
''' (E)Scheduling Priorities '''
A thread is an entity that is scheduled according to its scheduling priority. On Windows the priority is a number ranging from 0 to 31, while on Linux the CFS (Completely Fair Scheduler) keeps runnable tasks in a red-black tree. Threads at the same priority each execute in a time slice assigned to them in round-robin fashion, and lower-priority threads wait until the ones above them finish their tasks. A thread context consists of a set of machine registers plus the kernel and user stacks, all linked to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and a thread of equal (or higher) priority becomes available. An efficient context switch is essential for high scalability. For example, fibers, which execute entirely in user space, do not require a system call during a switch, which greatly increases efficiency. --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)
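
The combination of strict priorities with round-robin time slices can be sketched as a small simulation. This is a deliberately simplified model: the function <code>schedule</code>, the task names and the quantum are assumptions, all tasks are known up front, and it reproduces neither the actual Windows scheduler nor CFS.

```python
from collections import deque

def schedule(tasks, quantum):
    """tasks: list of (name, priority, work_units); higher priority runs first."""
    queues = {}
    for name, priority, work in tasks:
        queues.setdefault(priority, deque()).append([name, work])

    order = []
    for priority in sorted(queues, reverse=True):   # highest priority level first
        ready = queues[priority]
        while ready:
            task = ready.popleft()
            order.append(task[0])      # the task runs for one time slice
            task[1] -= quantum
            if task[1] > 0:
                ready.append(task)     # unfinished: back of its priority queue
    return order

order = schedule([("low", 1, 2), ("hi-a", 8, 2), ("hi-b", 8, 1)], quantum=1)
```

The two priority-8 tasks alternate until both are done, and only then does the priority-1 task receive any time slices.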

== References ==

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-14T07:04:00Z

Vviveka2:

== Log ==
'''Suggestion:''' Let us maintain our edits here instead of littering the main page with our names. Also, please do not edit without writing to the log, so that we know who has done what and when.

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity. Everyone should post their interpretation of the question in the simplest possible English so we're on the same page (as someone, maybe me, seems to have the wrong idea about what we're trying to talk about).
More moving for clarity. Added an essay outline at the bottom (feel free to change).
Filled in the outline somewhat and added questions to the outline for everyone to think on.--[[User:Rannath|Rannath]]

First Draft for essay. Please modify and add on. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

Edited Scheduling Priorities and rewrote some areas to provide a better paragraph structure. --[[User:Spanke|Shane]] 15:25, 13 October 2010 (UTC)

Added to the memory management section. --[[User:Hirving|Hirving]] 21:42, 13 October 2010 (UTC)

Edited Scalable Threads Problems. Also did a little re-arrangement. --[[User:Gautam|Gautam]] 01:03, 14 October 2010 (UTC)

Answered Essay Questions in Discussion. --[[User:Spanke|Shane]] 01:25, 13 October 2010 (UTC)

 <Add your future activities here>

== The Question ==
'''Original: '''
How is it possible for systems to supports millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about number and scalability of threads not the gross mechanics.

To be more clear: we can limit ourselves from the thread implementations to the thread scalability... ignore the stuff that is required for all threads, unless it's required for many threads. (I didn't find any implementations that required hardware)

I would also argue that since OSs have to run on multiple hardwares one cannot guarantee that unique/rare hardware bits will be there. While we can talk about hardware we should limit it to a mention at most. OR we could mention prospective hardware that could help out, but is not yet standard. It depends on whether we want to do "as it is" or "as it might be"

utility of such massively scalable thread implementations. I took this as: what functionality (of single strings) does one have to give up to make threads scalable.

'''Gautam: '''
I think the hardware is as relevant as the software. Not all things can be done in software and hardware support is an important factor in most of the solutions to many problems that OS face. My take.

'''Henry: '''
Since the question is about the system as a whole, I think the answer should include both software and hardware support for large numbers of threads. The question revolves around how a system can handle millions of threads and what the major factors are that allow the system to do it. Also, the last part of the question seems to ask what this number of threads allows a process to do.

'''Shane: '''
In response to the above's idea on the last part of the question, I would argue that it would enable fast execution because all threads that receive a cache miss would be picked up by the other threads so long as there was enough resources. Also the use of more threads would help synchronize the cache (through sharing) so that it would not miss. Of course this would be if they were assigned to the same task, you cannot sync threads running different applications it just wouldn't make sense. The only issue with this idea is the software must support this number.

'''vG: '''
We should talk about type of relationship models (1:1 N:M N:N and so on) also talk about the application vs hardware multi-threading within single processor.

== Group 7 ==

Let us start out by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek <support.tamiltreasure@gmail.com>

Shane Panke <shanepanke@msn.com>

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Multi-threading is a term used to describe:

* A facility provided by the operating system that enables an application to create threads of execution within a process
* Applications whose architecture takes advantage of the multi-threading provided by the operating system
[[vG]]
----
These are all related ideas.

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user mode threads) provide very quick and efficient switching because there is no need for a system call and kernel is oblivious to a switch - allows for millions of user mode threads. ISSUES: Blocking system calls disables all other fibers.
On the other hand managing threads through the kernel requires context switch (between user and kernel mode) on creation and removal of a thread therefore programs with prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

One implementation of UMS is: combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5'' -[[Rannath]]

----

I would scrap the first two below, at most mention them...

#time-division multiplexing
#threads vs processes
#I/O Scheduling -[[vG]]

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. It makes it possible for the processor to switch between different threads but it happens so fast that the user sees it as it is running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
This is a '''need''' section 4 below is not '''needed''' 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes
4. Examples of systems that require many threads such as mainframe servers or banking client processing.--[[User:Praubic|Praubic]] 17:34, 11 October 2010 (UTC)

Here is an example of a design: (the topic asks for key design choices here is one)

Capriccio is a specific design for scalable user-level threads. It is distinct from most designs in being independent of event-based mechanisms as well as kernel thread models. It is a very good choice for internet servers, and this implementation could easily support 100,000 threads. It is characterized by high scalability, efficient stack management and scheduling based on resource usage; however, its performance is not comparable to event-based systems.--[[User:Praubic|Praubic]] 13:32, 12 October 2010 (UTC)

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step, or the last step, we can always present our info and make our thesis match the info.
#List all questions and points we have about the topic

Questions:
# What makes threads non-scalable? List the problems
# What utility do some scalable implementations lack? Why?
# Just how scalable does a full utility implementation get?

Answers:
# Memory Usage, Context Switching. Consider using a thread pool.
# Signals, portability(maybe) both add overhead which would slow down threads
# If using thread pools, the scalability is then limited to the number of threads in the pool
----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point 1 -[[Rannath]] 
- efficient thread creation/destruction is more scalable 
-- NPTL's improvements over LinuxThreads- primarily due to lower overhead of creation/destruction ''1''

Main Point 2 -[[Rannath]] 
- UMS & user-space threads are more scalable - maybe 
-- context switches are costly ''From class'' 
-- blocking locks have lower latency when twinned with a user space scheduler ''8''

Ok for point 2 -> I posted a draft on the essay page, but I'm not certain whether I should talk about fibers, since they also function in user space but they're not UMS. --[[User:Praubic|Praubic]] 00:18, 14 October 2010 (UTC)

Main Point 3 
- Certain bottlenecks appear in scaled implementations; removing these improves scalability. 
-- "False cache-line sharing" ''14'' 
-- xtime lock to a lockless lock ''14''

Main Point 3.5 
Fine-grain over coarse-grain 
-- "Big Kernel Lock" ''14'' 
-- dcache_lock ''14''

Link the Main points to the thesis

----

Conclusion
# restate info
# affirmation of thesis

Here is the first paragraph that I attempted. Please feel free to change or even delete it from here.

A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require fewer system resources than concurrent cooperating processes and start much more easily; therefore, millions of them may exist in a single process. The two major types of threads are kernel and user-mode. Kernel threads are usually considered heavier, and designs that involve them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads, and there are a few designs that incorporate them, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources; however, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows for high scalability.
--[[User:Praubic|Praubic]] 19:04, 12 October 2010 (UTC)

we can add this for intro paragraph:

How is it possible for systems to supports millions of threads or more within a single process?

It is possible for systems to support millions of threads or more within a single process: the system has the ability to switch execution resources between threads, thus creating concurrent execution. Concurrency is when multiple threads stay on the queues for switching but are incapable of running at the same time; the speed at which they switch makes it look like they are running at the same time. [[vG]] You stated it is possible; you did not state how, or rather did not make it clear. The below should be a better interpretation. --[[User:Spanke|Shane]]

Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency results from multiple threads waiting on the queues without being able to run at the same time. The speed at which they switch gives the impression that they are executing simultaneously.

----
I suggest that we start filling out the main points of the essay. We can discuss the intricacies as we go along. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6590&rep=rep1&type=pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# [http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt Interesting presentation: really worth checking out] [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)
# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83 Scalability in linux]
# [http://hillside.net/plop/2007/papers/PLoP2007_Ahluwalia.pdf This has something to do with our question...]
# [http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx Scheduling Priorities (Windows)], Microsoft (23 September 2010) --[[User:Spanke|Shane]]
# [http://www.novell.com/coolsolutions/feature/14878.html Linux Scheduling Priorities Explained], Novell (11 October 2005) --[[User:Spanke|Shane]]
# [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/ Inside the Linux 2.6 Completely Fair Scheduler], IBM (15 December 2009) --[[User:Spanke|Shane]]
#http://www.megaupload.com/?d=R4VMK3A1 (PDF Document on Multithreading) [[vG]]
# [http://www.linuxjournal.com/article/1363 what is multithreading?] [[vG]]
# [http://en.wikipedia.org/wiki/Thread_%28computer_science%29 type of threadings and multithreading in general] [[vG]]

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-14T06:56:56Z

Vviveka2:

== Log ==
'''Suggestion:''' Let us maintain our edits here instead of littering the main page with our names. Also, please do not edit without writing to the log, so that we know who has done what and when.

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity. Everyone should post their interpretation of the question in the simplest possible English so we're on the same page (as someone, maybe me, seems to have the wrong idea about what we're trying to talk about).
More moving for clarity. Added an essay outline at the bottom (feel free to change).
Filled in the outline somewhat and added questions to the outline for everyone to think on.--[[User:Rannath|Rannath]]

First Draft for essay. Please modify and add on. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

Edited Scheduling Priorities and rewrote some areas to provide a better paragraph structure. --[[User:Spanke|Shane]] 15:25, 13 October 2010 (UTC)

Added to the memory management section. --[[User:Hirving|Hirving]] 21:42, 13 October 2010 (UTC)

Edited Scalable Threads Problems. Also did a little re-arrangement. --[[User:Gautam|Gautam]] 01:03, 14 October 2010 (UTC)

Answered Essay Questions in Discussion. --[[User:Spanke|Shane]] 01:25, 13 October 2010 (UTC)

 <Add your future activities here>

== The Question ==
'''Original: '''
How is it possible for systems to supports millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about number and scalability of threads not the gross mechanics.

To be more clear: we can limit ourselves from the thread implementations to the thread scalability... ignore the stuff that is required for all threads, unless it's required for many threads. (I didn't find any implementations that required hardware)

I would also argue that since OSs have to run on multiple hardwares one cannot guarantee that unique/rare hardware bits will be there. While we can talk about hardware we should limit it to a mention at most. OR we could mention prospective hardware that could help out, but is not yet standard. It depends on whether we want to do "as it is" or "as it might be"

utility of such massively scalable thread implementations. I took this as: what functionality (of single strings) does one have to give up to make threads scalable.

'''Gautam: '''
I think the hardware is as relevant as the software. Not all things can be done in software and hardware support is an important factor in most of the solutions to many problems that OS face. My take.

'''Henry: '''
Since the question is about the system as a whole, I think the answer should include both software and hardware support for large numbers of threads. The question revolves around how a system can handle millions of threads and what the major factors are that allow the system to do it. Also, the last part of the question seems to ask what this number of threads allows a process to do.

'''Shane: '''
In response to the above's idea on the last part of the question, I would argue that it would enable fast execution because all threads that receive a cache miss would be picked up by the other threads so long as there was enough resources. Also the use of more threads would help synchronize the cache (through sharing) so that it would not miss. Of course this would be if they were assigned to the same task, you cannot sync threads running different applications it just wouldn't make sense. The only issue with this idea is the software must support this number.

== Group 7 ==

Let us start out by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek <support.tamiltreasure@gmail.com>

Shane Panke <shanepanke@msn.com>

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Multi-threading is a term used to describe:

* A facility provided by the operating system that enables an application to create threads of execution within a process
* Applications whose architecture takes advantage of the multi-threading provided by the operating system
[[vG]]
----
These are all related ideas.

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user mode threads) provide very quick and efficient switching because there is no need for a system call and kernel is oblivious to a switch - allows for millions of user mode threads. ISSUES: Blocking system calls disables all other fibers.
On the other hand managing threads through the kernel requires context switch (between user and kernel mode) on creation and removal of a thread therefore programs with prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

One implementation of UMS is: combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5'' -[[Rannath]]

----

I would scrap the first two below, at most mention them...

#time-division multiplexing
#threads vs processes
#I/O Scheduling -[[vG]]

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. It makes it possible for the processor to switch between different threads but it happens so fast that the user sees it as it is running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
This is a '''need''' section 4 below is not '''needed''' 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes
4. Examples of systems that require many threads such as mainframe servers or banking client processing.--[[User:Praubic|Praubic]] 17:34, 11 October 2010 (UTC)

Here is an example of a design: (the topic asks for key design choices here is one)

Capriccio is a specific design for scalable user-level threads. It is distinct from most designs in being independent of event-based mechanisms as well as kernel thread models. It is a very good choice for internet servers, and this implementation could easily support 100,000 threads. It is characterized by high scalability, efficient stack management and scheduling based on resource usage; however, its performance is not comparable to event-based systems.--[[User:Praubic|Praubic]] 13:32, 12 October 2010 (UTC)

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step, or the last step, we can always present our info and make our thesis match the info.
#List all questions and points we have about the topic

Questions:
# What makes threads non-scalable? List the problems
# What utility do some scalable implementations lack? Why?
# Just how scalable does a full utility implementation get?

Answers:
# Memory Usage, Context Switching. Consider using a thread pool.
# Signals, portability(maybe) both add overhead which would slow down threads
# If using thread pools, the scalability is then limited to the number of threads in the pool
----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point 1 -[[Rannath]] 
- efficient thread creation/destruction is more scalable 
-- NPTL's improvements over LinuxThreads- primarily due to lower overhead of creation/destruction ''1''

Main Point 2 -[[Rannath]] 
- UMS & user-space threads are more scalable - maybe 
-- context switches are costly ''From class'' 
-- blocking locks have lower latency when twinned with a user space scheduler ''8''

Ok for point 2 -> I posted a draft on the essay page, but I'm not certain whether I should talk about fibers, since they also function in user space but they're not UMS. --[[User:Praubic|Praubic]] 00:18, 14 October 2010 (UTC)

Main Point 3 
- Certain bottlenecks appear in scaled implementations; removing these improves scalability. 
-- "False cache-line sharing" ''14'' 
-- xtime lock to a lockless lock ''14''

Main Point 3.5 
Fine-grained over coarse-grained 
-- "Big Kernel Lock" ''14'' 
-- dcache_lock ''14''

Link the Main points to the thesis

----

Conclusion
# restate info
# affirmation of thesis

Here is the first paragraph that I attempted. Please feel free to change or even delete it from here.

A thread is an independent task that executes in the same address space as the other threads of a single process while sharing data with them. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel threads and user-mode threads. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads. A few designs build on user threads, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows for high scalability.
--[[User:Praubic|Praubic]] 19:04, 12 October 2010 (UTC)

we can add this for intro paragraph:

How is it possible for systems to supports millions of threads or more within a single process?

It is possible for systems to supports millions of threads or more within a single processor, it has the ability to switch execution resource between threads, thus making a concurrent execution. Concurrency is when multiple threads stays on the ques for switching but incapable of running at the same time but it has the ability to make it look like they are running at same time due to the speed they switch. [[vG]] You stated it is possible you did not state how, or rather did not make it clear. The below should be a better interpretation. --[[User:Spanke|Shane]]

Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency results from multiple threads waiting in the run queues: on a single processor they cannot actually run at the same time, but they switch so quickly that they give the impression of executing simultaneously.

----
I suggest that we start filling out the main points of the essay. We can discuss the intricacies as we go along. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6590&rep=rep1&type=pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# [http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt Interesting presentation: really worth checking out] [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)
# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83 Scalability in linux]
# [http://hillside.net/plop/2007/papers/PLoP2007_Ahluwalia.pdf This has something to do with our question...]
# [http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx Scheduling Priorities (Windows)], Microsoft (23 September 2010) --[[User:Spanke|Shane]]
# [http://www.novell.com/coolsolutions/feature/14878.html Linux Scheduling Priorities Explained], Novell (11 October 2005) --[[User:Spanke|Shane]]
# [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/ Inside the Linux 2.6 Completely Fair Scheduler], IBM (15 December 2009) --[[User:Spanke|Shane]]
#http://www.megaupload.com/?d=R4VMK3A1 (PDF Document on Multithreading) [[vG]]
# [http://www.linuxjournal.com/article/1363 what is multithreading?] [[vG]]
# [http://en.wikipedia.org/wiki/Thread_%28computer_science%29 type of threadings and multithreading in general] [[vG]]

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-14T06:54:10Z

Vviveka2:

== Log ==
'''Suggestion:''' Let us maintain our edits here instead of littering the main page with our names. Also, please do not edit without writing to the log, so that we know who has done what and when.

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity. Everyone should post their interpretation of the question in the simplest possible English so we're on the same page (as someone, maybe me, seems to have the wrong idea about what we're trying to talk about).
More moving for clarity. Added an essay outline at the bottom (feel free to change).
Filled in the outline somewhat and added questions to the outline for everyone to think about.--[[User:Rannath|Rannath]]

First Draft for essay. Please modify and add on. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

Edited Scheduling Priorities and rewrote some areas to provide a better paragraph structure. --[[User:Spanke|Shane]] 15:25, 13 October 2010 (UTC)

Added to the memory management section. --[[User:Hirving|Hirving]] 21:42, 13 October 2010 (UTC)

Edited Scalable Threads Problems. Also did a little re-arrangement. --[[User:Gautam|Gautam]] 01:03, 14 October 2010 (UTC)

Answered Essay Questions in Discussion. --[[User:Spanke|Shane]] 01:25, 13 October 2010 (UTC)

 <Add your future activities here>

== The Question ==
'''Original: '''
How is it possible for systems to supports millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about number and scalability of threads not the gross mechanics.

To be more clear: we can limit ourselves from the thread implementations to the thread scalability... ignore the stuff that required for all threads, unless its required for many threads. (I didn't find any implementations that required hardware)

I would also argue that since OSs have to run on multiple hardwares one cannot guarantee that unique/rare hardware bits will be there. While we can talk about hardware we should limit it to a mention at most. OR we could mention prospective hardware that could help out, but is not yet standard. It depends on whether we want to do "as it is" or "as it might be"

utility of such massively scalable thread implementations. I took this as: what functionality (of single strings) does one have to give up to make threads scalable.

'''Gautam: '''
I think the hardware is as relevant as the software. Not all things can be done in software and hardware support is an important factor in most of the solutions to many problems that OS face. My take.

'''Henry: '''
Since the question is about the system as a whole, I think the answer should include both software and hardware support for large numbers of threads. The question revolves around how a system can handle millions of threads and what the major factors are that allow the system to do it. Also, the last part of the question seems to ask what this number of threads allows a process to do.

'''Shane: '''
In response to the idea above on the last part of the question, I would argue that it would enable fast execution, because whenever a thread stalls on a cache miss other threads could pick up the work, so long as there were enough resources. The use of more threads would also help synchronize the cache (through sharing) so that it would not miss. Of course, this would only apply if the threads were assigned to the same task; you cannot sync threads running different applications, it just wouldn't make sense. The only issue with this idea is that the software must support this number of threads.

== Group 7 ==

Let us start out by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek <support.tamiltreasure@gmail.com>

Shane Panke <shanepanke@msn.com>

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Multi-threading is a term used to describe:

* A facility provided by the operating system that enables an application to create threads of execution within a process
* Applications whose architecture takes advantage of the multi-threading provided by the operating system
[[vG]]
----
These are all related ideas.

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user-mode threads) provide very quick and efficient switching because no system call is needed and the kernel is oblivious to the switch, which allows for millions of user-mode threads. ISSUES: a blocking system call blocks all other fibers.
On the other hand, managing threads through the kernel requires a context switch (between user and kernel mode) on creation and removal of a thread, so programs with a prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

One implementation of UMS is: combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5'' -[[Rannath]]

----

I would scrap the first two below, at most mention them...

#time-division multiplexing
#threads vs processes
#I/O Scheduling -[[vG]]

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. The processor switches between different threads, but so quickly that the user perceives them as running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
This is a '''need''' section 4 below is not '''needed''' 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes
4. Examples of systems that require many threads such as mainframe servers or banking client processing.--[[User:Praubic|Praubic]] 17:34, 11 October 2010 (UTC)

Here is an example of a design: (the topic asks for key design choices here is one)

Capriccio is a specific design for scalable user-level threads. It is distinct from most designs in being independent of event-based mechanisms as well as kernel thread models. It is a very good choice for Internet servers, and the implementation could easily support 100,000 threads. It is characterized by high scalability, efficient stack management, and scheduling based on resource usage; however, its performance is not comparable to event-based systems.--[[User:Praubic|Praubic]] 13:32, 12 October 2010 (UTC)

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step, or the last step, we can always present our info and make our thesis match the info.
#List all questions and points we have about the topic

Questions:
# What makes threads non-scalable? List the problems
# What utility do some scalable implementations lack? Why?
# Just how scalable does a full utility implementation get?

Answers:
# Memory Usage, Context Switching. Consider using a thread pool.
# Signals, portability(maybe) both add overhead which would slow down threads
# If using thread pools, the scalability is then limited to the number of threads in the pool
----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point 1 -[[Rannath]] 
- efficient thread creation/destruction is more scalable 
-- NPTL's improvements over LinuxThreads- primarily due to lower overhead of creation/destruction ''1''

Main Point 2 -[[Rannath]] 
- UMS & user-space threads are more scalable - maybe 
-- context switches are costly ''From class'' 
-- blocking locks have lower latency when twinned with a user space scheduler ''8''

OK for point 2: I posted a draft on the essay page, but I'm not certain whether I should talk about fibers, since they also function in user space but are not UMS. --[[User:Praubic|Praubic]] 00:18, 14 October 2010 (UTC)

Main Point 3 
- Certain bottlenecks appear in scaled implementations; removing these improves scalability. 
-- "False cache-line sharing" ''14'' 
-- xtime lock to a lockless lock ''14''

Main Point 3.5 
Fine-grained over coarse-grained 
-- "Big Kernel Lock" ''14'' 
-- dcache_lock ''14''

Link the Main points to the thesis

----

Conclusion
# restate info
# affirmation of thesis

Here is the first paragraph that I attempted. Please feel free to change or even delete it from here.

A thread is an independent task that executes in the same address space as the other threads of a single process while sharing data with them. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel threads and user-mode threads. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads. A few designs build on user threads, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows for high scalability.
--[[User:Praubic|Praubic]] 19:04, 12 October 2010 (UTC)

we can add this for intro paragraph:

How is it possible for systems to supports millions of threads or more within a single process?

It is possible for systems to supports millions of threads or more within a single processor, it has the ability to switch execution resource between threads, thus making a concurrent execution. Concurrency is when multiple threads stays on the ques for switching but incapable of running at the same time but it has the ability to make it look like they are running at same time due to the speed they switch. [[vG]] You stated it is possible you did not state how, or rather did not make it clear. The below should be a better interpretation. --[[User:Spanke|Shane]]

Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency results from multiple threads waiting in the run queues: on a single processor they cannot actually run at the same time, but they switch so quickly that they give the impression of executing simultaneously.

----
I suggest that we start filling out the main points of the essay. We can discuss the intricacies as we go along. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6590&rep=rep1&type=pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# [http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt Interesting presentation: really worth checking out] [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)
# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83 Scalability in linux]
# [http://hillside.net/plop/2007/papers/PLoP2007_Ahluwalia.pdf This has something to do with our question...]
# [http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx Scheduling Priorities (Windows)], Microsoft (23 September 2010) --[[User:Spanke|Shane]]
# [http://www.novell.com/coolsolutions/feature/14878.html Linux Scheduling Priorities Explained], Novell (11 October 2005) --[[User:Spanke|Shane]]
# [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/ Inside the Linux 2.6 Completely Fair Scheduler], IBM (15 December 2009) --[[User:Spanke|Shane]]
#http://www.megaupload.com/?d=R4VMK3A1 (PDF Document on Multithreading) [[vG]]

COMP 3000 Essay 1 2010 Question 7

2010-10-14T06:48:17Z

Vviveka2:

=Question=

How is it possible for systems to supports millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=
A thread is an independent task that executes in the same address space as the other threads of a single process while sharing data with them. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel threads and user-mode threads. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads. A few designs build on user threads, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows for high scalability.
'''Taken the liberty to add Praubic's tentative first para. No changes made as of yet.'''

I have the following... have your words or remove it if not needed:)

Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. Concurrency results from multiple threads waiting in the run queues: on a single processor they cannot actually run at the same time, but they switch so quickly that they give the impression of executing simultaneously.
[[vG]] Edited by : [[User:Spanke|Shane]]

----

== Scalable Threads: The Problems ==

One of the challenges in making an existing code base scalable is the identification and elimination of bottlenecks. When porting Linux to a 64-core NUMA system, Ray Bryant and John Hawkes reported the following bottlenecks:

'''Cache-Coherency:''' Poorly placed data in the cache (for example, false cache-line sharing) can trigger a "cache-coherency operation", which is comparatively expensive. Once the data that causes the problem has been identified, it can be moved to limit the problem.

'''Locks:''' Some locks taken on behalf of user code also contribute to bottlenecks. One such lock is the xtime_lock in Linux. Because readers took the lock, they could keep the writer from updating the timer value, leading to starvation. This problem was solved by using a lockless read.
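
One common way to build a lockless read is a sequence counter: the writer bumps the counter before and after each update, and a reader retries if the counter was odd or changed while it was reading. The sketch below illustrates that idea only; it is an assumption that this resembles the mechanism used for xtime_lock, and the class and method names are hypothetical. Real kernel code would also need memory barriers, which Python hides.

```python
import threading

class SeqLockClock:
    """Hypothetical sequence-counter clock: one writer, lock-free readers."""

    def __init__(self):
        self._seq = 0                 # even: stable, odd: write in progress
        self._value = (0, 0)          # (seconds, nanoseconds), illustrative only
        self._write_lock = threading.Lock()  # serialises writers only

    def write(self, seconds, nanoseconds):
        with self._write_lock:
            self._seq += 1            # now odd: readers will retry
            self._value = (seconds, nanoseconds)
            self._seq += 1            # even again: value is consistent

    def read(self):
        # Readers never take the lock, so they cannot starve the writer.
        while True:
            start = self._seq
            if start % 2:             # a write is in progress, try again
                continue
            value = self._value
            if self._seq == start:    # no write happened while we read
                return value

clock = SeqLockClock()
clock.write(10, 500)
print(clock.read())
```

The design trade-off is that readers may have to repeat their read, but the writer is never delayed by them.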

'''Scheduler:''' The multiqueue scheduler was the third major bottleneck. Altogether, it consumed 25% of the CPU time. It had two problems: its spinlock accounted for most of that time, while the rest went into computing and recomputing information in the cache. These problems were fixed by replacing it with a more efficient scheduler, the O(1) scheduler.

'''CPU:''' The next two bottlenecks are related: both are examples of coarse-grained locks eating CPU time. Granularity refers to the execution time of a code segment; the closer a segment is to the duration of an atomic action, the finer its granularity. One coarse-grained bottleneck was the dcache_lock. It consumed an acceptable amount of time in normal use, but it was also taken in the much more frequently called dnotify_parent() function, which made it unacceptable. The dcache_lock strategy was therefore replaced with a finer-grained strategy from a later version of Linux.

'''Kernel Lock:''' The other big coarse-grained bottleneck in the system is the "Big Kernel Lock" (BKL), Linux's kernel-wide synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred approach on Linux NUMA systems was to limit the BKL's usage: the ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck.

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

The introduction of Windows NT and OS/2 brought a design in which threads are cheap while processes are expensive. UMS, which reflects this design, is the recommended mechanism for high-performance applications that handle many threads on multicore systems. The application has to implement a scheduler to manage the UMS threads and decide when they should run or stop. This approach is not desirable for applications with moderate performance requirements, because concurrent execution of this sort allows non-intuitive outcomes such as race conditions, which demand careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the desired utility. For instance, a UMS scheduler can be assigned to each logical processor, creating affinity so that related threads run under one scheduler. This can turn out to be inefficient if there are many related threads, which could end up starving other processes.

OK for point 2: I posted a draft on the essay page, but I'm not certain whether I should talk about fibers, since they also function in user space but are not UMS. --Praubic

== Design Choices ==
'''(A) Kernel Threads and User Threads (1:1 vs M:N) ''' --[[User:Gautam|Gautam]] 00:29, 14 October 2010 (UTC) 
This is the most basic design choice. The 1:1 model offers a slim, clean library interface on top of the kernel functions. The M:N model requires a more complicated library, but it would offer advantages in areas such as signal handling. The general consensus was that the M:N design did not suit the Linux kernel because its implementation cost was too high, which led to the choice of the 1:1 model.
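
The difference between the two mappings can be illustrated with a small sketch. Python threads are themselves 1:1 with OS threads, so the example below approximates an M:N arrangement by multiplexing M logical tasks onto N worker threads; the constants and the task function are hypothetical and chosen only for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

M_TASKS = 1000    # logical units of work (the "M" side of the mapping)
N_WORKERS = 4     # OS-level threads that actually run them (the "N" side)

def task(i):
    # Return the result together with the identifier of the OS thread that ran it.
    return i * i, threading.get_ident()

with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    results = list(pool.map(task, range(M_TASKS)))

squares = [r[0] for r in results]
workers_used = {r[1] for r in results}
print(len(squares), "tasks ran on", len(workers_used), "OS threads")
```

However many tasks are submitted, the kernel only ever sees at most N_WORKERS threads, which is the property that makes M:N designs attractive for very large thread counts.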

''' (B) Signal Handling '''
The kernel implements POSIX signal handling, including the multitude of signal masks. Since a signal is only sent to a thread that has it unblocked, no unnecessary interruptions through signals occur. The kernel is also in a much better position to judge which thread is best suited to receive the signal. This only holds true if the 1:1 model is used.

''' (C) Synchronization '''
The implementation of synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option, since threads can have different priorities (besides, it wastes CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in speed and reliability, caused by spurious wakeups and by degradation of the quality of signal handling in the application. Fortunately, new functionality (the futex) was added to the kernel to implement all kinds of synchronization.

The four types of synchronization are:

*A mutex lock admits only one thread at a time into the protected section of code.
*With read/write synchronization, threads can hold shared read locks or an exclusive write lock on a protected resource. Editing the content requires the exclusive write lock, which is granted only once all read locks have been released.
*A condition variable blocks a thread until a given condition becomes true.
*A counting semaphore grants access to multiple threads. Its count tracks how many threads may access the data concurrently; once the limit is reached, other threads are blocked until a slot is released.
[[vG]]
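
As a minimal sketch of two of the primitives above, the following uses Python's standard threading module: a counting semaphore limits how many threads are inside a region at once, and a mutex protects the shared counters. The limit of 3, the ten workers, and the variable names are assumptions made for illustration.

```python
import threading
import time

LIMIT = 3                        # at most 3 threads inside at once
slots = threading.Semaphore(LIMIT)
guard = threading.Lock()         # mutex protecting the counters below
inside = 0
max_inside = 0

def worker():
    global inside, max_inside
    with slots:                  # blocks while LIMIT threads hold a slot
        with guard:
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(0.01)         # simulate work on the shared data
        with guard:
            inside -= 1

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrent threads:", max_inside)
```

The peak never exceeds LIMIT, because the eleventh-hour arrivals simply wait on the semaphore until an earlier thread releases its slot.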

''' (D) Memory Management '''
Thread memory management, from creation through maintenance to deallocation, is an important design choice when attempting to create a large number of threads in a single process. A thread's data structure is made up of a program counter, a stack, and a control block. The control block is needed for thread management, as it contains the thread's state data. Optimizing this data structure can greatly increase performance when there are many threads.

A thread can be created before the process actually needs it to run, and then wait until an idle processor becomes available. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) is a problem with this approach, since it front-loads the cost. Another problem is that the thread must allocate the memory for its stack at creation, because allocating stack memory dynamically is expensive. One way to optimize creation for large numbers of threads is to copy the thread's arguments into its control block, which allows the stack to be allocated at startup (when the thread first begins to run) rather than at creation. When the thread enters startup, it copies its arguments out of its control block and allocates its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads that are in contention); if thread memory management is done serially, these two factors combine to set a maximum rate of thread creation.

The deallocation of a thread can also be optimized to increase the scalability of threads. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into list operations; without a free list, the thread overhead would include finding a free block of memory of the right size for the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]
''' (E)Scheduling Priorities '''
A thread is an entity that is scheduled according to its scheduling priority. On Windows, this priority is a number ranging from 0 to 31; on Linux, the CFS (Completely Fair Scheduler) orders runnable threads in a red-black tree. Threads run in time slices assigned to them in round-robin fashion, and lower-priority threads wait until higher-priority ones finish their tasks. A thread's context consists of a set of machine registers plus the kernel and user stacks, all linked to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and a thread of equal (or higher) priority becomes available; an efficient context-switch implementation is essential for high scalability. For example, fibers, which are executed entirely in user space, do not require a system call during a switch, which greatly increases efficiency. --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)

== References ==

COMP 3000 Essay 1 2010 Question 7

2010-10-14T06:29:13Z

Vviveka2:

COMP 3000 Essay 1 2010 Question 7

2010-10-14T06:27:24Z

Vviveka2:

=Question=

How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

=Answer=
A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel and user-mode threads. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads. A few designs build on user threads, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources. However, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows for high scalability.
'''Taken the liberty to add Praubic's tentative first para. No changes made as of yet.'''

I have the following... have your words or remove it if not needed:)

It is possible for systems to support millions of threads or more within a single process; the system has the ability to switch execution resources between threads, thus producing concurrent execution. Concurrency is when multiple threads wait in the queues for switching and cannot run at the same time, but appear to run at the same time because of the speed at which they switch. [[vG]] You stated it is possible, but you did not state how, or rather did not make it clear. The below should be a better interpretation. --[[User:Spanke|Shane]]

Systems can support millions of threads within a single process by switching execution resources between threads, creating concurrent execution. With concurrency, multiple threads wait in queues and cannot all run at the same time, but they switch so quickly that they give the impression of executing simultaneously.
[[vG]]

----

== Scalable Threads: The Problems ==

One of the challenges in making an existing code base scalable is the identification and elimination of bottlenecks. When porting Linux to a 64-core NUMA system Ray Bryant and John Hawkes found the following bottlenecks (or just wrote a paper about them):

'''Cache-Coherency:''' Misplaced data in the cache can cause a "cache-coherency operation" to be triggered, and this operation is comparatively expensive. Once the misplaced data that causes the problem has been identified, it can be moved to limit the problem.

'''Locks:''' Some user-called locks also contribute to bottlenecks. One such lock is the xtime_lock in Linux. Readers holding the lock prevented writes to the timer value, leading to starvation. This problem was solved by using a lockless read.
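
One common way to build a lockless read is a sequence counter: the writer bumps a counter before and after each update, and a reader simply retries if the counter was odd or changed while it was reading. The sketch below assumes that approach; the source only says "lockless-read", so the class name <code>SeqCounterClock</code> and its layout are illustrative and not the kernel's actual code. A real implementation would also need memory barriers, which this Python version ignores.

```python
import threading


class SeqCounterClock:
    """Timer value that readers can sample without taking a lock."""

    def __init__(self):
        self._seq = 0                        # even: stable, odd: write in progress
        self._value = (0, 0)                 # (seconds, nanoseconds)
        self._write_lock = threading.Lock()  # serialises writers only

    def write(self, seconds, nanoseconds):
        with self._write_lock:
            self._seq += 1                   # mark the value as changing
            self._value = (seconds, nanoseconds)
            self._seq += 1                   # mark the value as stable again

    def read(self):
        # Readers never block the writer; they retry instead.
        while True:
            start = self._seq
            if start % 2:                    # a write is in progress
                continue
            value = self._value
            if self._seq == start:           # nothing changed while reading
                return value
```

Because readers never hold anything the writer needs, a flood of readers cannot starve the timer update, which is the failure described above.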

'''Scheduler:''' The multiqueue scheduler is the third major bottleneck. Altogether, the multiqueue scheduler consumed 25% of the CPU time. It had two problems: its spinlock consumed most of that time, while the rest went into computing and recomputing information in the cache. These problems were fixed by replacing it with a more efficient scheduler, the O(1) scheduler.

'''CPU:''' The next few bottlenecks are related: they are examples of coarse-granularity locks consuming CPU time. Granularity refers to the execution time of a code segment; the closer a segment is to the speed of an atomic action, the finer its granularity. One coarse-grained bottleneck was the dcache_lock. It consumed a modest amount of time in normal use, but it was also taken in the much more frequently called dnotify_parent() function, which made it unacceptable. The dcache_lock strategy was therefore replaced with a finer-grained strategy from a later version of Linux.

'''Kernel Lock:''' One big coarse-grained bottleneck in the system is the "Big Kernel Lock" (BKL), Linux's kernel-wide synchronization control. Waiting for the BKL took up as much as 70% of the CPU time on a system with only 28 cores. The preferred approach on Linux NUMA systems was to limit the BKL's usage. The ext2 and ext3 file systems were replaced with a file system that uses finer-grained locking (XFS), reducing the impact of the bottleneck.
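
The difference between coarse-grained and fine-grained locking can be shown with a small table of counters. With one global lock, every update waits on every other update. With one lock per bucket, as in the sketch below, updates to different buckets do not contend. The class <code>StripedCounter</code> is a made-up example and is not how the dcache or the BKL replacement is actually implemented.

```python
import threading


class StripedCounter:
    """Counter table protected by one lock per bucket instead of
    a single lock around the whole table."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [dict() for _ in range(stripes)]

    def _index(self, key):
        # Pick the bucket (and therefore the lock) for this key.
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._index(key)
        with self._locks[i]:              # only this bucket is locked
            self._buckets[i][key] = self._buckets[i].get(key, 0) + 1

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)
```

Setting <code>stripes=1</code> turns this back into the coarse-grained design, which makes it easy to compare the two under load.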

'''MAIN POINT 2 Paragraph draft''' --[[User:Praubic|Praubic]] 00:21, 14 October 2010 (UTC) still in progress and debating

The introduction of Windows NT and OS/2 brought a design in which threads are cheap while processes are expensive. UMS, which reflects such a design, is a recommended mechanism for high-performance applications that handle many threads on multicore systems. A scheduler has to be implemented to manage the UMS threads and decide when they should run or stop. This is not desirable for systems with moderate performance requirements, because concurrent execution of this sort naturally allows non-intuitive outcomes such as race conditions, which require careful programming and design choices. The framework used by UMS threading is divided into smaller abstractions depending on the desired utility. For instance, a UMS scheduler can be assigned to each logical processor, creating affinity so that related threads run around one scheduler. This could turn out to be inefficient if many related threads end up starving other processes.

Ok for point 2 -> I posted a draft on the essay page but Im not certain as to whether i should talk about fibers since they are also functioning on user space but theyre not UMS. --Praubic

== Design Choices ==
'''(A) Kernel Threads and User Threads (1:1 vs M:N) ''' --[[User:Gautam|Gautam]] 00:29, 14 October 2010 (UTC) 
This is the most basic design choice. The 1:1 model boasts a slim, clean library interface on top of the kernel functions. Although the M:N model requires a complicated library, it offers advantages in areas such as signal handling. The general consensus was that the M:N design was not compatible with the Linux kernel because of its high implementation cost. This gave birth to the 1:1 model.
''' (B)Signal Handling '''
The kernel implements POSIX signal handling for use with the multitude of signal masks. Since a signal is only sent to a thread that has it unblocked, no unnecessary interruptions through signals occur. The kernel is also in a much better position to judge which thread is best suited to receive the signal. This only holds true if the 1:1 model is used.
''' (C)Synchronization '''
The implementation of synchronization primitives such as mutexes, read-write locks, condition variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option, since threads can have different priorities (besides, it wastes CPU cycles). The same argument rules out the exclusive use of sched_yield. Signals were the only viable solution for the old implementation: threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability, caused by spurious wakeups and degradation of the quality of signal handling in the application. Fortunately, new functionality was added to the kernel to implement all kinds of synchronization.
''' (D)Memory Management '''
Thread memory management is an important design choice when creating a large number of threads in a single process, covering creation, maintenance and deallocation. A thread's data structure is made up of a program counter, a stack and a control block. The control block is needed for thread management because it contains the thread's state data. Optimizing this data structure can greatly increase performance when the number of threads is large.
A thread can be created before the process actually requires it to run; it then waits until an idle processor becomes available. Thread overhead (the memory, CPU time, and read/write time required to initialize the thread) is a problem with this approach, since it front-loads the cost. Another problem is that the thread must allocate the memory for its stack at creation, because allocating stack memory dynamically is expensive. One way to optimize creation for large numbers of threads is to copy the thread's arguments into its control block. This allows the thread's stack to be allocated at startup (when the thread starts being used) rather than at creation. When the thread enters startup, it copies its arguments out of its control block and allocates its memory. Thread creation is governed by latency (the cost of thread management on the system) and throughput (the rate at which the system can create, start, and finish threads that are in contention). If thread memory management is done serially, these two factors combine to set a maximum rate of thread creation.
Thread deallocation can also be optimized to increase scalability. Storing deallocated stacks and control blocks in a free list turns allocation and deallocation into list operations. Without a free list, thread overhead would include searching for a free region of memory large enough to hold the stack. [http://portal.acm.org/citation.cfm?id=75378] [[hirving]]
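
The two optimizations in this section (deferring stack allocation until startup, and recycling stacks and control blocks through free lists) can be sketched together. This is a toy model in Python rather than a real thread package: <code>ControlBlock</code>, <code>ThreadAllocator</code> and the <code>fresh_allocations</code> counter are names invented for the example, and a <code>bytearray</code> stands in for stack memory.

```python
class ControlBlock:
    """Per-thread state: saved arguments, stack and a status field."""

    def __init__(self):
        self.args = None
        self.stack = None
        self.state = "free"


class ThreadAllocator:
    def __init__(self, stack_size=4096):
        self.stack_size = stack_size
        self.free_blocks = []        # recycled control blocks
        self.free_stacks = []        # recycled stacks
        self.fresh_allocations = 0   # how often a new stack had to be made

    def create(self, *args):
        # Creation is cheap: reuse a control block and park the
        # arguments in it. No stack is allocated yet.
        block = self.free_blocks.pop() if self.free_blocks else ControlBlock()
        block.args = args
        block.stack = None
        block.state = "created"
        return block

    def startup(self, block):
        # The stack is only attached when the thread actually starts.
        if self.free_stacks:
            block.stack = self.free_stacks.pop()
        else:
            block.stack = bytearray(self.stack_size)
            self.fresh_allocations += 1
        block.state = "running"
        args, block.args = block.args, None  # copy arguments out of the block
        return args

    def finish(self, block):
        # Deallocation is a pair of list appends.
        self.free_stacks.append(block.stack)
        block.stack = None
        block.state = "free"
        self.free_blocks.append(block)
```

After the first thread finishes, later threads reuse its stack and control block, so <code>fresh_allocations</code> stops growing as long as threads finish about as fast as they start.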
''' (E)Scheduling Priorities '''
A thread is an entity that is scheduled according to its scheduling priority. On Windows, this priority is a number ranging from 0 to 31; on Linux, the CFS (Completely Fair Scheduler) orders runnable threads in a red-black tree. Threads run in time slices assigned to them in round-robin fashion, and lower-priority threads wait until higher-priority ones finish their tasks. A thread's context consists of a set of machine registers plus the kernel and user stacks, all linked to the address space of the process in which the thread resides. A context switch occurs when the time slice elapses and a thread of equal (or higher) priority becomes available; an efficient context-switch implementation is essential for high scalability. For example, fibers, which are executed entirely in user space, do not require a system call during a switch, which greatly increases efficiency. --[[User:Praubic|Praubic]] 18:24, 13 October 2010 (UTC)
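
To give a feel for switching entirely in user space, the sketch below uses Python generators as a stand-in for fibers. Each <code>yield</code> hands control back to a small round-robin scheduler without the scheduler itself making any system call. This is an analogy only: real fibers and UMS threads save and restore machine registers, and the names <code>task</code> and <code>run_round_robin</code> are invented for the example.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task that records its progress and then yields."""
    for i in range(steps):
        log.append((name, i))
        yield                      # give control back to the scheduler


def run_round_robin(tasks):
    """Run the tasks in round-robin order until all have finished.
    Returns the number of times a task was suspended and requeued."""
    ready = deque(tasks)
    switches = 0
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the task until its next yield
        except StopIteration:
            continue               # task finished, drop it
        ready.append(current)
        switches += 1
    return switches
```

Because each suspended task is just a small heap object, a single process can hold a very large number of them, which is the same reason user-mode threads scale better than kernel threads.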

== References ==

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-14T04:11:39Z

Vviveka2:

== Log ==
'''Suggestion:''' Let us maintain our edits here instead of littering the main page with our names. Also, please do not edit without writing to the log, so that we know who has done what and when.

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity. Everyone should post your interpretation of the question in simplest possible English so we`re on the same page (as someone, maybe me, seems to have the wrong idea about what we`re trying to talk about)
More moving for clarity. added an essay outline at bottom (feel free to change)
filled in the outline somewhat added questions to the outline for everyone to think on.--[[User:Rannath|Rannath]]

First Draft for essay. Please modify and add on. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

Added to the memory management section. --[[User:Hirving|Hirving]] 21:42, 13 October 2010 (UTC)

Edited Scalable Threads Problems. Also did a little re-arrangement. --[[User:Gautam|Gautam]] 01:03, 14 October 2010 (UTC)

 <Add your future activities here>

== The Question ==
'''Original: '''
How is it possible for systems to supports millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about number and scalability of threads not the gross mechanics.

To be more clear: we can limit ourselves from the thread implementations to the thread scalability... ignore the stuff that required for all threads, unless its required for many threads. (I didn't find any implementations that required hardware)

I would also argue that since OSs have to run on multiple hardwares one cannot guarantee that unique/rare hardware bits will be there. While we can talk about hardware we should limit it to a mention at most. OR we could mention prospective hardware that could help out, but is not yet standard. It depends on whether we want to do "as it is" or "as it might be"

utility of such massively scalable thread implementations. I took this as: what functionality (of single strings) does one have to give up to make threads scalable.

'''Gautam: '''
I think the hardware is as relevant as the software. Not all things can be done in software and hardware support is an important factor in most of the solutions to many problems that OS face. My take.

'''Henry: '''
Since the question is about the system as a whole, I think the answer should include both software and hardware support for large amounts of threads. The questions revolves around how a system can handle millions of threads and what are the major factors that allow the system to do it. Also, the last part of the question seems to ask what this amount of threads allows a process to do.

'''Shane: '''
In response to the above's idea on the last part of the question, I would argue that it would enable fast execution because all threads that receive a cache miss would be picked up by the other threads so long as there was enough resources. Also the use of more threads would help synchronize the cache (through sharing) so that it would not miss. Of course this would be if they were assigned to the same task, you cannot sync threads running different applications it just wouldn't make sense. The only issue with this idea is the software must support this number.

== Group 7 ==

Let us start out by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek <support.tamiltreasure@gmail.com>

Shane Panke <shanepanke@msn.com>

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Multi-threading is a term used to describe:

* A facility provided by the operating system that enables an application to create threads of execution within a process
* Applications whose architecture takes advantage of the multi-threading provided by the operating system
[[vG]]
----
These are all related ideas.

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user mode threads) provide very quick and efficient switching because there is no need for a system call and kernel is oblivious to a switch - allows for millions of user mode threads. ISSUES: Blocking system calls disables all other fibers.
On the other hand managing threads through the kernel requires context switch (between user and kernel mode) on creation and removal of a thread therefore programs with prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

One implementation of UMS is: combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5'' -[[Rannath]]

----

I would scrap the first two below, at most mention them...

#time-division multiplexing
#threads vs processes
#I/O Scheduling -[[vG]]

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. It makes it possible for the processor to switch between different threads but it happens so fast that the user sees it as it is running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
This is a '''need''' section 4 below is not '''needed''' 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes
4. Examples of systems that require many threads such as mainframe servers or banking client processing.--[[User:Praubic|Praubic]] 17:34, 11 October 2010 (UTC)

Here is an example of a design: (the topic asks for key design choices here is one)

Capriccio is a specific design for scalable user-level threads. It is distinct from most designs in being independent of event-based mechanisms as well as kernel thread models. It is a very good choice for internet servers, and this implementation could easily support 100,000 threads. It is characterized by high scalability, efficient stack management and scheduling based on resource usage; however, its performance is not comparable to event-based systems.--[[User:Praubic|Praubic]] 13:32, 12 October 2010 (UTC)

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step, or the last step, we can always present our info and make our thesis match the info.
#List all questions and points we have about the topic

Questions:
#What makes threads non-scalable? List the problems
#What utility do some scalable implementations lack? Why?
#Just how scalable does a full utility implementation get?

Answers:
#
# Signals, portability(maybe) both add overhead which would slow down threads
#
----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point 1 -[[Rannath]] 
- efficient thread creation/destruction is more scalable 
-- NPTL's improvements over LinuxThreads- primarily due to lower overhead of creation/destruction ''1''

Main Point 2 -[[Rannath]] 
- UMS & user-space threads are more scalable - maybe 
-- context switches are costly ''From class'' 
-- blocking locks have lower latency when twinned with a user space scheduler ''8''

Ok for point 2 -> I posted a draft on the essay page but Im not certain as to whether i should talk about fibers since they are also functioning on user space but theyre not UMS. --[[User:Praubic|Praubic]] 00:18, 14 October 2010 (UTC)

Main Point 3 
- Certain bottlenecks appear in scaled implementations; removing these improves scalability. 
-- "False cache-line sharing" ''14'' 
-- xtime lock to a lockless lock ''14''

Main Point 3.5 
Fine-grain over coarse-grain 
-- "Big Kernel Lock" ''14'' 
-- dcache_lock ''14''

Link the Main points to the thesis

----

Conclusion
# restate info
# affirmation of thesis

Here is the first paragraph that I attempted. Please feel free to change or even delete it from here.

A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel and user-mode threads. Kernel threads are usually considered heavier, and designs that rely on them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads, and a few designs build on them, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources; however, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows for high scalability.
--[[User:Praubic|Praubic]] 19:04, 12 October 2010 (UTC)

we can add this for intro paragraph:

How is it possible for systems to supports millions of threads or more within a single process?

It is possible for systems to support millions of threads or more within a single process; the system has the ability to switch execution resources between threads, thus producing concurrent execution. Concurrency is when multiple threads wait in the queues for switching and cannot run at the same time, but appear to run at the same time because of the speed at which they switch. [[vG]]

----
I suggest that we start filling out the main points of the essay. We can discuss the intricacies as we go along. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6590&rep=rep1&type=pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# [http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt Interesting presentation: really worth checking out] [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)
# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83 Scalability in linux]
# [http://hillside.net/plop/2007/papers/PLoP2007_Ahluwalia.pdf This has something to do with our question...]
# [http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx Scheduling Priorities (Windows)], Microsoft (23 September 2010) --[[User:Spanke|Shane]]
# [http://www.novell.com/coolsolutions/feature/14878.html Linux Scheduling Priorities Explained], Novell (11 October 2005) --[[User:Spanke|Shane]]
# [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/ Inside the Linux 2.6 Completely Fair Scheduler], IBM (15 December 2009) --[[User:Spanke|Shane]]

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-14T03:41:39Z

Vviveka2:

== Log ==
'''Suggestion:''' Let us maintain our edits here instead of littering the main page with our names. Also, please do not edit without writing to the log, so that we know who has done what and when.

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity. Everyone should post your interpretation of the question in simplest possible English so we`re on the same page (as someone, maybe me, seems to have the wrong idea about what we`re trying to talk about)
More moving for clarity. added an essay outline at bottom (feel free to change)
filled in the outline somewhat added questions to the outline for everyone to think on.--[[User:Rannath|Rannath]]

First Draft for essay. Please modify and add on. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

Added to the memory management section. --[[User:Hirving|Hirving]] 21:42, 13 October 2010 (UTC)

Edited Scalable Threads Problems. Also did a little re-arrangement. --[[User:Gautam|Gautam]] 01:03, 14 October 2010 (UTC)

 <Add your future activities here>

== The Question ==
'''Original: '''
How is it possible for systems to supports millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about number and scalability of threads not the gross mechanics.

To be more clear: we can limit ourselves from the thread implementations to the thread scalability... ignore the stuff that required for all threads, unless its required for many threads. (I didn't find any implementations that required hardware)

I would also argue that since OSs have to run on multiple hardwares one cannot guarantee that unique/rare hardware bits will be there. While we can talk about hardware we should limit it to a mention at most. OR we could mention prospective hardware that could help out, but is not yet standard. It depends on whether we want to do "as it is" or "as it might be"

utility of such massively scalable thread implementations. I took this as: what functionality (of single strings) does one have to give up to make threads scalable.

'''Gautam: '''
I think the hardware is as relevant as the software. Not all things can be done in software and hardware support is an important factor in most of the solutions to many problems that OS face. My take.

'''Henry: '''
Since the question is about the system as a whole, I think the answer should include both software and hardware support for large amounts of threads. The questions revolves around how a system can handle millions of threads and what are the major factors that allow the system to do it. Also, the last part of the question seems to ask what this amount of threads allows a process to do.

'''Shane: '''
In response to the above's idea on the last part of the question, I would argue that it would enable fast execution because all threads that receive a cache miss would be picked up by the other threads so long as there was enough resources. Also the use of more threads would help synchronize the cache (through sharing) so that it would not miss. Of course this would be if they were assigned to the same task, you cannot sync threads running different applications it just wouldn't make sense. The only issue with this idea is the software must support this number.

== Group 7 ==

Let us start out by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek <support.tamiltreasure@gmail.com>

Shane Panke <shanepanke@msn.com>

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]
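
To put rough numbers on the per-thread footprint point above, here is a back-of-the-envelope sketch in Python. The stack sizes are assumptions chosen for illustration (8 MiB is a common default stack reservation for Linux threads, and 4 KiB stands in for a very light user-level thread), and <code>total_stack_bytes</code> is a hypothetical helper, not something taken from the sources.

```python
# Back-of-the-envelope: stack space reserved by N threads at a given stack size.
# The stack sizes used below are illustrative assumptions, not measurements.

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def total_stack_bytes(num_threads: int, stack_bytes: int) -> int:
    """Total stack space reserved if every thread gets `stack_bytes`."""
    return num_threads * stack_bytes


if __name__ == "__main__":
    n = 1_000_000
    heavy = total_stack_bytes(n, 8 * MIB)  # assumed heavyweight default stack
    light = total_stack_bytes(n, 4 * KIB)  # assumed lightweight stack
    print(f"{n} threads x 8 MiB = {heavy / GIB:.1f} GiB")
    print(f"{n} threads x 4 KiB = {light / GIB:.2f} GiB")
```

Under these assumptions a million heavyweight threads would reserve several terabytes of address space for stacks alone, while a million lightweight ones would need only a few gigabytes. Note that this counts reserved address space, which is not necessarily committed memory.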

Multi-threading is a term used to describe:

* A facility provided by the operating system that enables an application to create threads of execution within a process
* Applications whose architecture takes advantage of the multi-threading provided by the operating system
[[vG]]
----
These are all related ideas.

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user-mode threads) provide very quick and efficient switching because no system call is needed and the kernel is oblivious to the switch, which allows for millions of user-mode threads. ISSUES: a blocking system call in one fiber blocks all the other fibers on the same kernel thread.
On the other hand, managing threads through the kernel requires a context switch (between user and kernel mode) on creation and removal of a thread, so programs with a prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''
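
A minimal sketch of the user-mode switching idea, using Python generators as stand-ins for fibers. The names <code>fiber</code> and <code>run_round_robin</code> are hypothetical and chosen only for this illustration; real fiber libraries differ. Everything runs cooperatively inside one kernel thread, so the kernel never sees a switch. It also shows the issue noted above: if one fiber made a blocking call instead of yielding, no other fiber would run.

```python
from collections import deque


def fiber(name, steps, log):
    """A toy fiber: does `steps` units of work, yielding after each one."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntarily hand control back to the user-space scheduler


def run_round_robin(fibers):
    """Run fibers in round-robin order; return the number of resumptions
    that ended in a yield (i.e. user-space switches back to the scheduler)."""
    ready = deque(fibers)
    switches = 0
    while ready:
        current = ready.popleft()
        try:
            next(current)          # resume the fiber until its next yield
        except StopIteration:
            continue               # fiber finished; drop it
        ready.append(current)      # still has work; requeue it
        switches += 1
    return switches


if __name__ == "__main__":
    log = []
    n = run_round_robin([fiber("a", 2, log), fiber("b", 2, log)])
    print(log, n)
```

Because each fiber here is just a small generator object, creating a very large number of them is cheap compared with creating the same number of kernel threads.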

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

One implementation of UMS is a combination of N:N and N:M, where the N:N relationship exposes N virtual processors to user space so that the user can handle scheduling on their own. ''5'' -[[Rannath]]

----

I would scrap the first two below, at most mention them...

#time-division multiplexing
#threads vs processes
#I/O Scheduling -[[vG]]

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. The processor switches between different threads so quickly that the user perceives them as running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
This is a '''need''' section 4 below is not '''needed''' 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes
4. Examples of systems that require many threads such as mainframe servers or banking client processing.--[[User:Praubic|Praubic]] 17:34, 11 October 2010 (UTC)

Here is an example of a design: (the topic asks for key design choices here is one)

Capriccio is a specific design for scalable user-level threads. It is distinct from most designs in being independent of event-based mechanisms as well as kernel thread models. It is a very good choice for internet servers, and this implementation could easily support 100,000 threads. It is characterized by high scalability, efficient stack management and scheduling based on resource usage; however, the performance is not comparable to event-based systems.--[[User:Praubic|Praubic]] 13:32, 12 October 2010 (UTC)

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step, or the last step, we can always present our info and make our thesis match the info.
#List all questions and points we have about the topic

Questions:
#What makes threads non-scalable? List the problems
#What utility do some scalable implementations lack? Why?
#Just how scalable does a full utility implementation get?

Answers:
#
# Signals, portability(maybe) both add overhead which would slow down threads
#
----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point 1 -[[Rannath]] 
- efficient thread creation/destruction is more scalable 
-- NPTL's improvements over LinuxThreads- primarily due to lower overhead of creation/destruction ''1''
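
As a hedged illustration of why creation/destruction cost matters: the sketch below does not reproduce NPTL's approach (which makes creation itself cheaper) but the related thread-pool design choice listed earlier, where a fixed set of workers is reused so that creation cost is paid only a few times. It uses Python's standard <code>concurrent.futures.ThreadPoolExecutor</code>; <code>run_tasks</code> is a hypothetical helper.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_tasks(num_tasks, num_workers):
    """Run `num_tasks` small jobs on at most `num_workers` reused threads.

    Returns the job results and the set of thread identifiers that ran them.
    """
    def task(n):
        return n * n, threading.get_ident()

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(task, range(num_tasks)))

    squares = [sq for sq, _ in results]
    thread_ids = {tid for _, tid in results}
    return squares, thread_ids


if __name__ == "__main__":
    squares, thread_ids = run_tasks(1000, 4)
    print(len(squares), "tasks ran on", len(thread_ids), "thread(s)")
```

However many tasks are submitted, no more than <code>num_workers</code> threads are ever created, so the per-task cost no longer includes thread creation and destruction.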

Main Point 2 -[[Rannath]] 
- UMS & user-space threads are more scalable - maybe 
-- context switches are costly ''From class'' 
-- blocking locks have lower latency when twinned with a user space scheduler ''8''

Ok for point 2 -> I posted a draft on the essay page, but I'm not certain whether I should talk about fibers, since they also function in user space but they're not UMS. --[[User:Praubic|Praubic]] 00:18, 14 October 2010 (UTC)

Main Point 3 
- Certain bottlenecks appear in scaled implementations; removing these improves scalability. 
-- "False cache-line sharing" ''14'' 
-- xtime lock to a lockless lock ''14''

Main Point 3.5 
Fine-grained over coarse-grained locking 
-- "Big Kernel Lock" ''14'' 
-- dcache_lock ''14''
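
A rough sketch of the fine-grained versus coarse-grained idea in Python. This is only an analogy for the kernel locks named above, not their actual implementation: <code>CoarseCounter</code>, <code>StripedCounter</code> and <code>hammer</code> are hypothetical names, and the striping-by-hash scheme is an assumption made for the example. Because of CPython's global interpreter lock, the sketch shows the structure of the two designs rather than a real speedup.

```python
import threading


class CoarseCounter:
    """One lock guards every key (analogous to a single big lock)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key):
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1

    def get(self, key):
        with self._lock:
            return self._counts.get(key, 0)


class StripedCounter:
    """Keys are spread over several locks, so unrelated keys rarely contend."""

    def __init__(self, stripes=16):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [{} for _ in range(stripes)]

    def _stripe(self, key):
        return hash(key) % len(self._locks)

    def increment(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            self._counts[i][key] = self._counts[i].get(key, 0) + 1

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)


def hammer(counter, num_threads=8, increments=1000):
    """Each thread increments its own key; returns the final per-key counts."""
    def worker(key):
        for _ in range(increments):
            counter.increment(key)

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [counter.get(k) for k in range(num_threads)]
```

Both versions give the same counts; the difference is that in the striped version threads working on different keys usually take different locks, which is the property that matters as the number of threads grows.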

Link the Main points to the thesis

----

Conclusion
# restate info
# affirmation of thesis

Here is the first paragraph that I attempted. Please feel free to change or even delete it from here.

A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require fewer system resources than concurrent cooperating processes and are much cheaper to start, so millions of them may exist in a single process. The two major types of threads are kernel and user-mode. Kernel threads are usually considered heavier, and designs that involve them are not very scalable. User threads, on the other hand, are mapped to kernel threads by a threads library such as libpthreads, and a few designs incorporate them, mainly Fibers and UMS (User Mode Scheduling), which allow for very high scalability. UMS threads have their own context and resources; however, the ability to switch in user mode makes them more efficient (depending on the application) than thread pools, which are yet another mechanism that allows for high scalability.
--[[User:Praubic|Praubic]] 19:04, 12 October 2010 (UTC)

----
I suggest that we start filling out the main points of the essay. We can discuss the intricacies as we go along. --[[User:Gautam|Gautam]] 02:46, 13 October 2010 (UTC)

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.6590&rep=rep1&type=pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# [http://www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt Interesting presentation: really worth checking out] [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)
# [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.7621&rep=rep1&type=pdf#page=83 Scalability in linux]
# [http://hillside.net/plop/2007/papers/PLoP2007_Ahluwalia.pdf This has something to do with our question...]
# [http://msdn.microsoft.com/en-us/library/ms685100%28VS.85%29.aspx Scheduling Priorities (Windows)], Microsoft (23 September 2010) --[[User:Spanke|Shane]]
# [http://www.novell.com/coolsolutions/feature/14878.html Linux Scheduling Priorities Explained], Novell (11 October 2005) --[[User:Spanke|Shane]]
# [http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/ Inside the Linux 2.6 Completely Fair Scheduler], IBM (15 December 2009) --[[User:Spanke|Shane]]

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-11T16:04:38Z

Vviveka2: /* Facts We have */

== The Question ==
'''Original: '''
How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about the number and scalability of threads, not the gross mechanics. I think we can safely limit ourselves to software-side only.

'''Paul: '''
From what I see, our topic has three parts:
# How is it possible for systems to support millions of threads or more within a single process?
# What are the key design choices that make such systems work -
# and how do those choices affect the utility of such massively scalable thread implementations?
We need to find a way to split it between 5 people so everyone focuses primarily on one aspect.
If you guys don't mind, I'd like to discuss the format of our essay. [[Paul]]

== Group 7 ==

Let us start by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek - support.tamiltreasure@gmail.com

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Log ==
Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity

Everyone should post their interpretation of the question in the simplest possible English so we're on the same page (as someone, maybe me, seems to have the wrong idea about what we're trying to talk about) --[[User:Rannath|Rannath]]

<Add your future activities here>

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user-mode threads) provide very quick and efficient switching because no system call is needed and the kernel is oblivious to the switch, which allows for millions of user-mode threads. ISSUES: a blocking system call in one fiber blocks all the other fibers on the same kernel thread.
On the other hand, managing threads through the kernel requires a context switch (between user and kernel mode) on creation and removal of a thread, so programs with a prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

#time-division multiplexing
#threads vs processes
#I/O Scheduling

[[vG]]
Design Decisions:
# it looks like one is done with a combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5''

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. The processor switches between different threads so quickly that the user perceives them as running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

----
Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step.
#List all questions and points we have about the topic

----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point
# the idea
# the supporting evidence

----

Conclusion
# how the main points link together
# affirmation of thesis

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://people.redhat.com/drepper/nptl-design.pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# Interesting presentation: really worth checking out www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-11T16:03:46Z

Vviveka2: /* Facts We have */

== The Question ==
'''Original: '''
How is it possible for systems to support millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

'''Rannath: '''
The question seems to be about the number and scalability of threads, not the gross mechanics. I think we can safely limit ourselves to software-side only.

'''Paul: '''
From what I see, our topic has three parts:
# How is it possible for systems to support millions of threads or more within a single process?
# What are the key design choices that make such systems work -
# and how do those choices affect the utility of such massively scalable thread implementations?
We need to find a way to split it between 5 people so everyone focuses primarily on one aspect.
If you guys don't mind, I'd like to discuss the format of our essay. [[Paul]]

== Group 7 ==

Let us start by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek - support.tamiltreasure@gmail.com

Henry Irving <sens.henry@gmail.com>

Paul Raubic <paul_raubic@hotmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: --[[User:Rannath|Rannath]]

Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

== Log ==
Please maintain a log of your activities in the Log Section. So that we can keep track of the evolution of the essay. --[[User:Gautam|Gautam]]

Moved around some info for clarity

Everyone should post their interpretation of the question in the simplest possible English so we're on the same page (as someone, maybe me, seems to have the wrong idea about what we're trying to talk about) --[[User:Rannath|Rannath]]

<Add your future activities here>

== Facts We have ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far we have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere
# least number (none if at all possible) of context switches per-thread - ''5''
# use of a "thread pool" - ''3''
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space. --[[User:Rannath|Rannath]]

Ok, since we are discussing design choices maybe we could also elaborate on the two major types of threads. Here, I already wrote a few lines, source can be found in citation section:

''Fibers (user-mode threads) provide very quick and efficient switching because no system call is needed and the kernel is oblivious to the switch, which allows for millions of user-mode threads. ISSUES: a blocking system call in one fiber blocks all the other fibers on the same kernel thread.
On the other hand, managing threads through the kernel requires a context switch (between user and kernel mode) on creation and removal of a thread, so programs with a prodigious number of threads would suffer huge performance hits.--[[User:Praubic|Praubic]] 18:05, 10 October 2010 (UTC)''

User-mode scheduling (UMS) is a light-weight mechanism that applications can use to schedule their own threads. The ability to switch between threads in user mode makes UMS more efficient than thread pools for short-duration work items that require few system calls. [[Paul]]

#time-division multiplexing
#threads vs processes
#I/O Scheduling

Design Decisions:
# it looks like one is done with a combination of N:N and N:M, where the N:N relationship reveals N false processors to the user-space so the user can deal with scheduling on their own. ''5''

Splitting this off because I don't think it's technically part of the answer 
Multithreading generally occurs by time-division multiplexing. The processor switches between different threads so quickly that the user perceives them as running at the same time. [[User:vG]]

----
Things that we '''need''' to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
(A)Design Decisions
1. Type of threading (1:1 1:N M:N)
2. Signal handling - we might be able to leave this out as it seems some "light weight" threads use no signals
3. Synchronisation
4. Memory Handling
5. Scheduling Priorities (context switching and how it affects the CPU threading process)[[Paul]]
----

----
Things we might want also to cover in the essay (non-essentials here): --[[User:Rannath|Rannath]] 04:43, 10 October 2010 (UTC) 
(A)Design Decisions
1. Brief History of threading
2. examples of attempts at getting absurd numbers of threads (failures)
3. other types of threading, including heavy weight and processes

(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Essay Outline ==

#Thesis is an answer to the question so... that's the first step.
#List all questions and points we have about the topic

----

Intro (fill in info)
# Thesis
# main topics

----

Body (made of many main points)

Main Point
# the idea
# the supporting evidence

----

Conclusion
# how the main points link together
# affirmation of thesis

== Sources ==

# Short history of threads in Linux and new implementation of them. [http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ] [[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)
# This paper discusses the design choices [http://people.redhat.com/drepper/nptl-design.pdf Native POSIX Threads] [[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)
# lightweight threads vs kernel threads [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf PicoThreads: Lightweight Threads in Java] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# [http://eigenclass.org/hiki/lightweight-threads-with-lwt Eigenclass Comparing lightweight threads] --[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)
# A lightweight thread implementation for Unix [http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf Implementing light weight threads] --[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC) [[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC)
#Not in this group, but I thought that this paper was excellent: [http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf Qthreads: An API for Programming with Millions of Lightweight Threads]
# Difference between single and multi threading [http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process] [[vG]]
# [http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler] --[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)
# Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading] --[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)
# This site provides in-depth info about threads, threads-pooling, scheduling: http://msdn.microsoft.com/en-us/library/ms684841(VS.85).aspx [[Paul]]
# Here is another site that outlines THREAD designs and techniques: http://people.csail.mit.edu/rinard/osnotes/h2.html [[Paul]]
# Interesting presentation: really worth checking out www.cosc.brocku.ca/Offerings/4P13/slides/threads.ppt [[Paul]]
# KERNEL vs USERMODE http://www.wordiq.com/definition/Thread_(computer_science)--[[User:Praubic|Praubic]] 18:06, 10 October 2010 (UTC)

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-07T20:09:11Z

Vviveka2:

== Group 7 ==

Let us start by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek - support.tamiltreasure@gmail.com

Henry Irving <sens.henry@gmail.com>

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: -Rannath

== Essay Rough ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far I have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere --[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)
# least number (none if at all possible) of context switches per-thread - some linux implementation --[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)
# use of a "thread pool" - java picothreads article --[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)
#Multithreading generally occurs by time-division multiplexing. The processor switches between different threads so quickly that the user perceives them as running at the same time. [[User:vG]]
The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space.--[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)

----
Things that we need to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
(A)Design Decisions
1. Type of threading (1:1 M:N)
2. Signal handling
3. Synchronisation
4. Memory Handling
(B)Kernel
1. Program Thread manipulation through system calls --[[User:Hirving|Hirving]] 20:05, 7 October 2010 (UTC)

(C)Hardware --[[User:Hirving|Hirving]] 19:55, 7 October 2010 (UTC)
1. Simultaneous Multithreading
2. Multi-core processors

== Sources ==

A Webpage. However found it really interesting.
[http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ]
[[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)

This paper discusses the design choices [http://people.redhat.com/drepper/nptl-design.pdf Native POSIX Threads]
[[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)

A paper with low-footprint(lightweight) threads vs kernel threads (for Java :( )
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf
--[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)

a comparison of lightweight threads
http://eigenclass.org/hiki/lightweight-threads-with-lwt
--[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)

a lightweight thread implementation for Unix
http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf
--[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC)

[[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC) Not in this group, but I thought that this paper was excellent: http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf

Difference between single and multi threading
http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process
[[vG]]

[http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler]
--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)

Research Group working on Simultaneous Multithreading [http://www.cs.washington.edu/research/smt/ Simultaneous Multithreading]--[[User:Hirving|Hirving]] 19:58, 7 October 2010 (UTC)

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-07T19:39:06Z

Vviveka2:

== Group 7 ==

Let us start by listing our names and email IDs (preferred).

Gautam Akiwate <gautam.akiwate@gmail.com>

Patrick Young(rannath) <rannath@gmail.com>

vG Vivek - support.tamiltreasure@gmail.com

== Guidelines ==

Raw info should have some indication of where you got it for citation.

Claim your info so we don't need to dig for who got what when we need clarification.

Feel free to provide info for or edit someone else's info, just keep their signature so we can discuss changes

sign changes (once) preferably without time stamps Ex: -Rannath

== Essay Rough ==
Start by placing the info here so we can sort through it. I'm going to go into full research/essay writing mode on Sunday if there isn't enough here.

So far I have:
Three design choices I've seen:
# Smallest possible footprint per-thread (being extremely light weight) - from everywhere --[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)
# least number (none if at all possible) of context switches per-thread - some linux implementation --[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)
# use of a "thread pool" - java picothreads article --[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)

The idea is to reduce processor time and storage needed per-thread so you can have more in the same amount of space.--[[User:Rannath|Rannath]] 00:28, 7 October 2010 (UTC)

----
Things that we need to cover in the essay:--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC) 
(A)Design Decisions
1. Type of threading (1:1 M:N)
2. Signal handling
3. Synchronisation
4. Memory Handling
(B)Kernel (?)

This paper is really insightful. [http://sc.tamu.edu/whitepapers/altix/nptl-design.pdf Native POSIX Threads]
--[[User:Gautam|Gautam]] 19:35, 7 October 2010 (UTC)

== Sources ==

A Webpage. However found it really interesting.
[http://www.drdobbs.com/open-source/184406204;jsessionid=3MRSO5YMO1QVRQE1GHRSKHWATMY32JVN NPTL: The New Implementation of Threads for Linux ]
[[User:Gautam|Gautam]] 22:18, 5 October 2010 (UTC)

[http://hdl.handle.net/1853/6804 Implementation of Scalable Blocking Locks using an Adaptative Thread Scheduler]
[[User:Gautam|Gautam]] 22:11, 5 October 2010 (UTC)

A paper with low-footprint(lightweight) threads vs kernel threads (for Java :( )
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.32.9043&rep=rep1&type=pdf
--[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)

a comparison of lightweight threads
http://eigenclass.org/hiki/lightweight-threads-with-lwt
--[[User:Rannath|Rannath]] 00:23, 6 October 2010 (UTC)

a lightweight thread implementation for Unix
http://www.usenix.org/publications/library/proceedings/sa92/stein.pdf
--[[User:Rannath|Rannath]] 00:49, 6 October 2010 (UTC)

[[User:Gbint|Gbint]] 19:50, 5 October 2010 (UTC) Not in this group, but I thought that this paper was excellent: http://www.sandia.gov/~rcmurph/doc/qt_paper.pdf

Difference between single and multi threading
http://wiki.answers.com/Q/Single_threaded_Process_and_Multi-threaded_Process
[[vG]]

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-07T19:35:39Z

Vviveka2:

Talk:COMP 3000 Essay 1 2010 Question 7

2010-10-07T19:27:33Z

Vviveka2: