Question

How is it possible for systems to supports millions of threads or more within a single process? What are the key design choices that make such systems work - and how do those choices affect the utility of such massively scalable thread implementations?

Answer

A thread is an independent task that executes in the same address space as other threads within a single process while sharing data synchronously. Threads require less system resources then concurrent cooperating processes and start much easier therefore there may exist millions of them in a single process. The two major types of threads are kernel and user-mode. Kernel threads are usually considered more heavy and designs that involve them are not very scalable User threads on the other hand are mapped to kernel threads by the threads library such as libpthreads. and there are a few designs that incorporate it mainly Fibers and UMS (User Mode Scheudling) which allow for very high scalability. UMS threads have their own context and resources however the ability to switch in the user mode makes them more efficient (depending on application) than Thread Pools which are yet another mechanism that allows for high scalability.
Taken the liberty to add Praubic's tentative first para. No changes made as of yet.

Design Choices

(A) Kernel Threads and User Threads (1:1 vs M:N)
This is the most basic design choice. The 1:1 boasts of a slim clean library interface on top of the kernel functions. On the other hand the M:N would implement a complicated library though it would offer advantages in areas of signal handling. However a general consensus was that the M:N design was not compatible with the Linux kernel. Also the cost to implement the same would have been to high. Hence the 1:1 model was decided upon.
(B)Signal Handling
The kernel implements the POSIX signal handling. The kernel has to handle the multitude of signal masks. Since the signal will only be sent to a thread if it is unblocked no unnecessary interruptions through signals occur. The kernel is also in a much better situation to judge which is the best thread to receive the signal. Obviously this helps only if the 1-on-1 model is used.
(C)Synchronization
The implementation of the synchronization primitives such as mutexes, read-write locks, conditional variables, semaphores, and barriers requires some form of kernel support. Busy waiting is not an option since threads can have different priorities (beside wasting CPU cycles). The same argument rules out the exclusive use of sched yield. Signals were the only viable solution for the old implementation. Threads would block in the kernel until woken by a signal. This method has severe drawbacks in terms of speed and reliability caused by spurious wakeups and derogation of the quality of the signal handling in the application.Fortunately some new functionality was added to the kernel to implement all kinds of synchronization.
(D)Memory Allocation
One of the goals for the library is to have low startup costs for threads so that scalability is possible.The biggest problem time-wise outside the kernel is the memory needed for the thread data structures, thread-local storage, and the stack. This is to be done by optimizing the memory allocation for the threads.
(E)Scheduling Priorities
<Paul:Please fill this section out. -Gautam>
Consider it done around 1 or 2pm Wednesday

COMP 3000 Essay 1 2010 Question 7

Question

Answer

Design Choices

References