Operating Systems 2014F: Assignment 5

Please submit the answers to the following questions via CULearn by midnight on Thursday, October 16, 2014. There 10 points in 8 questions.

Submit your answers as a single text file named "<username>-comp3000-assign5.txt" (where username is your MyCarletonOne username). The first four lines of this file should be "COMP 3000 Assignment 5", your name, student number, and the date of submission. You may wish to format your answers in Markdown to improve their appearance.

No other formats will be accepted. Submitting in another format will likely result in your assignment not being graded and you receiving no marks for this assignment. In particular do not submit a zip file, MS Word, or OpenOffice file as your answers document!

Don't forget to include what outside resources you used to complete each of your answers, including other students, man pages, and web resources. You do not need to list help from the instructor, TA, or information found in the textbook.

Questions

[1] Some x86 processors have support for PAE (physical address extensions), which allows for the use of more than 4 GiB of memory with a 32-bit processor. On a system with PAE, what is the size of pointers used in regular C programs? Why?
[1] On a system that supports both paging and segmentation (such as the x86 in 32-bit protected mode), do segment base registers hold physical or virtual addresses? Explain why briefly.
[1] Do the stack and data segments on the x86 (as referred to by %esp and %ebp) ever overlap? Explain briefly.
[1] If we have a system where virtual address 0x52D2C3A3 mapped to physical address 0x13A103A3, what is the largest page size that could be used for this mapping? Why?
[1] What is the xchg x86 instruction do? Why is it important for creating concurrent programs?
[1] Peterson's algorithm does not reliably work on modern machines. Explain why briefly by making reference to the memory hierarchy of modern systems (e.g., relationship between registers, caches, and main memory/RAM).
[1] Spinlocks cause a program to loop infinitely until a resource becomes available. High performance multi-threaded code often uses spinlocks rather than locks that yield the processor when waiting for the lock to become available. When are spinlocks preferable? Why?
[3] Create dotprod_lockfree.c by modifying dotprod_mutex.c (From this LLNL tutorial) so that it does not do any locking using mutexes or any other explicit locking mechanism. In other words, the program should still be multithreaded; the threads should be able to run without any race conditions corrupting the computation but without grabbing and releasing any locks (explicitly). Explain why your modifications are correct. (You may want to refer to the serial version to help you understand what the program is supposed to do.)

Solutions

On 32-bit processors with PAE, pointers for regular C programs (running in userspace) are 32 bits because the size of the virtual address space is the same as before. PAE extends just the physical address space. The advantage of this is only the kernel needs to be modified to use the additional RAM. The disadvantage is all regular programs can address at most 4 GiB of memory.
While there are hybrid segmentation/paging schemes where segment registers hold physical addresses, in most systems (including the x86 is full 32-bit and 64-bit modes with virtual memory enabled), segment registers hold virtual addresses. They have to because segments must consist of contiguous memory, and in a paging system memory a process's memory, is, by design, not contiguous - it uses whatever physical memory (frames) that are handy. These frames are stitched together by the virtual memory hardware into the contiguous regions needed to support segments. For more information on segments on x86, see the Wikipedia article on x86 segmentation.
The stack segment register SS and the data segment register DS can overlap. In fact, in modern x86 programming the segment registers - including SS and DS - are set to the entire memory address range: their base is 0 and their bound it the top of memory (2^32-1 or 2^64-1), thus effectively making them do nothing. NOTE: The parenthetical clarification is WRONG. It makes no sense. %esp and %ebp are registers for the current frame. They have nothing to do with segmentation. See the Wikipedia article on the call stack.
The largest page size would be 16K because the 14 low order bits are are the same, and 2^14 = 16,384 = 16K. The low order bits make up the page offset which is always the same between a virtual address and its mapping to a physical address.
The xchg x86 instruction swaps two values. If both values are stored in memory, this swap is atomic. xchg is very useful for implementing correct concurrency mechanisms such as mutexes because the swap is guaranteed to be atomic.
Peterson's algorithm assumes that writes to main memory by one processor are immediately visible to other processors. On modern systems, however, writes are first made to caches (L1, L2, L3) before they are propagated to RAM. While RAM is updated continuously, there is a small window of time when values in memory are stale relative to those in cache. When those caches are processor/core specific (as L1 and L2 generally are), there is a window of opportunity for inconsistency that causes Peterson's algorithm to be incorrect.
High performance multitheaded code will sometimes use spinlocks when the expected time of contention - the time in the critical section - is smaller than the amount of time required to put a thread to sleep and wake it up. Switching contexts, changing process states - steps such as these take time to perform. That time is overhead that can reduce scalability for locks that are held only very briefly.