Operating Systems 2014F: Tutorial 5

In this tutorial you will be going through homework exercises from the textbook. Be sure to look at the associated chapter if you have questions.

Note that there is a lot of material here. Do not expect to do all of it in detail in tutorial; instead, try to do some of everything so you can see where you have significant conceptual gaps. On your own, work through these exercises in more detail to make sure you understand the associated concepts.


Relocation

Use the program relocation.py to answer the following questions from the Address Translation chapter. There is also a video available.

  • Run with seeds 1, 2, and 3, and compute whether each virtual address generated by the process is in or out of bounds. If in bounds, compute the translation. (A short sketch of the translation logic follows this list of questions.)

  • Run with these flags:
-s 0 -n 10

What value do you have to set -l (the bounds register) to in order to ensure that all the generated virtual addresses are within bounds?

  • Run with these flags:
-s 1 -n 10 -l 100

What is the maximum value that bounds can be set to, such that the address space still fits into physical memory in its entirety?

  • Run some of the same problems above, but with larger address spaces (-a) and physical memories (-p).
  • What fraction of randomly-generated virtual addresses are valid, as a function of the value of the bounds register? Make a graph from running with different random seeds, with limit values ranging from 0 up to the maximum size of the address space.
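
If the mechanics of the translation are unclear, here is a minimal Python sketch of base-and-bounds translation in the spirit of what relocation.py simulates; the function and variable names are illustrative, not taken from the simulator:

def translate(vaddr, base, bounds):
    # Base-and-bounds: the access is legal only if vaddr < bounds;
    # if so, the hardware adds the base register to form the
    # physical address.
    if vaddr >= bounds:
        return None                      # out-of-bounds fault
    return vaddr + base

# Example with base=16384 and bounds=100:
for va in (50, 100):
    pa = translate(va, 16384, 100)
    print(f"VA {va} ->", "FAULT" if pa is None else f"PA {pa}")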


Segmentation

Use the program segmentation.py to answer the following questions from the Segmentation chapter.

  • First let’s use a tiny address space to translate some addresses. Here’s a simple set of parameters with a few different random seeds; can you translate the addresses?
segmentation.py -a 128 -p 512 -b 0 -l 20 -B 512 -L 20 -s 0
segmentation.py -a 128 -p 512 -b 0 -l 20 -B 512 -L 20 -s 1
segmentation.py -a 128 -p 512 -b 0 -l 20 -B 512 -L 20 -s 2
  • Now, let’s see if we understand this tiny address space we’ve constructed (using the parameters from the question above). What is the highest legal virtual address in segment 0? What about the lowest legal virtual address in segment 1? What are the lowest and highest illegal addresses in this entire address space? Finally, how would you run segmentation.py with the -A flag to test whether you are right? (A translation sketch follows this list of questions.)

  • Let’s say we have a tiny 16-byte address space in a 128-byte physical memory. What base and bounds would you set up so as to get the simulator to generate the following translation results for the specified address stream: valid, valid, violation, ..., violation, valid, valid? Assume the following parameters:
segmentation.py -a 16 -p 128 -A 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 --b0 ? --l0 ? --b1 ? --l1 ?
  • Assume we want to generate a problem where roughly 90% of the randomly-generated virtual addresses are valid (i.e., not segmentation violations). How should you configure the simulator to do so? Which parameters are important?

  • Can you run the simulator such that no virtual addresses are valid? How?
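
If you get stuck on the translations, here is a sketch of the two-segment scheme the simulator models, assuming the convention from the chapter: the top bit of the virtual address selects the segment, segment 0 grows up from its base, and segment 1 grows down from its base. Check this convention against the chapter before trusting the details:

ASIZE = 128   # virtual address-space size, matching -a 128

def translate(vaddr, base0, limit0, base1, limit1):
    # Two segments: the top bit of the virtual address picks one.
    # Segment 0 grows up from base0; segment 1 grows down from base1.
    if vaddr < ASIZE // 2:               # top bit is 0: segment 0
        if vaddr >= limit0:
            return None                  # segmentation violation
        return base0 + vaddr
    else:                                # top bit is 1: segment 1
        if vaddr < ASIZE - limit1:
            return None                  # below the stack segment
        return base1 - (ASIZE - vaddr)

# With -b 0 -l 20 -B 512 -L 20:
print(translate(19, 0, 20, 512, 20))    # 19   (highest legal address in seg 0)
print(translate(108, 0, 20, 512, 20))   # 492  (lowest legal address in seg 1)
print(translate(50, 0, 20, 512, 20))    # None (an illegal middle address)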


Free Space

Use the program malloc.py to answer the following questions from the Free-Space Management chapter.

  1. First run with the flags -n 10 -H 0 -p BEST -s 0 to generate a few random allocations and frees. Can you predict what alloc()/free() will return? Can you guess the state of the free list after each request? What do you notice about the free list over time? (A best-fit sketch follows this list of questions.)
  2. How are the results different when using a WORST fit policy to search the free list (-p WORST)? What changes?
  3. What about when using FIRST fit (-p FIRST)? What speeds up when you use first fit?
  4. For the above questions, how the list is kept ordered can affect the time it takes to find a free location for some of the policies. Use the different free list orderings (-l ADDRSORT, -l SIZESORT+, -l SIZESORT-) to see how the policies and the list orderings interact.
  5. Coalescing of a free list can be quite important. Increase the number of random allocations (say to -n 1000). What happens to larger allocation requests over time? Run with and without coalescing (i.e., without and with the -C flag). What differences in outcome do you see? How big is the free list over time in each case? Does the ordering of the list matter in this case?
  6. What happens when you change the percent allocated fraction -P to higher than 50? What happens to allocations as it nears 100? What about as it nears 0?
  7. What kind of specific requests can you make to generate a highly-fragmented free space? Use the -A flag to create fragmented free lists, and see how different policies and options change the organization of the free list.
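
As a point of comparison for your predictions, here is a small illustrative best-fit search over a free list of (address, size) chunks. It mirrors the BEST policy in spirit, but the simulator additionally tracks headers and supports coalescing, which this sketch omits:

def best_fit(free_list, size):
    # free_list is a list of (addr, size) chunks; return the address
    # served and the updated list, or (None, free_list) if nothing fits.
    candidates = [c for c in free_list if c[1] >= size]
    if not candidates:
        return None, free_list
    chosen = min(candidates, key=lambda c: c[1])   # smallest chunk that fits
    addr, csize = chosen
    rest = list(free_list)
    rest.remove(chosen)
    if csize > size:                               # split off the remainder
        rest.append((addr + size, csize - size))
    return addr, rest

free = [(1000, 10), (1020, 30), (1060, 15)]
addr, free = best_fit(free, 12)
print(addr)   # 1060 (the 15-byte chunk is the tightest fit)
print(free)   # [(1000, 10), (1020, 30), (1072, 3)]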


Paging

Use the program paging-linear-translate.py to answer the following questions from the Paging: Introduction chapter.

  • Before doing any translations, let’s use the simulator to study how linear page tables change size given different parameters. Compute the size of linear page tables as different parameters change. Some suggested inputs are below; by using the -v flag, you can see how many page-table entries are filled.
    First, to understand how linear page table size changes as the address space grows:
paging-linear-translate.py -P 1k -a 1m -p 512m -v -n 0
paging-linear-translate.py -P 1k -a 2m -p 512m -v -n 0
paging-linear-translate.py -P 1k -a 4m -p 512m -v -n 0
Then, to understand how linear page table size changes as page size grows:
paging-linear-translate.py -P 1k -a 1m -p 512m -v -n 0
paging-linear-translate.py -P 2k -a 1m -p 512m -v -n 0
paging-linear-translate.py -P 4k -a 1m -p 512m -v -n 0
Before running any of these, try to think about the expected trends. How should page-table size change as the address space grows? As the page size grows? Why shouldn’t we just use really big pages in general? (A size-calculation sketch follows this list of questions.)
  • Now let’s do some translations. Start with some small examples, and change the number of pages that are allocated to the address space with the -u flag. For example:
paging-linear-translate.py -P 1k -a 16k -p 32k -v -u 0
paging-linear-translate.py -P 1k -a 16k -p 32k -v -u 25
paging-linear-translate.py -P 1k -a 16k -p 32k -v -u 50
paging-linear-translate.py -P 1k -a 16k -p 32k -v -u 75
paging-linear-translate.py -P 1k -a 16k -p 32k -v -u 100
What happens as you increase the percentage of pages that are allocated in each address space?
  • Now let’s try some different random seeds, and some different (and sometimes quite crazy) address-space parameters, for variety:
paging-linear-translate.py -P 8 -a 32 -p 1024 -v -s 1
paging-linear-translate.py -P 8k -a 32k -p 1m -v -s 2
paging-linear-translate.py -P 1m -a 256m -p 512m -v -s 3
Which of these parameter combinations are unrealistic? Why?
  • Use the program to try out some other problems. Can you find the limits of where the program doesn’t work anymore? For example, what happens if the address-space size is bigger than physical memory?
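
The size calculations in the first question reduce to a single formula: a linear page table holds one entry per virtual page. A quick sketch, assuming 4-byte page-table entries as in the chapter's examples:

KB = 1024
MB = 1024 * 1024

def pt_size(asize, psize, pte_bytes=4):
    # A linear page table has one entry per virtual page, so its size
    # is (address-space size / page size) * bytes per entry.
    return (asize // psize) * pte_bytes

# Address space grows, 1k pages: the table doubles each time.
print(pt_size(1 * MB, 1 * KB))   # 4096 bytes
print(pt_size(2 * MB, 1 * KB))   # 8192 bytes
print(pt_size(4 * MB, 1 * KB))   # 16384 bytes
# Page size grows, 1m address space: the table halves each time.
print(pt_size(1 * MB, 2 * KB))   # 2048 bytes
print(pt_size(1 * MB, 4 * KB))   # 1024 bytes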


Threads (Intro)

Use the program x86.py (v1) to answer the following questions from the Concurrency: An Introduction chapter.

  • To start, let’s examine a simple program, “loop.s”. First, just look at the program, and see if you can understand it: cat loop.s. Then, run it with these arguments:
./x86.py -p loop.s -t 1 -i 100 -R dx
This specifies a single thread, an interrupt every 100 instructions, and tracing of register %dx. Can you figure out what the value of %dx will be during the run? Once you have, run the same command again with the -c flag to check your answers; note that the answers, shown on the left, give the value of the register (or memory value) after the instruction on the right has run.
  • Now run the same code but with these flags:
./x86.py -p loop.s -t 2 -i 100 -a dx=3,dx=3 -R dx
This specifies two threads, and initializes each %dx register to 3. What values will %dx see? Run with the -c flag to see the answers. Does the presence of multiple threads affect anything about your calculations? Is there a race condition in this code?
  • Now run the following:
./x86.py -p loop.s -t 2 -i 3 -r -a dx=3,dx=3 -R dx
This makes the interrupt interval quite small and random; use different seeds with -s to see different interleavings. Does the frequency of interruption change anything about this program?
  • Next we’ll examine a different program (looping-race-nolock.s). This program accesses a shared variable located at memory address 2000; we’ll call this variable x for simplicity. Run it with a single thread and make sure you understand what it does, like this:
./x86.py -p looping-race-nolock.s -t 1 -M 2000
What value is found in x (i.e., at memory address 2000) throughout the run? Use -c to check your answer.
  • Now run with multiple iterations and threads:
./x86.py -p looping-race-nolock.s -t 2 -a bx=3 -M 2000
Do you understand why the code in each thread loops three times? What will the final value of x be?
  • Now run with random interrupt intervals:
./x86.py -p looping-race-nolock.s -t 2 -M 2000 -i 4 -r -s 0
Then change the random seed, setting -s 1, then -s 2, etc. Can you tell, just by looking at the thread interleaving, what the final value of x will be? Does the exact location of the interrupt matter? Where can it safely occur? Where does an interrupt cause trouble? In other words, where is the critical section exactly? (A step-by-step sketch of a bad interleaving follows this list of questions.)
  • Now use a fixed interrupt interval to explore the program further. Run:
./x86.py -p looping-race-nolock.s -a bx=1 -t 2 -M 2000 -i 1
See if you can guess what the final value of the shared variable x will be. What about when you change -i 2, -i 3, etc.? For which interrupt intervals does the program give the “correct” final answer?
  • Now run the same code for more loops (e.g., set -a bx=100). What interrupt intervals, set with the -i flag, lead to a “correct” outcome? Which intervals lead to surprising results?
  • We’ll examine one last program in this homework (wait-for-me.s). Run the code like this:
./x86.py -p wait-for-me.s -a ax=1,ax=0 -R ax -M 2000
This sets the %ax register to 1 for thread 0, and 0 for thread 1, and watches the value of %ax and memory location 2000 throughout the run. How should the code behave? How is the value at location 2000 being used by the threads? What will its final value be?
  • Now switch the inputs:
./x86.py -p wait-for-me.s -a ax=0,ax=1 -R ax -M 2000
How do the threads behave? What is thread 0 doing? How would changing the interrupt interval (e.g., -i 1000, or perhaps to use random intervals) change the trace outcome? Is the program efficiently using the CPU?
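
To see exactly why looping-race-nolock.s has a race, here is a small Python illustration (not part of the homework) that splits the increment of x into the same load/add/store steps the assembly uses, with an "interrupt" placed between a load and its store:

# looping-race-nolock.s updates x (at address 2000) in three steps:
#   mov 2000, %ax    (load x into a register)
#   add $1, %ax      (increment the register)
#   mov %ax, 2000    (store the register back to x)
# An interrupt between the load and the store loses an update.

x = 0

a_reg = x        # thread A: load  (a_reg == 0)
# -- interrupt: thread B runs its entire increment --
b_reg = x        # thread B: load  (b_reg == 0)
b_reg += 1       # thread B: add
x = b_reg        # thread B: store (x == 1)
# -- thread A resumes with its stale register --
a_reg += 1       # thread A: add   (a_reg == 1)
x = a_reg        # thread A: store (x == 1, not 2: one increment lost)

print(x)  # prints 1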


Locks

Use the program x86.py (v2) to answer the following questions from the Locks chapter.

  1. First, let’s get ready to run x86.py with -p flag.s. This code “implements” locking with a single memory flag. Can you understand what the assembly code is trying to do?
  2. When you run with the defaults, does flag.s work as expected? Does it produce the correct result? Use the -M and -R flags to trace variables and registers (and turn on -c to see their values). Can you predict what value will end up in flag as the code runs?
  3. Change the value of the register %bx with the -a flag (e.g., -a bx=2,bx=2 if you are running just two threads). What does the code do? How does it change your answer for the question above?
  4. Set bx to a high value for each thread, and then use the -i flag to generate different interrupt frequencies; what values lead to bad outcomes? Which lead to good outcomes?
  5. Now let’s look at the program test-and-set.s. First, try to understand the code, which uses the xchg instruction to build a simple locking primitive. How is the lock acquire written? How about lock release? (A Python sketch of a test-and-set lock follows this list of questions.)
  6. Now run the code, changing the value of the interrupt interval (-i) again, and making sure to loop for a number of times. Does the code always work as expected? Does it sometimes lead to an inefficient use of the CPU? How could you quantify that?
  7. Use the -P flag to generate specific tests of the locking code. For example, run a schedule that grabs the lock in the first thread, but then tries to acquire it in the second. Does the right thing happen? What else should you test?
  8. Now let’s look at the code in peterson.s, which implements Peterson’s algorithm (mentioned in a sidebar in the text). Study the code and see if you can make sense of it.
  9. Now run the code with different values of -i. What kinds of different behavior do you see?
  10. Can you control the scheduling (with the -P flag) to “prove” that the code works? What are the different cases you should show hold? Think about mutual exclusion and deadlock avoidance.
  11. Now study the code for the ticket lock in ticket.s. Does it match the code in the chapter?
  12. Now run the code, with the following flags: -a bx=1000,bx=1000 (this flag sets each thread to loop through the critical section 1000 times). Watch what happens over time; do the threads spend much time spinning waiting for the lock?
  13. How does the code behave as you add more threads?
  14. Now examine yield.s, in which we pretend that a yield instruction enables one thread to yield control of the CPU to another (realistically, this would be an OS primitive, but for the simplicity of simulation, we assume there is an instruction that does the task). Find a scenario where test-and-set.s wastes cycles spinning, but yield.s does not. How many instructions are saved? In what scenarios do these savings arise?
  15. Finally, examine test-and-test-and-set.s. What does this lock do? What kind of savings does it introduce as compared to test-and-set.s?
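
For reference, here is a minimal Python sketch of a test-and-set spin lock. Python has no atomic xchg instruction, so _atomic_exchange below emulates one with an internal threading.Lock; treat it as a stand-in for the hardware instruction, not as a real implementation:

import threading

class SpinLock:
    # A test-and-set spin lock in the style of test-and-set.s.
    def __init__(self):
        self.flag = 0
        self._hw = threading.Lock()      # emulates the atomicity of xchg

    def _atomic_exchange(self, new):
        # Stand-in for xchg: atomically swap in `new`, return the old value.
        with self._hw:
            old, self.flag = self.flag, new
        return old

    def acquire(self):
        # Spin: set flag to 1; if the old value was already 1,
        # another thread holds the lock, so try again.
        while self._atomic_exchange(1) == 1:
            pass

    def release(self):
        self.flag = 0                    # a plain store releases the lock

# Usage: two threads increment a shared counter under the lock.
count = 0
lock = SpinLock()

def worker():
    global count
    for _ in range(1000):
        lock.acquire()
        count += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(count)  # 2000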