COMP3000 Operating Systems W22: Tutorial 2

In this tutorial, you will revisit the lifecycle of a program, from source code, to an executable (binary image), and on to being loaded into an address space. Then, from a different angle, you will see how the running program (now a process) makes different types of function calls and how its memory is laid out.

Tutorials are graded based on participation and effort (so no need to try to have the “correct” answers — what matters is the process), but you should still turn in your work. Submit your answers on Brightspace as a single text file named "<username>-comp3000-t2.txt" (where username is your MyCarletonOne username). The first four lines of this file should be "COMP 3000 Tutorial 2", your name, student number, and the date of submission.

The deadline is usually four days after the tutorial date (see the actual deadline on the submission entry). Note that the deadline is enforced by the system, so you may lose the effort marks even if you submit one minute late.

You should also check in with your assigned TA online (by responding to the poll in the Teams channel tutorials-public or the private channel). Your TA will be your first point of contact when you have questions or encounter any issues during the tutorial session.

You get 1.5 marks for submitting answers that show your effort and 0.5 for checking in, making this tutorial worth 2 marks in total.

Building Your Program

Assuming you use a compiled programming language like C, building your program involves the following steps (often performed implicitly):

Compile source (C) code into assembly code (.s files).
You can see it using gcc -S -O2 hello.c.
Assemble assembly code into machine code placed in object code files (.o files).
To keep the .o file from being deleted automatically, you can compile without linking: gcc -c -O2 hello.c.
Link object code files together to create a runnable binary image (ELF files, no extension name).
This is the way you did it in Tutorial 1.

Talking to the OS Kernel via System Calls

A process on its own has limited access to the system. It cannot directly access any external devices or data sources (e.g., files, the keyboard, the screen, networks). To access these external resources, to allocate memory, or to otherwise change its runtime environment, it must make system calls. So we can say that system calls are one way for the contained process to request OS services. Note that system calls run code outside of the process and thus cannot be invoked like regular function calls. The standard C library provides wrapper functions for the most commonly used system calls so that they can be called like regular C functions. Under the hood, however, these wrappers make use of special compiler directives to generate the machine code necessary to invoke system calls. We will take a closer look later this term.

You can see the system calls produced by a process using the strace command.

Keep in mind that this is a good example of the relationship between root/non-root users and kernel/user modes. A system call asks the kernel to do something in kernel mode on behalf of the calling process. Therefore, any process making a system call, even one run by a non-root user, will at some point run in kernel mode.

Static & Dynamic Libraries

Because of abstraction (e.g., for portability or ease of development), most applications are not self-contained. For instance, you call just a simple printf() to output a message instead of drawing pixels on a display using a lot of low-level code. Hence, your program relies on lots of external code. In compiled languages such as C, external code can be brought into the process through linking. There are two basic types of linking, static and dynamic linking:

With static linking, code is brought in at build time (specifically, in the link stage above) and added to the executable. From then on, the library code is treated the same as the application's own code. (Static libraries are essentially archives of .o files like hello.o.)
With dynamic linking, only a reference to the library code is added to the binary. The actual library code has to be loaded later, when the program is executed. This loading happens before main() is called.

The dynamic libraries associated with a program binary can be found using the ldd command. You can use ltrace to see calls to functions that are dynamically linked.

Tasks/Questions

Part A

Here you may need some basic knowledge of assembly language. For now, you do not need to read and understand each line (and it will not be the focus of this course). In general, such assembly code is divided into three columns: labels/symbols (which you can refer to) start at position/character 1, followed by mnemonics (instruction opcodes), and finally the corresponding arguments (operands). You can turn to related documentation/manuals like: x86 Assembly/GNU assembly syntax

Looking at the .s file produced from gcc -S -O2 hello.c, do you see anything familiar and discussed in last week's lecture?
Where (e.g., at which line) is it supposed to call that printf()? If it is not doing that, why? (you can just mention what you think/believe)
Generate the .s file for syscall-hello.c as well. Observing the two .s files, what do you think is the key difference?
You can play around by trying to see: how a function in C is reflected in assembly; how to return from a function; how variables/literals are represented; what are symbols (refer to commands like nm); etc.

Part B

By default, gcc-produced binaries are dynamically linked (so at runtime they will require dynamic libraries to be present on the system). To compile a binary that is statically linked (so it has no external runtime library dependencies), instead do this:
gcc -O2 -static hello.c -o hello.static

Then compile a dynamically linked version:

gcc -O2 -z lazy hello.c -o hello.dynamic

How does the size of hello.dynamic compare with that of hello.static? Why?
Compile the other flavor of hello.c: syscall-hello.c. Build one static version and one dynamic version as well.
Now considering hello.dynamic, hello.static, syscall-hello.dynamic and syscall-hello.static, see what system calls each program produces by running strace -o sys-somename.log ./somename.static (or ./somename.dynamic). Which version generates more system calls? Why?

Note: system calls are saved in the log file sys-somename.log. Feel free to save them in a different file for each version.
See what library calls each program produces by running ltrace -o lib-somename.log ./somename.dynamic (or ./somename.static). Which version generates more library calls? Why?
Remember that when building hello in Tutorial 1 you did not use -z lazy. Comparing hello with hello.dynamic, is there any difference? If so, why?
Use ldd to find out what dynamic library dependencies dynamic versions have. What about static versions?

Part C

Compile and run 3000memview.c, then consider the following questions (if they turn out to be difficult, document your exploration and your thinking).

Why are the addresses inconsistent between runs?
Roughly where does the stack seem to be? The heap? Code? Global variables? (Hints: recall the memory image layout of a process discussed in the lecture, and you can search for a more detailed one elsewhere; local variables go on the stack; initialized data and global variables go in the data segment; data allocated at runtime goes on the heap.)
Change each malloc() call to allocate more than 128K. What happens to the values of sbrk? Why? (Hint: use strace)