COMP3000 Operating Systems F23: Tutorial 2
Latest revision as of 15:49, 27 September 2023
In this tutorial, you will revisit the lifecycle of a program: from source code, to an executable (binary image), to being loaded into an address space. Then, from a different angle, you will see how the running program (now a process) makes different types of function calls, and how its memory is laid out.
Building Your Program
Assuming you use a compiled programming language like C, building your program (implicitly) involves the following steps:
- Compile source (C) code into assembly code (.s files).
  - You can see the result using gcc -S -O2 hello.c.
- Assemble assembly code into machine code placed in object code files (.o files).
  - To keep the .o file (it is otherwise deleted automatically), compile without linking: gcc -c -O2 hello.c.
- Link object code files together to create a runnable binary image (an ELF file, with no file extension).
  - This is the way you did it in Tutorial 1.
Talking to the OS Kernel via System Calls
A process on its own has limited access to the system: it cannot directly access external devices or data sources (e.g., files, the keyboard, the screen, networks). To access these external resources, to allocate memory, or to otherwise change its runtime environment, it must make system calls. System calls are thus one way for the contained process to request OS services. Note that system calls run code outside of the process, so they cannot be invoked like regular function calls. The standard C library provides wrapper functions for the most commonly used system calls so that they can be called like regular C functions. Under the hood, however, these wrappers rely on special compiler directives to generate the machine code needed to invoke system calls. We will take a closer look later this term.
You can see the system calls produced by a process using the strace command.
Keep in mind that this is a good example of how the root/non-root user distinction is orthogonal to the kernel/user mode distinction. A system call asks the kernel to do something in kernel mode on behalf of the calling process. Therefore, any user process (even a non-root one) that makes a system call will at some point run in kernel mode.
Static & Dynamic Libraries
Because of abstraction (e.g., for portability or ease of development), most applications are not self-contained. For instance, you call just a simple printf() to output a message instead of drawing pixels on a display using a lot of low-level code. Hence, your program relies on lots of external code. In compiled languages such as C, external code can be brought into the process through linking. There are two basic types of linking, static and dynamic linking:
- With static linking, code is brought in at build time (specifically, in the link stage above) and added to the executable. From then on, it is no different from the rest of the application code. (Static libraries are essentially more .o files like hello.o; in practice, you will more often see .a files, which are "archives" of multiple .o files.)
- With dynamic linking, only a reference to the library code is added to the binary. The actual library code is loaded later, when the program is executed. This loading happens before main() is called (you will notice a lot of noisy system calls at the beginning that achieve this). Then, at runtime, calls to functions in such dynamic libraries take the form of library calls (which happen entirely within the process in user space, as opposed to system calls, which enter kernel space).
The dynamic libraries associated with a program binary can be found using the ldd command. You can use ltrace to see calls to functions that are dynamically linked.
Tasks/Questions
Part A
Here you may need some basic knowledge of assembly language. For now, you do not need to read and understand each line (and it will not be the focus of this course). In general, such assembly code is divided into three columns: labels/symbols (which you can refer to) start at character position 1, followed by mnemonics (instruction opcodes), and finally the corresponding arguments (operands). You can check out related documentation, such as: x86 Assembly/GNU assembly syntax
- Looking at the .s file produced by gcc -S -O2 hello.c, do you see anything familiar from the lectures in our first weeks?
- Where (e.g., at which line) is it supposed to call that printf()? If it is not doing that, why? (you can just mention what you think/believe)
- Generate the .s file for syscall-hello.c as well. Observing the two .s files, what do you think is the key difference?
- You can play around by exploring: how a C function is reflected in assembly; how a function returns; how variables/literals are represented; what symbols are (refer to commands like nm); etc.
Part B
- By default, gcc-produced binaries are dynamically linked (so at runtime they will require dynamic libraries to be present on the system). To compile a binary that is statically linked (so it has no external runtime library dependencies), instead do this:
- gcc -O2 -static hello.c -o hello.static
- Then compile a dynamically linked version:
- gcc -O2 -z lazy hello.c -o hello.dynamic
- How does the size of hello.dynamic compare with that of hello.static? Why?
- Compile the other flavor of hello.c: syscall-hello.c. Build one static version and one dynamic version as well.
- Now considering hello.dynamic, hello.static, syscall-hello.dynamic and syscall-hello.static, see what system calls each program produces by running strace -o sys-somename.log ./somename.static (or ./somename.dynamic). Which version generates more system calls? Why?
- Note: system calls are saved in the log file sys-somename.log. Feel free to save them in a different file for each version.
- See what library calls each program produces by running ltrace -o lib-somename.log ./somename.dynamic (or ./somename.static). Which version generates more library calls? Why?
- Remember that when building hello in Tutorial 1 you did not use -z lazy. Comparing hello with hello.dynamic, is there any difference? Why?
- Use ldd to find out what dynamic library dependencies dynamic versions have. What about static versions?
Part C
Compile and run 3000memview.c, then consider the following questions (if they turn out to be difficult, document your exploration and your thinking so far).
- Why are the addresses inconsistent between runs?
- Roughly where does the stack seem to be? The heap? Code? Global variables? (Hints: recall the memory layout of a process discussed in the lecture, and feel free to look up a more detailed one; local variables go on the stack; initialized data and global variables go in the data segment; data allocated at runtime goes on the heap.)
- Change each malloc() call to allocate more than 128K. What happens to the values of sbrk? Why? (Hint: use strace)