Operating Systems 2021F: Tutorial 2: Difference between revisions

From Soma-notes

Revision as of 13:28, 20 September 2021

This tutorial is still in development.

In this tutorial we're going to look at how processes work at a low level: how they make system calls and library calls, how C and assembly compare, and how memory is laid out.

If you haven't already, please set up and use a VM on openstack for this work.

Background

Processes on UNIX-like systems are separated from each other: each runs in its own address space, so pointers can only refer to code and data in that program, not in other programs. Because programs are separated, they can't access anything external to them without help.

Here we're going to talk about dynamic libraries and system calls, two ways programs gain access to additional functionality. Dynamic libraries are for code that is loaded into a process at runtime, while system calls allow for code in other processes or the OS kernel to be accessed.

Dynamic Libraries

Most applications on the system do not contain all the code that they need right within the executable. Instead, dynamic libraries are loaded into the program address space when the program loads. As an example, the standard C library, which contains such functions as printf, is loaded in at run-time. Typically dynamic libraries are stored in /lib or /usr/lib.

The dynamic libraries associated with a program binary can be found using the ldd command. You can use ltrace to see calls to functions that are dynamically linked.

Note that code in dynamic libraries runs inside of the process loading the code; thus, dynamic library code has the same privileges as other application code. It can do everything regular application code can do (it can access all of your code and data), but it can do no more than your code can do (it has the same restrictions on accessing system resources).

System Calls

A process on its own has limited access to the system. It cannot directly access any external devices or data sources (e.g., files, keyboard, the screen, networks) on its own. To access these external resources, to allocate memory, or otherwise change its runtime environment, it must make system calls. Note that system calls run code outside of a process and thus cannot be called like regular function calls. The standard C library provides function wrappers for most commonly-used system calls so they can be accessed like regular C functions. Under the hood, however, these functions make use of special compiler directives in order to generate the machine code necessary to invoke system calls.

You can see the system calls produced by a process using the strace command.

In general, the code you call through a system call has more privileges than regular application code. This is because a system call is a request to the operating system kernel to do something on behalf of the process, and the kernel has full privileges to the system. (Indeed, it is the part of the system that implements the process abstraction.) We're going to talk a lot about system calls this semester; this is just your introduction to the concept.

Tasks

Function calls, library calls, and system calls

For hello.c and syscall-hello.c do the following (substituting the appropriate source file for prog.c). For example, for hello.c, you would replace all instances of "prog" in a command with "hello".

To download programs to your VM, use the wget command, e.g.

 wget https://homeostasis.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c
  1. Compile the program prog.c using gcc -O2 prog.c -o prog-dyn and run prog-dyn. What does it do?
  2. Statically compile and optimize prog.c by running gcc -O2 -static prog.c -o prog-static. How does the size compare with prog-dyn?
  3. Run ldd on the static and dynamic versions of the program. How does the output compare? Why?
  4. See what system calls prog-static produces by running strace -o syscalls-static.log ./prog-static. Do the same for prog-dyn. Which version generates more system calls? Note: system calls are saved in the log file syscalls-static.log. Feel free to save them in a different file.
  5. See what library calls prog-static produces by running ltrace -o library-static.log ./prog-static. Do the same for prog-dyn. Which version generates more library calls? (If ltrace isn't installed, run sudo apt-get install ltrace)
  6. Use the command ls -l to see the metadata associated with prog.c, prog-dyn, and prog-static. Who owns these files? What group are they in? Do you notice any pattern with the permissions (rwx) associated with each file?
  7. (optional) Look up the documentation for each of the system calls made by the static versions of the programs. You may need to append a 2 or 3 to the manpage invocation, e.g. "man 2 write" gets you the write system call documentation.

Comparing C and assembly

Do the following with hello.c and syscall-hello.c, as before.

Here are a few tips on reading assembly code. For more information on syntax, see AT&T/GNU Assembler syntax; for how functions are called, see the Wikipedia article on calling conventions.

  • The last letter of many instructions refers to the size of the operand. For example, callq means call a function using a "quad" value (64 bits).
  • A dollar sign preceding a value means that it is a literal value, a percent sign means it is a register.
  • If a register is in parentheses, then it is being used as a "pointer" (it contains an address, so the CPU goes to that address and interacts with the memory there). If there is a number before the parentheses, it is an offset to the register's value.
  1. Using the nm command, see what symbols are defined in prog-static and prog-dyn. Which defines more symbols?
  2. Run the command gcc -c -O2 prog.c to produce an object file. What file was produced? What symbols does it define?
  3. Look at the assembly code of the program by running gcc -S -O2 prog.c. What file was produced? Identify the following in the assembly code (if present):
    • A function call (call)
    • A return from a function (ret)
    • Registers being saved onto the stack (push)
    • Registers being retrieved from the stack (pop)
    • Subtraction (sub)
    • A system call (syscall)
  4. Disassemble the object file using objdump -d. How does this disassembly compare with the output from gcc -S?
  5. Examine the headers of the object file, the dynamically linked executable, and the statically linked executable using objdump -h.
  6. Examine the contents of the object file, the dynamically linked executable, and the statically linked executable using objdump -s.
  7. Re-run all of the previous gcc commands adding the "-v" flag. What is all of that output?

Examining the runtime memory map

Compile and run 3000memview.c, then consider the following questions.

  1. Why are the addresses inconsistent between runs?
  2. Roughly where does the stack seem to be? The heap? Code? Global variables?
  3. Observe how the heap grows (i.e. the value of sbrk changes) in response to malloc calls. Would you expect the heap to ever run into the stack? Why or why not?
  4. Change each malloc call to allocate more than 128K. What happens to the values of sbrk? Why? (Hint: use strace)
  5. Add more code and data to the program, and add more printf's to see where things are. Are things where you expect them to be?

Code

hello.c

#include <stdio.h>

int main(int argc, char *argv[]) {

        printf("Hello world!\n");

        return 0;
}

syscall-hello.c

#include <unistd.h>
#include <sys/syscall.h>

char *buf = "Hello world!\n";

int main(int argc, char *argv[]) {
        long result;    /* syscall() returns a long */

        /* "man 2 write" to see arguments to write syscall */
        result = syscall(SYS_write, 1, buf, 13);

        return (int) result;
}

3000memview.c

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char *gmsg = "Global Message";

const int buffer_size = 100;

int main(int argc, char *argv[], char *envp[])
{
        char *lmsg = "Local Message";
        char *buf[buffer_size];
        int i;
        
        printf("Memory report\n");
        printf("argv:      %lx\n", (unsigned long) argv);
        printf("argv[0]:   %lx\n", (unsigned long) argv[0]);
        printf("envp:      %lx\n", (unsigned long) envp);
        printf("envp[0]:   %lx\n", (unsigned long) envp[0]);

        printf("lmsg:      %lx\n", (unsigned long) lmsg);
        printf("&lmsg:     %lx\n", (unsigned long) &lmsg);
        printf("gmsg:      %lx\n", (unsigned long) gmsg);
        printf("&gmsg:     %lx\n", (unsigned long) &gmsg);

        printf("main:      %lx\n", (unsigned long) &main);

        printf("sbrk(0):   %lx\n", (unsigned long) sbrk(0));
        printf("&buf:      %lx\n", (unsigned long) &buf);

        for (i = 0; i<buffer_size; i++) {
                buf[i] = (char *) malloc(4096);
        }
        
        printf("buf[0]:    %lx\n", (unsigned long) buf[0]);
        printf("sbrk(0):   %lx\n", (unsigned long) sbrk(0));
        
        return 0;
}