Operating Systems 2018F Lecture 3

From Soma-notes

Video

Video from the lecture given on September 12, 2018 is now available.

Code

Code and files from the lecture (captured as they were at the end) are available here.

Notes

  • Sources of confusion from Tutorial 1:
    • Where is good assembly documentation?
      • Probably in books
      • Online
      • (Not expected to learn assembly)
    • Local variables
    • Environment variables
    • gcc arguments
  • The -O2 argument for gcc selects an optimization level (namely, level 2)

Linking

  • A statically linked binary contains the library code it uses, so it makes no calls into shared libraries at run time
  • What is linking?
    • Taking object files and putting them together to make a full binary, particularly resolving symbolic references between object files.
  • Static linking is the old way. Dynamic linking is how we do things today.
  • nm command for showing symbols

System and Function Calls

  • What is a system call in assembly? What is a function call in assembly?
  • (Time: 9:00) Looking at hello.c
#include <stdio.h>
int main(int argc, char *argv[]) {
        printf("Hello world!\n");
        return 0;
}
  • Professor enters the command: gcc -S -O2 hello.c, which generates the assembly file hello.s. The -S option does not tell gcc to do more work; it tells it to do less, stopping after compilation and before assembling and linking.
  • (Time: 12:05) Looking at the assembly file of hello.c
Lec3-1.png
    • Lines that begin with a dot (.) are directives (instructions to the assembler), not actual machine operations. A dotted name that ends in a colon, such as .LC0:, is a label.
    • Look at the line with the call to puts. We called printf, but since we use none of its formatting features, the compiler optimizes the call by swapping printf for the simpler function puts, which prints the same text to the screen.
    • The name for the string (i.e. “Hello world!”) is .LC0
    • Compilation is a black box
    • The call operation in assembly is for calling functions. The puts that we see is a function call.
    • System calls are fundamentally different from function calls: they cannot be expressed in standard C.
    • We can create a system call in only two ways:
      • Put inline assembly in the C code.
      • Use a compiler-specific operation that tells the compiler to generate code for the system call.
  • Professor writes the following commands on the terminal:
    • gcc -static -O2 hello.c -o hello
    • strace ./hello → strace shows the system calls that a running program makes.
    • (Time: 26:25) The write system call is the system call that actually did the work that we intended to do.
    • So, the puts function somehow made a write system call which is what produced the text output to the console.
  • What is a function call? The following happens when you call a function:
    • Save arguments (to registers or the stack)
    • Call function
      • Saves (pushes) current instruction pointer onto call stack
      • Changes instruction pointer to the location of the beginning of the function
      • ... Function runs...
      • On function return, pops the saved instruction pointer from the stack, so execution resumes in the caller.
  • Everything is based on pointers to code
  • (Time: 35:00) The nm command tells us, in the first column of its output, where things are located.
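    • If you look at an object file with nm, before it has been linked, you will see that symbols have no final addresses: defined symbols show only small offsets (often zero), and undefined symbols, marked U, show none. This means that we do not yet know where they will be in memory.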
    • In contrast, when we look at a compiled program we can see symbols at specific addresses in memory.
    • Are these addresses unique to this program? In other words, is an address listed in this program's symbol table unique across all programs running on my computer?
      • No!
      • This means that you could have a function located in the same memory location in completely different processes.
      • Pointers in a process are local
      • Addresses only have meaning in the context of the process
      • Every process has a completely different namespace
  • Address space – name space for pointers.
  • (Time: 39:00) What is a System Call?
    • It is an invocation of kernel code.
    • The kernel is the code that defines the system. It is what implements the processes. It implements the namespaces that make separate processes.
    • A system call entails calling code that is not necessarily in your address space.
    • A system call entails calling code with higher privileges.
      • Running in supervisor mode, not user mode
    • How can I call privileged code safely? (code that runs in supervisor mode)
      • Restrict the entry points to privileged code
        • Can’t call arbitrary routines.
      • Entry point should check whether operation is allowed.
      • In the kernel, this is what the system call dispatcher does
    • System Call Dispatcher
      • Process requests system call (write)
      • CPU switches to supervisor mode, runs system call dispatcher
      • Dispatcher decides if system call is allowed
      • System call code (write) is invoked
    • CPU switches into supervisor mode using special instructions
      • “software interrupts”
      • “upcall” → How do I call the kernel
    • The C library has function wrappers around standard system calls.
    • When we say that we are making a ‘system call’ from C, we normally mean that we are making a function call, whose function is a wrapper around the system call we intended to call.
    • Professor enters the following commands:
      • gcc -O2 hello.c -o hello
      • ltrace ./hello
      • ltrace shows the library functions that a program calls. In its output, we see the call to puts.
      • strace ./hello
      • After this command, we can see lots of system calls. We can identify the write system call at the end.
      • gcc -O2 -static hello.c -o hello
      • strace ./hello
      • When linking statically, we see far fewer system calls, since no shared libraries have to be loaded at startup.
  • At the function call level, you call code inside your process.
  • At the system call level, you call code outside your process. System calls are more expensive than function calls.
  • We have not seen any assembly of a system call yet.

Walking through CSimpleShell.c

  • (Time: 1:00:13) Looking at csimpleshell.c
Lec3-2.png
  • We have an infinite loop in which we do:
    • Print ‘$’ sign, for printing a prompt
    • fgets is for reading an input
    • parse_args → parses arguments, converts a single string into an array of strings
      • takes the buffer of the input from the user
      • takes in args, which is an array of size ARR_SIZE (1<<16 means 1 shifted left by 16 bits, i.e. 2^(16) = 65536).
      • takes in nargs, which is the number of arguments
    • If there are no args (nargs == 0), then we continue (i.e. go to the next iteration of the loop).
    • If the first argument == “exit”, then call the exit function
    • Otherwise, we fork with the function fork( )
    • If the pid is not 0, then the current process is the parent process. It prints that it is waiting for the child, waits, and then prints that the child is done.
    • If the pid is 0, the current process is the child process. The child process calls execvp, which is like execve except that you don’t need to supply the environment variables (it just needs the arguments).
    • If execvp fails, the puts call below it prints an error on the terminal and we exit with exit(127).
  • We can run external programs with execve
  • Every process has its own namespace, memory, address space. How then, do you pass variables from one to the other?
  • Where is a username stored? Is a system call required to get the value of username?
    • Username information is inside the process. It is in the address space. A system call is not required to get the value of username.
    • Professor prints the environment variables.
    • When you load a program binary you give execve:
    1. The binary
    2. The command line arguments
    3. Environment variables
    • Environment variables are available as an argument to main. They are variables that are in your process that the kernel happens to put there when you start.