Operating Systems 2020W Lecture 6

Video

The video for the lecture given on January 24, 2020 is now available.

Notes

Topics

highlight relevant chapters from textbook
gdb
- follow-fork-mode (parent, child)
- detach-on-fork (on, off)
- follow-exec-mode (new, same)
- catch exec
- catch syscall
- layout regs
- layout split
- layout src
- layout asm
- print
- explore

Textbook chapters

 http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-intro.pdf
 http://pages.cs.wisc.edu/~remzi/OSTEP/cpu-api.pdf
 http://pages.cs.wisc.edu/~remzi/OSTEP/vm-intro.pdf

In Class

Lecture 6
---------

Running gdb
 - problem: want to run gdb in a separate window from program we are debugging
 - solution: run two terminals, use gdb -p <PID> to attach to process
   (or attach command inside gdb)

 - problem: attaching won't work in the standard configuration; by default
   gdb can only debug processes it started itself (its children)
 - solution:
    sudo su -  (to become root)
    echo 0 > /proc/sys/kernel/yama/ptrace_scope    <-- lost on reboot

   or
    sudo nano /etc/sysctl.d/10-ptrace.conf
    change the setting to "kernel.yama.ptrace_scope = 0"   <-- persistent
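
   To check the current setting before changing it, a minimal Python sketch
   along these lines may help.  The file path is the one from the notes; the
   function names and the MEANINGS table are illustrative, with the
   descriptions summarized from the kernel's Yama documentation.

```python
from pathlib import Path

# Path from the notes above; missing on kernels built without Yama.
PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

# Short summaries of each value (assumed from the kernel's Yama docs).
MEANINGS = {
    0: "classic: may attach to any process running under the same UID",
    1: "restricted: may attach only to descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled until reboot",
}


def describe_ptrace_scope(value):
    """Return a human-readable description of a ptrace_scope value."""
    return MEANINGS.get(value, "unknown value")


def read_ptrace_scope():
    """Return the current setting as an int, or None if it can't be read."""
    try:
        return int(PTRACE_SCOPE.read_text().strip())
    except (FileNotFoundError, PermissionError, ValueError):
        return None


if __name__ == "__main__":
    scope = read_ptrace_scope()
    if scope is None:
        print("ptrace_scope not available on this system")
    else:
        print(f"ptrace_scope = {scope} ({describe_ptrace_scope(scope)})")
```

   A value of 1 is the case described above, where gdb -p fails for
   processes gdb did not start.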

 - problem: after a fork, how do I debug the child process?  By default gdb
   keeps following only the parent
 - solution: "set follow-fork-mode child" in gdb

 - problem: I want to see what system calls a process is making
 - solution: catch syscall  (stops on every system call)
        can also name specific syscalls to stop on, e.g. catch syscall execve

 - problem: I want to follow an execve (change the program that is being
    followed)
 - solution: set follow-exec-mode new  (rather than same, the default);
    new puts the exec'd program in a new gdb inferior, same reuses the
    current one

 - problem: I hit a breakpoint I set on a function; how did we get there?
 - solution: bt (backtrace) to see the call stack
 
Why am I showing you all this stuff with gdb?
 - I want you to understand how the programs we cover in the course run,
   particularly the system calls they run
 - strace can help with this, but sometimes you need to see things go down
   one line at a time.  Hence gdb.

Note that gdb uses a special system call to control other processes: ptrace
 - ptrace isn't always the most reliable
 - you don't want to ptrace processes in production; there are other tools
   which we will cover later in the term

*Note* that if you want strace to follow forks, you need to specify the -f option

Differences between fork and execve
 - with fork
   - you start with one process (one PID)
   - you end with two processes: parent (with original PID) and
     child (with new PID)
   - parent and child are nearly identical right after fork; they differ in
     PID and in the return value of fork() (0 in the child, the child's PID
     in the parent).  Hence they are running the same program binary
 - with execve
   - only one process at start and end
   - before, running program that makes the execve call
   - after, running new program *specified* by the execve call
   - note that code after an execve call NEVER RUNS...unless
     execve fails (e.g., you tried to run a program and that file
     doesn't exist or you don't have permission to run it)
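
The fork behaviour listed above can be observed directly.  The following is a
minimal sketch in Python, whose os.fork() wraps the fork system call; the
function name fork_demo and the pipe used to report back are illustrative
choices, not something from the lecture.

```python
import os


def fork_demo():
    """Fork once and return what the parent and the child each observed."""
    parent_pid = os.getpid()
    read_fd, write_fd = os.pipe()  # lets the child report back to the parent

    ret = os.fork()  # one call, two returns: one in each process
    if ret == 0:
        # Child: fork() returned 0, and we are running under a new PID.
        try:
            os.close(read_fd)
            report = f"{ret} {os.getpid()} {os.getppid()}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)  # exit now so the child never runs the parent's code

    # Parent: fork() returned the child's PID.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_ret, child_pid, child_ppid = map(int, pipe.read().split())
    os.waitpid(ret, 0)  # reap the child so it does not linger as a zombie
    return {
        "parent_pid": parent_pid,
        "parent_fork_return": ret,
        "child_fork_return": child_ret,
        "child_pid": child_pid,
        "child_ppid": child_ppid,
    }


if __name__ == "__main__":
    for name, value in fork_demo().items():
        print(f"{name}: {value}")
```

Both processes run the same code after the fork; only the return value tells
them which one they are.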

In a normal shell, execve is run in a child (so after a fork)
  - otherwise the shell itself would be replaced by the new program
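
The fork-then-exec pattern a shell uses can be sketched as follows.  This is
an illustration, not how any particular shell is written: run_in_child is a
hypothetical helper, os.execvp is Python's PATH-searching wrapper around the
exec family, and the exit status 127 for a failed exec is borrowed from shell
convention.

```python
import os
import sys


def run_in_child(argv):
    """Run argv in a child process and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)  # on success this never returns
        except OSError:
            pass
        # Reached only if the exec failed (missing file, no permission, ...).
        os._exit(127)

    # Parent: still running the original program; wait for the child.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


if __name__ == "__main__":
    code = run_in_child([sys.executable, "-c", "print('hello from the child')"])
    print("child exited with", code)
    print("missing program gives", run_in_child(["/nonexistent/program"]))
```

Because the exec happens in the child, the parent survives to run the next
command, which is exactly what a shell needs.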

/proc and /sys
 - direct interfaces into kernel state
 - provide a file interface because, what would be better?
    - if you made a system call to get these, how would you have
      so many options?  You'd have to pass in magic numbers or strings
    - if you're passing in strings, why not make them filenames
      - then you get a hierarchy of strings!
 - big idea: a filesystem is just a mapping of filenames to data,
   doesn't have to correspond to "real" files (i.e., files stored on disk)
   - really, a file is just something that can be used with the
     filesystem API (open, read, write, close, etc)

 - view of processes in /proc is mostly read-only
    - can't kill a process by deleting files in /proc
    - but this is just a historical artifact of us already having a way
      of killing processes (with signals)
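
Since /proc entries are used through the ordinary file API, reading kernel
state about a process needs nothing beyond open and read.  A minimal sketch,
assuming a Linux system with /proc mounted; parse_status and read_proc_status
are illustrative names, and /proc/<PID>/status is one of the standard
per-process files.

```python
import os


def parse_status(text):
    """Turn the 'Key:<tab>value' lines of a status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def read_proc_status(pid="self"):
    """Read /proc/<pid>/status with plain open/read, like any other file."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


if __name__ == "__main__":
    status = read_proc_status()
    print("Name:", status["Name"])
    print("Pid: ", status["Pid"], "(os.getpid() says", os.getpid(), ")")
    print("PPid:", status["PPid"])
```

No file on disk backs this data; the kernel generates the contents when the
file is read.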

Note that /etc is system configuration stored in persistent files
 - preserved across reboots

Data in /proc and /sys is ephemeral
 - destroyed when system is rebooted
 - just reflects state of current kernel