Operating Systems 2019W: Tutorial 1

From Soma-notes
 
Latest revision as of 18:48, 14 January 2019

In this tutorial you will be learning about 1) the difference between system calls, function calls, and library calls, 2) how C is translated into assembly, and 3) how processes are created.

Getting started

To do the following exercises, bring up a terminal on a system running Linux, preferably Ubuntu Linux 18.04. You should create a virtual machine on OpenStack for this lab. This VM will be your machine for the semester.

If you wish to use a VM on your own laptop, please install Ubuntu 18.04 or a similar Linux distribution in a VM running on VirtualBox or another virtualization platform. We strongly encourage you to use OpenStack, however, as we can provide better support for those VMs.

Setting up Openstack

To log in, visit openstack.scs.carleton.ca. Note that you can only access this page when on the Carleton network. To get on the Carleton network when off campus, please connect using Carleton's VPN service.

You should log in using your MyCarletonOne username and your SCS password. These are the same credentials you use to log in to the Windows machines in the lab or the SCS Linux machines. If your password doesn't work or you can't access the COMP 3000B project, you should change your password on the SCS account management page (instructions are here).

Once you log in, create a VM by going to Images and selecting Launch Instance for "ubuntu_18.04_lightapps_desktop_2019-01-04". In the resulting dialog, give your instance a name and select the right security groups, then launch the image.

  • Please name your VM <username-X>. Thus if your username is alicehacker, name your first VM "alicehacker-1". If you need more VMs, just increment the number.
  • Under "Access & Security", select the "ping-ssh-egress" option. Don't specify a key pair; our image doesn't support it.

Once you've launched the image, while it is spawning, select "Associate Floating IP" in the button menu for the launching VM. You may select any available IP address.

Once the VM is fully spawned, connect via ssh to your VM at the IP address you specified, using the username "student" and the password "student". On Linux, the ssh client is "ssh". On Windows, use PuTTY. As your first command, type "passwd" to change your password.

You can get more information on how to use the SCS OpenStack cluster at the SCS support page. Note that this page includes videos!

Function calls, library calls, and system calls

For hello.c and syscall-hello.c do the following (substituting the appropriate source file for prog.c). To download programs to your VM, use the wget command, e.g.

 wget https://homeostasis.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c
  1. Compile the program prog.c using gcc -O2 prog.c -o prog-dyn and run prog-dyn. What does it do?
  2. Statically compile and optimize prog.c by running gcc -O2 -static prog.c -o prog-static. How does its size compare with prog-dyn?
  3. See what system calls prog-static produces by running strace -o syscalls-static.log ./prog-static. Do the same for prog-dyn. Which version generates more system calls? Note: system calls are saved in the log file syscalls-static.log. Feel free to save them in a different file.
  4. See what library calls prog-static produces by running ltrace -o library-static.log ./prog-static. Do the same for prog-dyn. Which version generates more library calls? (If ltrace isn't installed, run sudo apt-get install ltrace)
  5. Use the command ls -l to see the metadata associated with prog.c, prog-dyn, and prog-static. Who owns these files? What group are they in? Do you notice any pattern in the permissions (rwx) associated with each file?
  6. (optional) Look up the documentation for each of the system calls made by the static versions of the programs. You may need to give the manual section (2 for system calls, 3 for library functions) before the name, e.g. "man 2 write" gets you the write system call documentation.

Comparing C and Assembly

Do the following with hello.c and syscall-hello.c, as before.

A few tips on reading assembly code (see AT&T/GNU Assembler syntax for more information):

  • The last letter of many instructions refers to the size of the operand. For example, callq means call a function using a "quad" value (64 bits).
  • A dollar sign preceding a value means that it is a literal value; a percent sign means it is a register.
  • If a register is in parentheses, then it is being used as a "pointer" (it contains an address, so the CPU goes to that address and interacts with the memory there). If there is a number before the parentheses, it is an offset to the register's value.
  1. Using the nm command, see what symbols are defined in prog-static and prog-dyn. Which defines more symbols?
  2. Run the command gcc -c -O2 prog.c to produce an object file. What file was produced? What symbols does it define?
  3. Look at the assembly code of the program by running gcc -S -O2 prog.c. What file was produced? Identify the following in the assembly code (if present):
    • A function call (call)
    • A return from a function (ret)
    • Registers being saved onto the stack (push)
    • Registers being retrieved from the stack (pop)
    • Subtraction (sub)
    • A system call (syscall)
  4. Disassemble the object file using objdump -d. How does this disassembly compare with the output from gcc -S?
  5. Examine the headers of the object file, the dynamically linked executable, and the statically linked executable using objdump -h.
  6. Examine the contents of the object file, the dynamically linked executable, and the statically linked executable using objdump -s.
  7. Re-run all of the previous gcc commands adding the "-v" flag. What is all of that output?

Creating processes, running executables

For this part you will be playing with and modifying csimpleshell.c by Enrico Franchi, as well as looking at the files from the previous section. The source is also listed below.

  1. Compile and run csimpleshell. Try running some commands in csimpleshell as you did with the regular shell (bash). How similar is their behavior?
  2. In csimpleshell, change the prompt to be the current user (e.g., "student $"), as reported by the USER environment variable.
  3. Make csimpleshell not call wait() on the command if there is an & as the last argument to a command; instead, just return another prompt. Can you see the "zombies" that are now produced (processes with a status of Z)?
  4. Change the execvp() call to execve(). Where do you get the extra argument? (NOTE: When you switch to execve() you will have to specify the full path to commands, e.g. /bin/ls not ls.)
  5. Add an environment variable called LASTCOMMAND that contains the last command that was executed by csimpleshell. This environment variable should be passed on to each new program that is run. How can you check that your code works?
  6. Use sigaction() and waitpid() to create a signal handler for SIGCHLD that prevents the creation of zombies for background commands.
  7. How complex is the assembly code of csimpleshell?
  8. How do the system calls and library calls made by csimpleshell change after doing the above changes?
  9. (Advanced) Implement I/O redirection for STDIN (<) and STDOUT (>). Do the same for arbitrary file descriptors (e.g., 2>).

Code

hello.c

#include <stdio.h>

int main(int argc, char *argv[]) {

        printf("Hello world!\n");

        return 0;
}

syscall-hello.c

#include <unistd.h>
#include <sys/syscall.h>

char *buf = "Hello world!\n";

int main(int argc, char *argv[]) {
        long result;    /* syscall() returns a long: bytes written, or -1 on error */

        /* "man 2 write" to see arguments to write syscall */
        result = syscall(SYS_write, 1, buf, 13);

        return (int) result;
}

csimpleshell.c

/* csimpleshell.c, Enrico Franchi © 2005
      https://web.archive.org/web/20170223203852/
      http://rik0.altervista.org/snippets/csimpleshell.html
      "BSD" license

   January 12, 2019: minor changes to eliminate most compilation warnings
   (Anil Somayaji, soma@scs.carleton.ca)
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>
#define BUFFER_SIZE (1<<16)   /* parenthesized so expressions like BUFFER_SIZE-1 work */
#define ARR_SIZE (1<<16)

void parse_args(char *buffer, char** args, 
                size_t args_size, size_t *nargs)
{
    char *buf_args[args_size]; /* You need C99 */
    char **cp;
    char *wbuf;
    size_t i, j;
    
    wbuf=buffer;
    buf_args[0]=buffer; 
    args[0] =buffer;
    
    for(cp=buf_args; (*cp=strsep(&wbuf, " \n\t")) != NULL ;){
        if ((*cp != NULL) && (++cp >= &buf_args[args_size]))
            break;
    }
    
    for (j=i=0; buf_args[i]!=NULL; i++){
        if(strlen(buf_args[i])>0)
            args[j++]=buf_args[i];
    }
    
    *nargs=j;
    args[j]=NULL;
}


int main(int argc, char *argv[], char *envp[]){
    char buffer[BUFFER_SIZE];
    char *args[ARR_SIZE];

    int ret_status;
    size_t nargs;
    pid_t pid;
    
    while(1){
        printf("$ ");
        if (fgets(buffer, BUFFER_SIZE, stdin) == NULL)
            exit(0);    /* EOF (Ctrl-D) or read error: quit instead of looping */
        parse_args(buffer, args, ARR_SIZE, &nargs); 

        if (nargs==0) continue;
        if (!strcmp(args[0], "exit" )) exit(0);       
        pid = fork();
        if (pid){
            printf("Waiting for child (%d)\n", pid);
            pid = wait(&ret_status);
            printf("Child (%d) finished\n", pid);
        } else {
            if( execvp(args[0], args)) {
                puts(strerror(errno));
                exit(127);
            }
        }
    }    
    return 0;
}