Operating Systems 2019F: Tutorial 1
Latest revision as of 17:31, 11 September 2019
This tutorial is not yet finalized.
In this tutorial you will be learning the basics of command-line interaction in Linux.
Getting Started
For this tutorial, you need access to a Linux or UNIX machine. We suggest you use one of the VirtualBox virtual machines already configured on the SCS lab machines. Alternatively, you can use a Linux terminal on your own machine, in your own Linux virtual machine, or on one of the SCS Linux machines. Starting (hopefully) next week we will begin working with SCS's OpenStack cluster.
The concepts covered below are mostly part of standard UNIX/Linux tutorials. Feel free to consult one or more of them. However, remember that you are trying to build a conceptual model of how things work. Thus, don't memorize commands; instead, try to understand how things fit together, and ask questions when things don't work as expected!
Feel free to discuss this tutorial on Discord in the #tutorials channel.
Background
The Shell
The shell or command line provides a text interface for running programs. While not as visually pleasing as a graphical interface, the shell provides a clearer representation of the functionality provided by the operating system.
To run a program contained in the current directory in the shell, you need to prefix the name of the command with a ./. This "./" tells the shell that the location of the command you wish to run is the current directory. By default, the shell will not search for executable commands in the current working directory. To run most system commands, the name of the command can be typed without a path specification.
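As a rough sketch of the lookup rule described above, the following creates a small executable in a scratch directory and runs it with the ./ prefix. The script name hello.sh and the scratch directory are made up for illustration, and the example assumes the scratch directory allows execution and is not itself listed in PATH.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a tiny executable script (hypothetical name: hello.sh).
printf '#!/bin/sh\necho "hello from the current directory"\n' > hello.sh
chmod +x hello.sh

# With an explicit path, the shell runs the file directly.
with_prefix=$(./hello.sh)
echo "$with_prefix"

# Without a path, the shell only searches the directories listed in PATH.
if command -v hello.sh >/dev/null 2>&1; then
    bare_lookup="found"
else
    bare_lookup="not found"
fi
echo "bare name lookup: $bare_lookup"
```

On a typical system the bare name is not found, because the scratch directory is not one of the directories in PATH.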
Help
When working in the shell, help is available on most programs in the system, especially those that are command line based. This system of help is available by using the man (manual) command. If one wanted to get help on the echo command, the associated command would be man echo.
While you can find lots of documentation using web searches, note that it is very easy to find documentation that doesn't precisely match your system. Thus web searches should be a complement to rather than a substitute for man pages.
Shell Basics
Note that bash is the default shell on most Linux systems. Other UNIX-like systems can default to other shells like csh or tcsh; there are many alternatives such as zsh that you may prefer. When you change shells the syntax of the following operations can change; however, conceptually all UNIX-like shells provide the same basic functionality:
- run external programs with command-line arguments
- view and set environment variables
- redirect program input and output using I/O redirection and pipes
- allow for the creation of scripts that combine external programs with built-in programming functionality
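To make the redirection and pipe items in the list above concrete, here is a small sketch written in POSIX shell syntax, so it should behave the same in bash. The file contents and variable names are arbitrary choices for illustration.

```shell
# Scratch file so the example leaves nothing behind.
tmpfile=$(mktemp)

# Output redirection: write three lines into the file instead of the terminal.
printf 'pear\napple\nbanana\n' > "$tmpfile"

# Input redirection plus a pipe: sort reads the file on standard input,
# and its output becomes the standard input of head.
first=$(sort < "$tmpfile" | head -n 1)
echo "first item alphabetically: $first"

# Another pipe: count the lines, then strip any padding wc adds.
count=$(wc -l < "$tmpfile" | tr -d ' ')
echo "number of lines: $count"

rm -f "$tmpfile"
```

Each program here (printf, sort, head, wc, tr) is unaware of where its input comes from or where its output goes; the shell sets that up before the program starts.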
Processes
Each application running on a system is assigned a unique process identifier. The ps command shows the process identifiers for running processes. Each process running on the system is kept separated from other processes by the operating system. This information will be useful for subsequent questions.
When you enter a command at a shell prompt, most of the time you are creating a new process which runs the program you specified.
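A quick way to see this for yourself is sketched below. It relies on the standard shell parameter $$, which expands to the process ID of the shell itself; the variable names are only illustrative.

```shell
# $$ expands to the process ID of the current shell.
shell_pid=$$
echo "this shell is process $shell_pid"

# Running another program creates a new process with its own ID.
# Here the new program is another copy of sh that prints its own $$.
child_pid=$(sh -c 'echo $$')
echo "the child shell was process $child_pid"
```

The two numbers differ because the second sh runs as a separate process, even though it is the same program.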
Permissions
Your permission to access a file in Unix is determined by who you are logged in as. All files on the Unix file system (including directories and other special files) have three different sets of permissions. The first set of permissions denotes the allowed file operations for the owner of the file. The second set of permissions denotes the allowed file operations for a group of users. The third set of permissions denotes the allowed file operations for everyone else. A file is always owned by someone and is always associated with a group.
The ls command with the -l option can be used to show both the permissions of a file as well as the owner and group associated with the file. Permissions are listed first, followed by the owner and the group.
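The sketch below shows one way to pick those fields out of the ls -l output. The scratch file and the mode 640 are arbitrary choices for illustration; some systems append an extra character to the permission string (for example to mark access control lists), so only the first ten characters are meaningful here.

```shell
# Create a scratch file and give it a known set of permissions.
demo_file=$(mktemp)
chmod 640 "$demo_file"   # owner: read+write, group: read, others: none

# The first column of "ls -l" is the file type followed by the three
# permission sets; the third and fourth columns are the owner and group.
listing=$(ls -l "$demo_file")
echo "$listing"

perms=$(echo "$listing" | awk '{print $1}')
owner=$(echo "$listing" | awk '{print $3}')
group=$(echo "$listing" | awk '{print $4}')
echo "permissions=$perms owner=$owner group=$group"

rm -f "$demo_file"
```

Reading the permission string left to right: one character for the file type, then three characters each for the owner, the group, and everyone else.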
Environment & Shell Variables
Environment variables on both Linux and Windows are name-value pairs, passed from a process to the programs it starts, that define important context-related information (such as the name of the current user, the current language, and the timezone) for applications. The key advantage of environment variables is that they are available as soon as a program starts: they are given to it by the operating system.
In Linux, these environment variables can be printed on the command line in most shells by referring to the variable name prefixed with a $ sign (e.g., to output the value of the HELLO environment variable, one could write echo $HELLO).
Most shells also have internal variables which are private to the shell process. Typically you can access shell and environment variables using the same mechanisms. By convention, shell variables are lower case or mixed case, while environment variables are all upper case. In bash, by default all variables are first shell variables. To make them environment variables, they must be "export"-ed. Thus
X="Important Data"
just defines X for the current bash process. However, if you then type
export X
X will be turned into an environment variable, and so every subsequent program will also get X. You can combine both in one line:
export X="Important Data"
This is the usual idiom for setting environment variables.
One thing to remember with the above is that spaces are used to separate arguments in bash and most other UNIX shells. Thus it is an error to type:
export X = "Important Data"
as you are now giving export three arguments, not one.
One of the key reasons people choose alternatives to bash is because of quirks like this!
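The difference between a shell variable and an environment variable can be observed by asking a child process what it sees. The following is a minimal sketch; the fallback text "unset" is just a label chosen for this example, printed when the child has no X in its environment.

```shell
# Start from a clean slate in case X is already set somewhere.
unset X

# A plain assignment creates a shell variable, private to this shell.
X="Important Data"
before=$(sh -c 'echo "${X:-unset}"')
echo "child before export sees: $before"

# After export, X is part of the environment handed to new programs.
export X
after=$(sh -c 'echo "${X:-unset}"')
echo "child after export sees: $after"
```

The single quotes matter: they stop the current shell from expanding $X itself, so the expansion really happens in the child.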
Dynamic Libraries
Most applications on the system do not contain all the code that they need right within the executable. Instead, dynamic libraries are loaded into the program address space when the program loads. As an example, the standard C library, which contains such functions as printf, is loaded in at run-time. Typically dynamic libraries are stored in /lib or /usr/lib.
The dynamic libraries associated with a program binary can be found using the ldd command. You can use ltrace to see calls to functions that are dynamically linked.
System Calls
A process on its own has limited access to the system. It cannot directly access any external devices or data sources (e.g., files, keyboard, the screen, networks) on its own. To access these external resources, to allocate memory, or otherwise change its runtime environment, it must make system calls. Note that system calls run code outside of a process and thus cannot be called like regular function calls. The standard C library provides function wrappers for most commonly-used system calls so they can be accessed like regular C functions. Under the hood, however, these functions make use of special compiler directives in order to generate the machine code necessary to invoke system calls.
You can see the system calls produced by a process using the strace command.
Controlling Processes
On Linux, you can control processes by sending them signals. You send signals when you type certain key sequences in most shells: Control-C sends INT (interrupt) and Control-Z sends TSTP (terminal stop, which suspends the process much as STOP does).
You can send a signal to a process using the kill command:
kill -<signal> <process ID>
So to stop process 4542, type
kill -STOP 4542
By default, kill sends the TERM signal.
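The following sketch puts these pieces together on a throwaway background process. The use of sleep 60 as the target is an arbitrary choice for illustration, and the exact exit status reported by wait depends on the shell (bash and dash report 128 plus the signal number for a process killed by a signal).

```shell
# Start a long-running process in the background; $! is its process ID.
sleep 60 &
pid=$!

# Suspend it, then let it continue.
kill -STOP "$pid"
kill -CONT "$pid"

# With no signal named, kill sends TERM.
kill "$pid"

# wait reports how the process ended; a value above 128 means it was
# terminated by a signal rather than exiting on its own.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

While the process is stopped, running ps in another window should show it in a stopped state rather than sleeping.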
Downloading Code & Compiling Programs
To download C programs to your VM, use wget or curl commands:
wget https://homeostasis.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c
curl https://homeostasis.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c -o hello.c
To compile, use gcc:
gcc -O2 hello.c -o hello
This compiles it with level 2 optimization and without debugging symbols.
To run, you have to specify where it is:
./hello
Remember you can change directories using the cd command.
By default gcc-produced binaries are dynamically linked (so at runtime they will require dynamic libraries to be present on the system). To compile a binary that is statically linked (so it has no external runtime library dependencies), instead do this:
gcc -O2 -static hello.c -o hello
Tasks/Questions
- Look up the manual page for three shell commands discussed above or shown in lecture. What do you notice about the format of manual pages? What sections are they divided into? How detailed are they?
- By default, manual pages are displayed using less. How do you quit less? How can you search for specific terms?
- Using the which command, figure out where at least three commands reside on the system. Look at the permissions of those files. Who owns them? What group are they in?
- For those same program binaries, figure out what the permission bits mean by reading the man page of chmod. (This is the command you could use to change those permission bits.) What is the difference between "man chmod" and "man 2 chmod"?
- What are the owner, group, and permissions of /etc/passwd and /etc/shadow? What are these files used for?
- What does it mean to have execute permission on a directory?
- The ls command can be used to get a listing of the files in a directory. What options are passed to ls to see all of the files within a directory (including hidden files)? What files are hidden?
- The PATH environment variable lists the directories the shell uses to search for commands. Where can you find documentation on it? How can you add the current directory (whichever directory you are currently in) to PATH?
- Is cd a command that is built in to a shell, or is it an external binary? How do you know?
- Using ldd, what dynamic library dependencies does the top command have? Note that you must specify the full path to top.
- Run top in one window and try to terminate it using the kill command in another window. Try running "strace -fqo /tmp/top.log top" to get a record of the system calls that top is running. What happens to top when it receives those signals? What system call is used to send the signals?
- Download and compile hello.c and syscall-hello.c. Compile them statically and dynamically. How do the library and system calls produced by them compare?
- Download, compile, and run csimpleshell.c. How does its functionality compare to that of bash?
Code
hello.c
#include <stdio.h>
int main(int argc, char *argv[]) {
        printf("Hello world!\n");
        return 0;
}
syscall-hello.c
#include <unistd.h>
#include <sys/syscall.h>
char *buf = "Hello world!\n";
int main(int argc, char *argv[]) {
        long result;   /* syscall() returns a long; -1 indicates an error */
        /* "man 2 write" to see arguments to write syscall */
        result = syscall(SYS_write, 1, buf, 13);
        return (int) result;
}
csimpleshell.c
/* csimpleshell.c, Enrico Franchi © 2005
      https://web.archive.org/web/20170223203852/
      http://rik0.altervista.org/snippets/csimpleshell.html
      "BSD" license
   January 12, 2019: minor changes to eliminate most compilation warnings
   (Anil Somayaji, soma@scs.carleton.ca)
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>
#define BUFFER_SIZE (1<<16)
#define ARR_SIZE (1<<16)
void parse_args(char *buffer, char** args, 
                size_t args_size, size_t *nargs)
{
    char *buf_args[args_size]; /* You need C99 */
    char **cp;
    char *wbuf;
    size_t i, j;
    
    wbuf=buffer;
    buf_args[0]=buffer; 
    args[0] =buffer;
    
    for(cp=buf_args; (*cp=strsep(&wbuf, " \n\t")) != NULL ;){
        if ((*cp != NULL) && (++cp >= &buf_args[args_size]))
            break;
    }
    
    for (j=i=0; buf_args[i]!=NULL; i++){
        if(strlen(buf_args[i])>0)
            args[j++]=buf_args[i];
    }
    
    *nargs=j;
    args[j]=NULL;
}
int main(int argc, char *argv[], char *envp[]){
    char buffer[BUFFER_SIZE];
    char *args[ARR_SIZE];
    int ret_status;
    size_t nargs;
    pid_t pid;
    
    while(1){
        printf("$ ");
        if (fgets(buffer, BUFFER_SIZE, stdin) == NULL) break; /* stop at end of input */
        parse_args(buffer, args, ARR_SIZE, &nargs); 
        if (nargs==0) continue;
        if (!strcmp(args[0], "exit" )) exit(0);       
        pid = fork();
        if (pid){
            printf("Waiting for child (%d)\n", pid);
            pid = wait(&ret_status);
            printf("Child (%d) finished\n", pid);
        } else {
            if( execvp(args[0], args)) {
                puts(strerror(errno));
                exit(127);
            }
        }
    }    
    return 0;
}