Operating Systems 2020W: Tutorial 1

From Soma-notes
Revision as of 02:45, 20 March 2020 by Soma

In this tutorial you will be learning the basics of command-line interaction in Linux.

Tutorials are graded based on participation and effort, but you should still turn in your work. Submit your answers on cuLearn as a single text file named "<username>-comp3000-t1.txt" (where username is your MyCarletonOne username). The first four lines of this file should be "COMP 3000 Tutorial 1", your name, student number, and the date of submission. You should also contact your TA in person or online to check in.

You get 3 marks for submitting answers and 1 mark for checking in, making this tutorial worth 4 marks in total.

Getting Started

For this tutorial, you need to get access to a Linux or UNIX machine. We suggest you use an SCS Openstack instance (see below). You'll need access to a system for the entire semester, ideally the same one.

The concepts covered below are mostly part of standard UNIX/Linux tutorials. Feel free to consult one or more of them. However, remember that you are trying to build a conceptual model of how things work. Thus, don't memorize commands; instead, try to understand how things fit together, and ask questions when things don't work as expected!

Feel free to discuss this tutorial on Discord in the #tutorials channel.

Openstack

Again, for emphasis: don't take snapshots! (see below)

Create a VM on the new SCS openstack cluster at openstack-stein.scs.carleton.ca and do your work there. Documentation on the cluster is here. While you don't need a persistent VM for this lab, it will be important for future tutorials, and it is nice to have your work stick around when you leave the lab.

If you cannot log into openstack or if you are not part of the COMP 3000 project, you need to change your SCS password (run newacct) in order to update your account to have the right entitlements.

Make sure you have added the ssh-ping security group to your network interface and that you have associated a floating IP address with your instance. The 192.168.X.X IP addresses are private (and cannot be accessed from outside the openstack cluster); the 134.117.X.X floating IP addresses can be accessed from the Carleton network and will allow you to access the wider Internet.

Note that you must be on the Carleton network to use openstack. When you are off campus, connect using the Carleton VPN.

Create a VM using the latest COMP 3000 snapshot image. Please create a machine with two VCPUs. The username is student and the default password is student. Please change your password after you first connect to your machine (using the passwd command).

Make sure you connect via ssh. Windows 10 and macOS have ssh clients available from their command lines; just type "ssh student@<IP address>", where the IP address is the floating IP address you assigned to your VM. PuTTY also works, and you can use x2go. DO NOT use the web console, as it is glitchy!


The image provides an "scs-backup" command that will back up the student user's directory to the SCS Linux machines. So if your SCS username is janedoe, you can type

 scs-backup janedoe

and it will create a copy of everything in the student account in a directory called "COMP3000VM-backup" in your home directory. You can ssh/sftp to access.scs.carleton.ca in order to access this copy of your VM's files. You should do backups at the end of every session and before you do anything dangerous.

The scs-backup bash function is listed below. Feel free to adapt to your own needs; however, realize that rsync is a very powerful command that can delete arbitrary files at the specified destination (and in fact that is what the listed command does to the backup directory). If you are changing any arguments, be sure to test with the -n option so you can see what will happen!

Note that you cannot take snapshots of your VM, so please don't try (it will keep trying and never succeed, and you'll make work for the tech staff who have to cancel what you did).

Background

The Shell

The shell or command line provides a text interface for running programs. While not as visually pleasing as a graphical interface, the shell provides a clearer representation of the functionality provided by the operating system.

To run a program contained in the current directory in the shell, you need to prefix the name of the command with a ./. This "./" tells the shell that the location of the command you wish to run is the current directory. By default, the shell will not search for executable commands in the current working directory. To run most system commands, the name of the command can be typed without a path specification.

Help

When working in the shell, help is available on most programs in the system, especially those that are command line based. This system of help is available by using the man (manual) command. If one wanted to get help on the echo command, the associated command would be man echo.

While you can find lots of documentation using web searches, note that it is very easy to find documentation that doesn't precisely match your system. Thus web searches should be a complement to rather than a substitute for man pages.

Shell Basics

Note that bash is the default shell on most Linux systems. Other UNIX-like systems can default to other shells like csh or tcsh; there are many alternatives such as zsh that you may prefer. When you change shells the syntax of the following operations can change; however, conceptually all UNIX-like shells provide the same basic functionality:

  • run external programs with command-line arguments
  • view and set environment variables
  • redirect program input and output using I/O redirection and pipes
  • allow for the creation of scripts that combine external programs with built-in programming functionality

Processes

Each running program (a process) is assigned a unique process identifier, or PID. The ps command shows the process identifiers of running processes. The operating system keeps each process separate from the other processes on the system. This information will be useful for subsequent questions.

When you enter a command at a shell prompt, most of the time you are creating a new process which runs the program you specified.

Permissions

Your permission to access a file in Unix is determined by who you are logged in as. All files on the Unix file system (including directories and other special files) have three different sets of permissions. The first set of permissions denotes the allowed file operations for the owner of the file. The second set of permissions denotes the allowed file operations for a group of users. The third set of permissions denotes the allowed file operations for everyone else. A file is always owned by someone and is always associated with a group.

The ls command with the -l option can be used to show both the permissions of a file as well as the owner and group associated with the file. Permissions are listed first, followed by the owner and the group.

Environment & Shell Variables

Environment variables on both Linux and Windows are name-value pairs that a process inherits from the process that started it. They define important context-related information (such as the name of the current user, the current language, and the timezone) for applications. The key advantage of environment variables is that they are available right when a program starts - they are given to it by the operating system.

In Linux, these environment variables can be printed on the command line in most shells by referring to the variable name prefixed with a $ sign (e.g., to output the value of the HELLO environment variable, one could write echo $HELLO).

Most shells also have internal variables which are private to the shell process. Typically you can access shell and environment variables using the same mechanisms. By convention, shell variables are lower case or mixed case, while environment variables are all upper case. In bash, by default all variables are first shell variables. To make them environment variables, they must be "export"-ed. Thus

 X="Important Data"

just defines X for the current bash process. However, if you then type

 export X

X will be turned into an environment variable, and so every subsequent program will also get X. You can combine both in one line:

 export X="Important Data"

This is the usual idiom for setting environment variables.

One thing to remember with the above is that spaces are used to separate arguments in bash and most other UNIX shells. Thus it is an error to type:

 export X = "Important Data"

as you now are giving export three arguments, not one.

One of the key reasons people choose alternatives to bash is because of quirks like this!

Dynamic Libraries

Most applications on the system do not contain all the code that they need right within the executable. Instead, dynamic libraries are loaded into the program address space when the program loads. As an example, the standard C library, which contains such functions as printf, is loaded in at run-time. Typically dynamic libraries are stored in /lib or /usr/lib.

The dynamic libraries associated with a program binary can be found using the ldd command. You can use ltrace to see calls to functions that are dynamically linked.

System Calls

A process on its own has limited access to the system. It cannot directly access any external devices or data sources (e.g., files, keyboard, the screen, networks) on its own. To access these external resources, to allocate memory, or otherwise change its runtime environment, it must make system calls. Note that system calls run code outside of a process and thus cannot be called like regular function calls. The standard C library provides function wrappers for most commonly-used system calls so they can be accessed like regular C functions. Under the hood, however, these functions make use of special compiler directives (typically inline assembly) in order to generate the machine code necessary to invoke system calls.

You can see the system calls produced by a process using the strace command.

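Controlling Processes

On Linux, you can control processes by sending them signals. You send signals when you type certain key sequences in most shells: Control-C sends INT (interrupt), and Control-Z sends TSTP (terminal stop), which suspends a process much as STOP does but, unlike STOP, can be caught or ignored by the process.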

You can send a signal to a process using the kill command:

 kill -<signal> <process ID>

So to stop process 4542, type

 kill -STOP 4542

By default, kill sends the TERM signal.

Downloading Code & Compiling Programs

To download C programs to your VM, use either wget or curl:

wget https://homeostasis.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c
curl https://homeostasis.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c -o hello.c

To compile, use gcc:

gcc -O2 hello.c -o hello

This compiles it with level 2 optimization and without debugging symbols.

To run, you have to specify where it is:

./hello

Remember you can change directories using the cd command.

By default gcc-produced binaries are dynamically linked (so at runtime they will require dynamic libraries to be present on the system). To compile a binary that is statically linked (so it has no external runtime library dependencies), instead do this:

gcc -O2 -static hello.c -o hello

Tasks/Questions

  1. Look up the manual page for three shell commands discussed above or shown in lecture. What do you notice about the format of manual pages? What sections are they divided into? How detailed are they?
  2. By default, manual pages are displayed using less. How do you quit less? How can you search for specific terms?
  3. Using the which command, figure out where at least three commands reside on the system. Look at the permissions of those files. Who owns them? What group are they in?
  4. For those same program binaries, figure out what the permission bits mean by reading the man page of chmod. (This is the command you could use to change those permission bits.) What is the difference between "man chmod" and "man 2 chmod"?
  5. What are the owner, group, and permissions of /etc/passwd and /etc/shadow? What are these files used for?
  6. What does it mean to have execute permission on a directory?
  7. The ls command can be used to get a listing of the files in a directory. What options are passed to ls to see all of the files within a directory (including hidden files)? What files are hidden?
  8. The PATH environment variable lists the directories the shell uses to search for commands. Where can you find documentation on it? How can you add the current directory (whichever directory you are currently in) to PATH?
  9. Is cd a command that is built in to a shell, or is it an external binary? How do you know?
  10. Using ldd, what dynamic library dependencies does the top command have? Note that you must specify the full path to top.
  11. Run top in one window and try to terminate it using the kill command in another window. Try running "strace -fqo /tmp/top.log top" to get a record of the system calls that top is running. What happens to top when it receives those signals? What system call is used to send the signals?
  12. Download and compile hello.c and syscall-hello.c. Compile them statically and dynamically. How do the library and system calls produced by them compare (as shown by ltrace and strace)?
  13. Download, compile, and run csimpleshell.c. How does its functionality compare to that of bash?

Code

hello.c

#include <stdio.h>

int main(int argc, char *argv[]) {

        printf("Hello world!\n");

        return 0;
}

syscall-hello.c

#include <unistd.h>
#include <sys/syscall.h>

char *buf = "Hello world!\n";

int main(int argc, char *argv[]) {
        long result;    /* syscall() returns a long */

        /* "man 2 write" to see arguments to write syscall */
        result = syscall(SYS_write, 1, buf, 13);

        return (int) result;
}

csimpleshell.c

/* csimpleshell.c, Enrico Franchi © 2005
      https://web.archive.org/web/20170223203852/
      http://rik0.altervista.org/snippets/csimpleshell.html
      "BSD" license

   January 12, 2019: minor changes to eliminate most compilation warnings
   (Anil Somayaji, soma@scs.carleton.ca)
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>
#define BUFFER_SIZE (1<<16)
#define ARR_SIZE (1<<16)

void parse_args(char *buffer, char** args, 
                size_t args_size, size_t *nargs)
{
    char *buf_args[args_size]; /* You need C99 */
    char **cp;
    char *wbuf;
    size_t i, j;
    
    wbuf=buffer;
    buf_args[0]=buffer; 
    args[0] =buffer;
    
    for(cp=buf_args; (*cp=strsep(&wbuf, " \n\t")) != NULL ;){
        if ((*cp != NULL) && (++cp >= &buf_args[args_size]))
            break;
    }
    
    for (j=i=0; buf_args[i]!=NULL; i++){
        if(strlen(buf_args[i])>0)
            args[j++]=buf_args[i];
    }
    
    *nargs=j;
    args[j]=NULL;
}


int main(int argc, char *argv[], char *envp[]){
    char buffer[BUFFER_SIZE];
    char *args[ARR_SIZE];

    int ret_status;
    size_t nargs;
    pid_t pid;
    
    while(1){
        printf("$ ");
        if (fgets(buffer, BUFFER_SIZE, stdin) == NULL)
            break; /* end of input (e.g., Control-D): leave the shell */
        parse_args(buffer, args, ARR_SIZE, &nargs); 

        if (nargs==0) continue;
        if (!strcmp(args[0], "exit" )) exit(0);       
        pid = fork();
        if (pid){
            printf("Waiting for child (%d)\n", pid);
            pid = wait(&ret_status);
            printf("Child (%d) finished\n", pid);
        } else {
            if( execvp(args[0], args)) {
                puts(strerror(errno));
                exit(127);
            }
        }
    }    
    return 0;
}