Operating Systems 2021F: Tutorial 1

From Soma-notes

In this tutorial you will be learning the basics of command-line interaction in Linux.

Tutorials are graded based on participation and effort. You can either turn in your work on Brightspace or you can talk to your TA to show that you've made a reasonable attempt at the material in the tutorial. If you turn in written answers, submit them as a single text file named "<username>-comp3000-t1.txt" (where username is your MyCarletonOne username). The first four lines of this file should be "COMP 3000 Tutorial 1", your name, student number, and the date of submission.

This tutorial is worth 4 points total.

Getting Started

For this tutorial, you need to get access to a Linux or UNIX machine. We strongly suggest you use an SCS Openstack instance (see below). You'll need access to a system for the entire semester, ideally the same one.

The concepts covered below are mostly part of standard UNIX/Linux tutorials. Feel free to consult one or more of them. However, remember that you are trying to build a conceptual model of how things work. Thus, don't memorize commands; instead, try to understand how things fit together, and ask questions when things don't work as expected!

Feel free to discuss this tutorial on Teams in the Tutorials channel.

Openstack

Again, for emphasis: don't take snapshots! (see below)

Connecting to Openstack

Create a VM on the new SCS openstack cluster at openstack-stein.scs.carleton.ca and do your work there. Documentation on the cluster is here. While you don't need a persistent VM for this lab, it will be important for future tutorials, and it is nice to have your work stick around when you leave the lab.

To access Openstack you must be on the Carleton network, so make sure to VPN in!

To get added to the COMP 3000 project, you need to change your SCS password (run newacct) in order to update your account to have the right entitlements.

Setting up and connecting a VM

Create a VM on the SCS openstack cluster as shown in lecture. Make sure to do the following:

  • Choose the COMP3000A-2021F snapshot image ONLY (others won't have the right software for later tutorials),
  • Add the ping-ssh-egress security group, and
  • Associate a floating IP address.

The 192.168.X.X IP addresses are private and cannot be accessed from outside the openstack cluster. The 134.117.X.X floating IP addresses can be accessed from the Carleton network and will allow you to access the wider Internet. You need to ssh to your VM instance. Windows 10, Ubuntu, and macOS all have SSH clients available from their command lines; just type "ssh student@<IP address>", where the IP address is the floating IP address you assigned to your VM (while connected to the Carleton VPN). Other tools supporting SSH (e.g., PuTTY) also work.

Once you are prompted to log in, the default user is student, default password is student. Please change your password after you first connect to your machine (using the passwd command).

(Don't use the web console unless it is an emergency; it will be glitchy. Also, while you may use x2go to connect to this VM, we recommend connecting via ssh for faster performance.)

Backups

The image provides an "scs-backup" command that will back up the student user's directory to the SCS Linux machines. So if your SCS username is janedoe, you can type:

scs-backup janedoe

and it will create a copy of everything (note: you can customize it) in the student account in a directory called "COMP3000VM-backup" in your home directory. You can ssh/sftp to access.scs.carleton.ca in order to access this copy of your VM's files.

You should do backups at the end of every session and before you do anything dangerous. While the cluster is generally stable, you should be ready for everything in it to be erased at a moment's notice, because it could happen!

Note that you cannot take snapshots of your VM, so please don't try (it will keep trying and never succeed, and you'll make work for the tech staff who have to cancel what you did).

Background

Online Documentation (man pages)

The man (short for manual) command is one of the primary ways to access built-in software documentation. Most software packages that provide command-line programs include man pages.

For almost any commands mentioned in the tutorials, you can use man to find the usage. While you can also find documentation online for these same commands, many have multiple variants that have different functionality. The man page is guaranteed to document the version installed on your system.

Man pages are divided into multiple sections, with each section having its own purpose, e.g., 1 for general commands, 2 for system calls, and 3 for library functions. You can specify the section as the first argument to man if there is more than one man page with the same name. For instance, tee is both a command (man 1 tee) and a system call (man 2 tee). If the section is not specified, man returns the first page it finds in its section search order, which is usually the lowest-numbered section.

Note that the topics in man pages go beyond just software & command manuals; they also include conventions and abstract concepts (e.g., man syscalls and man man-pages). Thus if you have questions, consider browsing the man pages rather than just going to a search engine.

The Shell

The shell or command line provides a text interface for running programs. While not as visually pleasing as a graphical interface, the shell provides a clearer representation of the functionality provided by the operating system.

To run a program contained in the current directory in the shell, you need to prefix the name of the command with a ./. This "./" tells the shell that the location of the command you wish to run is the current directory. By default, the shell will not search for executable commands in the current working directory. To run most system commands, the name of the command can be typed without a path specification.
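As a minimal sketch of the rule above, the following commands create a throwaway script in a temporary directory and try to run it with and without the ./ prefix. The script name hello-demo and the use of mktemp are illustrative choices, not part of the tutorial, and the sketch assumes the scratch directory is not already on the shell's search path.

```shell
# Work in a scratch directory so nothing in your home directory is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create a tiny executable script (hello-demo is a made-up name).
printf '#!/bin/sh\necho "hello from the current directory"\n' > hello-demo
chmod +x hello-demo

# A bare name is looked up only in the shell's search path, so this fails
# as long as the scratch directory is not on that path.
bare_result=$(hello-demo 2>/dev/null || echo "not found")

# With ./ the shell is told exactly where the file is, so this works.
prefixed_result=$(./hello-demo)

echo "bare name: $bare_result"
echo "with ./:   $prefixed_result"
```

The same file is involved in both attempts; only the way the shell is told to locate it differs.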

Note that there are many kinds of shells, and people can be very opinionated about which shell is best. We will normally be using bash, but there are many others including ones that have been around forever (sh, csh, tcsh) and somewhat newer, more feature-filled shells (ksh and zsh). There are also shells that were first built for non-UNIX-like systems but now run on Linux (PowerShell). Wikipedia has a nice article comparing the features of different command shells.

Shell Basics

Note that bash is the default shell on most Linux systems. Other UNIX-like systems can default to other shells like csh or tcsh; there are many alternatives such as zsh that you may prefer. When you change shells the syntax of the following operations can change; however, conceptually all UNIX-like shells provide the same basic functionality:

  • run external programs with command-line arguments
  • view and set environment variables
  • redirect program input and output using I/O redirection and pipes
  • allow for the creation of scripts that combine external programs with built-in programming functionality
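The four capabilities listed above can be sketched in a few lines that should work in bash and other sh-style shells. The variable GREETING and the file names are invented for this example.

```shell
# Scratch directory and sample data (file names are made up for this sketch).
work=$(mktemp -d)
printf '%s\n' cherry apple banana > "$work/fruit.txt"

# 1. Run an external program (sort) with command-line arguments.
sort -r "$work/fruit.txt"

# 2. Set an environment variable, then read it back from a child process.
export GREETING="hello"
from_child=$(sh -c 'echo "$GREETING"')

# 3. Redirect input and output, and connect two programs with a pipe.
sort < "$work/fruit.txt" > "$work/sorted.txt"
first=$(head -n 1 "$work/sorted.txt")
count=$(wc -l < "$work/fruit.txt" | tr -d ' ')

# 4. Combine external programs with the shell's own programming constructs.
while read -r name; do
    echo "$GREETING, $name"
done < "$work/sorted.txt" > "$work/greetings.txt"
last=$(tail -n 1 "$work/greetings.txt")

echo "$from_child / $first / $count / $last"
```

Saving these lines in a file and running that file with bash would turn them into a script, which is the fourth capability in its simplest form.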

Processes

Each application running on a system is assigned a unique process identifier. The ps command shows the process identifiers for running processes. Each process running on the system is kept separated from other processes by the operating system. This information will be useful for subsequent questions.

When you enter a command at a shell prompt, most of the time you are creating a new process which runs the program you specified.
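A small sketch of this idea, using only the shell's own special parameters: $$ holds the process ID of the current shell and $! holds the ID of the most recent background command. The variable names are chosen for illustration.

```shell
# $$ expands to the process ID of the shell that is reading these commands.
shell_pid=$$

# Running a program creates a new process with its own ID. Here the new
# process is another shell that simply prints its own $$.
child_pid=$(sh -c 'echo $$')

# A trailing & runs a command in the background; $! holds its process ID.
sleep 1 &
background_pid=$!
wait "$background_pid"

echo "shell: $shell_pid  child: $child_pid  background: $background_pid"
```

The three numbers should all differ, because each command ran in its own process. While a background command is still running, ps should list its ID alongside the shell's.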

Permissions

Your permission to access files in Unix is determined by who you are logged in as. A logged-in user has a user ID and belongs to one or more groups.

A file is always owned by someone and is always associated with a group. All files on the Unix file system (including directories and other special files) have three different sets of permissions:

  • owner permissions
  • group permissions
  • other permissions

Each of these has read, write, and/or execute permissions along with some other special permissions we'll discuss later.

The ls command with the -l option can be used to show both the permissions of a file as well as the owner and group associated with the file. Permissions are listed first, followed by the link count, the owner, and the group.
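To see where these fields appear, the sketch below creates a scratch file, gives it a known mode, and picks the fields out of the long listing. The file comes from mktemp and the mode 640 is an arbitrary choice for the example.

```shell
# Create a scratch file and give it a known set of permissions:
# owner read+write (6), group read (4), others nothing (0).
demo_file=$(mktemp)
chmod 640 "$demo_file"

# Long listing: permissions, link count, owner, group, size, date, name.
ls -l "$demo_file"

# Pull individual fields out of the listing.
perms=$(ls -l "$demo_file" | cut -c1-10)
owner=$(ls -l "$demo_file" | awk '{print $3}')
group=$(ls -l "$demo_file" | awk '{print $4}')

echo "permissions=$perms owner=$owner group=$group"
rm -f "$demo_file"
```

The first character of the permissions field is the file type (- for a regular file); the next nine are the owner, group, and other permissions, three characters each.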

Environment & Shell Variables

Environment variables on both Linux and Windows are name-value pairs, passed from each process to the programs it starts, that define important context-related information (such as the name of the current user, the current language, and the timezone) for applications. The key advantage of environment variables is that they are available right when a program starts: they are given to it by the operating system.

In Linux, these environment variables can be printed on the command line in most shells by referring to the variable name prefixed with a $ sign (e.g., to output the value in the HELLO environment variable, one could write echo $HELLO).

Most shells also have internal variables which are private to the shell process. Typically you can access shell and environment variables using the same mechanisms. By convention, shell variables are lower case or mixed case, while environment variables are all upper case. In bash, by default all variables are first shell variables. To make them environment variables, they must be "export"-ed. Thus

 X="Important Data"

just defines X for the current bash process. However, if you then type

 export X

X will be turned into an environment variable, and so every subsequent program will also get X. You can combine both in one line:

 export X="Important Data"

This is the usual idiom for setting environment variables.

To delete an environment variable, use unset (e.g., unset X).

One thing to remember with the above is that spaces are used to separate arguments in bash and most other UNIX shells. Thus it is an error to type:

 export X = "Important Data"

as you now are giving export three arguments, not one.

One of the key reasons people choose alternatives to bash is because of quirks like this!
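The difference between shell variables and environment variables is easiest to see from a child process. The sketch below reuses X from the text and asks a child shell (sh -c) what it can see at each step; the names in_child_before and so on are made up for the example.

```shell
# Start from a clean slate in case X is already defined.
unset X

# A plain assignment creates a shell variable: visible here, not in children.
X="Important Data"
in_shell=$X
in_child_before=$(sh -c 'echo "${X:-<not set>}"')

# export promotes X to an environment variable, so children inherit it.
export X
in_child_after=$(sh -c 'echo "${X:-<not set>}"')

# unset removes the variable from the shell and from its environment.
unset X
in_child_unset=$(sh -c 'echo "${X:-<not set>}"')

echo "shell: $in_shell"
echo "child before export: $in_child_before"
echo "child after export:  $in_child_after"
echo "child after unset:   $in_child_unset"
```

Only the middle child sees the value, because it is the only one started while X was exported.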

Controlling Processes

On Linux, you can control processes by sending them signals.

You send signals when you type certain key sequences in most shells: Control-C sends INT (interrupt), and Control-Z sends TSTP (terminal stop, a version of STOP that a program can catch or ignore).

You can send a signal to a process using the kill command:

 kill -<signal> <process ID>

So to stop process 4542, type

 kill -STOP 4542

By default, kill sends the TERM signal.
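The commands above can be tried safely on a process you start yourself. This sketch uses a background sleep as the target (an arbitrary choice) and adds the CONT signal, which resumes a stopped process; the exit status check assumes Linux, where TERM is signal number 15.

```shell
# Start a long-running process in the background and remember its ID.
sleep 300 &
pid=$!

# Suspend it, then let it run again.
kill -STOP "$pid"
kill -CONT "$pid"

# With no signal named, kill sends TERM, which ends sleep.
kill "$pid"

# wait reports how the process ended. For a process ended by a signal the
# shell reports 128 plus the signal number, so TERM (15) gives 143.
status=0
wait "$pid" || status=$?
echo "process $pid ended with status $status"
```

The process is continued before TERM is sent because a stopped process does not act on most signals until it is resumed.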

Downloading Code & Compiling Programs

To download C programs to your VM, use the wget or curl command:

wget https://people.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c
curl https://people.scs.carleton.ca/~soma/os-2017f/code/tut1/hello.c -o hello.c

To compile, use gcc:

gcc -O2 hello.c -o hello

This compiles it with level 2 optimization and without debugging symbols.

To run, you have to specify where it is:

./hello

Remember you can change directories using the cd command.

By default gcc-produced binaries are dynamically linked (so at runtime they will require dynamic libraries to be present on the system). To compile a binary that is statically linked (so it has no external runtime library dependencies), instead do this:

gcc -O2 -static hello.c -o hello

Tasks/Questions

  1. When you have logged in to a shell, how (i.e., using what commands?) do you first find out information about the environment?
    1. The version of your Linux distribution and the version of your Linux kernel.
    2. The name (binary path) of the current shell.
    3. RAM, disk space, and CPU.
  2. Using the man command, find out what the following commands do: which, pwd, who, whoami, env and whereis. Try using each of them.
  3. Linux commands can be classified as internal (built into the shell) and external (separate program binaries). How can you tell if a specific command (e.g., cd) is internal or external? Figure out where at least three external commands reside on the system.
  4. Making your own commands: the PATH environment variable lists the directories the shell uses to search for external commands. Where can you find documentation on it? How can you add the current directory (whichever directory you are currently in) to PATH? Then, how do you make that change permanent? Try to identify multiple ways.
  5. Look at the permissions of the program binaries of the external commands you have just found above. Who owns them? What group are they in?
  6. For those same program binaries, figure out what the permission bits mean by reading the man page of chmod (this is the command you could use to change those permission bits).
  7. What are the owner, group, and permissions of /etc/passwd and /etc/shadow? What are these files used for?
  8. What does it mean to have execute permission on a directory?
  9. The ls command can be used to get a listing of the files in a directory. What options are passed to ls to see: the permission bits above; all the files within a directory (including hidden files)? How do you make a file hidden?
  10. Compile and run csimpleshell.c. How does its functionality compare to that of bash? List at least 3 differences.

Code

csimpleshell.c

/* csimpleshell.c, Enrico Franchi © 2005
      https://web.archive.org/web/20170223203852/
      http://rik0.altervista.org/snippets/csimpleshell.html
      "BSD" license

   January 12, 2019: minor changes to eliminate most compilation warnings
   (Anil Somayaji, soma@scs.carleton.ca)
*/

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/wait.h>
#include <sys/types.h>
#define BUFFER_SIZE (1<<16)  /* parenthesized so the macros expand safely */
#define ARR_SIZE (1<<16)

void parse_args(char *buffer, char** args, 
                size_t args_size, size_t *nargs)
{
    char *buf_args[args_size]; /* You need C99 */
    char **cp;
    char *wbuf;
    size_t i, j;
    
    wbuf=buffer;
    buf_args[0]=buffer; 
    args[0] =buffer;
    
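    /* Split buffer on whitespace. The last slot is kept free so the
       token list is always NULL-terminated, even when it is full. */
    for(cp=buf_args; (*cp=strsep(&wbuf, " \n\t")) != NULL ;){
        if (++cp >= &buf_args[args_size - 1]) {
            *cp = NULL;
            break;
        }
    }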
    
    for (j=i=0; buf_args[i]!=NULL; i++){
        if(strlen(buf_args[i])>0)
            args[j++]=buf_args[i];
    }
    
    *nargs=j;
    args[j]=NULL;
}


int main(int argc, char *argv[], char *envp[]){
    char buffer[BUFFER_SIZE];
    char *args[ARR_SIZE];

    int ret_status;
    size_t nargs;
    pid_t pid;
    
    while(1){
        printf("$ ");
        if (fgets(buffer, BUFFER_SIZE, stdin) == NULL)
            break; /* EOF (Control-D) or read error: leave the loop */
        parse_args(buffer, args, ARR_SIZE, &nargs); 

        if (nargs==0) continue;
        if (!strcmp(args[0], "exit" )) exit(0);       
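        pid = fork();
        if (pid == -1){
            /* fork failed: report it and prompt again */
            perror("fork");
            continue;
        }
        if (pid){
            /* parent: wait for the child to finish */
            printf("Waiting for child (%d)\n", (int) pid);
            pid = wait(&ret_status);
            printf("Child (%d) finished\n", (int) pid);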
        } else {
            if( execvp(args[0], args)) {
                puts(strerror(errno));
                exit(127);
            }
        }
    }    
    return 0;
}