Operating Systems 2019W: Assignment 1

Please submit the answers to the following questions via CULearn by 11:55 PM on January 29, 2019. There are 20 points in 11 questions.

Submit your answers as a single text file named "<username>-comp3000-assign1.txt" (where username is your MyCarletonOne username). The first four lines of this file should be "COMP 3000 Assignment 1", your name, student number, and the date of submission. You may wish to format your answers in Markdown to improve their appearance.

No other formats will be accepted. Submitting in another format will likely result in your assignment not being graded and you receiving no marks for this assignment. In particular do not submit an MS Word, OpenOffice, or PDF file as your answers document!

Don't forget to include what outside resources you used to complete each of your answers, including other students, man pages, and web resources. You do not need to list help from the instructor, TA, or information found in the textbook.

Note that all questions below refer to 3000shell.c from Tutorial 2.

Questions

  1. [1] When you run a program from 3000shell, what does that program get for file descriptors 0, 1, and 2? Assume that you are running 3000shell in a standard terminal on Linux.
  2. [1] If line 293 of 3000shell.c, pid = wait(ret_status);, is removed, how will the behavior of 3000shell change?
  3. [1] How would the behaviour of 3000shell change if lines 299 and 300 were removed?
  4. [1] What does the call to setup_comm_fn() in line 167 of 3000shell.c do?
  5. [2] Are system calls used to set signal handlers? Use output from strace to show your answer is correct.
  6. [2] Does find_env() generate any system calls? Use output from strace to show your answer is correct.
  7. [2] Give the assembly code corresponding to the allocation of pattern[] on line 63. Note that this allocation may be for multiple variables. How do you know your answer is correct?
  8. [2] How can execve overwrite all of a process's memory while also preserving its file descriptors? Specifically, does preserving file descriptors require execve to avoid erasing some process memory?
  9. [2] Is the SA_RESTART flag on a signal handler (e.g., line 243) interpreted by code running in the process or in the kernel? Give evidence for your answer.
  10. [3] Implement a "listfiles" command that lists the files in the current directory. Give the code for the listfiles() function and show how you'd change the rest of the code so that typing "listfiles" runs this function. Your solution should not invoke an external executable; instead, it should do something similar to plist(). Show your changes by using diff -c.
  11. [3] Implement an "input from file" feature to 3000shell that works as follows. If you run
    $ bc infile=foo.bc
    then 3000shell should have bc read the contents of foo.bc from standard input. You may assume that the "infile=" parameter is the last one (other than perhaps &). Show your changes from the standard 3000shell.c using diff -c.

Solutions

Question 1

The program inherits file descriptors 0, 1, and 2 from whatever ran 3000shell, because 3000shell does not change them. If 3000shell is run from a standard terminal, all three will refer to a pseudo-TTY device, e.g. /dev/pts/0.
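One way to check this from inside a program is to ask /proc where a descriptor points. The helper below is a minimal sketch, not code from 3000shell.c: the name describe_fd() is hypothetical, and it assumes a Linux system with /proc mounted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of 3000shell.c): store in buf the name
 * that /proc/self/fd/<fd> points to, e.g. "/dev/pts/0" for a terminal.
 * Returns 0 on success and -1 if the descriptor cannot be resolved. */
int describe_fd(int fd, char *buf, size_t buflen)
{
        char link[64];
        ssize_t n;

        if (buflen == 0)
                return -1;

        snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
        n = readlink(link, buf, buflen - 1);    /* readlink adds no '\0' */
        if (n < 0)
                return -1;

        buf[n] = '\0';
        return 0;
}
```

A program that calls describe_fd() on 0, 1, and 2 and prints the results, when started from 3000shell in a terminal, would be expected to report the same /dev/pts entry for all three.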

Question 2

If this line is removed, then 3000shell will never wait for a foreground process to finish; the wait system call will only be invoked from the signal handler when SIGCHLD arrives. Thus, all processes will be run in "the background".
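The difference between waiting and not waiting can be sketched outside the shell. The function run_child() below is a hypothetical illustration, not code from 3000shell.c: it forks a child that exits immediately with a given code, and the parent either blocks until the child finishes or returns at once.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`.
 * If wait_for_it is nonzero, block until the child finishes and return
 * its exit status (foreground behaviour). Otherwise return -1 at once,
 * leaving the child to finish on its own (background behaviour).
 * Returns -2 on error. */
int run_child(int code, int wait_for_it)
{
        int status;
        pid_t pid = fork();

        if (pid == -1)
                return -2;

        if (pid == 0)
                _exit(code);            /* child: stand-in for a real program */

        if (!wait_for_it)
                return -1;              /* parent does not wait */

        if (waitpid(pid, &status, 0) == -1)
                return -2;

        return WIFEXITED(status) ? WEXITSTATUS(status) : -2;
}
```

In the non-waiting case nothing in this sketch ever reaps the child, which is the job the SIGCHLD handler does in 3000shell.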

Question 3

If lines 299 and 300 were removed, 3000shell would behave the same as before unless execve fails. In that case the failure would be silent instead of printing an error. Also, since the child would not exit, two processes would be running 3000shell, both accessing the same terminal.
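The pattern at stake here is that execve only returns when it fails, so the code after it is the error path and must end the child. A minimal sketch follows; run_program() is a hypothetical name and the exit code 127 is an arbitrary choice, neither taken from 3000shell.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` with no arguments in a child process.
 * Returns the child's exit status, or -1 on error in the parent. */
int run_program(const char *path, char *const envp[])
{
        char *const argv[] = { (char *)path, NULL };
        int status;
        pid_t pid = fork();

        if (pid == -1)
                return -1;

        if (pid == 0) {
                execve(path, argv, envp);
                /* Only reached if execve failed. */
                fprintf(stderr, "%s: %s\n", path, strerror(errno));
                /* Without this exit the child would carry on running
                 * the caller's code alongside the parent. */
                _exit(127);
        }

        if (waitpid(pid, &status, 0) == -1)
                return -1;

        return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```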

Question 4

The call to setup_comm_fn() takes the string e->d_name (passed in as pidstr) and places the string "/proc/<pidstr>/comm" in comm_fn, where <pidstr> is the value of pidstr. This is a key part of implementing plist, because pidstr is a process ID represented as a string and /proc/<pidstr>/comm is the file that contains the name of the program running in that process.
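The same idea can be sketched with snprintf. This is not the code from 3000shell.c, whose setup_comm_fn() may build the string differently; build_comm_path() and read_comm() are hypothetical names used only for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for setup_comm_fn(): write "/proc/<pidstr>/comm"
 * into comm_fn. Returns 0 on success, -1 if the result does not fit. */
int build_comm_path(const char *pidstr, char *comm_fn, size_t len)
{
        int n = snprintf(comm_fn, len, "/proc/%s/comm", pidstr);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the program name of process pidstr into
 * name. Returns 0 on success, -1 on failure. */
int read_comm(const char *pidstr, char *name, size_t len)
{
        char comm_fn[64];
        FILE *f;

        if (build_comm_path(pidstr, comm_fn, sizeof(comm_fn)) != 0)
                return -1;

        f = fopen(comm_fn, "r");
        if (f == NULL)
                return -1;

        if (fgets(name, (int)len, f) == NULL) {
                fclose(f);
                return -1;
        }
        fclose(f);

        name[strcspn(name, "\n")] = '\0';       /* drop the trailing newline */
        return 0;
}
```

Passing "self" instead of a numeric PID makes read_comm() report the name of the calling program, which is a convenient way to try it out.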

Question 5

System calls are used to set signal handlers. We know this because if we comment out the sigaction lines in 3000shell.c (245-251), the lines that register the signal handlers, then the output of strace changes.

...
mprotect(0x7f2488452000, 16384, PROT_READ) = 0
mprotect(0x555aba598000, 4096, PROT_READ) = 0
mprotect(0x7f2488683000, 4096, PROT_READ) = 0
munmap(0x7f2488667000, 113228)          = 0
* rt_sigaction(SIGCHLD, {sa_handler=0x555aba3971d0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f24880a9f20}, NULL, 8) = 0
* rt_sigaction(SIGHUP, {sa_handler=0x555aba3971d0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f24880a9f20}, NULL, 8) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0

The *'d lines are not present when lines 245-251 are commented out, but the other calls are the same. (Note that the arguments to the system calls vary but their order is otherwise consistent.)
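A small stand-alone way to repeat this observation is to register a handler in a tiny program and trace it. The sketch below uses hypothetical names (on_signal, install_handler, handler_ran) and is not the code from 3000shell.c; on Linux with glibc, running a program that calls install_handler() under strace would be expected to show a matching rt_sigaction line.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_signal = 0;

/* Handler: only records that it ran. */
static void on_signal(int signum)
{
        (void)signum;
        got_signal = 1;
}

/* Hypothetical helper: register on_signal for signum with SA_RESTART,
 * in the same style as the sigaction setup discussed above.
 * Returns the result of sigaction(): 0 on success, -1 on error. */
int install_handler(int signum)
{
        struct sigaction sa;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = on_signal;
        sa.sa_flags = SA_RESTART;
        sigemptyset(&sa.sa_mask);

        return sigaction(signum, &sa, NULL);
}

/* Returns 1 once the handler has run, 0 before that. */
int handler_ran(void)
{
        return got_signal;
}
```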

Question 6

find_env() does not generate any system calls. You can see this if you replace lines 253 and 254 with fixed values that are the same as they would be otherwise (e.g. "soma" and "/bin:/usr/bin") and note that the system calls made do not change. Specifically, if we run ls, we see something like this:

Signal setup:

 rt_sigaction(SIGCHLD, {sa_handler=0x555aba3971d0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f24880a9f20}, NULL, 8) = 0
 rt_sigaction(SIGHUP, {sa_handler=0x555aba3971d0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f24880a9f20}, NULL, 8) = 0

Check standard output:

fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0

Allocate memory:

brk(NULL)                               = 0x555abbc1f000
brk(0x555abbc40000)                     = 0x555abbc40000

Check standard input:

fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0

Print the prompt:

write(1, "soma $ ", 7)                  = 7

Read input:

read(0, "ls\n", 1024)                   = 3

This pattern does not change. If find_env() generated system calls, they would appear between the rt_sigaction calls and the write of the prompt, and they would disappear when the find_env() calls on lines 253 and 254 are replaced with fixed values. Instead, the trace stays the same.
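The reason no system calls are needed is that the environment is already in the process's memory as the envp array, so looking up a variable is just string comparison. The function below is a hypothetical re-implementation in the spirit of find_env(), not the code from 3000shell.c.

```c
#include <string.h>

/* Hypothetical lookup in the spirit of find_env(): scan the envp array
 * for an entry of the form "envvar=value" and return a pointer to the
 * value, or notfound if there is no such entry. Only memory that the
 * process already owns is touched, so no system call is involved. */
const char *lookup_env(const char *envvar, const char *notfound,
                       char *const envp[])
{
        size_t len = strlen(envvar);
        int i;

        for (i = 0; envp[i] != NULL; i++) {
                /* The name must match exactly and be followed by '='. */
                if (strncmp(envp[i], envvar, len) == 0 &&
                    envp[i][len] == '=')
                        return envp[i] + len + 1;
        }

        return notfound;
}
```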

Question 7

The assembly code is:

subq $152, %rsp

We know this code is correct as this is the only code that changes when we change the size of pattern[] (by changing the value of MAXPATTERN). "diff -c" between the assembly files shows this difference. (This instruction subtracts from the stack pointer and thus allocates space on the stack.)

Question 8

The kernel doesn't have to specifically preserve any of a process's address space to preserve file descriptors because the file descriptors are stored in a kernel data structure, not in process-visible memory. From userspace, a file descriptor is just a small number; in the kernel, however, the file descriptor is an index into an array of structs that represent the state of open files for a process. Thus throwing away a process's address space on execve doesn't destroy the record of what files were open.
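This can be observed directly: a descriptor opened before an exec is still usable by the new program, even though none of the old program's memory survives. The sketch below is an illustration under stated assumptions, not part of 3000shell.c: fd_survives_exec() is a hypothetical name, and it assumes /bin/sh exists and supports the >&3 redirection.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if a program started with exec could
 * write to a descriptor that was opened before the exec, 0 if not,
 * and -1 on a setup error. */
int fd_survives_exec(void)
{
        int p[2];
        char buf[16];
        ssize_t n, total = 0;
        pid_t pid;

        if (pipe(p) == -1)
                return -1;

        pid = fork();
        if (pid == -1) {
                close(p[0]);
                close(p[1]);
                return -1;
        }

        if (pid == 0) {
                /* Child: make the write end of the pipe descriptor 3. */
                if (p[1] != 3) {
                        if (dup2(p[1], 3) == -1)
                                _exit(126);
                        close(p[1]);
                }
                if (p[0] != 3)
                        close(p[0]);
                /* The shell is a new program image, yet fd 3 is open. */
                execl("/bin/sh", "sh", "-c", "echo kept >&3", (char *)NULL);
                _exit(127);
        }

        /* Parent: read whatever the exec'd program wrote to the pipe. */
        close(p[1]);
        while (total < (ssize_t)sizeof(buf) - 1 &&
               (n = read(p[0], buf + total, sizeof(buf) - 1 - total)) > 0)
                total += n;
        buf[total] = '\0';
        close(p[0]);
        waitpid(pid, NULL, 0);

        return strcmp(buf, "kept\n") == 0;
}
```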

Question 9

SA_RESTART is interpreted by the kernel, not by code running in the process. We can see this by doing the following experiment. Run

strace -o restart.log ./3000shell

At the 3000shell prompt, enter the commands "ls &" and then "exit". Next, comment out the SA_RESTART line in 3000shell.c (line 243) and compile the result to 3000shell-norestart. Then run

strace -o norestart.log ./3000shell-norestart

Enter "ls &". The command will run and the shell will exit on its own.

If you compare these two traces, you'll see something like this for the original code:

Write the prompt, read "ls &"

 write(1, "soma $ ", 7)                  = 7
 read(0, "ls &\n", 1024)                 = 5

Fork the child, report it is in the background, print a new prompt

 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f5477f7a7d0) = 14692
 write(2, "Process 14692 running in the bac"..., 41) = 41
 write(1, "soma $ ", 7)                  = 7

Wait for user input, but the read gets interrupted by SIGCHLD

 read(0, 0x5594b3dfa670, 1024)           = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14692, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---

Run the signal handler

 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14692
 write(2, "\nProcess 14692 exited with statu"..., 37) = 37
 rt_sigreturn({mask=[]})                 = 0

Read user input again. User presses return (to clear up mess from ls) then types exit:

 read(0, "\n", 1024)                     = 1
 write(1, "soma $ ", 7)                  = 7
 read(0, "exit\n", 1024)                 = 5

Program terminates

 exit_group(0)                           = ?

In contrast, without SA_RESTART, we see the following:

This is the same:

 write(1, "soma $ ", 7)                  = 7
 read(0, "ls &\n", 1024)                 = 5
 clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4edea277d0) = 14934
 write(2, "Process 14934 running in the bac"..., 41) = 41
 write(1, "soma $ ", 7)                  = 7
 read(0, 0x5608fcb95670, 1024)           = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=14934, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
 wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 14934
 write(2, "\nProcess 14934 exited with statu"..., 37) = 37
 rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)

But after the signal handler returns, we just go into the shell exit condition (print a newline and exit):

 write(1, "\n", 1)                       = 1
 exit_group(0)                           = ?

So with SA_RESTART an additional read system call is made after the signal handler returns, while without it the interrupted read fails with EINTR and the shell takes its exit path. No code in the process makes this choice. The flag is handed to the kernel in the rt_sigaction call (it appears in sa_flags in the traces for Question 5). ERESTARTSYS is a kernel-internal result that strace can observe but the process never receives. The -1 EINTR shown for rt_sigreturn in the second trace is the result the kernel substituted for the interrupted read because SA_RESTART was not set; when the flag is set, the kernel instead re-executes the read once the handler has returned. The kernel thus handles SA_RESTART.
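The behavioural difference between the two traces can also be reproduced in isolation. The sketch below is an assumption-laden illustration, not code from 3000shell.c: read_across_signal() is a hypothetical name, it uses SIGUSR1 rather than SIGCHLD, and it relies on 200 ms delays being long enough for the parent to be blocked in read() when the signal arrives.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static void noop_handler(int signum)
{
        (void)signum;           /* the handler only needs to exist */
}

static void sleep_ms(long ms)
{
        struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

        nanosleep(&ts, NULL);
}

/* Hypothetical experiment: block in read() on an empty pipe while a
 * child first sends SIGUSR1 and then writes one byte.
 * Returns 1 if read() delivered the byte (the call was restarted),
 * 0 if read() failed with EINTR, and -1 on any other error. */
int read_across_signal(int sa_flags)
{
        struct sigaction sa;
        int p[2], saved_errno;
        char c;
        ssize_t n;
        pid_t pid;

        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = noop_handler;
        sa.sa_flags = sa_flags;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGUSR1, &sa, NULL) == -1 || pipe(p) == -1)
                return -1;

        pid = fork();
        if (pid == -1)
                return -1;

        if (pid == 0) {
                close(p[0]);
                sleep_ms(200);                  /* let the parent block */
                kill(getppid(), SIGUSR1);       /* interrupt its read() */
                sleep_ms(200);
                if (write(p[1], "x", 1) != 1)   /* data after the signal */
                        _exit(1);
                _exit(0);
        }

        close(p[1]);
        n = read(p[0], &c, 1);
        saved_errno = errno;
        close(p[0]);
        waitpid(pid, NULL, 0);

        if (n == 1)
                return 1;
        return (n == -1 && saved_errno == EINTR) ? 0 : -1;
}
```

Calling read_across_signal(SA_RESTART) corresponds to the first trace, where the read continues after the handler, and read_across_signal(0) corresponds to the second, where the read fails and the caller has to cope with EINTR.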

Question 10

*** 3000shell.c 2017-09-24 22:03:24.000000000 -0400
--- 3000shell-listfiles.c       2019-02-02 11:56:39.759148682 -0500
***************
*** 54,59 ****
--- 54,85 ----
  }
  
  
+ void listfiles(void)
+ {
+         DIR *pwd;
+         struct dirent *e;
+         int result;
+ 
+         pwd = opendir(".");
+ 
+         if (pwd == NULL) {
+                 fprintf(stderr, "Couldn't open current directory.\n");
+                 return;
+         }
+ 
+         e = readdir(pwd);
+         while (e != NULL) {
+                 printf("%s\n", e->d_name);
+                 e = readdir(pwd);
+         }
+ 
+         result = closedir(pwd);
+         if (result) {
+                 fprintf(stderr, "Couldn't close current directory.\n");
+         }
+ }
+ 
+ 
  /* this is kind of like getenv() */
  char *find_env(char *envvar, char *notfound, char *envp[])
  {
***************
*** 276,281 ****
--- 302,312 ----
                      continue;
              }
  
+             if (!strcmp(args[0], "listfiles")) {
+                     listfiles();
+                     continue;
+             }
+ 
              background = 0;            
              if (strcmp(args[nargs-1], "&") == 0) {
                      background = 1;


Question 11

*** 3000shell.c	2017-09-24 22:03:24.000000000 -0400
--- 3000shell-input.c	2019-02-02 15:10:49.857329201 -0500
***************
*** 134,139 ****
--- 134,179 ----
          }
  }
  
+ 
+ int redirect_input(const char* infile_str)
+ {
+         int fd, result;
+ 
+         fd = open(infile_str, O_RDONLY);
+ 
+         if (fd == -1) {
+                 return 0;
+         } else {
+                 if (dup2(fd, 0) == -1) {
+                         return 0;
+                 } else {
+                         return 1;
+                 }               
+         }
+ }
+ 
+ 
+ const char *extract_input_fn(const char *arg)
+ {
+         const char *input_prefix = "infile=";
+         const int prefix_len = strlen(input_prefix);
+ 
+         const char *input_file;
+ 
+         if (strncmp(arg, input_prefix, prefix_len) != 0) {
+                 return NULL;
+         }
+         
+         input_file = arg + prefix_len;
+ 
+         if (input_file[0] != '\0') {
+                 return input_file;
+         } else {
+                 return NULL;
+         }
+ }
+ 
+ 
  void setup_comm_fn(char *pidstr, char *comm_fn)
  {
          char *c;
***************
*** 225,231 ****
      char *args[ARR_SIZE];
      char bin_fn[BUFFER_SIZE];
      struct sigaction signal_handler_struct;
!     char *s;    
  
      int *ret_status = NULL;
      size_t nargs;
--- 265,273 ----
      char *args[ARR_SIZE];
      char bin_fn[BUFFER_SIZE];
      struct sigaction signal_handler_struct;
!     char *s;
!     int i, j;
!     const char *input_fn;
  
      int *ret_status = NULL;
      size_t nargs;
***************
*** 294,299 ****
--- 336,364 ----
                      }
              } else {
                      find_binary(args[0], path, bin_fn, BUFFER_SIZE);
+ 
+                     input_fn = NULL;
+                     for (i = 1; i < nargs; i++) {
+                             input_fn = extract_input_fn(args[i]);
+                             if (input_fn) {
+                                     args[i] = NULL;
+                                     for (j = 0; j < nargs-(i+1); j++) {
+                                             args[j+i] = args[j+i+1];
+                                     }
+                                     nargs--;
+                                     break;
+                             }
+                     }
+ 
+                     if (input_fn) {
+                             if (!redirect_input(input_fn)) {
+                                     fprintf(stderr,
+                                             "Could not open %s for "
+                                             "redirection, aborting.\n",
+                                             input_fn);
+                                     exit(128);
+                             } 
+                     }
                      
                      if (execve(bin_fn, args, envp)) {
                              puts(strerror(errno));