Carleton University COMP 3000
Fall 2019 Assignment 2 Solutions

1. [1] Assume you have a file "animal.c".  You type ln animal.c frog.c
in order to create file "frog.c".  If you rename "animal.c" to
"animal-base.c", what happens when you try to edit "frog.c"?  Why?

  A: Nothing happens to frog.c, and you can still edit it, because
  frog.c is a hard link to the same inode as animal.c.  Renaming or
  removing animal.c only affects that directory entry; it doesn't
  affect the underlying inode (except that removing it would
  decrement the inode's reference count).

2. [1] Assume you have a file "animal.c".  You type ln -s animal.c
penguin.c in order to create file "penguin.c".  If you rename
"animal.c" to "animal-base.c", what happens when you try to edit
"penguin.c"?  Why?

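  A: You'll get a "No such file or directory" error when penguin.c is
  opened, because penguin.c is now a broken symbolic link.  It stores
  the name animal.c, but that filename no longer exists.  (Some
  editors will instead show an empty buffer, and saving it then
  creates a new animal.c through the link.)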


3. [2] What symbols are referenced but not defined/allocated in the object file for 3000shell.c (when compiled with -c)? Where (ultimately) is the code or data referred to by these symbols?

  A: One way to get the undefined symbols is by running the following:

     nm 3000shell.o | grep " U "

     If you do this, you'll find the following output:
     
                 U close
                 U closedir
                 U creat
                 U __ctype_b_loc
                 U dup2
                 U __errno_location
                 U execve
                 U exit
                 U fgets
                 U fork
                 U __fprintf_chk
                 U fwrite
                 U _GLOBAL_OFFSET_TABLE_
                 U open
                 U opendir
                 U __printf_chk
                 U putchar
                 U puts
                 U read
                 U readdir
                 U sigaction
                 U __stack_chk_fail
                 U stderr
                 U stdin
                 U strcpy
                 U strerror
                 U strncmp
                 U strncpy
                 U strsep
                 U wait
                 U __xstat

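  Every symbol that doesn't start with an underscore is part of the C
  library: most are functions, while stderr and stdin are variables.
  Ultimately the code and data are in the dynamically linked C library
  (libc.so.6, exact path revealed by ldd of the compiled executable).
  Most of the underscore-prefixed symbols (e.g., __printf_chk,
  __errno_location, __stack_chk_fail) are also provided by the C
  library; the compiler and the library headers generate references
  to them in place of, or in addition to, what the source code calls.
  _GLOBAL_OFFSET_TABLE_ is the exception: it is supplied by the linker
  itself.  All of these symbols have to be resolved by the linker in
  order to make an executable (stub entries have to be added).  You
  can see what is being passed to the linker (collect2) if you add the
  -v option to gcc; it turns out it is a bit complicated!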

4. [2] Modify 3000test.c so it reports the three times associated
with an inode: atime, mtime, and ctime (in a human-readable time
format).  What do these times mean and when are they updated on files
on your system?

  A: Insert this header around line 16:

        #include <time.h>

     Add the following lines around line 53:

        printf("  atime: %s", ctime(&(statbuf.st_atim.tv_sec)));
        printf("  mtime: %s", ctime(&(statbuf.st_mtim.tv_sec)));
        printf("  ctime: %s", ctime(&(statbuf.st_ctim.tv_sec)));

     As per the man page for stat(2), they mean:

        st_atime: This is the file's last access timestamp.
        st_mtime: This is the file's last modification timestamp.
        st_ctime: This is the file's last status change timestamp.

     If you experiment with modifying files, you'll see that mtime
     changes when you modify the contents of a file and ctime changes
     when you change metadata (e.g., change the permissions on a
     file).  Modifying the contents also updates ctime.  While atime
     is supposed to be updated every time a file is accessed, on most
     modern Linux systems it won't be updated on every access.  This
     is because Linux filesystems tend to be mounted with the relatime
     or noatime options.  With relatime, atime is only updated when it
     is older than mtime or ctime (or more than a day old), so it
     often appears to track mtime.  (We will discuss this later when
     we talk about filesystems.)
     
5. [2] How is the first argument passed to a function in x86-64 assembly? Give an example of this happening in assembly and the corresponding C code.

  A: The first argument is passed in the %rdi register.  For example,
  the following assembly code places the contents of the %rbx register
  into %rdi and then calls the function something:

        movq    %rbx, %rdi
        call    something

  This corresponds to the following C statement:

        something(buf);

  Note that the fact that %rbx already held the value of buf depends
  on how buf was handled earlier in the function.


6. [2] What x86-64 register is changed to allocate local variables?  Explain briefly with an example.

  A: The stack pointer register, %rsp, is decremented to make room
  for local variables that aren't kept in a register.  (Thus, an int
  may be in a register, but an array of bytes will be on the stack.)
  For example, for a function declaring these local variables:

        int n, i;
        char buf[256];

  we see the following:
  
     	subq	$272, %rsp

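  Note that 272 = 256 + 16: 256 bytes for buf plus 16 more bytes.  An
  int is only 4 bytes on x86-64, so the extra 16 bytes are not simply
  the two ints; they likely cover things such as the ints (if they
  aren't kept in registers), the stack protector's canary, and padding
  that keeps the stack 16-byte aligned.  The exact layout is up to the
  compiler.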

  NOT PART OF THE ANSWER BUT RELATED:

  To access a local variable, offsets relative to the %rsp register (if
  it hasn't been changed) or the base pointer register %rbp are used.
  For example, the C statement

        printf("Translation: %s\n", buf);

   turns into the following assembly:

	movq	%rsp, %rdx
	leaq	.LC1(%rip), %rsi
	movl	$1, %edi
	call	__printf_chk@PLT

   Note what is in each register (in argument order):

        %edi is 1, a flag argument that __printf_chk (the checked
        version of printf that is substituted when fortification is
        enabled) takes ahead of the format string; it is not a file
        descriptor

        %rsi is a pointer to the constant string "Translation: %s\n"
        which is specified as an offset to the current instruction
        pointer (allowing the code to be loaded at an arbitrary
        address)

        %rdx is the stack pointer, which happens to still be pointing to buf.

7. [2] Make a program redact-mmap.c that takes two arguments: a string s and a filename f.  redact should mmap f into its address space and then replace every occurrence of s with X's (with the number of X's corresponding to the length of s).

  A: Code below, based on 3000copy-mmap.c.

/* 3000redact-mmap.c */
/* v0.1 October 5, 2019 */
/* Licenced under the GPLv3, copyright Anil Somayaji */
/* You really shouldn't be incorporating parts of this in any other code,
   it is meant for teaching, not production */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

#define BUFSIZE 4096

void report_error(char *context, char *error)
{
        fprintf(stderr, "Error in %s: %s\n", context, error);

        exit(-1);
}

int main(int argc, char *argv[])
{
        char *source_fn, *dest_fn, *redact_text, *replacement_text;
        struct stat statbuf;
        int source_fd, dest_fd;
        ssize_t len, i, redact_len;
        
        char *source_map, *dest_map;

        if (argc < 4) {
                fprintf(stderr, "Usage: %s <redaction string> "
                        "<source file> <dest file>\n",
                        argv[0]);
                report_error("command line", "Not enough arguments");
        }

        redact_text = argv[1];
        source_fn = argv[2];
        dest_fn = argv[3];

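        redact_len = strlen(redact_text);
        if (redact_len == 0) {
                /* a zero-length match would not advance the copy loop */
                report_error("command line", "Empty redaction string");
        }

        replacement_text = (char *) malloc(redact_len * sizeof(char));
        if (replacement_text == NULL) {
                report_error("malloc", strerror(errno));
        }
        for (i=0; i < redact_len; i++) {
                replacement_text[i] = 'X';
        }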

        printf("Redacting %s from %s to create %s\n",
               redact_text, source_fn, dest_fn);

        source_fd = open(source_fn, O_RDONLY);
        if (source_fd == -1) {                
                report_error("opening source", strerror(errno));
        }
        
        if (fstat(source_fd, &statbuf)) {
                report_error("stat", strerror(errno));
        }
        
        len = statbuf.st_size;

        source_map = (char *) mmap(NULL, len,
                                   PROT_READ, MAP_SHARED, source_fd, 0);
        if (source_map == MAP_FAILED) {
                report_error("source map", strerror(errno));
        }

        if (stat(dest_fn, &statbuf) == -1) {
                printf("%s does not exist, creating\n", dest_fn);
        } else {
                printf("%s exists, overwriting\n", dest_fn);
        }
        
        dest_fd = open(dest_fn, O_RDWR|O_CREAT|O_TRUNC, 0666);
        if (dest_fd == -1) {                
                report_error("opening dest", strerror(errno));
        }

        if (lseek(dest_fd, len-1, SEEK_SET) == -1) {
                report_error("lseek", strerror(errno));
        }
        
        if (write(dest_fd, "", 1) == -1) {
                report_error("write after lseek", strerror(errno));
        }
        
        dest_map = (char *) mmap(NULL, len,
                                 PROT_READ|PROT_WRITE,
                                 MAP_SHARED, dest_fd, 0);
        if (dest_map == MAP_FAILED) {
                report_error("dest map", strerror(errno));
        }
        
        printf("Copying bytes with redactions...\n");

        i = 0;
        while (i < len) {
                if ((source_map[i] == redact_text[0]) &&
                    (redact_len <= (len - i)) &&
                    (strncmp(source_map + i,
                             redact_text, redact_len) == 0)) {
                        /* printf("Redacting at offset %ld\n", i); */
                        memcpy(dest_map + i, replacement_text,
                               redact_len);
                        i = i + redact_len;
                } else {
                        dest_map[i] = source_map[i];
                        i++;
                }
        }
        
        if (munmap(source_map, len) == -1) {
                report_error("source munmap", strerror(errno));                
        }
        
        if (munmap(dest_map, len) == -1) {
                report_error("dest munmap", strerror(errno));                
        }
        
        if (close(source_fd) == -1) {
                report_error("closing source", strerror(errno));
        }
                
        if (close(dest_fd) == -1) {
                report_error("closing dest", strerror(errno));
        }

        printf("Done!\n");
        
        return 0;
}

8. [2] Make a program redact-rw.c that does the same thing as redact-mmap.c except that it uses reads and writes (of 4096 bytes at a time) rather than mmap. (You can ignore matches that cross block boundaries.)

  A: Code below, based on 3000copy-rw.c.

/* 3000redact-rw.c */
/* v0.1 October 5, 2019 */
/* Licenced under the GPLv3, copyright Anil Somayaji */
/* You really shouldn't be incorporating parts of this in any other code,
   it is meant for teaching, not production */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

const int BUFSIZE=4096;

void report_error(char *context, char *error)
{
        fprintf(stderr, "Error in %s: %s\n", context, error);

        exit(-1);
}

int main(int argc, char *argv[])
{
        char *source_fn, *dest_fn, *redact_text, *replacement_text;
        struct stat statbuf;
        int source_fd, dest_fd;
        char buf[BUFSIZE];
        ssize_t len;
        int i, done, redact_len, written_count;
        
        if (argc < 4) {
                fprintf(stderr, "Usage: %s <redaction string> "
                        "<source file> <dest file>\n",
                        argv[0]);
                report_error("command line", "Not enough arguments");
        }
        
        redact_text = argv[1];
        source_fn = argv[2];
        dest_fn = argv[3];
        
        redact_len = strlen(redact_text);

        if (redact_len > BUFSIZE) {
                report_error("redaction length", "longer than block size");
        }
        
        replacement_text = (char *) malloc(redact_len * sizeof(char));
        for (i=0; i < redact_len; i++) {
                replacement_text[i] = 'X';
        }      

        printf("Redacting %s from %s to create %s\n",
               redact_text, source_fn, dest_fn);

        source_fd = open(source_fn, O_RDONLY);
        if (source_fd == -1) {                
                report_error("opening source", strerror(errno));
        }
        
        if (stat(dest_fn, &statbuf) == -1) {
                printf("%s does not exist, creating\n", dest_fn);
        } else {
                printf("%s exists, overwriting\n", dest_fn);
        }
        
        dest_fd = open(dest_fn, O_RDWR|O_CREAT|O_TRUNC, 0666);
        if (dest_fd == -1) {                
                report_error("opening dest", strerror(errno));
        }
        
        printf("Copying bytes with redactions...\n");

        done = 0;
        written_count = 0;
        while (!done) {
                len = read(source_fd, buf, BUFSIZE);

                if (len == -1) {
                        report_error("reading source", strerror(errno));
                }
                
                if (len > 0) {
                        for (i=0; i<len; i++) {
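                                if ((redact_len > 0) &&
                                    (buf[i] == redact_text[0]) &&
                                    (redact_len <= (len - i)) &&
                                    (strncmp(buf + i,
                                             redact_text, redact_len) == 0)) {
                                        /* printf("Redacting at offset %d\n",
                                           i + written_count); */
                                        memcpy(buf + i, replacement_text,
                                               redact_len);
                                        /* the for loop's i++ supplies the
                                           final step past the match, so
                                           adjacent matches aren't skipped */
                                        i = i + redact_len - 1;
                                }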
                        }

                        if (write(dest_fd, buf, len) == -1) {
                                report_error("writing dest", strerror(errno));
                        }
                        
                        written_count = written_count + len;
                } else {
                        done = 1;
                }
        }

        if (close(source_fd) == -1) {
                report_error("closing source", strerror(errno));
        }
                
        if (close(dest_fd) == -1) {
                report_error("closing dest", strerror(errno));
        }

        printf("Done!\n");

        return 0;
}

9. [3] Compare the performance of redact-mmap versus redact-rw.  Which is faster on large files?  Why do you think this is the case?  Try using the time command to benchmark your programs, making sure to do at least ten trials.

 A: I generated a source file with the following command:

      rm source; echo "This is a line of text" > source;
      for i in `seq 1 18`; do cat source >> t; cat t >> source; done;
      rm t

    I then ran benchmarks as follows on a class vm, with nothing else running:

      for x in `seq 1 10`; do
        rm dest; time ./3000redact-rw line source dest;
      done 2> rw-times.txt

      for x in `seq 1 10`; do
        rm dest; time ./3000redact-mmap line source dest;
      done 2> mmap-times.txt

    The times for rw were as follows:

       real	0m4.524s
       user	0m1.046s
       sys	0m1.442s

       real	0m4.269s
       user	0m1.046s
       sys	0m1.267s

       real	0m4.034s
       user	0m1.220s
       sys	0m0.807s

       real	0m3.949s
       user	0m1.140s
       sys	0m1.235s

       real	0m4.439s
       user	0m1.146s
       sys	0m1.373s

       real	0m4.167s
       user	0m1.204s
       sys	0m0.910s

       real	0m4.219s
       user	0m1.215s
       sys	0m0.846s

       real	0m3.717s
       user	0m1.173s
       sys	0m1.305s

       real	0m3.935s
       user	0m1.095s
       sys	0m1.357s

       real	0m4.069s
       user	0m1.191s
       sys	0m0.916s

    The times for mmap were as follows:   
     
       real	0m5.709s
       user	0m1.809s
       sys	0m0.834s

       real	0m3.828s
       user	0m1.561s
       sys	0m0.698s

       real	0m3.897s
       user	0m1.641s
       sys	0m0.741s

       real	0m3.991s
       user	0m1.519s
       sys	0m0.790s

       real	0m3.647s
       user	0m1.373s
       sys	0m0.568s

       real	0m4.434s
       user	0m1.519s
       sys	0m0.571s

       real	0m3.784s
       user	0m1.394s
       sys	0m0.796s

       real	0m4.112s
       user	0m1.517s
       sys	0m0.601s

       real	0m3.514s
       user	0m1.450s
       sys	0m0.552s

       real	0m4.001s
       user	0m1.499s
       sys	0m0.556s

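    Note these programs run at roughly the same speed: real time
    averages about 4.13s for rw and about 4.09s for mmap.  The mmap
    version uses more user time (about 1.53s versus 1.15s) and less
    system time (about 0.67s versus 1.15s), which is consistent with
    its copy loop running in user space rather than inside read and
    write system calls.  In both cases real time is well above user
    plus sys, which suggests that the runtime is dominated by waiting
    for the file to be copied; the redaction work is not significant,
    and the different means of copying (mmap versus read/write) also
    aren't so significant.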

10. [3] What happens when process A mmaps file f and process B appends to file f?  Will process A see the additional data added to the file by B (if A does no other system calls)?  Design an experiment and document your results.

  A: Below is the code for 3000mmap-wait.c.  If you run the following,
     the program will crash if source is much less than 10,000 bytes
     in length ($ = commands entered at the prompt):

       $ echo "This is a test" > source
       $ ./3000mmap-wait source 10000
       Mapping 10000 bytes of source
       Press enter to continue.
       <hit enter>
       Bus error (core dumped)

     However, if you append data before hitting enter, say by running
     the following in another window

       for x in `seq 1 1000`; do echo "This is a test" >> source; done

     it will now work:

       $ echo "This is a test" > source
       $ ./3000mmap-wait source 10000
       Mapping 10000 bytes of source
       Press enter to continue.
       <press enter after above command runs>
       Last 10 bytes: This is a 
       Done!
   
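     What this shows is that the underlying file can change (if the
     mapping is shared) and the mapping will reflect the change in
     real time, including allowing access to new data at the end of
     the file, assuming the length of the original mmap extended past
     the original end of file.  The bus error in the first run happens
     because the access touches a page that lies entirely beyond the
     end of the file.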

/* 3000mmap-wait.c */
/* v0.1 October 5, 2019 */
/* Licenced under the GPLv3, copyright Anil Somayaji */
/* You really shouldn't be incorporating parts of this in any other code,
   it is meant for teaching, not production */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>

const int SAMPLE_LEN = 10;
        
void report_error(char *context, char *error)
{
        fprintf(stderr, "Error in %s: %s\n", context, error);

        exit(-1);
}

int main(int argc, char *argv[])
{
        char *map_fn, s[SAMPLE_LEN + 1];
        int fd;
        int len;
        
        char *map;

        if (argc < 3) {
                fprintf(stderr, "Usage: %s <file to map> <length>\n",
                        argv[0]);
                report_error("command line", "Not enough arguments");
        }

        map_fn = argv[1];
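        len = atoi(argv[2]);
        if (len < SAMPLE_LEN) {
                /* the sample is read from the last SAMPLE_LEN bytes */
                report_error("command line", "Length is too small");
        }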

        printf("Mapping %d bytes of %s\n", len, map_fn);

        fd = open(map_fn, O_RDONLY);
        if (fd == -1) {                
                report_error("opening file", strerror(errno));
        }

        map = (char *) mmap(NULL, len,
                                   PROT_READ, MAP_SHARED, fd, 0);

        if (map == MAP_FAILED) {
                report_error("mmap", strerror(errno));
        }
        
        printf("Press enter to continue.\n");
        getchar();
        
        strncpy(s, map + len - SAMPLE_LEN, SAMPLE_LEN);
        s[SAMPLE_LEN] = '\0';

        printf("Last %d bytes: %s\n", SAMPLE_LEN, s);
        
        if (munmap(map, len) == -1) {
                report_error("munmap", strerror(errno));                
        }
        
        if (close(fd) == -1) {
                report_error("closing file", strerror(errno));
        }

        printf("Done!\n");
        
        return 0;
}