Carleton University COMP 3000 Fall 2019 Assignment 2 Solutions

1. [1] Assume you have a file "animal.c". You type ln animal.c frog.c in order to create file "frog.c". If you rename "animal.c" to "animal-base.c", what happens when you try to edit "frog.c"? Why?

A: Nothing happens to frog.c, because frog.c is a hard link to the same inode as animal.c. Renaming or removing animal.c only affects that one hard link; it doesn't affect the underlying inode (except, on removal, to decrement its reference count).

2. [1] Assume you have a file "animal.c". You type ln -s animal.c penguin.c in order to create file "penguin.c". If you rename "animal.c" to "animal-base.c", what happens when you try to edit "penguin.c"? Why?

A: You'll get a "file not found" error because penguin.c is now a broken symbolic link. It refers to the name animal.c, but that filename no longer exists.

3. [2] What symbols are referenced but not defined/allocated in the object file for 3000shell.c (when compiled with -c)? Where (ultimately) is the code or data referred to by these symbols?

A: One way to get the undefined symbols is by running the following:

    nm 3000shell.o | grep " U "

If you do this, you'll find the following output:

    U close
    U closedir
    U creat
    U __ctype_b_loc
    U dup2
    U __errno_location
    U execve
    U exit
    U fgets
    U fork
    U __fprintf_chk
    U fwrite
    U _GLOBAL_OFFSET_TABLE_
    U open
    U opendir
    U __printf_chk
    U putchar
    U puts
    U read
    U readdir
    U sigaction
    U __stack_chk_fail
    U stderr
    U stdin
    U strcpy
    U strerror
    U strncmp
    U strncpy
    U strsep
    U wait
    U __xstat

Every symbol that doesn't start with an underscore is a function (or, for stdin and stderr, a variable) that is part of the C library, so ultimately the code is in the dynamically linked C library (libc.so.6, exact path revealed by running ldd on the compiled executable). These symbols have to be further resolved by the linker in order to make an executable (stub entries have to be added). You can see what is being passed to the linker (collect2) if you add the -v option to gcc; it turns out it is a bit complicated!
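Returning to questions 1 and 2, the difference between the two kinds of link can also be checked programmatically. The following is a minimal sketch, not part of the original solutions: the function names link_demo and can_open and the temporary directory name are made up for illustration. It recreates the animal.c / frog.c / penguin.c scenario with link(2), symlink(2), and rename(2), then reports which names can still be opened.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if path can be opened for reading, else 0. */
static int can_open(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return 0;
    close(fd);
    return 1;
}

/*
 * Hypothetical demo: in a fresh temporary directory, create animal.c, a hard
 * link frog.c, and a symbolic link penguin.c, then rename animal.c to
 * animal-base.c.  On success returns 0 and stores in *hard_ok and *sym_ok
 * whether frog.c and penguin.c can still be opened.  Returns -1 on failure.
 */
int link_demo(int *hard_ok, int *sym_ok)
{
    char dir[] = "/tmp/linkdemoXXXXXX";
    char animal[64], base[64], frog[64], penguin[64];
    int fd;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(animal, sizeof animal, "%s/animal.c", dir);
    snprintf(base, sizeof base, "%s/animal-base.c", dir);
    snprintf(frog, sizeof frog, "%s/frog.c", dir);
    snprintf(penguin, sizeof penguin, "%s/penguin.c", dir);

    fd = open(animal, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    close(fd);

    if (link(animal, frog) == -1)            /* ln animal.c frog.c */
        return -1;
    if (symlink("animal.c", penguin) == -1)  /* ln -s animal.c penguin.c */
        return -1;
    if (rename(animal, base) == -1)          /* mv animal.c animal-base.c */
        return -1;

    *hard_ok = can_open(frog);     /* hard link still names the inode */
    *sym_ok = can_open(penguin);   /* symlink still names "animal.c" */

    /* Clean up the temporary files and directory. */
    unlink(frog);
    unlink(penguin);
    unlink(base);
    rmdir(dir);
    return 0;
}
```

Calling link_demo(&hard_ok, &sym_ok) from a main function should leave hard_ok set to 1 and sym_ok set to 0, matching the answers to questions 1 and 2.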
4. [2] Modify 3000test.c so it reports the three times associated with an inode (atime, mtime, and ctime) in a human-readable time format. What do these times mean and when are they updated on files on your system?

A: Insert this header around line 16:

    #include <time.h>

Add the following lines around line 53:

    printf(" atime: %s", ctime(&(statbuf.st_atim.tv_sec)));
    printf(" mtime: %s", ctime(&(statbuf.st_mtim.tv_sec)));
    printf(" ctime: %s", ctime(&(statbuf.st_ctim.tv_sec)));

As per the man page for stat(2), they mean:

    st_atime: This is the file's last access timestamp.
    st_mtime: This is the file's last modification timestamp.
    st_ctime: This is the file's last status change timestamp.

If you experiment with modifying files, you'll see that mtime changes when you modify the contents of a file and ctime changes when you change metadata (e.g., change the permissions on a file). While atime is supposed to be updated every time a file is accessed, on most modern Linux systems it is not updated on every access, so it often appears to be synonymous with mtime. This is because Linux filesystems tend to be mounted with the relatime or noatime options; with relatime, atime is only updated when it is older than mtime or ctime, or more than a day old. (We will discuss this later when we talk about filesystems.)

5. [2] How is the first argument passed to a function in x86-64 assembly? Give an example of this happening in assembly and the corresponding C code.

A: The first argument is passed in the %rdi register. For example, the following assembly code places the contents of the %rbx register into %rdi and then calls the function something:

    movq %rbx, %rdi
    call something

This corresponds to the following C statement:

    something(buf);

Note that the fact that %rbx holds the value of buf depends on how buf was handled earlier in the function.

6. [2] What x86-64 register is changed to allocate local variables? Explain briefly with an example.

A: The stack pointer register, %rsp, is decremented to make room for local variables that aren't stored in a register.
(Thus, an int may be in a register, but an array of bytes will be on the stack.) For example, for a function declaring these local variables:

    int n, i;
    char buf[256];

we see the following:

    subq $272, %rsp

Note that 272 = 256 + 16: the 256 bytes of buf plus 16 additional bytes. (An int is only 4 bytes on x86-64 and may be kept in a register; the extra space is presumably alignment padding, since the stack is kept 16-byte aligned, and/or room for other compiler-managed data such as a stack canary.)

NOT PART OF THE ANSWER BUT RELATED: To access a local variable, offsets relative to the %rsp register (if it hasn't been changed) or the base pointer register %rbp are used. For example, the C statement

    printf("Translation: %s\n", buf);

turns into the following assembly:

    movq %rsp, %rdx
    leaq .LC1(%rip), %rsi
    movl $1, %edi
    call __printf_chk@PLT

Note what is in each register (in argument order):

    %edi is 1. __printf_chk, the checked version of printf that gcc substitutes here, takes an extra first argument: a flag that selects the level of checking. Output still goes to stdout.
    %rsi is a pointer to the constant string "Translation: %s\n", which is specified as an offset to the current instruction pointer (allowing the code to be loaded at an arbitrary address).
    %rdx is a copy of the stack pointer, which happens to still be pointing to buf.

7. [2] Make a program redact-mmap.c that takes two arguments: a string s and a filename f. redact should mmap f into its address space and then replace every occurrence of s with X's (with the number of X's corresponding to the length of s).

A: Code below, based on 3000copy-mmap.c.
/* 3000redact-mmap.c */
/* v0.1 October 5, 2019 */
/* Licenced under the GPLv3, copyright Anil Somayaji */
/* You really shouldn't be incorporating parts of this in any other code,
   it is meant for teaching, not production */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define BUFSIZE 4096

void report_error(char *context, char *error)
{
    fprintf(stderr, "Error in %s: %s\n", context, error);
    exit(-1);
}

int main(int argc, char *argv[])
{
    char *source_fn, *dest_fn, *redact_text, *replacement_text;
    struct stat statbuf;
    int source_fd, dest_fd;
    ssize_t len, i, redact_len;
    char *source_map, *dest_map;

    if (argc < 4) {
        fprintf(stderr, "Usage: %s <redact text> <source file>"
                " <dest file>\n", argv[0]);
        report_error("command line", "Not enough arguments");
    }

    redact_text = argv[1];
    source_fn = argv[2];
    dest_fn = argv[3];

    /* Build a run of X's as long as the text to redact
       (not NUL-terminated; only ever copied with memcpy) */
    redact_len = strlen(redact_text);
    replacement_text = (char *) malloc(redact_len * sizeof(char));
    for (i=0; i < redact_len; i++) {
        replacement_text[i] = 'X';
    }

    printf("Redacting %s from %s to create %s\n",
           redact_text, source_fn, dest_fn);

    source_fd = open(source_fn, O_RDONLY);
    if (source_fd == -1) {
        report_error("opening source", strerror(errno));
    }

    if (fstat(source_fd, &statbuf)) {
        report_error("stat", strerror(errno));
    }
    len = statbuf.st_size;

    source_map = (char *) mmap(NULL, len, PROT_READ, MAP_SHARED,
                               source_fd, 0);
    if (source_map == MAP_FAILED) {
        report_error("source map", strerror(errno));
    }

    if (stat(dest_fn, &statbuf) == -1) {
        printf("%s does not exist, creating\n", dest_fn);
    } else {
        printf("%s exists, overwriting\n", dest_fn);
    }

    dest_fd = open(dest_fn, O_RDWR|O_CREAT|O_TRUNC, 0666);
    if (dest_fd == -1) {
        report_error("opening dest", strerror(errno));
    }

    /* Extend the destination to len bytes so the whole mapping is backed
       by the file */
    if (lseek(dest_fd, len-1, SEEK_SET) == -1) {
        report_error("lseek", strerror(errno));
    }
    if (write(dest_fd, "", 1) == -1) {
        report_error("write after lseek", strerror(errno));
    }

    dest_map = (char *) mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED,
                             dest_fd, 0);
    if (dest_map == MAP_FAILED) {
        report_error("dest map", strerror(errno));
    }

    printf("Copying bytes with redactions...\n");

    i = 0;
    while (i < len) {
        /* A match must fit in the remaining bytes of the file */
        if ((source_map[i] == redact_text[0]) &&
            (redact_len <= (len - i)) &&
            (strncmp(source_map + i, redact_text, redact_len) == 0)) {
            /* printf("Redacting at offset %ld\n", i); */
            memcpy(dest_map + i, replacement_text, redact_len);
            i = i + redact_len;
        } else {
            dest_map[i] = source_map[i];
            i++;
        }
    }

    if (munmap(source_map, len) == -1) {
        report_error("source munmap", strerror(errno));
    }
    if (munmap(dest_map, len) == -1) {
        report_error("dest munmap", strerror(errno));
    }
    if (close(source_fd) == -1) {
        report_error("closing source", strerror(errno));
    }
    if (close(dest_fd) == -1) {
        report_error("closing dest", strerror(errno));
    }

    printf("Done!\n");
    return 0;
}

8. [2] Make a program redact-rw.c that does the same thing as redact-mmap.c except that it uses reads and writes (of 4096 bytes at a time) rather than mmap. (You can ignore matches that cross block boundaries.)

A: Code below, based on 3000copy-rw.c.

/* 3000redact-rw.c */
/* v0.1 October 5, 2019 */
/* Licenced under the GPLv3, copyright Anil Somayaji */
/* You really shouldn't be incorporating parts of this in any other code,
   it is meant for teaching, not production */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

const int BUFSIZE=4096;

void report_error(char *context, char *error)
{
    fprintf(stderr, "Error in %s: %s\n", context, error);
    exit(-1);
}

int main(int argc, char *argv[])
{
    char *source_fn, *dest_fn, *redact_text, *replacement_text;
    struct stat statbuf;
    int source_fd, dest_fd;
    char buf[BUFSIZE];
    ssize_t len;
    int i, done, redact_len, written_count;

    if (argc < 4) {
        fprintf(stderr, "Usage: %s <redact text> <source file>"
                " <dest file>\n", argv[0]);
        report_error("command line", "Not enough arguments");
    }

    redact_text = argv[1];
    source_fn = argv[2];
    dest_fn = argv[3];

    redact_len = strlen(redact_text);
    if (redact_len > BUFSIZE) {
        report_error("redaction length", "longer than block size");
    }

    replacement_text = (char *)
        malloc(redact_len * sizeof(char));
    for (i=0; i < redact_len; i++) {
        replacement_text[i] = 'X';
    }

    printf("Redacting %s from %s to create %s\n",
           redact_text, source_fn, dest_fn);

    source_fd = open(source_fn, O_RDONLY);
    if (source_fd == -1) {
        report_error("opening source", strerror(errno));
    }

    if (stat(dest_fn, &statbuf) == -1) {
        printf("%s does not exist, creating\n", dest_fn);
    } else {
        printf("%s exists, overwriting\n", dest_fn);
    }

    dest_fd = open(dest_fn, O_RDWR|O_CREAT|O_TRUNC, 0666);
    if (dest_fd == -1) {
        report_error("opening dest", strerror(errno));
    }

    printf("Copying bytes with redactions...\n");

    done = 0;
    written_count = 0;
    while (!done) {
        len = read(source_fd, buf, BUFSIZE);
        if (len == -1) {
            report_error("reading source", strerror(errno));
        }
        if (len > 0) {
            /* ASSUMPTION: the original listing breaks off after
               "for (i=0; i". Everything from here to the end of main is a
               reconstruction modeled on 3000redact-mmap.c, not the
               author's exact code. */
            for (i=0; i < len; i++) {
                /* Redact in place; matches that cross block boundaries
                   are ignored */
                if ((buf[i] == redact_text[0]) &&
                    (redact_len <= (len - i)) &&
                    (strncmp(buf + i, redact_text, redact_len) == 0)) {
                    memcpy(buf + i, replacement_text, redact_len);
                }
            }
            written_count = write(dest_fd, buf, len);
            if (written_count == -1) {
                report_error("writing dest", strerror(errno));
            }
        } else {
            done = 1;
        }
    }

    if (close(source_fd) == -1) {
        report_error("closing source", strerror(errno));
    }
    if (close(dest_fd) == -1) {
        report_error("closing dest", strerror(errno));
    }

    printf("Done!\n");
    return 0;
}

9. Compare redact-mmap versus redact-rw. Which is faster on large files? Why do you think this is the case? Try using the time command to benchmark your programs, making sure to do at least ten trials.

A: I generated a source file with the following command:

    rm source; echo "This is a line of text" > source; for i in `seq 1 18`; do cat source >> t; cat t >> source; done; rm t

I then ran benchmarks as follows on a class VM, with nothing else running:

    for x in `seq 1 10`; do rm dest; time ./3000redact-rw line source dest; done 2> rw-times.txt
    for x in `seq 1 10`; do rm dest; time ./3000redact-mmap line source dest; done 2> mmap-times.txt

The times for rw were as follows:

    real        user        sys
    0m4.524s    0m1.046s    0m1.442s
    0m4.269s    0m1.046s    0m1.267s
    0m4.034s    0m1.220s    0m0.807s
    0m3.949s    0m1.140s    0m1.235s
    0m4.439s    0m1.146s    0m1.373s
    0m4.167s    0m1.204s    0m0.910s
    0m4.219s    0m1.215s    0m0.846s
    0m3.717s    0m1.173s    0m1.305s
    0m3.935s    0m1.095s    0m1.357s
    0m4.069s    0m1.191s    0m0.916s

The times for mmap were as follows:

    real        user        sys
    0m5.709s    0m1.809s    0m0.834s
    0m3.828s    0m1.561s    0m0.698s
    0m3.897s    0m1.641s    0m0.741s
    0m3.991s    0m1.519s    0m0.790s
    0m3.647s    0m1.373s    0m0.568s
    0m4.434s    0m1.519s    0m0.571s
    0m3.784s    0m1.394s    0m0.796s
    0m4.112s    0m1.517s    0m0.601s
    0m3.514s    0m1.450s    0m0.552s
    0m4.001s    0m1.499s    0m0.556s

Note these programs run at roughly the same speed: the mean real time is about 4.13s for rw and about 4.09s for mmap. The mmap version tends to spend more time in user mode and less in the kernel, but the totals are similar. This suggests that their runtime is dominated by the cost of copying the file; the redaction work is not significant, and the different means of copying (mmap versus read/write) also aren't so significant.

10. [3] What happens when process A mmaps file f and process B appends to file f? Will process A see the additional data added to the file by B (if A does no other system calls)? Design an experiment and document your results.

A: Below is the code for 3000mmap-wait.c. If you run the following, the program will crash if source is much less than 10,000 bytes in length ($ = commands entered at the prompt):

    $ echo "This is a test" > source
    $ ./3000mmap-wait source 10000
    Mapping 10000 bytes of source
    Press a key to continue.
    Bus error (core dumped)

The crash happens because the program reads from a mapped page that lies entirely beyond the end of the file, which raises SIGBUS. However, if you append data before hitting enter, say by running the following in another window

    for x in `seq 1 1000`; do echo "This is a test" >> source; done

it will now work:

    $ echo "This is a test" > source
    $ ./3000mmap-wait source 10000
    Mapping 10000 bytes of source
    Press a key to continue.
    Last 10 bytes: This is a 
    Done!

What this shows is that the underlying file can change (if the mapping is shared) and the mmap will change in real time, including allowing access to new data at the end of the file, assuming the length of the original mmap extended past the original end of file.
/* 3000mmap-wait.c */
/* v0.1 October 5, 2019 */
/* Licenced under the GPLv3, copyright Anil Somayaji */
/* You really shouldn't be incorporating parts of this in any other code,
   it is meant for teaching, not production */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

const int SAMPLE_LEN = 10;

void report_error(char *context, char *error)
{
    fprintf(stderr, "Error in %s: %s\n", context, error);
    exit(-1);
}

int main(int argc, char *argv[])
{
    char *map_fn, s[SAMPLE_LEN + 1];
    int fd;
    int len;
    char *map;

    if (argc < 3) {
        fprintf(stderr, "Usage: %s <file> <length>\n", argv[0]);
        report_error("command line", "Not enough arguments");
    }

    map_fn = argv[1];
    len = atoi(argv[2]);

    printf("Mapping %d bytes of %s\n", len, map_fn);

    fd = open(map_fn, O_RDONLY);
    if (fd == -1) {
        report_error("opening file", strerror(errno));
    }

    /* The mapping may be longer than the file; mmap itself still succeeds */
    map = (char *) mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        report_error("mmap", strerror(errno));
    }

    /* Pause so another process can append to the file */
    printf("Press enter to continue.\n");
    getchar();

    /* Read the last SAMPLE_LEN bytes of the mapping; this raises SIGBUS
       if the file still ends before the page containing them */
    strncpy(s, map + len - SAMPLE_LEN, SAMPLE_LEN);
    s[SAMPLE_LEN] = '\0';
    printf("Last %d bytes: %s\n", SAMPLE_LEN, s);

    if (munmap(map, len) == -1) {
        report_error("munmap", strerror(errno));
    }
    if (close(fd) == -1) {
        report_error("closing file", strerror(errno));
    }

    printf("Done!\n");
    return 0;
}
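The experiment in question 10 needs two windows and a keypress. As a rough, self-contained variant (an assumption-laden sketch, not part of the original solutions), the function below plays both roles in one process on Linux: one file descriptor maps the file read-only, as process A does, and a second descriptor appends to it with write(), as process B does. The name mmap_append_demo, the MAP_LEN and SAMPLE_LEN macros, and the temporary file name are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAP_LEN 10000   /* mapping length, far past the initial end of file */
#define SAMPLE_LEN 10   /* number of bytes sampled from the end of the map */

/*
 * Hypothetical demo: map a 15-byte file with a 10000-byte shared mapping,
 * append 1000 more lines through a different descriptor, then copy the
 * last SAMPLE_LEN mapped bytes into sample.  Returns 0 on success, -1 on
 * failure.
 */
int mmap_append_demo(char sample[SAMPLE_LEN + 1])
{
    char path[] = "/tmp/mmapdemoXXXXXX";
    const char line[] = "This is a test\n";
    const size_t line_len = sizeof line - 1;
    int wfd, rfd, i, ok;
    char *map;

    wfd = mkstemp(path);              /* "process B" descriptor */
    if (wfd == -1)
        return -1;

    /* echo "This is a test" > source */
    ok = (write(wfd, line, line_len) == (ssize_t) line_len);

    /* "Process A": map MAP_LEN bytes although the file has only 15. */
    rfd = ok ? open(path, O_RDONLY) : -1;
    map = (rfd != -1)
        ? mmap(NULL, MAP_LEN, PROT_READ, MAP_SHARED, rfd, 0)
        : MAP_FAILED;
    ok = (map != MAP_FAILED);

    /* Reading map[MAP_LEN - 1] at this point would raise SIGBUS. */

    /* "Process B": append 1000 more lines; no further calls touch the map. */
    for (i = 0; ok && i < 1000; i++)
        ok = (write(wfd, line, line_len) == (ssize_t) line_len);

    if (ok) {
        /* The appended data is now visible through the existing mapping. */
        memcpy(sample, map + MAP_LEN - SAMPLE_LEN, SAMPLE_LEN);
        sample[SAMPLE_LEN] = '\0';
    }

    if (map != MAP_FAILED)
        munmap(map, MAP_LEN);
    if (rfd != -1)
        close(rfd);
    close(wfd);
    unlink(path);
    return ok ? 0 : -1;
}
```

Since each line is 15 bytes and 9990 is a multiple of 15, the sampled bytes start at the beginning of a line, so sample should come back as "This is a ", the same string shown in the transcript for question 10.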