Operating Systems 2021F Lecture 5



Video from the lecture given on September 23, 2021 is now available:

Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)


Lecture 5
- update on Assignment 1 (almost there!)
- more on Tutorial 1 & 2 material

Assignment 1
 - almost done! it is incomplete currently
    - adding more questions
    - current ones you can go ahead and answer, at most they'll be renumbered
 - Due October 1st, by 11:59 PM
 - Template will be posted when complete, should be by tomorrow (end of today, but my end of today can be late)

October 1 is also the due date for Tutorials 1 & 2
 - because what's the point of having them due afterwards, since they are
   to help you with A1?
 - brightspace due date is a week after being released, but you can submit with no penalty until the assignment is due
    - you should try to hit the brightspace deadline to not get behind

We'll hold your hand for the tutorials, the assignments are where you show you learned the stuff.

You need to know enough assembly to answer the questions
 - same with everything else :-)
 - lots of material in OS, I have to guide you in what to learn,
   that's what the tutorials and assignments are for

I'm opening Tutorial 2 submissions after class
 - sorry, forgot to do that

Do the tutorials first, then do the assignments
 - but don't spend too much time trying to get the tutorials "perfect"

Most of these questions can be asked in the forums
 - try to limit it to pressing matters

Tutorial 2
 - if you compile with "-z lazy", regular ltrace will work with the binary
   (it will show calls to standard library functions)
 - without this option, ltrace won't show much
 - if you add '-x "*"', you'll get way too much
    - many function calls inside of the library
    - you can limit it in various ways, the * is for pattern matching
 - ltrace used to work on default binaries, but they were "hardened" to prevent certain attacks.  The hardening also broke default ltrace

ltrace is for library calls
strace is for system calls

Remember, library calls are function calls
 - code is in your process's address space, you just didn't write it

System calls leave your process and go to the kernel
 - strace is almost always showing system calls that were made by library functions
 - you can't make system calls in regular C code, it requires a special
   machine instruction ("syscall" on x86-64), which library functions
   wrap for you

command line arguments and environment variables
 - data that is given to a process when a new program is run with
   the execve system call
 - execvp is a library call that later makes the execve system call
 - this data is in the address space of the process, just like other variables
    - but they don't come from the executable file

command line arguments & environment variables can be passed to main
 - but who passes it to main?
 - directly, the kernel via the execve system call that loaded the program
 - indirectly, the previous code that made the execve system call

(Chat history is all on brightspace, along with the lecture video
  - just go to Zoom/class lectures, cloud recordings)

I don't post the chat on my wiki because it has your names in it
 - I don't want that to be public

What sort of data structures are environment variables & command line arguments stored in?
 - arrays of strings

But there's one problem.  These arrays can be arbitrarily sized.  And we don't have a count of how many entries are in them.  (well, we do for arguments, but not for environment variables)
 - note that the execve system call doesn't have a length parameter for
   the argv array or the envp array
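 - array length works like string length in C: there is no stored count,
   the end is marked by a terminator
    - a string ends with a null byte, these arrays end with a NULL pointer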

Question: what is actually stored in the environment variable array?  How is it organized (beyond being an array of strings)?  Note that it associates names with values.

So let's start talking about system calls
 - you all know read & write, but to review
    - read & write each take a file descriptor, a buffer, and a count, and return a count
 - what's the buffer? <-- just memory in the process
 - what is the file descriptor argument?

A file descriptor identifies a file, but it is just a number
 - the number is returned by the open system call
    - so, it corresponds to the file passed in to open

The numbers start at 0 and go up

who interprets file descriptors?
 - they are for the kernel
 - they are per process, every process has an array of open files
    - the file descriptor is an index into that array in the kernel
 - when I say an array of open files, we don't know yet what
   that is an array of
     (but it is a complex struct)

Can I see what files are currently open by a process from the shell?
 - YES

The kernel has a lot of interesting information, and it exposes most of this information through special filesystems
 - /proc and /sys
 - example: info on current processes is in /proc/<PID>
    - directory with many files and directories of info
 - when you run ps, it is just looking through /proc

Question: are the files in /proc "real"?
 - are they on disk taking up space?
 - NO, they are representations of kernel data structures

Are argv and envp kernel data structures?
 - NO
 - they are part of the process, they are process-level data structures
    - kernel can see them, because the kernel can see everything
 - kernel data structures cannot be seen or directly manipulated
   by a process

We can't find the address of a file, a file descriptor is just a small number
 - they are NOT pointers
 - they are indices into an array that the process cannot directly access
    - an array of file structs that belong to the kernel,
      outside the process's address space
    - (the process cannot get a pointer to it)

(When you are in the kernel, you have to manipulate pointers to open file data structures.  But regular processes can't, they just have file descriptors that are passed to system calls, e.g., read, write, close)

So what is close?
 - it is a request to destroy/free the file struct associated with the open file
   - to prevent resource usage leaks

C library has file pointers
 - these are wrappers around file descriptors
 - nothing to do with kernel-level file data structures

How else can you release the resources associated with an open file, other than close?
 - terminate the process :-)

So what about standard in, out, and error?
 - they are file descriptors 0, 1, and 2 respectively

Does the kernel treat file descriptors 0, 1, and 2 differently?
 - by default, printf writes to 1
 - by default, scanf reads from 0
 - and errors are directed to 2

The kernel treats fd 0, 1, and 2 exactly like any other open file
 - standard in, out, and error are pure conventions for how those
   file descriptors are used
 - userspace convention, kernel has nothing to do with it and
   in fact doesn't care