Operating Systems 2021F Lecture 6


Video

Video from the lecture given on September 28, 2021 is now available:

Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)

Notes

Lecture 6
---------
 - deadlines
   - A1 is due Friday, Oct 1
      - but will be accepted until October 5th 10 AM
        no penalty
        (we'll discuss solutions in class)
   - but T1 & T2 are due by midnight on October 1st

   - Tutorial 3 came out yesterday, officially due in a week
      - but will be accepted until the official due date of A2

 - missing an official deadline carries no grading penalty
     - as long as you submit before the hard deadline
 - but if you miss one, you should consider yourself behind

What is the material for the midterm?
 - the two assignments
 - I literally take the assignment questions as a basis for
   the midterm questions (making a question up to cover the concepts of an assignment question or two)
 - midterm is all short answer, should be answerable without
   referring to notes or using a computer
    - but it is open book, open note, open internet
    - just NO COLLABORATION
 - remember there will be randomized interviews after
   - to make sure the midterms were graded fairly
   - but yes, if it is clear what you know is different from
     what is on the test...we'll have a problem

 - My dog's name is Roshi


Topics for today
----------------
 - stack vs. heap
 - assembly directives
 - standard I/O, I/O redirection
 - terminals

To understand a memory map, it is helpful to run the program with setarch -R
 - disables ASLR (address space layout randomization),
   a security mitigation against code injection attacks

Note that the addresses in a process are consistent from run to run with setarch -R
 - because each process has its own address space
 - process address spaces are "virtual"
    - no direct correspondence to addresses of actual RAM in your
      computer (physical addresses)
 - OS manages virtual <-> physical address mappings
 - memview gives us a view of virtual addresses
    - cannot access physical addresses outside of the kernel
    - (page tables are the data structure that maintains
       virtual <-> physical mappings, but we'll talk about that
       later)
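
One way to see the effect of ASLR for yourself is to start the same program twice and compare where its stack ends up. The sketch below is only an illustration: it uses the Python interpreter as the program, reads the Linux-specific /proc/self/maps file, and the names CHILD and stack_start are made up for this example.

```python
import subprocess
import sys

# Child program: print the lowest address of its own [stack] mapping (Linux /proc).
CHILD = (
    "import sys\n"
    "for line in open('/proc/self/maps'):\n"
    "    if '[stack]' in line:\n"
    "        print(line.split('-')[0])\n"
    "        sys.exit(0)\n"
    "sys.exit(1)\n"
)


def stack_start(prefix=()):
    """Run a fresh process (optionally under `prefix`) and return its stack's start address."""
    try:
        result = subprocess.run([*prefix, sys.executable, "-c", CHILD],
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None  # no /proc here, or the prefix command is unavailable
    return int(result.stdout.strip(), 16)


if __name__ == "__main__":
    # With ASLR on, these two virtual addresses will normally differ.
    print("two ordinary runs:", hex(stack_start() or 0), hex(stack_start() or 0))
```

Passing a prefix such as ["setarch", "x86_64", "-R"] (assuming an x86_64 machine and that setarch is allowed to change the setting) should make repeated runs report the same address. Either way, these are virtual addresses, not locations in physical RAM.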

 - In a classic view of C programs, variables are stored in the
   "stack" or the "heap"
     - stack is for local variables
     - heap is for malloc'd variables
 - but the memory map is more complex

 - normally there is an area for runtime data storage in memory that holds the stack and the heap
    - heap is allocated starting from the bottom (lowest address) and the stack is allocated from the top (highest address)
    - sbrk(0) gives you the current program break: the end of the heap, where its free space begins
    - as the heap grows to satisfy dynamic allocations, sbrk(0) increases
    
 - note that code & static data (data fixed at compile time)
   are stored together (e.g., strings embedded in the code)
 
A segment is just a variable-sized area of memory
 - generally with specific semantics (purpose)
     .text is normally the code
     .rodata is read-only data known at compile time
 - main thing to know is that the stack and the heap have
   essentially nothing in the on-disk segments
     - at most you have a declaration
     - they are then dynamically allocated at runtime
 - stack/heap is really one segment used for dynamic allocation
    - but nowadays they are logically separate because we don't
      want the stack and the heap to ever collide
       - we add barriers between them
 - a "quad" is a 64-bit quantity
   - I think this is historic: a "quad word" is four words, from
     when a "word" was 16 bits (i.e., when registers held
     16-bit, not 64-bit, quantities)

 - With static linking, all references are resolved at link time
 - With dynamic linking, some references have to be resolved at
   runtime (the runtime dynamic linker has to run before main starts)

 - this is why dynamically linked programs make many more system calls
   before main than statically linked programs do
    - the dynamic linker is literally loading files from disk, e.g., the C library

 - make sure you understand the output of memview
    - and try similar things in 3000quiz to see its memory map

I would suggest translating the output of 3000memview into a diagram
  - label it with addresses
  - see where things are relative to each other
  - you should see a clear picture
     - and if parts don't make sense, ask!
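
As a starting point for such a diagram, here is a small sketch that sorts memory regions from the highest address down and prints one row per region. It does not read 3000memview's output; it assumes the line format of Linux's /proc/<pid>/maps (address range, permissions, offset, device, inode, optional name), and the function names parse_maps and draw are made up for illustration.

```python
def parse_maps(text):
    """Return (start, end, perms, label) tuples from /proc/<pid>/maps text."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)  # range perms offset dev inode [name]
        if len(fields) < 5:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        label = fields[5].strip() if len(fields) == 6 else "(anonymous)"
        regions.append((start, end, fields[1], label))
    return regions


def draw(regions):
    """One row per region, highest address first (so the stack is near the top)."""
    rows = []
    for start, end, perms, label in sorted(regions, reverse=True):
        size_kib = (end - start) // 1024
        rows.append(f"{start:016x}-{end:016x} {perms} {size_kib:>8} KiB  {label}")
    return "\n".join(rows)


if __name__ == "__main__":
    try:
        # Linux only: /proc/self/maps describes the process doing the reading.
        with open("/proc/self/maps") as f:
            print(draw(parse_maps(f.read())))
    except FileNotFoundError:
        print("no /proc/self/maps here (not Linux?)")
```

In the output you should be able to pick out the [heap] row at a low address and the [stack] row at a high one, with libraries and other mappings in between.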

Standard I/O
 - you've learned about standard in, out, and error
 - correspond to file descriptors 0, 1, and 2 as we
   previously discussed
 - but how do we change them, and where do they point by default?

In most shells, you can use <, >, and | to redirect standard in and out
 - >  redirect standard output
 - <  redirect standard input
 - |  redirect the standard output of one program to the
      standard input of another
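
To make the | operator less magical, here is a rough sketch of what a shell has to do for "left | right": create a pipe, fork twice, and use dup2 to put the pipe's ends on file descriptors 1 and 0 before exec. The names pipe_commands and _exec_with are hypothetical, and the second pipe exists only so the example can return right's output; a real shell would leave right's standard output alone.

```python
import os
import sys


def _exec_with(stdin_fd, stdout_fd, argv, to_close):
    """In a child process: wire up fds 0 and 1, then become argv."""
    try:
        if stdin_fd is not None:
            os.dup2(stdin_fd, 0)   # standard input now reads from stdin_fd
        if stdout_fd is not None:
            os.dup2(stdout_fd, 1)  # standard output now writes to stdout_fd
        for fd in to_close:
            os.close(fd)           # the originals are no longer needed
        os.execvp(argv[0], argv)
    finally:
        os._exit(127)              # only reached if something above failed


def pipe_commands(left, right):
    """Run `left | right` and return what `right` wrote to standard output."""
    mid_r, mid_w = os.pipe()       # left's stdout -> right's stdin
    out_r, out_w = os.pipe()       # right's stdout -> us (example only)
    fds = (mid_r, mid_w, out_r, out_w)

    left_pid = os.fork()
    if left_pid == 0:
        _exec_with(None, mid_w, left, fds)
    right_pid = os.fork()
    if right_pid == 0:
        _exec_with(mid_r, out_w, right, fds)

    # Parent: close our copies so the readers see end-of-file.
    for fd in (mid_r, mid_w, out_w):
        os.close(fd)
    chunks = []
    while True:
        data = os.read(out_r, 4096)
        if not data:
            break
        chunks.append(data)
    os.close(out_r)
    os.waitpid(left_pid, 0)
    os.waitpid(right_pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    # Same idea as typing: echo hello | tr a-z A-Z
    sys.stdout.write(pipe_commands(["echo", "hello"], ["tr", "a-z", "A-Z"]).decode())
```

Note that neither program knows it is part of a pipeline: each just reads fd 0 and writes fd 1 as usual.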

These operators can work with arbitrary file descriptors,
just give the number before them (for < and > at least)

For example, standard error redirection is 2>
 (so 1> is the same as >)
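
The number in front of > is simply the file descriptor that gets replaced. A minimal sketch of the idea, assuming a hypothetical helper run_redirected that takes a mapping from fd number to output file (so {2: "err.txt"} plays the role of 2> err.txt):

```python
import os
import sys


def run_redirected(argv, redirects):
    """Run argv with {fd_number: path} output redirections, like `N> path`."""
    pid = os.fork()
    if pid == 0:  # child: rearrange file descriptors, then exec
        try:
            for fd, path in redirects.items():
                target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
                os.dup2(target, fd)  # fd (e.g. 1 or 2) now refers to the file
                os.close(target)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)            # only reached if something above failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

For instance, run_redirected(["ls", "/nonexistent"], {2: "err.txt"}) should behave like "ls /nonexistent 2> err.txt": the error message lands in err.txt while standard output still goes wherever it went before.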

If I don't redirect standard in, out, and error, where are they going by default?
 - they are going to a file, but what file?

Whether you are in a graphical text window locally or have ssh'd to a remote system, you'll probably see bash's file descriptors referring to /dev/pts/? (where ? is a small number)

What is /dev/pts?
 - pseudo TTYs
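
You can ask the same question from inside a program. The sketch below reports where file descriptors 0, 1, and 2 point; fd_target is a made-up helper name, and the /proc/self/fd lookup is Linux-specific.

```python
import os


def fd_target(fd):
    """Return (is_terminal, name) for an open file descriptor."""
    if os.isatty(fd):
        return True, os.ttyname(fd)  # e.g. /dev/pts/3
    try:
        # Linux-specific: /proc/self/fd/N is a symlink to whatever fd N has open.
        return False, os.readlink(f"/proc/self/fd/{fd}")
    except OSError:
        return False, None


if __name__ == "__main__":
    for fd, label in ((0, "stdin"), (1, "stdout"), (2, "stderr")):
        print(label, fd_target(fd))
```

Try running it plainly, then with > out.txt or < /dev/null added, and compare what each descriptor reports.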

A teletype is like a fancy telegraph
 - you type in things locally, they appear at a remote printer
 - remote person types, you see it locally, printed out
 - OLD-fashioned version of texting

When computers came around, early interactive interfaces were teletypes
 - computer was one side, rather than a human
 - was printing to paper rather than a screen
 - could produce A LOT of paper

Eventually the paper was replaced with a CRT (cathode ray tube)
 - see the VT100, a "video terminal"
 - teletype but with screen output, not paper output
 - pure text! (except sometimes for weird fonts with shapes)

There were LOTS of video terminals
 - many incompatibilities
 - so UNIX developed ways of dealing with different terminals
   - terminfo database
 - the TERM environment variable says which terminal you're using
 - when you connect to your class VM, what's the value of TERM
   for you?  It may be different from mine

Why does the type of terminal matter?
 - because not all text interfaces are the same!
 - when I run a program in a terminal that shows colors or positions text at specific locations on the screen, it is using special escape codes to do all that
    - it is just sending data to standard out, encoded so
      the terminal understands it
    - the terminfo files tell common libraries how to use
      common terminal capabilities
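
To see that colors and cursor movement really are just bytes on standard out, here is a small sketch. It hardcodes the ANSI/VT100-style escape sequences that xterm-like terminals understand and uses a crude check of TERM in place of a real terminfo lookup; that shortcut, and the helper names supports_ansi, red, and move_to, are assumptions made for the example.

```python
import os
import sys

ESC = "\x1b"  # the escape character that starts each control sequence


def supports_ansi(term):
    """Crude stand-in for terminfo: assume ANSI unless TERM is unset or 'dumb'."""
    return bool(term) and term != "dumb"


def red(text, term):
    """Wrap text in 'red foreground' ... 'reset attributes' if the terminal can take it."""
    if not supports_ansi(term):
        return text
    return f"{ESC}[31m{text}{ESC}[0m"


def move_to(row, col):
    """Escape sequence that moves the cursor to a 1-based row and column."""
    return f"{ESC}[{row};{col}H"


if __name__ == "__main__":
    term = os.environ.get("TERM", "")
    print("TERM is", repr(term))
    sys.stdout.write(red("this is just bytes on standard out\n", term))
```

If you redirect the output to a file and look at it with a hex viewer, you will find the escape bytes sitting in the file; it is the terminal that turns them into color.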

 - but how do you get a program like gnome-terminal to behave like a teletype, and how can ssh do the same?
   - they implement the pseudo tty interface via /dev/pts
   - any program that implements the pseudo tty interface
     can behave like a terminal to a program

So question, why does it have to be a special device?
 - because terminals are devices with special capabilities
   - special things for interacting with keyboard and screen
     - the big thing traditionally was speed: we'd connect via
       modems, so buffering and local echo were necessary
 - different from a regular file

When you interact with a terminal, you're dealing with tech with 50+ years of history
 - lots of weird bits if you really dig in

Terminal tricks
 - reset state: reset
 - disable local echo (say for password entry): stty -echo

man stty to see all the things you can do with a terminal
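
Programs can change the same settings that stty does. As a rough sketch, this is how a program might turn echo off while reading a password, using the termios interface. The helper names without_echo and read_secret are hypothetical, and read_secret assumes standard input is a terminal.

```python
import sys
import termios


def without_echo(attrs):
    """Return a copy of termios attributes with the ECHO flag cleared."""
    new = list(attrs)
    new[3] = new[3] & ~termios.ECHO  # index 3 holds the local-mode flags (lflag)
    return new


def read_secret(prompt="Password: "):
    """Read one line from the terminal without echoing it, like `stty -echo`."""
    fd = sys.stdin.fileno()
    old = termios.tcgetattr(fd)      # raises termios.error if stdin is not a terminal
    try:
        termios.tcsetattr(fd, termios.TCSADRAIN, without_echo(old))
        sys.stdout.write(prompt)
        sys.stdout.flush()
        return sys.stdin.readline().rstrip("\n")
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, old)  # restore, like `stty echo`
        sys.stdout.write("\n")
```

Restoring the old settings in the finally block matters: if the program died with echo still off, you would be left typing blind until you ran reset or stty echo.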