Operating Systems 2021F Lecture 4

From Soma-notes

Video

Video from the lecture given on September 21, 2021 is now available:

Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)

Notes

Lecture 4
---------

Tutorial 1: you should be done with it
 - but if you aren't, that's okay

Tutorial 2 is out now; take about a week to do it

Assignment 1 will be out shortly
 - it will be based on Tutorials 1 & 2, so working on them will prepare you for the assignment

Today, I'm going to go through Tutorial 2 concepts & some Tutorial 1 concepts
 - note that these ideas are heavily interconnected

If you see connections I haven't mentioned, feel free to mention or ask about them!

Going forward, I will be doing polls in Zoom
 - I wasn't doing them before because I didn't know how to get individual student results, but the results are recorded
 - no more polls in Teams; I will probably disable the channel
 - but this means you need to have your name set properly when you join, otherwise we can't give you credit
    - yes, polls and questions asked will determine participation grades

Today: hello world!

Only use the VPN to go to openstack.scs.carleton.ca
 - to manage your VMs

Otherwise, just use ssh -J (or ssh -L with a separate ssh to localhost)

Why might ssh fail?
 - first, make sure you can ssh to access; that should work
 - then, see if ssh -J works

Things in UNIX-like systems are case sensitive
 - filenames
 - command-line arguments
It's a quirk, but it's there, sorry

(Windows and macOS file systems are, by default, case-preserving but case-insensitive, i.e., README and Readme refer to the same file there, but on Linux they are different files. Test this out!)

You do not need to be on the Carleton network to ssh to access
 - that is another way to get on the Carleton network!

(The VPN is for everyone, ssh is for CS-type folks)

My desktop is Ubuntu 21.04, same as the class VMs.

I *do not* post answers to Tutorials
 - Assignments will have answers posted
 - you should know if you understood the concept
    - if you aren't sure, ask!
 - many questions have multiple answers
    - the specific answers aren't the important part
    - make sure you understand the underlying concepts
    - you're building up a mental model of what is happening
 - if you search for an answer online, that is okay
    - but no point just giving the answer if you don't understand it
 - you should be doing more experiments than searches

In assembly language
 - you have registers, which act like dedicated variables
    - you do most operations on registers
 - data is loaded into registers from main memory (specified by an address)
 - results in registers are saved back to main memory

When you compile a C file, it is turned into, step by step:
 - an assembly source file (.s)
 - an object file (.o: the assembly file converted into machine code)
 - an executable binary (object files linked together to include all necessary code)

An object file on its own can't be run; it has to be linked with other code

Registers on older CPUs corresponded to fixed special memory on the CPU
  - on modern CPUs, registers are virtual: through register renaming, the registers named in machine code are mapped onto a larger pool of physical storage in the CPU
  - but this is all invisible from machine code; it behaves as if we were still dealing with a fixed bit of special-purpose memory in the CPU that we manipulate with arithmetic and logical operations
     - you can only tell this is happening if you look at performance, e.g., benchmarks (or there are bugs in your CPU)

(side thing: Participation means participation.  So participate.  Polls are a weak form of participation.  Meaningful questions that help others in the class are better participation.  We scale participation grades at the end based on relative participation, but you can expect to get ~B if you come to every class and participate in polls but say nothing else.)

Old microprocessors had 10,000-100,000 transistors
 - modern ones have billions
 - lots of magic happening in the CPU!

Why go through assembly to get to machine code?
 - because it is easier to build a toolchain out of separate tools (compiler, assembler, linker)
 - you need an assembler anyway for code that has to be written in assembly
 - and, if your compiler outputs assembly it is easier to debug
   - otherwise you'll be constantly disassembling machine code

In assembly language, things that start with . are generally assembler directives
 - basically, metadata used by the assembler for various purposes

In assembly language
 - text that is flush left is normally a label
 - "call" instructions are function calls

I don't expect you to understand all of assembly language
 - but I expect you to know what it is and how it relates to C
 - so you can recognize when you actually need to learn assembly
   (and when you can mostly avoid it)

The linker combines object files
 - resolves references between them

Note that hello is calling puts() rather than printf()
 - the compiler saw we weren't doing anything special with printf(), so it replaced it with a simpler call to puts()

I was expecting ltrace to show the call to puts(), but it didn't
 - it used to
 - I will follow up
ltrace is supposed to show library calls
 - like calls to printf, puts
 - but it seems broken

But note the verbose output of strace
 - lots of system calls

system calls are requests to the kernel

By default, binaries are compiled with dynamic linking
 - most library code is loaded at runtime

Statically linked binaries include all needed library code at link time
 - makes the binary much larger, especially for small programs

What is a system call?  Is it just a function call, but to different code?
 - is it using the "call" instruction in assembly?

It turns out you can't directly make system calls in standard C
 - you make function calls to library code that then makes
   the system call
 - in those libraries, system calls are made either with inline assembly or with special non-standard compiler features that generate the "syscall" assembly language instruction

A library call is a function call to code in a library

 - we will get to fork

If you can't see the poll, you're probably running an old version of Zoom, make sure to update
 - not sure why some polls aren't showing up; if that keeps happening, let me know

Why do system calls need special assembly language instructions?
Why *can't* they be function calls?
 - note system calls don't specify an address
 - function calls specify a memory address to jump to

System calls invoke kernel code, and processes can't see kernel code (you can't have a valid pointer to kernel code)
 - pointers are a thin C abstraction over memory addresses


So how do you call code that you don't have the address of?
 - that's the special system call instruction

All C code compiles to machine code
 - regular programs and the kernel

Nothing magic about kernel code
 - it just has access to more stuff

So how do we specify the right system call, if not the address?
 - system calls have numbers

On the class VM, you can find system call numbers here:
  /usr/include/x86_64-linux-gnu/asm/unistd_64.h 

System call numbers are constant for a given architecture
 - stable ABI (application binary interface)
   - i.e., programs don't have to be recompiled because
     of kernel-level changes
 - but new system calls get added