Operating Systems 2021F Lecture 4
Video
Video from the lecture given on September 21, 2021 is now available:
Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)
Notes
Lecture 4
---------

Tutorial 1, you should be done with it
 - but if you aren't it is okay

Tutorial 2 is out now, take about a week to do it

Assignment 1 will be out shortly
 - it will be based on Tutorials 1 & 2, so working on them will prepare you for the assignment

Today, I'm going to go through Tutorial 2 concepts & some Tutorial 1 concepts
 - note that there is a lot of interconnection of ideas

If you see connections I haven't mentioned, feel free to mention or ask about them!

Going forward, I will be doing polls in Zoom
 - I wasn't before because I didn't know how to get individual student results, but the stats are recorded
 - no more polls in Teams, will probably disable channel
 - but this means you need to have your name set properly here when you call in, otherwise we can't give you credit
 - polls and questions asked will dictate participation grades, yes

Today: hello world!

Only use the VPN to go to openstack.scs.carleton.ca
 - to manage your VMs

Otherwise, just use ssh -J (or ssh -L with a separate ssh to localhost)

Why will ssh fail?
 - first, make sure you can ssh to access, that should work
 - then, see if ssh -J works

Things in UNIX-like systems are case sensitive
 - filenames
 - command-line arguments

Quirk, but it is there, sorry

(Windows and macOS preserve case but are case insensitive, i.e., README and Readme are the same file there, but on Linux they are different files. Test this out!)

You do not need to be on the Carleton network to ssh to access
 - that is another way to get on the Carleton network!

(The VPN is for everyone, ssh is for CS-type folks)

My desktop is Ubuntu 21.04, same as the class VMs.

I *do not* post answers to Tutorials
 - Assignments will get answers posted
 - you should know if you understood the concept
 - if you aren't sure, ask!
 - many have multiple answers
 - the specific answers aren't the important part
 - make sure you understand the underlying concepts
 - you're building up a mental model of what is happening
 - if you search for an answer online, that is okay
 - but no point just giving the answer if you don't understand it
 - you should be doing more experiments than searches

In assembly language
 - you have registers, like dedicated variables
 - you do most operations on registers
 - data is loaded into registers from main memory (specified by an address)
 - results in registers are saved to main memory

When you first compile a C file, it gets turned into
 - an assembly source file, .s
 - an object file (the assembly file converted into machine code)
 - an executable binary (object files connected together to include all necessary code)

An object file can't be run on its own, it has to be linked with other code

Registers on older CPUs corresponded to fixed special memory on the CPU
 - on modern CPUs, registers are virtual, they can refer to any number of areas of storage in the CPU
 - but this is all invisible from machine code, it behaves as if we were still dealing with a fixed bit of special purpose memory in the CPU that we can manipulate with mathematical and logical operations
 - you can only tell this is happening if you look at performance, e.g., benchmarks (or there are bugs in your CPU)

(side thing: Participation means participation. So participate. Polls are a weak form of participation. Meaningful questions that help others in the class are better participation. We scale participation grades at the end based on relative participation, but you can expect to get ~B if you come to every class and participate in polls but say nothing else.)

Old microprocessors had 10,000-100,000 transistors
 - modern ones have billions
 - lots of magic happening in the CPU!

Why go through assembly to get to machine code?
 - because it is easier to have a tool chain
 - you'll have to create assembly anyway for some stuff
 - and, if your compiler outputs assembly it is easier to debug
 - otherwise you'll be constantly disassembling machine code

In assembly language, things that start with . are generally assembly directives
 - basically, metadata used by the assembler for various purposes

In assembly language
 - text that is flush left is normally a label
 - "call" instructions are function calls

I don't expect you to understand all of assembly language
 - but I expect you to know what it is and how it relates to C
 - so when you need to, you know when you need to learn assembly (and when you can mostly avoid it)

The linker combines object files
 - resolves references between them

Note that hello is calling puts() rather than printf()
 - the compiler saw we weren't doing anything special with printf(), so replaced it with the simpler call to puts()

I was expecting ltrace to show the call to puts(), but it didn't
 - it used to
 - I will follow up

ltrace is supposed to show library calls
 - like calls to printf, puts
 - but it seems broken

But note the verbose output of strace
 - lots of system calls

System calls are requests to the kernel

By default, binaries are compiled with dynamic linking
 - most library code is loaded at runtime

Statically linked binaries add all library code at link time, when the binary is built
 - makes the binary much larger, especially for small programs

What is a system call? Is it just a function call, but to different code?
 - is it using the "call" instruction in assembly?

It turns out you can't directly make system calls in C
 - you make function calls to library code that then makes the system call
 - in those libraries, system calls are either inline assembly or use special non-standard compiler directives to generate the "syscall" assembly language instruction

A library call is a function call to code in a library
 - we will get to fork

If you can't see the poll, you're probably running an old version of Zoom, make sure to update
 - not sure about some polls not showing up, if that keeps happening let me know

Why do system calls need special assembly language instructions? Why *can't* they be function calls?
 - note system calls don't specify an address
 - function calls specify a memory address to jump to

System calls are invoking kernel code, and processes can't see kernel code (you can't have a valid pointer to kernel code)
 - pointers are a thin C abstraction over memory addresses

So how do you call code that you don't have the address of?
 - that's the special system call instruction

All C code compiles to machine code
 - regular programs and the kernel

Nothing magic about kernel code
 - it just has access to more stuff

So how do we specify the right system call, if not the address?
 - system calls have numbers

On the class VM, you can find system call numbers here:
  /usr/include/x86_64-linux-gnu/asm/unistd_64.h

System call numbers are constant for a given architecture
 - stable ABI (application binary interface)
 - i.e., programs don't have to be recompiled because of kernel-level changes
 - but new system calls get added