Operating Systems 2021F Lecture 7
Latest revision as of 16:08, 5 October 2021
Video
Video from the lecture given on September 30, 2021 is now available:
- video: https://homeostasis.scs.carleton.ca/~soma/os-2021f/lectures/comp3000-2021f-lec07-20210930.m4v
- auto-generated captions: https://homeostasis.scs.carleton.ca/~soma/os-2021f/lectures/comp3000-2021f-lec07-20210930.cc.vtt
Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)
Notes
Lecture 7
---------

 - textbooks to *give away* (take and enjoy!) outside 5137 HP
   if you grab one, please PM me and tell me what you grabbed
     *Lots* of OS textbooks
     a few architecture, security, and game dev books
     one distributed systems
   I'm cleaning out my office :-)

What are you all working on?
 - maybe finishing up Tutorials 1 & 2
   (Due by end of tomorrow, should have been finished a week ago)
 - Assignment 1
   (Due officially by end of tomorrow, but accepted until Tuesday at 10)
 - Tutorial 3

Last time we talked about standard I/O
Today it is fork/execve

I don't generally do late penalties
 - either it is in or not
 - but I try to be as flexible on due dates as I can

I want citations (in part) so I can know where good & bad info is coming from

https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessw

Remember that assignment will be split up by question
 - so citations at end won't go with question
 - please put with each question, even if you have to repeat
 - you can use tabs and newlines, just make sure only the question lines
   start with a number and a period and a space after the period.
 - please start on the line with the question number, but this isn't essential

Fork, execve
 - covered somewhat in 2401, but much more to say
 - fork duplicates the current process
    - where there was one there are two
    - they are almost completely identical
 - execve replaces the currently running program with a new one from disk
    - destroys (almost) all state from the currently running program

Note that running a program is separate from creating a new process
 - most non-UNIX systems just have something to create a process that
   normally takes a program binary as an argument
   (RunProgram or something)

Many criticize the fork/execve split in the OS research community
 - many suggest it inhibits innovation in parallel/distributed systems
 - because fork is expensive in such contexts, and generally
   you throw away all that work once you do an execve and load
   a new program

One general principle of modern performance engineering:
 - copying data is EXPENSIVE
 - avoid it whenever you can
 - this is true because CPUs are so much faster than RAM
    - you can do many calculations in the time it takes to copy data
 - so better to pass around references to data rather than copying the data
   when you can
    - which is why many modern systems prefer to use immutable data,
      you don't have to copy it as much

(I will research spawn and get back to you)
Added after lecture:
- CreateProcess() is the standard way to make processes in Win32 (there are ANSI and Unicode-accepting variants)
- spawn is a fork+execve combo that was for MS-DOS (note that MS-DOS doesn't have real processes)
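To make the "create a process and run a program in one call" idea concrete, here is a minimal sketch using posix_spawn(), a POSIX call in the same spirit. It was not covered in the lecture and is shown only as an illustration. The helper name spawn_and_wait is hypothetical, and returning -1 on any failure is an assumption made for brevity.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;  /* current environment, handed to the new program */

/* Hypothetical helper: create a process running the program at `path`
   in a single call, wait for it, and return its exit status.
   Returns -1 if the process could not be created or did not exit normally. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;

    /* posix_spawn() returns 0 on success or an error number on failure;
       there is no separate fork step visible to the caller. */
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Compared with fork followed by execve, the caller never runs any code in the new process, so there is nothing to copy and then throw away.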
Note that fork inherently is expensive without tricks
 - on Linux fork is pretty fast, there are tricks
 - (Copy-on-write, will discuss later when we discuss how virtual memory
   is implemented)

So what is the basic operation that shells do when you run an external command?
 - they fork themselves
 - the child process then execve's the requested program

Note that processes are much cheaper to create on Linux than Windows
 - one reason multithreading is so favored on Windows vs multiple processes
   on Linux systems

People use Linux in the cloud not just because it is free but because it works very well
 - Windows never competed there on technical merits

When you run fork(), it returns twice
 - in the parent (original process), it returns the PID of the child (the new process)
 - in the child, it returns 0
 - note in each process it only returns one int value
 - the return value of fork is what prevents the new process from just
   doing what the original process would have done anyway
   (duplicating work for no purpose)

Can the parent and child talk to each other?
 - by default, only a little bit
 - the parent gets a "return value" when the child exits, the number
   returned by main (or passed to exit())
 - but that's it

Note that there are no shared variables between the parent and child
 - child has a copy of parent's data, but changes after fork are local
   to child and parent

We will talk about ways for processes to communicate soon (like signals)
 - but this can happen between any two processes, nothing special about
   parent & child relationship for this

Does the PID of a process change when you do an execve?
 - no! it stays the same, because it is the same process
 - just running new code

What happens when fork fails?
 - it returns -1 to the parent, no child is created

This would happen if
 - the system lacked resources to create a new process
 - the current process has hit its limit on how many processes it is
   allowed to create (maybe it only gets 5?)
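The fork/execve pattern that shells use, together with the three possible outcomes of fork(), can be sketched roughly as follows. This is a minimal illustration, not how any particular shell is written. The helper name run_program is hypothetical, and the exit code 127 for a failed execve is an assumption borrowed from common shell convention.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;  /* current environment, passed through to execve */

/* Hypothetical helper: run the program at `path` the way a shell would.
   Returns the child's exit status, or -1 if fork/waitpid failed or the
   child did not exit normally. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();              /* on success this returns twice */

    if (pid == -1) {                 /* failure: no child was created */
        perror("fork");
        return -1;
    }

    if (pid == 0) {                  /* child: fork returned 0 */
        execve(path, argv, environ); /* same PID, new program */
        /* We only get here if execve failed. */
        perror("execve");
        _exit(127);                  /* assumed "could not run" code */
    }

    /* parent: fork returned the child's PID */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }

    /* The "return value" the parent gets back from the child. */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_program("/bin/sh", args) with args set to {"sh", "-c", "exit 7", NULL} should hand 7 back to the parent, which is the only information the child passes back by default.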
If you want to see something pathological, try an infinite loop around fork()
 - called a fork bomb
 - tends to make systems very unhappy
 - don't do this while you are watching me, unless it is in a remote VM!

errno, what's that?
 - sometimes system calls need to return additional error information
 - so, they set errno, it is a separate mechanism for passing back errors
 - on Linux it is actually encoded in the return value of the system call,
   but extracted by the C library before it gets to a process

Can execve fail?  YES
 - if the program being run isn't an executable or doesn't have the right
   permissions
 - if execve returns, it has failed

Does fork create threads?
 - NO, it does not
 - a new thread is like a copy of the current process, except that
   all memory is shared with the parent
     - if the new thread changes a variable, the original thread sees it
     - with fork, the parent wouldn't see a change to the child's variable
 - threads share memory, a fork copies memory
 - so threads share an address space, processes don't, even if
   one was forked from another

We can define a process as one or more execution contexts in an address space
 - a thread is an execution context in an address space

In UNIX-like systems, you create threads normally using pthread_create()
 - on Linux, this is implemented using the clone system call
 - and on Linux, fork is nowadays also implemented using clone

Modern Linux still has a fork system call, but it is rarely used
(only there for backwards compatibility)
 - instead, processes & threads are created with clone
 - process vs thread depends on arguments to clone

When should you use threads vs processes?
 - ideally, NEVER USE THREADS
 - even if you want shared memory, just share what you need, not everything
 - shared memory is inherently problematic
    - have to coordinate changes or things get messed up
      (have to do mutual exclusion, will explain later)

The reason we have multithreading nowadays is mostly because Windows has
really expensive processes and it is cheaper to have many threads
 - original UNIX never had threads, thought it was a bad idea

Multithreaded programming is inherently hard
 - part of why OS kernels are so difficult to make, they HAVE to be
   multithreaded
 - so why bother if you don't have to?

You can't make system calls directly from C
 - only with compiler-specific directives or inline assembly

When we make "system calls" in C, we are always calling a C wrapper
of the system call
 - sometimes the wrapper is very thin, sometimes not so much

The reason to do the tutorials is to get a better understanding
 - I don't expect things to be clear just from me talking about things

When the child starts, it is at exactly the same place as the parent,
the return from the fork() call

A wrapper is just code around other code
 - for a system call like write, the write library call just translates
   the C-style arguments to what is needed for the system call and then
   calls it
 - write is a thin wrapper since the semantics of the write function and
   the write system call are essentially the same
 - note that fork is a bit "thicker" of a wrapper around the clone system
   call, because clone's semantics are broader than just fork

In general we don't worry about the exact system call interface
 - only the C library cares most of the time
 - regular code just calls library calls, and many consider the
   C interface to be the "standard" interface
 - but some languages (like Go) prefer to code directly to the system call
   interface and avoid the C library (on most systems)

Go was created at Google by some of the creators of UNIX and C
 - no way would they be limited by their past creations
 - hence they try to avoid the C library when they can
 - creators understand the limitations of their creations in a way few
   others ever do

(Go and Rust are very different, very different use cases)

Quick version
 - signals: messages between processes, typically for control purposes
 - wait: how parent processes get info about child processes that have
   terminated (and, optionally, wait for them to terminate)
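Going back to the points about wrappers and errno, the following sketch shows both on Linux. It assumes Linux with glibc, where the generic syscall() library function and the SYS_* numbers from <sys/syscall.h> are available. Note that syscall() is itself a C library wrapper, just a very generic one. The helper names raw_getpid and errno_from_bad_close are hypothetical.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask the kernel for our PID by system call number,
   bypassing the dedicated getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}

/* Hypothetical helper: close an invalid file descriptor on purpose and
   report the errno value the C library extracted from the failed call.
   Returns 0 if the call unexpectedly succeeded. */
int errno_from_bad_close(void)
{
    errno = 0;
    long result = syscall(SYS_close, -1);   /* fd -1 is never valid */
    return (result == -1) ? errno : 0;
}
```

Here raw_getpid() should agree with getpid(), since getpid() is a thin wrapper, and errno_from_bad_close() should report EBADF: the caller only ever sees -1 plus errno, never the kernel's raw return value.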