Operating Systems 2021F Lecture 9
Video
Video from the lecture given on October 7, 2021 is now available.
Video is also available through Brightspace (Resources->Class zoom meetings->Cloud Recordings tab)
Notes
Lecture 9
---------

Topics
 * foreground & background processes
 * signals
 * process hierarchy & orphans
 * process lifecycle
   - key system calls

"not yet finalized"
 - feature complete, but may have bugs
 - finalized after I've made solutions
 - go ahead and work on it, but if something seems too hard please ask!

What is a process's lifecycle?
 - process is born: fork/clone
   - has to start with an existing process
   - one who did the fork/clone - the parent
   - new process - the child
 - all processes have a parent except pid 1
   - "init", now part of systemd
   - 1 is started when the system is booted
   - when 1 stops, the system shuts down
   - the kernel has to start 1 explicitly, hardcoded in the kernel
 - note there is always a process hierarchy
   - like a family tree
   - pstree shows the hierarchy
 - thus, a process always starts as a copy of another process
 - to run a new program, use execve
   - but this replaces the code in the execve'ing process
   - it "commits suicide", throws away its code in favor of another
 - what happens when a process that has children terminates?
   - another process has to become the children's parent, they must be adopted
   - traditionally this was pid 1, but nowadays systemd has master processes for each user
   - maintaining the hierarchy is essential because that's what determines who gets the return value when a process terminates

When a process terminates
 - it returns a value from main (0 if all is well)
 - kernel holds on to the return value
 - kernel sends SIGCHLD to parent
 - when parent process calls wait, kernel gives it the return value returned by child

What happens if the parent DOESN'T call wait?
 - kernel has to retain information
   - must maintain an entry in the list of processes (process table)
 - this entry in the process table will stay no matter what signal is sent to the associated PID
   - the process has already terminated
 - this is known as a "zombie" process (actually denoted as a "Z" state)
 - if the parent never calls wait, the only way to get rid of a zombie process is to kill the parent
   - because then a new parent will take over and can call wait
 - there are a finite number of processes that can be created (16K traditionally, but more now)
   - so each zombie represents a process that can't be created
   - generally not an issue unless situation is pathological

Why doesn't the kernel just take care of this?
 - kernel tries to implement mechanism more than policy
 - and maybe the return value of a process is important
 - kernel doesn't want to decide this sort of matter
   - responsibility of userspace (i.e., init/systemd)

A process can be in many states
 - ps will show you them
 - most common ones are R (running) and S (sleeping)
 - Z is for defunct (zombie) processes

Signals are small messages sent to processes
 - can be sent by kernel or another process
 - processes can register a handler for each signal
 - signals without a registered handler get a default action, most defaults cause the process to terminate

When a parent process is killed, the child will normally keep running
 - unless it is part of a session that has ended, then all processes in the session are terminated
   - this is how things are cleaned up when you log out
   - but you can defeat this easily

When you type & at the end of a command in bash, you tell bash to run it in the background
 - by default, bash will call wait immediately after creating a child process to run a command
   - this wait will block, i.e., bash will actually wait and do nothing else until the child has terminated
 - but, with &, bash will just go and print a prompt
   - child will continue running
   - if you haven't redirected standard out and error, output will be mixed with bash's
 - background processes with standard out & error (and input, if it takes input) redirected can continue running even if you log out

Redirecting files
  >   redirect standard out (fd 1)
  <   redirect standard in (fd 0)
  2>  redirect standard error (fd 2)
  >&  redirect standard out and error (1 & 2)
  #>  redirect file descriptor #

  >& and & are completely different, sorry

Note redirecting standard in/out/error is essential for running code in the background
 - unless it never uses standard in/out/error

tail gives you the last few lines of a file
 - tail -f "follows the file", keeps updating
 - useful for watching logs

Ctrl-D is end of file
 - can use to exit programs that are expecting input on standard in, as we're saying there is no more input to be had

A process that is running in the background and doesn't terminate when you log out is known as a "daemon" in UNIX
 - typically is a child of init/systemd
 - normally provides some sort of service
 - this is why the names of many background processes on UNIX end with a "d"

You can start new daemons easily, just run something in the background and make sure to redirect its standard out & error
 - and for proper usage, create it as a grandchild and have the child terminate, instant orphan => becomes child of init/systemd

You can always kill a daemon by sending it a signal
 - you may need to be root

But if the daemon was started by systemd this may be futile
 - systemd may just restart it (respawn the daemon)
 - to truly shut it down, you have to tell systemd to stop it

Why do you have to be root to kill most daemons?
 - because users can normally only send signals to their own processes (otherwise the kernel refuses to deliver them)
 - only root can send signals to any process

Any user can normally start a daemon
 - privileges will be limited to what privileges the starting user has
 - but on UNIX that is quite enough to start up a network server
   - but not on all ports (ports below 1024 need extra privileges), so not a true web or ssh server

So can multiple users be logged in at the same time?
 - YES
 - entire point of UNIX, it was for users sharing a computer
 - nowadays we more often use one user per computer, and different "users" are just for background processes, e.g., a database will run as one user, a web server as another
 - "root" is just a user, just one with all the privileges
   - kernel defers to its requests, but kernel does the actions
   - many security configurations nowadays limit the power of the root user (e.g., SELinux)
   - like "admin" on Windows

Process lifecycle cont
 - started with fork/clone
 - runs a program binary with execve (in same process)
 - terminate with exit, send return value to parent (normal termination)
 - terminate on receiving a signal
   - say SIGTERM

Signals MOSTLY have handlers
 - except for two, SIGSTOP and SIGKILL
   - SIGSTOP pauses execution of a process
   - SIGKILL (-9) forcibly terminates a process, "force quit"
 - otherwise, processes can choose to ignore or do almost anything when they receive other signals
 - think of signal handlers as primitive event handlers

You send signals with the kill command (a wrapper around the kill system call)
 - the command defaults to SIGTERM, request to terminate process
 - kill -SIGKILL or kill -9, only use it if process doesn't respond to SIGTERM (it can ignore it)
 - trouble with SIGKILL is process doesn't run a handler so it can't clean up, could result in loss of data (e.g., an editor might do an emergency save on SIGTERM but can't do that on SIGKILL)

Ctrl-C by default sends SIGINT, an interrupt signal
 - normally just terminates the process too, but process can ignore it

What happens if we call wait and
 - there are no children at all?
   => wait returns immediately, returns -1
 - there is a child that has a return value?
   => wait returns immediately with the return value of the child
 - multiple children have terminated?
   => wait gets the return value for just one, you can tell which by the returned PID
 - no child has terminated, but one or more children are running?
   => wait blocks, process will sleep until a child exits, then wait will return with child's exit status

When a signal handler runs, does it run in another process or thread?
 - NO, runs in "the main thread", same as rest of code
 - when a process receives a signal, control immediately jumps to the signal handler
 - when signal handler exits, process execution continues as before
 - process could have been doing anything, so signal handler has to be careful what it does, otherwise it could corrupt things
 - process can say "don't send me signals now", can ignore any signal except SIGSTOP and SIGKILL for as long as it wants

Signals are a VERY PRIMITIVE form of concurrency
 - kinda dangerous
 - normally signal handlers should do as little as possible so they don't mess things up
 - definitely avoid doing any system calls if you don't have to, let the main execution take care of that
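
A minimal sketch of the "handler does as little as possible" pattern, tied back to SIGCHLD and wait from earlier in these notes. It is written in Python rather than C only to keep it short; Python's os and signal modules wrap the same fork, wait, and handler-registration calls. The names child_exited, on_sigchld, and run_and_reap are made up for this example. Note that Python runs its handlers between bytecode instructions instead of truly interrupting the code, so this shows the structure of the pattern, not the exact timing of a C handler.

```python
import os
import signal
import time

# Flag set by the handler and read by the main code (hypothetical name).
child_exited = False


def on_sigchld(signum, frame):
    """Handler does the minimum: record that SIGCHLD arrived."""
    global child_exited
    child_exited = True


def run_and_reap(exit_code):
    """Fork a child that exits with exit_code.

    Returns (pid from fork, pid from wait, child's exit code).
    """
    global child_exited
    child_exited = False
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            # Child: terminate at once with the requested status.
            os._exit(exit_code)
        # Parent: ordinary work would go here; we just poll the flag.
        while not child_exited:
            time.sleep(0.01)
        # The child has terminated but not been waited for, so it is a
        # zombie right now. wait returns immediately and removes it.
        reaped_pid, status = os.wait()
        return pid, reaped_pid, os.WEXITSTATUS(status)
    finally:
        # Put back whatever SIGCHLD handling was in place before.
        signal.signal(signal.SIGCHLD, previous)
```

Calling run_and_reap(7) should give back the child's PID twice (once from fork, once from wait) along with the exit code 7. The wait call lives in the main code, not in the handler, so the handler can't interfere with whatever the main code was in the middle of doing.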