Operating Systems 2017F Lecture 10: Difference between revisions

Latest revision as of 16:03, 20 October 2017

Video

Video from the lecture given on October 10, 2017 is now available. (The audio worked!)

Notes

In-Class

Lecture 10
----------

Concurrency
 - more than one thing happening at the same time

challenge with concurrency is keeping shared state consistent
 * files that are written to by multiple processes
 * two or more processes communicating
 * multiple threads within a process

concurrency leads to race conditions
 - order of interleaved actions affects output
 - ex. did process A or B modify the file first?

We want deterministic behavior from concurrent computations

To get consistent computation, we need to coordinate
 - but you can't interact via regular variables

Regular variables are subject to the memory hierarchy
 - variables are stored in registers, L1 cache, L2 cache, L3 cache, main memory
 - can be deeper in some contexts
 - so what if coordination variable is in L1 cache for one thread and in main memory for another?
 - hardware maintains consistency...eventually

mechanisms for coordinating concurrent computations must bypass the memory hierachy

Instead use
 - filesystem locking primitives (kernel)
 - special machine language instructions

Operating systems - kernels in particular - are highly concurrent
 - makes them painful to code
 - have to always worry about having exclusive (or safe) access to data, resources


Abstractions for managing concurrency
 - atomic variables guarantee strict ordering of operations

Example

Variable X:  ***** <- range of memory locations

To modify X, you...
  - load X into a register
  - modify the register
  - save the register to X

What if two cores try to modify X at the same time?
  - both could have X in a register at the same time

X is 5
core A increments by 1
core B decrements by 1

What's X after both have run?

If B changes X but doesn't see A's change...you get 4
If A changes X but doesn't see B's change...you get 6

For X to be atomic, ^^^ shouldn't be possible
 - has to make sure only one core accesses X at a time

Atomic variables are typically used to build semaphores...

But what we really want is mutual exclusion

 - only one thread/process/CPU can access the resource at a time

Want mutual exclusion for larger resources, e.g. data structures, devices

Higher-level languages directly support mutual exclusion, e.g.
Java synchronized methods

Semaphores are an analogy to a railroad control mechanism
 - only one train on the track!

When a train enters the shared track
 - grab the semaphore (increment atomic variable)
 - if not available, wait (check first to see if it is 0)

When a train leaves the shared track
 - release the semaphore (decrement atomic variable)

Lesson: don't implement semaphores yourself, please!

In the Linux kernel
 - locks using semaphores of various types are associated with all kinds
   of data structures
   e.g. inodes

When you can't get the lock, what do you do?
 - sleep (tell the scheduler to wake you up when the lock is available)
 - deal with not getting the resource
 - spin (do a null loop)

If the wait time is short, spinning is the fastest thing you can do

Where possible, everything should be exclusive, minimize sharing

But in userspace...
 - shared memory...NOOOOOOO
    - unless you use atomic variable, or something higher level
      like a semaphore
 - messages
    - series of messages can be used to synchronize state

3-way handshakes, e.g. opening a TCP connection
 - SYN (synchronize) client->server
 - SYN-ACK (acknowledge SYN) server->client
 - ACK (acknowledge) client->server

Can send messages between processes
 - signals!
 - other ways too


concurrency pattern: producer/consumer
 * two processes P and C
 * P "makes" things, C "consumes" them
 * want to coordinate P and C
   - P shouldn't produce too much (C gets too far behind)
   - C shouldn't waste time waiting for P

 * shared circular buffer for storing output of P and input of C
 * when buffer is empty, C should sleep
 * when buffer is full, P should sleep

For those who have used Mutexes before, a mutex is essentially a binary semaphore (hence the name, mutual exclusion).

A mutex allows only a single thread to access a resource, whereas a semaphore allows, and can keep count of how many threads are accessing a certain resource at once.

Mutexes are more common in practice, for some examples of when to use a semaphore, please see: 
    http://www.freertos.org/Real-time-embedded-RTOS-Counting-Semaphores.html

When using a semaphore, you wait on it using sem_wait(), an analogy is waiting for a key to go to the washroom.
When youre finished with the resource, you return the key using sem_post() for someone else to use.

@@ Line 1: / Line 1: @@
+==Video==
-== Notes ==
+Video from the lecture given on October 10, 2017 [http://homeostasis.scs.carleton.ca/~soma/os-2017f/lectures/comp3000-2017f-lec10-10Oct2017.mp4 is now available].  (The audio worked!)
-Mmap
+==Notes==
------------------
-From Man Mmap:
-Mapping will be created at nearby page boundary
-What is page/paging?
+===In-Class===
-Virtual Memory
-Every process gets its own "memory map" meaning it has:
-It's own range of addresses
-For example, Address 2000 is different in each process
-Computer hardware is accessed by memory addresses (RAM is accessed by PHYSICAL addresses - every bit of RAM has a unique address)
+<pre>
-Processes access memory using VIRTUAL addresses
+Lecture 10
+----------
-Need a mechanism to translate virtual to physical addresses *on every memory access by a process* (code and data)
+Concurrency
+ - more than one thing happening at the same time
-Virtual address for PID 4512 -> physical address 512241 (they are different addresses that need to be mapped fast because this limits speed of all completion)
+challenge with concurrency is keeping shared state consistent
+ * files that are written to by multiple processes
+ * two or more processes communicating
+ * multiple threads within a process
-The OS kernel provides information through which the Hardware performs the mapping
+concurrency leads to race conditions
+ - order of interleaved actions affects output
+ - ex. did process A or B modify the file first?
-When do mappings get updated?
+We want deterministic behavior from concurrent computations
-When the CPU (core) switches from running one process to the next
-What occurs when a virtual address has no corresponding physical address for the process?
+To get consistent computation, we need to coordinate
--> You get segfault/similar error
+ - but you can't interact via regular variables
-What is mmap?
+Regular variables are subject to the memory hierarchy
-Mmap is a tool to handle virtual memory map of a process:
+ - variables are stored in registers, L1 cache, L2 cache, L3 cache, main memory
--> virtual address range corresponds to contents of a file rather than physical RAM
+ - can be deeper in some contexts
--> You can use it to get RAM
+ - so what if coordination variable is in L1 cache for one thread and in main memory for another?
+ - hardware maintains consistency...eventually
-Why is it wrong to make all virtual addresses correspond to some physical RAM address?
+mechanisms for coordinating concurrent computations must bypass the memory hierachy
-	- Revoke access
-	- We are limited by how much RAM we have; eve if use swapping (pretend disk is RAM; even SSD's are slower than RAM!)
-Memory Allocation
+Instead use
--------------------------------
+ - filesystem locking primitives (kernel)
-Memory allocation challenges:
+ - special machine language instructions
-Need contiguous address ranges for parts of data structures (arrays)
-You can't move a data structure once it has been allocated (easily) -> pointers will be invalid
+Operating systems - kernels in particular - are highly concurrent
+ - makes them painful to code
+ - have to always worry about having exclusive (or safe) access to data, resources
-"-" not allocated
-'*" allocated
-----****----*****------*********----
-Want to allocate ********
+Abstractions for managing concurrency
-While there is room for number of slots it will not be contiguous (This causes external fragmentation) -> think of fitting furniture in your room by rearranging everything to the point where you cannot enter the room
+ - atomic variables guarantee strict ordering of operations
-This arises when you have variable sized allocation so to fix this use fixed size allocations (but now you are wasting storage)
-Internal fragmentation: space you waste in an allocation (using a 4K file for a 1 byte of data so you waste 1 byte - 4k of data
+Example
-External fragmentation: See above example
-It is generally preferable to have internal fragmentation over external fragmentation:
+Variable X:  ***** <- range of memory locations
-All low-level alloctions are in fixed sized units *that can be placed anywhere*
-How to keep this from limiting the size of arrays
-"virtualizing" the indexing (addresses)
-With both disk and memory, low-level storage is allocated in fixed_sized chunks. In memory it is called a page. On disk, a block.
+To modify X, you...
+  - load X into a register
+  - modify the register
+  - save the register to X
-Sizes are always power of 2 because indexes have to be stored in a fixed number of bits.
+What if two cores try to modify X at the same time?
-Because indexes have to be stored in a fixed number of bits
+  - both could have X in a register at the same time
-This is why we use hexadecimal or octal, not decimal to refer to memory indexes (addresses)
-Each hexadecimal digit maps to 4 bits
+X is 5
-Octal digit maps to 3 bits
+core A increments by 1
-Decimal digits map to less than 4 bits
+core B decrements by 1
-Memory indexes are in bits, so use something that is convenient for bits
+What's X after both have run?
--*---***
+If B changes X but doesn't see A's change...you get 4
-01234567 <-- "physical"
+If A changes X but doesn't see B's change...you get 6
-Create a "mapping":
+For X to be atomic, ^^^ shouldn't be possible
-->0
+ - has to make sure only one core accesses X at a time
-->4
-->1
-->2
-----****
+Atomic variables are typically used to build semaphores...
-01234567 <-- "virtual"
-Array is range 0-3
+But what we really want is mutual exclusion
-So when we do a[1] in virtual space, we get contents of 4
-Process virtual addresses are contiguous, physical memory storage isn't at the boundary of allocation
+ - only one thread/process/CPU can access the resource at a time
-You would need a very big table for mapping every byte in memory to another byte
+Want mutual exclusion for larger resources, e.g. data structures, devices
-Typically table maps 4K or more at a time (Power of 2)
+Higher-level languages directly support mutual exclusion, e.g.
+Java synchronized methods
-bit addresses
+Semaphores are an analogy to a railroad control mechanism
+ - only one train on the track!
-bits, 2^8 = 256
+When a train enters the shared track
-^12 = 4096, which is 4K
+ - grab the semaphore (increment atomic variable)
+ - if not available, wait (check first to see if it is 0)
-Allocate in 4k units (pages)
+When a train leaves the shared track
-	- 12 bits of address
+ - release the semaphore (decrement atomic variable)
-No contiguous storage in physical memory for anything larger than 4K.
-Fragmentation in upper 20 bits
+Lesson: don't implement semaphores yourself, please!
-Upper 20 bits: where to put the page
+In the Linux kernel
-Lower 12 bits: where in the page
+ - locks using semaphores of various types are associated with all kinds
+   of data structures
+   e.g. inodes
-I need a data structure that can map 20 bits to 20 bits with this, I map 20 upper bits and copy the lwoer 12 bits from virtual to physical
+When you can't get the lock, what do you do?
+ - sleep (tell the scheduler to wake you up when the lock is available)
+ - deal with not getting the resource
+ - spin (do a null loop)
-Lower 12 bits is a page offset
+If the wait time is short, spinning is the fastest thing you can do
-Upper 20 bits is the page number
-Page -> frame mapping
+Where possible, everything should be exclusive, minimize sharing
-(Frames are RAM, pages are virtual)
-I have to store the page to frame mapping inside pages and I cannot have them be contiguous
+But in userspace...
+ - shared memory...NOOOOOOO
+    - unless you use atomic variable, or something higher level
+      like a semaphore
+ - messages
+    - series of messages can be used to synchronize state
-We use page table (It's similar to a tree)
+-way handshakes, e.g. opening a TCP connection
+ - SYN (synchronize) client->server
+ - SYN-ACK (acknowledge SYN) server->client
+ - ACK (acknowledge) client->server
-  ###       ###       ###         ###
+Can send messages between processes
-****** ****** ****** ******
+ - signals!
+ - other ways too
-A page is 4k
-Every address is 32 bits (4 bytes)
-You can fit 1024 "pointers" into a page
--> that is 10 bits
-If I only had 4 megabytes of memory, I just need a page table that can fit in one page
+concurrency pattern: producer/consumer
+ * two processes P and C
+ * P "makes" things, C "consumes" them
+ * want to coordinate P and C
+   - P shouldn't produce too much (C gets too far behind)
+   - C shouldn't waste time waiting for P
-If I want more, I have a page that points to pages
+ * shared circular buffer for storing output of P and input of C
-page: 1024 pointers each referring to a page
+ * when buffer is empty, C should sleep
-Up to (2^10)=1,024 pages: each of those pages has 1024 pointers!
+ * when buffer is full, P should sleep
-Up to (2^20) = 1,048,576 pages: Each of those has 4096 bytes of data
+</pre>
-Need 10 bits for offwset in page directory
+----
-bits for offset in page table
-bits for offset in data page
-bits total
-DO NOT NEED TO DO ADDRESS CALCULATIONS BUT YOU SHOULD UNDERSTAND THE BASIC CONCEPTS
+<pre>
+For those who have used Mutexes before, a mutex is essentially a binary semaphore (hence the name, mutual exclusion).
+A mutex allows only a single thread to access a resource, whereas a semaphore allows, and can keep count of how many threads are accessing a certain resource at once.
+Mutexes are more common in practice, for some examples of when to use a semaphore, please see:
+    http://www.freertos.org/Real-time-embedded-RTOS-Counting-Semaphores.html
+When using a semaphore, you wait on it using sem_wait(), an analogy is waiting for a key to go to the washroom.
+When youre finished with the resource, you return the key using sem_post() for someone else to use.
+</pre>