Operating Systems 2017F Lecture 10

From Soma-notes

Video

Video from the lecture given on October 10, 2017 is now available: http://homeostasis.scs.carleton.ca/~soma/os-2017f/lectures/comp3000-2017f-lec10-10Oct2017.mp4 (The audio worked!)


Notes

Mmap
----

From man mmap: a mapping will be created at a nearby page boundary.


What is a page? What is paging?

Virtual memory: every process gets its own "memory map", meaning it has its own range of addresses.
For example, address 2000 refers to different memory in each process.


Computer hardware is accessed by memory addresses (RAM is accessed by PHYSICAL addresses - every bit of RAM has a unique address).
Processes access memory using VIRTUAL addresses.


We need a mechanism to translate virtual to physical addresses *on every memory access by a process* (code and data).

For example, a virtual address in PID 4512 -> physical address 512241. These are different addresses, and the mapping needs to be fast because it limits the speed of all computation.


The OS kernel provides the information through which the hardware performs the mapping.


When do mappings get updated? When the CPU (core) switches from running one process to the next.


What happens when a virtual address has no corresponding physical address for the process? You get a segfault or a similar error.


What is mmap? Mmap is a tool for manipulating the virtual memory map of a process:
 -> a virtual address range can correspond to the contents of a file rather than physical RAM
 -> you can also use it to just get RAM (anonymous mappings)

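As a concrete illustration, here is a minimal sketch of both uses (the file path and the one-page length are arbitrary choices for the example):

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void) {
    /* File-backed mapping: the address range reflects the file's contents. */
    int fd = open("/etc/hostname", O_RDONLY);  /* any readable, non-empty file will do */
    if (fd < 0) { perror("open"); return 1; }
    char *file = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
    if (file == MAP_FAILED) { perror("mmap file"); return 1; }
    printf("first byte of the file: %c\n", file[0]);

    /* Anonymous mapping: no file, just asks the kernel for a page of RAM. */
    char *ram = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ram == MAP_FAILED) { perror("mmap anon"); return 1; }
    strcpy(ram, "hello");

    munmap(file, 4096);
    munmap(ram, 4096);
    close(fd);
    return 0;
}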

Why would it be wrong to make all virtual addresses correspond to some physical RAM address?
 - we may need to revoke access to a mapping
 - we are limited by how much RAM we have, even if we use swapping (pretending disk is RAM - and even SSDs are slower than RAM!)


Memory Allocation
-----------------

Memory allocation challenges:
 - we need contiguous address ranges for parts of data structures (arrays)
 - you can't (easily) move a data structure once it has been allocated -> pointers into it would become invalid


"-" not allocated
'*" allocated
----****----*****------*********----


Say we want to allocate ********. While there is room for that number of slots, it will not be contiguous - this is external fragmentation. Think of fitting furniture into your room by rearranging everything to the point where you cannot enter the room.

This arises when you have variable-sized allocations. To fix it, use fixed-size allocations (but now you are wasting storage).


Internal fragmentation: the space you waste inside an allocation (using a 4K block for 1 byte of data wastes 4K minus 1 byte).

External fragmentation: the unusable free space between allocations - see the example above.


It is generally preferable to have internal fragmentation over external fragmentation, so all low-level allocations are in fixed-size units *that can be placed anywhere*.

How do we keep this from limiting the size of arrays? By "virtualizing" the indexing (addresses).


With both disk and memory, low-level storage is allocated in fixed-size chunks. In memory such a chunk is called a page; on disk, a block.


Sizes are always a power of 2 because indexes have to be stored in a fixed number of bits. This is also why we use hexadecimal or octal, not decimal, to refer to memory indexes (addresses).


Each hexadecimal digit maps to exactly 4 bits.
Each octal digit maps to exactly 3 bits.
Decimal digits map to less than 4 bits (and not a whole number of bits).


Memory indexes are in bits, so use a notation that is convenient for bits.


-*---***
01234567  <-- "physical"


Create a "mapping":
For X to be atomic, ^^^ shouldn't be possible
0->0
- has to make sure only one core accesses X at a time
1->4
2->1
3->2


----****
01234567  <-- "virtual"


The array occupies the range 0-3 in virtual space. So when we access a[1] in virtual space, we get the contents of physical location 4.

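A tiny sketch of this "virtualized indexing" idea in C; the table below is the 0->0, 1->4, 2->1, 3->2 mapping from above, with made-up letter contents so the lookup is visible:

#include <stdio.h>

char physical[8] = {'A','B','C','D','E','F','G','H'};  /* "physical" cells 0-7 */
int  map[4]      = {0, 4, 1, 2};       /* virtual index -> physical index */

/* Every access to the "array" goes through the mapping. */
char load(int vaddr) { return physical[map[vaddr]]; }

int main(void) {
    printf("a[1] = %c\n", load(1));    /* virtual 1 -> physical 4 -> 'E' */
    return 0;
}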

Process virtual addresses are contiguous; the physical memory backing them isn't, beyond the boundary of an allocation unit.


But you would need a very big table to map every byte in memory to another byte.


So typically the table maps 4K or more at a time (a power of 2).


Consider 32-bit addresses:
 - 8 bits can index 2^8 = 256 things
 - 12 bits can index 2^12 = 4096 things, which is 4K


Allocate in 4K units (pages):
 - 12 bits of address select a byte within a page
 - nothing larger than 4K ever needs contiguous storage in physical memory


Fragmentation is then only an issue in the upper 20 bits.


Upper 20 bits: where to put the page.
Lower 12 bits: where in the page.


So I need a data structure that can map 20 bits to 20 bits. With this, I map the upper 20 bits and copy the lower 12 bits unchanged from virtual to physical.


The lower 12 bits are the page offset; the upper 20 bits are the page number.


The mapping is page -> frame (pages are virtual, frames are RAM).


I have to store the page-to-frame mapping itself inside pages, and I cannot require those pages to be contiguous.


So we use a page table, which is structured like a tree:


  ###      ###      ###        ###
****** ****** ****** ******


A page is 4K, and every address is 32 bits (4 bytes), so you can fit 1024 "pointers" into a page -> selecting one of them takes 10 bits.


If I had only 4 megabytes of memory, I would just need a page table that fits in one page.


If I want more, I add a page that points to pages:
 - 1 page (the page directory): 1024 pointers, each referring to a page table
 - up to 2^10 = 1,024 page tables: each of those has 1024 pointers
 - up to 2^20 = 1,048,576 data pages: each holds 4096 bytes of data


So a 32-bit virtual address breaks down as:
 - 10 bits for the offset in the page directory
 - 10 bits for the offset in the page table
 - 12 bits for the offset in the data page
 = 32 bits total

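A sketch of how the 10/10/12 split falls out of a 32-bit virtual address (the macro names and the example address are invented for illustration; on real hardware the MMU does this walk, not C code):

#include <stdio.h>
#include <stdint.h>

#define DIR_INDEX(va)   (((va) >> 22) & 0x3FF)  /* top 10 bits: page directory slot */
#define TABLE_INDEX(va) (((va) >> 12) & 0x3FF)  /* next 10 bits: page table slot */
#define PAGE_OFFSET(va) ((va) & 0xFFF)          /* low 12 bits: byte within the page */

int main(void) {
    uint32_t va = 0x0804A123;                   /* an arbitrary example address */
    printf("directory index: %u\n", (unsigned)DIR_INDEX(va));
    printf("table index:     %u\n", (unsigned)TABLE_INDEX(va));
    printf("page offset:     0x%03X\n", (unsigned)PAGE_OFFSET(va));
    return 0;
}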

You DO NOT need to do address calculations, but you SHOULD understand the basic concepts.


In-Class

Lecture 10
----------

Concurrency
 - more than one thing happening at the same time

The challenge with concurrency is keeping shared state consistent:
 * files that are written to by multiple processes
 * two or more processes communicating
 * multiple threads within a process

concurrency leads to race conditions
 - order of interleaved actions affects output
 - ex. did process A or B modify the file first?

We want deterministic behavior from concurrent computations

To get consistent computation, we need to coordinate
 - but you can't interact via regular variables

Regular variables are subject to the memory hierarchy
 - variables are stored in registers, L1 cache, L2 cache, L3 cache, main memory
 - can be deeper in some contexts
 - so what if coordination variable is in L1 cache for one thread and in main memory for another?
 - hardware maintains consistency...eventually

Mechanisms for coordinating concurrent computations must bypass the memory hierarchy.

Instead, use:
 - filesystem locking primitives (kernel)
 - special machine language instructions

Operating systems - kernels in particular - are highly concurrent
 - makes them painful to code
 - have to always worry about having exclusive (or safe) access to data, resources


Abstractions for managing concurrency
 - atomic variables guarantee strict ordering of operations

Example

Variable X:  ***** <- range of memory locations

To modify X, you...
  - load X into a register
  - modify the register
  - save the register to X

What if two cores try to modify X at the same time?
  - both could have X in a register at the same time

X is 5
core A increments by 1
core B decrements by 1

What's X after both have run?

If B changes X but doesn't see A's change...you get 4
If A changes X but doesn't see B's change...you get 6

For X to be atomic, ^^^ shouldn't be possible
 - has to make sure only one core accesses X at a time
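A minimal sketch of this in C11 (compile with -pthread): with a plain int the result could be 4, 5, or 6, but atomic_int makes each update indivisible, so X always ends up back at 5:

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

atomic_int X = 5;      /* with a plain "int X" the two updates below could race */

void *core_a(void *arg) { atomic_fetch_add(&X, 1); return NULL; }  /* increment */
void *core_b(void *arg) { atomic_fetch_sub(&X, 1); return NULL; }  /* decrement */

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, core_a, NULL);
    pthread_create(&b, NULL, core_b, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("X = %d\n", atomic_load(&X));   /* always 5 */
    return 0;
}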

Atomic variables are typically used to build semaphores...

But what we really want is mutual exclusion

 - only one thread/process/CPU can access the resource at a time

Want mutual exclusion for larger resources, e.g. data structures, devices

Higher-level languages directly support mutual exclusion, e.g.
Java synchronized methods
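In C, the analogous facility is the pthread mutex. A minimal sketch (the shared counter and loop count are invented for the example; compile with -pthread):

#include <stdio.h>
#include <pthread.h>

long counter = 0;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* only one thread at a time in here */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);  /* 200000: no increment is lost */
    return 0;
}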

Semaphores are an analogy to a railroad control mechanism
 - only one train on the track!

When a train enters the shared track
 - grab the semaphore (increment atomic variable)
 - if not available, wait (check first to see if it is 0)

When a train leaves the shared track
 - release the semaphore (decrement atomic variable)
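The train analogy, sketched with POSIX semaphores. Note the counting convention is inverted relative to the description above: sem_wait() decrements the count to grab (blocking at zero), and sem_post() increments it to release:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

sem_t track;     /* how many trains may still enter the shared track */

void *train(void *arg) {
    sem_wait(&track);                        /* enter; blocks if track is taken */
    printf("train %ld is on the shared track\n", (long)arg);
    sem_post(&track);                        /* leave; lets the next train in */
    return NULL;
}

int main(void) {
    sem_init(&track, 0, 1);                  /* initial count 1: one train at a time */
    pthread_t t[3];
    for (long i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, train, (void *)i);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&track);
    return 0;
}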

Lesson: don't implement semaphores yourself, please!

In the Linux kernel
 - locks using semaphores of various types are associated with all kinds
   of data structures
   e.g. inodes

When you can't get the lock, what do you do?
 - sleep (tell the scheduler to wake you up when the lock is available)
 - deal with not getting the resource
 - spin (do a null loop)

If the wait time is short, spinning is the fastest thing you can do
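For illustration only (remember the lesson above: don't implement these yourself), a spinlock is just a null loop around an atomic test-and-set, which C11 exposes as atomic_flag:

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;

void spin_lock(void) {
    while (atomic_flag_test_and_set(&lock))
        ;   /* null loop: burn CPU until we are the one that set the flag */
}

void spin_unlock(void) {
    atomic_flag_clear(&lock);
}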

Where possible, everything should be exclusive, minimize sharing

But in userspace...
 - shared memory...NOOOOOOO
    - unless you use atomic variable, or something higher level
      like a semaphore
 - messages
    - series of messages can be used to synchronize state

3-way handshakes, e.g. opening a TCP connection
 - SYN (synchronize) client->server
 - SYN-ACK (acknowledge SYN) server->client
 - ACK (acknowledge) client->server

Can send messages between processes
 - signals!
 - other ways too
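A sketch of signalling between related processes (SIGUSR1 is the conventional user-defined signal; blocking it and sleeping with sigsuspend() avoids losing the signal between the flag check and the sleep):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <sys/wait.h>

volatile sig_atomic_t got_signal = 0;

void handler(int signum) { got_signal = 1; }   /* just set a flag: async-signal-safe */

int main(void) {
    signal(SIGUSR1, handler);

    /* Block SIGUSR1 except while sleeping in sigsuspend() below. */
    sigset_t mask, oldmask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    sigprocmask(SIG_BLOCK, &mask, &oldmask);

    pid_t pid = fork();
    if (pid == 0) {                  /* child: send a "message" to the parent */
        kill(getppid(), SIGUSR1);
        exit(0);
    }

    while (!got_signal)
        sigsuspend(&oldmask);        /* atomically unblock SIGUSR1 and wait */
    printf("parent: got SIGUSR1 from child %d\n", (int)pid);
    wait(NULL);
    return 0;
}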


concurrency pattern: producer/consumer
 * two processes P and C
 * P "makes" things, C "consumes" them
 * want to coordinate P and C
   - P shouldn't produce too much (C gets too far behind)
   - C shouldn't waste time waiting for P

 * shared circular buffer for storing output of P and input of C
 * when buffer is empty, C should sleep
 * when buffer is full, P should sleep
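A sketch of the classic implementation using two counting semaphores plus a mutex (buffer size and item count are arbitrary; compile with -pthread). "empty" puts P to sleep when no slots are free; "full" puts C to sleep when nothing is waiting:

#include <stdio.h>
#include <pthread.h>
#include <semaphore.h>

#define SLOTS 4
#define ITEMS 10

int buffer[SLOTS];
int in = 0, out = 0;                  /* circular buffer indexes */
sem_t empty;                          /* count of free slots; P waits on this */
sem_t full;                           /* count of used slots; C waits on this */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg) {
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&empty);             /* sleep while the buffer is full */
        pthread_mutex_lock(&lock);
        buffer[in] = i;
        in = (in + 1) % SLOTS;
        pthread_mutex_unlock(&lock);
        sem_post(&full);              /* wake C if it was sleeping */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < ITEMS; i++) {
        sem_wait(&full);              /* sleep while the buffer is empty */
        pthread_mutex_lock(&lock);
        int item = buffer[out];
        out = (out + 1) % SLOTS;
        pthread_mutex_unlock(&lock);
        sem_post(&empty);             /* wake P if it was sleeping */
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    sem_init(&empty, 0, SLOTS);
    sem_init(&full, 0, 0);
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&empty);
    sem_destroy(&full);
    return 0;
}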

For those who have used Mutexes before, a mutex is essentially a binary semaphore (hence the name, mutual exclusion).

A mutex allows only a single thread to access a resource, whereas a semaphore allows multiple threads and can keep count of how many are accessing a certain resource at once.

Mutexes are more common in practice, for some examples of when to use a semaphore, please see: 
    http://www.freertos.org/Real-time-embedded-RTOS-Counting-Semaphores.html

When using a semaphore, you wait on it using sem_wait(); an analogy is waiting for the key to go to the washroom.
When you're finished with the resource, you return the key using sem_post() so someone else can use it.