Operating Systems 2017F Lecture 18: Difference between revisions
Latest revision as of 21:05, 16 November 2017
Video
The video from the lecture given on Nov. 16, 2017 is now available: http://homeostasis.scs.carleton.ca/~soma/os-2017f/lectures/comp3000-2017f-lec18-16Nov2017.mp4
Notes
In Class
Lecture 18: Filesystems and such
--------------------------------
* How can you recover a filesystem?
* How do you delete a file?
A filesystem is
 * persistent data structure
 * stored in fixed-sized blocks (at least 512 bytes in size)
 * maps hierarchical filenames to file contents
 * has metadata about files (somehow)
What's in a filesystem?
 * data blocks
 * metadata blocks
How do you organize metadata?
First job: identify basic characteristics of the filesystem
You need a "summary" block that tells you about everything else
 => this is the "superblock"
Normally the superblock is the first block of the filesystem
In the superblock
 - what kind of filesystem is this?
    - what filesystem magic number is there
 - how big is the filesystem?
 - how is it organized?
 - where can I find the rest of the metadata?
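As a rough illustration of the magic-number check, here is a minimal sketch that looks for the ext2/3/4 magic number (0xEF53) in a filesystem image. The ext family keeps its superblock 1024 bytes into the volume, with the magic field 56 bytes into the superblock; the helper names looks_like_ext and make_fake_image are made up for this example, and the "image" is just a zero-filled buffer built in memory.

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024  # ext2/3/4 superblock starts 1024 bytes into the volume
EXT_MAGIC_OFFSET = 56         # offset of the magic field inside the superblock
EXT_MAGIC = 0xEF53            # stored little-endian on disk


def looks_like_ext(image: bytes) -> bool:
    """Return True if the image carries the ext2/3/4 magic number."""
    start = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    field = image[start:start + 2]
    if len(field) < 2:
        return False  # image too small to hold a superblock
    (magic,) = struct.unpack("<H", field)
    return magic == EXT_MAGIC


def make_fake_image(size: int = 4096, with_magic: bool = True) -> bytes:
    """Hypothetical helper: a zero-filled image, optionally stamped with the magic."""
    image = bytearray(size)
    if with_magic:
        struct.pack_into("<H", image, EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET, EXT_MAGIC)
    return bytes(image)
```

A real tool would go on to read the size and layout fields from the same block; this sketch only answers "what kind of filesystem is this?".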
for POSIX filesystems
 - file metadata is stored in...inodes
 - most have pre-reserved inodes
So we have
 - superblock
 - inode blocks
 - data blocks
   - data blocks for directories
   - data blocks for files
How do you recover from damage?
 - filesystems never "reboot", must remain correct over
   the course of years
 - but errors will happen
   - bitrot
   - "accidental" corruption
   - computer failure/memory corruption/hard reboot
To make filesystems fast, data & metadata are cached in RAM
 - bad things happen if this data hasn't been written to disk and you reboot
 - even worse things happen if your RAM is bad and corrupts the data
Also bad...what if we lose the superblock?
 - you could lose EVERYTHING
 - so we have backup superblocks
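The idea of falling back to a backup superblock can be sketched with a toy image. Everything here is an assumption made for illustration: the 16-byte block size, the CRC32 checksum stored in the last four bytes, and the names make_superblock, is_valid and find_superblock. Real filesystems use their own layouts and validity checks.

```python
import zlib

BLOCK = 16  # toy block size in bytes (hypothetical)


def make_superblock(payload: bytes) -> bytes:
    """Build a toy superblock: padded payload followed by a CRC32 of the payload."""
    body = payload.ljust(BLOCK - 4, b"\0")
    return body + zlib.crc32(body).to_bytes(4, "little")


def is_valid(block: bytes) -> bool:
    """A block is a valid superblock if its stored checksum matches its contents."""
    return len(block) == BLOCK and zlib.crc32(block[:-4]).to_bytes(4, "little") == block[-4:]


def find_superblock(image: bytes, locations):
    """Try the primary location first, then each backup; return (block_no, block) or None."""
    for block_no in locations:
        block = image[block_no * BLOCK:(block_no + 1) * BLOCK]
        if is_valid(block):
            return block_no, block
    return None
```

If the copy at block 0 is damaged, find_superblock simply moves on to the next listed location, which is roughly what a repair tool does when it is told where the backups live.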
Old scandisk/fsck was slow because it had to scan all filesystem metadata
 - not to recover data, but to fix metadata
Nowadays fsck is very fast and we rarely lose data due to losing power
 - we must be writing data to disk all the time
 - but isn't writing all the time slow?
On magnetic hard disks (not SSDs)
 - sequential operations are fast
 - random access is slow
   - we have to move the read/write head
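A crude way to see why access order matters is to add up how far the head has to travel. This is only a toy model (it assumes seek cost is proportional to the distance between block numbers and ignores rotation and caching), and head_travel is a name invented for this sketch.

```python
def head_travel(block_order, start=0):
    """Total distance the head moves when visiting blocks in the given order."""
    position = start
    total = 0
    for block in block_order:
        total += abs(block - position)
        position = block
    return total


sequential = list(range(8))            # 0, 1, 2, ... 7
scattered = [7, 0, 6, 1, 5, 2, 4, 3]   # same blocks, jumping back and forth

print(head_travel(sequential))  # 7
print(head_travel(scattered))   # 35
```

The same eight blocks cost five times as much head movement in the scattered order, which is the motivation for writing sequentially first.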
So, on modern systems we update metadata (and sometimes data) by writing
sequentially to disk...and then later writing randomly
 - sequential writes go to the "journal"
On fsck on a journaled filesystem
 - just check the journal for pending operations (replay the journal)
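Journal replay can be sketched in a few lines. This is a toy model, not how any particular filesystem implements it: the class ToyJournaledDisk, its methods, and the crash_after parameter (which simulates losing power part-way through) are all made up for illustration.

```python
class ToyJournaledDisk:
    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks  # final ("random access") locations
        self.journal = []                 # append-only, written sequentially

    def write(self, block_no, data):
        """Record the intended update in the journal; nothing else is touched yet."""
        self.journal.append((block_no, data))

    def flush(self, crash_after=None):
        """Copy journaled updates to their final locations, oldest first.

        If crash_after is given, stop after that many updates to simulate a
        power loss. Returns True if the journal was fully applied.
        """
        applied = 0
        while self.journal:
            if crash_after is not None and applied >= crash_after:
                return False  # "crash": the rest is still safely in the journal
            block_no, data = self.journal[0]
            self.blocks[block_no] = data
            self.journal.pop(0)  # drop the entry only once it has been applied
            applied += 1
        return True

    def replay(self):
        """What fsck does on a journaled filesystem: apply whatever is pending."""
        return self.flush()
```

Because an entry is removed only after its block has been written, re-applying an entry after a crash is harmless, and recovery only has to look at the journal instead of scanning all metadata.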
There exist filesystems that are pure journal
 - log-based filesystem
Logs and journals inherently create multiple copies of data and metadata that are hard to track.  This makes deletion nearly impossible (at least to guarantee)
Only way to guarantee...encrypt everything
 - if every file has its own key, you can delete the key and thus "delete" the data
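The "delete the key" idea can be sketched as follows. Big caveat: the cipher here is a toy keystream built from SHA-256 purely so the example runs with the standard library; it is NOT a secure construction and stands in for a real cipher. ToyKeyStore and keystream_xor are hypothetical names.

```python
import hashlib
import os


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (illustration only): XOR data with SHA-256(key + offset)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "little")).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)


class ToyKeyStore:
    """Holds one random key per file; ciphertext may be copied anywhere."""

    def __init__(self):
        self.keys = {}

    def encrypt(self, name: str, plaintext: bytes) -> bytes:
        key = os.urandom(32)
        self.keys[name] = key
        return keystream_xor(key, plaintext)

    def decrypt(self, name: str, ciphertext: bytes) -> bytes:
        # Raises KeyError once the key has been deleted.
        return keystream_xor(self.keys[name], ciphertext)

    def delete(self, name: str) -> None:
        """"Delete" the file by destroying its key; stray ciphertext copies become useless."""
        del self.keys[name]
```

The point is that only the small key has to be reliably destroyed, no matter how many copies of the encrypted blocks the journal or log has left behind.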
Solid State Disks (SSD) use log-structured storage at a level below blocks.
 - writes are coarse-grained (you have to write a lot at once)
 - you don't want to write to the same cells too often, they'll die
   - have to do "wear-leveling"
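Wear-leveling can be sketched as "always write to the least-worn free cell and remember where each logical block went". This is a simplified model with invented names (ToyWearLeveler, erase_counts, mapping); real SSD firmware is far more involved.

```python
class ToyWearLeveler:
    def __init__(self, num_cells):
        self.erase_counts = [0] * num_cells  # how often each physical cell was written
        self.mapping = {}                    # logical block -> physical cell

    def write(self, logical_block):
        """Place the logical block on the least-worn cell not holding other live data."""
        in_use = set(self.mapping.values()) - {self.mapping.get(logical_block)}
        candidates = [c for c in range(len(self.erase_counts)) if c not in in_use]
        cell = min(candidates, key=lambda c: self.erase_counts[c])
        self.erase_counts[cell] += 1
        self.mapping[logical_block] = cell
        return cell


leveler = ToyWearLeveler(4)
for _ in range(8):
    leveler.write(0)          # the "same block" is rewritten eight times
print(leveler.erase_counts)   # [2, 2, 2, 2]
```

Even though the filesystem keeps rewriting one logical block, the writes are spread evenly over the physical cells. The sketch assumes there are always more cells than live logical blocks.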
Additional notes :
Lecture 18, November 17
Comp 3000:
Midterm review:
2) mmap is called in dynamically linked libraries
Work on 2404 assignment
8) Yes, using mmap
9) The shell did, because it has to open a new file
10) mmap allocates the entire file
11) After the fork, the memory won’t be shared, so no communication will happen.
12) No: race condition (busy wait, spin lock in the kernel). Something else can modify the value we are waiting on: before we decrement it, someone else may modify it, which changes the semantics.
How do you do kernel hacking?
1)	Be humble
o	You don’t necessarily know everything; everyone has their own blind spots
2)	Verify your assumptions
o	By experiments
o	Compile and run
3)	Check for errors!
o	Saves time
o	The kernel has to live “cleanly”
4)	Find another part of the kernel that is close to what you want to do
o	Analyze that code and use its ideas to implement yours.
o	Follow its pattern (“pattern match”) to avoid problems, since you may not understand all the abstractions and assumptions
o	Check whether its assumptions match yours
5)	Understand the “flow of control” in the program
o	Architecture
o	Division of responsibilities
o	Why does this matter?
o	It is possible to make a module that runs in the background, but to do that you have to create a kernel thread
	What happens when the Ethernet card receives data?
o	The Ethernet card sends an interrupt to the kernel
o	The CPU calls the kernel code for handling Ethernet data
	When does the kernel run?
o	The kernel gets woken up for those events
o	The clock generates a timer interrupt
	An interrupt requires a CPU core to be taken over
o	The core was probably running a userspace process; this is what scheduling is about
	Scheduler: decides what to do after having kicked a userspace process off a core.
o	Can it be a complex, CPU-heavy algorithm? No, since interrupts are arriving all the time.
o	Determines what the next task to run is
	Normally on a core:
o	A userspace process is running
o	An interrupt happens
o	The core switches to supervisor mode and runs kernel code
o	The last part of the kernel code is the scheduler, which chooses which userspace code to run
o	Go to top
	The kernel is entered via interrupts and exited via the scheduler
	On entry and exit the kernel has to do low-level tasks
o	Uses assembly code
	This code is kept limited because it is hard to manage
On the website:
	arch: architecture-specific and driver-specific code
	entry_64.S: runs before the system call itself and takes care of dispatching system calls. Don’t mess with it.
	sched.h: what the kernel uses to keep track of processes; go through it
	:1 after a field name means a bitfield in C
	What criteria should the scheduler use?
o	“Fairness”: everyone should get a turn; everyone gets to share the CPU.
o	Starvation: when a program does not get the CPU
o	Prevent starvation!
o	Equal share of resources.
o	Why would you not want your scheduler to be fair? To favour “foreground” tasks on interactive systems.
o	It is never biased enough towards “foreground” tasks
o	In practice: a series of hacks and heuristics
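The simplest "fair" policy is round-robin: everyone gets a fixed time slice in turn, so nobody starves. The sketch below is a toy model of that policy only (it is not the Linux scheduler); the function name round_robin and the unit-less "work" amounts are assumptions for illustration.

```python
from collections import deque


def round_robin(work, time_slice):
    """Simulate round-robin scheduling.

    work maps a task name to the CPU time it still needs. Returns the list
    of (task, time_run) slices in the order they were granted.
    """
    queue = deque(work.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            queue.append((name, remaining - ran))  # not done: back of the line
    return timeline


print([name for name, _ in round_robin({"a": 3, "b": 1, "c": 2}, time_slice=1)])
# ['a', 'b', 'c', 'a', 'c', 'a']
```

Every task gets the CPU within one pass of the queue. Favouring foreground tasks would mean deliberately breaking this evenness, for example by giving some tasks longer or more frequent slices.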
	Memory is allocated lazily by the kernel:
o	which means it is possible to allocate far more memory than can ever be used.
o	This creates “memory debt”
o	Out-of-memory killer: kills processes when memory use exceeds the memory available (ex: shoots whoever deposits the money)
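Lazy allocation and the out-of-memory killer can be modelled roughly like this. It is a toy with invented names (ToyOvercommit, allocate, touch), and the victim policy (kill whoever actually holds the most pages) is a simplifying assumption, not the kernel's real heuristic.

```python
class ToyOvercommit:
    def __init__(self, physical_pages):
        self.free = physical_pages
        self.promised = {}  # pid -> pages promised by allocate()
        self.touched = {}   # pid -> pages actually backed by physical memory

    def allocate(self, pid, pages):
        """Lazy allocation: only a promise is recorded, no memory is handed out."""
        self.promised[pid] = self.promised.get(pid, 0) + pages
        self.touched.setdefault(pid, 0)

    def touch(self, pid):
        """First use of one promised page. Returns the pid killed to make room, or None."""
        if self.touched[pid] >= self.promised[pid]:
            raise ValueError("touching memory that was never allocated")
        killed = None
        if self.free == 0:
            # Memory debt comes due: kill the biggest holder (toy policy).
            killed = max(self.touched, key=self.touched.get)
            self.free += self.touched.pop(killed)
            del self.promised[killed]
            if killed == pid:
                return killed
        self.free -= 1
        self.touched[pid] += 1
        return killed
```

With 4 physical pages, two processes can each be promised 100 pages without any error; the trouble only starts when they actually use more than 4 pages between them.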
Continuation of lecture 18:
Important notes:
How can you recover a filesystem?
How do you delete a file?
What is a filesystem?
- persistent data structure
- stored in fixed-sized blocks (at least 512 bytes in size)
- maps hierarchical filenames to file contents
- has metadata about files (somehow)
What is in a filesystem?
- data blocks
- metadata blocks
How do you organize metadata:
1) First you must identify the characteristics of the file system
Superblock: a summary block which tells you about the other blocks you have; its format depends on which file system you have. It’s usually the first block of a file system.
In the superblock:
1) What kind of file system is this? Check which magic number it has
2) How big is the file system?
3) How is it organized?
4) Where can I find the rest of the metadata?
- How can you identify which file system it is from looking at the superblock?
-> google “magic number of a file”
-> ex: jpg ctr^c ctr^c : switched the picture into a binary view
-> look at the beginning of the file and you will see JFIF: in general the first several bytes identify the type of the file (magic number)
File extension:
- what is it?
- is it important?
- the kernel does not know and does not care about it
For POSIX file systems:
-> file metadata is stored in inodes
-> most have pre-reserved inodes
-> the only way you can run out of inodes is if you keep creating small files
Usenet: a precursor of the things you use today to post messages (social media, email, etc.). It was like email, but posted through a local Usenet server. It died out over time. Every message was stored in an individual file.
Important commands:
- file * : identifies the type of each file (including filesystem images)
- dumpe2fs foo : what does the output of this command mean? Does it give you info about the file system?
- cp comp3000-midterm-2017.pdf bar, then file bar : bar is the file name (no extension)
- evince bar : opens up the PDF file
Additional Notes
Lec 18 
- More on filesystems 
- How can you recover a fs and how do you delete a file? 
A filesystem is a: 
- Persistent data structure 
- Stored in fixed size blocks (at least 512 bytes in size) 
- Maps hierarchical filenames to file contents 
- Has metadata about files somehow 
What's in a filesystem 
- data blocks (stores file content) 
- metadata blocks, you need some way to find the blocks
How do you organize metadata?
First identify basic characteristics of the filesystem 
- How big is the filesystem? 
- What is the block size? 
How do we differentiate between this and other filesystems?
You need a "superblock" which is a "summary" block that tells you about everything else
- Format depends on filesystem 
- Normally the superblock is the first block of the filesystem 
- Think of it almost like the root of a binary tree 
In the superblock 
- Type of filesystem 
 - What filesystem magic number is there (lets us identify one filesystem from another just by looking at the first block) 
 - File command to know file type 
- Size of the filesystem 
- How the filesystem is organized (different filesystems organize their data differently) 
- Where can I find the rest of the metadata 
The instructor opened a .jpg as a binary file to show us the magic number in a file: the first several bytes identify the type of file. 
- Kernel does not care about file extension but userspace programs may care about the extension.
- File extensions are only really useful for the people looking at them
- Typical for binary file formats to have a set of bytes that identify the type of file
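Identifying a file by its first bytes rather than its extension can be sketched like this. The table holds a few well-known signatures (JPEG, PDF, PNG, ELF); the identify function and the table name are made up for this example, and the real file command knows vastly more formats.

```python
# A few well-known leading byte sequences ("magic numbers").
MAGIC_NUMBERS = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x7fELF", "ELF executable"),
]


def identify(data: bytes) -> str:
    """Guess the file type from its first bytes; the file name is never consulted."""
    for magic, description in MAGIC_NUMBERS:
        if data.startswith(magic):
            return description
    return "unknown"


# A JPEG/JFIF header: the "JFIF" marker appears a few bytes after the magic number.
print(identify(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))  # JPEG image
print(identify(b"%PDF-1.5\n"))                         # PDF document
```

This is why a PDF copied to a file with no extension is still recognized: the identifying bytes travel with the contents, not the name.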
POSIX is a standard for maintaining compatibility between operating systems 
- QNX, UNIX, MacOS are all POSIX compliant
- Others comply on a varying scale
For POSIX filesystems 
- File metadata is stored in INODES
- When you create a filesystem, certain blocks are dedicated to being INODES 
- Possible to have space in your filesystem without being able to store things if you run out of INODES 
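Running out of inodes while data blocks are still free can be shown with a toy model. The class ToyFilesystem and its counters are assumptions for illustration; it tracks only how many inodes and data blocks remain.

```python
class ToyFilesystem:
    def __init__(self, num_inodes, num_data_blocks):
        self.free_inodes = num_inodes      # fixed when the filesystem is created
        self.free_blocks = num_data_blocks

    def create_file(self, blocks_needed=1):
        """Every file needs one inode plus however many data blocks it uses."""
        if self.free_inodes == 0:
            raise OSError("no free inodes")
        if self.free_blocks < blocks_needed:
            raise OSError("no free data blocks")
        self.free_inodes -= 1
        self.free_blocks -= blocks_needed


fs = ToyFilesystem(num_inodes=3, num_data_blocks=100)
for _ in range(3):
    fs.create_file()          # three tiny files use up all the inodes
try:
    fs.create_file()
except OSError as err:
    print(err, "with", fs.free_blocks, "data blocks still free")
```

Lots of small files (as with Usenet message spools) exhaust the inodes long before the data blocks, which is exactly the situation described above.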
What is usenet?
- A worldwide distributed discussion system (stone age version of reddit) 
- Largely abandoned now because it could not handle the spam people posted to it 
- Format for usenet was every message stored in its own file => lots of small files 
- Everyone had a local usenet server, which gave access to read posts on the forum 
- To post a message, send it to the local server, which replicates it and sends it to all the other servers 
So we have: 
- superblock 
- inode blocks 
- data blocks 
 - data blocks for directories 
 - data blocks for files 
How do you recover from damage?
- Filesystems never "reboot", must remain correct over time 
- Errors will happen: bitrot (when bits change), accidental corruption, computer failure/memory corruption/hard reboot 
To make filesystems fast, data and metadata are cached in RAM 
- Bad things happen if this data hasn't been written to disk and you reboot 
- Even worse things happen if your RAM is bad and corrupts the data 
- FSCK is like scandisk in Windows 98 (this only happens when you do a hard reset)
What happens if you lose the superblock?
- You could lose EVERYTHING 
- In the demo, a dd command (with notrunc) blew away the first bytes of the filesystem, so it could not be mounted because the superblock was corrupted. However, fsck fixed this because we have backup superblocks :D 
- Most filesystems keep copies of the superblock at various locations throughout the filesystem, which takes up some space that cannot be used for data 
- But replicating like this is impractical for data blocks 
Old scandisk/fsck was slow because it had to scan all filesystem metadata. This is bad, since we may lose power again before it finishes running 
- Not to recover data, but to fix metadata 
- lost+found directory might have some files that you can recover 
What is lost+found? 
- Part of the filesystem for fsck to use, dedicated directory 
- fsck can find inodes that appear to be allocated but have no associated filename (no hard links) and so are inaccessible 
- If you run fsck and it reports errors, you may be able to recover the affected files by looking in lost+found 
- Almost useless for modern filesystems 
Nowadays fsck is very fast and we rarely lose data due to losing power 
- What this means is we must be writing to disk all the time 
- But isn't writing slow? => Not necessarily, all writes aren't the same 
On Magnetic Hard Disks (not SSD's) 
- sequential operations are fast 
- random access is slow 
 - we have to move the read/write head 
So, on modern systems we update metadata (and sometimes data) by writing sequentially to disk...and then later writing randomly (which means we're actually writing twice) 
- Sequential writes go to the journal 
On fsck on a journaled filesystem
- Just check the journal for pending operations (replaying the journal) 
- There exist filesystems, designed to optimize writes, that are pure journal
- Log based filesystem 
How to delete things? 
- Compact it, we don't really know how many copies are stored in the filesystem
Logs and journal inherently create multiple copies of data and metadata that are hard to track. This makes deletion nearly impossible (at least to guarantee) 
Only way to guarantee...encrypt everything 
- If every file has its own key, you can delete the key and this technically deletes the data 
- Only way to recover data is to break encryption scheme (this is nearly impossible)
SSDs use log-structured storage at a layer below the regular filesystem 
- Writes are coarse-grained (efficient for writing large amounts at once) 
- You don't want to write to the same cells too often, they will die 
 - Instead spread out where you write data => "wear-leveling" 
- All modern Intel CPUs have a management chip that manages power and such 
 - Runs a small operating system called MINIX (this can also be compromised)