Networks

From Soma-notes

Dynamic Linking

Linking is the process of bringing other people's code into your own program. On older systems, linking was entirely static: a program was linked only once, when it was built.

+ The advantage is that it keeps things simple.
- The disadvantage is rampant code duplication. If multiple running programs use the same function, each one carries its own copy of that function.

Static Linking
-> Code is included at compile time.

Dynamic Linking
-> "Stubs" are included at compile time.
-> Real code is loaded at runtime.

Hence, with Dynamic Linking, a lot of work is done every time you run an application.
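As a rough illustration of code being found at runtime rather than at build time, the sketch below asks the dynamic loader for the C function strlen by name. It assumes a POSIX system (Linux or macOS) where passing None to ctypes.CDLL gives access to the libraries already loaded into the process; the variable names are only for illustration.

```python
import ctypes

# Assumption: a POSIX system. Passing None asks the dynamic loader for a
# handle covering the libraries already loaded into this process, which
# includes the C library.
libc = ctypes.CDLL(None)

# The attribute lookup resolves the symbol "strlen" by name at run time.
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"dynamic")
print(length)
```

Nothing about strlen was known when this script was written beyond its name and signature; the lookup happens each time the script runs, which is the per-run work the notes refer to.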

mmap

To work around this, we use a key system call: mmap, which maps a file's contents to an area of memory. You can map the contents as read-only, or read/write.

+ By using mmap, the file's contents can be shared by multiple processes.
+ The other advantage is that it is on-demand. Doing a map operation doesn't necessarily generate I/O, whereas doing a read will. Data is loaded only when you access the memory. Even if you map an entire library and use only 2 functions, only the pages you need are loaded, when you need them.
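A minimal sketch of a read-only mapping, using Python's mmap module as a stand-in for the system call. The helper name read_slice_via_mmap and the demo file are hypothetical; the point is only that the slice is read through memory rather than through an explicit read of the whole file.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path, start, end):
    """Map a file read-only and return bytes [start:end] through the mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing touches only the pages that hold these bytes.
            return mapped[start:end]


# Hypothetical demo file: four pages of filler followed by a short marker.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"A" * mmap.PAGESIZE * 4 + b"needle")
    demo_path = tmp.name

offset = mmap.PAGESIZE * 4
result = read_slice_via_mmap(demo_path, offset, offset + 6)
print(result)
os.remove(demo_path)
```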

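Virtual Memory

The idea is that we can map part of a file into a page table. Each Page Table Entry (PTE) has a bit that represents whether the entry is present/valid. Accessing an address whose PTE is not present generates a page fault. If the address belongs to a mapped file, the kernel handles the fault by accessing the file, loading the needed contents into memory, and resuming the program. Only an access with no valid mapping behind it ends in a segmentation fault (SIGSEGV).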

This is known as Demand Paging. It increases the capacity for concurrent access, since processes can share mapped pages and do not wait for whole files to load. It also cuts disk overhead, as fewer pages are read into memory this way.

Hard Links

inode: Refers to the contents of a file.
-> Holds the mapping and references to the file's data blocks.

In UNIX, there is no "delete" function for a file. All we can do is delete a directory entry, or hard link.

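The kernel's in-memory copy of an inode also keeps track of which parts of a file are cached in RAM. Separately, the inode carries a reference counter: it counts the hard links that point to it, and the kernel also counts how many times the file is currently open. When you open a file, you increment its reference count. This makes it possible to open a file, delete all of its hard links, and keep working on it. Once that file is closed, it is effectively deleted, because no links and no open references remain.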
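The behaviour described above can be tried out with a short sketch. It assumes a POSIX system and a local filesystem that supports hard links; the file names a.txt and b.txt and the function name are hypothetical.

```python
import os
import tempfile


def link_count_demo():
    """Return link counts seen while adding and removing hard links."""
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "a.txt")   # hypothetical file names
    second = os.path.join(workdir, "b.txt")

    fd = os.open(first, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"still here")
    counts = [os.fstat(fd).st_nlink]         # one directory entry so far

    os.link(first, second)                   # second name for the same inode
    counts.append(os.fstat(fd).st_nlink)

    os.unlink(first)
    os.unlink(second)                        # no names left in any directory
    counts.append(os.fstat(fd).st_nlink)

    # The open descriptor still reaches the data.
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 100)

    os.close(fd)                             # last reference gone
    os.rmdir(workdir)
    return counts, data


counts, data = link_count_demo()
print(counts, data)
```

On a typical local filesystem the link count should go from 1 to 2 and then drop to 0, while the data stays readable until the descriptor is closed. Network filesystems may report the final count differently.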

Network Packets

Instead of sending data as one continuous stream, we divide it into chunks of bounded size called Packets. Packets are meant for communications, rather than storage. As such, each packet has a source and a destination. Most commonly, both the source and destination are IP addresses. We also have source/destination ports.

Internet Protocol (IP) is the computer networking protocol used on the internet. Ethernet is another such protocol, but it works at a lower level: its format is used for sending data across a local wire, and IP packets travel inside Ethernet frames.

One challenge of networking is that if your computer is connected to a network (such as the internet), arbitrary data can be sent and received at any time. At no point does the network stop and wait for the OS to keep up. Modern Operating Systems have been optimized for networking.

Ports are numbers that identify programs on a host, typically a remote box. Nothing enforces which program uses which port. However, over time, computers (as well as people) have come to expect certain types of programs on certain numbered ports, and the common assignments are recorded in a registry kept by IANA. For example,

- On port 80, you would expect to find a web server working with the Hypertext Transfer Protocol (HTTP).
- On port 25, you would expect to find a mail server that deals with the Simple Mail Transfer Protocol (SMTP).

The Domain Name System (DNS) translates a domain name (such as www.google.com) into an IP address. Reverse lookups, from an IP address back to a name, are also possible.

Packets are sent to the kernel through drivers.
- The kernel then analyzes each packet and sends it to the running process that has been designated to receive it.

Ethernet -> IP -> TCP -> Process

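Delivery across the network is a Best-Effort service: packets may be dropped, duplicated, or arrive out of order, and nothing at the IP level guarantees otherwise.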

Transmission Control Protocol (TCP) turns bunches of packets into a reliable, ordered stream. The kernel decodes the packets on its TCP/IP stack before the stream is sent to a process. It is called a stack because the data is wrapped in several layers (Ethernet, IP, TCP), and each layer has to be unwrapped in turn before the contents can be reassembled.
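To make the layering concrete, here is a sketch that builds a made-up Ethernet/IPv4/TCP frame and then peels it apart one layer at a time, in the same Ethernet -> IP -> TCP order. The addresses, ports, and function names are invented for the example, checksums are left as zero, and header options are not handled, so this is an illustration rather than a real protocol stack.

```python
import socket
import struct

ETH_FMT = "!6s6sH"         # dst MAC, src MAC, EtherType
IP_FMT = "!BBHHHBBH4s4s"   # IPv4 header without options (20 bytes)
TCP_FMT = "!HHIIHHHH"      # TCP header without options (20 bytes)


def build_frame(payload):
    """Build a made-up Ethernet/IPv4/TCP frame (checksums left as zero)."""
    tcp = struct.pack(TCP_FMT, 40000, 80, 1, 0, (5 << 12) | 0x18, 65535, 0, 0)
    total_len = struct.calcsize(IP_FMT) + len(tcp) + len(payload)
    ip = struct.pack(IP_FMT, (4 << 4) | 5, 0, total_len, 0, 0, 64,
                     socket.IPPROTO_TCP, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
    eth = struct.pack(ETH_FMT, b"\x02" * 6, b"\x04" * 6, 0x0800)
    return eth + ip + tcp + payload


def decode_frame(frame):
    """Peel Ethernet, then IP, then TCP; return addressing and payload."""
    eth_len = struct.calcsize(ETH_FMT)
    _dst_mac, _src_mac, ethertype = struct.unpack(ETH_FMT, frame[:eth_len])
    if ethertype != 0x0800:
        raise ValueError("not IPv4")
    packet = frame[eth_len:]

    ip_fields = struct.unpack(IP_FMT, packet[:20])
    ip_header_len = (ip_fields[0] & 0x0F) * 4      # IHL is in 32-bit words
    total_len, protocol = ip_fields[2], ip_fields[6]
    if protocol != socket.IPPROTO_TCP:
        raise ValueError("not TCP")
    segment = packet[ip_header_len:total_len]

    tcp_fields = struct.unpack(TCP_FMT, segment[:20])
    tcp_header_len = (tcp_fields[4] >> 12) * 4     # data offset, in words
    return {
        "src": (socket.inet_ntoa(ip_fields[8]), tcp_fields[0]),
        "dst": (socket.inet_ntoa(ip_fields[9]), tcp_fields[1]),
        "payload": segment[tcp_header_len:],
    }


decoded = decode_frame(build_frame(b"GET /"))
print(decoded)
```

Each step only looks at its own header and hands the rest to the next layer, which is why the layers can be developed and replaced independently.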

Sockets

A socket is an abstraction composed of:
- A protocol (commonly TCP or UDP)
- A source IP/port
- A destination IP/port
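The three parts of a socket can be seen by opening a TCP connection from a program to itself. This is a sketch assuming a loopback interface at 127.0.0.1 is available; the function name and dictionary keys are made up for the example.

```python
import socket


def tcp_loopback_demo(message):
    """Open a TCP connection to ourselves and report both endpoints."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))   # port 0: the kernel picks a free port
        listener.listen(1)
        server_addr = listener.getsockname()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(server_addr)
            conn, peer_addr = listener.accept()
            with conn:
                client.sendall(message)
                received = conn.recv(1024)
                return {
                    "client_local": client.getsockname(),    # source IP/port
                    "client_remote": client.getpeername(),   # destination IP/port
                    "server_sees_peer": peer_addr,
                    "received": received,
                }


info = tcp_loopback_demo(b"ping")
print(info)
```

The client's source address is exactly what the server reports as its peer, so the protocol plus the two IP/port pairs is enough to tell this connection apart from any other.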

User Datagram Protocol (UDP) is a thin wrapper over IP with the addition of ports. Unlike TCP, UDP does not concern itself with ordering of data or reliability. Its advantage over TCP is lower overhead: there is no connection set-up and no retransmission, so it is quicker.
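A small sketch of UDP in use, again assuming a loopback interface. Each sendto produces one datagram and each recvfrom returns one datagram, with no connection set-up. The function name is hypothetical, and the timeout is there because UDP itself promises no delivery.

```python
import socket


def udp_loopback_demo(messages):
    """Send each message as one datagram to ourselves and collect them."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)             # UDP gives no delivery guarantee
        target = receiver.getsockname()

        for message in messages:
            sender.sendto(message, target)   # no connection set-up needed

        received = []
        for _ in messages:
            data, _addr = receiver.recvfrom(2048)
            received.append(data)            # one recvfrom, one datagram
        return received


datagrams = udp_loopback_demo([b"one", b"two", b"three"])
print(datagrams)
```

On loopback the datagrams will normally all arrive, but a program using UDP across a real network has to cope with loss and reordering itself.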

Path of a Packet

For outgoing data, the data is first placed in a buffer. A socket is then set up. Once the socket has been set up, a file descriptor is returned. The data is then written to that descriptor.
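The role of the file descriptor can be shown with a short sketch: the data is sent with a plain os.write on the descriptor, the same call used for ordinary files. It assumes a POSIX system where socket.socketpair is available; the function name is made up.

```python
import os
import socket


def write_through_descriptor(data):
    """Send bytes with plain os.write() on a socket's file descriptor."""
    # socketpair() returns two already-connected sockets (POSIX assumption).
    left, right = socket.socketpair()
    with left, right:
        fd = left.fileno()               # the descriptor the kernel handed back
        written = os.write(fd, data)     # same call used for ordinary files
        received = right.recv(len(data))
        return fd, written, received


fd, written, received = write_through_descriptor(b"hello, socket")
print(fd, written, received)
```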

For incoming data from the Ethernet, the network card generates an interrupt. The kernel must respond promptly, otherwise network data is lost. This problem is partly addressed in Gigabit Ethernet, which employs a buffer that is dumped once it is filled.

We could copy the data from the network before it is lost, but copying is orders of magnitude slower than not copying anything. As such, it is desirable to minimize the number of times data is copied. One way to cut down on copies is to give the network card Direct Memory Access (DMA), so that it places incoming data straight into main memory.

Another trick is to use mapping techniques, sharing the pages that hold the data rather than copying them.
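The notes do not name a specific interface for avoiding copies. One related facility that is visible from user space is sendfile, which lets the kernel move a file's bytes into a socket without the program reading them into its own buffer first. The sketch below uses Python's socket.sendfile as an illustration of the general idea only. It assumes a POSIX system, and the function name and demo file are hypothetical.

```python
import os
import socket
import tempfile


def send_file_without_user_copy(path):
    """Push a small file into a socket with socket.sendfile() and read it back."""
    left, right = socket.socketpair()    # POSIX assumption; stream sockets
    with left, right, open(path, "rb") as f:
        # Where os.sendfile exists, the kernel moves the bytes itself;
        # elsewhere Python falls back to ordinary send() calls.
        sent = left.sendfile(f)
        left.shutdown(socket.SHUT_WR)    # signal end of data to the reader
        chunks = []
        while True:
            chunk = right.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)


# Hypothetical demo file, kept small so it fits in the socket buffer.
payload = b"x" * 1024
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_file = tmp.name

sent, echoed = send_file_without_user_copy(demo_file)
print(sent, len(echoed))
os.remove(demo_file)
```

The demo keeps the file small on purpose: with a single thread doing both the sending and the receiving, a file larger than the socket buffer would block before the reading loop starts.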

Firewalls

The classic idea of a firewall is a set of rules in the kernel that determines what specific IPs and ports are able to do (or have done to them). See Packet Filtering.
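A toy sketch of such a rule set follows. The rule table, the first-match-wins policy, and the default of denying unmatched packets are assumptions made for the example, not a description of any particular kernel's firewall.

```python
import ipaddress

# Hypothetical rule table: (action, source network, destination port).
# A port of None would mean "any port".
RULES = [
    ("allow", ipaddress.ip_network("192.168.0.0/16"), 22),
    ("deny", ipaddress.ip_network("0.0.0.0/0"), 22),
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 80),
]


def filter_packet(src_ip, dst_port, rules=RULES, default="deny"):
    """Return the action of the first rule matching the packet, else the default."""
    source = ipaddress.ip_address(src_ip)
    for action, network, port in rules:
        if source in network and (port is None or port == dst_port):
            return action
    return default


print(filter_packet("192.168.1.5", 22))
print(filter_packet("8.8.8.8", 22))
print(filter_packet("8.8.8.8", 80))
```

Because the first matching rule wins, the order of the table matters: the local-network rule for port 22 has to come before the general deny for that port.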