<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Omi</id>
	<title>Soma-notes - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Omi"/>
	<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php/Special:Contributions/Omi"/>
	<updated>2026-05-13T12:07:05Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9468</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9468"/>
		<updated>2011-04-12T01:37:25Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Requirements for an Internet Attribution System */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past events show a need for improved attribution systems, yet arguably a scientific basis for properly functioning attribution systems has not yet been defined. Much research has focused on attributing documents to authors, for the sake of securing authorship rights and rapid identification of plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as means of detecting the author of a document. Unfortunately, comparatively little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proven effective but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently the Internet&#039;s infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it entices advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught; these crimes include fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most settings; the current system thus lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions over the internet.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them &amp;lt;i&amp;gt;licensed&amp;lt;/i&amp;gt; access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which has sadly received little attention from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution, namely the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions to other agents (software, a device, etc.) and then attribution from that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, and as such focus on the internet. For the sake of simplicity, in this paper, whenever we mention &amp;quot;attribution,&amp;quot; we do so with reference to &amp;quot;binding an act to a person on the internet&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify cyber attackers and people with malicious intent.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and going &amp;quot;scot-free&amp;quot;; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we address the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems will be required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not a new one; it has been around for decades, though mostly as an identification issue for websites or Internet Service Providers. Many different approaches to attribution have been taken, but mainly only to the extent that a particular system aims to achieve.&lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences; in actuality, they can store any information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the request. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as that browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they are still subject to the user&#039;s control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all.&lt;br /&gt;
&lt;br /&gt;
A cookie may or may not carry an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let you set how long cookies are to be stored.&lt;br /&gt;
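The automated request/response exchange described above can be sketched in a few lines (a toy model only; the dictionaries below stand in for real HTTP headers, the browser&#039;s cookie storage, and the server&#039;s session store):&lt;br /&gt;

```python
import secrets

server_sessions = {}      # stand-in for the web server's session records
browser_jar = {}          # cookie storage kept by the browser, keyed by site

def server_handle(request_headers):
    """Return (response_headers, visitor_id) for one request."""
    cookie = request_headers.get("Cookie")
    if cookie is None:
        # No cookie attached: treat this as the browser's first access
        # and hand back a fresh identifier in the response.
        visitor_id = secrets.token_hex(8)
        server_sessions[visitor_id] = {"visits": 1}
        return {"Set-Cookie": "sid=" + visitor_id}, visitor_id
    visitor_id = cookie.split("=", 1)[1]
    server_sessions[visitor_id]["visits"] += 1
    return {}, visitor_id

def browser_request(site):
    """The browser automatically attaches any stored cookie for the site."""
    headers = {}
    if site in browser_jar:
        headers["Cookie"] = browser_jar[site]
    response, visitor_id = server_handle(headers)
    if "Set-Cookie" in response:
        browser_jar[site] = response["Set-Cookie"]
    return visitor_id

first = browser_request("example.org")
second = browser_request("example.org")
assert first == second            # same visitor recognized on a repeat visit
browser_jar.clear()               # the user deletes their cookies ...
third = browser_request("example.org")
assert third != first             # ... and the server sees a new visitor
```

The final assertions illustrate the drawback of cookies as an attribution mechanism: once the cookie store is cleared, the link between old and new visits is gone.&lt;br /&gt;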
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
If cookies were used as the type of attribution system we are looking for over the Internet, we could identify with high precision the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online and a growing number of devices per user, one of the more common examples being the rise of Internet-ready mobile phones. The current Internet Protocol, version 4 (IPv4), uses only 32 bits of addressing, which means it can uniquely address only 4,294,967,296 devices, fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This address depletion was foreseen since the 90s, and it spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.&lt;br /&gt;
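The scale mismatch follows directly from the bit widths and can be checked with Python&#039;s standard ipaddress module (a quick arithmetic sketch, nothing more):&lt;br /&gt;

```python
import ipaddress

# IPv4 uses 32-bit addresses, IPv6 uses 128-bit addresses.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

assert ipv4_space == 4_294_967_296   # fewer than the world population

# The ipaddress module reports the same sizes for the full networks:
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_space
assert ipaddress.ip_network("::/0").num_addresses == ipv6_space

# Each extra bit doubles the space, so IPv6 offers 2**96 times as many
# addresses as the entire IPv4 space.
assert ipv6_space // ipv4_space == 2 ** 96
```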
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through specific configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it mitigates the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses at a higher fee.&lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP carried by each packet, determine the geographical location of that IP, consult the ISP covering the location, and identify the person. If an act requires strict attribution (like checking and sending emails), authentication is used.&lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, such as IP traceback, but identifying a source by its IP address is problematic. For instance, the address can be spoofed, which leads to misleading or inconclusive geographical locations. Dynamic IP addresses are not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback could be improved, but that would require global cooperation of intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to verify the identity of whoever visits certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage of authentication systems is that they can provide attribution across different computers. The task of storing and securing login information is left to the web server, which makes the server a target for attackers seeking to steal login information.&lt;br /&gt;
&lt;br /&gt;
Login systems are usually attached to user accounts and sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; taking a typical web banking authentication system as an example, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”; the user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
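The setup and verification steps might be sketched as follows (illustrative only; the in-memory dictionary stands in for the server&#039;s database, and a real deployment would add rate limiting and transport encryption):&lt;br /&gt;

```python
import hashlib, hmac, os

accounts = {}   # stand-in for the web server's credential database

def create_account(username, password):
    # Store a salted, slow hash of the password, never the password itself.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    accounts[username] = (salt, digest)

def login(username, password):
    if username not in accounts:
        return False
    salt, stored = accounts[username]
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(stored, attempt)

create_account("alice", "correct horse")
assert login("alice", "correct horse")
assert not login("alice", "wrong guess")
assert not login("mallory", "anything")
```

Storing only salted hashes limits the damage of the security breaches mentioned above: a stolen database does not directly reveal the passwords.&lt;br /&gt;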
&lt;br /&gt;
Authentication systems are only used when users want some privacy on the web server, or when a user wishes to store some form of private information on the web server. So, in essence, they always have something to do with privacy.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet across multiple devices, and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification, or attribution, system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because when trying to achieve strong attribution, personal privacy is compromised. Any system must find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identities secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in an abstract sense, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to let the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of this paper. However, such ethical arguments must be addressed prior to the design of an attribution system, because a system that compromises individual privacy and protection should not be utilized.&lt;br /&gt;
&lt;br /&gt;
Before designing an attribution system for the Internet, many questions need to be answered, some of which are: Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How much can intermediate systems&#039; cooperation contribute to achieving attribution? How do you deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system would have many useful applications. The identification property can be useful for establishing a client&#039;s identity in online banking, for identifying the parties involved in an eCommerce transaction, and can be taken advantage of by marketers for better-targeted Web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS and DDoS attacks, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
This problem arises largely from how the Internet is designed: it has no strong identification mechanisms, which in turn provides users with a certain level of anonymity. Moreover, most current solutions are built on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The issue of the lack of attribution on the web mostly arises whenever security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all the packets received by their machines, which means users would not be aware of malware distribution, the creation of botnets, and other actions taken by their machine without their approval, triggered by other network users. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
There are attacks designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of anonymizing an attack by using multiple arbitrary public agents (as stepping stones) to reach the victim, in order to conceal the attacking source.&amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail because there are many issues and complicated dependencies, and a lot of questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we define high-level requirements for a good attribution system. While the definition of a good attribution system is not so clear, we take into account everything discussed in the previous sections. The following requirements attempt to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.&lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections: general requirements, deployment requirements, and practice requirements. The general requirements define the idea and overall goal of the system in high-level, abstract terms. The deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Finally, the practice requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for an internet attribution system is simple: it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which crime and misuse are all but impossible, and many writers and film directors exploit this idea in futuristic, science-fiction, and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because a lot of laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is far easier to discuss the design of such a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will bear a lot of pressure, the internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in.&lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once; embedding the attribution system incrementally should also be more secure (bugs in software and mistakes in design can be fixed while still at a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.&lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while at the same time remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adoptability and universality. Fortunately, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than its average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
The attribution mapping should not be invertible: an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer; it should be impossible to know the answer. The question &amp;quot;who did act X&amp;quot; is the one that should be answered. This could be considered part of the requirement about not violating current laws and moral principles, but it stands as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, for surveillance, spying, etc.&lt;br /&gt;
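One way such a one-directional mapping could be realized is with per-act pseudonyms (a hypothetical sketch; the function names and the keyed-hash construction are our illustration, not part of any deployed system):&lt;br /&gt;

```python
import hashlib, hmac

# A key held only by the trusted body; the public log never sees it.
AUTHORITY_KEY = b"held-only-by-the-trusted-body"
registered_ids = ["agent-001", "agent-002", "agent-003"]

def pseudonym(agent_id, act_id):
    """Per-act pseudonym: entries for one agent share no visible token."""
    msg = (agent_id + "|" + act_id).encode()
    return hmac.new(AUTHORITY_KEY, msg, hashlib.sha256).hexdigest()

log = {}   # public side: act_id mapped to a per-act pseudonym only

def record_act(act_id, agent_id):
    log[act_id] = pseudonym(agent_id, act_id)

def resolve_act(act_id):
    """Trusted body only: answer 'who did act X' by testing known IDs."""
    for candidate in registered_ids:
        if log.get(act_id) == pseudonym(candidate, act_id):
            return candidate
    return None

record_act("act-7", "agent-002")
record_act("act-8", "agent-002")
assert resolve_act("act-7") == "agent-002"   # act maps to a person ...
assert log["act-7"] != log["act-8"]          # ... but the public log cannot
                                             # be grouped by agent
```

Because each log entry carries a different pseudonym even for the same agent, &amp;quot;what did person X do?&amp;quot; has no answer on the public side, while the key holder can still resolve &amp;quot;who did act X?&amp;quot;.&lt;br /&gt;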
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign a unique ID to everyone who uses the network. In case a crime is committed and the agent of some act needs to be determined, the recorded ID is then looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good-like system, etc.) should store the mapping between IDs and real names, and this mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, unique IDs) should be distributed, and it is crucial to make it impossible to collect all of the information in one place.&lt;br /&gt;
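The split between the network-side records and the trusted entity&#039;s registry might be modeled as follows (all names, IDs, and records here are illustrative):&lt;br /&gt;

```python
# The network-side log only ever sees opaque IDs; the ID-to-name mapping
# lives with a separate trusted entity and is consulted only with cause.

network_log = []                         # distributed traceability records
trusted_registry = {"uid-4821": "J. Doe", "uid-9177": "A. Roe"}

def record(act, uid):
    network_log.append({"act": act, "uid": uid})

def reveal(uid, has_sufficient_evidence):
    # The trusted body refuses the lookup without due cause.
    if not has_sufficient_evidence:
        return None
    return trusted_registry.get(uid)

record("port scan of host H", "uid-4821")
suspect = network_log[0]["uid"]
assert reveal(suspect, has_sufficient_evidence=False) is None
assert reveal(suspect, has_sufficient_evidence=True) == "J. Doe"
```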
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing misuse of the system from either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework works under the core principle: &amp;quot;an act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. First, we define some terminology that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (the intricate structure of an iris, or a fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): Services provided by intermediate systems (routers). For e.g., routing (main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global DNS-like world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.&lt;br /&gt;
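To make the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; concrete, the binding of a human identifier to a device feature can be sketched as below. This is only an illustrative sketch: the framework does not prescribe a binding function, and the use of SHA-256 (and every name in this snippet) is our own assumption.&lt;br /&gt;

```python
import hashlib

# Hypothetical binding function: hash the pair (human identifier,
# device feature) into a fixed-length Identification Stamp (IS).
def make_identification_stamp(human_id: str, device_feature: str) -> str:
    return hashlib.sha256(f"{human_id}|{device_feature}".encode()).hexdigest()

# Example Ag: a fingerprint template paired with a NIC's MAC address.
stamp = make_identification_stamp("fingerprint-template", "00:1A:2B:3C:4D:5E")
print(len(stamp))  # 64 hex characters, a unique identifier for the Ag
```

The same pairing always yields the same stamp, so the stamp can stand in for the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; without directly exposing the underlying personal data.&lt;br /&gt;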
&lt;br /&gt;
In principle, every packet in flight has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, and RTP (as in VoIP). He is indirectly responsible when he is running a system in the background that performs external (over-the-internet) calls, is automated for periodic communication, or responds automatically to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols on behalf of higher-layer ones, e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets probe the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, not other &amp;quot;locally&amp;quot; defined networks (the IEEE-standard topologies PAN, LAN, MAN, and WAN) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates, the methodology of its operation, and lists of the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters: jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). This entity should act as the Internet&#039;s law enforcement: the primary inspector and the jurisdiction for regulating all kinds of cyber crimes and misbehavior. It may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs and traditions.&lt;br /&gt;
&lt;br /&gt;
Secondly: &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, which acts as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should be used to protect the system, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardization protocol would be required to define the syntax and semantics of the messages with which the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate.&lt;br /&gt;
&lt;br /&gt;
Thirdly: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; and would be held accountable if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many; that is, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
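The three conditions above can be checked mechanically once the trusted entity&#039;s registries are available. The following toy sketch (all registry contents and names are hypothetical) illustrates the checks:&lt;br /&gt;

```python
# A toy model of the three fake-IS checks, assuming the trusted entity
# keeps registries of known humans, known machines, and ownership.
registered_humans = {"H1", "H2"}
registered_machines = {"M1", "M2", "M3"}
owner_of = {"M1": "H1", "M2": "H1", "M3": "H2"}  # one-to-many: human owns machines

def is_fake(human_id: str, machine_id: str) -> bool:
    if machine_id not in registered_machines:   # imaginary Md
        return True
    if human_id not in registered_humans:       # imaginary human
        return True
    if owner_of.get(machine_id) != human_id:    # misleading binding
        return True
    return False

print(is_fake("H1", "M1"))  # False: valid binding
print(is_fake("H2", "M1"))  # True: "M1" does not belong to "H2"
```

Any of the three failures suffices to classify the stamp as fake, matching the list above.&lt;br /&gt;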
&lt;br /&gt;
Notably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind delivering all packets, malicious or benign, they bear a great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, adds it to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of his launched packets. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; secret and treat it exactly the way he treats his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; by sending a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. If the packet has no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is denied &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; reports the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; as invalid, again, the packet is dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and gets routed along its way.&lt;br /&gt;
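The router-side decision just described can be summarized in a few lines. In this sketch the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; lookup is modeled as membership in a local set, a deliberate simplification of the distributed, encrypted database the framework assumes:&lt;br /&gt;

```python
# Hypothetical set of licensed stamps, standing in for a GDDB query.
licensed_stamps = {"abc123", "def456"}

def route_packet(packet: dict) -> str:
    stamp = packet.get("is")          # IS printed in the added IP header
    if stamp is None:
        return "drop: no IS"
    if stamp not in licensed_stamps:  # GDDB reports the IS as invalid
        return "drop: invalid IS"
    return "forward"                  # IS verified: packet receives ISS

print(route_packet({"is": "abc123", "payload": "data"}))  # forward
print(route_packet({"payload": "data"}))                  # drop: no IS
```

The same check would run at every hop, which is why the framework demands near-zero read latency from the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;.&lt;br /&gt;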
&lt;br /&gt;
==Pros==&lt;br /&gt;
&lt;br /&gt;
* It achieves a level of attribution comparable to the one achieved in the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone, only to trusted entities.&lt;br /&gt;
** Hence, it retains personal privacy.&lt;br /&gt;
* The system is fully automated: according to its theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; are provided or withheld based solely on the validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crimes that are executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
==Cons==&lt;br /&gt;
&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively complex.&lt;br /&gt;
* Since attribution information is not public, custom content generation is not achievable.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, each &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be periodically renewable; however, this is not an easy task.&lt;br /&gt;
&lt;br /&gt;
==Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they should be aware of all packets sneaking into their machines, to avoid the distribution of malware and the later formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly the same way they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attack can alter the database, the attacker can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If the attack can read it, the attacker can declare his malicious packets to be under the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; (forgery).&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any packet in flight is moving because it is known whom it belongs to. Recall that in the real world, if a person doesn&#039;t have an identity (like a social insurance number), he can&#039;t benefit from services: he can&#039;t open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently beats that of the Internet. Current internet attribution, compared to real-world attribution, can be considered a failure. A form of internet attribution would be considered acceptable if it, at least, provides as much attribution as the real world does. We argue that the proposed framework guarantees such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-upon trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originates. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, especially since the losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily bounded. As for the practice requirements, the proposed framework&#039;s theory of operation doesn&#039;t permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, all traceability information is impossible to collect in one place. The trusted entities are the only ones that generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists change at first sight. In 1769, Nicolas-Joseph Cugnot finalized the invention of the first steam-powered vehicle &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after that invention. Licensing started when people began realizing that a car could act as a lethal weapon, which must therefore be approved by the government before being driven, and must be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may blindly deny, refuse and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast, exponentially growing number of applications that are time- and data-sensitive and leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry and technology now builds its applications over the Internet as their underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty of achieving that level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival and their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9465</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9465"/>
		<updated>2011-04-12T01:23:35Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* IP Addresses */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems; arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet&#039;s infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it entices advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in all situations; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions over the internet.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done for keeping track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution: resolving the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions to other agents (software, a device, etc.) and then attribution from that agent to a person. Narrowing the problem further, we&#039;re only concerned with attribution in large, dynamic networks, and as such focus on the internet. For the sake of simplicity, in this paper, whenever we mention &amp;quot;attribution,&amp;quot; we mean &amp;quot;binding an act to a person on the internet&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increased number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and getting off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet Service Providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system seeks to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the request packet. All this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it takes it as the first access that browser is making to the server and sends a cookie as part of the response, which is saved by the browser and resent on the next request. Cookies are usually encrypted for data security and information privacy; however, they are still subject to the user&#039;s control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all.&lt;br /&gt;
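The exchange described above can be illustrated with a toy server handler. The Cookie and Set-Cookie header names are real HTTP headers; the session format and server logic here are simplified assumptions of our own.&lt;br /&gt;

```python
# Toy server: issue a cookie on the first request and recognize the
# browser on later ones, as described in the text.
def handle_request(headers: dict, issued: set) -> dict:
    cookie = headers.get("Cookie")
    if cookie in issued:                      # cookie seen before
        return {"body": "welcome back"}
    new_cookie = f"session={len(issued) + 1}" # hypothetical session format
    issued.add(new_cookie)
    return {"Set-Cookie": new_cookie, "body": "first visit"}

issued = set()
first = handle_request({}, issued)            # no cookie yet: server issues one
second = handle_request({"Cookie": first["Set-Cookie"]}, issued)
print(first["body"], "/", second["body"])     # first visit / welcome back
```

As the text notes, the user can simply delete the cookie and appear to the server as a first-time visitor again.&lt;br /&gt;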
&lt;br /&gt;
A cookie may or may not have an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let you set how long cookies are stored.&lt;br /&gt;
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Considering cookies as the type of attribution system we are looking for over the Internet, we would be able to achieve high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to their regions&#039; ISPs, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users, and of devices per user, going online; one of the more common examples is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 4,294,967,296 devices, fewer than the number of people on this planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This address depletion was foreseen since the 90s and spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.&lt;br /&gt;
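The address-space figures quoted above follow directly from the bit widths of the two protocol versions:&lt;br /&gt;

```python
# IPv4 uses 32-bit addresses; IPv6 uses 128-bit addresses.
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128
print(ipv4_addresses)  # 4294967296, fewer than the world's population
print(ipv6_addresses)  # roughly 3.4e38, effectively inexhaustible
```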
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through specific configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps mitigate the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses at a higher fee.&lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This is done by examining the source IP printed on each moving packet, locating the geographical location of this IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, like IP traceback, but identifying a source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. Dynamic IP addresses are not permanently bound to a single account, which makes linking an IP to the appropriate person unreliable. IP traceback could be improved, but that would require global cooperation of the intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to verify the identity of whoever is visiting certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage of authentication systems is that they can provide attribution across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking into the server to steal login information.&lt;br /&gt;
&lt;br /&gt;
Login systems are usually attached to user accounts and sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; using a typical web banking authentication system, for instance, the process may go as follows. A user requests a web account, or one is automatically assigned to him. The user sets up a password for accessing the account. When the user later goes to the website, he is asked to &amp;quot;identify himself&amp;quot;; the user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
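The verification step above can be sketched as follows. Storing salted password hashes rather than plaintext is our own assumption of good practice; the paper does not specify how the server stores login information, and all names here are hypothetical.&lt;br /&gt;

```python
import hashlib

# Hash a password with a per-user salt (assumed storage format).
def hash_password(password: str, salt: str) -> str:
    return hashlib.sha256((salt + password).encode()).hexdigest()

# Hypothetical server-side store: user -> (salt, password hash).
stored = {"alice": ("salt123", hash_password("s3cret", "salt123"))}

def login(user: str, password: str) -> bool:
    record = stored.get(user)
    if record is None:
        return False
    salt, expected = record
    return hash_password(password, salt) == expected

print(login("alice", "s3cret"))  # True: access granted
print(login("alice", "wrong"))   # False: access denied
```

Every site that authenticates this way must hold such a record for each user, which is exactly the privacy cost discussed in the next subsection.&lt;br /&gt;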
&lt;br /&gt;
Authentication systems are only ever used when users want some privacy on the web server, or when they wish to store some form of private information there. So, in essence, they always have something to do with privacy.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet across multiple devices, and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification, or attribution, system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. This would mean that even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally value the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients, as well as the institution, will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected precisely because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to make the ratio change dynamically depending on the case. &lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is somewhat out of the scope of this paper. However, such ethical arguments must be addressed prior to the design of an attribution system, because a system that compromises individual privacy and protection should not be utilized. &lt;br /&gt;
&lt;br /&gt;
Before designing an attribution system for the Internet, many questions need to be answered, some of which are: Who should have the authority to attribute? What information can they use for attribution, and why do they need it? How is attribution achieved or measured? How much can intermediate systems&#039; cooperation contribute to achieving attribution? How do you deal with misleading data sources hiding behind botnets and concealing identities via stepping stones? &lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system would have many useful applications. The identification property can be useful for establishing a client’s identity in online banking, identifying the parties involved in an eCommerce transaction, and can even be exploited by marketers for better-targeted web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as: DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
This problem arises largely from how the Internet is designed. It has no strong identification mechanisms, which in turn provides users with a certain level of anonymity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The lack of attribution on the web mostly becomes an issue whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all packets received by their machines, which means users would not be aware of malware distribution, the creation of botnets, and other actions taken by their machine without their approval and triggered by other network users. Firewalls and packet filters can be used to address such problems, but they are not very effective. Also, it is not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are also attacks designed specifically to prevent correct attribution, used for identity theft and the distribution of malware. The stepping-stone attack is a common way of achieving anonymity by relaying traffic through multiple public, arbitrary agents (stepping stones) on the way to the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and a lot of questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. While the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency; the main requirement for an internet attribution system is simple: it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for the act(s), regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but rather should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and film directors exploit this idea in futuristic, science-fiction and anti-utopian plots. Unfortunately, applying any idea of this sort to the real world today is not possible, because a lot of laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is far easier to discuss the design of a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely; the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. Incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns this line between dynamic adoptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should not be possible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one to be answered. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way – for surveillance, spying, etc.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone who uses the network. If a crime is committed and the agent of some act needs to be determined, the recorded ID is then searched for in the police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
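The separation just described, a traceability log that answers only &amp;quot;who did act X?&amp;quot; and a trusted entity that alone can map an ID back to a name, might be modelled as follows (class and method names are illustrative assumptions, not part of any specified design):&lt;br /&gt;

```python
import secrets
from typing import Optional

class TrustedEntity:
    """Holds the only mapping from opaque IDs to real identities."""
    def __init__(self) -> None:
        self._id_to_name = {}

    def issue_id(self, real_name: str) -> str:
        uid = secrets.token_hex(16)        # opaque, unguessable ID
        self._id_to_name[uid] = real_name
        return uid

    def reveal(self, uid: str, has_warrant: bool) -> Optional[str]:
        # Identity is disclosed only with sufficient legal motivation.
        return self._id_to_name.get(uid) if has_warrant else None

class TraceabilityLog:
    """Public side: maps acts to IDs, never the reverse."""
    def __init__(self) -> None:
        self._act_to_id = {}

    def record(self, act: str, uid: str) -> None:
        self._act_to_id[act] = uid

    def who_did(self, act: str) -> Optional[str]:
        return self._act_to_id.get(act)
    # Deliberately no method answering "what did ID X do?":
    # the mapping is queryable in one direction only.
```

The non-bijection requirement shows up as an interface restriction: `who_did` exists, its inverse does not.&lt;br /&gt;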
&lt;br /&gt;
Of course, it is not always the case that some body trusted by everyone exists, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in times of need while preventing misuse of the system from either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework works under the core principle: &amp;quot;an act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. We start by defining some terminology that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with network access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identification (the intricate structure of the iris, or a fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds a new IS to the GDDB.&lt;br /&gt;
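One conceivable way to realize an IS, hashing the human identifier together with the device feature so the binding becomes a single fixed-length series of bits, is sketched below; the hash construction is our assumption, as no concrete bit format is specified above:&lt;br /&gt;

```python
import hashlib

def make_is(human_id: str, device_id: str) -> str:
    """Bind a unique human identifier (e.g., a fingerprint template)
    to a unique device feature (e.g., a MAC address) into one
    fixed-length Identification Stamp (IS)."""
    return hashlib.sha256(f"{human_id}|{device_id}".encode()).hexdigest()
```

The stamp is deterministic for a given owner-device pair, so the same Ag always produces the same IS, while any other pairing produces a different one.&lt;br /&gt;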
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions to another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. Indirect responsibility arises when a user is running a system in the background that performs external (over the internet) system calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets seek and identify the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet, not other &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions required for this framework to operate, the methodology of its operation, and a list of pros, cons, and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity should act as the Internet&#039;s law enforcement, deemed the primary inspector and the jurisdiction for regulating all kinds of cyber crimes and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it would scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs, and traditions.&lt;br /&gt;
&lt;br /&gt;
Secondly: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, acting as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should be used to protect this system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it for read/write operations. Both users must be strictly authenticated in order to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be heavily relied upon for every single hop made by a packet through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics, as well as the way the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate.&lt;br /&gt;
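A toy model of the two access roles just described follows; encryption, distribution, and the authentication machinery are omitted for brevity, and the interface names are our assumptions:&lt;br /&gt;

```python
class GDDB:
    """Minimal stand-in for the Globally Distributed Database:
    routers may only read; the trusted entity may read and write."""
    def __init__(self) -> None:
        self._stamps = set()   # the set of licensed ISs

    def append(self, stamp: str, role: str) -> None:
        """Write path: licensing a new IS."""
        if role != "trusted-entity":
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)

    def is_valid(self, stamp: str, role: str) -> bool:
        """Read path: the lookup routers perform on every hop."""
        if role not in ("router", "trusted-entity"):
            raise PermissionError("unauthenticated access")
        return stamp in self._stamps
```

A router calling `append` is rejected, mirroring the read-only constraint stated above.&lt;br /&gt;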
&lt;br /&gt;
Thirdly: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and would be the one accused if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many; that is, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, a header is added by the network layer that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind delivering all packets, malicious or benign, they carry great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of his launched packets. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; secret and treat it exactly as he treats his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. This is done by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. If a packet has no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is prevented from benefiting from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; reports an invalid &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, again, the packet is dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and gets routed along its way.&lt;br /&gt;
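The router&#039;s decision procedure reduces to a short predicate; here the remote GDDB consultation is simplified to a local set-membership test, which is our simplification of the lookup described above:&lt;br /&gt;

```python
def route(packet: dict, valid_stamps: set) -> bool:
    """Router-side decision: drop packets with a missing or invalid
    IS; forward (i.e., provide ISS to) verified ones."""
    stamp = packet.get("is")
    if stamp is None:
        return False            # unattributed packet: dropped
    if stamp not in valid_stamps:
        return False            # forged or unknown IS: dropped
    return True                 # IS verified: packet is routed
```

Only the third branch grants ISS; the first two correspond to the two drop cases above.&lt;br /&gt;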
&lt;br /&gt;
==Pros==&lt;br /&gt;
&lt;br /&gt;
* It achieves an acceptable level of attribution relative to that of the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone; it is only available to trusted entities.&lt;br /&gt;
** Hence, it retains personal privacy.&lt;br /&gt;
* The system enjoys full automation: according to its theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; are either provided or withheld based on the validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system avoids all forms of cyber crime executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
==Cons==&lt;br /&gt;
&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively demanding.&lt;br /&gt;
* Since attribution information is not public, custom content generation cannot be achieved.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used, even though each &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; would normally have to be bound to one single person in these shared settings.&lt;br /&gt;
* For security purposes, licenses should be periodically renewed; however, this is not an easy matter.&lt;br /&gt;
&lt;br /&gt;
==Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they should be aware of all packets sneaking into their machines, to prevent the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly as they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. An attacker who gains write access can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;; an attacker who gains read access can declare his malicious packets to be the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, i.e., forgery.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any leaping packet moves only because it is known whom it belongs to. Recall that in the real world, if a person does not have an identity (like a social insurance number), he cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this real-world behavior. Of course, the real world is not ideal at criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Current internet attribution, compared to real-world attribution, can be considered a failure. A form of internet attribution would be considered basically acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would guarantee such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-upon trusted entity. The framework likewise fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country where the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, although the cost of losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily determined. As for the practice requirements, the proposed framework&#039;s theory of operation does not permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, it is impossible to collect all traceability information in one place. The trusted entities alone generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from personal data; hence, they are the only ones holding that piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot finished building the first steam-powered trolley &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the forerunner of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after that invention. Licensing started when people began realizing that a car could act as a lethal weapon: driving one must therefore be approved by the government, and the car must be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may initially deny, refuse, and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like a driving license. Needless to say, the Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast and exponentially growing number of applications that are time- and data-sensitive, and that leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry and technology is now built on the Internet as its underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes awaiting the proper moment to strike. Internet attribution is thus no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty of achieving such a level lie. Moreover, we have provided background on current attribution systems and briefly discussed the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework that should fulfill the mentioned requirements and achieve an acceptable level of Internet attribution. The pros, cons, and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9464</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9464"/>
		<updated>2011-04-12T01:20:48Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* The Attribution Dilemma */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and, arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the Internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not practical to authenticate every single packet hopping over intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the Internet. It reviews current attribution technologies as well as their limits, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach to performing attribution over the Internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet&#039;s infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for those same users, because it entices advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;cyber crimes&amp;lt;/i&amp;gt; without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, and more. Consequently, Internet attribution is a highly sensitive field that occupies a cornerstone position within Internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions over the Internet.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent could be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act; for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness around the true &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the Internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the Internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which has sadly been somewhat neglected by current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution, addressing the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the Internet. In Section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in Section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions to other agents (e.g., software, a device) and then attribution from that agent to a person. Narrowing the problem further, we are concerned only with attribution in large, dynamic networks, and so we focus on the Internet. For the sake of simplicity, whenever we mention &amp;quot;attribution&amp;quot; in this paper, we mean &amp;quot;binding an act to a person on the Internet&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since the Internet entered mainstream use in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet while getting off &amp;quot;scot-free&amp;quot;: due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems will be required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not a new one; it has been around for decades, though mostly in the context of identification issues pertaining to websites or Internet Service Providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve.&lt;br /&gt;
This section introduces three current attribution systems and discusses their pros and cons as they pertain to the type of global attribution discussed in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files created by a web server and stored by the web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart contents, and storing site preferences; in actuality, they can store any information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the request. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as that browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to configure the browser to reject cookies entirely.&lt;br /&gt;
&lt;br /&gt;
Cookies may or may not carry an expiration date, the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let the user set how long cookies are stored.&lt;br /&gt;
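The exchange described above can be sketched with the standard Python http.cookies module; the cookie name and values here are illustrative assumptions, not taken from any real site.&lt;br /&gt;

```python
# Sketch of the cookie exchange: the server sets a cookie on the
# first response, and the browser echoes it back on later requests.
from http.cookies import SimpleCookie

# First response: the server saw no cookie, so it sends one.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"             # illustrative value
response_cookie["session_id"]["expires"] = "Tue, 19 Jan 2038 03:14:07 GMT"
set_cookie_header = response_cookie.output(header="Set-Cookie:")

# Next request: the browser resends the stored cookie in a header.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123")
print(request_cookie["session_id"].value)  # abc123
```

A cookie sent without the expiration attribute would, as noted above, last only until the browser closes.&lt;br /&gt;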
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewed as the type of attribution system we seek for the Internet, cookies can identify computers that access a web server with high precision. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which allocate IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user takes online have grown; one of the more common examples is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 4,294,967,296 devices, fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This address depletion had been foreseen since the 1990s, and it spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.&lt;br /&gt;
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through specific configuration. With dynamic addressing, a new address is assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps mitigate the limited address space by allowing many devices to &amp;quot;share&amp;quot; a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses at a higher fee.&lt;br /&gt;
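The exhaustion arithmetic above follows directly from the 32-bit and 128-bit address widths, as this minimal Python sketch shows.&lt;br /&gt;

```python
# Size of the IPv4 address space versus the IPv6 address space.
ipv4_space = 2 ** 32    # 32-bit addresses
ipv6_space = 2 ** 128   # 128-bit addresses

print(ipv4_space)                 # 4294967296, fewer than the world population
print(ipv6_space // ipv4_space)   # 2**96 IPv6 addresses per IPv4 address
```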
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for several reasons, chief among them that attackers can spoof their IP addresses. Spoofed IP addresses will even foil IP-traceback efforts.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
To verify the identity of whoever is visiting certain pages, a website provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage of authentication systems is that they can provide attribution across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking in to steal that information.&lt;br /&gt;
&lt;br /&gt;
Login systems are usually attached to user accounts, and sometimes require private information to set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; using a typical web-banking authentication system as an example, it may go as follows. A user requests a web account, or one is automatically assigned to the user, and the user sets up a password for accessing the account. When the user later visits the website, he is asked to &amp;quot;identify himself&amp;quot;: he enters his personal login information, and the web server checks it against what is stored in its database, then either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
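The verification step in the banking example above can be sketched as follows. This assumes a salted-hash scheme, which the text does not prescribe, and all names and values are illustrative.&lt;br /&gt;

```python
# Minimal sketch of server-side password verification: the server
# stores a salted hash of the password, never the password itself.
import hashlib
import hmac
import os

def hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

salt = os.urandom(16)
stored = hash_password("correct horse", salt)   # kept in the server database

def verify(attempt):
    candidate = hash_password(attempt, salt)
    return hmac.compare_digest(candidate, stored)   # constant-time compare

print(verify("correct horse"))   # True
print(verify("wrong guess"))     # False
```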
&lt;br /&gt;
Authentication systems are only ever used when users want some privacy on the web server, or when they wish to store some form of private information there. In essence, authentication always has something to do with privacy.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise at identifying people over the Internet across multiple devices, and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification, or attribution, system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user randomly searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider the issue of privacy, because attempts to achieve strong attribution compromise personal privacy. Any system must try to strike a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution place more emphasis on attribution: such institutions want unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected precisely because no strong attribution system is in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one must not merely fix this ratio for some particular case, but rather allow it to change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of this paper. However, such ethical arguments must be addressed prior to the design of an attribution system, because a system that compromises individual privacy and protection should not be deployed.&lt;br /&gt;
&lt;br /&gt;
Before designing an attribution system for the Internet, many questions need to be answered, among them: Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How much can the cooperation of intermediate systems contribute to achieving attribution? How do you deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system would have many useful applications. Its identification property can be useful for establishing a client&#039;s identity in online banking and for identifying the parties involved in an e-commerce transaction, and it can be taken advantage of by marketers for better-targeted Web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS and DDoS attacks, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
This problem arises largely from how the Internet is designed: it lacks strong identification mechanisms, which in turn provides users with a certain level of anonymity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The lack of attribution on the web mostly becomes an issue when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all packets received by their machines, which means they may be unaware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval, triggered by other network users. Firewalls and packet filters can be used to address such problems, but they are not very efficient. It is also not practical to authenticate every single action on the Internet.&lt;br /&gt;
&lt;br /&gt;
There are also attacks designed specifically to prevent correct attribution, used for identity theft and the distribution of malware. The stepping-stone attack is a common way of anonymizing an attack: the attacker relays through multiple random public agents (the stepping stones) to reach the victim, concealing the attacking source.&amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail: there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; while the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.&lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the Internet and in human society. Practice requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency, and the main requirement for an Internet attribution system is simple: it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for the act(s), regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that, at the end of the day, there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which lawful behavior is the only acceptable way of doing things, and many writers and film directors have exploited this idea in futuristic, science-fiction, and dystopian plots. Unfortunately, applying any idea of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability discussed later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is far easier to discuss the design of a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will carry a lot of pressure, the Internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works within.&lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said of any global system on the Internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This matters not only because of the virtual impossibility of restarting or reconfiguring the whole Internet at once: incremental embedding is also more secure, since bugs in software and mistakes in design can be fixed while still at small scale, so that by the end of the cycle, when the whole Internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.&lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act as a public tool any group can use, but one that nobody can misuse or use illegally. The big decision designers will have to make concerns this line between dynamic adaptability and universality; luckily, this level of depth goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, more money. The overall cost of setting up and maintaining an attribution system for a particular body (a person, organization, or network) should be considerably less than its average losses under the current lack of attribution (e.g., to DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did/does person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible for it to know the answer; &amp;quot;who did act X?&amp;quot; is the only question to be answered. This could be treated as part of the requirement about not violating current laws and moral principles, but it is stated separately because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way, for surveillance, spying, etc.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the Internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone using the network. Then, if a crime is committed and the agent of some act needs to be determined, the recorded ID can be looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good-like body, etc.) should store the mapping between IDs and real names, and this mapping should be revealed only when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in times of need while preventing misuse of the system from either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle &amp;quot;an act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. First, we define some terminology that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machine/Device&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md; for an Md like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds new ISs to the GDDB.&lt;br /&gt;
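One plausible reading of the IS definition above is a hash binding a biometric digest to a device feature such as a MAC address; the construction and all values below are our own illustrative assumptions, not part of the framework specification.&lt;br /&gt;

```python
# Toy Identification Stamp: hash a biometric digest together with a
# unique device feature (here, a MAC address) into one identifier.
import hashlib

def identification_stamp(biometric_digest, mac_address):
    material = biometric_digest + ":" + mac_address
    return hashlib.sha256(material.encode()).hexdigest()

stamp = identification_stamp("fp-9f8e7d6c", "00-1A-2B-3C-4D-5E")
print(len(stamp))   # 64 hex characters
```

A one-way hash of this sort would also keep the raw biometric out of the GDDB, in the spirit of the privacy requirements stated earlier.&lt;br /&gt;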
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, or VoIP. Indirect responsibility arises when a user runs a background system that performs external (over-the-Internet) system calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols driven by higher-layer ones: e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for its handshaking scheme, ICMP packets seek to identify the status of a specific host, and so on.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, and not &amp;quot;locally&amp;quot; defined networks, under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of pros, cons, and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity acts as the Internet&#039;s law enforcement: the primary inspector and the jurisdiction for regulating all kinds of cyber crime and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy, but would suffer from a single point of failure. A distributed entity would obviously perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs, and traditions.&lt;br /&gt;
&lt;br /&gt;
Secondly: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, acting as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric key encryption should be used to protect the system, as it is accessed by only two types of users: routers, which may access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which may access it for read/write operations. Both must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics of the messages with which the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate.&lt;br /&gt;
&lt;br /&gt;
Thirdly: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and would be held accountable if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can be owned by only one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by halting the propagation of any packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is one of the following:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
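The three criteria above can be checked mechanically. A minimal Python sketch follows; the registries and the ownership map are hypothetical stand-ins for what the GDDB would store.&lt;br /&gt;

```python
def is_fake(stamp, device_registry, human_registry, ownership):
    """Return True when a stamp matches any of the three fake-IS criteria.

    `stamp` is a (human_id, device_id) pair; the remaining arguments are
    assumed stand-ins for the GDDB contents.
    """
    human_id, device_id = stamp
    if device_id not in device_registry:       # imaginary machine (Md)
        return True
    if human_id not in human_registry:         # imaginary human
        return True
    if ownership.get(device_id) != human_id:   # misleading binding
        return True
    return False
```

&lt;br /&gt;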
Noticeably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packet that is not fully attributed. As the main driving power behind the delivery of all packets, malicious or benign, they bear a great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of his launched packets. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; in a secret place and treat it exactly as he does his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. It does so by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, sending it a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet. If a packet carries no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is denied &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; reports an invalid &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, again, the packet is dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is routed along its way.&lt;br /&gt;
&lt;br /&gt;
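The router-side decision just described reduces to a small amount of logic. In this sketch the packet is a dict and a set of valid stamps stands in for the GDDB query; both are illustrative assumptions.&lt;br /&gt;

```python
def handle_packet(packet, gddb):
    """Sketch of a router's verification step.

    `packet` is a dict whose optional "is" field holds the stamp; `gddb`
    is a set of valid stamps standing in for the distributed database.
    """
    stamp = packet.get("is")
    if stamp is None:        # unattributed: denied ISS, simply dropped
        return "drop"
    if stamp not in gddb:    # GDDB reports an invalid IS
        return "drop"
    return "route"           # verified: the packet benefits from ISS
```

&lt;br /&gt;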
==Pros==&lt;br /&gt;
&lt;br /&gt;
* It achieves an acceptable level of attribution relative to that achieved in the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available; it is available only to trusted entities.&lt;br /&gt;
** Hence, it retains personal privacy.&lt;br /&gt;
* The system enjoys full automation: according to its theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; are provided or withheld based solely on the validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crime executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
==Cons==&lt;br /&gt;
&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively complex.&lt;br /&gt;
* Since attribution information is not public, custom content generation cannot be achieved.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc., must all be licensed before they can be used. Normally, in these cases, many &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be periodically renewable; however, renewal itself is not an easy matter.&lt;br /&gt;
&lt;br /&gt;
==Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they must be aware of all packets sneaking into their machines, to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly as they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attack can alter the database, the attacker can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If the attack can read it, the attacker can commit forgery, declaring his malicious packets to be under the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any leaping packet moves only because it is known to whom it belongs. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this real-world behavior. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Compared with real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be considered acceptable if it provides, at least, as much attribution as the real world does. We argue that the proposed framework would guarantee such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-upon trusted entity. The framework likewise fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but the Internet can still operate if the framework is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originates. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, especially since the cost of losses due to unknown &amp;quot;future&amp;quot; attacks cannot easily be bounded. As for the practice requirements, the proposed framework&#039;s theory of operation does not permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, all traceability information is impossible to collect in one place. The trusted entities are the only ones that generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from personal data; hence, they alone hold that piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists any change at first sight. In 1769, Jonathan Holguinisburg finalized the invention of the first Cugnot steam trolley &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p. 14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after Holguinisburg&#039;s invention. Licensing started when people began realizing that a car could act as a lethal weapon: it must therefore be approved by the government before a person may drive it, and must also be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may blindly deny, refuse, and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like a driving license. Needless to say, the Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast, exponentially growing number of applications that are time and data sensitive, and that leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technological applications are now built on the Internet as their underlying infrastructure, and cannot tolerate being perpetually threatened by a completely anonymous person behind the scenes, seeking the proper moment to strike. Internet attribution is thus no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty of achieving such a level lie. Moreover, we have provided background on current attribution systems, with a brief discussion of the reasons for their survival and their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and achieve an acceptable level of Internet attribution. The pros, cons, and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9453</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9453"/>
		<updated>2011-04-12T00:25:35Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Background */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific bases for properly functioning attribution systems are not yet defined. Much research has focused on attributing documents to authors, for the sake of securing authorship rights and rapidly identifying plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet&#039;s infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it entices advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in all situations; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions over the internet.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the true &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusion and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in this field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution, the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions to other agents (software, a device, etc.) and then attribution from that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, and as such we focus on the internet. For the sake of simplicity, in this paper, whenever we mention &amp;quot;attribution&amp;quot;, we mean &amp;quot;binding an act to a person on the internet&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and going scot-free: due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems will be required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, mostly to address identification issues as they pertain to websites or Internet Service Providers. Many different approaches to attribution have been taken, but mainly just to the extent of what a particular system aims to achieve.&lt;br /&gt;
This section introduces three of today&#039;s current attribution systems and discusses their pros and cons as they pertain to the type of global attribution discussed in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart contents, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the request packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as that browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, and even deleted entirely. It is also possible for a user to configure the browser not to accept cookies at all.&lt;br /&gt;
&lt;br /&gt;
Cookies may or may not have an expiration date, the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let the user set how long cookies are stored.&lt;br /&gt;
&lt;br /&gt;
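The exchange described above can be sketched with Python's standard http.cookies module; the session value and the Max-Age attribute below are illustrative only.&lt;br /&gt;

```python
from http.cookies import SimpleCookie

# Server side: build the cookie sent back in a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"          # illustrative value
server_cookie["session_id"]["max-age"] = 3600   # browser deletes it after 1h
set_cookie_header = server_cookie["session_id"].OutputString()

# Browser side: on the next request, the stored pair goes back verbatim
# in the Cookie request header, where the server parses it again.
parsed = SimpleCookie("session_id=abc123")
session_value = parsed["session_id"].value
```

&lt;br /&gt;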
===Cookies as an Attribution System===&lt;br /&gt;
If we regard cookies as the type of attribution system we are looking for over the Internet, they achieve high precision in identifying the computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online, and a growing number of devices per user; one of the more common examples is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol, version 4 (IPv4), contains only 32 bits, which means it can uniquely address only 4,294,967,296 devices, fewer than the number of people on this planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This address depletion had been foreseen since the 90s and spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.&lt;br /&gt;
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through specific configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it mitigates the limited address space by allowing many devices to &amp;quot;share&amp;quot; a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses at a higher fee.&lt;br /&gt;
&lt;br /&gt;
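The address-space arithmetic above, and the width difference between the two protocol versions, can be checked with Python's standard ipaddress module (the specific addresses and the /24 pool below are illustrative).&lt;br /&gt;

```python
import ipaddress

# IPv4 uses 32 bits: 2**32 unique addresses in total.
IPV4_SPACE = 2 ** 32     # 4,294,967,296

# The address width is visible on parsed addresses.
v4 = ipaddress.ip_address("203.0.113.7")   # a documentation-range address
v6 = ipaddress.ip_address("2001:db8::1")   # IPv6 widens the field to 128 bits

# A typical /24 block from which a home DHCP server leases addresses:
pool = ipaddress.ip_network("192.168.1.0/24")
usable_hosts = pool.num_addresses - 2      # minus network + broadcast
```

&lt;br /&gt;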
===IP Addresses as an Attribution System===&lt;br /&gt;
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
For a website to be sure of the identity of whoever is visiting certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage of authentication systems is that they can provide attribution across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking into the server to steal login information.&lt;br /&gt;
&lt;br /&gt;
Login systems are usually attached to user accounts, and sometimes require private information to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; for a typical web banking authentication system, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”: the user enters his personal login information, and the web server verifies it against what it has stored in its database, then either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
&lt;br /&gt;
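A minimal sketch of the verify-against-stored-record step, assuming a salted-hash scheme: the PBKDF2 parameters and the in-memory dict standing in for the server's database are our own choices, not part of any particular banking system.&lt;br /&gt;

```python
import hashlib
import hmac
import os

def register(database, username, password):
    """Store a salted password hash, never the plaintext itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    database[username] = (salt, digest)

def login(database, username, password):
    """Recompute the hash and compare it with the stored record."""
    if username not in database:
        return False
    salt, stored = database[username]
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(digest, stored)
```

Storing only the salted hash means a database breach does not directly reveal passwords, which speaks to the identity-theft risk noted above.&lt;br /&gt;
&lt;br /&gt;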
Authentication systems are only ever used when users want some privacy on the web server, or when they wish to store some form of private information on it. So, in essence, they always have something to do with privacy.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet across multiple devices, and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification, or attribution, system. It would mean that virtually every web server would need to hold enough information about you to identify you as an attacker. Even a user randomly searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, both the clients and the institution place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because no strong attribution system is in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a fairly abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for a particular case, but rather to make the ratio dynamically adjustable depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of our paper. However, such ethical arguments must be addressed prior to designing the system, because a system that compromises individual privacy and protection cannot be utilized. &lt;br /&gt;
&lt;br /&gt;
There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute? And why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone detection, link identification and packet filtering at linking packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should the system deal with misleading data sources hiding behind botnets and concealing identities via stepping stones? &lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system has many useful applications. Its identification property can be useful for establishing a client&#039;s identity in online banking and for identifying the parties involved in an eCommerce transaction, and it can be taken advantage of by marketers for better-targeted Web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal and undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The lack of attribution on the web mostly becomes an issue whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of that IP, consulting the ISP covering the location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, like IP traceback. There are problems with trying to identify a source by its IP address. For instance, the address can be spoofed, which leads to misleading or inconclusive geographical locations. IP addresses are not permanently bound to a single account, which makes linking an IP to the appropriate person inexact. IP traceback could be improved, but that would require global cooperation of intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all packets received by their machines, which means users would not be aware of malware distribution, the creation of botnets and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are attacks designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of achieving anonymity: the attacker relays traffic through multiple public, randomly chosen agents (the stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and a lot of questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; while the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated those requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and human society. Practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency, and the main requirement for an internet attribution system is simple: it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow down the search to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but rather should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because a lot of laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we&#039;re looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is much easier to discuss the design of a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely; the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. Incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still at a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while at the same time remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars due to attacks and other cyber-crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the only question to be answered is &amp;quot;who did act X?&amp;quot;. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone who is using the network. In case a crime is committed and the agent of some act needs to be determined, the recorded ID is then searched for in the police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
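The separation described above can be pictured in a small sketch (hypothetical Python; the names TrustedRegistry and TraceLog, and all details of the code, are our illustrative assumptions, not part of the paper): the trusted entity alone holds the ID-to-name mapping, while the traceability store records acts against opaque IDs only.&lt;br /&gt;

```python
import secrets

class TrustedRegistry:
    """Held only by the trusted entity: maps opaque IDs to real names."""
    def __init__(self):
        self._id_to_name = {}

    def enroll(self, name):
        uid = secrets.token_hex(16)   # opaque, unguessable network ID
        self._id_to_name[uid] = name
        return uid

    def reveal(self, uid, warrant):
        # The mapping is disclosed only with sufficient legal motivation.
        if not warrant:
            raise PermissionError("disclosure requires a warrant")
        return self._id_to_name[uid]

class TraceLog:
    """Distributed traceability store: records acts against IDs only."""
    def __init__(self):
        self._acts = []               # (act, uid) pairs

    def record(self, act, uid):
        self._acts.append((act, uid))

    def who_did(self, act):
        # Action-to-ID lookup is supported; there is deliberately no
        # method answering the reverse question "what did ID X do?".
        return [u for a, u in self._acts if a == act]

registry = TrustedRegistry()
alice_id = registry.enroll("Alice")
log = TraceLog()
log.record("sent-packet-42", alice_id)
print(log.who_did("sent-packet-42") == [alice_id])  # True
```

Note that neither store alone links an act to a name; only cooperation between the two, gated by a warrant, completes the attribution, which is the non-bijection property required above.&lt;br /&gt;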
&lt;br /&gt;
Of course, it is not always the case that some body trusted by everyone exists, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing misuse of the system by either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the former section. The proposed framework works under the core principle: &amp;quot;An act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. First, we define some terminology that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identification (the intricate structure of the iris, or a fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his owned device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of giving intermediate systems permission to provide ISS to all packets launched from the agent requesting the license. This process simply adds new ISs to the GDDB.&lt;br /&gt;
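As a rough illustration of how an IS might bind a human identifier to a device feature, the following hypothetical Python sketch hashes a biometric template together with a MAC address and records the result in a stand-in for the GDDB. The function names, the use of SHA-256, and the set-based GDDB are our illustrative assumptions, not part of the framework.&lt;br /&gt;

```python
import hashlib

def make_is(biometric_template: bytes, mac_address: str) -> str:
    """Return an Identification Stamp binding a human to a device."""
    material = biometric_template + mac_address.encode("ascii")
    return hashlib.sha256(material).hexdigest()

gddb = set()  # stand-in for the Globally Distributed Database

def license_device(biometric_template: bytes, mac_address: str) -> str:
    """Licensing: generate the IS and append it to the GDDB."""
    stamp = make_is(biometric_template, mac_address)
    gddb.add(stamp)
    return stamp

stamp = license_device(b"iris-template-of-owner", "00:1a:2b:3c:4d:5e")
print(stamp in gddb)  # True
```

Because the hash covers both the human and the device feature, the same person licensing a second machine yields a distinct IS, matching the one-to-many ownership relation assumed later.&lt;br /&gt;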
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions to another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. Indirect responsibility arises when a user is running a system in the background that performs external (over the internet) system calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP, FTP, etc. In addition, indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for its handshaking scheme, ICMP packets seek to identify the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks, falling under the IEEE standard definitions of the PAN, LAN, MAN or WAN topologies, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections state the assumptions under which this framework operates and the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity should act as the Internet&#039;s law enforcement and will be deemed the primary inspector as well as the jurisdiction for regulating all kinds of cyber crime and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it would be able to scale with the growth of the system&#039;s users as well as conform to diverse regional laws, regulations, customs and traditions.&lt;br /&gt;
&lt;br /&gt;
Secondly: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, which acts as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access this database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it for read/write operations. Both users must be strictly authenticated in order to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be relied on heavily for every single hop made by a packet through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics, as well as the manner in which the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate.&lt;br /&gt;
&lt;br /&gt;
Thirdly: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and would also be the one accused if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, a header is added by the network layer that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
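This last assumption can be pictured with a small, purely illustrative sketch: the IS is carried as an extra length-prefixed field in front of the payload. The wire format here (a 2-byte length prefix) is our own assumption; the framework does not fix one.&lt;br /&gt;

```python
import struct

def add_is_header(payload: bytes, stamp: bytes) -> bytes:
    """Prepend an IS field: 2-byte big-endian length, then the IS."""
    return struct.pack("!H", len(stamp)) + stamp + payload

def read_is_header(frame: bytes):
    """Split a frame back into (IS, payload)."""
    (n,) = struct.unpack("!H", frame[:2])
    return frame[2:2 + n], frame[2 + n:]

frame = add_is_header(b"data", b"STAMP")
print(read_is_header(frame))  # (b'STAMP', b'data')
```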
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Noticeably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind delivering all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A chronological description of the system follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of the packets he launches. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; in a secret place and should treat it exactly the same way he treats his credit card numbers and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. This is done by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. If a packet has no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, the packet is prevented from benefiting from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies that the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is invalid, again, the packet is dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet thus benefits from the &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and gets routed along the way.&lt;br /&gt;
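The per-hop decision just described can be summarized in a short, hypothetical sketch. The dictionary-based packet and set-based GDDB are illustrative stand-ins, not the real data structures.&lt;br /&gt;

```python
def route_packet(packet: dict, gddb: set) -> str:
    """Decide a packet's fate at one intermediate hop."""
    stamp = packet.get("is")
    if stamp is None:
        return "dropped: no IS"          # unattributed packet
    if stamp not in gddb:
        return "dropped: invalid IS"     # forged or unlicensed IS
    return "routed"                      # IS verified, ISS granted

gddb = {"valid-stamp"}
print(route_packet({"is": "valid-stamp"}, gddb))   # routed
print(route_packet({"is": "forged"}, gddb))        # dropped: invalid IS
print(route_packet({}, gddb))                      # dropped: no IS
```

The same check repeats at every hop, which is why the latency of the GDDB read path is the dominant performance concern noted in the cons below.&lt;br /&gt;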
&lt;br /&gt;
==Pros==&lt;br /&gt;
&lt;br /&gt;
* It achieves an acceptable level of attribution relative to that achieved in the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone; it is only available to trusted entities.&lt;br /&gt;
** Hence, it retains personal privacy.&lt;br /&gt;
* The system enjoys full automation. According to the system&#039;s theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; are either provided or not based on the validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crime that are executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
==Cons==&lt;br /&gt;
&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not considered easy to deploy, since its assumptions are relatively complex.&lt;br /&gt;
* Since attribution information is not public, custom content generation is not achievable.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used. Normally, in these cases, the &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be periodically renewable; however, this is not an easy task.&lt;br /&gt;
&lt;br /&gt;
==Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly as they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attack succeeds in altering the database, the attacker can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If the attack succeeds in reading it, the attacker can choose to declare his malicious packets to be the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, i.e., forgery.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any leaping packet is moving because it is known whom it belongs to. Recall that in the real world, if a person doesn&#039;t have an identity (like a social insurance number), he can&#039;t benefit from services: he can&#039;t open a bank account, can&#039;t buy a house, can&#039;t trade, nor can he even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution would definitely beat that of the Internet at present. We can say that current internet attribution, in comparison to real-world attribution, is a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would guarantee such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-upon trusted entity. The framework likewise fulfills all of the deployment requirements. As can be seen, the more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if the attribution system is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crime, even though the cost of losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily determined. As for the practice requirements, the proposed framework&#039;s theory of operation does not permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, it is impossible to collect all traceability information in one place. The trusted entities are the only ones that generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from personal data; hence, they are the only ones holding that piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot finalized the invention of the first steam trolley &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after Cugnot&#039;s invention. Licensing started when people began realizing that a car could act as a lethal weapon, and therefore that driving it must be approved by the government and the car must be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may at first blindly deny, refuse and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast, exponentially growing number of applications which are time- and data-sensitive, and which leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry and technology is now built over the Internet as the underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems, along with a brief discussion of the reasons for their survival and their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and have the ability to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework are also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9450</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9450"/>
		<updated>2011-04-11T23:41:49Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, scientific basis for properly functioning attribution systems are not yet defined. Lots of research have been focusing on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Many of those were revolving around the notion of using machine learning for linking articles to humans. Others proposed text classification and feature selection as a means of detecting the author of a document. Unfortunately, not that much research is addressing the problem of lack of robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet&#039;s infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it entices advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most situations; hence, the current system lacks a relatively robust attribution mechanism. In this light, we need better methodologies for reaching an acceptable level of success in attributing actions over the internet.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent could be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter force to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for the purpose of tracking authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution: resolving the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions to other agents (software, a device, etc.) and then attribution from that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, and as such we focus on the internet. For the sake of simplicity, whenever we mention &amp;quot;attribution&amp;quot; in this paper, we mean &amp;quot;binding an act to a person on the internet&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet while going scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we address the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems will be required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not new; it has been around for decades, though mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to configure the browser not to accept cookies at all. &lt;br /&gt;
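The cookie round trip described above can be sketched in a few lines. This is a minimal simulation, not real HTTP: the &amp;quot;Server&amp;quot; and &amp;quot;Browser&amp;quot; classes and their field names are illustrative stand-ins for the Set-Cookie/Cookie header exchange.&lt;br /&gt;

```python
# Minimal sketch of the cookie round trip: first request gets a fresh
# cookie; later requests resend it, letting the server recognize the visitor.
import secrets

class Server:
    def __init__(self):
        self.sessions = {}          # cookie value -> per-visitor state

    def handle(self, request):
        cookie = request.get("cookie")
        if cookie is None or cookie not in self.sessions:
            # First access from this browser: issue a fresh cookie
            cookie = secrets.token_hex(8)
            self.sessions[cookie] = {"visits": 0}
        self.sessions[cookie]["visits"] += 1
        return {"set_cookie": cookie, "visits": self.sessions[cookie]["visits"]}

class Browser:
    def __init__(self):
        self.jar = {}               # origin -> stored cookie

    def get(self, server, origin="example.org"):
        request = {"cookie": self.jar.get(origin)}
        response = server.handle(request)
        self.jar[origin] = response["set_cookie"]   # save/refresh the cookie
        return response

server, browser = Server(), Browser()
assert browser.get(server)["visits"] == 1
assert browser.get(server)["visits"] == 2
# Deleting the cookie defeats this kind of attribution, as noted above:
browser.jar.clear()
assert browser.get(server)["visits"] == 1
```

Clearing the jar at the end illustrates the drawback discussed in the next subsection: the server sees a returning visitor as brand new.&lt;br /&gt;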
&lt;br /&gt;
A cookie may or may not have an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let you set how long cookies are stored.&lt;br /&gt;
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
If we consider cookies as the type of attribution system we are looking for over the Internet, they can achieve high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online, and a growing number of devices per user; one of the more common examples is the rise of Internet-ready mobile phones. The addressing scheme used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2&amp;lt;sup&amp;gt;32&amp;lt;/sup&amp;gt; addresses (4,294,967,296), fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This was foreseen as early as the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing scheme. &lt;br /&gt;
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is an address permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to &amp;quot;share&amp;quot; a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee. &lt;br /&gt;
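The address-space arithmetic above can be checked directly with the standard library; the sketch below also shows that an IPv4 address is just an integer presented in dotted-quad form. The specific example address is illustrative.&lt;br /&gt;

```python
# IPv4/IPv6 address-space arithmetic from the discussion above.
import ipaddress

total_ipv4 = 2 ** 32
assert total_ipv4 == 4_294_967_296      # fewer than the world's population

total_ipv6 = 2 ** 128                   # IPv6 widens the address to 128 bits
assert total_ipv6 > total_ipv4

# Addresses are just integers; dotted-quad is only a presentation format.
addr = ipaddress.IPv4Address("192.0.2.1")
assert int(addr) == 0xC0000201
assert ipaddress.IPv4Address(0xC0000201) == addr
```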
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although Internet addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
For a website to be sure of the identity of whoever is visiting certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it subject to attackers hacking in to steal login information. &lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information to set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft. &lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; using a typical web-banking authentication system as an example, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to &amp;quot;identify himself&amp;quot;; the user enters his personal login information, and the web server verifies it against what is stored in its database and either grants or denies access to the user&#039;s personal page. &lt;br /&gt;
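The verify step in that flow can be sketched as follows. This is one common construction (a salted PBKDF2 hash), not necessarily what any particular bank uses; the function and record names are illustrative.&lt;br /&gt;

```python
# Sketch of password verification: the server stores a salted hash,
# never the password itself, and compares hashes on login.
import hashlib
import hmac
import os

def make_record(password: str) -> dict:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return {"salt": salt, "digest": digest}

def verify(record: dict, attempt: str) -> bool:
    candidate = hashlib.pbkdf2_hmac(
        "sha256", attempt.encode(), record["salt"], 100_000
    )
    # Constant-time comparison avoids leaking information via timing
    return hmac.compare_digest(candidate, record["digest"])

db = {"alice": make_record("correct horse")}
assert verify(db["alice"], "correct horse")       # access granted
assert not verify(db["alice"], "wrong guess")     # access denied
```

Note that even this design only protects the password itself; as the text observes, a breached server can still leak the private account information attached to the login.&lt;br /&gt;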
&lt;br /&gt;
Authentication systems are only ever used when users want some privacy on the web server, or when they wish to store some form of information on it. &lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally value the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, both the clients and the institution place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a fairly abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but to allow the ratio to change dynamically depending on the case. &lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of our paper. However, such ethical arguments must be addressed prior to designing, because a system that compromises individual privacy and protection cannot be utilized. &lt;br /&gt;
&lt;br /&gt;
There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone detection, link identification, and packet filtering at binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should the system deal with misleading data sources hiding behind botnets and concealing identities via stepping stones? &lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system has many useful applications. Its identification property can be useful for establishing a client&#039;s identity for online banking and identifying the parties involved in an eCommerce transaction, and can be taken advantage of by marketers for better-targeted web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. A strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The issue of lack of attribution on the web mostly arises whenever security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of that IP, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, such as IP traceback. There are problems with trying to identify a source by its IP address. For instance, it can be spoofed, which leads to misleading or inconclusive geographical locations. IP addresses are not permanently bound to a single account, which makes linking an IP to the appropriate person inconclusive. IP traceback could be improved, but that would require global cooperation of intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all the packets received by their machines, which means users would not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are attacks designed specifically to prevent correct attribution, used for identity theft and the distribution of malware. The stepping-stone attack is a common way of concealing the attacking source by using multiple public, random agents (as stepping stones) to reach the victim. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. While the definition of a good attribution system is not entirely clear, we take into account everything discussed above; that is, the following requirements try to define a system that avoids current problems, achieves a high degree of attribution, and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and human society. Practice requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement is usually stated for the sake of consistency: the main requirement for an internet attribution system is simple; it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (person or group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party, but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction, and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easier to discuss the design of a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will bear a lot of pressure, the internet should not depend on it entirely; the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. Incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still at a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases, while at the same time remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns this line between dynamic adoptability and universality. Luckily, that level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, spending more money. The overall cost of setting up and maintaining an attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one to be answered. This could be considered part of the requirement about not violating current laws and moral principles, but we state it as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way, such as for surveillance or spying.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone using the network. Then, in case a crime is committed and the agent of some act needs to be determined, the recorded ID can be searched for in the police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
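The split described above can be sketched as two separate stores: a public trace that maps acts to opaque IDs only, and an ID-to-name registry held by the trusted entity, consulted only when a case warrants it. All names in the sketch (and the boolean warrant flag standing in for &amp;quot;enough evidence&amp;quot;) are illustrative assumptions.&lt;br /&gt;

```python
# Sketch of the two-store design: no real names ever enter the trace log.
import uuid

trace_log = {}      # act -> opaque agent ID (public, distributed)
registry = {}       # agent ID -> real name (held only by the trusted entity)

def register(name: str) -> str:
    """Trusted entity assigns a unique ID and keeps the name mapping."""
    agent_id = uuid.uuid4().hex
    registry[agent_id] = name
    return agent_id

def record_act(act: str, agent_id: str) -> None:
    """Network side records only the opaque ID against the act."""
    trace_log[act] = agent_id

def attribute(act: str, warrant: bool) -> str:
    """Without a warrant, only the opaque ID is revealed."""
    agent_id = trace_log[act]
    return registry[agent_id] if warrant else agent_id

aid = register("Alice")
record_act("packet-123", aid)
assert attribute("packet-123", warrant=False) == aid      # opaque ID only
assert attribute("packet-123", warrant=True) == "Alice"   # case opened
```

Note the mapping is deliberately one-way in use: the trace is keyed by act, so asking &amp;quot;what did Alice do?&amp;quot; has no direct lookup, matching the non-bijection requirement above.&lt;br /&gt;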
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing misuse of the system from either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the former section. The proposed framework works under the core principle &amp;quot;An act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. We start by defining some terms that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally over wired or wireless links to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global DNS-like world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.&lt;br /&gt;
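To make the Identification Stamp (IS) definition concrete, one possible derivation is sketched below: hashing the owner&#039;s biometric template together with the device&#039;s unique feature (here a MAC address). This is purely an illustrative assumption; the framework does not fix a concrete construction, and the template bytes and MAC values are made up.&lt;br /&gt;

```python
# Illustrative IS derivation: a digest over (biometric template, MAC address).
import hashlib

def identification_stamp(biometric_template: bytes, mac_address: str) -> str:
    """Bind a human's biometric template to a device's unique feature."""
    material = biometric_template + mac_address.encode()
    return hashlib.sha256(material).hexdigest()

stamp = identification_stamp(b"fingerprint-template-bytes", "00:1a:2b:3c:4d:5e")
same  = identification_stamp(b"fingerprint-template-bytes", "00:1a:2b:3c:4d:5e")
other = identification_stamp(b"fingerprint-template-bytes", "00:1a:2b:3c:4d:5f")

assert stamp == same    # deterministic per agent (human-device pairing)
assert stamp != other   # a different device yields a different IS
```

The point of the sketch is only that the IS is deterministic per human-device pairing yet distinct across devices, matching the one-to-many ownership model assumed later.&lt;br /&gt;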
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting HTTP, FTP, SIP, RTP, VoIP, etc. Indirect responsibility arises when a user is running a system in the background that performs external (over-the-internet) system calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets seek and identify the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates and the methodology of its operation; list the pros, cons, and vulnerabilities of the system; and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity acts as the Internet&#039;s law enforcement: the primary inspector and the jurisdiction for regulating all kinds of cyber crime and misbehavior. It may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs, and traditions.&lt;br /&gt;
&lt;br /&gt;
Secondly: &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, acting as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should protect this system, as it is accessed by only two types of users: routers, which may access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which may access it for read/write operations. Both must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics of communication among the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems.&lt;br /&gt;
&lt;br /&gt;
Thirdly: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; and is held accountable if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
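One hypothetical wire layout for this assumed header is sketched below: a 2-byte payload length followed by a fixed 32-byte &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. The paper assumes such a header exists but does not define its format, so every field and size here is an illustration, not a specification.&lt;br /&gt;

```python
import struct

IS_LEN = 32  # illustrative fixed IS size, in bytes

def add_is_header(is_value: bytes, payload: bytes) -> bytes:
    """Prepend a pseudo-header: payload length (2 bytes) + the IS."""
    assert len(is_value) == IS_LEN
    return struct.pack(f"!H{IS_LEN}s", len(payload), is_value) + payload

def parse_is_header(packet: bytes):
    """Recover the IS and payload from a packet built above."""
    length, is_value = struct.unpack_from(f"!H{IS_LEN}s", packet)
    payload = packet[2 + IS_LEN:2 + IS_LEN + length]
    return is_value, payload
```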
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
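The three conditions above can be checked mechanically. In this sketch the registries are plain local sets with made-up names; in the actual framework these lookups would be answered by the (encrypted) &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;.&lt;br /&gt;

```python
# Hypothetical registries, for illustration only.
KNOWN_CHIPS = {"chip-1", "chip-2"}
KNOWN_HUMANS = {"human-1", "human-2"}
OWNER_OF = {"chip-1": "human-1", "chip-2": "human-1"}  # one human may own many machines

def is_fake(chip_id: str, human_id: str) -> bool:
    if chip_id not in KNOWN_CHIPS:       # imaginary Md
        return True
    if human_id not in KNOWN_HUMANS:     # imaginary human
        return True
    if OWNER_OF[chip_id] != human_id:    # misleading human-to-machine binding
        return True
    return False
```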
Noticeably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so he can add it to the header of his launched packets. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; secret, treating it exactly as he treats his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending it a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet. If the packet carries no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is denied &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; reports an invalid &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, again, the packet is dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is routed along its way.&lt;br /&gt;
&lt;br /&gt;
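The per-hop decision just described is a simple three-way branch. In this sketch, `packet` is a dictionary and `gddb_verify` a callable standing in for the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; consultation; both names are our own, not part of the paper&#039;s specification.&lt;br /&gt;

```python
def route_decision(packet: dict, gddb_verify) -> str:
    """Decide a packet's fate at a router, per the framework's methodology."""
    is_value = packet.get("is")
    if is_value is None:
        return "drop: unattributed"    # no IS printed on the packet
    if not gddb_verify(is_value):
        return "drop: invalid IS"      # GDDB rejected the printed IS
    return "forward"                   # IS verified: packet receives ISS
```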
==Pros==&lt;br /&gt;
&lt;br /&gt;
* It achieves a level of attribution comparable to that achieved in the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available; it is available only to trusted entities.&lt;br /&gt;
** Hence, it retains personal privacy.&lt;br /&gt;
* The system is fully automated: according to its theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; is provided or withheld based solely on validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crime executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
==Cons==&lt;br /&gt;
&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively complex.&lt;br /&gt;
* Since attribution information is not public, custom content generation cannot be achieved.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before use. Normally, in these cases, &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be periodically renewable; however, this is not an easy task.&lt;br /&gt;
&lt;br /&gt;
==Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they should be aware of all packets sneaking into their machines, to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly as they lock their car after leaving it in a parking lot.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attacker gains write access, he can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If he gains read access, he can falsely declare his malicious packets as the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, i.e., commit forgery.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any packet in flight moves only because it is known who it belongs to. Recall that in the real world, if a person doesn&#039;t have an identity (like a social insurance number), he can&#039;t benefit from services: he can&#039;t open a bank account, buy a house, or trade, nor can he even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution would definitely beat that of the Internet today. Current internet attribution, compared with real-world attribution, can be considered a failure. A form of internet attribution would be acceptable if it, at least, provides as much attribution as the real world does. We argue that the proposed framework would guarantee such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled but can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime was committed. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, even though the cost of losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily determined. As for the practice requirements, the proposed framework&#039;s theory of operation doesn&#039;t permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, the traceability information is impossible to collect in one place. The trusted entities are the only ones that generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists change at first sight. In 1769, Nicolas-Joseph Cugnot finalized the invention of the first steam trolley &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after Cugnot&#039;s invention. Licensing started when people began realizing that a car could act as a lethal weapon, and must therefore be approved by the government before a person may drive it, and must also be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may blindly deny, refuse, and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast, exponentially growing number of applications that are time- and data-sensitive and leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry now builds its applications on the Internet as the underlying infrastructure and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes, seeking the proper moment to strike. Internet attribution is no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty of achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival and their points of failure as well. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and achieve an acceptable level of Internet attribution. Pros, cons, and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9444</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9444"/>
		<updated>2011-04-11T23:10:46Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors, for the sake of securing authorship rights and rapidly identifying plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not applicable to authenticating every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are always applicable; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role in the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field; rather, we address a higher-level attribution of all possible actions to agents, which is sadly somewhat neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution: resolving the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In Section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in Section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we&#039;re only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible most of the time to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increased number of cyber attacks since its introduction in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, they cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, basic knowledge of computer science or computer systems will be required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly in addressing identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they are still subject to the user&#039;s control, as they can be decrypted, modified, and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
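The set-then-replay exchange described above can be sketched with the standard library&#039;s SimpleCookie. The cookie name and value here are made up for illustration; real servers would generate a unique identifier per visitor.&lt;br /&gt;

```python
from http.cookies import SimpleCookie

def handle_request(headers: dict) -> dict:
    """Return the response headers a server might send for one request."""
    if "Cookie" not in headers:
        # First visit: create a cookie and hand it to the browser.
        jar = SimpleCookie()
        jar["visitor_id"] = "abc123"  # illustrative value
        return {"Set-Cookie": jar["visitor_id"].OutputString()}
    # Returning visit: the browser already identified itself.
    return {}

# Browser side: store the cookie, then replay it on the next request.
first = handle_request({})
second = handle_request({"Cookie": first["Set-Cookie"]})
```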
A cookie may or may not carry an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set automatically how long cookies are stored.&lt;br /&gt;
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewing cookies as the type of attribution system we are looking for over the Internet, we would be able to identify, with high precision, the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g., computers, printers, scanners) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to their assigned regions&#039; ISPs, who in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years, there has been a growing number of users going online and a growing number of devices per user, one of the more common examples being the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol Version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2&amp;lt;sup&amp;gt;32&amp;lt;/sup&amp;gt; addresses (4,294,967,296), fewer than the number of people on this planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This was foreseen since the 90s, which sprung the development of a new Internet Protocol version, IPv6, which uses 128-bit addressing. &lt;br /&gt;
&lt;br /&gt;
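The address-space figures above can be checked directly; the world-population number below is a rough 2011-era estimate used purely for comparison.&lt;br /&gt;

```python
# IPv4 uses 32-bit addresses, IPv6 uses 128-bit addresses.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

print(ipv4_space)                      # 4294967296
print(ipv4_space < 7_000_000_000)      # True: fewer IPv4 addresses than people
print(ipv6_space // ipv4_space)        # each IPv4 address corresponds to 2**96 IPv6 ones
```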
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it mitigates the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee. &lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to make sure of the identity of whoever is visiting certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it open to attackers hacking into the server to steal login information. &lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft. &lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; taking a typical web banking authentication system, for instance, the process may go as follows. A user requests a web account, or one is automatically assigned to him. The user sets up a password for accessing the account. When the user later goes to the website, he is asked to “identify himself”; he enters his personal login information, and the web server verifies it against what is stored in its database and either grants or denies access to the user&#039;s personal page. &lt;br /&gt;
&lt;br /&gt;
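The register-then-verify flow above can be sketched in a few lines. Storing a salted password hash, rather than the password itself, is an assumption of good practice on our part; the text does not specify how the server stores credentials, and all names below are illustrative.&lt;br /&gt;

```python
import hashlib
import hmac
import secrets

_accounts = {}  # login -> (salt, digest); stands in for the server database

def register(login: str, password: str) -> None:
    """Account setup: derive and store a salted hash of the password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    _accounts[login] = (salt, digest)

def authenticate(login: str, password: str) -> bool:
    """Login: re-derive the hash and compare it in constant time."""
    if login not in _accounts:
        return False
    salt, digest = _accounts[login]
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(attempt, digest)
```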
Authentication systems are only ever used when users want some privacy on the web server, or when they wish to store some form of information on it. &lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to identify you as an attacker. Even a user randomly searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in rather abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to make the ratio change dynamically depending on the case. &lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of our paper. However, such ethical arguments must be addressed prior to designing, because a system that compromises individual privacy and protection cannot be utilized. &lt;br /&gt;
&lt;br /&gt;
There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute? And why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification, and packet filtering at tying packets to agents? How much can the intermediate systems&#039; cooperation contribute to achieving attribution? How do we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones? &lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system has many useful applications. Its identification property can be used to establish a client&#039;s identity for online banking, to identify the parties involved in an eCommerce transaction, and can even be exploited by marketers for better-targeted Web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS and DDoS attacks, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
The problem arises largely from how the Internet is designed. It has no strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The lack of attribution on the web mostly becomes an issue whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, determine the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, such as IP traceback, but identifying a source by its IP address is problematic. For instance, the address can be spoofed, which leads to misleading or inconclusive geographical locations. IP addresses are not permanently bound to a single account, which makes linking an IP to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation of intermediate systems, which currently does not exist.&lt;br /&gt;
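As a small illustration of why an IP address alone does not identify a person, consider a dynamic address pool: the same IP maps to different accounts depending on the lease in force at the time. The lease records and account names below are purely hypothetical.&lt;br /&gt;

```python
from datetime import datetime

# Hypothetical ISP lease records: the same dynamic IP is assigned to
# different accounts over time, so the IP alone does not name a person.
leases = [
    ("203.0.113.7", "alice", datetime(2011, 4, 1, 8, 0), datetime(2011, 4, 1, 20, 0)),
    ("203.0.113.7", "bob",   datetime(2011, 4, 1, 20, 0), datetime(2011, 4, 2, 9, 0)),
]

def account_for(ip, when):
    """Return the account holding `ip` at time `when`, or None."""
    for lease_ip, account, start, end in leases:
        if lease_ip == ip and start <= when < end:
            return account
    return None

# The same IP attributes to different people depending on the timestamp:
print(account_for("203.0.113.7", datetime(2011, 4, 1, 12, 0)))  # alice
print(account_for("203.0.113.7", datetime(2011, 4, 2, 1, 0)))   # bob
```

Without an accurate timestamp and the ISP&#039;s cooperation, the mapping is ambiguous; this is one reason the paper calls IP-based attribution inconclusive.&lt;br /&gt;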
&lt;br /&gt;
In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very effective. Also, it is not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are also attacks designed specifically to prevent correct attribution, used for identity theft and the distribution of malware. The stepping-stone attack is a common way of anonymizing attacks: the attacker reaches the victim through multiple arbitrary public hosts (stepping stones) in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail: there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we define high-level requirements for a good attribution system. While the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define a system that avoids current problems, achieves a high degree of attribution and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement is the defining one: an internet attribution system needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day some person or persons, human beings, are responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability discussed later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easier to discuss the design of a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. Incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. A big decision the designers will have to make concerns this line between dynamic adaptability and universality. Fortunately, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one that should be answered. This could be considered part of the requirement about not violating current laws and moral principles, but we state it separately because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone using the network. If a crime is committed and the agent of some act needs to be determined, the recorded ID is looked up in the police or government database. Some trusted entity (a government, corporation, police force, public-good body, etc.) stores the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
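The scheme above can be sketched as a simple identity escrow; the class name, token format and &amp;quot;warrant&amp;quot; check below are our own illustrative assumptions, not part of the proposal itself.&lt;br /&gt;

```python
import secrets

# Hypothetical sketch: a trusted entity assigns each person an opaque
# unique ID and keeps the ID -> name mapping to itself; public
# traceability records only ever contain the opaque ID.
class TrustedEntity:
    def __init__(self):
        self._id_to_name = {}          # held only by the trusted entity

    def register(self, name):
        uid = secrets.token_hex(16)    # opaque, unguessable identifier
        self._id_to_name[uid] = name
        return uid

    def reveal(self, uid, warrant):
        """Reveal the real name only when legal grounds exist."""
        if not warrant:
            raise PermissionError("no legal grounds to de-anonymize")
        return self._id_to_name[uid]

entity = TrustedEntity()
alice_id = entity.register("Alice")

# Public traceability log: actions map to opaque IDs, never to names.
trace_log = [("sent packet 42", alice_id)]

# Without the trusted entity's cooperation the log is pseudonymous;
# with a warrant, a single record can be attributed.
print(entity.reveal(alice_id, warrant=True))   # Alice
```

Because the log never stores names and the entity only answers single-ID queries, the action-to-person direction works while bulk person-to-actions queries do not, matching the non-bijection requirement.&lt;br /&gt;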
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing misuse of the system from either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework operates under the core principle &amp;quot;an act cannot use network resources, nor can it be routed, if it is not attributable&amp;quot;. First, we define some terminology used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (such as an iris pattern or fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g. routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table offering relatively fast retrieval and update capabilities. It is used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of permitting intermediate systems to provide ISS to all packets launched by the agent requesting the license. This process is simply the addition of new ISs to the GDDB.&lt;br /&gt;
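One plausible construction of an IS, purely as a sketch: hash the pair of human identifier and device feature, so the stamp is unique per Ag without revealing either component directly. The hash construction and identifier formats are our assumptions; the paper only requires that the IS uniquely identify an Ag.&lt;br /&gt;

```python
import hashlib

# Hypothetical sketch of an Identification Stamp (IS): a bit string
# binding a unique human identifier (e.g. a fingerprint template hash)
# to a unique device feature (e.g. a NIC's MAC address).
def make_is(human_id: str, device_feature: str) -> str:
    # Hash the pair so the IS reveals neither component directly.
    return hashlib.sha256(f"{human_id}|{device_feature}".encode()).hexdigest()

stamp = make_is("fingerprint-template-1a2b", "00:1A:2B:3C:4D:5E")

# The same (human, device) pair always yields the same IS,
# which is what lets routers match it against the GDDB.
assert stamp == make_is("fingerprint-template-1a2b", "00:1A:2B:3C:4D:5E")
```

A different device owned by the same person yields a different IS, reflecting the one-person-to-many-machines ownership model assumed later.&lt;br /&gt;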
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. He is directly responsible when running an application that sends requests or initiates communication sessions with another end system, e.g. the client side of applications supporting HTTP, FTP, SIP, RTP, VoIP, etc. He is indirectly responsible when running a background system that performs external (over-the-internet) calls, or that is automated for periodic communication or automatic response to incoming requests, e.g. system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g. when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets probe the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet, not &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates, its methodology of operation, and a list of its pros, cons and vulnerabilities, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
First: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g. a government. This entity acts as the Internet&#039;s law enforcement, serving as the primary inspector and the jurisdiction for regulating all kinds of cyber crimes and misbehavior. It may be either centralized or distributed. A centralized entity would be easier to deploy, but would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the user base and conform to diverse regional laws, regulations, customs and traditions.&lt;br /&gt;
&lt;br /&gt;
Second: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed to act as the database storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should be used to protect the system, as it is accessed by only two types of users: routers, which may access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which may access it for read/write operations. Both user types must be strictly authenticated before being able to decrypt or append contents. In addition, this distributed system must guarantee near-zero latency on read operations, since it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics of communication among the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems.&lt;br /&gt;
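The two access roles can be modelled as follows; the key handling is a deliberately simplified stand-in for the authenticated symmetric-key scheme described above (encryption of the stored table is elided), and all names are our own.&lt;br /&gt;

```python
# Hypothetical sketch of GDDB access control: routers hold a read-only
# credential for verifying ISs; only the trusted entity may add entries.
class GDDB:
    def __init__(self, entity_key: str, router_key: str):
        self._entity_key = entity_key
        self._router_key = router_key
        self._stamps = set()            # the set of licensed ISs

    def add(self, key: str, stamp: str):
        if key != self._entity_key:     # write access: trusted entity only
            raise PermissionError("only the trusted entity may license")
        self._stamps.add(stamp)

    def verify(self, key: str, stamp: str) -> bool:
        if key not in (self._entity_key, self._router_key):
            raise PermissionError("unauthenticated access")
        return stamp in self._stamps    # read access: routers and entity

db = GDDB(entity_key="entity-secret", router_key="router-secret")
db.add("entity-secret", "abc123")       # licensing inserts the IS
assert db.verify("router-secret", "abc123")   # a router read succeeds
```

A real deployment would distribute `_stamps` across subsystems and encrypt the lookup table; this sketch only captures the read/write role split.&lt;br /&gt;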
&lt;br /&gt;
Third: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and would be the one accused if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Notably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism. &lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of the packets he launches. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; secret, treating it exactly as he does his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending it a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet. If the packet carries no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is denied &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; reports an invalid &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, again, the packet is dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet then benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is routed onward.&lt;br /&gt;
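The per-hop decision reduces to a three-way check, sketched below. The packet representation and return strings are our own illustrative assumptions; `gddb` stands in for a verified lookup against the distributed database.&lt;br /&gt;

```python
# Hypothetical sketch of the per-hop check: a router extracts the IS
# from the packet header, consults the GDDB, and either drops or
# forwards.  `gddb` is modelled as a set of licensed ISs.
def route(packet: dict, gddb: set) -> str:
    stamp = packet.get("IS")
    if stamp is None:            # unattributed packet: no ISS, drop
        return "drop: missing IS"
    if stamp not in gddb:        # fake or unlicensed IS: drop
        return "drop: invalid IS"
    return "forward"             # verified: packet benefits from ISS

gddb = {"abc123"}
assert route({"IS": "abc123", "payload": b"hi"}, gddb) == "forward"
assert route({"payload": b"hi"}, gddb) == "drop: missing IS"
assert route({"IS": "forged", "payload": b"hi"}, gddb) == "drop: invalid IS"
```

In practice the GDDB lookup is a network round-trip per hop, which is exactly the latency cost listed under the cons below.&lt;br /&gt;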
&lt;br /&gt;
==Pros==&lt;br /&gt;
&lt;br /&gt;
* It achieves an acceptable level of attribution relative to that achieved in the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet fails to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone, only to trusted entities.&lt;br /&gt;
** Hence, it preserves personal privacy.&lt;br /&gt;
* The system is fully automated: according to its theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; are provided or withheld based on the validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crime executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
==Cons==&lt;br /&gt;
&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively demanding.&lt;br /&gt;
* Since attribution information is not public, custom content generation is not achievable.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be periodically renewable; however, managing renewal is not an easy problem.&lt;br /&gt;
&lt;br /&gt;
==Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they must be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly as they lock their car after leaving it in a parking lot.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attacker gains write access, he can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If the attacker gains read access, he can declare his malicious packets to be the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, i.e. forgery.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that a packet moves only if it is known who it belongs to. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Compared to real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does, and we argue that the proposed framework guarantees such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it cannot take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-upon trusted entity. The framework likewise fulfills all of the deployment requirements. As can be seen, the more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country in which the crime committer resides. Whatever the cost of deploying the system, it should still be less than the losses due to cyber crimes, even though the losses due to unknown &amp;quot;future&amp;quot; attacks cannot be estimated precisely. As for the practice requirements, the framework&#039;s theory of operation does not permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, it is impossible to collect all traceability information in one place. Only the trusted entities generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from personal data; hence, they alone hold that piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists change at first sight. In 1769, Jonathan Holguinisburg finalized the invention of the first Cugnot steam trolley &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after Holguinisburg&#039;s invention. Licensing started when people began to realize that a car could act as a lethal weapon, so its use must be approved by the government and it must be formally linked to an owner who is primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may at first blindly deny, refuse and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like a driving license. Needless to say, the Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast and exponentially growing number of applications that are time- and data-sensitive and leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry now builds its applications on the Internet as the underlying infrastructure, and cannot tolerate the constant threat of a completely anonymous person behind the scenes waiting for the proper moment to strike. Internet attribution is no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty of achieving such a level lie. Moreover, we have provided background on current attribution systems, with a brief discussion of the reasons for their survival and their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the stated requirements and achieve an acceptable level of Internet attribution; the pros, cons and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9443</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9443"/>
		<updated>2011-04-11T22:56:31Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Pros, Cons and Vulnerabilities */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, a scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Much of it revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides its users with partial anonymity. Unfortunately, that anonymity weakens security for those users, because it incites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in all situations; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness around the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter force to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which has sadly received little attention in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion on current forms of attribution. Section 3 then presents the dilemma of attribution, resolving the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In Section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in Section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we will refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and getting off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not a new one; it has been around for decades, though mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve.&lt;br /&gt;
This section introduces three of today&#039;s attribution mechanisms and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart contents, and storing site preferences; in actuality, they can store any type of information that can be kept in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to change the browser settings to not accept cookies at all.&lt;br /&gt;
&lt;br /&gt;
A cookie may or may not carry an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to set how long cookies are stored.&lt;br /&gt;
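As a toy illustration of the exchange above (our own sketch in Python, not tied to any real browser or server), the following models a server stamping a first-time visitor with a session cookie, recognizing it on later requests, and losing the link as soon as the user deletes the cookie:&lt;br /&gt;

```python
# Hypothetical model of the cookie exchange: the server tags a cookie-less
# request as a first visit and recognizes the browser on later requests.
import itertools

class Server:
    def __init__(self):
        self._ids = itertools.count(1)
        self.visits = {}                  # cookie id -> requests seen

    def handle(self, cookie=None):
        if cookie is None:                # no cookie: treat as first access
            cookie = f"session-{next(self._ids)}"
        self.visits[cookie] = self.visits.get(cookie, 0) + 1
        return cookie                     # sent back with the response

class Browser:
    def __init__(self):
        self.cookie = None                # cookie jar, initially empty

    def request(self, server):
        self.cookie = server.handle(self.cookie)  # resend stored cookie

srv, alice = Server(), Browser()
alice.request(srv); alice.request(srv)    # server sees the same visitor twice
alice.cookie = None                       # user deletes the cookie...
alice.request(srv)                        # ...and is now a brand-new visitor
```

The last three lines illustrate exactly the drawback discussed below: the binding vanishes the moment the user clears the cookie.&lt;br /&gt;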
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Treating cookies as the type of attribution system we are looking for over the Internet, we would be able to identify, with high precision, the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies do not make an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g., computers, printers, scanners) on a network. A user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online and a growing number of devices per user, one of the more common examples being the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2^32 addresses (4,294,967,296), fewer than the number of people on the planet today. The very last batch of IP addresses was handed out to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This shortage was foreseen in the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.&lt;br /&gt;
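The arithmetic behind the shortage can be checked directly (the population figure is a rough estimate we supply for illustration):&lt;br /&gt;

```python
# Back-of-the-envelope check of the address-space figures above.
ipv4_addresses = 2 ** 32                 # 32-bit IPv4 address space
ipv6_addresses = 2 ** 128                # 128-bit IPv6 address space
world_population_2011 = 7_000_000_000    # rough estimate (our assumption)

print(ipv4_addresses)                          # 4294967296
print(ipv4_addresses < world_population_2011)  # True: fewer addresses than people
print(ipv6_addresses // ipv4_addresses)        # factor by which IPv6 is larger
```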
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.&lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although Internet addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chief among them that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to verify the identity of whoever is visiting certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking into the server to steal login information.&lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; using a typical web banking authentication system, for instance, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”; the user enters his personal login information, and the web server verifies it against what is stored in its database, then either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
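The verification step can be sketched as follows. This is a minimal illustration assuming the server stores only a salted hash of each password; all names and parameters are our own, not those of any real banking system:&lt;br /&gt;

```python
# Sketch: the server keeps a salted password hash and recomputes it on
# each login, comparing in constant time to grant or deny access.
import hashlib, hmac, os

def make_record(password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest                       # what the server stores

def verify(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)  # constant-time compare

database = {"alice": make_record("correct horse")}  # server-side store

assert verify("correct horse", *database["alice"])      # access granted
assert not verify("wrong guess", *database["alice"])    # access denied
```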
&lt;br /&gt;
Authentication systems are typically used only when users want some privacy on the web server, or when they wish to store some form of information on it.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identities secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in an abstract sense, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but also to let the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of our paper. However, such ethical arguments must be addressed prior to designing, because a system that compromises individual privacy and protection cannot be utilized.&lt;br /&gt;
&lt;br /&gt;
There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone detection, link identification, and packet filtering at binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should the system deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system has many useful applications. Its identification property can be useful for establishing a client&#039;s identity in online banking and for identifying the parties involved in an eCommerce transaction, and it can be exploited by marketers for better-targeted Web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS and DDoS attacks, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The lack of attribution on the web is mostly felt whenever security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP stamped on each moving packet, determining the geographical location of that IP, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, such as IP traceback. There are problems with trying to identify a source by its IP address: for instance, the address can be spoofed, which leads to misleading or inconclusive geographical locations, and IP addresses are not permanently bound to a single account, which makes linking an IP to the appropriate person unreliable. IP traceback could be improved, but that would require global cooperation of intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all packets received by their machines, which means users would not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are also attacks designed specifically to prevent correct attribution, used for identity theft and the distribution of malware. The stepping-stone attack is a common way of anonymizing attacks: multiple random public agents are used as stepping stones to reach the victim, in order to conceal the attacking source.&amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and a lot of questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we define high-level requirements for a good attribution system; while the definition of a good attribution system is not precise, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and human society. Practice requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency: the main requirement for an internet attribution system is simple, it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person and/or an organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for the act(s) regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but rather should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way of doing things, and many writers and movie directors exploit this idea in futuristic, science fiction, and anti-utopian plots. Unfortunately, applying such ideas to the real world today is not possible, because a lot of laws and moral principles are already in place, some of which are not perfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global; the details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is much easier to discuss the design of a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. Incrementally embedding an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act as a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (a person, organization, or network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one that should be answered. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way – for surveillance, spying, etc.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone using the network. Then, if a crime is committed and the agent of some act needs to be determined, the recorded ID can be searched for in the police or government database. Some trusted entity (a government, corporation, police force, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
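The division of information described above can be sketched as follows; every name and identifier here is hypothetical, and the warrant flag merely stands in for whatever evidence threshold the trusted entity adopts:&lt;br /&gt;

```python
# Sketch of the split: the public traceability log maps acts only to
# opaque IDs, while the trusted entity alone holds the ID-to-person
# registry, consulted only when sufficient evidence exists.
trace_log = {"act-911": "uid-42"}     # public, distributed side
registry = {"uid-42": "Jane Doe"}     # held by the trusted entity only

def attribute(act, warrant):
    uid = trace_log.get(act)
    if uid is None or not warrant:
        return None                   # no evidence: mapping stays sealed
    return registry[uid]              # disclosed only in time of need

print(attribute("act-911", warrant=False))  # None: privacy preserved
print(attribute("act-911", warrant=True))   # Jane Doe: attribution achieved
```

Note that nothing in the public `trace_log` lets anyone enumerate what a given person did, which is exactly the non-bijection property required above.&lt;br /&gt;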
&lt;br /&gt;
Of course, it is not always the case that some body trusted by everyone exists, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in times of need while not allowing either side to misuse the system.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle: &amp;quot;an act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. First, we define some terms that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (an intricate iris structure or a fingerprint) with a unique feature of an Md; for an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds a new IS to the GDDB.&lt;br /&gt;
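To make the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; concrete, the following sketch hashes a human identifier together with a device feature into a fixed-length series of bits; the exact encoding is our assumption, as the framework leaves it abstract:&lt;br /&gt;

```python
# Illustrative IS construction: digest a unique human identifier
# (e.g., a fingerprint template) together with a unique device
# feature (e.g., a MAC address) into one fixed-length stamp.
import hashlib

def identification_stamp(human_id, device_feature):
    # Concatenate the two identifiers with a separator and hash the
    # result, binding the (human, device) pair into a single value.
    material = human_id + b"|" + device_feature.encode()
    return hashlib.sha256(material).hexdigest()

fingerprint = b"\x01\x02\x03\x04"      # placeholder biometric template
mac = "00:1a:2b:3c:4d:5e"              # the Md's unique feature

stamp = identification_stamp(fingerprint, mac)
print(len(stamp))                      # 64 hex characters = 256 bits
```

The same pair always yields the same stamp, while any change to either half (a different owner or a different device) yields a different one.&lt;br /&gt;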
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. He is indirectly responsible when he is running a system in the background that performs external (over-the-internet) system calls, or that is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP, FTP, etc. Indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets probe the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions required for this framework to operate and the methodology of its operation, list the pros, cons, and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
First: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity should act as the Internet&#039;s law enforcement, deemed the primary inspector and the jurisdiction for regulating all kinds of cyber crime and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy but would suffer from a single point of failure. A distributed entity would obviously perform better, as it would scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs, and traditions.&lt;br /&gt;
&lt;br /&gt;
Second: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, acting as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric key encryption should be used to protect this system, as it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before being able to decrypt the contents or to append. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be heavily relied on for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics of the way the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate.&lt;br /&gt;
&lt;br /&gt;
Third: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and is the one held accountable if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that the network layer adds a header to each IP packet that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
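To make the three conditions above concrete, here is a minimal Python sketch of checking an &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; against the registries. All names here (check_is, chip_registry, human_registry, ownership) are illustrative assumptions, not part of the proposal itself.&lt;br /&gt;

```python
# Hypothetical sketch of the three fake-IS checks listed above.
# The GDDB is modelled as two registries plus an ownership map;
# all identifiers used here are made up for illustration.

chip_registry = {"CHIP-42"}       # known machine (Md) identifiers
human_registry = {"H-7"}          # known human identifiers
ownership = {"CHIP-42": "H-7"}    # which human owns which machine

def check_is(chip_id, human_id):
    """Return True only if the IS passes all three authenticity checks."""
    if chip_id not in chip_registry:        # case 1: imaginary Md
        return False
    if human_id not in human_registry:      # case 2: imaginary human
        return False
    if ownership.get(chip_id) != human_id:  # case 3: misleading binding
        return False
    return True
```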
&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of his launched packets. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; in a secret place and treat it exactly the same way he does his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. If a packet is found to carry no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; at all, the packet is denied &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies that the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is invalid, the packet is again dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet benefits from the &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and gets routed along its way.&lt;br /&gt;
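The router-side decision described above can be sketched as follows. The gddb_lookup callable standing in for the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; query, and the packet representation as a dict, are assumptions for illustration only.&lt;br /&gt;

```python
# Sketch of the per-packet verification step at a router, assuming a
# gddb_lookup callable that returns True for a valid IS (hypothetical API).

def route_packet(packet, gddb_lookup):
    """Drop unattributed or unverifiable packets; forward the rest."""
    is_value = packet.get("IS")
    if is_value is None:
        return "dropped: no IS"        # packet carries no IS at all
    if not gddb_lookup(is_value):
        return "dropped: invalid IS"   # GDDB says the IS is fake
    return "forwarded"                 # IS verified, packet enjoys ISS
```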
&lt;br /&gt;
==Pros==&lt;br /&gt;
&lt;br /&gt;
* It achieves an acceptable level of attribution relative to that achieved in the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone; it is only available to trusted entities.&lt;br /&gt;
** Hence, it retains personal privacy.&lt;br /&gt;
* The system enjoys full automation. According to the system&#039;s theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; is either provided or withheld based on the validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crimes that would be executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
==Cons==&lt;br /&gt;
&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively complex.&lt;br /&gt;
* Since attribution information is not public, custom content generation cannot be achieved.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used. Normally, in these cases, many &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.&lt;br /&gt;
&lt;br /&gt;
==Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly the same way they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attack can alter the database, the attacker can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If the attack can read it, the attacker can declare his malicious packets to be under the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, i.e., forgery.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any packet in transit is moving only because it is known whom it belongs to. Recall that in the real world, if a person doesn&#039;t have an identity (like a social insurance number), he can&#039;t benefit from services: he can&#039;t open a bank account, buy a house, or trade, nor can he even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Current Internet attribution, in comparison to real-world attribution, can be considered a failure. A form of Internet attribution would be basically acceptable if it, at the least, provides as much attribution as the real world does. We argue that the proposed framework would guarantee such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-on trusted entity. The framework likewise fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime was committed. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, especially since the cost of losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily determined. As for the practice requirements, the proposed framework&#039;s theory of operation doesn&#039;t permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, all traceability information is impossible to collect in one place. The trusted entities are the only ones that generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists any change at first sight. In 1769, Jonathan Holguinisburg finalized the invention of the first Cugnot steam trolley&amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, which evolved into today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after Holguinisburg&#039;s invention. Licensing started when people began realizing that a car could act as a lethal weapon, which therefore must be approved by the government before a person may drive it, and must also be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may blindly deny, refuse and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast, exponentially growing number of applications that are time- and data-sensitive and leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry and technology is now built on the Internet as the underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving that level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9442</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9442"/>
		<updated>2011-04-11T22:53:19Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Requirements for internet attribution system */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Many of those works revolve around the notion of using machine learning to link articles to humans. Others proposed text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, coupled with bad intentions, entices criminals to commit a range of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is an entity with the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution, namely resolving the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we&#039;re only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we&#039;re going to refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and getting off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, though mostly to address identification issues as they pertained to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve.&lt;br /&gt;
This section gives an introduction to three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping-cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header part of the packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which is saved by the browser and resent on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain subject to the user&#039;s control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all.&lt;br /&gt;
&lt;br /&gt;
Cookies may or may not have an expiration date, the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let you set how long cookies are stored.&lt;br /&gt;
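The first-visit/returning-visit exchange described above can be sketched server-side as follows. The Cookie and Set-Cookie header names follow HTTP; the session-id scheme and function name are assumptions for illustration.&lt;br /&gt;

```python
# Server-side sketch of the cookie exchange: issue a cookie on first
# contact, recognize the browser on later requests.

def handle_request(headers, issued):
    """headers: dict of request headers; issued: set of ids we handed out."""
    cookie = headers.get("Cookie")
    if cookie is None or cookie not in issued:
        new_id = "session-%d" % (len(issued) + 1)
        issued.add(new_id)             # remember what we handed out
        return {"Set-Cookie": new_id}  # first visit: set a cookie
    return {}                          # returning visitor: nothing to set
```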
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
If we consider cookies as the type of attribution system we are looking for over the Internet, we can achieve high precision in identifying the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), each responsible for allocating IP address blocks to the ISPs in its assigned region, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online, and a growing number of devices owned by those users going online; one of the more common examples is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2^32 addresses (4,294,967,296), fewer than the number of people on this planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.&lt;br /&gt;
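The address-space arithmetic behind the paragraph above is a one-liner worth making explicit:&lt;br /&gt;

```python
# Size of the IPv4 and IPv6 address spaces, from the bit widths above.
ipv4_space = 2 ** 32     # 32-bit addresses
ipv6_space = 2 ** 128    # 128-bit addresses

print(ipv4_space)                 # 4294967296, fewer than the world population
print(ipv6_space // ipv4_space)   # IPv6 multiplies the space by 2**96
```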
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. There are two main advantages to dynamic addressing: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the limited address space by allowing many devices to &amp;quot;share&amp;quot; a single address if they go online at different times. Given the limited address space the ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.&lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to be sure of the identity of whoever is visiting some of its pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking into the server to steal login information.&lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple. Using a typical web-banking authentication system for instance, the process may go as follows. A user requests a web account, or one is automatically assigned to him. The user sets up a password for accessing the account. When the user later visits the website, he is asked to &amp;quot;identify himself&amp;quot;; the user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
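The verification step above is commonly implemented without ever storing the raw password; here is a minimal sketch using a salted hash. This is one standard way of doing it (PBKDF2 from Python&#039;s standard library), not the method any particular web server uses.&lt;br /&gt;

```python
# Sketch of password verification: the server keeps only a salted hash
# and compares hashes on login, never the plaintext password.
import hashlib
import hmac
import os

def store_password(password):
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest   # what the web server keeps in its database

def verify(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)  # constant-time compare
```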
&lt;br /&gt;
Authentication systems are only ever used when users want some privacy on the web server, or when they wish to store some form of information on it.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because personal privacy is compromised when one tries to achieve strong attribution. Any system must find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but rather to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly outside the scope of our paper. However, such ethical arguments must be addressed prior to designing, because a system that compromises individual privacy and protection cannot be utilized.&lt;br /&gt;
&lt;br /&gt;
There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone identification, link identification and packet filtering at linking packets to agents? How much can intermediate systems&#039; cooperation contribute to achieving attribution? How do we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system has many useful applications. Its identification property can be useful for establishing a client&#039;s identity in online banking and for identifying the parties involved in an eCommerce transaction, and it can be taken advantage of by marketers for better-targeted Web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for its users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts, or merely deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The issue of the lack of attribution on the web mostly arises whenever security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP stamped on each moving packet, determining the geographical location of that IP, consulting the ISP covering the location and identifying the person. If an act requires strict attribution (such as checking and sending email), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, such as IP traceback, but identifying a source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, so linking an IP address to the appropriate person is not conclusive. IP traceback could be improved, but that would require global cooperation among intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets, or other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very effective. Also, it is not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are attacks designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of keeping attacks anonymous: multiple public, randomly chosen agents (the stepping stones) are used to reach the victim in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and a lot of questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we define high-level requirements for a good attribution system; while the definition of a &amp;quot;good&amp;quot; attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define a system that avoids current problems, achieves a high degree of attribution and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency, and the main requirement for an internet attribution system is simple: it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which crime and misuse are all but impossible, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because a lot of laws and moral principles are already in place; some of them are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to discuss the design of a system; it is much harder to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. Incremental deployment should also be more secure (bugs in software and mistakes in design can be fixed while still at small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns the line between dynamic adaptability and universality. Luckily, this level of depth goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (a person, organization or network) should be considerably less than its average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question to be answered is &amp;quot;who did act X?&amp;quot;. This could be considered part of the requirement about not violating current laws and moral principles, but we state it as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, e.g. for surveillance or spying.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign a unique ID to everyone using the network. In case a crime is committed and the agent of some act needs to be determined, the recorded ID is looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good institution, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of the information in one place.&lt;br /&gt;
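The separation described above can be sketched as follows. This is a minimal illustrative model, not part of the paper: the class names and the SHA-256-derived pseudonym are our own assumptions. The network-side log is keyed by action, so it supports the query of who did act X but exposes no interface for listing what a given person did, while the ID-to-name mapping lives only with the trusted entity.&lt;br /&gt;

```python
import hashlib

class TrustedEntity:
    """Holds the only copy of the ID-to-name mapping (e.g., a government registry)."""
    def __init__(self):
        self._id_to_name = {}

    def register(self, name):
        # Derive an opaque unique ID; SHA-256 is an illustrative choice.
        uid = hashlib.sha256(name.encode()).hexdigest()[:16]
        self._id_to_name[uid] = name
        return uid

    def reveal(self, uid, has_warrant):
        # The real name is disclosed only with sufficient legal motivation.
        return self._id_to_name.get(uid) if has_warrant else None

class TraceabilityLog:
    """Network-side log: maps actions to IDs, never the reverse."""
    def __init__(self):
        self._action_to_id = {}

    def record(self, action, uid):
        self._action_to_id[action] = uid

    def who_did(self, action):
        # The only supported query: "who did act X?"
        return self._action_to_id.get(action)
    # Deliberately no method answering "what did person X do?":
    # the log is keyed by action, and the interface never exposes
    # an enumeration of one ID's actions.

# Usage
authority = TrustedEntity()
log = TraceabilityLog()
uid = authority.register("Alice")
log.record("sent packet #42", uid)
assert log.who_did("sent packet #42") == uid
assert authority.reveal(uid, has_warrant=False) is None   # privacy preserved
assert authority.reveal(uid, has_warrant=True) == "Alice" # only when justified
```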
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in times of need while not allowing either side to misuse the system.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle that &amp;quot;an act can neither use network resources nor be routed if it is bound anonymously&amp;quot;. We start by defining some terminology that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md; for an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of granting intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds new ISs to the GDDB.&lt;br /&gt;
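The IS defined above binds a human identifier to a device feature, but the paper leaves the exact encoding open. One plausible sketch, under our own assumption that the two components are concatenated and hashed, is:&lt;br /&gt;

```python
import hashlib

def make_identification_stamp(biometric_id, device_feature):
    """Bind a unique human identifier (e.g., a fingerprint template)
    to a unique device feature (e.g., a NIC's MAC address) into one IS.
    The concatenate-and-hash binding is an illustrative assumption."""
    binding = f"{biometric_id}|{device_feature}".encode()
    return hashlib.sha256(binding).hexdigest()

# Usage: the same (human, device) pair always yields the same IS,
# while changing either component yields a different stamp.
is_a = make_identification_stamp("fingerprint-template-A", "00:1B:44:11:3A:B7")
is_b = make_identification_stamp("fingerprint-template-A", "00:1B:44:11:3A:B8")
assert is_a != is_b
assert is_a == make_identification_stamp("fingerprint-template-A", "00:1B:44:11:3A:B7")
```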
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP or RTP. He is indirectly responsible when he is running a background system that performs external (over-the-internet) system calls or is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones: e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, ICMP packets seek and identify the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, and not any &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions required for this framework to operate and the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
First: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity acts as the Internet&#039;s law enforcement and is deemed the primary inspector and the jurisdiction for regulating all kinds of cyber crime and misbehavior. It may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs and traditions.&lt;br /&gt;
&lt;br /&gt;
Secondly: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, acting as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should be used to protect the system, as it is accessed by only two types of users: routers, which should be able to access the database only for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it for read/write operations. Both types of users must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be heavily relied on for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics, as well as the way the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate with one another.&lt;br /&gt;
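The access-control split just described (read-only routers, read/write trusted entity, both strictly authenticated) can be modelled in miniature. This is a toy sketch under our own assumptions: the class shape, the HMAC-challenge authentication and the key names are illustrative, and the encryption-at-rest and distribution aspects are elided.&lt;br /&gt;

```python
import hmac
import hashlib

class GDDB:
    """Toy model of the GDDB: routers hold a read-only key, the trusted
    entity holds the read/write key. Encryption at rest and replication,
    both required by the paper's assumption, are elided here."""
    def __init__(self, router_key, entity_key):
        self._keys = {"read": router_key, "write": entity_key}
        self._stamps = set()

    def _authenticate(self, role, challenge, tag):
        # An HMAC over a fresh challenge proves possession of the role's key.
        expected = hmac.new(self._keys[role], challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)

    def append(self, stamp, challenge, tag):
        # Licensing: only the trusted entity may insert new ISs.
        if not self._authenticate("write", challenge, tag):
            raise PermissionError("only the trusted entity may license ISs")
        self._stamps.add(stamp)

    def lookup(self, stamp, challenge, tag):
        # Routers may only read, never write.
        if not self._authenticate("read", challenge, tag):
            raise PermissionError("unauthenticated reader")
        return stamp in self._stamps

# Usage: the entity licenses a stamp; a router can look it up,
# but the router's key cannot authorize an append.
db = GDDB(router_key=b"router-secret", entity_key=b"entity-secret")
ch = b"nonce-1"
write_tag = hmac.new(b"entity-secret", ch, hashlib.sha256).digest()
db.append("IS-123", ch, write_tag)
read_tag = hmac.new(b"router-secret", ch, hashlib.sha256).digest()
assert db.lookup("IS-123", ch, read_tag)
```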
&lt;br /&gt;
Thirdly: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and he would be the one accused if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine, i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot; when, in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving power behind delivering all packets, malicious or benign, they bear a great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of his launched packets. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; secret and treat it exactly the way he treats his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; has not been inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. If the packet has no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is denied &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies that the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is invalid, the packet is likewise dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet then benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is routed on its way.&lt;br /&gt;
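The per-hop decision just described (no IS: drop; IS unknown to the GDDB: drop; verified IS: route) can be sketched as follows. The function and field names are our own illustrative assumptions, and the GDDB read interface is stubbed with a simple lookup callable.&lt;br /&gt;

```python
from enum import Enum

class Verdict(Enum):
    DROP = "drop"
    ROUTE = "route"

def handle_packet(packet, gddb_lookup):
    """Per-hop decision from the text: an unattributed packet is dropped,
    a packet whose IS the GDDB does not recognize is dropped, and a
    packet with a verified IS is granted ISS and routed onward.
    `gddb_lookup` stands in for the (assumed) GDDB read operation."""
    stamp = packet.get("IS")
    if stamp is None:
        return Verdict.DROP      # unattributed packet
    if not gddb_lookup(stamp):
        return Verdict.DROP      # forged or unlicensed IS
    return Verdict.ROUTE         # attributed: grant ISS

# Usage with a stub GDDB containing one licensed stamp.
licensed = {"IS-123"}
lookup = lambda s: s in licensed
assert handle_packet({"payload": b"..."}, lookup) is Verdict.DROP
assert handle_packet({"IS": "IS-999"}, lookup) is Verdict.DROP
assert handle_packet({"IS": "IS-123"}, lookup) is Verdict.ROUTE
```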
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
The proposed framework enjoys the following advantages:&lt;br /&gt;
* It achieves an acceptable level of attribution relative to the one achieved in the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone; it is only available to trusted entities.&lt;br /&gt;
** Hence, it preserves personal privacy.&lt;br /&gt;
* The system enjoys full automation: according to its theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; are provided or withheld based solely on validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crime executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
The proposed framework suffers from the following disadvantages:&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not considered easy to deploy since the assumptions are deemed relatively complex.&lt;br /&gt;
* Since attribution information is not public, customized content generation is not achievable.&lt;br /&gt;
* The large number of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used. Normally, in these cases, the &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be renewable periodically; however, this is not an easy problem.&lt;br /&gt;
&lt;br /&gt;
The proposed framework is vulnerable to:&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly the same way they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attack can alter the database, the attacker can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If the attack can read it, the attacker can choose to declare his malicious packets the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; (forgery).&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any leaping packet is moving because it is known whom it belongs to. Recall that in the real world, a person who has no identity (such as a social insurance number) cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this real-world behavior. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Indeed, current internet attribution, compared to real-world attribution, can be considered a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does, and we argue that the proposed framework would guarantee such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it does not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-upon trusted entity. The framework likewise fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves punishment decisions to the jurisdiction of the country where the crime originated. Whatever the cost of deploying the system, it should still be less than the losses due to cyber crime, even though the cost of losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily determined. As for the practice requirements, the framework&#039;s theory of operation does not permit mapping a given &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, it is impossible to collect all traceability information in one place. The trusted entities alone generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from personal data; hence, they are the only ones holding that information. To conclude, the framework satisfies all of the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists change at first sight. In 1769, Nicolas-Joseph Cugnot finished building the first self-propelled steam vehicle &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after that invention. Licensing started when people began to realize that a car could act as a lethal weapon, so its use must be approved by the government, and it must also be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may at first blindly deny, refuse and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like a driving license. Needless to say, the Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast, exponentially growing number of applications that are time- and data-sensitive and leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry and technology is now built on the Internet as its underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes waiting for the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented some formal definitions of attribution; discussed why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty of achieving that level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework that should fulfill the mentioned requirements and achieve an acceptable level of Internet attribution; the pros, cons and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9441</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9441"/>
		<updated>2011-04-11T22:52:11Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Why is it difficult to achieve attribution? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and, arguably, the scientific basis for properly functioning attribution systems is not yet defined. A lot of research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most situations; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them licensed access to the system weakens proper attribution and invites illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a large role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher-level problem, attributing all possible actions to agents, which has received comparatively little attention in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution, examining the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In Section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in Section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, like the internet. For the sake of simplicity, in this paper we will refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world, there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and getting off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed within this paper.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not a new one; it has been around for decades, but mostly in the context of identification issues pertaining to websites or Internet service providers. Many different approaches to attribution have been taken, but usually only to the extent that a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart contents, and storing site preferences. In fact, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends with the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified and even deleted completely. A user can also change their browser settings to refuse cookies entirely. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow the user to set how long cookies are stored.&lt;br /&gt;
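The cookie round trip described above can be sketched with Python&#039;s standard http.cookies module. The session_id name, its value, and the expiry date are hypothetical illustrations, not part of any particular site:

```python
from http.cookies import SimpleCookie

# Server side: create a cookie to remember a visitor (hypothetical session id).
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
response_cookie["session_id"]["expires"] = "Wed, 01 Jun 2011 00:00:00 GMT"

# The Set-Cookie header value the server would send with its response.
set_cookie_header = response_cookie["session_id"].OutputString()
print(set_cookie_header)

# Browser side: on the next request, the browser echoes the cookie back,
# and the server recovers the stored identifier.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123")
print(request_cookie["session_id"].value)
```

Note that nothing here binds the cookie to a person or even a machine; it only marks a browser profile, which is why (as argued below) cookies fall short as an attribution mechanism.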
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewed as the type of attribution system we are looking for over the Internet, cookies can identify, with high precision, the computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies do not make an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g., computers, printers, scanners) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years, there has been a growing number of users going online, and a growing number of devices per user; one of the more common examples is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2&amp;lt;sup&amp;gt;32&amp;lt;/sup&amp;gt; addresses (4,294,967,296), fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This exhaustion was foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addressing. &lt;br /&gt;
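The address-space arithmetic above can be verified in a few lines of Python using the standard ipaddress module (the example addresses are the documentation-reserved ranges, chosen arbitrarily):

```python
import ipaddress

# IPv4 uses 32-bit addresses; IPv6 uses 128-bit addresses.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128
print(ipv4_space)  # 4294967296 -- fewer than the world's population

# The stdlib ipaddress module confirms the bit widths of each version.
assert ipaddress.ip_address("192.0.2.1").max_prefixlen == 32
assert ipaddress.ip_address("2001:db8::1").max_prefixlen == 128
print(ipv6_space // ipv4_space)  # how many full IPv4 spaces fit inside IPv6
```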
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps mitigate the limited address space by allowing many devices to &amp;quot;share&amp;quot; a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee. &lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses will even foil IP traceback efforts.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to verify the identity of whoever is visiting certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which makes it a target for attackers trying to steal login information. &lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft. &lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple. Using a typical web banking authentication system as an example, the process may go as follows: a user requests a web account, or one is automatically assigned; the user sets up a password for accessing the account; when the user later visits the website, he is asked to &amp;quot;identify himself&amp;quot; and enters his personal login information; the web server verifies this information against what it has stored in its database and either grants or denies access to the user&#039;s personal page. &lt;br /&gt;
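The verification step above can be sketched as follows. This is a minimal illustration assuming the server keeps a salted password hash rather than the password itself; the register/login helper names and the PBKDF2 parameters are our own choices, not part of any particular banking system:

```python
import hashlib
import hmac
import os

# Hypothetical server-side store: user -> (salt, salted password digest).
def register(db: dict, user: str, password: str) -> None:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    db[user] = (salt, digest)

def login(db: dict, user: str, password: str) -> bool:
    if user not in db:
        return False
    salt, digest = db[user]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids leaking information via timing.
    return hmac.compare_digest(candidate, digest)

db = {}
register(db, "alice", "s3cret")
print(login(db, "alice", "s3cret"))  # access granted
print(login(db, "alice", "wrong"))   # access denied
```

The design choice worth noting is that the server can confirm a claimed identity without ever storing the password, which is exactly why a breach of such a database is less catastrophic than a breach of a plaintext store.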
&lt;br /&gt;
Authentication systems are generally only used when users want some privacy on the web server, or when they wish to store some form of information on it. &lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally value the anonymity of surfing the web, and a system like this would destroy it completely.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system beyond the technological aspects. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, both the clients and the institution will place more emphasis on attribution: such institutions want unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected precisely because no strong attribution system is in place, which allows them to distribute information (regardless of its actual usefulness) while keeping their identity secret. It is clear that no single universal set of rules can satisfy both cases. It is also clear that, in abstract terms, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for a particular case, but to allow the ratio to change dynamically depending on the case. &lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly outside the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be adopted. &lt;br /&gt;
&lt;br /&gt;
There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone detection, link identification and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should the system deal with misleading data sources hiding behind botnets or concealing identities via stepping stones? &lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system has many useful applications. Its identification property can be useful for establishing a client&#039;s identity in online banking, identifying the parties involved in an eCommerce transaction, and can even be leveraged by marketers for better-targeted web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS and DDoS attacks, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve Attribution?==&lt;br /&gt;
&lt;br /&gt;
The problem arises largely from how the Internet is designed. It has no strong identification mechanisms, which makes it relatively anonymous for its users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The lack of attribution on the web mostly becomes an issue when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Balancing security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: by examining the source IP stamped on each moving packet, locating the geographical location of that IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, such as IP traceback, but identifying a source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback could be improved, but that would require global cooperation among intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all the packets received by their machines, which means users may be unaware of malware distribution, the creation of botnets, and other actions taken by their machine without their approval, triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. It is also not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are attacks designed specifically to prevent correct attribution, used for identity theft and the distribution of malware. The stepping-stone attack is a common way of making attacks anonymous by using multiple random public hosts (as stepping stones) to reach the victim, thereby concealing the attacking source.&amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP &#039;95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. While the definition of a good attribution system is not precise, we take into account everything discussed above: the following requirements try to define a system that avoids current problems, achieves a high degree of attribution, and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and human society. Practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency, and the main requirement for an internet attribution system is simple: it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way of doing things, and many writers and film directors exploit this idea in futuristic, science fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easier to discuss the design of a system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will carry a lot of pressure, the internet should not depend on it entirely; the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once: incremental deployment should also be more secure (bugs in software and mistakes in design can be fixed while still on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it in an illegal way. A big decision designers will have to make is where to draw the line between dynamic adoptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one that should be answered. This could be considered part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way, such as for surveillance or spying.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone who uses the network. Then, if a crime is committed and the agent of some act needs to be determined, the recorded ID can be searched for in a police or government database. Some trusted entity (a government, corporation, police force, public-good system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
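The separation described above, a pseudonymous traceability record on one side and an identity mapping held only by the trusted entity on the other, can be sketched as follows. All IDs, names, acts, and the warrant flag are hypothetical illustrations, not part of the paper&#039;s design:

```python
# Traceability database: distributed, records acts against opaque IDs only.
traceability_db = {"act-42": "ID-7f3a"}

# Identity mapping: held solely by the trusted entity (government, police, etc.).
trusted_entity_db = {"ID-7f3a": "Jane Doe"}

def attribute(act: str, warrant_granted: bool):
    """Answer 'who did act X?' -- and only that direction of the mapping."""
    agent_id = traceability_db.get(act)
    if agent_id is None:
        return None               # act was never recorded
    if not warrant_granted:
        return agent_id           # attribution stops at the opaque ID
    return trusted_entity_db[agent_id]  # identity revealed only with cause

print(attribute("act-42", warrant_granted=False))  # opaque ID only
print(attribute("act-42", warrant_granted=True))   # real identity
```

Because neither store alone links acts to names, collecting all the information in one place (the misuse the requirement forbids) requires compromising both independently.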
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in times of need while preventing either side from misusing the system.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework operates under the core principle: &amp;quot;an act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. First, we define some terms that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (e.g., iris structure or fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that offers relatively fast retrieval and update. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds new ISs to the GDDB.&lt;br /&gt;
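One purely illustrative way to realize an IS is to hash the concatenation of the human identifier and the device feature. The paper does not prescribe a concrete encoding, so the function name, the separator, and the SHA-256 choice below are all assumptions:

```python
import hashlib

def make_is(human_id: bytes, device_feature: bytes) -> str:
    """Hypothetical Identification Stamp: digest binding a human's unique
    identifier (e.g., a fingerprint template) to a unique device feature
    (e.g., a MAC address)."""
    return hashlib.sha256(human_id + b"|" + device_feature).hexdigest()

stamp = make_is(b"fingerprint-template-of-owner", b"00:1a:2b:3c:4d:5e")
print(stamp)      # a fixed-length stamp, stable for the same (human, device) pair
print(len(stamp)) # 64 hex characters for SHA-256
```

A digest construction like this keeps the stamp the same length regardless of inputs and makes it deterministic for a given Ag, two properties the GDDB lookup would presumably need; whether the raw biometric should enter the hash directly is a separate design question the paper leaves open.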
&lt;br /&gt;
In principle, every traveling packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP or VoIP. Indirect responsibility arises when a user runs a background system that performs external (over the internet) system calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets seek to identify the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks, following the IEEE standard definitions of PAN, LAN, MAN or WAN topologies, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections state the assumptions required for this framework to operate, describe the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
First: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity should act as the Internet&#039;s law enforcement, serving as the primary inspector and the jurisdiction for regulating all kinds of cyber crimes and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy, but would suffer from a single point of failure. A distributed entity would obviously perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs and traditions.&lt;br /&gt;
&lt;br /&gt;
Second: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, acting as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should be used to protect this system, as it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it for read/write operations. Both users must be strictly authenticated in order to decrypt the contents or to append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be relied on heavily at every hop a packet makes through the Internet&#039;s intermediate systems. A standardization protocol would be required to define the syntax and semantics of the way the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate.&lt;br /&gt;
&lt;br /&gt;
Third: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and would be the one accused if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many; that is, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
In essence, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine, i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot; when, in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
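The three conditions above can be expressed as a single predicate. This is a sketch under our own assumptions: the parameter names and the idea of representing the GDDB&#039;s knowledge as simple sets and a dictionary are ours, not the paper&#039;s.

```python
# Sketch of the three fake-IS conditions listed above.
# known_machines, known_humans and ownership stand in for GDDB lookups.

def is_fake(chip_id, human_id, known_machines, known_humans, ownership):
    if chip_id not in known_machines:       # imaginary Md
        return True
    if human_id not in known_humans:        # imaginary human
        return True
    if ownership.get(chip_id) != human_id:  # misleading human-to-machine binding
        return True
    return False

ownership = {"chip-X": "human-Y"}
assert not is_fake("chip-X", "human-Y", {"chip-X"}, {"human-Y", "human-Z"}, ownership)
assert is_fake("chip-X", "human-Z", {"chip-X"}, {"human-Y", "human-Z"}, ownership)
```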
&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packet that is not fully attributed. Since they are the main driving force behind the delivery of every packet, malicious or benign, they bear a great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the headers of the packets he launches. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; in a secret place and treat it exactly the same way he treats his credit card and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
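The licensing step can be sketched as follows. The paper does not specify how an IS is generated, so the hash-based derivation below, and every name in it, is purely our own assumption for illustration.

```python
import hashlib

# Sketch of the licensing step: the trusted entity derives an IS from the
# machine's chip identifier and the owner's identifier, records it in the
# GDDB, and hands it back to the owner. The SHA-256 derivation is an
# assumption of ours; the paper leaves IS generation unspecified.

def license_device(chip_id, human_id, gddb):
    identifier = hashlib.sha256((chip_id + ":" + human_id).encode()).hexdigest()
    gddb[identifier] = (chip_id, human_id)  # write performed by the trusted entity
    return identifier  # the owner keeps this secret, like a credit card number

gddb = {}
my_is = license_device("chip-001", "human-42", gddb)
assert my_is in gddb
```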
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. If a packet carries no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is denied &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies that the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is invalid, the packet is again dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is routed along its way.&lt;br /&gt;
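The per-hop decision above reduces to a three-way check. The sketch below assumes a dictionary-shaped packet and a lookup callable standing in for the GDDB round trip; both shapes are our own illustration.

```python
# Sketch of the per-hop check described above: no IS, or an IS the GDDB
# does not recognize, means the packet is dropped; otherwise it is routed.
# The packet and GDDB shapes are illustrative assumptions.

def handle_packet(packet, gddb_lookup):
    identifier = packet.get("IS")
    if identifier is None:
        return "drop"             # unattributed packet: no ISS
    if not gddb_lookup(identifier):
        return "drop"             # GDDB reports an invalid IS
    return "route"                # IS verified: the packet benefits from ISS

valid = {"IS-123"}
lookup = valid.__contains__
assert handle_packet({"IS": "IS-123", "payload": "hello"}, lookup) == "route"
assert handle_packet({"payload": "hello"}, lookup) == "drop"
assert handle_packet({"IS": "IS-999"}, lookup) == "drop"
```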
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
The proposed framework enjoys the following advantages:&lt;br /&gt;
* It achieves an acceptable level of attribution relative to that of the real world.&lt;br /&gt;
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone; it is accessible only to trusted entities.&lt;br /&gt;
** Hence, it retains personal privacy.&lt;br /&gt;
* The system is fully automated: according to its theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; is granted or withheld based on validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents the forms of cyber crime that are executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
The proposed framework suffers from the following disadvantages:&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively complex.&lt;br /&gt;
* Since attribution information is not public, custom content generation is not achievable.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before use. Normally, in these cases, each &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be renewed periodically; however, this is not an easy matter.&lt;br /&gt;
&lt;br /&gt;
The proposed framework is vulnerable to:&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they must be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly as they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attack succeeds in altering the database, the attacker can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If it succeeds in reading the database, the attacker can declare his malicious packets to be the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, i.e., commit forgery.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any traveling packet moves only because it is known whom it belongs to. Recall that in the real world, a person who does not have an identity (such as a social insurance number) cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this real-world behavior. Of course, the real world is not ideal at criminal tracing and law enforcement; however, its level of attribution would, at present, definitely beat that of the Internet. Current Internet attribution, compared to real-world attribution, can be considered a failure. A form of Internet attribution would be considered acceptable if it provides at least as much attribution as the real world does, and we argue that the proposed framework would guarantee that level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-upon trusted entity. The framework likewise fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the offender&#039;s source country. Whatever the cost of deploying this system, it should still be less than the cost of the losses due to cyber crimes, especially since the losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily determined. As for the practice requirements, the framework&#039;s theory of operation does not permit mapping a given &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, all traceability information cannot be collected in one place. The trusted entities are the only ones that generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from the personal data, and hence the only ones holding that piece of information. To conclude, the framework successfully satisfies all of the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists any change at first sight. In 1769, Jonathan Holguinisburg finalized the invention of the first Cugnot steam trolley&amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p. 14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false&amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after Holguinisburg&#039;s invention. Licensing started when people began to realize that a car could act as a lethal weapon, and that it therefore must be approved by the government before a person may drive it, and must also be formally linked to an owner who bears primary responsibility for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may at first blindly deny, refuse and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like a driving license. Needless to say, the Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast and exponentially growing number of applications that are time- and data-sensitive, and that leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry now builds its applications on the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented formal definitions of attribution, explained why it is crucial to attribute, discussed what level of attribution would be considered acceptable, and examined where the roots of the difficulty in achieving that level lie. Moreover, we have provided background on current attribution systems with a brief discussion of the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework that should fulfill the mentioned requirements and should have the ability to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9440</id>
		<title>Internet Attribution: Between Privacy and Cruciality</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Internet_Attribution:_Between_Privacy_and_Cruciality&amp;diff=9440"/>
		<updated>2011-04-11T22:51:52Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* The attribution dilemma */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these efforts revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the Internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the Internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the Internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, Internet attribution is a highly sensitive field that occupies a cornerstone position within Internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most situations; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this context, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness over the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the Internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the allure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the Internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution, the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the Internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the Internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the Internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify cyber attackers and people with malicious intent.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and going scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed within it.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, though mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches towards attribution have been taken, but mostly only to the extent of what a particular system aims to achieve.&lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it takes it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, or even deleted completely. It is also possible for a user to change the browser settings to not accept cookies at all.&lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date, which is the date at which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set how long cookies are stored.&lt;br /&gt;
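The set-and-echo round trip described above can be sketched with Python&#039;s standard http.cookies module. The cookie name, value and lifetime below are arbitrary examples of ours.

```python
from http.cookies import SimpleCookie

# Sketch of the cookie round trip: the server sets a cookie in a response
# header, the browser stores it and sends it back on the next request.

# Server side: create a cookie for the response headers.
response = SimpleCookie()
response["session_id"] = "abc123"
response["session_id"]["max-age"] = 3600  # lives one hour; without an expiry
                                          # it dies when the browser closes
set_cookie_header = response.output(header="Set-Cookie:")

# Browser side: parse the header and echo the cookie back on later requests.
stored = SimpleCookie()
stored.load(set_cookie_header.replace("Set-Cookie:", "", 1))
assert stored["session_id"].value == "abc123"
```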
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewing cookies as the type of attribution system we are looking for over the Internet, we could achieve high precision in identifying the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies do not make an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online, and a growing number of devices per user; one of the more common examples is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2&amp;lt;sup&amp;gt;32&amp;lt;/sup&amp;gt; addresses (4,294,967,296), fewer than the number of people on this planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011&amp;lt;ref&amp;gt;http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&amp;lt;/ref&amp;gt;. This exhaustion was foreseen in the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.&lt;br /&gt;
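The address-space arithmetic behind this exhaustion is easy to check directly:

```python
# IPv4's 32-bit space versus IPv6's 128-bit space.

ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

assert ipv4_space == 4_294_967_296   # the 4,294,967,296 figure quoted above
# World population already exceeds the IPv4 space; IPv6 is astronomically larger.
assert ipv6_space == ipv4_space ** 4
```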
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer a static IP address for a higher fee.&lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to be sure of the identity of whoever visits certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking into the server to steal login information.&lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information to set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple. Using a typical web-banking authentication system as an example, the process may go as follows: a user requests a web account, or one is automatically assigned, and the user sets up a password for accessing it. When the user later visits the website, he is asked to “identify himself”; he enters his personal login information, and the web server verifies it against what is stored in its database and either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
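The verify-against-the-database step above can be sketched as follows. This is a minimal sketch under our own assumptions: real systems use slow, purpose-built password hashes, while this version uses salted SHA-256 for brevity, and all the names are illustrative.

```python
import hashlib
import hmac
import os

# Sketch of the registration/verification flow described above.
# Salted SHA-256 is a simplification; production systems use slow hashes.

def register(database, username, password):
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).digest()
    database[username] = (salt, digest)   # the server stores only salt + hash

def login(database, username, password):
    if username not in database:
        return False                      # deny: unknown account
    salt, digest = database[username]
    attempt = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(attempt, digest)  # grant or deny access

db = {}
register(db, "alice", "s3cret")
assert login(db, "alice", "s3cret")
assert not login(db, "alice", "wrong")
```

Storing only a salted hash, rather than the password itself, limits the damage of the server breaches the surrounding text warns about.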
&lt;br /&gt;
Authentication systems are used only when users want some privacy on the web server, or when the user wishes to store some form of information on the web server.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and, as such, are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to identify you as an attacker. Even a user randomly searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The Attribution Dilemma=&lt;br /&gt;
&lt;br /&gt;
There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider the issue of privacy, because personal privacy is compromised when trying to achieve strong attribution. Any system must find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, both the clients and the institution place more emphasis on attribution: such institutions want unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to let the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of our paper; however, such ethical arguments must be addressed prior to designing, because a system that compromises individual privacy and protection cannot be deployed.&lt;br /&gt;
&lt;br /&gt;
There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification and packet filtering at wedging packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How does one deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
An attribution system has many useful applications. The identification property can be useful for establishing a client&#039;s identity for online banking and for identifying the parties involved in an e-commerce transaction, and it can be taken advantage of by marketers for more targeted Web advertisements.&lt;br /&gt;
&lt;br /&gt;
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The problem arises largely from how the Internet is designed: it does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
 &lt;br /&gt;
The issue of the lack of attribution on the web mostly arises when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each packet, determining the geographical location of this IP, consulting the ISP covering the location and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used. &lt;br /&gt;
There are many existing methods that attempt to identify the source of an act, like IP traceback. But there are problems with trying to identify a source by its IP address. For instance, it can be spoofed, which leads to misleading or inconclusive geographical locations. IP addresses are also not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback could be improved, but that would require global cooperation of intermediate systems, which currently does not exist.&lt;br /&gt;
&lt;br /&gt;
In networks, users are not aware of all packets that are received by their machines, which means users would not be aware of malware distribution, the creation of botnets and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet. &lt;br /&gt;
&lt;br /&gt;
There are also attacks designed specifically to prevent correct attribution, used for identity theft and the distribution of malware. The stepping-stone attack is a common way of achieving anonymity by using multiple random public agents (as stepping stones) to reach the victim, concealing the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and a lot of questions to answer, or at least try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; while the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated those requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society; practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency, and the main requirement for an internet attribution system is simple: it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person and/or organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but rather should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the one responsible, but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because a lot of laws and moral principles are already in place; some of them are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we&#039;re looking for should take legal and moral issues into account and, naturally, should not violate and/or contradict any of them. This important requirement goes together with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easier to discuss the design of a system than it is to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely; the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. This incremental way of embedding an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation for different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Luckily, this level of depth goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should not be possible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one that should be answered. This could be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way – for surveillance, spying, etc.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put people&#039;s names into the traceability database. It makes much more sense to assign unique IDs to everyone using the network. In case a crime is committed and the agent of some act needs to be determined, the recorded ID is then searched for in the police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
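The ID-versus-name separation described above can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the class and function names, the warrant flag and the in-memory structures are stand-ins for whatever real databases and legal process would be used.&lt;br /&gt;

```python
class TrustedEntity:
    """Holds the ID -to- real-name mapping; nobody else has it."""
    def __init__(self):
        self._id_to_name = {}

    def register(self, uid, real_name):
        self._id_to_name[uid] = real_name

    def reveal(self, uid, warrant):
        # The mapping is disclosed only with sufficient evidence/motivation,
        # modeled here (an assumption) as a boolean warrant flag.
        return self._id_to_name.get(uid) if warrant else None

# Public side: the traceability log binds acts to opaque IDs only.
traceability_log = []

def record_act(act, uid):
    traceability_log.append((act, uid))

entity = TrustedEntity()
entity.register("uid-4711", "Alice Example")
record_act("sent packet to host X", "uid-4711")

# "Who did act 0?" yields only an opaque ID; the name needs the trusted entity.
_, uid = traceability_log[0]
print(entity.reveal(uid, warrant=False))  # None: no justification, no name
print(entity.reveal(uid, warrant=True))   # the registered real name
```

Note that the public log alone can never answer &amp;quot;what did person X do?&amp;quot;, since it contains no names, which is the non-bijection property argued for above.&lt;br /&gt;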
&lt;br /&gt;
Of course, it is not always the case that some body trusted by everyone exists, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while not allowing either side to misuse the system.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle: &amp;quot;an act can neither use network resources nor be routed if it is not bound to an agent&amp;quot;. We start by defining some terminology that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a homemade chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (such as the intricate structure of the iris, or a fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table that has relatively fast retrieval and update capabilities. It will be used to store ISs.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of granting intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds a new IS to the GDDB.&lt;br /&gt;
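As a hedged illustration of how an IS might bind the two identifiers above, the following Python sketch hashes a human identifier together with a device feature. The concatenate-and-hash construction, the separator and the function name are our own assumptions, not part of the framework&#039;s specification.&lt;br /&gt;

```python
import hashlib

def make_identification_stamp(human_id, device_id):
    """Bind a unique human identifier (e.g. a fingerprint template)
    to a unique device feature (e.g. a MAC address) in one stamp.
    Illustrative only: the real encoding of an IS is unspecified."""
    material = human_id.encode() + b"|" + device_id.encode()
    return hashlib.sha256(material).hexdigest()

# The IS uniquely identifies the Ag, i.e. the human-device pairing.
stamp = make_identification_stamp("fingerprint-template-1f2e", "00:1A:2B:3C:4D:5E")
print(len(stamp))  # 64 hex characters
```

One property this toy construction shares with the paper&#039;s IS is that the same human paired with a different device yields a different stamp, so each Ag gets its own identifier.&lt;br /&gt;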
&lt;br /&gt;
In principle, every packet in flight has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. He is indirectly responsible when he is running a system in the background that performs external (over the internet) system calls or is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP, FTP, etc. In addition, indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones, e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets probe the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
First: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity should act as the Internet&#039;s law enforcement, deemed the primary inspector and the jurisdiction for regulating all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it would be able to scale with the growth of the system&#039;s user base as well as conform to diverse regional laws, regulations, customs and traditions.&lt;br /&gt;
&lt;br /&gt;
Second: the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. We assume that a &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; is deployed, which acts as a &amp;quot;database&amp;quot; for storing &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;s. Symmetric-key encryption should be used to protect this system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it for read/write operations. Both users must be strictly authenticated before being able to decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be relied on heavily for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics as well as the way the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; subsystems communicate.&lt;br /&gt;
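The access rules for the GDDB can be sketched as role-based checks. This Python sketch is purely illustrative: the token scheme, class names and in-memory set are hypothetical stand-ins for real authentication and for the encrypted, distributed lookup table.&lt;br /&gt;

```python
# Hypothetical authentication: token -to- role mapping.
AUTH_TOKENS = {"router-token": "router", "entity-token": "trusted_entity"}

class GDDB:
    def __init__(self):
        self._stamps = set()    # stand-in for the encrypted lookup table

    def _role(self, token):
        role = AUTH_TOKENS.get(token)
        if role is None:
            raise PermissionError("authentication failed")
        return role

    def read(self, token, stamp):
        self._role(token)               # any authenticated user may read
        return stamp in self._stamps

    def append(self, token, stamp):
        # Only the trusted entity holds write access (the licensing step).
        if self._role(token) != "trusted_entity":
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)

db = GDDB()
db.append("entity-token", "is-0001")       # licensing adds the IS
print(db.read("router-token", "is-0001"))  # True: routers can verify it
```

A router presenting its token can thus verify stamps but cannot license new ones, which matches the read-only constraint stated above.&lt;br /&gt;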
&lt;br /&gt;
Third: ownership. We assume that every &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is officially owned by a human. This owner is deemed officially responsible for that &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;, and would be the one accused if his &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt; is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, a header is added at the network layer that includes the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; of the &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; owning the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. A fake &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
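The three fake-IS cases above can be expressed as a small validity check. This Python sketch is purely illustrative: the registries, the human:device stamp layout and all names are hypothetical stand-ins for the GDDB&#039;s actual contents.&lt;br /&gt;

```python
# Hypothetical registries standing in for GDDB state.
KNOWN_DEVICES = {"mac-00:1A:2B:3C:4D:5E"}
KNOWN_HUMANS = {"human-alice"}
OWNERSHIP = {"mac-00:1A:2B:3C:4D:5E": "human-alice"}  # one owner per Md

def classify_is(stamp):
    """Return 'valid' or the fake-IS case the stamp falls into."""
    human_id, _, device_id = stamp.partition(":")
    if device_id not in KNOWN_DEVICES:
        return "fake: imaginary device"      # false unique chip identifier
    if human_id not in KNOWN_HUMANS:
        return "fake: imaginary human"       # false unique human identifier
    if OWNERSHIP.get(device_id) != human_id:
        return "fake: misleading binding"    # Y does not really own X
    return "valid"

print(classify_is("human-alice:mac-00:1A:2B:3C:4D:5E"))    # valid
print(classify_is("human-mallory:mac-00:1A:2B:3C:4D:5E"))  # fake: imaginary human
```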
&lt;br /&gt;
Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a homemade device, must be licensed by the trusted entity. The trusted entity generates the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, accesses the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; to add it, and provides the user with his &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; so that he can add it to the header of his launched packets. The user should keep his unique &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; in a secret place and should treat it exactly as he treats his credit card numbers and social insurance numbers. If a device is not licensed (i.e., its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; was not inserted into the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;), it does not benefit from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies its &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. This is done by consulting the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, i.e., by sending a copy of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on the packet to the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;. If a packet is found to have no &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;, it is prevented from benefiting from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is simply dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies that the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is invalid, again, the packet is dropped. If the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; replies with success, the packet&#039;s printed &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; is verified; the packet thus benefits from &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; and is routed along its way.&lt;br /&gt;
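The per-packet decision just described can be sketched as follows, with the GDDB read modeled as a simple membership test (an assumption; the real GDDB would be an encrypted, distributed lookup):&lt;br /&gt;

```python
# Stand-in for a GDDB read; a real lookup would be remote and encrypted.
VALID_STAMPS = {"is-registered-0001"}

def route_packet(packet):
    """Router decision for one packet, modeled as a dict with an
    optional 'is' field (the IS carried in the added IP header)."""
    stamp = packet.get("is")
    if stamp is None:
        return "dropped: no IS"         # unattributed packets get no ISS
    if stamp not in VALID_STAMPS:
        return "dropped: invalid IS"    # forged or unlicensed stamps
    return "routed"                     # verified: packet benefits from ISS

print(route_packet({"payload": b"...", "is": "is-registered-0001"}))  # routed
print(route_packet({"payload": b"..."}))                              # dropped: no IS
```

The two drop branches correspond to the missing-IS and invalid-IS replies above; only a verified stamp lets the packet proceed.&lt;br /&gt;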
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
The proposed framework enjoys the following advantages:&lt;br /&gt;
* It achieves an acceptable level of attribution relative to that achieved in the real world.&lt;br /&gt;
* It avoids anonymous attacks, since a non-attributed packet will fail to reach its destination.&lt;br /&gt;
* Attribution information is not publicly available to everyone; it is only available to trusted entities.&lt;br /&gt;
** Hence, it preserves personal privacy.&lt;br /&gt;
* The system enjoys full automation. According to the system&#039;s theory of operation, &amp;lt;i&amp;gt;ISS&amp;lt;/i&amp;gt; are either provided or not based on the validation of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; printed on each packet.&lt;br /&gt;
* The system prevents all forms of cyber crime executed by unknown &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;s.&lt;br /&gt;
&lt;br /&gt;
The proposed framework suffers from the following disadvantages:&lt;br /&gt;
* The verification of the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; on each packet creates undesirable delays and potential bottlenecks at the routers.&lt;br /&gt;
* The framework is not easy to deploy, since its assumptions are relatively complex.&lt;br /&gt;
* Since attribution information is not public, custom content generation is not achievable.&lt;br /&gt;
* The large numbers of &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, many &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s would be bound to one single person.&lt;br /&gt;
* For security purposes, licenses should be periodically renewed; however, this is not an easy matter.&lt;br /&gt;
&lt;br /&gt;
The proposed framework is vulnerable to:&lt;br /&gt;
* Botnets&lt;br /&gt;
** The system requires full user awareness of what lies under the hood. Since users are solely responsible for their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the subsequent formation of botnets.&lt;br /&gt;
** Users are responsible for strictly securing their &amp;lt;i&amp;gt;Md&amp;lt;/i&amp;gt;s, exactly as they lock their car after leaving it in a car park.&lt;br /&gt;
* A successful attack on the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt; would cause whole-system failure. If the attacker succeeds in altering the database, he can append an imaginary &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt;. If the attacker succeeds in reading it, he can choose to declare his malicious packets to be the responsibility of some other &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; (forgery).&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any packet in flight is moving because it is known to whom it belongs. Recall that in the real world, if a person doesn&#039;t have an identity (like a social insurance number), he can&#039;t benefit from services: he can&#039;t open a bank account, buy a house, or trade, nor can he even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. We can say that current internet attribution, in comparison to real-world attribution, is a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would guarantee such a level.&lt;br /&gt;
&lt;br /&gt;
The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. As can be seen, the more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime was committed. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, although the cost of losses due to unknown &amp;quot;future&amp;quot; attacks cannot be easily determined. As for the practice requirements, the proposed framework&#039;s theory of operation doesn&#039;t permit mapping a certain &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt; to a set of actions; it only permits mapping a set of actions to an &amp;lt;i&amp;gt;Ag&amp;lt;/i&amp;gt;, which satisfies non-bijection. Also, because of the distributed nature of the &amp;lt;i&amp;gt;GDDB&amp;lt;/i&amp;gt;, it is impossible to collect all traceability information in one place. The trusted entities are the only ones that generate the &amp;lt;i&amp;gt;IS&amp;lt;/i&amp;gt; from personal data; hence, they are the only ones holding that piece of information. To conclude, the framework satisfies all the requirements.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot finalized the invention of the first steam trolley &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p. 14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false &amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after the invention. Licensing started when people began realizing that a car could act as a lethal weapon, and that driving one must therefore be approved by the government, with the car formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. People may blindly deny, refuse and object to such &amp;quot;wicked&amp;quot; attribution systems, but later on, Internet licensing will be part of everyone&#039;s life, just like a driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the &amp;quot;blood&amp;quot; of a vast, exponentially growing number of applications that are time- and data-sensitive and leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry and technology is now built over the Internet as its underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is thus no longer an add-on, but an obligation.&lt;br /&gt;
&lt;br /&gt;
In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and have the ability to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9264</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9264"/>
		<updated>2011-04-11T03:39:27Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Much of it revolves around the notion of using machine learning for linking articles to humans. Other work has proposed text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are they always applicable; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior as well as remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining users&#039; privacy.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is sadly neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution, resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, together with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, attribution of an act to an agent (software, device, etc.) and then attribution of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90&#039;s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and going scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology within it. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertained to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences; in actuality, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header part of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached to it, it takes it as the first access that the browser is making to the server and sends one as part of the response, which will be saved by the browser and resent on the next request. Cookies are usually encrypted for data security and information privacy; however, they are still subject to the user&#039;s control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
Cookies may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to set how long cookies are stored.&lt;br /&gt;
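&lt;br /&gt;
The round trip described above can be sketched as follows; this is a minimal illustration with hypothetical names, not a real HTTP implementation (real servers and browsers exchange the Set-Cookie and Cookie headers):&lt;br /&gt;

```python
# Minimal sketch of the cookie round trip (hypothetical names; real
# servers and browsers use the Set-Cookie / Cookie HTTP headers).

def server_handle(request, issued):
    """Return (response_headers, visitor_id) for one incoming request."""
    cookie = request.get("Cookie")
    if cookie is None:
        # First access from this browser: mint an identifier and ask
        # the browser to store it for future requests.
        visitor_id = "visitor-%d" % (len(issued) + 1)
        issued.add(visitor_id)
        return {"Set-Cookie": "id=" + visitor_id}, visitor_id
    # Returning visit: the browser resent the cookie automatically.
    return {}, cookie.split("=", 1)[1]

issued = set()
headers, vid1 = server_handle({}, issued)                      # first visit
headers2, vid2 = server_handle({"Cookie": headers["Set-Cookie"]}, issued)
```

Note how the server recognizes the second request as the same visitor only because the browser volunteered the cookie, which is exactly why deleting it defeats this form of attribution.&lt;br /&gt;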
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Using cookies as the type of attribution system we are looking for over the Internet, we would be able to identify computers that access a web server with high precision. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies do not make an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to their assigned regions&#039; ISPs, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online, and a growing number of devices per user; one of the more common examples of this is the increase in Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2&amp;lt;sup&amp;gt;32&amp;lt;/sup&amp;gt; addresses (4,294,967,296), less than the number of people on this planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011 [1]. This had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses. &lt;br /&gt;
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot-up; a Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the issue of limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space the ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee. &lt;br /&gt;
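&lt;br /&gt;
The arithmetic behind the address shortage, and the familiar dotted-quad rendering of a 32-bit address, can be sketched as follows (the helper name is our own):&lt;br /&gt;

```python
# Sketch of the 32-bit IPv4 address space discussed above.
IPV4_BITS = 32
address_space = 2 ** IPV4_BITS   # 4,294,967,296 unique addresses

def to_dotted_quad(n):
    """Render a 32-bit integer in the familiar a.b.c.d notation."""
    return ".".join(str((n >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```

For example, the integer 0xC0A80001 renders as 192.168.0.1, one address out of the roughly 4.3 billion available, which is why dynamic sharing of addresses became standard practice.&lt;br /&gt;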
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although Internet addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to verify the identity of whoever is visiting some pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking into the server to steal login information. &lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft. &lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple. Using a typical web banking authentication system as an example, the process may go as follows: a user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later goes to the website, he is asked to &amp;quot;identify himself&amp;quot;; the user enters his personal login information, and the web server verifies this information against what it has stored in its database, then either grants or denies access to the user&#039;s personal page. &lt;br /&gt;
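&lt;br /&gt;
The verify step can be sketched as follows; this is one illustrative approach using a salted hash (PBKDF2), and we do not claim any particular bank implements it exactly this way:&lt;br /&gt;

```python
# Illustrative verify step: the server stores a salt and a derived hash,
# never the password itself. PBKDF2 is one common choice.
import hashlib, hmac, os

def make_record(password):
    """Create the stored credential record for a new account."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 50_000)
    return salt, digest

def verify(password, record):
    """Recompute the hash and compare in constant time."""
    salt, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 50_000)
    return hmac.compare_digest(candidate, digest)

record = make_record("s3cret")
```

Storing only the salted hash limits the damage of the database theft scenario mentioned above: an attacker who steals the records still does not obtain the passwords directly.&lt;br /&gt;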
&lt;br /&gt;
Authentication systems are typically used only when users want some privacy on the web server, or when they wish to store some form of information on the web server. &lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet, and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and crucially depends on the application. For instance, large financial institutions as well as their clients are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim they are. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) and keep their identities secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in an abstract sense, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think that this question is somewhat out of the scope of our paper. This is true; however, these and many less obviously related questions should be answered prior to designing, because in something as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider the balance between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers. &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping stone authentications, link identifications and packet filtering in tying packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem we see is that the way the internet is designed makes it possible, and relatively easy, to act without compromising identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of lack of attribution on the web mostly arises whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Getting a balance between security and privacy is tricky, because once attacks are tracked, so will all other traffic be. &lt;br /&gt;
*Depending on the type of sender and receiver, a different attribution policy will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This is done by examining the source IP printed on each moving packet, finding the geographical location of this IP, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt; and hence mislead the geographical location.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems... and it is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to people, so figuring out the person from the IP is not concrete.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; to their machines, which allows for malware distribution and hence, the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can be used for avoiding that problem, but they are not 100% efficient.&lt;br /&gt;
* It is not applicable to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of achieving anonymity by using multiple public random agents (as stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and a lot of questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; while the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems yet remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as huge as the internet; and practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than useful information, and we shall not avoid this notion either: the main requirement for an internet attribution system is that it needs to attribute or, more formally, that any potentially destructive act should be traceable to an agent (person and/or organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for the act(s), regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly done by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but rather should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the one responsible, but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because a lot of laws and moral principles are already in place, some of which are not perfect, but widely accepted, and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement comes somewhat together with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be added later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely, and in case the attribution system goes down, the underlying network should still remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. This incremental way of embedding an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it in an illegal way. The big decision designers will have to make is where to draw this line between dynamic adoptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars due to attacks and other cyber-crimes done to them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.) or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should not be possible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one that should be answered. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way – for surveillance, spying, etc.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to store some unique ID for any body who uses the network; in case a crime is committed or, in general, whenever the agent of some act should be determined, the recorded ID will be searched for in a police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
&lt;br /&gt;
Of course, it is not always the case that a body trusted by everyone exists, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing misuse of the system from either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle: &amp;quot;An act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. We start by defining some terms that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an access-capable device. For a device like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his owned device. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of giving intermediate systems permission to provide ISS to all packets launched from the agent requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, an NIC, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
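&lt;br /&gt;
One hypothetical way to realize an identification stamp is to hash the human identifier together with the device feature; the construction below is purely our own assumption about how such a binding might be encoded, not part of the framework&#039;s specification:&lt;br /&gt;

```python
# Hypothetical encoding of an identification stamp (IS): hash the human
# identifier (e.g. a fingerprint template) together with the device
# feature (e.g. a MAC address). The layout is our own assumption.
import hashlib

def make_stamp(human_id_bytes, mac_address):
    """Bind a human identifier to a device feature in one digest."""
    material = human_id_bytes + mac_address.lower().encode()
    return hashlib.sha256(material).hexdigest()

stamp = make_stamp(b"\x01\x02\x03", "AA:BB:CC:DD:EE:FF")
```

A digest like this gives a fixed-size, unique-per-agent token that routers could compare against the registration database without ever seeing the raw biometric data.&lt;br /&gt;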
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions to another end system, e.g., using the client side of applications supporting the protocols HTTP, FTP, SIP, RTP, VoIP, etc. He is indirectly responsible when he is running a system in the background that performs external (over the internet) system calls (global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g., NTP, or the server side of protocols such as HTTP, FTP, etc. In addition, indirect responsibility also covers all packets launched by lower-layer protocols that are manipulated by higher-layer ones, e.g., TCP connection-initiation and handshaking packets, ICMP packets that aim to identify the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet and not other &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the topologies PAN, LAN, MAN or WAN that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions required for this framework to operate, the methodology of its operation, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it would be able to scale with the growth of the system&#039;s users as well as conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax and semantics, as well as the nature, of the way these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing identification stamps. Symmetric key encryption should be used to protect this system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both users must be strictly authenticated in order to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be heavily relied upon for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Thirdly, we assume that the ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the frame format of IP packets, a header is added by the network layer that includes the identification stamp of the packet&#039;s owner. A packet&#039;s owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A &amp;quot;fake identification stamp&amp;quot; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary device.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine, i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot; when, in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Noticeably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A chronological description of the system is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity accesses the globally distributed database (GDDB) of &amp;quot;identification stamps&amp;quot; and adds the new identification stamp of the agent that requested the license. If a device is not licensed (i.e., its &amp;quot;identification stamp&amp;quot; was not inserted into the distributed database), it does not benefit from ISS.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s IS by consulting the GDDB and sending it a copy of the IS found on the packet. If a packet is found to have no IS, it is denied the benefit of ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is again dropped. If the GDDB replies with success, the packet&#039;s printed IS is considered verified; the packet thus benefits from ISS and is routed along its way.&lt;br /&gt;
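The router-side decision procedure just described can be sketched as follows. The GDDB interface (a single verify lookup) and the return values are assumptions made for illustration.&lt;br /&gt;

```python
class Gddb:
    # Stand-in for the globally distributed database of identification stamps.
    def __init__(self, valid_stamps):
        self.valid_stamps = set(valid_stamps)

    def verify(self, stamp) -> bool:
        return stamp in self.valid_stamps

def route(packet: dict, gddb: Gddb) -> str:
    stamp = packet.get("is")            # identification stamp on the packet
    if stamp is None:
        return "dropped: no IS"         # unattributed packet
    if not gddb.verify(stamp):
        return "dropped: invalid IS"    # forged or unregistered stamp
    return "routed"                     # verified: packet benefits from ISS

gddb = Gddb({"stamp-A"})
```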
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
Obviously, the proposed framework&#039;s main focus is to ensure that any traveling packet is forwarded only because it is known to whom it belongs. If not, it is prevented from moving any further.&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors across jurisdictions.&lt;br /&gt;
* Custom content generation (not found).&lt;br /&gt;
* Public PCs (in labs, etc.): to whom are they bound?&lt;br /&gt;
* Requires users to be fully aware of their systems.&lt;br /&gt;
&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, etc.).&lt;br /&gt;
* Attribution information is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system itself, which would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists change at first sight. But consider cars: at first they required no licensing, and licensing systems were introduced afterwards. People got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&lt;br /&gt;
&lt;br /&gt;
[2] Wikipedia Website&lt;br /&gt;
&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9263</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9263"/>
		<updated>2011-04-11T03:38:26Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Authentication Systems */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, a scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proven its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most situations; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this context, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we&#039;re only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we&#039;re going to refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since the Internet&#039;s popularization in the 90&#039;s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and getting off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some basic knowledge of computer science or computer systems will be required to fully understand some of the concepts and terminology within it. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, though mostly in addressing identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s current attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart information, and storing site preferences. In actuality, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header part of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which is saved by the browser and resent on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain subject to the user&#039;s control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
Cookies may or may not have an expiration date; this is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to set how long cookies are stored.&lt;br /&gt;
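The set-and-resend cycle described above can be seen with Python&#039;s standard http.cookies module; the cookie name, value, and lifetime here are arbitrary examples.&lt;br /&gt;

```python
from http.cookies import SimpleCookie

# Server side: a first request arrives with no cookie, so the server
# creates one and sends it in the Set-Cookie response header.
cookie = SimpleCookie()
cookie["session_id"] = "abc123"
cookie["session_id"]["max-age"] = "3600"   # browser deletes it after an hour
set_cookie_header = cookie["session_id"].OutputString()

# Browser side: store the cookie and resend its value with the next request.
stored = SimpleCookie(set_cookie_header)
next_request_value = stored["session_id"].value
```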
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Treating cookies as the type of attribution system we are looking for over the Internet, we would be able to achieve high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies do not make an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online, along with a growing number of devices per user; one of the more common examples is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2&amp;lt;sup&amp;gt;32&amp;lt;/sup&amp;gt; addresses (4,294,967,296), fewer than the number of people on this planet today. The very last batch of IP addresses was handed out to the five RIRs in early February 2011 [1]. This had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system. &lt;br /&gt;
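The arithmetic behind the exhaustion argument is simple to check (the 2011 world-population figure below is an approximation used only for comparison):&lt;br /&gt;

```python
ipv4_space = 2 ** 32          # 32-bit addresses under IPv4
ipv6_space = 2 ** 128         # 128-bit addresses under IPv6
world_population_2011 = 7_000_000_000   # rough estimate

# IPv4 cannot even give one address per person ...
shortfall = world_population_2011 - ipv4_space
# ... while IPv6 is astronomically larger.
growth_factor = ipv6_space // ipv4_space
```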
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is an address permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot-up; a Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. There are two main advantages to dynamic addressing: it eliminates the administrative cost involved in assigning static IP addresses, and it helps mitigate the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee. &lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although Internet addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, chief among them that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to be sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking into the server to steal login information. &lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft. &lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; using a typical web banking authentication system, for instance, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”; the user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user&#039;s personal page. &lt;br /&gt;
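The verify-against-stored-record step can be sketched as follows. Real systems store salted password hashes rather than raw passwords; the salt size and iteration count here are illustrative choices, not a prescription from this paper.&lt;br /&gt;

```python
import hashlib, hmac, os

def hash_password(password: str, salt: bytes) -> bytes:
    # Derive a digest the server can store instead of the raw password.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Account setup: the server keeps only (salt, digest).
salt = os.urandom(16)
stored_digest = hash_password("s3cret-passw0rd", salt)

def login(password: str) -> bool:
    # Recompute the digest of the submitted password and compare in
    # constant time; grant or deny access accordingly.
    return hmac.compare_digest(hash_password(password, salt), stored_digest)
```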
&lt;br /&gt;
Authentication systems are only ever used when users want some privacy on the web server, or when the user wishes to store some form of information on the web server. &lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task, because, regardless of the technologies and/or infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and it crucially depends on the application. For instance, large financial institutions as well as their clients are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim they are. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, thereby directly intruding on their privacy. One might think this question is slightly out of the scope of our paper. This is true; however, these and many less obviously related questions should be answered prior to designing, because in something as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is very crucial, e.g., to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping stone authentications, link identifications, and packet filtering in tying packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without compromising one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so will be all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, locating the geographical location of that IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads geographical location lookups.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so figuring out the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can be used to mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
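The first bullet, spoofability, follows from the fact that the source field of an IPv4 header is just bytes the sender fills in. The sketch below builds a minimal 20-byte header (checksum left at zero, TCP as the protocol) carrying an arbitrary, forged source address; the addresses are documentation examples, not real hosts.&lt;br /&gt;

```python
import socket
import struct

def ipv4_header(src: str, dst: str) -> bytes:
    # Minimal IPv4 header, no options.  0x45 = version 4, header length 5
    # words; total length 20; TTL 64; protocol 6 (TCP); checksum left 0.
    return struct.pack("!BBHHHBBH4s4s",
                       0x45, 0, 20, 0, 0, 64, 6, 0,
                       socket.inet_aton(src),   # source: whatever we claim
                       socket.inet_aton(dst))

# Nothing in the format stops a sender from writing an address it does not own.
forged = ipv4_header("203.0.113.7", "192.0.2.1")
```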
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way for attackers to retain anonymity by using multiple public random agents (as stepping stones) to reach the victim, thereby concealing the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and a lot of questions to answer, or at least try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; while the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems yet remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as huge as the internet; and practice requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement of any system is usually stated for the sake of consistency rather than useful information, and we shall not avoid this notion either: the main requirement of an internet attribution system is that it needs to attribute or, more formally, that any potentially destructive act should be traceable to an agent (a person and/or organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly done by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should rather be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction, and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because a lot of laws and moral principles are already in place, some of which are not perfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be added later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: in case the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once; incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still at a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it in an illegal way. The big decision designers will have to make concerns this line between dynamic adoptability and universality. Luckily, this level of depth goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars due to attacks and other cyber-crimes done to them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.) or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question &amp;quot;who did act X?&amp;quot; is the one to be answered. This could be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way, such as for surveillance or spying.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to store some unique ID for each body who uses the network; if a crime is committed, or, in general, whenever the agent of some act should be determined, the recorded ID is searched for in a police or government database. It should be some trusted entity (government, corporation, police, some public-good-like system, etc.) that stores the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
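The one-way mapping and the ID/name split described above can be sketched in Python. All class and method names here are hypothetical illustrations, not part of the proposal:&lt;br /&gt;

```python
import secrets

# Sketch of the split described above: the public traceability database
# answers only "who did act X?", while a separate trusted-entity database
# holds the mapping from opaque IDs to real names and reveals it only
# with sufficient justification.

class TraceabilityDB:
    """Public side: maps acts to opaque agent IDs, never to names."""
    def __init__(self):
        self._acts = {}                      # act_id -> opaque agent ID

    def record(self, act_id, opaque_id):
        self._acts[act_id] = opaque_id

    def who_did(self, act_id):
        return self._acts.get(act_id)
    # Deliberately no method answering "what did agent X do?" --
    # the mapping is one-way by construction.

class TrustedEntityDB:
    """Trusted side: issues opaque IDs and guards the ID-to-name map."""
    def __init__(self):
        self._names = {}                     # opaque ID -> real name

    def register(self, real_name):
        opaque_id = secrets.token_hex(8)     # random, unlinkable ID
        self._names[opaque_id] = real_name
        return opaque_id

    def reveal(self, opaque_id, evidence_sufficient=False):
        if not evidence_sufficient:
            raise PermissionError("insufficient grounds to reveal identity")
        return self._names[opaque_id]
```

Because the opaque ID is random rather than derived from the name, nothing stored on the public side can be linked back to a person without the trusted entity's cooperation.&lt;br /&gt;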
&lt;br /&gt;
Of course, it is not always the case that some body trusted by everyone exists, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing either side from misusing the system.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we will propose a potential framework and argue that it is able to fulfill the requirements listed in the former section. The proposed framework works under the core principle &amp;quot;An act cannot use network resources nor can it be routed if it is anonymous&amp;quot;. Firstly, we define some terminology that will be used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (e.g., an iris pattern or fingerprint) with a unique feature of an access-capable device. For a device like a Network Interface Card, the MAC address would be that feature. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, an NIC, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
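One possible realization of an identification stamp, sketched in Python under the assumption that the biometric reading has already been reduced to a byte digest; the paper only specifies that the IS is a series of bits binding the two identifiers, so a hash of their concatenation is one simple (hypothetical) encoding:&lt;br /&gt;

```python
import hashlib

def identification_stamp(biometric_digest: bytes, mac_address: str) -> str:
    """Bind a human's biometric digest to a device's MAC address.

    Illustrative construction only: concatenate the two identifiers
    and hash them, giving a fixed-width stamp for the agent.
    """
    material = biometric_digest + b"|" + mac_address.lower().encode()
    return hashlib.sha256(material).hexdigest()
```

The same human paired with a different device yields a different stamp, which matches the definition of an agent as a human-device pairing.&lt;br /&gt;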
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. Indirect responsibility arises when a user is running a system in the background that performs external (over the internet) system calls (e.g., global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g., NTP, or the server side of protocols such as HTTP, FTP, etc. In addition, indirect responsibility also covers all packets launched by lower-layer protocols that are manipulated by higher-layer ones, e.g., TCP connection initiation and handshaking packets, ICMP packets that aim to identify the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet and not any other &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections show the assumptions under which this framework operates and the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity may either be centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it would be able to scale with the growth of the system's users as well as conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and the way these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric key encryption should be used to protect this system, as it is only accessed by two types of users: routers, which should be able to access this database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both users must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be heavily relied on for every single hop made by a packet through the Internet&#039;s intermediate systems.&lt;br /&gt;
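The two access roles can be sketched as follows; the shared-key credential check is a stand-in for whatever strict authentication and encryption the deployed system would actually use, and the class shape is a hypothetical illustration:&lt;br /&gt;

```python
class GDDB:
    """Sketch of the DNS-like store with the two roles described above:
    routers may only read, the trusted entity may only write."""

    def __init__(self, trusted_key: str, router_key: str):
        self._trusted_key = trusted_key      # write credential
        self._router_key = router_key        # read credential
        self._stamps = set()                 # stored identification stamps

    def append(self, key: str, stamp: str) -> None:
        if key != self._trusted_key:
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)

    def lookup(self, key: str, stamp: str) -> bool:
        if key != self._router_key:
            raise PermissionError("only routers may read")
        return stamp in self._stamps
```

Separating the two credentials means a compromised router cannot forge new licenses, and the trusted entity cannot quietly query who is active.&lt;br /&gt;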
&lt;br /&gt;
Thirdly, we assume that the ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the frame format of IP packets, a header is added by the network layer that includes the &amp;quot;identification stamp&amp;quot; of the packet owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A &amp;quot;fake identification stamp&amp;quot; is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary device.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
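The three cases above can be checked mechanically. The registry shapes used here (sets of known IDs plus an ownership map) are illustrative assumptions, not part of the proposal:&lt;br /&gt;

```python
def stamp_is_fake(stamp, devices, humans, owner_of):
    """Return True if `stamp` matches any of the three fake cases:
    an imaginary device, an imaginary human, or a misleading binding.

    stamp:    (device_id, human_id) pair
    devices:  set of registered device identifiers
    humans:   set of registered human identifiers
    owner_of: dict mapping device_id -> its official owner's human_id
    """
    device_id, human_id = stamp
    if device_id not in devices:                 # imaginary device
        return True
    if human_id not in humans:                   # imaginary human
        return True
    if owner_of.get(device_id) != human_id:      # misleading binding
        return True
    return False
```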
&lt;br /&gt;
Notably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity accesses the globally distributed database (GDDB) of &amp;quot;identification stamps&amp;quot; and adds the new identification stamp of the agent that requested the license. If a device is not licensed (i.e., its &amp;quot;identification stamp&amp;quot; was not inserted into the distributed database), it does not benefit from ISS.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies its IS. This is done by consulting the GDDB and sending it a copy of the IS found on the packet. If a packet is found to have no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, again, the packet is dropped. If the GDDB replies with success, the packet&#039;s printed IS is verified; thus, the packet benefits from the ISS and gets routed along its way.&lt;br /&gt;
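The router-side decision procedure just described can be sketched as a small function, with the GDDB consultation abstracted behind a callback (the function and field names are hypothetical):&lt;br /&gt;

```python
def handle_packet(packet, gddb_verify):
    """Decide a packet's fate as described above.

    packet:      dict; may carry an "identification_stamp" field
    gddb_verify: callable returning True iff the stamp is registered
    Returns ("forward", None) or ("drop", reason).
    """
    stamp = packet.get("identification_stamp")
    if stamp is None:
        return ("drop", "no IS")          # unattributed packet
    if not gddb_verify(stamp):
        return ("drop", "invalid IS")     # GDDB rejected the stamp
    return ("forward", None)              # verified: grant ISS
```

A usage sketch: with `registered = {"abc"}` and `verify = lambda s: s in registered`, a packet carrying stamp `"abc"` is forwarded, while a stampless or mis-stamped packet is dropped.&lt;br /&gt;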
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
Obviously, the proposed framework&#039;s main focus is to ensure that any leaping packet moves only because it is known whom it belongs to. If not, it is prevented from moving.&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors across jurisdictions.&lt;br /&gt;
* Custom content generation is not supported.&lt;br /&gt;
* Public PCs (in labs, etc.): to whom are they bound?&lt;br /&gt;
* Requires full awareness of users about their systems.&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (e.g., DoS, DDoS).&lt;br /&gt;
* Attribution information is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: they first appeared without any need for licensing, and licensing systems were applied afterwards. People got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9262</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9262"/>
		<updated>2011-04-11T03:37:20Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Background */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. A lot of research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Much of it revolves around the notion of using machine learning for linking articles to humans. Other work proposed text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides its users with partial anonymity. Unfortunately, that anonymity weakens security for those users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include: fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are they applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this context, we need better methodologies for reaching an acceptable success level in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent can either be a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field, but rather address a higher level of attribution of all possible actions to agents, which is sadly somewhat neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution, resolving the tension between attribution and privacy. Section 3 then argues the reasons behind the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, as well as flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution - binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we&#039;re only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword, as it not only provides a high level of privacy, but also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today’s world there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and going scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some basic knowledge of computer science or computer systems will be required in order to fully understand some of the concepts and terminology within this paper. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. A lot of different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve. &lt;br /&gt;
This section gives an introduction to three of today’s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer’s experience. Cookies are text files that are created by a web server and stored by a web browser on the user’s computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In actuality, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user’s web browser sends the request with the web server’s cookie in the header part of the packet. All this is an automated process between the web browser and the web server. &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached to it, it treats it as the first access that the browser is making to the server and sends a cookie as part of the response, which is saved by the browser and resent on the next request. Cookies are usually encrypted for data security and information privacy; however, they are still subject to the user’s control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date; this is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to configure how long cookies are stored.&lt;br /&gt;
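The request/response exchange described above can be modeled in a few lines of Python. The header names (`Cookie`, `Set-Cookie`) are the real HTTP ones; everything else here is a toy illustration:&lt;br /&gt;

```python
import secrets

class ToyServer:
    """Toy model of the cookie exchange: a request without a known
    cookie is treated as a first visit and receives a Set-Cookie
    header; a request carrying a known cookie is a return visit."""

    def __init__(self):
        self._issued = set()                 # cookies we have handed out

    def handle(self, headers):
        cookie = headers.get("Cookie")
        if cookie in self._issued:
            return {}                        # returning visitor, nothing new
        new = "session=" + secrets.token_hex(8)
        self._issued.add(new)
        return {"Set-Cookie": new}           # first visit: issue a cookie
```

A usage sketch: the first `handle({})` call returns a `Set-Cookie` header; replaying that value as a `Cookie` header on the next request is recognized, which is exactly the attribution signal (and the weakness: the client can simply discard or alter the value).&lt;br /&gt;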
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Considering cookies as the type of attribution system we are looking for over the Internet, we would be able to achieve high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user’s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to their assigned regions’ ISPs, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years there has been a growing number of users going online, and a growing number of devices owned by those users going online. One of the more common examples of this is the increase in Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it is only able to uniquely address 2&amp;lt;sup&amp;gt;32&amp;lt;/sup&amp;gt; addresses (4,294,967,296), which is less than the number of people on this planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011 [1]. This was foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system. &lt;br /&gt;
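A quick check of the address-space arithmetic quoted above (the population figure is a rough 2011 estimate used only for comparison):&lt;br /&gt;

```python
# IPv4 vs IPv6 address-space arithmetic from the paragraph above.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

assert ipv4_space == 4_294_967_296     # matches the figure quoted above
assert ipv4_space < 7_000_000_000      # fewer addresses than people in 2011
assert ipv6_space == ipv4_space ** 4   # 128 bits = 4 * 32 bits
```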
&lt;br /&gt;
IP addresses can either be static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. There are two main advantages to dynamic addressing: it eliminates the administrative cost involved in assigning static IP addresses, and it helps solve the issue of limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space the ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee. &lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, mainly that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order for a website to be sure of the identity of whoever is visiting some pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage this has is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is subject to attackers hacking into the server to steal login information. &lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft. &lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; using a typical web banking authentication system for instance, the process may go as follows. A user requests a web account or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is requested to “identify himself”: the user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user’s personal page. &lt;br /&gt;
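A sketch of the server-side verification step in this flow, assuming the server stores salted password hashes rather than raw passwords (PBKDF2 is one standard choice here, not something this section prescribes):&lt;br /&gt;

```python
import hashlib
import hmac
import os

def make_record(password: str):
    """Create the stored credential: a random salt plus a slow,
    salted hash of the password. The raw password is never kept."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash for the submitted password and compare in
    constant time against the stored digest."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

Even with hashing, the paragraph's point stands: the server still holds enough account data that a breach can expose users, which is why the next subsection questions authentication as a global attribution mechanism.&lt;br /&gt;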
&lt;br /&gt;
Authentication systems are only used when users want some privacy on the web server, or when they wish to store some form of information on the web server. &lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the Internet and, as such, are used by many companies. However, they would have a serious privacy drawback if used as a global identification system. It would mean that virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user randomly searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and/or infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. The hypothetical line between attribution and privacy is not straight, and crucially depends on the application. For instance, large financial institutions as well as their clients are interested in a strong attribution system, which would solve many authorization and authentication problems, and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, and it is possible for them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a fairly abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to make this ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, and thus directly intrude on their privacy. One might think this question is a little out of the scope of our paper. This is true; however, these and a lot of less obviously related questions should be answered prior to designing, because in something as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to operators of the system, but to users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee some law-related and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is very crucial, to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping stone authentications, link identifications and packet filtering in linking packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem we see is that the way the Internet is designed makes it possible and relatively easy to act without compromising one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of lack of attribution on the web mostly arises whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, locating the geographical location of this IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads the geographical location.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems... which is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so figuring out the person from the IP is not concrete.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; to their machines, which allows for malware distribution and hence, the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can be used for avoiding that problem, but they are not 100% efficient.&lt;br /&gt;
* It is not applicable to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
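To make the chain concrete, here is a minimal sketch of the naive attribution procedure described above; the lookup tables, names and addresses are purely illustrative assumptions, not part of any real system:&lt;br /&gt;

```python
# Hypothetical sketch of the naive attribution chain described above.
# All lookup tables are stand-ins for real geolocation / ISP / subscriber data.

GEO_DB = {"203.0.113.7": "Ottawa, CA"}           # IP -. location (spoofable input!)
ISP_DB = {"Ottawa, CA": "ExampleNet"}            # location -. covering ISP
SUBSCRIBER_DB = {("ExampleNet", "203.0.113.7"): "subscriber #4211"}  # lease at a point in time

def naive_attribute(src_ip):
    """Follow the IP, geolocation, ISP, person chain.

    Every step fails silently if the source IP was spoofed or the
    DHCP lease has since been reassigned."""
    location = GEO_DB.get(src_ip)
    if location is None:
        return None
    isp = ISP_DB.get(location)
    if isp is None:
        return None
    return SUBSCRIBER_DB.get((isp, src_ip))

print(naive_attribute("203.0.113.7"))   # a subscriber, if nothing was spoofed
print(naive_attribute("198.51.100.9"))  # chain breaks at the first step
```

Each lookup returning nothing marks a point where the chain breaks, i.e., exactly where spoofing, reassigned leases, or missing ISP records defeat attribution.&lt;br /&gt;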
&lt;br /&gt;
===Attacks that prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of achieving anonymity by using multiple public, randomly chosen agents (as stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues, complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; since the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, yet remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as huge as the internet; and practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than for the information it conveys, and we shall not skip it either: the main requirement for an internet attribution system is that it must attribute or, more formally, that any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be a single person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that, at the end of the day, some person or persons, human beings, are responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system under which crime and misuse are all but impossible, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to today&#039;s real world is not possible, because many laws and moral principles are already in place; some of them are imperfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global; we elaborate on both terms later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: should the attribution system go down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once; incremental embedding is also more secure (bugs in software and mistakes in design can be fixed while the scale is still small), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns the line between dynamic adoptability and universality. Luckily, this depth goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.) or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than that body&#039;s average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question to be answered is &amp;quot;who did act X?&amp;quot;. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way: for surveillance, spying, etc.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to store a unique ID for any body that uses the network; in case a crime is committed or, in general, whenever the agent of some act must be determined, the recorded ID is looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
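As a rough illustration of this split (all names, the hashing scheme and the warrant flag are our own assumptions), the public log might hold only opaque IDs while a trusted entity keeps the reverse mapping private:&lt;br /&gt;

```python
# Sketch of the split described above: the public traceability log stores only
# opaque IDs; a separate trusted entity holds the ID-to-name mapping and
# reveals it only on a justified request. Names and keys are illustrative.

import hashlib

def make_id(name, registry_secret):
    """Derive an opaque, irreversible user ID (one-way: name to ID only)."""
    return hashlib.sha256((registry_secret + name).encode()).hexdigest()[:16]

class TrustedEntity:
    def __init__(self, secret):
        self._secret = secret
        self._mapping = {}            # opaque ID to real name, kept private

    def register(self, name):
        uid = make_id(name, self._secret)
        self._mapping[uid] = name
        return uid

    def reveal(self, uid, has_warrant):
        """The mapping is disclosed only with sufficient justification."""
        if not has_warrant:
            raise PermissionError("no legal basis to de-anonymize")
        return self._mapping[uid]

entity = TrustedEntity("registry-secret")
alice_id = entity.register("Alice")

# The public log can record acts against alice_id, but the log alone cannot
# answer "what did Alice do?"; only "who did act X?" via the trusted entity.
public_log = [("act-42", alice_id)]
```

The one-way derivation mirrors the non-bijection requirement above: the public side can match an act to an ID, but only the trusted entity can turn an ID back into a name, and only with justification.&lt;br /&gt;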
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but we generally have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing misuse of the system from either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework operates under the core principle that &amp;quot;an act can neither use network resources nor be routed if it is anonymously bound&amp;quot;. First, we define some terminology used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device; for a device like a Network Interface Card, that feature would be the MAC address. This binding is a particular representation of the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by a device he owns. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of granting intermediate systems permission to provide ISS to all packets launched by the agent requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, an NIC, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
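A minimal sketch of how an identification stamp might be derived; the use of SHA-256 and the string encoding are our own assumptions, since the framework only requires a unique, verifiable binding of human to device:&lt;br /&gt;

```python
# Illustrative construction of an Identification Stamp (IS): a digest binding
# a human's biometric template to a device's unique hardware feature (here, a
# MAC address). The concrete hashing scheme is an assumption of this sketch.

import hashlib

def identification_stamp(biometric_template, mac_address):
    """Bind human plus device into one agent identifier."""
    material = biometric_template + "|" + mac_address.lower()
    return hashlib.sha256(material.encode()).hexdigest()

# Same person, two owned devices: two distinct stamps (one-to-many ownership).
person = "iris-template-of-owner"
is_laptop = identification_stamp(person, "00:1A:2B:3C:4D:5E")
is_phone = identification_stamp(person, "00:1A:2B:3C:4D:5F")
assert is_laptop != is_phone
```

Because the stamp is deterministic, any router holding the same inputs can recompute and verify it, while a different device owned by the same person yields a different stamp.&lt;br /&gt;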
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he runs an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when he runs a background system that performs external (over-the-internet) system calls (such as global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g., NTP, or the server side of protocols such as HTTP, FTP, etc. In addition, indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g., TCP connection-initiation and handshaking packets, ICMP packets that aim to probe the status of a specific host, etc.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only covers attribution over the internet, not other &amp;quot;locally&amp;quot; defined networks, under the IEEE standard definitions of the PAN, LAN, MAN and WAN topologies, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates and the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it would scale with the growth in system users and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax and semantics, as well as the nature, of the way these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect the system, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be relied on heavily for every single hop a packet makes through the internet intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Third, we assume that the ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header that includes the &amp;quot;identification stamp&amp;quot; of the packet owner. The packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A &amp;quot;fake identification stamp&amp;quot; is defined as one that has:&lt;br /&gt;
* a false unique chip identifier that refers to an imaginary device;&lt;br /&gt;
* a false unique human identifier that refers to an imaginary human;&lt;br /&gt;
* or a misleading binding of a human to a machine, i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot; when, in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
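These three cases can be mirrored by a simple validity check; the registration records below are illustrative stand-ins for the records held by the trusted entity:&lt;br /&gt;

```python
# A minimal validity check mirroring the three "fake stamp" cases above.
# KNOWN_DEVICES / KNOWN_HUMANS / OWNERSHIP stand in for the trusted entity's
# registration records; all values are illustrative.

KNOWN_DEVICES = {"mac-A"}
KNOWN_HUMANS = {"human-Y"}
OWNERSHIP = {"mac-A": "human-Y"}       # device to registered owner

def stamp_is_fake(device_id, human_id):
    if device_id not in KNOWN_DEVICES:
        return True                    # case 1: imaginary device
    if human_id not in KNOWN_HUMANS:
        return True                    # case 2: imaginary human
    if OWNERSHIP[device_id] != human_id:
        return True                    # case 3: misleading binding
    return False

assert stamp_is_fake("mac-A", "human-Y") is False   # genuine stamp
assert stamp_is_fake("mac-B", "human-Y") is True    # unregistered device
```

A stamp passes only when the device exists, the human exists, and the claimed binding matches the registered ownership; failing any one of the three checks makes the stamp fake.&lt;br /&gt;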
&lt;br /&gt;
Noticeably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packet that is not fully attributed. As they are the main driving power behind the delivery of all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A description of the system, in chronological order, follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity accesses the globally distributed database of &amp;quot;identification stamps&amp;quot; and adds the new identification stamp of the agent requesting the license. If a device is not licensed (i.e., its &amp;quot;identification stamp&amp;quot; was not inserted into the distributed database), it does not benefit from ISS.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s IS. This is done by consulting the GDDB (the globally distributed database of identification stamps) with a copy of the IS found on the packet. If a packet is found to carry no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is again dropped. If the GDDB replies with success, the packet&#039;s printed IS is verified; the packet then benefits from ISS and is routed along its way.&lt;br /&gt;
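The per-hop decision just described can be sketched as follows; the GDDB is modeled as a simple set of valid stamps, whereas a real deployment would need a replicated, near-zero-latency store, and all names are our own:&lt;br /&gt;

```python
# Sketch of the per-hop check described above: no stamp or an invalid stamp
# means drop; a verified stamp means the packet benefits from ISS.

GDDB = {"stamp-1", "stamp-2"}          # globally distributed database of valid ISs

def route_decision(packet):
    """Return 'route' only when the packet carries a verified IS; else drop."""
    stamp = packet.get("is")
    if stamp is None:
        return "drop"                  # unattributed packet
    if stamp not in GDDB:
        return "drop"                  # invalid (forged) stamp
    return "route"                     # verified: packet benefits from ISS

assert route_decision({"is": "stamp-1"}) == "route"
assert route_decision({"payload": "no stamp"}) == "drop"
assert route_decision({"is": "stamp-bogus"}) == "drop"
```

The two drop branches correspond to the &amp;quot;missing IS&amp;quot; and &amp;quot;invalid IS&amp;quot; replies in the text; everything else is routed normally.&lt;br /&gt;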
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
Obviously, the proposed framework&#039;s main focus is to ensure that any leaping packet moves only when it is known who it belongs to; otherwise, it is prevented from moving.&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors.&lt;br /&gt;
* Custom content generation (not found).&lt;br /&gt;
* Public PCs (in labs, etc.): to whom are they bound?&lt;br /&gt;
* Requires full awareness of users of their systems.&lt;br /&gt;
&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DDoS, DoS, etc.).&lt;br /&gt;
* Attribution information is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists change at first sight. But consider cars: at first they required no licensing, and licensing systems were introduced afterwards; people got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9202</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9202"/>
		<updated>2011-04-10T20:33:17Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems and, arguably, a scientific basis for properly functioning attribution systems has not yet been defined. Much research has focused on attributing documents to authors, for the sake of securing authorship rights and rapidly identifying plagiarism. Much of that work revolves around the notion of using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticating every single packet hopping over the intermediate systems. This paper presents limits of and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach to performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit the feature. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the attribution dilemma: the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution are defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since its introduction in the 90&#039;s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and getting off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we address the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology within this paper. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, though mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified and even deleted completely. A user can also change their browser settings to refuse cookies entirely. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date: the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set how long cookies are stored.&lt;br /&gt;
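A bare-bones sketch of this exchange, with the server and the browser cookie jar modeled as plain dictionaries (no real HTTP involved; the session-counting behavior is our own illustration):&lt;br /&gt;

```python
# Toy model of the cookie exchange described above: the server issues a cookie
# on a cookieless first visit and re-identifies the browser when it returns.

import uuid

server_sessions = {}                  # cookie value mapped to visit count

def server_handle(request_headers):
    """Return response headers; issue a cookie on the first, cookieless visit."""
    cookie = request_headers.get("Cookie")
    if cookie in server_sessions:
        server_sessions[cookie] += 1  # a returning, re-identified browser
        return {}
    new_cookie = uuid.uuid4().hex
    server_sessions[new_cookie] = 1
    return {"Set-Cookie": new_cookie}

# First request: no cookie, so the server sets one.
browser_jar = {}
resp = server_handle({})
browser_jar["Cookie"] = resp["Set-Cookie"]

# Second request: the browser replays the cookie and is recognized...
server_handle(browser_jar)
# ...but the user can simply clear the jar, defeating attribution.
browser_jar.clear()
```

The final line is the whole weakness of cookie-based attribution in miniature: the identifier lives on the client side and can be discarded at will.&lt;br /&gt;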
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewing cookies as the type of attribution system we are looking for over the Internet, we would be able to identify, with high precision, the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated; as such, cookies are not an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are numerical identifiers (32 bits in IPv4) for devices (i.e., computers, printers, scanners, etc.) on a network; the user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs of their assigned regions, which in turn allocate them to their users. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and/or infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions, as well as their clients, are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identities secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a fairly abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think this question is slightly out of the scope of our paper. That is true; however, this and many less obviously related questions should be answered prior to design, because in something as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain law-related and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider the balance between attribution and privacy. &lt;br /&gt;
** Sometimes non-attribution is crucial, e.g. to protect political dissidents and whistle-blowers. &lt;br /&gt;
* When should one decide to track a person, and when not to (so as not to intrude on privacy)?&lt;br /&gt;
* How can we make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute whom/what, and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification and packet filtering to tie packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing actions to an agent? What are they? (punishment, reward, etc.)&lt;br /&gt;
* How should we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on that same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The lack of attribution on the web becomes an issue mostly when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a much more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, determine the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which makes the inferred geographical location misleading.&lt;br /&gt;
* To counter that, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to people, so inferring the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters mitigate that problem, but they are far from 100% effective.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
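The first point above can be made concrete. The source address in an IPv4 header is just an unauthenticated 4-byte field that the sender fills in; nothing in the protocol itself verifies it. A minimal sketch in Python (the helper and the documentation-range addresses are illustrative, not part of any real tool):&lt;br /&gt;

```python
import socket
import struct

def ipv4_header(src, dst, payload_len=0):
    """Build a minimal 20-byte IPv4 header. The source field is
    plain bytes -- we can write any address we like into it."""
    version_ihl = (4 << 4) | 5              # IPv4, header = 5 x 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20 + payload_len,   # version/IHL, TOS, total length
        0, 0,                               # identification, flags/fragment
        64, socket.IPPROTO_TCP, 0,          # TTL, protocol, checksum (unset)
        socket.inet_aton(src),              # "source": an unverified claim
        socket.inet_aton(dst),
    )

# The claimed source survives intact -- receivers treat it as ground truth:
hdr = ipv4_header("203.0.113.7", "198.51.100.9")
claimed_src = socket.inet_ntoa(hdr[12:16])
```

Geolocating or tracing such a source address therefore attributes the packet to whatever address the sender chose, which is exactly why IP traceback needs cooperation from the intermediate systems that actually forwarded the packet.&lt;br /&gt;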
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing attacks by routing them through multiple public, essentially random agents (stepping stones) on the way to the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of completeness rather than information, and we shall not break with that tradition: the main requirement for an internet attribution system is that it attributes; more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. &lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse justifies any means, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to today&#039;s real world is unwise, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and mostly exist for good reasons. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will carry a great deal of load, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works within. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said of any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This matters not only because restarting or reconfiguring the whole internet at once is virtually impossible; incremental deployment is also safer, since bugs in software and mistakes in design can be fixed on a small scale, so that by the end of the cycle, when the whole internet is wired in, the attribution system has been field-tested and analyzed several times. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adaptation of the system to different sets of rules and laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can only be dealt with by spending more resources (memory, server bandwidth, etc.). The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than its average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection: actions should map to persons, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. The only question the system should answer is &amp;quot;who did act X?&amp;quot;. This could be considered part of the requirement about not violating current laws and moral principles, but it is stated separately because it is very important to draw a line between an attribution system and surveillance. &lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it may not be a great idea to put the names of persons into the traceability database. It makes much more sense to store a unique ID for each body that uses the network; when a crime is committed, or in any case where the agent of some act must be determined, the recorded ID is looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good organization, etc.) stores the mapping between IDs and real names, and this mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of it in one place.&lt;br /&gt;
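A minimal sketch of this separation, assuming a hypothetical &amp;quot;TrustedRegistry&amp;quot; that escrows the ID-to-name mapping while trace logs elsewhere record only opaque IDs (all names here are illustrative):&lt;br /&gt;

```python
import secrets

class TrustedRegistry:
    """Hypothetical trusted entity (government, police, etc.): the only
    place where the mapping between opaque IDs and real names lives."""
    def __init__(self):
        self._id_to_name = {}

    def register(self, name):
        uid = secrets.token_hex(16)   # opaque, unguessable identifier
        self._id_to_name[uid] = name
        return uid

    def reveal(self, uid, warrant):
        # The mapping is disclosed only with sufficient justification.
        return self._id_to_name.get(uid) if warrant else None

# Distributed traceability logs store only IDs, never names:
registry = TrustedRegistry()
uid = registry.register("Alice Example")
trace_log = [("2011-04-12T01:37Z", uid, "sent packet")]
```

Answering &amp;quot;who did act X&amp;quot; means finding the ID in the trace log and asking the trusted entity to reveal it; the reverse query, enumerating everything one named person did, is not supported by the log&#039;s structure.&lt;br /&gt;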
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework operates under the core principle: &amp;quot;an act can neither use network resources nor be routed if its agent is anonymous&amp;quot;. We start by defining some terms:&lt;br /&gt;
* &amp;quot;Identification Stamp&amp;quot;: a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device. For a device like a Network Interface Card, the MAC address would be that feature. &lt;br /&gt;
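The text does not fix a concrete encoding for the stamp; one plausible reading, assuming a simple hash-based binding of a biometric template to the MAC address, is:&lt;br /&gt;

```python
import hashlib

def identification_stamp(biometric, mac):
    """Sketch: a digest binding a human feature (e.g. an iris or
    fingerprint template, as raw bytes) to a device feature (the
    NIC's MAC address)."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    return hashlib.sha256(biometric + mac_bytes).digest()

# Same person on the same device always yields the same stamp;
# changing either input changes it completely.
iris_template = b"\x01\x02\x03\x04"   # placeholder biometric bytes
stamp = identification_stamp(iris_template, "00:1a:2b:3c:4d:5e")
```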
&lt;br /&gt;
The following sections state the assumptions under which this framework operates and the methodology of its operation, list its pros, cons and vulnerabilities, and wrap up with a discussion of the tradeoff between privacy and attribution.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy but would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s user base and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would then be required to define the syntax, semantics and transport of communication between the distributed sub-systems.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect it, as it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both must be strictly authenticated in order to decrypt the contents or to append to them. In addition, this distributed system must guarantee near-zero read latency, as it will be consulted for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that the network layer adds a header to each IP packet containing the identification stamp of the packet&#039;s owner. A packet&#039;s owner is the person PLUS the machine that are together responsible for launching it.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving power behind delivering all packets, malicious or benign, they carry great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# Access devices must be licensed by the trusted entity; unlicensed devices cannot benefit from global routing services. &lt;br /&gt;
# Licensing binds a human&#039;s unique feature (the iris&#039;s intricate structure) to a machine&#039;s unique feature (the MAC address). &lt;br /&gt;
# Licensing generates identification stamps. &lt;br /&gt;
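The routing rule described above (refuse packets that are not fully attributed) can be sketched as a read-only lookup against the licensing database; the in-memory set below is a stand-in for the DNS-like distributed system, and the stamp values are placeholders:&lt;br /&gt;

```python
# Stand-in for the DNS-like distributed stamp database (read-only to routers).
licensed_stamps = {b"stamp-alice", b"stamp-bob"}

def route(packet):
    """Forward only packets that carry a licensed identification stamp."""
    stamp = packet.get("stamp")
    if stamp not in licensed_stamps:
        return False   # unattributed packet: drop, never route
    return True        # attributed packet: forward as usual
```

In practice the lookup would hit the distributed database on every hop, which is why the near-zero read latency assumed earlier is essential.&lt;br /&gt;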
&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers, which must consult the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors across regions.&lt;br /&gt;
* Custom (client-based) content generation is not supported.&lt;br /&gt;
&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, ...).&lt;br /&gt;
* Attribution is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system could cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first no license was needed to drive one, and licensing systems were introduced only afterwards; people got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8975</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8975"/>
		<updated>2011-03-29T13:51:35Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for a properly functioning attribution system is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Much of it revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Internet users appreciate the partial anonymity they enjoy while surfing the internet. Unfortunately, some users have bad intentions and exploit such anonymity to commit various types of &amp;lt;i&amp;gt;electronic crimes&amp;lt;/i&amp;gt;, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee efficient attribution nor are applicable in most situations; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In this light, we need better methodologies for reaching an acceptable success rate in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity with the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt;Institute for Defense Analyses, 2003&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness around the real &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To that end, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
==IP Addressing==&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider the balance between attribution and privacy. &lt;br /&gt;
** Sometimes non-attribution is crucial, e.g. to protect political dissidents and whistle-blowers. &lt;br /&gt;
* When should one decide to track a person, and when not to (so as not to intrude on privacy)?&lt;br /&gt;
* How can we make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute whom/what, and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification and packet filtering to tie packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing actions to an agent? What are they? (punishment, reward, etc.)&lt;br /&gt;
* How should we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on that same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The lack of attribution on the web becomes an issue mostly when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a much more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, determine the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which makes the inferred geographical location misleading.&lt;br /&gt;
* To counter that, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to people, so inferring the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters mitigate that problem, but they are far from 100% effective.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing attacks by routing them through multiple public, essentially random agents (stepping stones) on the way to the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection, in other words action should map to persons, but not vice versa &lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* Cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than average losses under current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
* The attribution system should be adaptable to different sets of rules and principles (countries&#039; laws, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
&lt;br /&gt;
=System Proposals=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8886</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8886"/>
		<updated>2011-03-22T17:19:38Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Why we need Attribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for a properly functioning attribution system is not yet defined. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
===Definition===&lt;br /&gt;
*Binding an act to an agent (person or device)&lt;br /&gt;
*“determining the identity or location of an attacker or an attacker’s intermediary” [Institute for Defense Analyses, 2003]&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
* While designing an attribution system one needs to consider the balance between attribution and privacy. &lt;br /&gt;
** Sometimes non-attribution is crucial, e.g. to protect political dissidents and whistle-blowers. &lt;br /&gt;
* When should one decide to track a person, and when not to (so as not to intrude on privacy)?&lt;br /&gt;
* How can we make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute whom/what, and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification and packet filtering to tie packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing actions to an agent? What are they? (punishment, reward, etc.)&lt;br /&gt;
* How should we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on that same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The lack of attribution on the web becomes an issue mostly when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a much more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP stamped on each packet, determining the geographical location of that IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads geolocation.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation among intermediate systems, which does not yet exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to individuals, so inferring the person from the IP is unreliable.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, which in turn mislead attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
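The &amp;quot;ideal&amp;quot; chain above can be sketched as a series of fallible lookups; the tables and names below are made-up stand-ins for real geolocation and ISP databases, not a real implementation:&lt;br /&gt;

```python
# Hypothetical sketch of the ideal attribution chain: IP -> location -> ISP -> person.
# Every step trusts the previous one, so a spoofed source IP poisons the whole chain.

GEO_DB = {"192.0.2.10": "Ottawa, CA"}                  # IP -> claimed location
ISP_DB = {"Ottawa, CA": "ExampleNet"}                  # location -> covering ISP
SUBSCRIBERS = {("ExampleNet", "192.0.2.10"): "Alice"}  # ISP lease records

def attribute(source_ip):
    """Follow the chain; returns None as soon as any lookup fails."""
    location = GEO_DB.get(source_ip)
    if location is None:
        return None
    isp = ISP_DB.get(location)
    if isp is None:
        return None
    return SUBSCRIBERS.get((isp, source_ip))

print(attribute("192.0.2.10"))   # "Alice" -- only if the source IP is genuine
print(attribute("203.0.113.7"))  # None -- a spoofed or unknown IP breaks the chain
```

Each failure mode listed above corresponds to one link of this chain returning wrong or missing data.&lt;br /&gt;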
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
* Stepping stone attack: a common way of anonymizing attacks by routing them through multiple intermediate hosts (stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
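Stepping stones are commonly detected by correlating traffic timing across a host&#039;s connections, the broad approach behind the detection papers cited in Related Work. A minimal, illustrative sketch with made-up timestamps and threshold:&lt;br /&gt;

```python
# If an incoming and an outgoing connection on a host show near-identical
# inter-packet timing, the host may be relaying traffic (a stepping stone).

def inter_packet_gaps(timestamps):
    """Delays between consecutive packets of one stream."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def looks_like_relay(incoming_ts, outgoing_ts, tolerance=0.05):
    """True when the two streams have matching gap sequences within tolerance."""
    gi = inter_packet_gaps(incoming_ts)
    go = inter_packet_gaps(outgoing_ts)
    if len(gi) != len(go) or not gi:
        return False
    return not any(abs(a - b) > tolerance for a, b in zip(gi, go))

incoming  = [0.00, 0.40, 1.10, 1.55]   # packet arrival times (seconds)
outgoing  = [0.12, 0.53, 1.21, 1.68]   # same traffic, forwarded slightly later
unrelated = [0.05, 0.90, 1.00, 2.40]

print(looks_like_relay(incoming, outgoing))    # correlated: likely a relay
print(looks_like_relay(incoming, unrelated))
```

Real detectors must also cope with deliberate delays, chaff packets, and encryption, which is what makes robust detection a research problem.&lt;br /&gt;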
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Examples of how attribution is done today==&lt;br /&gt;
* Cookies&lt;br /&gt;
* Authentication Systems&lt;br /&gt;
* IP Addressing&lt;br /&gt;
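As a concrete illustration of the first item, cookies attribute requests to a browser (not a person) by handing out a random identifier and recognizing it on later visits. A toy server-side sketch, with all names invented for illustration:&lt;br /&gt;

```python
import uuid

SEEN = {}  # cookie value -> number of requests seen with it

def handle_request(cookies):
    """Return (cookie_value_to_set, visit_count) for one incoming request."""
    visitor_id = cookies.get("visitor_id")
    if visitor_id not in SEEN:
        visitor_id = uuid.uuid4().hex   # first visit: mint a fresh identifier
        SEEN[visitor_id] = 0
    SEEN[visitor_id] += 1
    return visitor_id, SEEN[visitor_id]

vid, count1 = handle_request({})                    # new visitor
vid2, count2 = handle_request({"visitor_id": vid})  # recognized on return
print(count1, count2)   # 1 2
```

Note that this only ties actions to a browser profile: clearing cookies or switching machines breaks the link, which is one reason cookies alone fall short as an attribution mechanism.&lt;br /&gt;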
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection between them revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
* The attribution system should be adaptable to different sets of rules and principles (laws of countries, organizations&#039; policies, etc), yet remain universal&lt;br /&gt;
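The separation and non-bijection requirements above can be illustrated with a toy split between distributed traceability shards (holding only pseudonyms) and a trusted authority (holding the pseudonym-to-person table), joined only on demand. All names and records here are invented for illustration:&lt;br /&gt;

```python
# Distributed traceability info: each shard maps actions to pseudonyms only.
TRACE_SHARDS = [
    {"spam burst #1": "px7"},
    {"dos attempt #2": "qk2"},
]
# Personal data held separately by a trusted authority, consulted e.g. under warrant.
AUTHORITY = {"px7": "Alice", "qk2": "Bob"}

def resolve(action, warrant=False):
    """Actions map forward to agents; real identity is revealed only when needed."""
    pseudonym = next((s[action] for s in TRACE_SHARDS if action in s), None)
    if pseudonym is None:
        return None
    return AUTHORITY[pseudonym] if warrant else pseudonym

print(resolve("spam burst #1"))                 # "px7" (pseudonymous)
print(resolve("spam burst #1", warrant=True))   # "Alice"
```

No reverse index from persons to their actions is kept anywhere, so the mapping is deliberately not a bijection.&lt;br /&gt;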
&lt;br /&gt;
=Related Work=&lt;br /&gt;
===Against attribution attacks===&lt;br /&gt;
2006: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1649171 This] paper designed a scalable testbed for evaluating existing stepping stone attacks.&amp;lt;br/&amp;gt;&lt;br /&gt;
2007: [http://www.truststc.org/pubs/168/HeTong06ASC.pdf This] paper proposes a technique for robust detection of stepping stone attacks.&lt;br /&gt;
&lt;br /&gt;
===Attributing actions over the internet by attributing packets to agents===&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring high network cooperation.&amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Attributing text/documents/articles/codes to authors (Authorship)===&lt;br /&gt;
2005: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1556355 This] paper presents a neural network approach for source attribution of text.&amp;lt;br/&amp;gt;&lt;br /&gt;
2006: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=4028874 This] paper presents a process to determine the source of a previously unexamined piece of writing.&amp;lt;br/&amp;gt;&lt;br /&gt;
2007: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=4456854 This] paper uses feature extraction for document attribution.&amp;lt;br/&amp;gt;&lt;br /&gt;
2007: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=4293714 This] paper recognizes the author of text documents independently of the document&#039;s theme, and visualizes the attribution using &amp;quot;blobby objects&amp;quot;.&amp;lt;br/&amp;gt;&lt;br /&gt;
2009: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=5254209 This] paper attributes code (software programs) to individuals after learning their personal coding style from at least three code samples.&amp;lt;br/&amp;gt;&lt;br /&gt;
2011: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=5706693 This] paper uses two-stage supervised and unsupervised learning to achieve authorship attribution on user-generated web forum posts.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8885</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8885"/>
		<updated>2011-03-22T17:14:01Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Why we need Attribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. This paper presents limits of and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
===Definition===&lt;br /&gt;
*Binding an act to an agent (person or device)&lt;br /&gt;
*“determining the identity or location of an attacker or an attacker’s intermediary” [Institute for Defense Analyses, 2003]&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
* While designing an attribution system, one needs to consider the balance between attribution and privacy.&lt;br /&gt;
**Sometimes non-attribution is crucial, e.g. to protect political dissidents and whistle-blowers&lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping stone detection, link identification, and packet filtering to bind packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing actions to an agent? If so, what should they be (punishment, reward, etc)?&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem I see is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of lack of attribution on the web mostly arises whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attack traffic can be tracked, so can all other traffic.&lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP stamped on each packet, determining the geographical location of that IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads geolocation.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation among intermediate systems, which does not yet exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to individuals, so inferring the person from the IP is unreliable.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, which in turn mislead attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
* Stepping stone attack: a common way of anonymizing attacks by routing them through multiple intermediate hosts (stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks such as:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Examples of how attribution is done today==&lt;br /&gt;
* Cookies&lt;br /&gt;
* Authentication Systems&lt;br /&gt;
* IP Addressing&lt;br /&gt;
&lt;br /&gt;
=Requirements for an Internet Attribution System=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection between them revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
* The attribution system should be adaptable to different sets of rules and principles (laws of countries, organizations&#039; policies, etc), yet remain universal&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
===Against attribution attacks===&lt;br /&gt;
2006: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1649171 This] paper designed a scalable testbed for evaluating existing stepping stone attacks.&amp;lt;br/&amp;gt;&lt;br /&gt;
2007: [http://www.truststc.org/pubs/168/HeTong06ASC.pdf This] paper proposes a technique for robust detection of stepping stone attacks.&lt;br /&gt;
&lt;br /&gt;
===Attributing actions over the internet by attributing packets to agents===&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring high network cooperation.&amp;lt;br/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Attributing text/documents/articles/codes to authors (Authorship)===&lt;br /&gt;
2005: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1556355 This] paper presents a neural network approach for source attribution of text.&amp;lt;br/&amp;gt;&lt;br /&gt;
2006: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=4028874 This] paper presents a process to determine the source of a previously unexamined piece of writing.&amp;lt;br/&amp;gt;&lt;br /&gt;
2007: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=4456854 This] paper uses feature extraction for document attribution.&amp;lt;br/&amp;gt;&lt;br /&gt;
2007: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=4293714 This] paper recognizes the author of text documents independently of the document&#039;s theme, and visualizes the attribution using &amp;quot;blobby objects&amp;quot;.&amp;lt;br/&amp;gt;&lt;br /&gt;
2009: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=5254209 This] paper attributes code (software programs) to individuals after learning their personal coding style from at least three code samples.&amp;lt;br/&amp;gt;&lt;br /&gt;
2011: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=5706693 This] paper uses two-stage supervised and unsupervised learning to achieve authorship attribution on user-generated web forum posts.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8696</id>
		<title>Talk:DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8696"/>
		<updated>2011-03-17T18:23:51Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=What Is Attribution?=&lt;br /&gt;
* Binding an act to the Agent.&lt;br /&gt;
[Prof.Anil]&lt;br /&gt;
*Attribution may refer to: Something, such as a quality or characteristic, that is related to a particular possessor; an attribute. &lt;br /&gt;
[Wikipedia]&lt;br /&gt;
*something ascribed; an attribute. &lt;br /&gt;
[Dictionary.com]&lt;br /&gt;
&lt;br /&gt;
==What is an Agent?==&lt;br /&gt;
*An Agent can be a person or a machine: the origin of the act.&lt;br /&gt;
&lt;br /&gt;
==What is an Attribute?==&lt;br /&gt;
*to consider as a quality or characteristic of the person, thing, group, etc. [Dictionary.com]&lt;br /&gt;
*something attributed as belonging to a person, thing, group, etc.; a quality, character, characteristic, or property. [Dictionary.com]&lt;br /&gt;
&lt;br /&gt;
=Why do we want Attribution?=&lt;br /&gt;
*We want attribution in order to be able to identify the origin of acts done on the internet. &lt;br /&gt;
&lt;br /&gt;
== How does this affect us?==&lt;br /&gt;
&lt;br /&gt;
=When do we want Attribution?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=When do we not want Attribution?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=How is attribution done today?=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
===Pros===&lt;br /&gt;
&lt;br /&gt;
===Cons===&lt;br /&gt;
&lt;br /&gt;
===What is missing?===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Login/ Required Authentication==&lt;br /&gt;
&lt;br /&gt;
===Pros===&lt;br /&gt;
&lt;br /&gt;
===Cons===&lt;br /&gt;
&lt;br /&gt;
===What is missing?===&lt;br /&gt;
&lt;br /&gt;
=How should attribution be done?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Challenges=&lt;br /&gt;
&lt;br /&gt;
In order to develop an effective attribution system for computers the following challenges need to be addressed:&lt;br /&gt;
&lt;br /&gt;
==Identification==&lt;br /&gt;
&lt;br /&gt;
This is probably the biggest challenge of them all: how to identify users. Identification needs to be unique enough that no two users can ever have the same identifier. A key question is, at what level should this identification be done? Is it enough to stop at the computer level, or should it go down to the user level?&lt;br /&gt;
&lt;br /&gt;
* If we choose to only identify computers, and leave responsibility with the owner of the computer, what should happen in the case of a stolen computer that is used to commit virtual crime? Perhaps, to cover this case, computers should be treated like cars and insured against theft.&lt;br /&gt;
&lt;br /&gt;
*In the case of identification at the human level, what information should be used in the identification, and what information are people willing to give up? People generally like the partial anonymity of the internet, and identification at this level is essentially asking them to give that up. But perhaps this is needed for the good of everyone.&lt;br /&gt;
&lt;br /&gt;
==Privacy==&lt;br /&gt;
&lt;br /&gt;
Given that people like the anonymity of surfing the internet, identification of who is where should only be possible after the fact, when called upon. In other words, people should be able to surf the web anonymously, but in the event that, say, a DoS attack is executed, it can be traced back to the attacker.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
&lt;br /&gt;
There are billions of computers already connected to the internet all over the world today. Development of an attribution system should take this into account.&lt;br /&gt;
&lt;br /&gt;
==Tracing==&lt;br /&gt;
&lt;br /&gt;
This is another key thing to consider. People may argue that this is not an aspect of attribution. On the other hand, tracing is the main or sole reason behind the need for attribution in the first place. Not considering it part of attribution is like BMW not considering the driver in the development of their vehicles. &lt;br /&gt;
&lt;br /&gt;
==Storage==&lt;br /&gt;
&lt;br /&gt;
This ties into identification: where should these identifiers be stored, and who should be granted access to them?&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8670</id>
		<title>Talk:DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8670"/>
		<updated>2011-03-17T17:54:24Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=What Is Attribution?=&lt;br /&gt;
* Binding an act to the Agent.&lt;br /&gt;
&lt;br /&gt;
=Why do we want Attribution?=&lt;br /&gt;
&lt;br /&gt;
== How does this affect us?==&lt;br /&gt;
&lt;br /&gt;
=When do we want Attribution?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=When do we not want Attribution?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=How is attribution done today?=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
===Pros===&lt;br /&gt;
&lt;br /&gt;
===Cons===&lt;br /&gt;
&lt;br /&gt;
===What is missing?===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Login/ Required Authentication==&lt;br /&gt;
&lt;br /&gt;
===Pros===&lt;br /&gt;
&lt;br /&gt;
===Cons===&lt;br /&gt;
&lt;br /&gt;
===What is missing?===&lt;br /&gt;
&lt;br /&gt;
=How should attribution be done?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Challenges=&lt;br /&gt;
&lt;br /&gt;
In order to develop an effective attribution system for computers the following challenges need to be addressed:&lt;br /&gt;
&lt;br /&gt;
==Identification==&lt;br /&gt;
&lt;br /&gt;
This is probably the biggest challenge of them all: how to identify users. Identification needs to be unique enough that no two users can ever have the same identifier. A key question is, at what level should this identification be done? Is it enough to stop at the computer level, or should it go down to the user level?&lt;br /&gt;
&lt;br /&gt;
* If we choose to only identify computers, and leave responsibility with the owner of the computer, what should happen in the case of a stolen computer that is used to commit virtual crime? Perhaps, to cover this case, computers should be treated like cars and insured against theft.&lt;br /&gt;
&lt;br /&gt;
*In the case of identification at the human level, what information should be used in the identification, and what information are people willing to give up? People generally like the partial anonymity of the internet, and identification at this level is essentially asking them to give that up. But perhaps this is needed for the good of everyone.&lt;br /&gt;
&lt;br /&gt;
==Privacy==&lt;br /&gt;
&lt;br /&gt;
Given that people like the anonymity of surfing the internet, identification of who is where should only be possible after the fact, when called upon. In other words, people should be able to surf the web anonymously, but in the event that, say, a DoS attack is executed, it can be traced back to the attacker.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
&lt;br /&gt;
There are billions of computers already connected to the internet all over the world today. Development of an attribution system should take this into account.&lt;br /&gt;
&lt;br /&gt;
==Tracing==&lt;br /&gt;
&lt;br /&gt;
This is another key thing to consider. People may argue that this is not an aspect of attribution. On the other hand, tracing is the main or sole reason behind the need for attribution in the first place. Not considering it part of attribution is like BMW not considering the driver in the development of their vehicles. &lt;br /&gt;
&lt;br /&gt;
==Storage==&lt;br /&gt;
&lt;br /&gt;
This ties into identification: where should these identifiers be stored, and who should be granted access to them?&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8669</id>
		<title>Talk:DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8669"/>
		<updated>2011-03-17T17:53:53Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=What Is Attribution?=&lt;br /&gt;
* Binding an act to the Agent.&lt;br /&gt;
&lt;br /&gt;
=Why do we want Attribution?=&lt;br /&gt;
&lt;br /&gt;
== How does this affect us?==&lt;br /&gt;
&lt;br /&gt;
=When do we want Attribution?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=When do we not want Attribution?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=How is attribution done today?=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
===Pros===&lt;br /&gt;
&lt;br /&gt;
===Cons===&lt;br /&gt;
&lt;br /&gt;
===What is missing?===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Login/ Required Authentication==&lt;br /&gt;
&lt;br /&gt;
===Pros===&lt;br /&gt;
&lt;br /&gt;
===Cons===&lt;br /&gt;
&lt;br /&gt;
===What is missing?===&lt;br /&gt;
&lt;br /&gt;
=How should attribution be done?=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Challenges=&lt;br /&gt;
&lt;br /&gt;
In order to develop an effective attribution system for computers the following challenges need to be addressed:&lt;br /&gt;
&lt;br /&gt;
==Identification==&lt;br /&gt;
&lt;br /&gt;
This is probably the biggest challenge of them all: how to identify users. Identification needs to be unique enough that no two users can ever have the same identifier. A key question is, at what level should this identification be done? Is it enough to stop at the computer level, or should it go down to the user level?&lt;br /&gt;
&lt;br /&gt;
* If we choose to only identify computers, and leave responsibility with the owner of the computer, what should happen in the case of a stolen computer that is used to commit virtual crime? Perhaps, to cover this case, computers should be treated like cars and insured against theft.&lt;br /&gt;
&lt;br /&gt;
*In the case of identification at the human level, what information should be used in the identification, and what information are people willing to give up? People generally like the partial anonymity of the internet, and identification at this level is essentially asking them to give that up. But perhaps this is needed for the good of everyone.&lt;br /&gt;
&lt;br /&gt;
==Privacy==&lt;br /&gt;
&lt;br /&gt;
Given that people like the anonymity of surfing the internet, identification of who is where should only be possible after the fact, when called upon. In other words, people should be able to surf the web anonymously, but in the event that, say, a DoS attack is executed, it can be traced back to the attacker.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
&lt;br /&gt;
There are billions of computers already connected to the internet all over the world today. Development of an attribution system should take this into account.&lt;br /&gt;
&lt;br /&gt;
==Tracing==&lt;br /&gt;
&lt;br /&gt;
This is another key thing to consider. People may argue that this is not an aspect of attribution. On the other hand, tracing is the main or sole reason behind the need for attribution in the first place. Not considering it part of attribution is like BMW not considering the driver in the development of their vehicles. &lt;br /&gt;
&lt;br /&gt;
==Storage==&lt;br /&gt;
&lt;br /&gt;
This ties into identification: where should these identifiers be stored, and who should be granted access to them?&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8491</id>
		<title>Talk:DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Talk:DistOS-2011W_Attribution&amp;diff=8491"/>
		<updated>2011-03-13T21:24:02Z</updated>

		<summary type="html">&lt;p&gt;Omi: Created page with &amp;quot;=Challenges=  In order to develop an effective attribution system for computers the following challenges need to be addressed:  ==Identification==  This is probably the biggest c…&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Challenges=&lt;br /&gt;
&lt;br /&gt;
In order to develop an effective attribution system for computers the following challenges need to be addressed:&lt;br /&gt;
&lt;br /&gt;
==Identification==&lt;br /&gt;
&lt;br /&gt;
This is probably the biggest challenge of them all: how to identify users. Identification needs to be unique enough that no two users can ever have the same identifier. A key question is, at what level should this identification be done? Is it enough to stop at the computer level, or should it go down to the user level?&lt;br /&gt;
&lt;br /&gt;
* If we choose to only identify computers, and leave responsibility with the owner of the computer, what should happen in the case of a stolen computer that is used to commit virtual crime? Perhaps, to cover this case, computers should be treated like cars and insured against theft.&lt;br /&gt;
&lt;br /&gt;
*In the case of identification at the human level, what information should be used in the identification, and what information are people willing to give up? People generally like the partial anonymity of the internet, and identification at this level is essentially asking them to give that up. But perhaps this is needed for the good of everyone.&lt;br /&gt;
&lt;br /&gt;
==Privacy==&lt;br /&gt;
&lt;br /&gt;
Given that people like the anonymity of surfing the internet, identification of who is where should only be possible after the fact, when called upon. In other words, people should be able to surf the web anonymously, but in the event that, say, a DoS attack is executed, it can be traced back to the attacker.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
&lt;br /&gt;
There are billions of computers already connected to the internet all over the world today. Development of an attribution system should take this into account.&lt;br /&gt;
&lt;br /&gt;
==Tracing==&lt;br /&gt;
&lt;br /&gt;
This is another key thing to consider. People may argue that this is not an aspect of attribution. On the other hand, tracing is the main or sole reason behind the need for attribution in the first place. Not considering it part of attribution is like BMW not considering the driver in the development of their vehicles. &lt;br /&gt;
&lt;br /&gt;
==Storage==&lt;br /&gt;
&lt;br /&gt;
This ties into identification: where should these identifiers be stored, and who should be granted access to them?&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8480</id>
		<title>DistOS-2011W Distributed File Sharing</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8480"/>
		<updated>2011-03-13T18:16:13Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Omi Iyamu&lt;br /&gt;
&lt;br /&gt;
oiyamu@gmail.com&lt;br /&gt;
&lt;br /&gt;
PDF available at [[File:Example.jpg] PDF]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
File sharing is a tool necessary for group collaboration, a simple way to make your files available to others, and a nice way to access file contents across multiple machines. This paper discusses, at a high level, the different file-sharing systems currently in use and the different strategies they employ to facilitate file sharing. In section 2, file sharing systems are categorized by scale into Local Area Network sharing and Internet based sharing. Section 3 discusses the steps involved in sharing an actual file using the systems discussed in section 2. Finally, in section 4, this paper discusses the challenges that must be overcome to develop an effective file sharing system for a distributed operating system and gives some suggestions as to how some of them may be overcome.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
File sharing in a distributed environment should differ from that in a local environment. In this paper, any mention of a distributed operating system refers to an Internet based operating system; as such, the distributed environment discussed is the Internet. Likewise, any mention of a local environment refers to a local area network.&lt;br /&gt;
&lt;br /&gt;
The scope of this paper is limited to a review of a few file-sharing systems. The motivation is to determine what challenges need to be addressed in the development of a file sharing system that can be deployed on a distributed operating system.&lt;br /&gt;
&lt;br /&gt;
Discussions in this paper are kept at a high level so that readers without a strong technical background can follow them. However, some computer science or similar background is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=File Sharing systems=&lt;br /&gt;
&lt;br /&gt;
The main differences between file sharing systems are the modes of access and the methods used to transfer the shared files. There are numerous types of file sharing systems; I have categorized them into two types based on scale. Section 2.1 discusses Local Area Network sharing, which can be considered small-scale file sharing. Section 2.2 discusses Internet based file-sharing systems, which can be considered large-scale file sharing.&lt;br /&gt;
&lt;br /&gt;
==Local Area Network Sharing==&lt;br /&gt;
&lt;br /&gt;
The computers on a Local Area Network (LAN) have some degree of trust between them. The key advantages of sharing systems designed for Local Area Networks are the ability to set access restrictions on the files being shared and increased transfer speeds. Examples are AFP (Apple Filing Protocol), used by Apple, and SMB (Server Message Block), used by Windows.&lt;br /&gt;
&lt;br /&gt;
==Internet Based File Sharing==&lt;br /&gt;
&lt;br /&gt;
There are a number of Internet based or online file sharing systems that take different approaches to file sharing. Some examples are peer-2-peer networks, discussed in section 2.2.1, and FTP (File Transfer Protocol), discussed in section 2.2.2.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer Systems===&lt;br /&gt;
&lt;br /&gt;
Peer-2-peer is one of the most commonly used types of file sharing system. User computers act as both client and server nodes and share content between themselves. There are two main styles by which peer-2-peer file-sharing systems work: one involves the use of torrents and the other does not.&lt;br /&gt;
&lt;br /&gt;
* Torrent style&lt;br /&gt;
Of all the torrent based peer-2-peer networks, Bit-torrent is the most commonly used today [1]. In itself, Bit-torrent is just a file downloading protocol that enables simultaneous downloading from different sources holding the exact same file.&lt;br /&gt;
&lt;br /&gt;
* Non-torrent style&lt;br /&gt;
This is the older style of peer-2-peer network, such as Kazaa. Unlike torrent networks, there is a centralized server that holds information about who is sharing what files, and downloading is done from one single computer to another single computer.&lt;br /&gt;
&lt;br /&gt;
===File Transfer Protocol===&lt;br /&gt;
&lt;br /&gt;
FTP, as the name suggests, is a file transfer protocol. File transfer is made from a single source computer to a single receiving computer. FTP file systems are often password protected to ensure that only authorized users can access the files. To access an FTP file system you need to know the IP address or the domain name of the computer you want to access. When a file is requested, the complete file is downloaded onto the requesting computer.&lt;br /&gt;
&lt;br /&gt;
=File Sharing Process=&lt;br /&gt;
&lt;br /&gt;
There are numerous file sharing protocols available, but the process can generally be broken into three main steps: sharing the file itself, finding the shared file, and accessing or transferring the shared file. In this section we discuss the process for peer-2-peer networks and Local Area Networks.&lt;br /&gt;
&lt;br /&gt;
==Sharing the file==&lt;br /&gt;
&lt;br /&gt;
Sharing the actual file is the process of setting up a file for sharing. Different file sharing systems follow different processes for actually enabling a file for sharing.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer sharing===&lt;br /&gt;
&lt;br /&gt;
Peer-2-peer torrent networks generally follow a submission process towards file sharing. With Bit-torrent, a user injects new content by uploading a torrent file to a torrent search website such as supernova.com and creating a seed with the first copy of the file [1]. Bit-torrent has a mediator system that checks the content of files to make sure they are what they say they are. When a user submits a new file, a mediator has to check it before it is allowed into the sharing network. After a user has submitted several files that passed mediation, he is promoted to unmediated submitter status, meaning the user is trusted enough to submit files that are directly injected into the sharing network without being mediated [1]. Non-torrent peer-2-peer networks don’t follow this submission system; to share a file, all you usually have to do is place it in the share directory used by the third-party peer-2-peer application.&lt;br /&gt;
&lt;br /&gt;
There is no notion of setting access restrictions with peer-2-peer file sharing. Users generally have unrestricted access to shared content; files can be downloaded, edited, and re-uploaded by all.&lt;br /&gt;
&lt;br /&gt;
===Local Area Network sharing===&lt;br /&gt;
&lt;br /&gt;
In Local Area Networks, setting up a file to be shared does not involve any submission process or mediation. Because members of the network have some level of trust between them, all you have to do to set up a file for sharing is go into the file’s properties and enable its sharing property. Access restrictions can also be set to restrict the read and/or write properties of the files or directories being shared.&lt;br /&gt;
&lt;br /&gt;
* Read only&lt;br /&gt;
In this setting the user is only allowed to view the contents of the file; no changes can be made to the root file. The only way around this is to copy the file over and make changes to your local copy.&lt;br /&gt;
&lt;br /&gt;
* Write only&lt;br /&gt;
This setting is used on directories and turns a directory into a drop box: another user on the network can write files to the given directory but cannot view its contents. Only the owner of the directory can read its contents.&lt;br /&gt;
&lt;br /&gt;
* Read and Write&lt;br /&gt;
This setting allows the user to make changes to the file and save these changes onto the root file, so the file does not need to be copied over. In the case of a directory, its contents can be modified remotely.&lt;br /&gt;
&lt;br /&gt;
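The three access modes above can be sketched as a simple permission check. This is an illustrative sketch only, not the actual AFP or SMB implementation; the names ShareMode, can_read, and can_write are made up for the example.&lt;br /&gt;

```python
# Hypothetical sketch of the three LAN sharing modes described above.
from enum import Enum

class ShareMode(Enum):
    READ_ONLY = "read-only"
    WRITE_ONLY = "write-only"   # drop-box style directory
    READ_WRITE = "read-write"

def can_read(mode, is_owner):
    # The owner of a drop-box directory may still read its contents.
    return is_owner or mode in (ShareMode.READ_ONLY, ShareMode.READ_WRITE)

def can_write(mode, is_owner):
    return is_owner or mode in (ShareMode.WRITE_ONLY, ShareMode.READ_WRITE)

print(can_read(ShareMode.WRITE_ONLY, is_owner=False))   # False
print(can_write(ShareMode.WRITE_ONLY, is_owner=False))  # True
```
&lt;br /&gt;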
==Locating shared files==&lt;br /&gt;
&lt;br /&gt;
People share files so that they and/or other people may access them remotely. As such, finding a file that has been shared is a key step in the sharing process. Methods of locating shared files differ between sharing systems.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer file search===&lt;br /&gt;
&lt;br /&gt;
In peer-2-peer systems, finding the shared files you want is fairly easy. Non-torrent networks like Kazaa have a centralized server that holds lists of who is sharing what [3]. In order to search through this list, a third-party peer-2-peer application is needed. However, curation of the file lists on these types of systems is poor, which results in users sometimes downloading “fake” files.&lt;br /&gt;
&lt;br /&gt;
In torrent networks like Bit-torrent where the shared files are checked on submission, the likelihood of downloading a fake file is reduced. However, searching for a shared file is done via third party search engines like supernova.com and isohunt.com.&lt;br /&gt;
&lt;br /&gt;
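The centralized “who is sharing what” list described above can be pictured as a simple index. The sketch below is purely illustrative; announce and search are invented names, not part of any real client.&lt;br /&gt;

```python
# Toy version of a centralized index of shared files, as in the
# non-torrent networks described above.
index = {}  # filename mapped to the set of peers sharing it

def announce(peer, filename):
    # A peer registers a file it is willing to share.
    index.setdefault(filename, set()).add(peer)

def search(filename):
    # Return every peer known to share this file.
    return sorted(index.get(filename, set()))

announce("alice", "song.mp3")
announce("bob", "song.mp3")
print(search("song.mp3"))  # ['alice', 'bob']
```
&lt;br /&gt;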
===Local Area Network file search===&lt;br /&gt;
&lt;br /&gt;
In local area networks, in order to find shared files you need to know where the file is located. That is, if you are looking for a particular file and you don’t know its location, you may have to comb through the entire network manually in search of it.&lt;br /&gt;
&lt;br /&gt;
==Transferring the file==&lt;br /&gt;
&lt;br /&gt;
In order to access a file over any network, some level of transfer needs to be made, whether temporary or permanent. Files are transferred temporarily if they only need to be viewed or edited, and permanently if they are being copied or moved completely. File sharing systems like peer-2-peer only transfer files permanently, whereas most file sharing systems over a local area network only make a permanent transfer when a copy or cut command is executed.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer file transfer===&lt;br /&gt;
&lt;br /&gt;
After the user has identified the target file, there are two main ways, depending on the type of peer-2-peer network, in which the file can be transferred to the user.&lt;br /&gt;
&lt;br /&gt;
* Single user to single user transfer&lt;br /&gt;
In this style of transfer, the complete file is downloaded from a single source. Non-torrent peer-2-peer networks use this style of transfer. Torrent networks only use this style when dealing with shared files that have a single seed.&lt;br /&gt;
&lt;br /&gt;
* Multiple users to single user transfer&lt;br /&gt;
In this style of transfer, the file is simultaneously downloaded from multiple sources. This is the style more commonly used by torrent networks like Bit-torrent. Files shared on torrent networks are split into chunks, and the torrent file itself holds information about the seeds for the particular shared file. Different chunks of the shared file are downloaded simultaneously onto the user’s computer and reassembled. This way, much higher download speeds can be achieved compared to single-to-single user transfers.&lt;br /&gt;
&lt;br /&gt;
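The multi-source transfer described above can be sketched as follows. This is not the Bit-torrent protocol itself, just an illustration of fetching different chunks of one file from several peers in parallel and reassembling them; fetch_chunk stands in for a real network request.&lt;br /&gt;

```python
# Illustrative multi-source download: chunk i is fetched from peer
# i modulo the number of peers, in parallel, then reassembled in order.
from concurrent.futures import ThreadPoolExecutor

def fetch_chunk(peer, index):
    # Hypothetical network call; here each peer simply returns its chunk.
    return peer[index]

def multi_source_download(peers, num_chunks):
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(fetch_chunk, peers[i % len(peers)], i)
                   for i in range(num_chunks)]
    return b"".join(f.result() for f in futures)

chunks = [b"he", b"ll", b"o!"]
peers = [chunks, chunks, chunks]          # three seeds holding the same file
print(multi_source_download(peers, 3))    # b'hello!'
```
&lt;br /&gt;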
===Local operating system file transfer===&lt;br /&gt;
&lt;br /&gt;
In a local area network setting, files are generally viewed from the root. Technically, the complete file, or portions of it, is transferred to main memory and viewed from there, the same way it would be if you had a local copy. The only difference is that instead of the transfer being made from your local storage (hard drive) to main memory, the transfer is from a remote storage device somewhere on the network to main memory. The only real reason this can be done is that transfer speeds over a local network are faster than over the Internet. As such, access restrictions can be properly enforced.&lt;br /&gt;
&lt;br /&gt;
=Sharing of Distributed Files=&lt;br /&gt;
&lt;br /&gt;
When we think of file sharing, we generally think of the file being located on our own computer. With a distributed file system, the file we want to share most likely will not physically be on our computer. This brings a level of complexity to the actual sharing of the file.&lt;br /&gt;
&lt;br /&gt;
Sharing a file in the distributed operating system case will have to be scalable enough to be deployed over the Internet. This means that traditional AFP and SMB approaches will have difficulty scaling up to the task. Examples of file sharing systems that already work at this level, as discussed, are peer-2-peer networks and FTP. To define an effective file sharing system for a distributed operating system, the following challenges need to be addressed.&lt;br /&gt;
&lt;br /&gt;
* Transfer speed&lt;br /&gt;
When a file is to be transferred, it should be done at the highest speed possible. A torrent approach may not be a complete answer, as multiple copies of the file are needed to improve speed. This will be a huge problem with sensitive files, where a user may not want multiple copies located all over the internet.&lt;br /&gt;
&lt;br /&gt;
* Duplicate files&lt;br /&gt;
As it is already, common files like music files may have millions of copies located on different computers all over the world. For a distributed file system, having so many copies of the same file is an ineffective use of space and should be avoided where possible.&lt;br /&gt;
&lt;br /&gt;
* File integrity&lt;br /&gt;
Corrupted or fake files are an issue in sharing because they may end up corrupting the computers that access them. One way this is mitigated today is through reporting systems in which users can report a fake or corrupted file to the host or source. Another approach is checking systems that go through files verifying their integrity. In torrent systems, as previously discussed, mediators do the checking of files manually.&lt;br /&gt;
&lt;br /&gt;
* File backup&lt;br /&gt;
This is a solution that helps with file integrity as well as data loss. If it is determined that a file has lost its integrity, there needs to be a mechanism to restore it; the easiest way is to restore the file from a good backup. Data or file loss can happen in many ways, for instance if the server on which the file is stored goes down. In this case, a backup copy needs to be located somewhere else that the user can access.&lt;br /&gt;
&lt;br /&gt;
* Access restrictions&lt;br /&gt;
File sharing systems like FTP, AFP and SMB can restrict a user’s ability to access a particular file with authentication mechanisms. Having such capabilities in a distributed sharing environment is certainly necessary in order to have more flexible and restricted sharing. AFP and SMB take access restrictions further to also restrict read and write capabilities.&lt;br /&gt;
&lt;br /&gt;
* Search capability&lt;br /&gt;
This can be seen as more of a convenience than a need; it would be nice for a user to be able to search through all the shared files to which he or she has access. Having this will certainly aid in the development of more user-friendly distributed operating systems.&lt;br /&gt;
&lt;br /&gt;
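The file integrity challenge above (and, as a side effect, duplicate detection) is commonly approached by comparing a file against a known content hash, much as torrent metadata records a checksum per chunk. The sketch below is illustrative; the function names are invented for the example.&lt;br /&gt;

```python
# Illustrative integrity check: a file is accepted only if its contents
# hash to the expected digest, so any corruption or substitution is caught.
import hashlib

def digest_of(data):
    return hashlib.sha256(data).hexdigest()

def verify(data, expected_digest):
    # Any change to the bytes changes the digest.
    return digest_of(data) == expected_digest

good = b"original file contents"
expected = digest_of(good)
print(verify(good, expected))                  # True
print(verify(b"tampered contents", expected))  # False
```
&lt;br /&gt;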
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
File sharing is necessary to accomplish many collaborative tasks, not only in the workplace but in other areas as well. We have discussed the differences between some of the popular file sharing systems in use today, such as peer-2-peer networks and Local Area Network file sharing. The similarity between the two is that the shared files are stored on the host computers; in a distributed environment this may not be the case. Through the study of current file sharing systems, we have found that in order to develop an effective file sharing system for a distributed operating system, challenges such as transfer speeds, duplicate files, file integrity, file backup, access restrictions, and search capabilities need to be addressed. Current file sharing systems address some of these issues, but no single one addresses all of them properly. As such, a hybrid between Local Area Network sharing and Internet based file sharing may be needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
[1] J. Pouwelse, P. Garbacki, D. Epema, H. Sips. The Bit-torrent P2P File-Sharing System. Delft University of Technology, Delft, The Netherlands.&lt;br /&gt;
&lt;br /&gt;
[2] R. Bhagwan, S. Savage, and G. M. Voelker. Understanding availability. In International Workshop on Peer-to-Peer Systems, Berkeley, CA, USA, February 2003.&lt;br /&gt;
&lt;br /&gt;
[3] B. Cohen. Incentives build robustness in BitTorrent. In Workshop on Economics of Peer-to-Peer Systems, Berkeley, USA, May 2003.&lt;br /&gt;
&lt;br /&gt;
[4] S. Saroiu, P. Krishna, G. Steven, D. Gribble. A Measurement Study of Peer-to-peer File Sharing Systems. University of Washington, Seattle, WA, USA.&lt;br /&gt;
&lt;br /&gt;
[5] N. Leibowitz, M. Ripeanu, and A. Wierzbicki. Deconstructing the kazaa network. In 3rd IEEE Workshop on Internet Applications (WIAPP’03), San Jose, CA, USA, June 2003.&lt;br /&gt;
&lt;br /&gt;
[6] R. Sherwood, R. Braud, and B. Bhattacharjee. Slurpie: A cooperative bulk data transfer protocol. In IEEE Infocom, Hong Kong, China, March 2004.&lt;br /&gt;
&lt;br /&gt;
[7] B.T. Loo, J.M. Hellerstein, R. Huebsch, S. Shenker, I. Stoica. Enhancing P2P File-Sharing with an Internet-Scale Query Processor. UC Berkeley. VLDB Conference, Toronto, Canada, 2004.&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8479</id>
		<title>DistOS-2011W Distributed File Sharing</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8479"/>
		<updated>2011-03-13T18:14:25Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Omi Iyamu&lt;br /&gt;
oiyamu@gmail.com&lt;br /&gt;
&lt;br /&gt;
PDF available at [PDF]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;lt;b&amp;gt;Abstract&amp;lt;/b&amp;gt;&lt;br /&gt;
&lt;br /&gt;
File sharing is a tool necessary for group collaboration, a simple way to make your files available to others, and a nice way to access file contents across multiple machines. This paper discusses, at a high level, the different file-sharing systems currently in use and the different strategies they employ to facilitate file sharing. In section 2, file sharing systems are categorized by scale into Local Area Network sharing and Internet based sharing. Section 3 discusses the steps involved in sharing an actual file using the systems discussed in section 2. Finally, in section 4, this paper discusses the challenges that must be overcome to develop an effective file sharing system for a distributed operating system and gives some suggestions as to how some of them may be overcome.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
File sharing in a distributed environment should differ from that in a local environment. In this paper, any mention of a distributed operating system refers to an Internet based operating system; as such, the distributed environment discussed is the Internet. Likewise, any mention of a local environment refers to a local area network.&lt;br /&gt;
&lt;br /&gt;
The scope of this paper is limited to a review of a few file-sharing systems. The motivation is to determine what challenges need to be addressed in the development of a file sharing system that can be deployed on a distributed operating system.&lt;br /&gt;
&lt;br /&gt;
Discussions in this paper are kept at a high level so that readers without a strong technical background can follow them. However, some computer science or similar background is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=File Sharing systems=&lt;br /&gt;
&lt;br /&gt;
The main differences between file sharing systems are the modes of access and the methods used to transfer the shared files. There are numerous types of file sharing systems; I have categorized them into two types based on scale. Section 2.1 discusses Local Area Network sharing, which can be considered small-scale file sharing. Section 2.2 discusses Internet based file-sharing systems, which can be considered large-scale file sharing.&lt;br /&gt;
&lt;br /&gt;
==Local Area Network Sharing==&lt;br /&gt;
&lt;br /&gt;
The computers on a Local Area Network (LAN) have some degree of trust between them. The key advantages of sharing systems designed for Local Area Networks are the ability to set access restrictions on the files being shared and increased transfer speeds. Examples are AFP (Apple Filing Protocol), used by Apple, and SMB (Server Message Block), used by Windows.&lt;br /&gt;
&lt;br /&gt;
==Internet Based File Sharing==&lt;br /&gt;
&lt;br /&gt;
There are a number of Internet based or online file sharing systems that take different approaches to file sharing. Some examples are peer-2-peer networks, discussed in section 2.2.1, and FTP (File Transfer Protocol), discussed in section 2.2.2.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer Systems===&lt;br /&gt;
&lt;br /&gt;
Peer-2-peer is one of the most commonly used types of file sharing system. User computers act as both client and server nodes and share content between themselves. There are two main styles by which peer-2-peer file-sharing systems work: one involves the use of torrents and the other does not.&lt;br /&gt;
&lt;br /&gt;
* Torrent style&lt;br /&gt;
Of all the torrent based peer-2-peer networks, Bit-torrent is the most commonly used today [1]. In itself, Bit-torrent is just a file downloading protocol that enables simultaneous downloading from different sources holding the exact same file.&lt;br /&gt;
&lt;br /&gt;
* Non-torrent style&lt;br /&gt;
This is the older style of peer-2-peer network, such as Kazaa. Unlike torrent networks, there is a centralized server that holds information about who is sharing what files, and downloading is done from one single computer to another single computer.&lt;br /&gt;
&lt;br /&gt;
===File Transfer Protocol===&lt;br /&gt;
&lt;br /&gt;
FTP, as the name suggests, is a file transfer protocol. File transfer is made from a single source computer to a single receiving computer. FTP file systems are often password protected to ensure that only authorized users can access the files. To access an FTP file system you need to know the IP address or the domain name of the computer you want to access. When a file is requested, the complete file is downloaded onto the requesting computer.&lt;br /&gt;
&lt;br /&gt;
=File Sharing Process=&lt;br /&gt;
&lt;br /&gt;
There are numerous file sharing protocols available, but the process can generally be broken into three main steps: sharing the file itself, finding the shared file, and accessing or transferring the shared file. In this section we discuss the process for peer-2-peer networks and Local Area Networks.&lt;br /&gt;
&lt;br /&gt;
==Sharing the file==&lt;br /&gt;
&lt;br /&gt;
Sharing the actual file is the process of setting up a file for sharing. Different file sharing systems follow different processes for actually enabling a file for sharing.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer sharing===&lt;br /&gt;
&lt;br /&gt;
Peer-2-peer torrent networks generally follow a submission process towards file sharing. With Bit-torrent, a user injects new content by uploading a torrent file to a torrent search website such as supernova.com and creating a seed with the first copy of the file [1]. Bit-torrent has a mediator system that checks the content of files to make sure they are what they say they are. When a user submits a new file, a mediator has to check it before it is allowed into the sharing network. After a user has submitted several files that passed mediation, he is promoted to unmediated submitter status, meaning the user is trusted enough to submit files that are directly injected into the sharing network without being mediated [1]. Non-torrent peer-2-peer networks don’t follow this submission system; to share a file, all you usually have to do is place it in the share directory used by the third-party peer-2-peer application.&lt;br /&gt;
&lt;br /&gt;
There is no notion of setting access restrictions with peer-2-peer file sharing. Users generally have unrestricted access to shared content; files can be downloaded, edited, and re-uploaded by all.&lt;br /&gt;
&lt;br /&gt;
===Local Area Network sharing===&lt;br /&gt;
&lt;br /&gt;
In Local Area Networks, setting up a file to be shared does not involve any submission process or mediation. Because members of the network have some level of trust between them, all you have to do to set up a file for sharing is go into the file’s properties and enable its sharing property. Access restrictions can also be set to restrict the read and/or write properties of the files or directories being shared.&lt;br /&gt;
&lt;br /&gt;
* Read only&lt;br /&gt;
In this setting the user is only allowed to view the contents of the file; no changes can be made to the root file. The only way around this is to copy the file over and make changes to your local copy.&lt;br /&gt;
&lt;br /&gt;
* Write only&lt;br /&gt;
This setting is used on directories and turns a directory into a drop box: another user on the network can write files to the given directory but cannot view its contents. Only the owner of the directory can read its contents.&lt;br /&gt;
&lt;br /&gt;
* Read and Write&lt;br /&gt;
This setting allows the user to make changes to the file and save these changes onto the root file, so the file does not need to be copied over. In the case of a directory, its contents can be modified remotely.&lt;br /&gt;
&lt;br /&gt;
==Locating shared files==&lt;br /&gt;
&lt;br /&gt;
People share files so that they and/or other people may access them remotely. As such, finding a file that has been shared is a key step in the sharing process. Methods of locating shared files differ between sharing systems.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer file search===&lt;br /&gt;
&lt;br /&gt;
In peer-2-peer systems, finding the shared files you want is fairly easy. Non-torrent networks like Kazaa have a centralized server that holds lists of who is sharing what [3]. In order to search through this list, a third-party peer-2-peer application is needed. However, curation of the file lists on these types of systems is poor, which results in users sometimes downloading “fake” files.&lt;br /&gt;
&lt;br /&gt;
In torrent networks like Bit-torrent where the shared files are checked on submission, the likelihood of downloading a fake file is reduced. However, searching for a shared file is done via third party search engines like supernova.com and isohunt.com.&lt;br /&gt;
&lt;br /&gt;
===Local Area Network file search===&lt;br /&gt;
&lt;br /&gt;
In local area networks, in order to find shared files you need to know where the file is located. That is, if you are looking for a particular file and you don’t know its location, you may have to comb through the entire network manually in search of it.&lt;br /&gt;
&lt;br /&gt;
==Transferring the file==&lt;br /&gt;
&lt;br /&gt;
In order to access a file over any network, some level of transfer needs to be made, whether temporary or permanent. Files are transferred temporarily if they only need to be viewed or edited, and permanently if they are being copied or moved completely. File sharing systems like peer-2-peer only transfer files permanently, whereas most file sharing systems over a local area network only make a permanent transfer when a copy or cut command is executed.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer file transfer===&lt;br /&gt;
&lt;br /&gt;
After the user has identified the target file, there are two main ways, depending on the type of peer-2-peer network, in which the file can be transferred to the user.&lt;br /&gt;
&lt;br /&gt;
* Single user to single user transfer&lt;br /&gt;
In this style of transfer, the complete file is downloaded from a single source. Non-torrent peer-2-peer networks use this style of transfer. Torrent networks only use this style when dealing with shared files that have a single seed.&lt;br /&gt;
&lt;br /&gt;
* Multiple users to single user transfer&lt;br /&gt;
In this style of transfer, the file is simultaneously downloaded from multiple sources. This is the style more commonly used by torrent networks like Bit-torrent. Files shared on torrent networks are split into chunks, and the torrent file itself holds information about the seeds for the particular shared file. Different chunks of the shared file are downloaded simultaneously onto the user’s computer and reassembled. This way, much higher download speeds can be achieved compared to single-to-single user transfers.&lt;br /&gt;
&lt;br /&gt;
===Local operating system file transfer===&lt;br /&gt;
&lt;br /&gt;
In a local area network setting, files are generally viewed from the root. Technically, the complete file, or portions of it, is transferred to main memory and viewed from there, the same way it would be if you had a local copy. The only difference is that instead of the transfer being made from your local storage (hard drive) to main memory, the transfer is from a remote storage device somewhere on the network to main memory. The only real reason this can be done is that transfer speeds over a local network are faster than over the Internet. As such, access restrictions can be properly enforced.&lt;br /&gt;
&lt;br /&gt;
=Sharing of Distributed Files=&lt;br /&gt;
&lt;br /&gt;
When we think of file sharing, we generally think of the file being located on our own computer. With a distributed file system, the file we want to share most likely will not physically be on our computer. This brings a level of complexity to the actual sharing of the file.&lt;br /&gt;
&lt;br /&gt;
Sharing a file in the distributed operating system case will have to be scalable enough to be deployed over the Internet. This means that traditional AFP and SMB approaches will have difficulty scaling up to the task. Examples of file sharing systems that already work at this level, as discussed, are peer-2-peer networks and FTP. To define an effective file sharing system for a distributed operating system, the following challenges need to be addressed.&lt;br /&gt;
&lt;br /&gt;
* Transfer speed&lt;br /&gt;
When a file is to be transferred, it should be done at the highest speed possible. A torrent approach may not be a complete answer, as multiple copies of the file are needed to improve speed. This will be a huge problem with sensitive files, where a user may not want multiple copies located all over the internet.&lt;br /&gt;
&lt;br /&gt;
* Duplicate files&lt;br /&gt;
As it is already, common files like music files may have millions of copies located on different computers all over the world. For a distributed file system, having so many copies of the same file is an ineffective use of space and should be avoided where possible.&lt;br /&gt;
&lt;br /&gt;
* File integrity&lt;br /&gt;
Corrupted or fake files are an issue in sharing because they may harm the computers that access them. One way this is mitigated today is through reporting systems, in which users can report a fake or corrupted file to the host or source. Another approach is automated checking systems that go through files verifying their integrity. In torrent systems, as previously discussed, mediators check files manually.&lt;br /&gt;
&lt;br /&gt;
* File backup&lt;br /&gt;
Backups help with both file integrity and data loss. If it is determined that a file has lost its integrity, there needs to be a mechanism to restore it; the easiest way is to restore the file from a good backup. Data or file loss can happen in many ways, for instance when the server on which the file is stored goes down. In that case, a backup copy needs to be located somewhere else that the user can access.&lt;br /&gt;
&lt;br /&gt;
* Access restrictions&lt;br /&gt;
File sharing systems like FTP, AFP and SMB can restrict a user's ability to access a particular file with authentication mechanisms. Such capabilities are certainly necessary in a distributed sharing environment in order to have more flexible and restricted sharing. AFP and SMB take access restrictions further by also restricting read and write capabilities.&lt;br /&gt;
&lt;br /&gt;
* Search capability&lt;br /&gt;
This can be looked at as more of a convenience than a need; it would be nice for users to be able to search through all the shared files they have access to. Having this will certainly aid the development of more user-friendly distributed operating systems.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
File sharing is necessary to accomplish many collaborative tasks, not only in the workplace but in other areas as well. We have discussed the differences between some of the popular file sharing systems in use today, such as peer-2-peer networks and Local Area Network file sharing. The similarity between the two is that the shared files are stored on the host computers. In a distributed environment this may not be the case. Through the study of current file sharing systems, we have found that in order to develop an effective file sharing system for a distributed operating system, challenges such as transfer speed, duplicate files, file integrity, file backup, access restrictions, and search capability need to be addressed. Current file sharing systems address some of these issues, but no single one addresses all of them properly. As such, a hybrid between Local Area Network sharing and Internet based file sharing may be needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
[1] J. Pouwelse, P. Garbacki, D. Epema, H. Sips. The Bit-torrent P2P File-Sharing System. Delft University of Technology, Delft, The Netherlands.&lt;br /&gt;
&lt;br /&gt;
[2] R. Bhagwan, S. Savage, and G. M. Voelker. Understanding availability. In International Workshop on Peer-to-Peer Systems, Berkeley, CA, USA, February 2003.&lt;br /&gt;
&lt;br /&gt;
[3] B. Cohen. Incentives build robustness in BitTorrent. In Workshop on Economics of Peer-to-Peer Systems, Berkeley, USA, May 2003.&lt;br /&gt;
&lt;br /&gt;
[4] S. Saroiu, P. K. Gummadi, S. D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. University of Washington, Seattle, WA, USA.&lt;br /&gt;
&lt;br /&gt;
[5] N. Leibowitz, M. Ripeanu, and A. Wierzbicki. Deconstructing the kazaa network. In 3rd IEEE Workshop on Internet Applications (WIAPP’03), San Jose, CA, USA, June 2003.&lt;br /&gt;
&lt;br /&gt;
[6] R. Sherwood, R. Braud, and B. Bhattacharjee. Slurpie: A cooperative bulk data transfer protocol. In IEEE Infocom, Hong Kong, China, March 2004.&lt;br /&gt;
&lt;br /&gt;
[7] B.T. Loo, J.M. Hellerstein, R. Huebsch, S. Shenker, I. Stoica. Enhancing P2P File-Sharing with an Internet-Scale Query Processor. UC Berkeley. VLDB Conference, Toronto, Canada, 2004.&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8478</id>
		<title>DistOS-2011W Distributed File Sharing</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8478"/>
		<updated>2011-03-13T18:11:13Z</updated>

		<summary type="html">&lt;p&gt;Omi: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Omi Iyamu&lt;br /&gt;
oiyamu@gmail.com&lt;br /&gt;
&lt;br /&gt;
PDF available at [PDF]&lt;br /&gt;
&lt;br /&gt;
*Abstract*&lt;br /&gt;
&lt;br /&gt;
File sharing is a tool necessary for group collaboration, a simple way to make your files available to others, and a nice way to access file contents across multiple machines. This paper discusses, at a high level, the different file-sharing systems currently in use and the different strategies they employ to facilitate file sharing. In section 2, different file sharing systems are categorized by scale into Local Area Network sharing and Internet based sharing. Section 3 discusses the steps involved in the process of sharing an actual file using the different file sharing systems discussed in section 2. Finally, in section 4, this paper discusses the challenges that need to be overcome to develop an effective file sharing system for a distributed operating system and gives some suggestions as to how some of them may be overcome.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
File sharing in a distributed environment should differ from that in a local environment. In this paper, whenever a distributed operating system is mentioned, it refers to an Internet based operating system; as such, the distributed environment discussed will be the Internet. Whenever a local environment is mentioned, it refers to a local area network.&lt;br /&gt;
&lt;br /&gt;
The scope of this paper is a review of a few file-sharing systems. The motivation is to determine what challenges need to be addressed in the development of a file sharing system that can be deployed on a distributed operating system.&lt;br /&gt;
&lt;br /&gt;
Discussions in this paper are kept at a high level so that readers without a strong technical background can follow them easily. However, some computer science or similar background is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=File Sharing systems=&lt;br /&gt;
&lt;br /&gt;
The main differences between file sharing systems are the modes of access and the methods used to transfer the shared files. There are numerous types of file sharing systems; I have categorized them into two types based on scale. Section 2.1 talks about Local Area Network sharing, which can be considered a small-scale file sharing system. Section 2.2 talks about Internet based file-sharing systems, which can be considered large-scale file sharing.&lt;br /&gt;
&lt;br /&gt;
==Local Area Network Sharing==&lt;br /&gt;
&lt;br /&gt;
On a Local Area Network (LAN), the computers have some degree of trust between them. The key advantages of sharing systems designed for LANs are the ability to set access restrictions on files being shared and increased transfer speeds. Examples are AFP (Apple Filing Protocol), used by Apple, and SMB (Server Message Block), used by Windows.&lt;br /&gt;
&lt;br /&gt;
==Internet Based File Sharing==&lt;br /&gt;
&lt;br /&gt;
There are a number of Internet based or online file sharing systems that take different approaches to file sharing. Some examples are peer-2-peer networks, discussed in section 2.2.1, and FTP (File Transfer Protocol), discussed in section 2.2.2.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer Systems===&lt;br /&gt;
&lt;br /&gt;
Peer-2-peer is one of the most commonly used kinds of file sharing system. User computers act as both client and server nodes and share content between themselves. There are two main styles of peer-2-peer file sharing: one involves the use of torrents and the other does not.&lt;br /&gt;
&lt;br /&gt;
* Torrent style&lt;br /&gt;
Of all the torrent based peer-2-peer networks, Bit-torrent is the most commonly used today [1]. In itself, Bit-torrent is just a file downloading protocol that enables simultaneous downloading from different sources holding the exact same file.&lt;br /&gt;
&lt;br /&gt;
* Non-torrent style&lt;br /&gt;
This is the older style of peer-2-peer network, exemplified by Kazaa. Unlike torrent networks, there is a centralized server that holds information about who is sharing what files, and downloading is done from one single computer to another single computer.&lt;br /&gt;
&lt;br /&gt;
===File Transfer Protocol===&lt;br /&gt;
&lt;br /&gt;
FTP, as the name suggests, is a file transfer protocol. File transfer is made from a single source computer to a single receiving computer. FTP file systems are often password protected to ensure that only authorized users access the files. To access an FTP file system you need to know the IP address or domain name of the computer you want to access. When a file is requested, the complete file is downloaded onto the requesting computer.&lt;br /&gt;
&lt;br /&gt;
=File Sharing Process=&lt;br /&gt;
&lt;br /&gt;
Although numerous file sharing protocols are available, the sharing process can generally be broken up into three main steps: sharing the file itself, finding the shared file, and accessing or transferring the shared file. In this section we discuss the process for peer-2-peer networks and Local Area Networks.&lt;br /&gt;
&lt;br /&gt;
==Sharing the file==&lt;br /&gt;
&lt;br /&gt;
The sharing of the actual file is the process of setting a file up for sharing. Different file sharing systems follow different processes to enable a file for sharing.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer sharing===&lt;br /&gt;
&lt;br /&gt;
Peer-2-peer torrent networks generally follow a submission process for file sharing. With Bit-torrent, a user injects new content by uploading a torrent file to a torrent search website such as supernova.com and creating a seed with the first copy of the file [1]. Bit-torrent has a mediator system that checks the content of files to make sure they are what they say they are. When a user submits a new file, a mediator has to check it before it is allowed into the sharing network. After a user has submitted several files that passed mediation, he is promoted to unmediated submitter status. This means the user is trusted enough to submit files that will be directly injected into the sharing network without having to be mediated [1]. Non-torrent peer-2-peer networks do not follow this submission system; to share a file, all you usually have to do is place it in the share directory used by the third-party peer-2-peer application.&lt;br /&gt;
&lt;br /&gt;
There is no notion of setting access restrictions in peer-2-peer file sharing. Users generally have unrestricted access to shared content, which can be downloaded, edited, and re-uploaded by anyone.&lt;br /&gt;
&lt;br /&gt;
===Local Area Network sharing===&lt;br /&gt;
&lt;br /&gt;
In Local Area Networks, setting up a file to be shared does not involve any submission process or mediation. Since members of the network have some level of trust between them, to set up a file for sharing, all you have to do is go into the file's properties and enable its sharing property. Access restrictions can also be set to restrict the read and/or write properties of the files or directories being shared.&lt;br /&gt;
&lt;br /&gt;
* Read only&lt;br /&gt;
In this setting the user is only allowed to view the contents of the file; no changes can be made to the root file. The only way around this is to copy the file and make changes to your local copy.&lt;br /&gt;
&lt;br /&gt;
* Write only&lt;br /&gt;
This setting is used on directories. It turns a directory into a drop box: other users on the network can write files to the directory but cannot view its contents. Only the owner of the directory can read its contents.&lt;br /&gt;
&lt;br /&gt;
* Read and Write&lt;br /&gt;
This setting allows the user to make changes to the file and save those changes to the root file; the file does not need to be copied over. In the case of a directory, its contents can be modified remotely.&lt;br /&gt;
&lt;br /&gt;
==Locating shared files==&lt;br /&gt;
&lt;br /&gt;
People share files so that they and other people may access them remotely. As such, finding a file that has been shared is a key step in the process of sharing. Methods of locating shared files differ between sharing systems.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer file search===&lt;br /&gt;
&lt;br /&gt;
In peer-2-peer systems, finding the shared files you want is pretty easy. Non-torrent networks like Kazaa have a centralized server that holds lists of who is sharing what [3]. In order to search through this list, a third-party peer-2-peer application is needed. However, cleaning of the file lists on these types of systems is poor, which results in users sometimes downloading “fake” files.&lt;br /&gt;
&lt;br /&gt;
In torrent networks like Bit-torrent where the shared files are checked on submission, the likelihood of downloading a fake file is reduced. However, searching for a shared file is done via third party search engines like supernova.com and isohunt.com.&lt;br /&gt;
&lt;br /&gt;
===Local Area Network file search===&lt;br /&gt;
&lt;br /&gt;
In local area networks, in order to find shared files you need to know where the file is located. If, say, you are looking for a particular file and you do not know its location, you may have to comb through the entire network manually in search of it.&lt;br /&gt;
&lt;br /&gt;
==Transferring the file==&lt;br /&gt;
&lt;br /&gt;
In order to access a file over any network, some level of transfer needs to be made, whether temporary or permanent. Files are transferred temporarily if they only need to be viewed or edited; they are transferred permanently when being copied or moved completely. File sharing systems like peer-2-peer only transfer files permanently, whereas most file sharing systems on a local area network will only make a permanent transfer when a copy or cut command is executed.&lt;br /&gt;
&lt;br /&gt;
===Peer-2-peer file transfer===&lt;br /&gt;
&lt;br /&gt;
After the user has identified the target file, there are, depending on the type of peer-2-peer network, two main ways the file can be transferred to the user.&lt;br /&gt;
&lt;br /&gt;
* Single user to single user transfer&lt;br /&gt;
In this style of transfer, the complete file is downloaded from a single source. Non-torrent peer-2-peer networks use this style. Torrent networks only use this style when dealing with shared files that have a single seed.&lt;br /&gt;
&lt;br /&gt;
* Multiple users to single user transfer&lt;br /&gt;
In this style of transfer, the file is simultaneously downloaded from multiple sources. This is the style used by torrent networks like Bit-torrent. Files shared on torrent networks are split into chunks. The torrent file itself holds information about the seeds for the particular shared file. Different chunks of the shared file are downloaded simultaneously onto the user's computer and reassembled. This way, much higher download speeds can be achieved compared to single-user-to-single-user transfers.&lt;br /&gt;
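The chunked, multi-source transfer described above can be sketched as follows. This is an illustrative model only: the `Seed` class, chunk size, and round-robin scheduling are assumptions made for the sketch, not the actual Bit-torrent protocol, which also fetches chunks concurrently and verifies each one against hashes in the torrent file.&lt;br /&gt;

```python
class Seed:
    """In-memory stand-in for a remote peer holding a full copy of the file."""

    def __init__(self, data: bytes, chunk_size: int):
        self.data = data
        self.chunk_size = chunk_size

    def get_chunk(self, i: int) -> bytes:
        # Return the i-th fixed-size chunk of the file.
        return self.data[i * self.chunk_size:(i + 1) * self.chunk_size]


def download(seeds, total_len: int, chunk_size: int) -> bytes:
    # Spread chunk requests round-robin across the available seeds,
    # then reassemble the chunks in order.
    num_chunks = -(-total_len // chunk_size)  # ceiling division
    parts = [seeds[i % len(seeds)].get_chunk(i) for i in range(num_chunks)]
    return b"".join(parts)
```

With three seeds holding the same file, each source serves roughly a third of the chunks, which is why this style outperforms a single-source transfer.&lt;br /&gt;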
&lt;br /&gt;
===Local operating system file transfer===&lt;br /&gt;
&lt;br /&gt;
In a local area network setting, files are generally viewed in place, from the original (root) copy. Technically, the complete file or portions of it are transferred to main memory and viewed from there, the same way they would be if you had a local copy. The only difference is that instead of the transfer being made from your local storage (hard drive) to main memory, it is made from a remote storage device somewhere on the network to main memory. This is practical only because transfer speeds over a local network are much higher than over the Internet. As such, access restrictions can properly be enforced.&lt;br /&gt;
&lt;br /&gt;
=Sharing of Distributed Files=&lt;br /&gt;
&lt;br /&gt;
When we think of file sharing, we generally think of the file being located on our own computer. With a distributed file system, the file we want to share most likely will not physically be on our computer. This adds complexity to the actual sharing of the file.&lt;br /&gt;
&lt;br /&gt;
File sharing in a distributed operating system will have to be scalable enough to be deployed over the Internet. This means that traditional AFP and SMB approaches will have difficulty scaling up to the task. Examples of file sharing systems that already work at this scale, as discussed, are peer-2-peer networks and FTP. To define an effective file sharing system for a distributed operating system, the following challenges need to be addressed.&lt;br /&gt;
&lt;br /&gt;
* Transfer speed&lt;br /&gt;
When a file is to be transferred, it should be transferred as quickly as possible. A torrent approach may not necessarily be a complete answer, as multiple copies of the file are needed to improve speed. This is a serious problem for sensitive files, whose owners may not want multiple copies located all over the Internet.&lt;br /&gt;
&lt;br /&gt;
* Duplicate files&lt;br /&gt;
Already, common files such as music files may have millions of copies located on different computers all over the world. For a distributed file system, having so many copies of the same file is an ineffective use of space and should be avoided where possible.&lt;br /&gt;
&lt;br /&gt;
* File integrity&lt;br /&gt;
Corrupted or fake files are an issue in sharing because they may harm the computers that access them. One way this is mitigated today is through reporting systems, in which users can report a fake or corrupted file to the host or source. Another approach is automated checking systems that go through files verifying their integrity. In torrent systems, as previously discussed, mediators check files manually.&lt;br /&gt;
&lt;br /&gt;
* File backup&lt;br /&gt;
Backups help with both file integrity and data loss. If it is determined that a file has lost its integrity, there needs to be a mechanism to restore it; the easiest way is to restore the file from a good backup. Data or file loss can happen in many ways, for instance when the server on which the file is stored goes down. In that case, a backup copy needs to be located somewhere else that the user can access.&lt;br /&gt;
&lt;br /&gt;
* Access restrictions&lt;br /&gt;
File sharing systems like FTP, AFP and SMB can restrict a user's ability to access a particular file with authentication mechanisms. Such capabilities are certainly necessary in a distributed sharing environment in order to have more flexible and restricted sharing. AFP and SMB take access restrictions further by also restricting read and write capabilities.&lt;br /&gt;
&lt;br /&gt;
* Search capability&lt;br /&gt;
This can be looked at as more of a convenience than a need; it would be nice for users to be able to search through all the shared files they have access to. Having this will certainly aid the development of more user-friendly distributed operating systems.&lt;br /&gt;
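Several of the challenges above, notably file integrity, duplicate detection, and backup verification, typically rest on a published cryptographic digest of the file. A minimal sketch follows, assuming SHA-256 as the digest; none of the systems discussed here prescribes a particular algorithm.&lt;br /&gt;

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    # Digest published alongside the shared file by its source. Two files
    # with the same digest can also be treated as duplicates of one another.
    return hashlib.sha256(data).hexdigest()


def verify_integrity(data: bytes, expected: str) -> bool:
    # Recompute the digest on the downloaded copy and compare;
    # a mismatch indicates corruption or a fake substitute.
    return file_fingerprint(data) == expected
```

A backup copy can be validated the same way before it is used to restore a file that has lost its integrity.&lt;br /&gt;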
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
File sharing is necessary to accomplish many collaborative tasks, not only in the workplace but in other areas as well. We have discussed the differences between some of the popular file sharing systems in use today, such as peer-2-peer networks and Local Area Network file sharing. The similarity between the two is that the shared files are stored on the host computers. In a distributed environment this may not be the case. Through the study of current file sharing systems, we have found that in order to develop an effective file sharing system for a distributed operating system, challenges such as transfer speed, duplicate files, file integrity, file backup, access restrictions, and search capability need to be addressed. Current file sharing systems address some of these issues, but no single one addresses all of them properly. As such, a hybrid between Local Area Network sharing and Internet based file sharing may be needed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
[1] J. Pouwelse, P. Garbacki, D. Epema, H. Sips. The Bit-torrent P2P File-Sharing System. Delft University of Technology, Delft, The Netherlands.&lt;br /&gt;
&lt;br /&gt;
[2] R. Bhagwan, S. Savage, and G. M. Voelker. Understanding availability. In International Workshop on Peer-to-Peer Systems, Berkeley, CA, USA, February 2003.&lt;br /&gt;
&lt;br /&gt;
[3] B. Cohen. Incentives build robustness in BitTorrent. In Workshop on Economics of Peer-to-Peer Systems, Berkeley, USA, May 2003.&lt;br /&gt;
&lt;br /&gt;
[4] S. Saroiu, P. K. Gummadi, S. D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. University of Washington, Seattle, WA, USA.&lt;br /&gt;
&lt;br /&gt;
[5] N. Leibowitz, M. Ripeanu, and A. Wierzbicki. Deconstructing the kazaa network. In 3rd IEEE Workshop on Internet Applications (WIAPP’03), San Jose, CA, USA, June 2003.&lt;br /&gt;
&lt;br /&gt;
[6] R. Sherwood, R. Braud, and B. Bhattacharjee. Slurpie: A cooperative bulk data transfer protocol. In IEEE Infocom, Hong Kong, China, March 2004.&lt;br /&gt;
&lt;br /&gt;
[7] B.T. Loo, J.M. Hellerstein, R. Huebsch, S. Shenker, I. Stoica. Enhancing P2P File-Sharing with an Internet-Scale Query Processor. UC Berkeley. VLDB Conference, Toronto, Canada, 2004.&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8449</id>
		<title>DistOS-2011W Distributed File Sharing</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_Sharing&amp;diff=8449"/>
		<updated>2011-03-12T04:41:16Z</updated>

		<summary type="html">&lt;p&gt;Omi: Created page with &amp;quot;Author: Omi Iyamu oiyamu@gmail.com  PDF available at [PDF] =Abstract=      File sharing is a tool necessary for group collaboration, a simple way to make your files available to …&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Author: Omi Iyamu&lt;br /&gt;
oiyamu@gmail.com&lt;br /&gt;
&lt;br /&gt;
PDF available at [PDF]&lt;br /&gt;
=Abstract=&lt;br /&gt;
     File sharing is a tool necessary for group collaboration, a simple way to make your files available to others, and a nice way to access file contents across multiple machines. This paper discusses, at a high level, the different file-sharing systems currently in use and the different strategies they employ to facilitate file sharing. In section 2, different file sharing systems are categorized by scale into Local Area Network sharing and Internet based sharing. Section 3 discusses the steps involved in the process of sharing an actual file using the different file sharing systems discussed in section 2. Finally, in section 4, this paper discusses the challenges that need to be overcome to develop an effective file sharing system for a distributed operating system and gives some suggestions as to how some of them may be overcome. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=1.0 Introduction=&lt;br /&gt;
     File sharing in a distributed environment should differ from that in a local environment. In this paper, whenever a distributed operating system is mentioned, it refers to an Internet based operating system; as such, the distributed environment discussed will be the Internet. Whenever a local environment is mentioned, it refers to a local area network. &lt;br /&gt;
	&lt;br /&gt;
     The scope of this paper is a review of a few file-sharing systems. The motivation is to determine what challenges need to be addressed in the development of a file sharing system that can be deployed on a distributed operating system. &lt;br /&gt;
&lt;br /&gt;
     Discussions in this paper are kept at a high level so that readers without a strong technical background can follow them easily. However, some computer science or similar background is needed.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=2.0 File Sharing systems=&lt;br /&gt;
The main differences between file sharing systems are the modes of access and the methods used to transfer the shared files. There are numerous types of file sharing systems; I have categorized them into two types based on scale. Section 2.1 talks about Local Area Network sharing, which can be considered a small-scale file sharing system. Section 2.2 talks about Internet based file-sharing systems, which can be considered large-scale file sharing.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Local Naming==&lt;br /&gt;
The Sun Network File System (NFS) specifies that each client sees a UNIX file&lt;br /&gt;
namespace with a private root. Due to each client being free to manage&lt;br /&gt;
its own namespace, several workstations mounting the same remote directory&lt;br /&gt;
might not have the same view of the files contained in that directory. However,&lt;br /&gt;
if file-sharing or location transparency is required, it can be achieved by&lt;br /&gt;
convention (e.g., users agreeing on calling a file a specific name) rather than&lt;br /&gt;
by design. &lt;br /&gt;
&lt;br /&gt;
One of the first distributed file systems, the Apollo DOMAIN File System&lt;br /&gt;
[6] uses 64-bit unique identifiers (UIDs) for every object in the&lt;br /&gt;
system. Each Apollo client also has a UID created at the time of its manufacture.&lt;br /&gt;
When a new file is created, the UID for that file is derived from the time and&lt;br /&gt;
the UID of the file&#039;s workstation (this guarantees uniqueness of UIDs per file&lt;br /&gt;
without a central server assigning them). &lt;br /&gt;
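A possible derivation is sketched below. The 64-bit layout (node ID in the high 32 bits, creation timestamp in the low 32 bits) is a hypothetical illustration; the DOMAIN system's actual field widths are not given in the text above.&lt;br /&gt;

```python
import time


def apollo_style_uid(node_uid: int, now=time.time) -> int:
    # Hypothetical layout: manufacture-assigned node ID in the high half,
    # seconds-granularity creation time in the low half. Two workstations
    # can never collide because their node IDs differ, so no central
    # server is needed to assign UIDs.
    ts = int(now()) & 0xFFFFFFFF
    return ((node_uid & 0xFFFFFFFF) << 32) | ts
```

A real implementation would also need to disambiguate files created on the same node within one clock tick, e.g. with a per-node counter.&lt;br /&gt;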
&lt;br /&gt;
The Andrew file system [4] uses an internal 96-bit identifier for&lt;br /&gt;
uniquely identifying files. These identifiers are used in the background to&lt;br /&gt;
refer to files, but are never shown to users. Andrew clients see a partitioned&lt;br /&gt;
namespace composed of a local and a shared namespace. The shared namespace is&lt;br /&gt;
identical on all workstations, managed by a central server which can be&lt;br /&gt;
replicated. The local namespace is typically only used for files required to&lt;br /&gt;
boot an Andrew client, and to initialize the distributed client operation. &lt;br /&gt;
&lt;br /&gt;
==Cryptographic Naming==&lt;br /&gt;
OceanStore [5] stores objects at the lowest level by identifying&lt;br /&gt;
them with a&lt;br /&gt;
globally unique identifier (GUID). GUIDs are convenient in distributed&lt;br /&gt;
systems because they do not require a central authority to give them out. This&lt;br /&gt;
allows any client on the system to autonomously generate a valid GUID&lt;br /&gt;
with low probability of collisions (GUIDs are typically long bit strings e.g.,&lt;br /&gt;
more than 128 bits). At the same time, the benefit of an autonomous,&lt;br /&gt;
de-centralized namespace management allows for malicious clients to hijack&lt;br /&gt;
someone else&#039;s namespace and intentionally create collisions. To address this&lt;br /&gt;
issue, OceanStore uses a technique proposed by Mazieres et al. [7]&lt;br /&gt;
called&lt;br /&gt;
&#039;&#039;self-certifying path names&#039;&#039; .&lt;br /&gt;
&lt;br /&gt;
Self-certifying pathnames have all the benefits of public key cryptography&lt;br /&gt;
without the burden of key management, which is known to be difficult,&lt;br /&gt;
especially at a very large scale. One of the design goals of self-certifying&lt;br /&gt;
pathnames is for clients to cryptographically verify the contents of any file&lt;br /&gt;
on the network, without requiring external information. The novelty of this&lt;br /&gt;
approach is that file names inherently contain all information necessary to&lt;br /&gt;
communicate with remote servers. Essentially, an object&#039;s GUID is the secure&lt;br /&gt;
hash (SHA-1 or similar) of the object&#039;s owner&#039;s key and some human readable&lt;br /&gt;
name. By embedding a client key into the GUID, servers and other clients can&lt;br /&gt;
verify the identity and ownership of an object without querying a&lt;br /&gt;
third-party server.&lt;br /&gt;
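The scheme can be sketched as below. SHA-256 stands in for SHA-1 (now deprecated), and the exact byte layout being hashed is an assumption of the sketch, not OceanStore's actual encoding.&lt;br /&gt;

```python
import hashlib


def self_certifying_guid(owner_pubkey: bytes, name: str) -> str:
    # GUID = secure hash of the owner's public key and a human-readable
    # name. Anyone holding the GUID can verify an object's claimed owner
    # and name by recomputing the hash, with no third-party lookup.
    return hashlib.sha256(owner_pubkey + b"/" + name.encode()).hexdigest()
```

Because the owner's key is embedded in the GUID, a malicious client cannot create a colliding name in someone else's namespace without that owner's key.&lt;br /&gt;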
&lt;br /&gt;
Freenet [2] also uses keypair-based naming but in a slightly&lt;br /&gt;
different way than OceanStore. Freenet identifies all files by a binary key&lt;br /&gt;
which is obtained by applying a hash function. There are three types of keys in&lt;br /&gt;
this distributed file system:&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Keyword-signed key (KSK)&#039;&#039;&#039; This is the simplest identifier because it&lt;br /&gt;
is derived from an arbitrary text string chosen by the user who is storing the&lt;br /&gt;
file on the network. A user storing a PDF document might use the text string&lt;br /&gt;
&amp;quot;freenet/distributed/file/system&amp;quot; to describe the file. The string is used to&lt;br /&gt;
deterministically generate a private/public keypair. The public part of the key&lt;br /&gt;
is hashed and becomes the file identifier. &lt;br /&gt;
&lt;br /&gt;
We note that files can be recovered by guessing or brute-forcing the text&lt;br /&gt;
string. Also, nothing stops two different users from coming up with the same&lt;br /&gt;
descriptive string, and the second user&#039;s file would be rejected by the system,&lt;br /&gt;
as there would be a collision in the namespace.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Signed-subspace key (SSK)&#039;&#039;&#039; This method enables personal namespaces&lt;br /&gt;
for users. For this to work, users generate a public/private keypair using a&lt;br /&gt;
good random number generator. The user also creates a descriptive text string,&lt;br /&gt;
but in this case, it is XORed with the public key to generate the file key.&lt;br /&gt;
This method allows users to manage their own namespace (i.e., collisions can&lt;br /&gt;
still occur locally if the user picks the same string for two files). Users can&lt;br /&gt;
also publish a list of keywords and a public key if they want to make those&lt;br /&gt;
files publicly available. &lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Content-hash key (CHK)&#039;&#039;&#039; In this method, the file key is derived by&lt;br /&gt;
hashing the contents of the file. Files are also encrypted with a random encryption&lt;br /&gt;
key specific to that file. For others to retrieve the file, the owner makes&lt;br /&gt;
available the file hash along with the decryption key.&lt;br /&gt;
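The three key types can be sketched as follows. SHA-256 stands in for Freenet's actual hash primitives, and the KSK keypair derivation is reduced to a seeded hash; both are simplifying assumptions for illustration.&lt;br /&gt;

```python
import hashlib


def ksk(description: str) -> str:
    # Keyword-signed key: a keypair is derived deterministically from the
    # descriptive string; here the derived "public key" is simulated by a
    # seeded hash, and the file key is the hash of that public key.
    pubkey = hashlib.sha256(b"pub:" + description.encode()).digest()
    return hashlib.sha256(pubkey).hexdigest()


def ssk(pubkey: bytes, description: str) -> str:
    # Signed-subspace key: the (hashed) description is XORed with the
    # user's public key, and the result is hashed into the file key.
    d = hashlib.sha256(description.encode()).digest()
    mixed = bytes(a ^ b for a, b in zip(pubkey, d))
    return hashlib.sha256(mixed).hexdigest()


def chk(contents: bytes) -> str:
    # Content-hash key: the file key is simply the hash of the contents.
    return hashlib.sha256(contents).hexdigest()
```

Note how each scheme trades off differently: KSKs are guessable from the description, SSKs bind keys to a user's keypair, and CHKs bind keys to the data itself.&lt;br /&gt;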
&lt;br /&gt;
==Hierarchical naming==&lt;br /&gt;
Cheriton et al. [1] suggest naming objects using a long&lt;br /&gt;
name which includes multiple pieces of information: (1) the resource&#039;s name&lt;br /&gt;
and location on the file server where it resides; (2) the organization where&lt;br /&gt;
that file server is located; and (3) a global administrative domain&lt;br /&gt;
representing all the organizations participating in the distributed file system.&lt;br /&gt;
For example, a file name of &amp;quot;edu/stanford/server4/bin/listdir&amp;quot; is split&lt;br /&gt;
into: edu (global domain), /stanford/server4 (organization domain), and /bin/listdir (directory and file).&lt;br /&gt;
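Splitting such a name into its three components is a simple string operation. A sketch, using the example name from the text (the server and path are illustrative):

```python
def parse_global_name(name):
    # Split a global name into the three components described above:
    # global domain, organization domain, and directory/file path.
    parts = name.strip("/").split("/")
    return {
        "global_domain": parts[0],
        "organization": "/".join(parts[1:3]),
        "path": "/" + "/".join(parts[3:]),
    }

r = parse_global_name("edu/stanford/server4/bin/listdir")
# r["global_domain"] is "edu", r["organization"] is "stanford/server4",
# and r["path"] is "/bin/listdir"
```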
&lt;br /&gt;
This naming scheme gives clients all the necessary information (using only the&lt;br /&gt;
file name) to locate a file in a globally distributed file system. While this&lt;br /&gt;
may seem like a good solution, there are a few inherent limitations to the&lt;br /&gt;
proposal.&lt;br /&gt;
&lt;br /&gt;
First, file replication and load balancing can only be done at the lowest level&lt;br /&gt;
(i.e., in the file server selected by the organization hosting the file). This&lt;br /&gt;
can lead to a bottleneck when multiple files in the same organization become&lt;br /&gt;
&amp;quot;hot&amp;quot;. The authors suggest using caching and multicast to improve performance&lt;br /&gt;
and avoid congestion on inter-organization links. Second, it requires all&lt;br /&gt;
organizations participating in the system to agree or regulate the common&lt;br /&gt;
namespace, much like the current Domain Name System (DNS). For this to work&lt;br /&gt;
there must be an organization in which each stakeholder in the system is&lt;br /&gt;
equally represented. While systems like these do exist currently (e.g.,&lt;br /&gt;
ICANN, the Internet Corporation for Assigned Names and Numbers, a non-profit&lt;br /&gt;
organization that represents regional registrars, the Internet&lt;br /&gt;
Engineering Task Force (IETF), and Internet users and providers to help keep the&lt;br /&gt;
Internet secure, stable and inter-operable), they have large amounts of&lt;br /&gt;
administrative overhead and therefore limit the speed at which changes to&lt;br /&gt;
deployed implementations can take place. &lt;br /&gt;
&lt;br /&gt;
One advantage of the approach of Cheriton et al. is that names and directory&lt;br /&gt;
structures must only be unique within an organization/server. The system as a&lt;br /&gt;
whole does not have to keep track of every organization-level implementation,&lt;br /&gt;
yet different organizations should still be able to exchange data.&lt;br /&gt;
&lt;br /&gt;
==Metadata Servers==&lt;br /&gt;
The Google File System (GFS) [3] takes a different approach to&lt;br /&gt;
naming files. GFS assumes that all the clients communicate with a single master&lt;br /&gt;
server, who keeps a table mapping full pathnames to metadata (file locks and&lt;br /&gt;
location). The namespace is therefore centrally managed, and all clients must&lt;br /&gt;
register file operations with the master before they can be performed. While&lt;br /&gt;
this architecture has an obvious central point of failure (which can be&lt;br /&gt;
addressed by replication), it has the advantage of not having to deal with a&lt;br /&gt;
distributed namespace. This central design also has the advantage of improving&lt;br /&gt;
data consistency across multi-level distribution nodes. It also allows data&lt;br /&gt;
to be moved to optimal nodes to increase performance or distribute load. It&#039;s&lt;br /&gt;
worth noting that lookup tables are a fundamentally different way to find&lt;br /&gt;
contents in a directory as compared to UNIX &#039;&#039;inodes&#039;&#039; and related data&lt;br /&gt;
structures. This approach has inherent limitations, such as not being able to&lt;br /&gt;
support symlinks.&lt;br /&gt;
&lt;br /&gt;
Ceph [11] client nodes use near-POSIX file system interfaces which are&lt;br /&gt;
relayed back to a central metadata cluster. The metadata cluster is responsible&lt;br /&gt;
for managing the system-wide namespace, coordinating security and verifying&lt;br /&gt;
consistency. Ceph decouples data from metadata which enables the system to also&lt;br /&gt;
distribute metadata servers themselves. The metadata servers store pointers to&lt;br /&gt;
&amp;quot;object-storage clusters&amp;quot; which hold the actual data portion of the file. The&lt;br /&gt;
metadata servers also handle file read and write operations, which then&lt;br /&gt;
redirect clients to the appropriate object storage cluster or device. &lt;br /&gt;
&lt;br /&gt;
=Locating Resources=&lt;br /&gt;
&lt;br /&gt;
==Local File Systems==&lt;br /&gt;
In some distributed systems, files are copied locally and replicated to remote&lt;br /&gt;
servers in the background. NFS [9] is one example where clients&lt;br /&gt;
mount the remote file system locally. The remote directory structure is mapped&lt;br /&gt;
onto a local namespace, which makes files transparently accessible to&lt;br /&gt;
clients. In this scheme, there is no need for distributing indexes or metadata,&lt;br /&gt;
since all files appear to be local. A client can find files on the&lt;br /&gt;
&amp;quot;distributed&amp;quot; file system in the same way local files are found.&lt;br /&gt;
&lt;br /&gt;
==Metadata Servers==&lt;br /&gt;
File systems that use lookup tables for storing the location and&lt;br /&gt;
metadata of files (e.g., [3,11]) can locate resources trivially by&lt;br /&gt;
querying the lookup table. The table usually contains a pointer to either the&lt;br /&gt;
file itself or a server hosting that file, which can in turn handle the file&lt;br /&gt;
operation request. &lt;br /&gt;
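A minimal sketch of such a lookup table, with pathnames and server names invented for illustration:

```python
# Metadata server's table: full pathnames map to the location (and
# metadata) of each file, so a lookup is a single dictionary query.
metadata_table = {
    "/home/user/report.txt": {"server": "storage-node-3", "lock": None},
    "/var/logs/app.log":     {"server": "storage-node-7", "lock": "client-12"},
}

def locate(path):
    entry = metadata_table.get(path)
    if entry is None:
        raise FileNotFoundError(path)
    # The client then contacts this server directly for the file operation.
    return entry["server"]

assert locate("/home/user/report.txt") == "storage-node-3"
```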
&lt;br /&gt;
A very basic implementation of a metadata lookup is used in the Apollo Domain&lt;br /&gt;
File System [6]. A central name server maps client-readable strings&lt;br /&gt;
(e.g., &amp;quot;/home/dbarrera/file1&amp;quot;) to UIDs. The name server can be&lt;br /&gt;
distributed by replicating it at multiple locations, allowing clients to query&lt;br /&gt;
the nearest server instead of a central one. &lt;br /&gt;
&lt;br /&gt;
The Andrew file system [4] uses unique file identifiers to &lt;br /&gt;
populate a &#039;&#039;location database&#039;&#039;  on the central server which maps file&lt;br /&gt;
identifiers to locations. The server is therefore responsible for forwarding&lt;br /&gt;
file access requests to the correct client hosting that file.&lt;br /&gt;
&lt;br /&gt;
==Distributed Index Search==&lt;br /&gt;
Systems like Freenet [2] are designed to make it difficult for&lt;br /&gt;
unauthorized users to access restricted files. This is a difficult problem,&lt;br /&gt;
since the system aims to be highly distributed, but at the same time provide&lt;br /&gt;
guarantees that files won&#039;t be read or modified by unauthorized third-parties.&lt;br /&gt;
However, Freenet has developed an interesting approach to locating files: when&lt;br /&gt;
a file is requested from the network, a user must first obtain or calculate the&lt;br /&gt;
file key. The user&#039;s node requests that file&lt;br /&gt;
from neighboring nodes, who in turn check if the file is stored locally, and if&lt;br /&gt;
not forward the request to the next nearest neighbor. If a node cannot forward&lt;br /&gt;
a request any longer (because a loop would be created or all nodes have&lt;br /&gt;
already been queried), then a failure message is transmitted back to the&lt;br /&gt;
previous node. If a file is found at some point along the request path,&lt;br /&gt;
then the file is sent back through all the intermediate nodes until it reaches&lt;br /&gt;
the request originator, which allows these intermediate nodes to keep a copy of&lt;br /&gt;
the file as a cache. The next time that file key is requested, a node which is&lt;br /&gt;
closer might have it, which will increase the retrieval speed. Nodes&lt;br /&gt;
&amp;quot;forget&amp;quot; about cached copies of files in a least recently used (LRU) manner,&lt;br /&gt;
allowing the network to automatically  balance load and use available space&lt;br /&gt;
optimally. &lt;br /&gt;
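The request-forwarding and path-caching behaviour described above can be sketched as a recursive search over neighbours. This toy model (class and attribute names are invented) omits Freenet's key-closeness routing, hop limits, and LRU eviction of cached copies:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}       # local (and cached) files, keyed by file key
        self.neighbors = []

    def request(self, key, visited=None):
        # Check the local store; otherwise forward to unvisited neighbors
        # in turn. Files found downstream are cached on the way back.
        visited = visited or set()
        visited.add(self.name)
        if key in self.store:
            return self.store[key]
        for nbr in self.neighbors:
            if nbr.name not in visited:
                found = nbr.request(key, visited)
                if found is not None:
                    self.store[key] = found   # intermediate node keeps a copy
                    return found
        return None   # failure propagates back to the previous node
```

For instance, in a chain a-b-c where only c stores the file, a request from a succeeds and b retains a cached copy, so a later request stops one hop earlier.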
&lt;br /&gt;
Distributing a file index was proposed by Plaxton et al. [8] as well.&lt;br /&gt;
Their proposal, however, has all nodes in the network maintain a&lt;br /&gt;
&#039;&#039;virtual tree&#039;&#039;. The tree information is distributed such that each node&lt;br /&gt;
knows about copies of files residing on itself and all nodes that form the&lt;br /&gt;
subtree rooted at that node. All nodes are constantly being updated with&lt;br /&gt;
neighbor information, meaning that new nodes slowly obtain tree information to&lt;br /&gt;
become the roots of their subtrees. This method has the advantage of&lt;br /&gt;
distributing load and providing a hierarchical search functionality that can&lt;br /&gt;
use well known algorithms (BFS, DFS) to find resources on a network.&lt;br /&gt;
&lt;br /&gt;
==Pseudo-random Data Distribution==&lt;br /&gt;
Ceph [11] distributes data through a method that maximizes bandwidth and&lt;br /&gt;
efficiently uses storage resources. Ceph also avoids data imbalance (e.g.,&lt;br /&gt;
new devices are under-used) and load-asymmetries (e.g., often requested data&lt;br /&gt;
placed on only new devices) with a globally known algorithm called CRUSH&lt;br /&gt;
(Controlled Replication Under Scalable Hashing). By using a predefined number&lt;br /&gt;
of &#039;&#039;placement groups&#039;&#039; (the smallest unit of object storage groups), the&lt;br /&gt;
CRUSH algorithm stores and replicates data across the network in a&lt;br /&gt;
pseudo-random way. This algorithm tells the metadata servers both where the&lt;br /&gt;
data should be stored and where it can be found later, which helps clients and&lt;br /&gt;
metadata servers in locating resources. &lt;br /&gt;
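A toy stand-in for this kind of deterministic placement: hash the object name onto a fixed number of placement groups, then map the group onto an ordered set of devices. CRUSH itself additionally handles device weights and failure domains; the device names and counts below are made up:

```python
import hashlib

NUM_PLACEMENT_GROUPS = 128
DEVICES = ["osd-%d" % i for i in range(10)]   # illustrative device names

def place(object_name, replicas=3):
    # Deterministic pseudo-random placement: any client or metadata
    # server can recompute where the data lives without consulting a
    # central table.
    digest = hashlib.sha256(object_name.encode()).hexdigest()
    pg = int(digest, 16) % NUM_PLACEMENT_GROUPS      # placement group
    start = pg % len(DEVICES)                         # map group to devices
    return [DEVICES[(start + i) % len(DEVICES)] for i in range(replicas)]

# Every party computes the same placement for the same object:
assert place("myfile") == place("myfile")
```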
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
This paper has presented a brief survey of distributed file system research&lt;br /&gt;
conducted over the past 20 years. A wide range of distributed file systems have&lt;br /&gt;
been designed to have varying levels of scalability, usability and efficiency.&lt;br /&gt;
Depending on the requirements of a distributed file system, different approaches&lt;br /&gt;
may be taken to address two main concerns: file naming and file retrieval.&lt;br /&gt;
Unfortunately there is no clear winner in either of these categories, which&lt;br /&gt;
means that selecting the &amp;quot;right&amp;quot; method for a given file system will always&lt;br /&gt;
depend on the requirements and users of that system.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
[1] D. R. Cheriton and T. P. Mann. Decentralizing a global naming service for improved performance and fault tolerance. ACM Transactions on Computer Systems, 7:147–183, 1989.&lt;br /&gt;
&lt;br /&gt;
[2] I. Clarke, O. Sandberg, B. Wiley, and T. Hong. Freenet: A distributed anonymous information storage and retrieval system. In Designing Privacy Enhancing Technologies, pages 46–66. Springer, 2001.&lt;br /&gt;
&lt;br /&gt;
[3] S. Ghemawat, H. Gobioff, and S. Leung. The Google file system. ACM SIGOPS Operating Systems Review, 37(5):29–43, 2003.&lt;br /&gt;
&lt;br /&gt;
[4] J. Howard. An overview of the Andrew file system. Carnegie Mellon University Information Technology Center, 1988.&lt;br /&gt;
&lt;br /&gt;
[5] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, et al. OceanStore: An architecture for global-scale persistent storage. ACM SIGARCH Computer Architecture News, 28(5):190–201, 2000.&lt;br /&gt;
&lt;br /&gt;
[6] P. Levine. The Apollo DOMAIN distributed file system. In Y. Paker, J.-P. Banatre, and M. Bozyiğit, editors, Theory and Practice of Distributed Operating Systems, NATO ASI Series, pages 241–260.&lt;br /&gt;
&lt;br /&gt;
[7] D. Mazieres, M. Kaminsky, M. Kaashoek, and E. Witchel. Separating key management from file system security. ACM SIGOPS Operating Systems Review, 33(5):124–139, 1999.&lt;br /&gt;
&lt;br /&gt;
[8] C. G. Plaxton, R. Rajaraman, and A. W. Richa. Accessing nearby copies of replicated objects in a distributed environment. In Proceedings of the 9th ACM Symposium on Parallel Algorithms and Architectures (SPAA), pages 311–320, 1997.&lt;br /&gt;
&lt;br /&gt;
[9] M. Satyanarayanan. A survey of distributed file systems. Annual Review of Computer Science, 4(1):73–104, 1990.&lt;br /&gt;
&lt;br /&gt;
[10] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki, E. Siegel, and D. Steere. Coda: a highly available file system for a distributed workstation environment. Computers, IEEE Transactions on, 39(4):447–459, Apr. 1990.&lt;br /&gt;
&lt;br /&gt;
[11] S. Weil, S. Brandt, E. Miller, D. Long, and C. Maltzahn. Ceph: A scalable, high-performance distributed file system. In Proceedings of the 7th symposium on Operating systems design and implementation, pages 307–320. USENIX Association, 2006.&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Distributed_OS:_Winter_2011&amp;diff=8445</id>
		<title>Distributed OS: Winter 2011</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Distributed_OS:_Winter_2011&amp;diff=8445"/>
		<updated>2011-03-12T02:24:26Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Literature review paper (graduate students) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evaluation==&lt;br /&gt;
&lt;br /&gt;
Grades in this class will be determined based on the following criteria.&lt;br /&gt;
&lt;br /&gt;
Undergraduate Students:&lt;br /&gt;
* 20% Class participation&lt;br /&gt;
* 20% Wiki participation&lt;br /&gt;
* 10% Group project oral presentation (April 5th in class)&lt;br /&gt;
* 30% Group project written report (Due April 11th)&lt;br /&gt;
* 20% Implementation report (Due March 1st)&lt;br /&gt;
&lt;br /&gt;
Graduate Students:&lt;br /&gt;
* 15% Class participation&lt;br /&gt;
* 20% Wiki participation&lt;br /&gt;
* 10% Group project oral presentation (April 5th in class)&lt;br /&gt;
* 30% Group project written report (Due April 11th)&lt;br /&gt;
* 25% Literature review paper (Due March 1st)&lt;br /&gt;
&lt;br /&gt;
Proposals for Implementation reports &amp;amp; Literature reviews should be emailed to Prof. Somayaji by &#039;&#039;&#039;February 1st&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
==Using the Wiki==&lt;br /&gt;
&lt;br /&gt;
All of the standard Mediawiki functions are available on this wiki in addition to the following extensions:&lt;br /&gt;
&lt;br /&gt;
* [http://www.mediawiki.org/wiki/Extension:Cite/Cite.php Cite]: for easier references/endnotes&lt;br /&gt;
* [http://www.mediawiki.org/wiki/Extension:GraphViz GraphViz]: for inline graph drawing&lt;br /&gt;
* [http://www.mediawiki.org/wiki/Extension:SyntaxHighlight_GeSHi SyntaxHighlight]: for source code syntax highlighting (be sure to use the &amp;quot;source&amp;quot; tag)&lt;br /&gt;
&lt;br /&gt;
==Implementation reports (undergrads)==&lt;br /&gt;
&lt;br /&gt;
An implementation report is a 5-10 page paper that either&lt;br /&gt;
# describes in detail one existing software system with distributed OS-like properties,&lt;br /&gt;
# compares and contrasts an important characteristic of 3 or more software systems with distributed OS-like properties, or&lt;br /&gt;
# reports on experiences setting up and using a software system with distributed OS-like properties.&lt;br /&gt;
Topics for an implementation report must be approved by Prof. Somayaji.&lt;br /&gt;
&lt;br /&gt;
Implementation reports for Winter 2011:&lt;br /&gt;
* [[DistOS-2011W NTP |NTP]]&lt;br /&gt;
* [[DistOS-2011W Globus |Globus Toolkit]]&lt;br /&gt;
* [[DistOS-2011W Implementation Template|Implementation Template]]&lt;br /&gt;
* [[DistOS-2011W BigTable|BigTable]]&lt;br /&gt;
* [[DistOS-2011W Cassandra and Hamachi|Cassandra and Hamachi]]&lt;br /&gt;
* [[DistOS-2011W Wuala |Wuala]]&lt;br /&gt;
* [[DistOS-2011W FWR |FWR]]&lt;br /&gt;
* [[DistOS-2011W Plan 9| Plan 9]]&lt;br /&gt;
* [[DistOS-2011W Akamai and CDN| Akamai and CDN]]&lt;br /&gt;
* [[DistOS-2011W Diaspora| Diaspora]]&lt;br /&gt;
* [[DistOS-2011W Eucalyptus |Eucalyptus]]&lt;br /&gt;
* [[DistOS-2011W Jolicloud |Jolicloud]]&lt;br /&gt;
&lt;br /&gt;
Students: please add your report above following the template.&lt;br /&gt;
&lt;br /&gt;
==Literature review paper (graduate students)==&lt;br /&gt;
&lt;br /&gt;
The literature review paper should be a 8-12 page paper that reviews research and well-known commercial work in an area of distributed operating systems research or a closely related area.&lt;br /&gt;
&lt;br /&gt;
Literature Review papers for Winter 2011:&lt;br /&gt;
* [[DistOS-2011W Naming and Locating Objects in Distributed Systems|Naming and Locating Objects in Distributed Systems]]&lt;br /&gt;
* [[DistOS-2011W Distributed File Sharing|Distributed File Sharing]]&lt;br /&gt;
* [[DistOS-2011W User Controlled Bandwidth: How Social Protocols Affect Network Protocols and Our Need for Speed|User Controlled Bandwidth]]&lt;br /&gt;
* [[DistOS-2011W General Purpose Frameworks for Performance-Portable Code|General Purpose Frameworks for Performance-Portable Code]]&lt;br /&gt;
* [[DistOS-2011W Distributed Data Structures: a survey|Distributed Data Structures: a survey]]&lt;br /&gt;
* [[DistOS-2011W Distributed File System Security|Distributed File System Security]]&lt;br /&gt;
* [[DistOS-2011W Real-Time Distributed Operating Systems|Real-Time Distributed Operating Systems]]&lt;br /&gt;
* [[DistOS-2011W Failure Detection in Distributed Systems|Failure Detection in Distributed Systems]]&lt;br /&gt;
Students: please add your paper above.&lt;br /&gt;
&lt;br /&gt;
==Group Projects==&lt;br /&gt;
# [[DistOS-2011W Observability &amp;amp; Contracts|Observability &amp;amp; Contracts]]: How do I observe the acts of other agents, particularly &amp;quot;public&amp;quot; acts? How can we make contracts between computers (promises to exchange actions in the present for actions in the future)?&lt;br /&gt;
# [[DistOS-2011W Attribution|Attribution]]: How do we know who did what?&lt;br /&gt;
# [[DistOS-2011W Reputation|Reputation]]: How do we remember and disseminate knowledge of past actions?&lt;br /&gt;
# [[DistOS-2011W Justice|Justice]]: Given that we can gather evidence of misbehavior, how can that evidence be assembled, judged, and the resulting decision enforced?&lt;br /&gt;
# [[DistOS-2011W Public Goods|Public Goods]]: How can we build and maintain public goods (e.g., indices, caches)?&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&lt;br /&gt;
===January 13, 2011===&lt;br /&gt;
[http://keys.ccrcentral.net/ccr/writing/ CCR]  (two papers)&lt;br /&gt;
&lt;br /&gt;
===January 18, 2011===&lt;br /&gt;
[http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/oceanstore-sigplan.pdf OceanStore]  and [http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/fast2003-pond.pdf Pond]&lt;br /&gt;
&lt;br /&gt;
===February 3, 2011===&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;[http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=1450841 Robert E. Kahn, &amp;quot;Resource-Sharing Computer Communications Networks&amp;quot; (1972)]:&#039;&#039;&#039;&lt;br /&gt;
* [http://video.google.com/videoplay?docid=4989933629762859961 Computer Networks - The Heralds of Resource Sharing] (video - optional).&lt;br /&gt;
&lt;br /&gt;
===February 8, 2011===&lt;br /&gt;
&lt;br /&gt;
* Karlin et al. (2008), [http://dx.doi.org.proxy.library.carleton.ca/10.1016/j.comnet.2008.06.012 Autonomous security for autonomous systems].&lt;br /&gt;
&lt;br /&gt;
Optional readings:&lt;br /&gt;
&lt;br /&gt;
* O&#039;Donnell (2009), [http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=5350725 Prolog to A Survey of BGP Security Issues and Solutions]&lt;br /&gt;
* Butler et al. (2009), [http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=5357585 A Survey of BGP Security Issues and Solutions]&lt;br /&gt;
&lt;br /&gt;
===February 10, 2011===&lt;br /&gt;
&lt;br /&gt;
* Savage et al. (2000), [http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-8-4.pdf Practical Network Support For IP Traceback].&lt;br /&gt;
&lt;br /&gt;
===February 15, 2011===&lt;br /&gt;
&lt;br /&gt;
* Satyanarayanan et al. (1990), [http://dx.doi.org.proxy.library.carleton.ca/10.1109/12.54838 Coda: a highly available file system for a distributed workstation environment].&lt;br /&gt;
* Ghemawat et al. (2003), [http://labs.google.com/papers/gfs.html The Google File System].&lt;br /&gt;
&lt;br /&gt;
===February 17, 2011===&lt;br /&gt;
&lt;br /&gt;
* Weil et al. (2006), [http://www.usenix.org/events/osdi06/tech/weil.html Ceph: A Scalable, High-Performance Distributed File System].&lt;br /&gt;
&lt;br /&gt;
===March 1, 2011===&lt;br /&gt;
* Oda et al. (2008), [http://people.scs.carleton.ca/~soma/pubs/oda-ccs-08.pdf SOMA: Mutual Approval for Included Content in Web Pages].&lt;br /&gt;
* Oda &amp;amp; Somayaji (2008), [http://people.scs.carleton.ca/~soma/pubs/oda-asia-08.pdf Content Provider Conflict on the Modern Web].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Problems to Solve===&lt;br /&gt;
*Attack computers with almost no consequences&lt;br /&gt;
**DDoS&lt;br /&gt;
**botnets&lt;br /&gt;
**capture and analyze private traffic&lt;br /&gt;
**distribute malware&lt;br /&gt;
**tampering with traffic&lt;br /&gt;
**Unauthorized access to data and resources&lt;br /&gt;
**Impersonate computers, individuals, applications&lt;br /&gt;
**Fraud, theft&lt;br /&gt;
**regulate behavior&lt;br /&gt;
&lt;br /&gt;
===Design Principles===&lt;br /&gt;
*subjects of governance: programs and computers&lt;br /&gt;
*bind programs and computers to humans &amp;amp; human organizations, but recognize binding is imperfect&lt;br /&gt;
*recognize that &amp;quot;bad&amp;quot; behavior is always possible.  &amp;quot;good&amp;quot; behavior is enforced through incentives and sanctions.&lt;br /&gt;
*rules will change.  Even rules for rule changes will change. Need a &amp;quot;living document&amp;quot; governing how rules are chosen and enforced.&lt;br /&gt;
&lt;br /&gt;
==Scenarios==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===1: Stopping DDoS===&lt;br /&gt;
Group members: Seyyed, Andrew Schoenrock, Thomas McMahon, Lester Mundt, AbdelRahman, Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*Have the machine routing packets (e.g., the ISP) detect suspicious packets; if the packets are signed, those suspicious packets could be blocked and&lt;br /&gt;
the sender could be put on a blacklist.&lt;br /&gt;
&lt;br /&gt;
* (AS) Stopping DDoS against files, services, programs, etc&lt;br /&gt;
** (AS) Have file replication built into the system (similar to OceanStore) so that files are always available from different servers&lt;br /&gt;
** (AS) If files are not replicated then we could have a tiered messaging system (at the top level would be OS messages) and servers could then prioritize the incoming traffic. If a given server is experiencing an overload, it could send out a distress signal to its neighbours and then distribute what it is has to them. The system should have a built-in mechanism to re-balance the overall load after something like this happens. This would then mean that any DDoS attack would result in the service being more available.&lt;br /&gt;
*** I like this idea of having service fallover&lt;br /&gt;
*** Expanding on the idea of file replication and sending distress signals to its neighbours, I could envision a group of servers that would learn to help each other out, lending processing and storage when they are under-utilized.  They would sort of form a collective, club or gang.  Members who didn&#039;t contribute (always fully utilized) would eventually be identified and banned.  It would be these other computers that the targeted server would rely on for help in this situation. However cool this is, it isn&#039;t really a solution, because one could suppose the attackers might utilize the same strategy to recruit additional help in their attack. &lt;br /&gt;
&lt;br /&gt;
* (AS) Stopping DDoS against specific machines&lt;br /&gt;
** (AS) I don&#039;t think that this should be specifically addressed. I think measures introduced to guard against this will ultimately negatively impact the overall system in terms of performance.&lt;br /&gt;
*** I don&#039;t like the idea of sacrificing the one for the many though.&lt;br /&gt;
**** (AS) The main thing with what I&#039;ve proposed is that the motivation behind doing a DDoS attack is completely gone (by doing one a service would either maintain or increase its overall availability). I think by eliminating the main result of a DDoS attack would mean that there would be no reason to guard against DDoS attacks on a specific machine.&lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
** Many of the DDoS attacks utilize the property of anonymity.  These services serve anyone who requests their service.  Many DDoS attacks then generate sufficient traffic that the computer behind the service can no longer cope.  If we remove anonymity and only serve &#039;known&#039; parties, the spurious requests would be ignored.   So we need to &#039;know&#039; who our friends are.&lt;br /&gt;
*** This of course requires a form of unspoofable authentication unlike IP. &lt;br /&gt;
**** (RD) Serving only &#039;known&#039; parties reduces the distribution of information, or at least its rate. I was thinking of removing anonymity on a lower level, so that any party that&#039;s not anonymous while sending a packet to your machine is considered &#039;known&#039;, and anything unknown (unsigned, unrepresented in some way) is blocked. So, we don&#039;t really need to &#039;know&#039; who our friends are, we just need to know who aren&#039;t. &lt;br /&gt;
**** (RD) Another thing I had in mind is punishment in case a &#039;known&#039; party participates in DDoS-attack: not punishing the owner of that machine (who probably is a victim as well), but the software or hardware in some sense. &lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
** (RD) How about developing such a network topology and protocols that make DDoS attacks less efficient or harder to perform? Some sort of CAPTCHA, but for machines and protocols, to distinguish them from bots, maybe? &lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
** I&#039;m not sure what it means by stopping; I don&#039;t think we can stop DDoS given the way things are currently run, we can only block it. From my knowledge, most software that stops DDoS does so by blocking, or even complete shutdown like McColo.&lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
**One method is to use the same approach as for eliminating DoS: rejecting subsequent requests above a specific rate, but here from unrelated sources.&lt;br /&gt;
&lt;br /&gt;
*How we could stop DDoS would be to have each connection to the internet assigned to a particular identity. This identity would be used to verify who is attempting connections. The reason DDoS works is because currently, IP addresses can be spoofed. The only way to verify an identity is to request a response, but by then the damage is done. With a verified identity, connection attempts being routed can be verified during transmission, so that the request may not necessarily even reach the destination host.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, we need some encryption system using keys so that as the packets are being routed, the identity of the packet&#039;s sender can be verified. Ideally the decryption would be trivial so as to prevent noticeable latency. Because an identity is verified, if there is spoofing of packets, they would be dropped during the routing. If all the identities are verified and are still attempting a DDoS attack, the attacker&#039;s identity will be traced back to the attacker.&lt;br /&gt;
&lt;br /&gt;
(RD) (I think we&#039;re not looking low enough. We&#039;re trying to find a solution for this problem assuming the system that made the problem possible is still unchanged. We enforce more security by identification, encryption, etc., but the system is still problem-prone. This will allow us to identify an attacker, but only after the attack has started (or even finished). It&#039;s like trying to eliminate theft from a society of poor, unemployed, uneducated people by enforcing more security and punishment, which will help to reduce the rate and motivation, but can&#039;t stop the possible attack. It is a pretty stupid analogy, but rather than policing that society, I want to make them rich, employed and educated, so that theft is just not an efficient way of getting goods for them. So, rather than protecting machines from attacks, I want to make a system where DDoS attacks are just inappropriate.)&lt;br /&gt;
&lt;br /&gt;
===2: Stopping phishing===&lt;br /&gt;
Group members: Waheed Ahmed, Nicolas Lessard, Raghad Al-Awwad, Tarjit Komal&lt;br /&gt;
&lt;br /&gt;
* A way of automatically checking the signature of a message to make sure it really is from a trusted source.&lt;br /&gt;
** ie: &amp;quot;Nation of Banks, did your member TD send me a message to reset my password?&amp;quot; &lt;br /&gt;
&lt;br /&gt;
*There should be filters to verify where the message is coming from. If the message is coming from an unknown source, it should be blocked. &lt;br /&gt;
*Don&#039;t use the links in an email to get to any web page, if you suspect the message might not be authentic.&lt;br /&gt;
*Avoid filling out forms in email messages that ask for personal financial information. Phishers can make exact copies of the forms you would find on a financial institution&#039;s site.&lt;br /&gt;
*Make it so a machine needs to be authorized to use your information -- a machine that you don&#039;t own can&#039;t use your information to do anything, regardless of whether it has the information or not.&lt;br /&gt;
*Ensure that any website that requires the filling of personal information be a secure website which can be traced to the original organisation.&lt;br /&gt;
*Ensure that whatever browser you are using is up to date with the most recent security patches applied.&lt;br /&gt;
*Obviously, report any suspected phishing to the appropriate authorities so that proper action can be taken&lt;br /&gt;
*&amp;quot;three strikes and you&#039;re out&amp;quot;&lt;br /&gt;
**Each machine is responsible for the messages it releases. When a machine is a repeat offender, it loses access privileges&lt;br /&gt;
*Revamp the security login process to something similar to:&lt;br /&gt;
**User enters username and clicks next.&lt;br /&gt;
**Server returns a user predefined image to the User.&lt;br /&gt;
**If image is the right image then user enters password to logon.&lt;br /&gt;
&lt;br /&gt;
===3: Limiting the spread of malware===&lt;br /&gt;
Group members: keith, Andrew Luczak, David Barrera, Trevor Gelowsky, Scott Lyons&lt;br /&gt;
*(KM) Heterogeneous systems - it is much easier to write code to attack a single type of system&lt;br /&gt;
*(KM) Individualized security policies&lt;br /&gt;
**(AL) A baseline security level would help prevent malware spreading to/from a system with &amp;quot;individual non-security&amp;quot;&lt;br /&gt;
*(KM) Identify all programs through digital signatures&lt;br /&gt;
*(KM) Peer rating system for programs, customize security policies based on peer ratings&lt;br /&gt;
**(SL) Need some way to keep rating system from being &amp;quot;gamed&amp;quot;&lt;br /&gt;
***(AL) Maybe a program gets flagged if it experiences a rapid approval increase?&lt;br /&gt;
**(AL) Need to protect against benign programs with good ratings being updated into malware&lt;br /&gt;
*(KM) System level forensics on program execution and resource/file modification&lt;br /&gt;
*(KM) Customizable user and program blacklists&lt;br /&gt;
*(SL) Sandboxing with breach management - know what files have been modified by a process&lt;br /&gt;
*(SL) Trending - what does the application spend most of its time doing?&lt;br /&gt;
&lt;br /&gt;
*(DB)Multiple control/chokepoints where malware is looked for. This way, it&#039;s more difficult for attackers to take over several control points and for malware to remain unnoticed. &lt;br /&gt;
*(DB)Heterogeneous systems help limit the spread of malware too. There are 2 points here. (1) If we&#039;re designing this system where we&#039;re all masters of our own domains, then we&#039;re likely to have different system configurations. However (2), if we want to communicate and interact with other domains, we need some standardized communication layer or mechanism. Standardization is very closely tied to homogeneity.&lt;br /&gt;
*(DB)There should be consequences if you harbor malware or if malware originates from within your domain. This could be and incentive to help people be more proactive in terms of security.&lt;br /&gt;
&lt;br /&gt;
===4: Bandwidth hogs===&lt;br /&gt;
Group members: Mike Preston, Fahim Rahman, Michael Du Plessis, Matthew Chou, Ahmad Yafawi&lt;br /&gt;
&lt;br /&gt;
*limit bandwidth for each user&lt;br /&gt;
*if a user has significant bandwidth demands for a certain period of time&lt;br /&gt;
**add them to a watch list&lt;br /&gt;
**monitor their behaviour&lt;br /&gt;
**divert communication to other hosts that can satisfy requests.&lt;br /&gt;
***if there are no other hosts that can satisfy the request, then distribute data to other idle and capable hosts. Load is now reduced on the one link.&lt;br /&gt;
*QoS&lt;br /&gt;
*Tiered Bandwidth Distribution&lt;br /&gt;
**The main idea is that your machine gets more bandwidth in proportion to how much you give back to the community.&lt;br /&gt;
***It&#039;s similar to some trackers and darknet programs, which won&#039;t increase your download speed unless you contribute a certain number of bytes back to your peers.&lt;br /&gt;
**Tier 1: basic privileges, i.e. all machines have minimal bandwidth.&lt;br /&gt;
**Tier n: we define some requirements to be met, then increase bandwidth accordingly.&lt;br /&gt;
***Drop a tier if a machine doesn&#039;t maintain the specified requirements of that tier.&lt;br /&gt;
***Advantage: monitoring bandwidth on the network is cheap, although implementing the scheme above is not.&lt;br /&gt;
*As a metaphor from our &amp;quot;real world society&amp;quot;, bandwidth can be controlled the way we control speed for cars.&lt;br /&gt;
**Certain areas need more free-flowing traffic, so speed limits are increased. Others require a slower pace, which is enforced. These &amp;quot;areas&amp;quot; can be translated to users or programs in our distributed OS model.&lt;br /&gt;
**There are repercussions for breaking any of these imposed limits.&lt;br /&gt;
**Throttling provides one possible implementation of these constraints.&lt;br /&gt;
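The per-user throttling discussed above is commonly realized as a token-bucket rate limiter; a tiered scheme would simply assign a larger rate and burst size to higher tiers. A minimal Python sketch (the class name and parameters are invented for illustration):

```python
import time

class TokenBucket:
    """Token-bucket throttle: sustained 'rate' bytes/sec, bursts up to 'capacity'."""
    def __init__(self, rate, capacity):
        self.rate = rate          # refill rate in bytes per second (tier-dependent)
        self.capacity = capacity  # maximum burst size in bytes
        self.tokens = capacity    # start with a full bucket
        self.last = time.monotonic()

    def allow(self, nbytes):
        """Return True if nbytes may be sent now, consuming tokens."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

bucket = TokenBucket(rate=1_000_000, capacity=250_000)  # ~1 MB/s, 250 KB burst
print(bucket.allow(200_000))  # True: within the burst allowance
print(bucket.allow(200_000))  # False: exceeds remaining tokens right away
```

Under low congestion, the network could temporarily raise `rate` above a user's tier, matching the "go above their tier" suggestion in the sources below.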
&lt;br /&gt;
====Bandwidth Hog Additional Sources and Information====&lt;br /&gt;
1. [http://repository.lib.ncsu.edu/ir/bitstream/1840.16/1197/1/etd.pdf A Solution to Bandwidth Hogs in a Cable Network]&lt;br /&gt;
*Starting at page 120 of this thesis is a proposed solution to bandwidth hogs on a cable network. In general, the proposal suggests a solution essentially equivalent to throttling; however, I found the description of the solution to be helpful. I feel it may go well with our tiered suggestion if we were to keep the &amp;quot;earned trust&amp;quot; approach to bandwidth access but at the same time allow users in low-congestion times to go above their tier. For example, if congestion is low, why not allow the people on the network to occupy much larger bandwidths? The network could include some form of monitoring protocol which decides how much access a user is allowed. If more bandwidth is available, let them have it if it is needed for their request. On the other hand, if congestion is high, the user will be capped at the upper limit of their bandwidth capacity if they are doing something that requires a large amount of bandwidth. In this manner each user is guaranteed the amount they have earned at their tier; however, if they do not want to earn a higher level for high-usage timeframes, they can instead opt to make use of low-congestion timeframes and run their bandwidth-heavy applications at that time. The network could also include live data on current bandwidth usage levels, as well as trending data, so that people can plan when to start bandwidth-heavy applications.&lt;br /&gt;
&lt;br /&gt;
2. [http://yuba.stanford.edu/rcp/flowCompTime-dukkipati.pdf Why Flow-Completion Time is the Right Metric for Congestion Control]&lt;br /&gt;
*This is a short article which raises an interesting question related to our topic: how should we determine what counts as &amp;quot;bandwidth hogging&amp;quot;? For example, do we look at the strain on the network in some capacity (i.e. dropped packets, usage level of the capacity of the pipe, etc.), which is important information for those who build the network; or do we use the time it takes for some transaction to occur when a user requests it? This article argues that from a user&#039;s point of view, they do not care how much bandwidth they get as long as the task they are requesting is completed as quickly as possible. In our discussion in class we talked about how the majority of people currently do not have large bandwidth needs for normal transactions (email, web searching, wikis ;-) ), and a much smaller percentage of the population are the ones who actually eat up the larger bandwidth through hog-like applications. Maybe instead of focusing on bandwidth as the main issue, we should think about how long it takes to complete tasks. Maybe our tiered system could also incorporate some aspect of this train of thought, i.e. people who only send email and surf the web are at tier 1, people who use online storage and FTP are at tier 2, people who stream movies and other data are at tier 3, etc. Then we could have each tier cost a separate amount and apply some form of control on the technologies available at each tier so that the restrictions of a tier are adhered to.&lt;br /&gt;
&lt;br /&gt;
3. [http://research.microsoft.com/en-us/people/asellen/pap0209-chetty.pdf Who’s Hogging The Bandwidth?: The Consequences Of Revealing The Invisible In The Home]&lt;br /&gt;
*This article is from Microsoft Research and is an interesting look into controlling bandwidth usage by providing people with a tool to monitor usage and alter how bandwidth is allocated. This tool essentially boils down to the social-control idea that we discussed in class. If you know that your neighbours are hogging the bandwidth for very low-priority issues, then should you not be able to appeal to their conscience in order to gain usage of the resources you need? The article provides some examples of homes given this control and how the household politics factored into the usage of the bandwidth. When usage was no longer hidden, it seems as though it became easier to openly discuss how to divide the finite amount of bandwidth. Initial concerns revolved around people just hogging the bandwidth for themselves, or playing practical jokes on others in the house by reducing their usage when they were in the middle of some task. Another issue that this type of control brings up is how to prioritize which tasks are &amp;quot;more important&amp;quot;. One example given was whether a Skype call to family and friends is more important than watching YouTube videos for a work-related task. Interestingly, the field studies provided some other examples of a &amp;quot;bandwidth etiquette&amp;quot; that emerged. For example, it was considered very rude to limit someone&#039;s bandwidth when he/she was on a Skype call, due to the immediate and negative effect, but it was deemed acceptable to limit bandwidth during a file transfer, as it just meant a few extra minutes for the transfer to complete.&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8236</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8236"/>
		<updated>2011-03-08T16:41:20Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Surveyed Papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* Abdelrahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
=Meeting Briefings=&lt;br /&gt;
==Tuesday, March 1st==&lt;br /&gt;
After 20 minutes of brainstorming, we agreed on:&lt;br /&gt;
* The current internet infrastructure lacks the ability to achieve a highly scalable and efficient attribution mechanism.&lt;br /&gt;
* Attribution must be implemented in a distributed manner and must be automated and not owned.&lt;br /&gt;
* Threats that should be addressed include (but are not limited to):&lt;br /&gt;
** Impersonation of computers, individuals, and applications&lt;br /&gt;
** All types of electronic spoofing.&lt;br /&gt;
* The skeleton of our project will comprise four main aspects:&lt;br /&gt;
** Tracing/Tracking: the baseline for attribution.&lt;br /&gt;
** Human identification: a MUST to include!&lt;br /&gt;
** Machine identification: to be merged with human identification.&lt;br /&gt;
** Storage: how and where to store data traces and the identification stamps.&lt;br /&gt;
==Thursday, March 3rd==&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Surveyed Papers=&lt;br /&gt;
&lt;br /&gt;
[1] Vladimir Brik, Suman Banerjee, Marco Gruteser, &amp;lt;i&amp;gt;Wireless device identification with radiometric signatures&amp;lt;/i&amp;gt;, University of Wisconsin at Madison, Madison, WI, USA, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.&lt;br /&gt;
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.&lt;br /&gt;
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, &amp;lt;i&amp;gt;Accurate, scalable in-network identification of p2p traffic using application signatures&amp;lt;/i&amp;gt;, AT&amp;amp;T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Milestones=&lt;br /&gt;
(Under Construction)&lt;br /&gt;
* Problem definition&lt;br /&gt;
* Literature review&lt;br /&gt;
* ??&lt;br /&gt;
&lt;br /&gt;
=Project Progress=&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
==Requirements==&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8235</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8235"/>
		<updated>2011-03-08T16:38:53Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Surveyed Papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* Abdelrahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
=Meeting Briefings=&lt;br /&gt;
==Tuesday, March 1st==&lt;br /&gt;
After 20 minutes of brainstorming, we agreed on:&lt;br /&gt;
* The current internet infrastructure lacks the ability to achieve a highly scalable and efficient attribution mechanism.&lt;br /&gt;
* Attribution must be implemented in a distributed manner and must be automated and not owned.&lt;br /&gt;
* Threats that should be addressed include (but are not limited to):&lt;br /&gt;
** Impersonation of computers, individuals, and applications&lt;br /&gt;
** All types of electronic spoofing.&lt;br /&gt;
* The skeleton of our project will comprise four main aspects:&lt;br /&gt;
** Tracing/Tracking: the baseline for attribution.&lt;br /&gt;
** Human identification: a MUST to include!&lt;br /&gt;
** Machine identification: to be merged with human identification.&lt;br /&gt;
** Storage: how and where to store data traces and the identification stamps.&lt;br /&gt;
==Thursday, March 3rd==&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Surveyed Papers=&lt;br /&gt;
&lt;br /&gt;
[1] Vladimir Brik, &amp;lt;i&amp;gt;Wireless device identification with radiometric signatures&amp;lt;/i&amp;gt;, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.&lt;br /&gt;
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.&lt;br /&gt;
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, &amp;lt;i&amp;gt;Accurate, scalable in-network identification of p2p traffic using application signatures&amp;lt;/i&amp;gt;, AT&amp;amp;T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Milestones=&lt;br /&gt;
(Under Construction)&lt;br /&gt;
* Problem definition&lt;br /&gt;
* Literature review&lt;br /&gt;
* ??&lt;br /&gt;
&lt;br /&gt;
=Project Progress=&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
==Requirements==&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8234</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8234"/>
		<updated>2011-03-08T16:29:03Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Surveyed Papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* Abdelrahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
=Meeting Briefings=&lt;br /&gt;
==Tuesday, March 1st==&lt;br /&gt;
After 20 minutes of brainstorming, we agreed on:&lt;br /&gt;
* The current internet infrastructure lacks the ability to achieve a highly scalable and efficient attribution mechanism.&lt;br /&gt;
* Attribution must be implemented in a distributed manner and must be automated and not owned.&lt;br /&gt;
* Threats that should be addressed include (but are not limited to):&lt;br /&gt;
** Impersonation of computers, individuals, and applications&lt;br /&gt;
** All types of electronic spoofing.&lt;br /&gt;
* The skeleton of our project will comprise four main aspects:&lt;br /&gt;
** Tracing/Tracking: the baseline for attribution.&lt;br /&gt;
** Human identification: a MUST to include!&lt;br /&gt;
** Machine identification: to be merged with human identification.&lt;br /&gt;
** Storage: how and where to store data traces and the identification stamps.&lt;br /&gt;
==Thursday, March 3rd==&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Surveyed Papers=&lt;br /&gt;
[1] Rawls, John, &amp;lt;i&amp;gt;A Theory of Justice: Revised Edition&amp;lt;/i&amp;gt;, Harvard University Press, 2003. [http://books.google.ca/books?hl=en&amp;amp;lr=&amp;amp;id=kvpby7HtAe0C&amp;amp;oi=fnd&amp;amp;pg=PR11&amp;amp;dq=concepts+of+justice&amp;amp;ots=tggvx5zc67&amp;amp;sig=s4OHDBhkpDzumtlH0mIUO7cbCys#v=onepage&amp;amp;q=concepts%20of%20justice&amp;amp;f=false PDF]&lt;br /&gt;
&lt;br /&gt;
[2] Vladimir Brik, &amp;lt;i&amp;gt;Wireless device identification with radiometric signatures&amp;lt;/i&amp;gt;, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]&lt;br /&gt;
ABSTRACT&lt;br /&gt;
&lt;br /&gt;
=Milestones=&lt;br /&gt;
(Under Construction)&lt;br /&gt;
* Problem definition&lt;br /&gt;
* Literature review&lt;br /&gt;
* ??&lt;br /&gt;
&lt;br /&gt;
=Project Progress=&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
==Requirements==&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_System_Access&amp;diff=7645</id>
		<title>DistOS-2011W Distributed File System Access</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_System_Access&amp;diff=7645"/>
		<updated>2011-03-01T04:42:13Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Abstract */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_System_Access&amp;diff=7642</id>
		<title>DistOS-2011W Distributed File System Access</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Distributed_File_System_Access&amp;diff=7642"/>
		<updated>2011-03-01T04:39:59Z</updated>

		<summary type="html">&lt;p&gt;Omi: Created page with &amp;quot;=Abstract=&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Abstract=&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=Distributed_OS:_Winter_2011&amp;diff=7641</id>
		<title>Distributed OS: Winter 2011</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=Distributed_OS:_Winter_2011&amp;diff=7641"/>
		<updated>2011-03-01T04:36:29Z</updated>

		<summary type="html">&lt;p&gt;Omi: /* Literature review paper (graduate students) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Evaluation==&lt;br /&gt;
&lt;br /&gt;
Grades in this class will be determined based on the following criteria.&lt;br /&gt;
&lt;br /&gt;
Undergraduate Students:&lt;br /&gt;
* 20% Class participation&lt;br /&gt;
* 20% Wiki participation&lt;br /&gt;
* 10% Group project oral presentation (April 5th in class)&lt;br /&gt;
* 30% Group project written report (Due April 11th)&lt;br /&gt;
* 20% Implementation report (Due March 1st)&lt;br /&gt;
&lt;br /&gt;
Graduate Students:&lt;br /&gt;
* 15% Class participation&lt;br /&gt;
* 20% Wiki participation&lt;br /&gt;
* 10% Group project oral presentation (April 5th in class)&lt;br /&gt;
* 30% Group project written report (Due April 11th)&lt;br /&gt;
* 25% Literature review paper (Due March 1st)&lt;br /&gt;
&lt;br /&gt;
Proposals for Implementation reports &amp;amp; Literature reviews should be emailed to Prof. Somayaji by &#039;&#039;&#039;February 1st&#039;&#039;&#039;.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Implementation report (undergrads)===&lt;br /&gt;
&lt;br /&gt;
An implementation report is a 5-10 page paper that either&lt;br /&gt;
# describes in detail one existing software system with distributed OS-like properties,&lt;br /&gt;
# compare and contrasts an important characteristic of 3 or more software systems with distributed OS-like properties, or&lt;br /&gt;
# reports on experiences setting up and using a software system with distributed OS-like properties.&lt;br /&gt;
Topics for an implementation report must be approved by Prof. Somayaji.&lt;br /&gt;
&lt;br /&gt;
Implementation reports for Winter 2011:&lt;br /&gt;
* [[DistOS-2011W NTP |NTP]]&lt;br /&gt;
* [[DistOS-2011W Globus |Globus Toolkit]]&lt;br /&gt;
* [[DistOS-2011W Implementation Template|Implementation Template]]&lt;br /&gt;
* [[DistOS-2011W BigTable|BigTable]]&lt;br /&gt;
* [[DistOS-2011W Cassandra and Hamachi|Cassandra and Hamachi]]&lt;br /&gt;
* [[DistOS-2011W Wuala |Wuala]]&lt;br /&gt;
* [[DistOS-2011W FWR |FWR]]&lt;br /&gt;
* [[DistOS-2011W Plan 9| Plan 9]]&lt;br /&gt;
* [[DistOS-2011W Akamai and CDN| Akamai and CDN]]&lt;br /&gt;
* [[DistOS-2011W Diaspora| Diaspora]]&lt;br /&gt;
* [[DistOS-2011W Eucalyptus |Eucalyptus]]&lt;br /&gt;
&lt;br /&gt;
Students: please add your report above following the template.&lt;br /&gt;
&lt;br /&gt;
===Literature review paper (graduate students)===&lt;br /&gt;
&lt;br /&gt;
The literature review paper should be a 8-12 page paper that reviews research and well-known commercial work in an area of distributed operating systems research or a closely related area.&lt;br /&gt;
&lt;br /&gt;
Literature Review papers for Winter 2011:&lt;br /&gt;
* [[DistOS-2011W Naming and Locating Objects in Distributed Systems|Naming and Locating Objects in Distributed Systems]]&lt;br /&gt;
* [[DistOS-2011W Distributed File System Access|Distributed File System Access]]&lt;br /&gt;
* [[DistOS-2011W User Controlled Bandwidth: How Social Protocols Affect Network Protocols and Our Need for Speed|User Controlled Bandwidth]]&lt;br /&gt;
* [[DistOS-2011W General Purpose Frameworks for Performance-Portable Code|General Purpose Frameworks for Performance-Portable Code]]&lt;br /&gt;
* [[DistOS-2011W Distributed Data Structures: a survey|Distributed Data Structures: a survey]]&lt;br /&gt;
* [[DistOS-2011W Distributed File System Security|Distributed File System Security]]&lt;br /&gt;
&lt;br /&gt;
Students: please add your paper above.&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&lt;br /&gt;
===January 13, 2011===&lt;br /&gt;
[http://keys.ccrcentral.net/ccr/writing/ CCR]  (two papers)&lt;br /&gt;
&lt;br /&gt;
===January 18, 2011===&lt;br /&gt;
[http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/oceanstore-sigplan.pdf OceanStore]  and [http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/fast2003-pond.pdf Pond]&lt;br /&gt;
&lt;br /&gt;
===February 3, 2011===&lt;br /&gt;
&lt;br /&gt;
*&#039;&#039;&#039;[http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=1450841 Robert E. Kahn, &amp;quot;Resource-Sharing Computer Communications Networks&amp;quot; (1972)]:&#039;&#039;&#039;&lt;br /&gt;
* [http://video.google.com/videoplay?docid=4989933629762859961 Computer Networks - The Heralds of Resource Sharing] (video - optional).&lt;br /&gt;
&lt;br /&gt;
===February 8, 2011===&lt;br /&gt;
&lt;br /&gt;
* Karlin et al. (2008), [http://dx.doi.org.proxy.library.carleton.ca/10.1016/j.comnet.2008.06.012 Autonomous security for autonomous systems].&lt;br /&gt;
&lt;br /&gt;
Optional readings:&lt;br /&gt;
&lt;br /&gt;
* O&#039;Donnell (2009), [http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=5350725 Prolog to A Survey of BGP Security Issues and Solutions]&lt;br /&gt;
* Butler et al. (2009), [http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=5357585 A Survey of BGP Security Issues and Solutions]&lt;br /&gt;
&lt;br /&gt;
===February 10, 2011===&lt;br /&gt;
&lt;br /&gt;
* Savage et al. (2000), [http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-8-4.pdf Practical Network Support For IP Traceback].&lt;br /&gt;
&lt;br /&gt;
===February 15, 2011===&lt;br /&gt;
&lt;br /&gt;
* Satyanarayanan et al. (1990), [http://dx.doi.org.proxy.library.carleton.ca/10.1109/12.54838 Coda: a highly available file system for a distributed workstation environment].&lt;br /&gt;
* Ghemawat et al. (2003), [http://labs.google.com/papers/gfs.html The Google File System].&lt;br /&gt;
&lt;br /&gt;
===February 17, 2011===&lt;br /&gt;
&lt;br /&gt;
* Weil et al. (2006), [http://www.usenix.org/events/osdi06/tech/weil.html Ceph: A Scalable, High-Performance Distributed File System].&lt;br /&gt;
&lt;br /&gt;
===March 1, 2011===&lt;br /&gt;
* Oda et al. (2008), [http://people.scs.carleton.ca/~soma/pubs/oda-ccs-08.pdf SOMA: Mutual Approval for Included Content in Web Pages].&lt;br /&gt;
* Oda &amp;amp; Somayaji (2008), [http://people.scs.carleton.ca/~soma/pubs/oda-asia-08.pdf Content Provider Conflict on the Modern Web].&lt;br /&gt;
&lt;br /&gt;
===March 3, 2011===&lt;br /&gt;
Authentication&lt;br /&gt;
* OpenID&lt;br /&gt;
* non-password authentication (OTP, biometrics, graphical pass)&lt;br /&gt;
&lt;br /&gt;
===Problems to Solve===&lt;br /&gt;
*Attack computers with almost no consequences&lt;br /&gt;
**DDoS&lt;br /&gt;
**botnets&lt;br /&gt;
**capture and analyze private traffic&lt;br /&gt;
**distribute malware&lt;br /&gt;
**tampering with traffic&lt;br /&gt;
**Unauthorized access to data and resources&lt;br /&gt;
**Impersonate computers, individuals, applications&lt;br /&gt;
**Fraud, theft&lt;br /&gt;
**regulate behavior&lt;br /&gt;
&lt;br /&gt;
===Design Principles===&lt;br /&gt;
*subjects of governance: programs and computers&lt;br /&gt;
*bind programs and computers to humans &amp;amp; human organizations, but recognize binding is imperfect&lt;br /&gt;
*recognize that &amp;quot;bad&amp;quot; behavior is always possible.  &amp;quot;good&amp;quot; behavior is enforced through incentives and sanctions.&lt;br /&gt;
*rules will change.  Even rules for rule changes will change. Need a &amp;quot;living document&amp;quot; governing how rules are chosen and enforced.&lt;br /&gt;
&lt;br /&gt;
==Scenarios==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===1: Stopping DDoS===&lt;br /&gt;
Group members: Seyyed, Andrew Schoenrock, Thomas McMahon, Lester Mundt, AbdelRahman, Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
*Have the machine routing packets (this could be the ISP) detect suspicious packets; if the packets are signed, those suspicious packets could be blocked and the sender put on a blacklist.&lt;br /&gt;
&lt;br /&gt;
* (AS) Stopping DDoS against files, services, programs, etc&lt;br /&gt;
** (AS) Have file replication built into the system (similar to OceanStore) so that files are always available from different servers&lt;br /&gt;
** (AS) If files are not replicated then we could have a tiered messaging system (at the top level would be OS messages) and servers could then prioritize incoming traffic. If a given server is experiencing an overload, it could send out a distress signal to its neighbours and then distribute what it has to them. The system should have a built-in mechanism to re-balance the overall load after something like this happens. This would then mean that any DDoS attack would result in the service being more available.&lt;br /&gt;
*** I like this idea of having service failover.&lt;br /&gt;
*** Expanding on the idea of file replication and sending distress signals to its neighbours, I could envision a group of servers that would learn to help each other out, lending processing and storage when they are under-utilized. They would sort of form a collective, club, or gang. Members who didn&#039;t contribute (always fully utilized) would eventually be identified and banned. It would be these other computers that the targeted server would rely on for help in this situation. However cool this is, it isn&#039;t really a solution, because one could suppose the attackers might use the same strategy to recruit additional help in their attack. &lt;br /&gt;
&lt;br /&gt;
* (AS) Stopping DDoS against specific machines&lt;br /&gt;
** (AS) I don&#039;t think that this should be specifically addressed. I think measures introduced to guard against this will ultimately negatively impact the overall system in terms of performance.&lt;br /&gt;
*** I don&#039;t like the idea of sacrificing the one for the many though.&lt;br /&gt;
**** (AS) The main thing with what I&#039;ve proposed is that the motivation behind doing a DDoS attack is completely gone (by doing one, a service would either maintain or increase its overall availability). I think eliminating the main result of a DDoS attack means there would be no reason to guard against DDoS attacks on a specific machine.&lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
** Many DDoS attacks exploit anonymity: these services serve anyone who requests their service, and the attack then ensures sufficient traffic that the computer behind the service can no longer cope. If we remove anonymity and only serve &#039;known&#039; parties, the spurious requests would be ignored. So we need to &#039;know&#039; who our friends are.&lt;br /&gt;
*** This of course requires a form of unspoofable authentication unlike IP. &lt;br /&gt;
**** (RD) Serving only &#039;known&#039; parties reduces the distribution of information, or at least its rate. I was thinking of removing anonymity at a lower level, so that any party that&#039;s not anonymous while sending a packet to your machine is considered &#039;known&#039;, and anything unknown (unsigned, unrepresented in some way) is blocked. So we don&#039;t really need to &#039;know&#039; who our friends are; we just need to know who they aren&#039;t. &lt;br /&gt;
**** (RD) Another thing I had in mind is punishment in case a &#039;known&#039; party participates in a DDoS attack: not punishing the owner of that machine (who is probably a victim as well), but the software or hardware in some sense. &lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
** (RD) How about developing such a network topology and protocols that make DDoS attacks less efficient or harder to perform? Some sort of CAPTCHA, but for machines and protocols, to distinguish them from bots, maybe? &lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
** I&#039;m not sure what is meant by stopping; I don&#039;t think we can stop DDoS given the way things are currently run, we can only block it. From my knowledge, most software that stops DDoS does so by blocking, or even by complete shutdown, as with McColo.&lt;br /&gt;
&lt;br /&gt;
*Stopping DDoS&lt;br /&gt;
**One method is to borrow the approach used against DoS: reject subsequent requests from a given source above a specific rate, here applied to irrelevant or unknown sources.&lt;br /&gt;
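The rate-based rejection idea above is commonly implemented as a token bucket. A minimal sketch in Python; the per-source rate and burst values are hypothetical:&lt;br /&gt;

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter: a source gets `rate` requests
    per second, with bursts of up to `capacity` requests."""
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the time elapsed, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # reject: source has exceeded its rate

# One bucket per source address; unknown sources could be given a lower rate.
buckets = {}
def check(source, rate=5, capacity=10):
    if source not in buckets:
        buckets[source] = TokenBucket(rate, capacity)
    return buckets[source].allow()
```

A router or server front end would call `check()` per incoming request and drop anything it rejects.&lt;br /&gt;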
&lt;br /&gt;
*How we could stop DDoS would be to have each connection to the internet assigned a particular identity, used to verify who is attempting connections. DDoS works in part because IP addresses can currently be spoofed. The only way to verify an identity is to request a response, but by then the damage is done. With a verified identity, connection attempts can be verified while being routed, so that the request may not even reach the destination host.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basically, we need an encryption system using keys so that, as packets are routed, the identity of each packet&#039;s sender can be verified. Ideally the verification would be trivial, so as to prevent noticeable latency. Because identities are verified, spoofed packets would be dropped during routing. If verified identities still attempt a DDoS attack, the attack can be traced back to the attackers.&lt;br /&gt;
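Per-packet sender verification could be sketched as a MAC over the sender identity and payload. This is a simplified illustration using a shared secret per sender (the `KEYS` table and names are hypothetical); a deployable scheme would more likely use asymmetric signatures, so routers would hold only public keys:&lt;br /&gt;

```python
import hmac
import hashlib

# Hypothetical per-sender keys that a verifying router is assumed to know.
KEYS = {"sender-a": b"secret-key-a"}

def sign_packet(sender, payload, key):
    """Sender attaches a MAC binding its identity to the payload."""
    tag = hmac.new(key, sender.encode() + payload, hashlib.sha256).hexdigest()
    return {"sender": sender, "payload": payload, "tag": tag}

def verify_packet(packet):
    """Router drops packets whose claimed identity cannot be verified."""
    key = KEYS.get(packet["sender"])
    if key is None:
        return False  # unknown identity: drop
    expected = hmac.new(key, packet["sender"].encode() + packet["payload"],
                        hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid leaking tag bytes via timing.
    return hmac.compare_digest(expected, packet["tag"])
```

A spoofed or tampered packet fails verification and is dropped mid-route, matching the intent above.&lt;br /&gt;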
&lt;br /&gt;
(RD) (I think we&#039;re not looking low enough. We&#039;re trying to find a solution to this problem while assuming the system that made the problem possible is unchanged. We enforce more security through identification, encryption, etc., but the system is still problem-prone. This would let us identify an attacker, but only after the attack has started (or even finished). It&#039;s like trying to eliminate theft from a society of poor, unemployed, uneducated people by enforcing more security and punishment, which reduces the rate and the motivation but can&#039;t stop an attack. It&#039;s a rough analogy, but rather than policing that society, I want to make its people rich, employed, and educated, so that theft is simply not an efficient way for them to get goods. So, rather than protecting machines from attacks, I want to build a system where DDoS attacks are simply inappropriate.)&lt;br /&gt;
&lt;br /&gt;
===2: Stopping phishing===&lt;br /&gt;
Group members: Waheed Ahmed, Nicolas Lessard, Raghad Al-Awwad, Tarjit Komal&lt;br /&gt;
&lt;br /&gt;
* A way of automatically checking the signature of a message to make sure it really is from a trusted source.&lt;br /&gt;
** ie: &amp;quot;Nation of Banks, did your member TD send me a message to reset my password?&amp;quot; &lt;br /&gt;
&lt;br /&gt;
*There should be filters to verify where a message is coming from. If the message is coming from an unknown source, it should be blocked. &lt;br /&gt;
*Don&#039;t use the links in an email to get to any web page, if you suspect the message might not be authentic.&lt;br /&gt;
*Avoid filling out forms in email messages that ask for personal financial information. Phishers can make exact copies of the forms found on a financial institution&#039;s site.&lt;br /&gt;
*Make it so a machine needs to be authorized to use your information -- a machine that you don&#039;t own can&#039;t use your information to do anything, regardless of whether it has it or not.&lt;br /&gt;
*Ensure that any website that requires the filling of personal information be a secure website which can be traced to the original organisation.&lt;br /&gt;
*Ensure that whatever browser you are using is up to date with the most recent security patches applied.&lt;br /&gt;
*Obviously, report any suspected phishing to the appropriate authorities so that proper action can be taken&lt;br /&gt;
*&amp;quot;three strikes and you&#039;re out&amp;quot;&lt;br /&gt;
**Each machine is responsible for the messages it releases. When a machine is a repeat offender, it loses access privileges&lt;br /&gt;
*Revamp the security login process to something similar to:&lt;br /&gt;
**User enters username and clicks next.&lt;br /&gt;
**Server returns a user-predefined image to the user.&lt;br /&gt;
**If image is the right image then user enters password to logon.&lt;br /&gt;
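The two-step flow above (similar in spirit to SiteKey-style logins) can be sketched as follows; the user store, image name, and unsalted hashing are simplifications for illustration, and a real system would use salted password hashing:&lt;br /&gt;

```python
import hashlib

# Hypothetical user database; the image is chosen by the user at enrollment.
def h(pw):
    return hashlib.sha256(pw.encode()).hexdigest()

USERS = {"alice": {"image": "blue_boat.png", "pw_hash": h("correct horse")}}

def step1_get_image(username):
    """Step 1: the user submits only a username; the server returns the
    predefined image so the user can confirm the site is genuine
    before ever typing a password."""
    user = USERS.get(username)
    return user["image"] if user else None

def step2_login(username, password):
    """Step 2: only after recognizing the image does the user send a password."""
    user = USERS.get(username)
    return bool(user) and user["pw_hash"] == h(password)
```

A phishing site that does not know the user&#039;s image cannot complete step 1 convincingly, which is the point of the flow.&lt;br /&gt;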
&lt;br /&gt;
===3: Limiting the spread of malware===&lt;br /&gt;
Group members: keith, Andrew Luczak, David Barrera, Trevor Gelowsky, Scott Lyons&lt;br /&gt;
*(KM) Heterogeneous systems - it is much easier to write code to attack a single type of system&lt;br /&gt;
*(KM) Individualized security policies&lt;br /&gt;
**(AL) A baseline security level would help prevent malware spreading to/from a system with &amp;quot;individual non-security&amp;quot;&lt;br /&gt;
*(KM) Identify all programs through digital signatures&lt;br /&gt;
*(KM) Peer rating system for programs, customize security policies based on peer ratings&lt;br /&gt;
**(SL) Need some way to keep rating system from being &amp;quot;gamed&amp;quot;&lt;br /&gt;
***(AL) Maybe a program gets flagged if it experiences a rapid approval increase?&lt;br /&gt;
**(AL) Need to protect against benign programs with good ratings being updated into malware&lt;br /&gt;
*(KM) System level forensics on program execution and resource/file modification&lt;br /&gt;
*(KM) Customizable user and program blacklists&lt;br /&gt;
*(SL) Sandboxing with breach management - know what files have been modified by a process&lt;br /&gt;
*(SL) Trending - what does the application spend most of its time doing?&lt;br /&gt;
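The &amp;quot;flag rapid approval increases&amp;quot; idea above could be a sliding-window check over rating timestamps. A sketch with hypothetical window and threshold values:&lt;br /&gt;

```python
def flag_rapid_approval(timestamps, window=3600, threshold=100):
    """Flag a program if more than `threshold` positive ratings arrive
    within any `window`-second span (values here are illustrative).
    `timestamps` is a list of rating arrival times in seconds."""
    timestamps = sorted(timestamps)
    start = 0
    for end in range(len(timestamps)):
        # Shrink the window from the left until it spans <= `window` seconds.
        while timestamps[end] - timestamps[start] > window:
            start += 1
        if end - start + 1 > threshold:
            return True  # suspiciously fast approval spike
    return False
```

A flagged program would not be blocked outright, only subjected to extra scrutiny, since legitimate releases can also spike.&lt;br /&gt;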
&lt;br /&gt;
*(DB)Multiple control/chokepoints where malware is looked for. This way, it&#039;s more difficult for attackers to take over several control points and for malware to remain unnoticed. &lt;br /&gt;
*(DB)Heterogeneous systems help limit the spread of malware too. There are two points here. (1) If we&#039;re designing this system where we&#039;re all masters of our own domains, then we&#039;re likely to have different system configurations. However, (2) if we want to communicate and interact with other domains, we need some standardized communication layer or mechanism, and standardization is very closely tied to homogeneity.&lt;br /&gt;
*(DB)There should be consequences if you harbor malware or if malware originates from within your domain. This could be an incentive for people to be more proactive about security.&lt;br /&gt;
&lt;br /&gt;
===4: Bandwidth hogs===&lt;br /&gt;
Group members: Mike Preston, Fahim Rahman, Michael Du Plessis, Matthew Chou, Ahmad Yafawi&lt;br /&gt;
&lt;br /&gt;
*limit bandwidth for each user&lt;br /&gt;
*if user has significant bandwidth demands for a certain period of time&lt;br /&gt;
**add them to a watch list&lt;br /&gt;
**monitor their behaviour&lt;br /&gt;
**divert communication to other hosts that can satisfy requests.&lt;br /&gt;
***if there are no other hosts that can satisfy the request, then distribute data to other idle and capable hosts. Load is now reduced on the one link.&lt;br /&gt;
*QoS&lt;br /&gt;
*Tiered Bandwidth Distribution&lt;br /&gt;
**The main idea is that your machine gets more bandwidth in proportion to how much you give back to the community.&lt;br /&gt;
***It&#039;s similar to some trackers and darknet programs that won&#039;t increase your download speed unless you contribute X bytes back to your peers.&lt;br /&gt;
**Tier 1, Basic privileges i.e. all machines have minimal bandwidth.&lt;br /&gt;
**Tier n, we define some requirements to be met then we increase bandwidth accordingly.&lt;br /&gt;
***Drop a Tier if machine doesn&#039;t maintain the specified requirements of that specific tier.&lt;br /&gt;
***Advantage: monitoring bandwidth on the network is cheap, although implementing the tier mechanism above is not.&lt;br /&gt;
*As a metaphor to our &amp;quot;real world society&amp;quot;, bandwidth control can be treated the way we treat speed limits for cars.&lt;br /&gt;
**Certain areas need more free-flowing traffic, so speed limits are increased; others require a slower pace, which is enforced. These &amp;quot;areas&amp;quot; translate to users or programs in our distributed OS model&lt;br /&gt;
**There are repercussions to breaking any of these imposed limits&lt;br /&gt;
**Throttling provides one possible implementation of these constraints&lt;br /&gt;
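The tiered-bandwidth idea above could be sketched as a lookup from a machine&#039;s contribution to its cap; the tier boundaries and caps below are hypothetical:&lt;br /&gt;

```python
# Hypothetical tier table: (minimum bytes contributed, bandwidth cap in kbit/s).
TIERS = [
    (0,          256),   # Tier 1: basic privileges, minimal bandwidth
    (1_000_000,  1024),  # Tier 2: earned by contributing ~1 MB
    (10_000_000, 4096),  # Tier 3: earned by contributing ~10 MB
]

def bandwidth_cap(bytes_contributed):
    """Return the cap for the highest tier whose requirement is met.
    A machine that stops contributing is re-evaluated and naturally
    drops a tier, as described above."""
    cap = TIERS[0][1]
    for minimum, tier_cap in TIERS:
        if bytes_contributed >= minimum:
            cap = tier_cap
    return cap
```

Re-running this periodically against each machine&#039;s recent contribution implements both promotion and the tier-drop rule.&lt;br /&gt;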
&lt;br /&gt;
====Bandwidth Hog Additional Sources and Information====&lt;br /&gt;
1. [http://repository.lib.ncsu.edu/ir/bitstream/1840.16/1197/1/etd.pdf A Solution to Bandwidth Hogs in a Cable Network]&lt;br /&gt;
*Starting at page 120 of this thesis is a proposed solution to bandwidth hogs on a cable network. In general, the proposal suggests a solution essentially equivalent to throttling; however, I did find the description of the solution helpful. It may go well with our tiered suggestion if we keep the &amp;quot;earned trust&amp;quot; approach to bandwidth access but at the same time allow users to go above their tier during low-congestion periods. For example, if congestion is low, why not allow people on the network to occupy much larger bandwidths? The network would include some form of monitoring protocol that decides how much access a user is allowed: if more bandwidth is available, let them have it when their request needs it. On the other hand, if congestion is high, a user doing something bandwidth-heavy will be capped at the upper limit of their tier. In this manner each user is guaranteed the amount they have earned at their tier, but those who do not want to earn a higher level for high-usage timeframes can instead opt to run their bandwidth-heavy applications during low-congestion timeframes. The network could also publish live data on current bandwidth usage, as well as trending data, so that people can plan when to start bandwidth-heavy applications.&lt;br /&gt;
&lt;br /&gt;
2. [http://yuba.stanford.edu/rcp/flowCompTime-dukkipati.pdf Why Flow-Completion Time is the Right Metric for Congestion Control]&lt;br /&gt;
*This is a short article which raises an interesting question related to our topic: how should we determine what counts as &amp;quot;bandwidth hogging&amp;quot;? For example, do we look at the strain on the network in some capacity (i.e. dropped packets, usage level of the capacity of the pipe, etc.), which is important information for those who build the network; or do we look at the time it takes for a transaction to complete once a user requests it? This article argues that from a user&#039;s point of view, they do not care how much bandwidth they get as long as the task they are requesting completes as quickly as possible. In our discussion in class we talked about how the majority of people currently do not have large bandwidth needs for normal transactions (email, web searching, wikis ;-) ), and a much smaller percentage of the population actually eats up the larger bandwidth through hog-like applications. Maybe instead of focusing on bandwidth as the main issue, we should think about how long it takes to complete tasks. Our tiered system could also incorporate this train of thought: people who only send email and surf the web are at tier 1, people who use online storage and FTP are at tier 2, people who stream movies and other data are at tier 3, etc. Then we could price each tier separately and apply some form of control on the technologies available at each tier so that its restrictions are adhered to.&lt;br /&gt;
&lt;br /&gt;
3. [http://research.microsoft.com/en-us/people/asellen/pap0209-chetty.pdf Who’s Hogging The Bandwidth?: The Consequences Of Revealing The Invisible In The Home]&lt;br /&gt;
*This article is from Microsoft Research and is an interesting look at controlling bandwidth usage by giving people a tool to monitor usage and alter how bandwidth is allocated. The tool essentially boils down to the social-control idea we discussed in class: if you know your neighbours are hogging the bandwidth for very low-priority activities, should you not be able to appeal to their conscience to gain the resources you need? The article gives examples of homes that were provided this control and how household politics factored into bandwidth usage. When usage was no longer hidden, it seems to have become easier to openly discuss how to divide the finite amount of bandwidth. Initial concerns revolved around people hogging the bandwidth for themselves or playing practical jokes on others in the house by reducing their allocation in the middle of a task. Another issue this type of control raises is how to prioritize which tasks are &amp;quot;more important&amp;quot;; one example given was whether a Skype call to family and friends is more important than watching YouTube videos for a work-related task. Interestingly, the field studies revealed examples of an emerging &amp;quot;bandwidth etiquette&amp;quot;: it was considered very rude to limit someone&#039;s bandwidth during a Skype call, due to the immediate and negative effect, but acceptable to limit bandwidth during a file transfer, as that just meant a few extra minutes for the transfer to complete.&lt;/div&gt;</summary>
		<author><name>Omi</name></author>
	</entry>
</feed>