
Internet Attribution: Between Privacy and Cruciality

2011-04-11T21:06:01Z

Raghad: /* Practice */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around the notion of using machine learning to link articles to humans. Others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not practical to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens the security of its users, because it invites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, and bandwidth hogging. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems like IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets reaching their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scene.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access to the system, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role in the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we don't question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which current research sadly tends to neglect.

This paper starts with a quick background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution, namely the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for strong attribution over the Internet, mainly due to the increased number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed in it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what each particular system aims to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In practice, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the header part of the request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user's control, as they can be decrypted, modified, and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all.

Cookies can either have an expiration date or not; this is the date that the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to automatically set how long you want cookies to be stored.
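
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is only an illustrative sketch: the cookie names (<code>visitor_id</code>, <code>site_theme</code>) and their values are assumptions made up for the example, and no real HTTP traffic is involved.

```python
from http.cookies import SimpleCookie

# Server side: build the cookies sent to a first-time visitor.
# Names and values are hypothetical.
response_cookies = SimpleCookie()
response_cookies["visitor_id"] = "abc123"          # no expiry: a session cookie
response_cookies["site_theme"] = "dark"
response_cookies["site_theme"]["max-age"] = 3600   # persistent: kept for one hour

# One Set-Cookie header value per cookie.
set_cookie_headers = [m.OutputString() for m in response_cookies.values()]

# Browser side: on the next request the stored cookies are echoed back
# in a single Cookie header.
cookie_header = "; ".join(f"{m.key}={m.value}" for m in response_cookies.values())

# Server side again: parse the Cookie header of the follow-up request.
request_cookies = SimpleCookie()
request_cookies.load(cookie_header)
returning_visitor = "visitor_id" in request_cookies
```

Because the browser holds the stored values, a user can edit or drop <code>cookie_header</code> before it is sent, which is the weakness discussed in the next subsection.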

===Cookies as an Attribution System===
Viewed as the type of attribution system we are looking for over the Internet, cookies can achieve high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, the use of cookies is not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of online devices owned by each user have grown. One of the more common examples of this is the increase in Internet-ready mobile phones. The addressing system of the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on this planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This had been foreseen since the 90s, which prompted the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
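
The address-space figures follow directly from the address widths. A short calculation, given here purely as a worked example, makes the gap between the two protocol versions concrete:

```python
IPV4_BITS = 32
IPV6_BITS = 128

# Each additional bit doubles the number of distinct addresses.
ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

print(f"IPv4: {ipv4_addresses:,} addresses")        # 4,294,967,296
print(f"IPv6: {ipv6_addresses:.3e} addresses")
print(f"IPv6 is larger by a factor of 2**{IPV6_BITS - IPV4_BITS}")
```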

IP addresses can be either static or dynamic. A static IP address is an address permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

===IP Addresses as an Attribution System===
Although Internet addresses can be used to attribute packets to their sender, they fail as an effective attribution system for a few reasons, the main one being that attackers can spoof their IP addresses. Spoofed IP addresses will even foil IP traceback efforts.
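
The reason the source address is a weak identifier is that it is simply a field the sender writes into the packet header. The following sketch packs and parses a minimal IPv4 header in memory to illustrate this; nothing is sent on a network. The function names are hypothetical, the addresses come from ranges reserved for documentation, and the checksum is left at zero for brevity.

```python
import socket
import struct

def build_ipv4_header(src, dst, payload_len=0):
    """Pack a minimal 20-byte IPv4 header (checksum not computed)."""
    version_ihl = (4 << 4) | 5            # version 4, header length of five 32-bit words
    total_length = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,     # version/IHL, type of service, total length
        0, 0,                             # identification, flags and fragment offset
        64, socket.IPPROTO_TCP, 0,        # TTL, protocol, checksum placeholder
        socket.inet_aton(src),            # source address: whatever the sender writes
        socket.inet_aton(dst),
    )

def claimed_source(header):
    """Return the source address the header claims; nothing in it proves the claim."""
    return socket.inet_ntoa(header[12:16])

header = build_ipv4_header("203.0.113.5", "198.51.100.7")
print(claimed_source(header))
```

A receiver that relies on <code>claimed_source</code> alone has no way to tell a genuine header from one written by a different machine.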

==Authentication Systems==
For a website to make sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this approach is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is subject to attackers breaking into the server to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”: the user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user's personal page.
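
The register-then-verify flow just described can be sketched as follows. This is a minimal illustration, not a description of any particular bank's system: the in-memory <code>accounts</code> dictionary stands in for the server's database, and storing a salted PBKDF2 hash rather than the password itself is an assumption added here as common practice.

```python
import hashlib
import hmac
import os

PBKDF2_ROUNDS = 100_000   # illustrative work factor
accounts = {}             # username -> (salt, password hash); stands in for the database

def register(username, password):
    """Create an account, storing only a salted hash of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ROUNDS)
    accounts[username] = (salt, digest)

def login(username, password):
    """Grant access only if the supplied password matches the stored hash."""
    record = accounts.get(username)
    if record is None:
        return False
    salt, stored = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, PBKDF2_ROUNDS)
    return hmac.compare_digest(candidate, stored)   # constant-time comparison
```

For example, after <code>register("alice", "s3cret")</code>, the call <code>login("alice", "s3cret")</code> grants access while any other password is denied.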

Authentication systems are typically used only when users want some privacy on the web server, or when they wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a rather abstract sense, privacy is inversely related to attribution. When designing an attribution system, one needs not merely to fix this ratio for one particular case, but to let it change dynamically depending on the case.

Assuming such a ratio is found, another issue arises. Can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is slightly out of the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be utilized.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute? And why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping stone authentication, link identification, and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should one deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing a client's identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for more targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or merely deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue whenever security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of the senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried in each packet, determining the geographical location of this IP address, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are not permanently bound to a single account, which makes the link between an IP address and a person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means users would not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and the distribution of malware. A stepping stone attack is a common way of rendering an attack anonymous: the attacker reaches the victim through multiple random public agents (the stepping stones) in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.

We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense for a network as huge as the internet and for human society. Practice requirements define the way the system works, behaves, and interacts with other bodies.

==General==

The first and most obvious requirement is stated for the sake of completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person and/or an organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though an action is mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science fiction, and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some of them are not perfect, but they are widely accepted and most have reasons to exist. The attribution system that we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is easier to discuss the design of the system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Embedding an attribution system incrementally should also be more secure (bugs in software and mistakes in design can be fixed while on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption in different cases, while at the same time remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adoptability and universality. Luckily, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).

==Practice==

The attribution mapping should be one-way rather than a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did/does person X do?". Not only should the system not know the answer, it should not be possible to find the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put the names of people into a traceability database. It makes much more sense to assign unique IDs to everyone who uses the network. If a crime is committed and the agent of some act needs to be determined, the recorded ID will be looked up in the police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.

Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and prevents either side from misusing the system.
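
The split between a public traceability store and a trusted holder of identities can be sketched as two separate objects. This is only an interface-level illustration under our own assumptions: the class and method names (<code>TrustedEntity</code>, <code>TraceStore</code>, <code>who_did</code>, <code>reveal</code>) are hypothetical, and an in-memory dictionary does not by itself enforce the "impossible to query by person" property that a real design would need.

```python
import secrets

class TrustedEntity:
    """Holds the ID -> real name mapping (hypothetical stand-in for the trusted body)."""

    def __init__(self):
        self._names = {}

    def enroll(self, real_name):
        uid = secrets.token_hex(8)        # opaque ID handed to the user
        self._names[uid] = real_name
        return uid

    def reveal(self, uid, evidence_approved):
        # The name is released only when enough evidence has been presented.
        if not evidence_approved:
            raise PermissionError("identity is revealed only with sufficient evidence")
        return self._names[uid]

class TraceStore:
    """Public side: answers 'who did act X?' with an opaque ID and nothing else."""

    def __init__(self):
        self._act_to_id = {}

    def record(self, act_id, uid):
        self._act_to_id[act_id] = uid

    def who_did(self, act_id):
        return self._act_to_id.get(act_id)

    # Deliberately no method that lists the acts of a given ID:
    # the interface exposes the act -> agent direction only.
```

Resolving an act to a name therefore takes two steps held by two parties: <code>who_did</code> yields an ID, and only <code>reveal</code>, gated on evidence, turns that ID into a person.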

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and transmits/receives packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds new ISs to the GDDB.
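
One way to picture an IS is as a digest over the human identifier and the device feature. The sketch below is an assumption made for illustration only: the text does not specify how the binding is computed, so the use of SHA-256, the <code>Agent</code> class, and the field encodings are all hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Agent:
    human_id: bytes    # e.g. a digest of an iris or fingerprint template (hypothetical encoding)
    device_id: str     # e.g. the MAC address of a Network Interface Card

def identification_stamp(agent):
    """Derive a fixed-length IS that binds the human identifier to the device feature."""
    h = hashlib.sha256()
    # Length-prefix the first field so different (human, device) pairs cannot collide
    # by shifting bytes from one field into the other.
    h.update(len(agent.human_id).to_bytes(2, "big"))
    h.update(agent.human_id)
    h.update(agent.device_id.lower().encode())
    return h.digest()

owner_laptop = Agent(human_id=b"iris-template-digest", device_id="00:11:22:33:44:55")
stamp = identification_stamp(owner_laptop)
```

Under this sketch the same owner with a second device obtains a different IS, which matches the definition of an Ag as a human-device pairing.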

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when he is running a system in the background that performs external (over the internet) system calls or is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP, FTP, etc. In addition, indirect responsibility includes responsibility for all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection initiation packets for handshaking, ICMP packets may be sent to probe the status of a specific host, etc.

The scope of this framework only addresses attribution over the internet and not any other "locally" defined networks underlying the IEEE standard definitions of the topologies PAN, LAN, MAN, or a WAN that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of the pros, cons, and vulnerabilities of the system, and conclude with a discussion of the proposed framework.
==Assumptions==
Firstly: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., a government). This entity should act as the Internet's law enforcement: it will be the primary inspector and will hold jurisdiction over all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in system users as well as conform to diverse regional laws, regulations, customs, and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access this database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be heavily relied on for every single hop made by a packet through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the communication among the GDDB subsystems.
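
The two access levels described for the GDDB can be sketched as a simple role check. This is a minimal single-process illustration under stated assumptions: the <code>Role</code> and <code>GDDB</code> names are hypothetical, and the encryption, authentication, and distribution that the assumption calls for are deliberately left out.

```python
from enum import Enum

class Role(Enum):
    ROUTER = "router"                  # may only read
    TRUSTED_ENTITY = "trusted_entity"  # may read and write

class GDDB:
    """In-memory stand-in for the distributed IS store (no encryption or replication)."""

    def __init__(self):
        self._stamps = set()

    def contains(self, role, stamp):
        """Read operation: open to routers and to the trusted entity."""
        if not isinstance(role, Role):
            raise PermissionError("unauthenticated caller")
        return stamp in self._stamps

    def add(self, role, stamp):
        """Write operation: reserved for the trusted entity."""
        if role is not Role.TRUSTED_ENTITY:
            raise PermissionError("routers have read-only access")
        self._stamps.add(stamp)
```

Using a set for the stored ISs keeps the membership test cheap, which reflects the requirement that reads be fast enough to run at every hop.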

Third, ownership. We assume that every Md is officially owned by a human. The owner is officially responsible for that Md and is held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can be owned by only one person.

Finally, IP packets. The proposed framework assumes that the network layer adds to each IP packet a header that carries the IS of the Ag owning the packet.

==Methodology==
In essence, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake IS. An IS is fake if it has any of the following:
* A false unique chip identifier that refers to an imaginary Md.
* A false unique human identifier that refers to an imaginary human.
* A misleading binding of a human to a machine, i.e., a claim that some machine "X" belongs to some human "Y" when, in reality, "Y" is not the owner of "X".
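
The three cases in the list above can be sketched as a small check. This is only an illustration under stated assumptions: the function <code>check_is</code>, the <code>Verdict</code> names and the sample identifiers are hypothetical, and the registry is modelled as a plain dictionary from chip identifier to owner, which matches the one-owner-per-machine assumption.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "valid"
    UNKNOWN_MACHINE = "chip identifier refers to no registered Md"
    UNKNOWN_HUMAN = "human identifier refers to no registered human"
    WRONG_BINDING = "human is not the registered owner of the Md"


def check_is(chip_id, human_id, owner_of, humans):
    """Classify a claimed (chip, human) pair.

    owner_of maps each registered chip id to its single owner;
    humans is the set of registered human ids.
    """
    if chip_id not in owner_of:
        return Verdict.UNKNOWN_MACHINE
    if human_id not in humans:
        return Verdict.UNKNOWN_HUMAN
    if owner_of[chip_id] != human_id:
        return Verdict.WRONG_BINDING
    return Verdict.VALID


# Hypothetical registry: alice owns two machines, bob owns one.
owner_of = {"chip-A": "alice", "chip-B": "alice", "chip-C": "bob"}
humans = {"alice", "bob"}
```

For example, <code>check_is("chip-C", "alice", owner_of, humans)</code> reports a wrong binding, because both identifiers exist but "alice" is not the owner of "chip-C".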

Routers, as the primary constituents of the intermediate systems, should refrain from routing any data packet that is not fully attributed. Because they deliver all packets, malicious or benign, they bear a large share of the responsibility for achieving a highly reliable attribution mechanism.

In chronological order, the system operates as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB and provides the user with the IS so that it can be added to the header of the packets he launches. The user should keep his unique IS secret and treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS by sending a copy of the IS printed on the packet to the GDDB. If the packet carries no IS, it is denied ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the packet's printed IS is verified; the packet then benefits from ISS and is routed onward.
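
The per-packet decision described above can be sketched as follows. The names are assumptions made for illustration: a packet is modelled as a dictionary with an optional <code>"is"</code> field, and <code>is_registered</code> is a hypothetical callable standing in for the read-only GDDB query.

```python
def route(packet, is_registered):
    """Return "forward" or "drop" for a single packet.

    packet: dict that may carry its IS under the key "is".
    is_registered: callable standing in for the GDDB lookup;
                   it returns True when the IS is valid.
    """
    identity = packet.get("is")
    if identity is None:
        return "drop"          # unattributed packet: no ISS
    if not is_registered(identity):
        return "drop"          # GDDB reports an invalid IS
    return "forward"           # IS verified: packet is routed onward


# Hypothetical GDDB contents for the example.
registered = {"IS-1"}
decisions = [
    route({"is": "IS-1", "payload": b"hello"}, registered.__contains__),
    route({"payload": b"no identity"}, registered.__contains__),
    route({"is": "IS-forged"}, registered.__contains__),
]
```

Because the lookup runs once per packet per hop, this is also where the delay listed among the disadvantages would arise.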

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves a level of attribution comparable to that achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is available only to trusted entities.
** Hence, it retains personal privacy.
* The system is fully automated. According to its theory of operation, ISS are granted or denied based on the validation of the IS printed on each packet.
* The system prevents all forms of cyber crime that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution information is not public, custom content generation is not achievable.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be renewed periodically; however, arranging this is not easy.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause the whole system to fail. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets under the responsibility of some other Ag, which is forgery.

==Discussion==
The main aim of the proposed framework is to ensure that a packet moves only when it is known to whom it belongs. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services: he cannot open a bank account, buy a house, trade or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently exceeds that of the Internet. Compared with real-world attribution, current Internet attribution can be considered a failure. A form of Internet attribution would be acceptable if it provides at least as much attribution as the real world does. We believe that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is either traceable to an Ag or does not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country in which the offender is located. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the framework's theory of operation does not permit mapping a given Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, traceability information cannot all be collected in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, a forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, so that a person must be approved by the government to drive it, and the car must be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may at first deny, refuse and object to such "wicked" attribution systems, but later on Internet licensing will be part of everyone's life, just like their driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and rapidly growing number of applications that are time and data sensitive, and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technological applications are now built over the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes waiting for the proper moment to strike. In short, Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, explained why attribution is crucial, discussed what level of attribution would be considered acceptable and identified where the difficulty in achieving such a level lies. Moreover, we have given a background on current attribution systems, with a brief discussion of why they survive and where they fail. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill these requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T21:02:30Z

Raghad: /* Practice */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around using machine learning to link articles to humans. Other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the Internet. Authentication, as a means of attribution, has proved its efficiency, but it is not practical to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the Internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the Internet.

=Introduction=
Currently the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens the security of its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS and bandwidth hogging. Consequently, Internet attribution is a highly sensitive field that holds a cornerstone position within Internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for attributing actions to persons with an acceptable level of success.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is an entity that has the ability to commit what constitutes an act. Within our focus, an agent can be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, the lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the human source behind the scene.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the Internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers users and grants them LICENSED access to the system, weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role in the Internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is sadly somewhat neglected in current research.

This paper starts with a brief background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution, that is, the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the Internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are concerned only with attribution in large, dynamic networks like the Internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and have gone scot-free; because of the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed in it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but each mainly to the extent of what that particular system aims to achieve.
This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are small text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information and storing site preferences. In fact, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the request. All this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats the request as the browser's first access and sends a cookie as part of the response; the browser saves it and resends it on the next request. Cookie contents are often encrypted for data security and information privacy; however, they remain under the user's control, since they can be decrypted, modified and even deleted completely. It is also possible for a user to change the browser settings to not accept cookies at all.

Cookies can either have an expiration date or not; this is the date that the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to automatically set how long you want cookies to be stored.

===Cookies as an Attribution System===
If cookies were used as the type of attribution system we are looking for over the Internet, we would be able to identify the computers that access a web server with high precision. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.
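
The exchange described in this section, and the weakness just noted, can be sketched with Python's standard <code>http.cookies</code> module. The function <code>handle_request</code> and the cookie name <code>visitor_id</code> are hypothetical; the sketch handles only the raw header strings, not a real HTTP connection.

```python
import secrets
from http.cookies import SimpleCookie


def handle_request(cookie_header):
    """Return (visitor_id, set_cookie_value_or_None) for one request.

    cookie_header is the raw value of the request's Cookie header,
    or None when the browser sent no cookie.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if "visitor_id" in jar:
        # Returning visitor: the server simply trusts what the browser sent.
        return jar["visitor_id"].value, None
    # First visit: issue a fresh identifier for the browser to store.
    visitor_id = secrets.token_hex(8)
    reply = SimpleCookie()
    reply["visitor_id"] = visitor_id
    return visitor_id, reply["visitor_id"].OutputString()


first_id, set_cookie = handle_request(None)       # first access
second_id, _ = handle_request(set_cookie)         # browser resends the cookie
forged_id, _ = handle_request("visitor_id=someone-else")  # edited by the user
```

The last call shows why this fails as attribution: whatever identifier the user chooses to send, or not send, is what the server sees.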

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners) on a network. A user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which allocate IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user owns have grown; one common example is the rise of Internet-ready mobile phones. The addressing scheme of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely identify only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This exhaustion had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
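
The address-space figures above can be checked with a few lines of Python using the standard <code>ipaddress</code> module. The variable names are chosen here for illustration only.

```python
import ipaddress

# Size of the whole IPv4 space (a 32-bit address): 2**32 addresses.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses

# Size of the whole IPv6 space (a 128-bit address): 2**128 addresses.
ipv6_total = ipaddress.ip_network("::/0").num_addresses

# How many complete IPv4 Internets fit into the IPv6 space: 2**96.
ratio = ipv6_total // ipv4_total

print(ipv4_total)   # 4294967296
```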

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one that is assigned anew, for example at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to "share" a single address if they go online at different times. Given the limited address space that ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for a few reasons, the main one being that attackers can spoof their IP addresses. Spoofing IP addresses can even foil IP traceback efforts.
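
The reason spoofing is possible can be seen from the header layout itself: the source address is just a field that the sender fills in. The sketch below packs and parses a bare 20-byte IPv4 header in memory; nothing is sent on a network. The helper names are hypothetical, the addresses come from the ranges reserved for documentation, and the checksum is left at zero for brevity.

```python
import ipaddress
import struct

# Fixed 20-byte IPv4 header: version/IHL, TOS, total length, identification,
# flags/fragment offset, TTL, protocol, checksum, source, destination.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(src, dst, payload_len=0):
    """Pack an IPv4 header with whatever source address the sender claims."""
    version_ihl = (4 << 4) | 5          # IPv4, header length of 5 32-bit words
    return IPV4_HEADER.pack(
        version_ihl, 0, 20 + payload_len, 0, 0,
        64,                             # TTL
        17,                             # protocol number for UDP
        0,                              # checksum not computed in this sketch
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def claimed_source(header):
    """Read back the source address exactly as written by the sender."""
    fields = IPV4_HEADER.unpack(header[:20])
    return str(ipaddress.IPv4Address(fields[8]))


header = build_header("198.51.100.9", "192.0.2.1")
seen_by_receiver = claimed_source(header)
```

A receiver that relies on this field alone learns only what the sender chose to write there.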

==Authentication Systems==
To make sure of the identity of whoever is visiting certain pages, a website provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. Its biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking into the server to steal login information.

Login systems are attached to user accounts, which sometimes require private information in order to be set up. If the web server's security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later goes to the website, he is asked to "identify himself": the user enters his personal login information, and the web server checks it against what it has stored in its database and either grants or denies access to the user's personal page.

Authentication systems are typically used only when users want some privacy on the web server, or when they wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems identify people over the Internet very precisely and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would have to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would destroy it completely.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions want to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but also to allow the ratio to change dynamically depending on the case.

Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed before design begins, because a system that compromises individual privacy and protection cannot be put to use.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should we deal with misleading data sources hiding behind botnets and concealing their identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing a client's identity in online banking and for identifying the parties involved in an eCommerce transaction, and it can be used by marketers for better-targeted web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the Internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, finding the geographical location of this IP, consulting the ISP covering the location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, there are problems with trying to identify the source by its IP address. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are not permanently bound to a single account, which makes the link between an IP address and the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the Internet.

There are attacks designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. A stepping-stone attack is a common way of making an attack anonymous: the attacker reaches the victim through multiple random public agents (the stepping stones) in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated those requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the Internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is usually stated for the sake of completeness: an Internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which avoiding crime and misuse is the only possible way to do things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some of them are imperfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is easier to discuss the design of the system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will be under a lot of pressure, the internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said of any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment should also be more secure, since software bugs and design mistakes can be fixed while the system is still small, so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns the line between dynamic adaptability and universality. That level of detail is beyond the scope of this paper.

Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. from DoS, identity theft, etc.).

==Practice==

Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to derive the answer from it; the question "who did act X?" is the one that should be answered. This can be seen as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put people's names into a traceability database. It makes much more sense to assign a unique ID to everyone who uses the network. If a crime is committed and the agent behind some act needs to be identified, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (a government, a corporation, the police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.
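
The separation described above can be illustrated with a minimal sketch. It is not part of the proposal: the class names <code>TrustedEntity</code> and <code>TraceLog</code>, the method <code>reveal</code>, and the use of a free-text justification as a stand-in for "enough evidence or motivation" are all assumptions made for illustration.

```python
import secrets


class TrustedEntity:
    """Hypothetical trusted body that alone holds the ID -> real name mapping."""

    def __init__(self):
        self._names = {}  # opaque ID -> real name, never shared wholesale

    def register(self, real_name):
        # Issue a random ID that reveals nothing about the person.
        uid = secrets.token_hex(16)
        self._names[uid] = real_name
        return uid

    def reveal(self, uid, justification):
        # Assumption: a non-empty justification stands in for a legal process.
        if not justification:
            raise PermissionError("a justification is required to reveal an identity")
        return self._names[uid]


class TraceLog:
    """Hypothetical network-side record: maps acts to opaque IDs only."""

    def __init__(self):
        self._acts = {}  # act -> opaque ID

    def record(self, act, uid):
        self._acts[act] = uid

    def who_did(self, act):
        return self._acts[act]


entity = TrustedEntity()
log = TraceLog()
alice_id = entity.register("Alice Example")
log.record("act-42", alice_id)
```

In this sketch the trace log can answer "who did act X?" with an opaque ID, but only the trusted entity can turn that ID into a name, and only when given a reason.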

Of course, a body trusted by everyone does not always exist, but in general we have governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and prevents either side from misusing the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". First, we define some terms that will be used within this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
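
As a rough illustration of the IS definition above, the following sketch binds a human identifier to a device feature in one fixed-length stamp. The source does not specify an encoding, so the function <code>make_is</code>, the use of SHA-256 and the placeholder biometric bytes are assumptions made for the example.

```python
import hashlib


def make_is(human_id: bytes, device_feature: bytes) -> str:
    """Bind a human identifier to a device feature as one fixed-length stamp.

    Assumption: the stamp is a SHA-256 digest; the source fixes no encoding.
    """
    h = hashlib.sha256()
    # Length-prefix each field so that different (human, device) pairs
    # cannot produce the same concatenated input.
    for field in (human_id, device_feature):
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.hexdigest()


owner = b"fingerprint-template-of-owner"  # placeholder for biometric data
mac = bytes.fromhex("001a2b3c4d5e")       # example MAC address of a NIC
stamp = make_is(owner, mac)
```

The same owner with a different device yields a different stamp, which matches the idea that an IS identifies a human-device pairing (an Ag) rather than a human alone.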

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions to another end system, e.g., using the client side of applications supporting HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when he is running a system in the background that performs external calls (over the internet), or that is automated for periodic communication or for automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of HTTP, FTP, etc. In addition, indirect responsibility covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, and ICMP packets may be sent to discover and identify the status of a specific host.

This framework addresses attribution over the internet only. It does not cover "locally" defined networks (PAN, LAN, MAN or WAN topologies, as defined in the IEEE standards) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.
==Assumptions==
For starters: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement; it is deemed the primary inspector and also the jurisdiction for regulating all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in the number of system users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of user must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be consulted on every single hop made by a packet through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the communication between the GDDB subsystems.

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md, and is the one who would be accused if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.
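
The one-to-many ownership constraint can be sketched as a small registry. The class <code>OwnershipRegistry</code> and the identifiers used below are hypothetical; the sketch only assumes that each machine maps to exactly one owner.

```python
class OwnershipRegistry:
    """Hypothetical registry: one owner per machine, many machines per owner."""

    def __init__(self):
        self._owner_of = {}  # machine ID -> owner ID

    def assign(self, machine, owner):
        current = self._owner_of.get(machine)
        if current is not None and current != owner:
            # A machine can only be owned by one person.
            raise ValueError(f"{machine} is already owned by {current}")
        self._owner_of[machine] = owner

    def owner(self, machine):
        return self._owner_of[machine]

    def machines(self, owner):
        return sorted(m for m, o in self._owner_of.items() if o == owner)


reg = OwnershipRegistry()
reg.assign("laptop-1", "person-A")
reg.assign("phone-1", "person-A")
```

Keying the table by machine makes the "only one owner" rule a property of the data structure, while listing a person's machines remains a simple scan.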

Finally: IP packets. Our proposed framework assumes that a header is added by the network layer to the IP packet format, and that this header includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
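
The three cases above can be expressed as a small classification routine. This is a sketch under assumptions: the names <code>ISStatus</code> and <code>classify_is</code>, and the representation of registered humans and devices as a set and a dictionary, are illustrative and do not come from the framework itself.

```python
from enum import Enum


class ISStatus(Enum):
    VALID = "valid"
    UNKNOWN_DEVICE = "imaginary device"
    UNKNOWN_HUMAN = "imaginary human"
    WRONG_BINDING = "human is not the owner of the device"


def classify_is(human, device, known_humans, owner_of):
    """Classify a claimed (human, device) pair.

    known_humans: set of registered human identifiers (hypothetical).
    owner_of: dict mapping each registered device to its owner (hypothetical).
    """
    if device not in owner_of:
        return ISStatus.UNKNOWN_DEVICE
    if human not in known_humans:
        return ISStatus.UNKNOWN_HUMAN
    if owner_of[device] != human:
        return ISStatus.WRONG_BINDING
    return ISStatus.VALID


humans = {"Y", "Z"}    # registered humans
owners = {"X": "Z"}    # machine "X" really belongs to "Z"
```

With this data, a stamp claiming that machine "X" belongs to human "Y" falls into the third case, a misleading binding.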

Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. This is done by consulting the GDDB, that is, by sending a copy of the IS printed on the packet to the GDDB. If a packet is found to have no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is again dropped. If the GDDB replies with a success, the packet's printed IS is verified; the packet then benefits from ISS and is routed onward.
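
The per-packet decision described above can be sketched as follows. The dictionary-based packet representation, the function name <code>route_decision</code> and the use of an in-memory set in place of a real GDDB query are assumptions made for illustration.

```python
def route_decision(packet, gddb):
    """Return "forward" or "drop" for a packet, following the three cases above.

    packet: dict that may carry an "is" field (hypothetical representation).
    gddb: set of licensed stamps standing in for a GDDB lookup (hypothetical).
    """
    stamp = packet.get("is")
    if stamp is None:
        return "drop"      # no IS: the packet gets no ISS
    if stamp not in gddb:
        return "drop"      # the GDDB reports the IS as invalid
    return "forward"       # IS verified: the packet is routed


licensed = {"stamp-a", "stamp-b"}
packets = [
    {"payload": "hello", "is": "stamp-a"},
    {"payload": "no stamp"},
    {"payload": "forged", "is": "stamp-x"},
]
decisions = [route_decision(p, licensed) for p in packets]
```

Only the first packet is forwarded. In a real deployment the membership test would be a network query made at every hop, which is where the latency concern raised in the assumptions comes from.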

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is only available to trusted entities.
** Hence, it retains personal privacy.
* The system enjoys full automation. According to the system's theory of operation, ISS are either provided or withheld based on the validation of the IS printed on each packet.
* The system prevents all forms of cyber crime that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* The verification of the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution information is not public, custom content generation is not achievable.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would all be bound to one single person.
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they do when they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause failure of the whole system. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets to be under the responsibility of some other Ag, which is forgery.

==Discussion==
The proposed framework's main focus is to ensure that any packet in transit is moving because it is known to whom it belongs. Recall that in the real world, if a person does not have an identity (such as a social insurance number), he cannot benefit from services. For instance, he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution is currently well above that of the Internet. We can say that current internet attribution, in comparison to real-world attribution, is a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would reach such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available. More specifically, it is only available to the agreed-upon trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it still allows the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the proposed framework's theory of operation does not permit mapping a given Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. The trusted entities are the only ones that generate the IS from personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, that a person must therefore be approved by the government to drive it, and that it must also be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and exponentially growing number of applications which are time and data sensitive, and which leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industry and technology-based applications are now built over the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes waiting for the proper moment to strike. Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable and where the roots of the difficulty in achieving such a level lie. Moreover, we have given a background on current attribution systems and a brief discussion of the reasons for their survival and of their points of failure. We have also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we have proposed a potential framework for a system that should fulfill the mentioned requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:51:20Z

Raghad: /* Deployment */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans. Other work has proposed text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not practical to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens the security of its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system lacks a reasonably robust attribution mechanism. In this context, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the true human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the temptation of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role in the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question how crucial attribution is in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is unfortunately somewhat neglected in current research.

This paper starts with a brief background discussion of the current forms of attribution. Section 3 then presents the attribution dilemma, that is, the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for strong attribution over the Internet, mainly due to the increased number of cyber attacks since its public adoption in the 1990s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and have gone scot-free because, owing to the anonymity of the Internet, they cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed within it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what each particular system seeks to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which is saved by the browser and resent with the next request. Cookie contents are often encrypted for data security and information privacy; however, they are still subject to the user's control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all.
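
The first-visit exchange described above can be sketched with Python's standard <code>http.cookies</code> module. The function <code>handle_request</code>, the cookie name <code>visitor</code> and the dictionary-based headers are assumptions made for illustration, not a description of any particular web server.

```python
import secrets
from http.cookies import SimpleCookie


def handle_request(headers):
    """Return (visitor_id, response_headers) for one request.

    headers: dict of request headers (hypothetical, simplified server).
    """
    cookie = SimpleCookie(headers.get("Cookie", ""))
    if "visitor" in cookie:
        # Returning browser: the cookie it sent identifies the visit.
        return cookie["visitor"].value, {}
    # First access: issue a fresh identifier for the browser to store.
    visitor_id = secrets.token_hex(8)
    out = SimpleCookie()
    out["visitor"] = visitor_id
    return visitor_id, {"Set-Cookie": out["visitor"].OutputString()}


# A browser with no cookie, then the same browser replaying what it stored.
first_id, first_resp = handle_request({})
second_id, second_resp = handle_request({"Cookie": first_resp["Set-Cookie"]})
```

The server recognizes the second request only because the browser chose to send the cookie back; a browser that deletes or alters it is treated as a new visitor.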

Cookies may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to set how long you want cookies to be stored.

===Cookies as an Attribution System===
Considered as the type of attribution system we are looking for over the Internet, cookies would allow high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, the use of cookies is not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g. computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), each responsible for allocating IP address blocks to the ISPs in its assigned region, which in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, the number of users going online, and the number of online devices owned by each user, has grown. One of the more common examples of this is the increase in Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2<sup>32</sup> addresses (4,294,967,296), which is less than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This had been foreseen since the 1990s, which prompted the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
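
The address-space figures can be checked with a few lines using Python's standard <code>ipaddress</code> module. The world population value below is a rough figure assumed for illustration only.

```python
import ipaddress

# Total number of addresses in each protocol's full address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses
world_population_2011 = 7_000_000_000  # rough figure, assumed for illustration

print(ipv4_total)                          # 4294967296
print(ipv4_total < world_population_2011)  # True
print(ipv6_total == 2 ** 128)              # True
```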

IP addresses can be either static or dynamic. A static IP address is an address permanently assigned to a user through configuration. A dynamic IP address is one in which a new address is assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. There are two main advantages to dynamic addressing: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the issue of limited address space by allowing many devices to "share" a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
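
The way dynamic addressing lets devices "share" an address can be sketched with a toy lease pool. This is not the DHCP protocol itself; the class <code>AddressPool</code> and its methods are hypothetical and only model the lease-and-release idea.

```python
class AddressPool:
    """Hypothetical lease pool: hands out a free address, takes it back on release."""

    def __init__(self, addresses):
        self._free = list(addresses)
        self._leases = {}  # device -> address currently leased

    def lease(self, device):
        if device in self._leases:
            return self._leases[device]
        if not self._free:
            raise RuntimeError("no free address")
        addr = self._free.pop(0)
        self._leases[device] = addr
        return addr

    def release(self, device):
        # The address returns to the pool and may go to another device.
        self._free.append(self._leases.pop(device))


pool = AddressPool(["192.0.2.10"])
a = pool.lease("device-1")
pool.release("device-1")
b = pool.lease("device-2")  # same address, different device, later in time
```

Both devices end up with the same address at different times, which is one reason an IP address alone does not identify a device without a record of who held the lease and when.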

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for a few reasons, the main one being that attackers can spoof their IP addresses. Spoofed IP addresses will even foil IP traceback efforts.

==Authentication Systems==
For a website to make sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is subject to attackers hacking into it to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”: he enters his personal login information, the web server checks it against what it has stored in its database, and it either grants or denies access to the user's personal page.
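The register-then-verify flow described above can be sketched as follows. This is a minimal sketch under our own assumptions: the text does not say how the server stores login information, so the salted PBKDF2 hashing, the in-memory <code>accounts</code> dictionary and the function names are illustrative choices, not a description of any particular bank's system.

```python
import hashlib
import hmac
import os

accounts = {}  # login name -> (salt, password hash); stands in for the server database


def hash_password(password, salt=None):
    # Derive a salted hash so the server never stores the password itself.
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


def register(user, password):
    # Account set-up: the user chooses a password for the account.
    accounts[user] = hash_password(password)


def login(user, password):
    # Verification: compare the submitted credentials with the stored record.
    if user not in accounts:
        return False
    salt, stored = accounts[user]
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)


register("alice", "correct horse")
print(login("alice", "correct horse"))  # access granted
print(login("alice", "wrong guess"))    # access denied
```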

Authentication systems are typically used only when users want some privacy on the web server, or when they wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, both the clients and the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a fairly abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but to let the ratio change dynamically depending on the case.

Assuming such a ratio is found, another issue arises. Can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be put to use.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should one deal with misleading data sources that hide behind botnets and conceal their identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing a client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for better-targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or merely deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address printed on each moving packet, determining the geographical location of this address, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, identifying a source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes linking an address to the right person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of making attacks anonymous: the attacker uses multiple random public agents (as stepping stones) to reach the victim in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer (or at least to try to answer) before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated those requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is usually stated for the sake of completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the one responsible; rather, it is itself a victim.

It is easy to imagine a system that permits only crime-free behaviour, and many writers and movie directors exploit this idea in futuristic, science-fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some of them are imperfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is easier to discuss the design of the system than to implement it. The deployment of the system does not need to be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: the underlying network should remain functional even if the attribution system goes down. In other words, the attribution system should be loosely coupled to the system it works in.

As mentioned before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment should also be more secure (bugs in software and mistakes in design can be fixed while the system is still at a small scale), so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Luckily, that level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than its average losses under the current lack of attribution (e.g. from DoS, identity theft, etc.).

==Practice==

The attribution mapping should not be a bijection: in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did (or does) person X do?". Not only should the system not know the answer, it should not be possible to find the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to anybody who uses the network. When a crime is committed or, more generally, when the agent behind some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (a government, a corporation, the police, some system run as a public good, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.

Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and does not allow either side to misuse the system.
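The separation described in this section (acts map to opaque IDs, and only a trusted body can map IDs to names) can be sketched as follows. This is a minimal sketch under stated assumptions: the class names <code>TrustedEntity</code> and <code>TraceLog</code>, the <code>evidence_ok</code> flag and the random ID format are hypothetical. Note that simply omitting a reverse lookup, as done here, does not make reverse queries impossible; a real design would need stronger, probably cryptographic, guarantees.

```python
import secrets


class TrustedEntity:
    """Holds the private mapping from opaque IDs to real names."""

    def __init__(self):
        self._names = {}  # opaque ID -> real name, never published

    def enroll(self, name):
        uid = secrets.token_hex(8)  # random ID that reveals nothing about the name
        self._names[uid] = name
        return uid

    def reveal(self, uid, evidence_ok):
        # The name is disclosed only when there is sufficient evidence.
        if not evidence_ok:
            raise PermissionError("insufficient evidence to reveal identity")
        return self._names[uid]


class TraceLog:
    """Traceability record: answers 'who did act X?' with an opaque ID only."""

    def __init__(self):
        self._by_act = {}  # act ID -> opaque user ID

    def record(self, act_id, uid):
        self._by_act[act_id] = uid

    def who_did(self, act_id):
        return self._by_act[act_id]

    # Deliberately no method answering "what did ID X do?".


entity = TrustedEntity()
log = TraceLog()
uid = entity.enroll("Alice Example")
log.record("act-42", uid)

suspect_id = log.who_did("act-42")                    # step 1: act -> opaque ID
print(entity.reveal(suspect_id, evidence_ok=True))    # step 2: ID -> name, with evidence
```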

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has network access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global DNS-like world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
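The relationship between the IS, the GDDB and licensing defined above can be sketched as follows. This is a minimal sketch under our own assumptions: the text does not specify how the bits of an IS are derived, so hashing the two identifiers with SHA-256 is purely illustrative, and the names <code>make_is</code>, <code>GDDB.license</code> and <code>GDDB.is_licensed</code> are hypothetical. A single in-memory set stands in for the distributed, encrypted store.

```python
import hashlib


def make_is(human_id: bytes, device_id: bytes) -> str:
    """Bind a human identifier to a device identifier in one stamp (illustrative encoding)."""
    h = hashlib.sha256()
    # Length-prefix each field so two different pairs cannot produce the same input.
    for field in (human_id, device_id):
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.hexdigest()


class GDDB:
    """Stand-in for the Globally Distributed Database of Identification Stamps."""

    def __init__(self):
        self._stamps = set()

    def license(self, stamp):
        # Licensing: adding a new IS to the GDDB (a write by the trusted entity).
        self._stamps.add(stamp)

    def is_licensed(self, stamp):
        # Read-only lookup, as a router would perform.
        return stamp in self._stamps


gddb = GDDB()
stamp = make_is(b"fingerprint-template-of-owner", b"00:1a:2b:3c:4d:5e")
gddb.license(stamp)
print(gddb.is_licensed(stamp))
```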

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. He is indirectly responsible when he is running a system in the background that performs external (over the internet) calls or is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP, FTP, etc. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection initiation packets for handshaking, ICMP packets may be sent to determine the status of a specific host, etc.

The scope of this framework covers only attribution over the internet. It does not cover "locally" defined networks that fall under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies and do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its methodology, and a list of the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.
==Assumptions==
First: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., a government). This entity should act as the Internet's law enforcement, serving as the primary inspector and also as the jurisdiction for dealing with all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from being a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in the number of system users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric-key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access this database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both kinds of user must be strictly authenticated before being able to decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency for read operations, as it will be consulted at every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the way the GDDB subsystems communicate with one another.

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that, within the format of IP packets, a header is added by the network layer that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
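The three ways in which an IS can be fake can be expressed as a simple check. This is a minimal sketch: the function <code>classify_stamp</code>, the registries and the sample identifiers are hypothetical. The dictionary <code>owner_of</code> maps each device to exactly one owner, which reflects the one-to-many ownership assumption (one person may appear as the owner of several devices).

```python
def classify_stamp(human, device, known_humans, known_devices, owner_of):
    """Return 'valid' or the reason the (human, device) binding is fake."""
    if device not in known_devices:
        return "fake: imaginary device"
    if human not in known_humans:
        return "fake: imaginary human"
    if owner_of.get(device) != human:
        return "fake: misleading binding"
    return "valid"


known_humans = {"Y", "Z"}
known_devices = {"X", "W"}
owner_of = {"X": "Z", "W": "Z"}  # Z owns both machines; each machine has one owner

print(classify_stamp("Z", "X", known_humans, known_devices, owner_of))
print(classify_stamp("Y", "X", known_humans, known_devices, owner_of))
```

In the second call, "Y" is a real person and "X" is a real machine, but "Y" is not the owner of "X", so the binding is rejected.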

Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. This is done by consulting the GDDB, that is, by sending it a copy of the IS printed on the packet. If a packet is found to have no IS, it is denied ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the packet's printed IS is verified; the packet therefore benefits from ISS and is routed on its way.
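The per-packet decision a router makes can be sketched as follows. This is a minimal sketch, not a router implementation: the <code>Packet</code> class, the <code>route</code> function and the plain set used in place of a GDDB query are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    stamp: Optional[str] = None  # the IS carried in the assumed extra header


def route(packet, licensed_stamps):
    """Decide whether a packet receives Intermediate System Services."""
    if packet.stamp is None:
        return "drop"      # unattributed packet: no IS at all
    if packet.stamp not in licensed_stamps:
        return "drop"      # the lookup reports an invalid IS
    return "forward"       # IS verified: the packet is routed


licensed = {"is-of-licensed-agent"}  # stands in for the GDDB contents
print(route(Packet(b"hello", "is-of-licensed-agent"), licensed))
print(route(Packet(b"hello", "forged-is"), licensed))
print(route(Packet(b"hello"), licensed))
```

Since this check runs at every hop, its cost is paid once per router on the path, which is the source of the delay listed among the disadvantages of the framework.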

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to that achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is available only to trusted entities.
** Hence, it preserves personal privacy.
* The system is fully automated. According to the system's theory of operation, ISS are provided or withheld based on the validation of the IS printed on each packet.
* The system prevents all forms of cyber crime that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they do when they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause failure of the whole system. If the attacker manages to alter the database, he can append an imaginary IS. If he manages to read it, he can choose to declare his malicious packets under the responsibility of some other Ag, i.e., commit forgery.

==Discussion==
The proposed framework's main focus is to ensure that any packet in transit is moving only because it is known to whom it belongs. Recall that in the real world, a person without an identity (like a social insurance number) cannot benefit from services. For instance, he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution definitely beats that of the Internet at present. Compared with real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be basically acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would achieve such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available. More specifically, it is only available to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is disabled. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot easily be determined. As for the practice requirements, the proposed framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. The trusted entities are the only ones that generate the IS from personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first steam-powered road vehicle, the Cugnot steam trolley, <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref> a forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, which must therefore be approved by the government to be driven by a given person, and must also be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and exponentially growing number of applications which are time- and data-sensitive, and which leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industry and technology-based applications are now built over the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes waiting for the proper moment to strike. In short, Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems and briefly discussed the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:37:07Z

Raghad: /* Cookies */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it has revolved around the notion of using machine learning to link articles to humans. Other work has proposed text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it tempts advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor apply in most situations; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent can be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, the lack of interoperability between intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets reaching their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attack are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the human source behind an act.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research in attribution, discuss its common limitations and flaws, and consider what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers users and grants them ''licensed'' access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a broader level of attribution, that of all possible actions to agents, which has regrettably received little attention in current research.

This paper starts with a brief background discussion of the current forms of attribution. Section 3 then presents the attribution dilemma, that is, the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, section 6 presents a conclusion.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution"; other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since the Internet came into wide use in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free: because of the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed in it.

=Background=

The problem of attribution is not new; it has been around for decades, but mostly as a way to address identification issues for websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent needed for what each particular system aims to achieve.
This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are small pieces of text data that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In practice, they can be used to store any type of information that can be stored as text. When a page is requested, the user's web browser sends the request with the web server's cookie in the HTTP request header. All this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookie values may be encrypted for data security and information privacy; however, they remain under the user's control, as they can be inspected, modified, and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all.

A cookie may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let the user configure how long cookies are stored.
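The exchange described above can be sketched in a few lines of Python. This is a minimal illustration using the standard <code>http.cookies</code> module, not the behaviour of any particular web server; the cookie name <code>visitor_id</code>, the function <code>handle_request</code>, and the one-hour lifetime are assumptions made for the example.

```python
from http.cookies import SimpleCookie
import uuid


def handle_request(request_headers):
    """Return (visitor_id, response_headers) for one HTTP request.

    `request_headers` is a plain dict standing in for real HTTP headers.
    """
    jar = SimpleCookie(request_headers.get("Cookie", ""))
    if "visitor_id" in jar:
        # Returning browser: the cookie it sent back identifies it.
        return jar["visitor_id"].value, {}

    # No cookie attached: treat this as the first visit and issue one.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out["visitor_id"] = visitor_id
    out["visitor_id"]["max-age"] = 3600  # leave unset for a session cookie
    return visitor_id, {"Set-Cookie": out["visitor_id"].OutputString()}


# First visit: the server issues a cookie.
first_id, first_response = handle_request({})
# Next visit: the browser sends the cookie back and is recognised.
second_id, second_response = handle_request({"Cookie": "visitor_id=" + first_id})
# The user deletes the cookie: the server sees a brand-new visitor.
third_id, third_response = handle_request({})
```

The third call shows the weakness discussed next: once the user deletes the cookie, the server has no way to link the new visit to the earlier ones.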

===Cookies as an Attribution System===
If we consider cookies as the type of attribution system we are looking for over the Internet, they can achieve high precision in identifying the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners) on a network. A user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which allocate IP address blocks to the ISPs in their regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user brings online have grown. One of the more common examples of this is the rise of Internet-ready mobile phones. The addressing system of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely identify only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This had been foreseen since the 1990s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
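The size of the two address spaces can be checked directly. The short Python sketch below uses the standard <code>ipaddress</code> module; the variable names are ours and serve only to illustrate the arithmetic.

```python
import ipaddress

# The network 0.0.0.0/0 spans every IPv4 address; ::/0 spans every IPv6 address.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                          # 4294967296, i.e. 2**32
print(ipv6_total == 2**128)                # True
print(ipv6_total // ipv4_total == 2**96)   # True: 2**96 IPv6 addresses per IPv4 address
```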

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through configuration. A dynamic IP address is assigned temporarily, typically each time the device boots up or connects. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to "share" a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
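To illustrate why a dynamic address alone does not identify a device, the following Python sketch models an address pool in which one address is handed to different devices at different times. It is a toy model, not an implementation of DHCP; the class <code>LeasePool</code>, its methods, the device names, and the address (taken from a range reserved for documentation) are all hypothetical.

```python
class LeasePool:
    """Toy model of a pool of dynamic addresses (not a DHCP implementation)."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.active = {}    # device -> (address, start_time)
        self.history = []   # (address, device, start_time, end_time)

    def connect(self, device, now):
        address = self.free.pop(0)
        self.active[device] = (address, now)
        return address

    def disconnect(self, device, now):
        address, start = self.active.pop(device)
        self.history.append((address, device, start, now))
        self.free.insert(0, address)  # the address can be reused right away

    def who_had(self, address, when):
        """Return the device that held `address` at time `when`, if any."""
        for addr, device, start, end in self.history:
            if addr == address and start <= when < end:
                return device
        for device, (addr, start) in self.active.items():
            if addr == address and start <= when:
                return device
        return None


pool = LeasePool(["203.0.113.7"])
first = pool.connect("laptop-A", now=0)
pool.disconnect("laptop-A", now=10)
second = pool.connect("phone-B", now=12)
```

Both devices receive the same address, so the address identifies a device only together with a timestamp and the operator's lease records, as <code>who_had</code> shows.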

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses can even foil IP traceback efforts.

==Authentication Systems==
For a website to be sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. Its biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which exposes it to attackers who break into the server to steal login information.

Login systems are attached to user accounts, which sometimes require private information in order to be set up. If the web server's security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to "identify himself": the user enters his personal login information, and the web server checks it against what it has stored in its database and either grants or denies access to the user's personal page.
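The verification step can be sketched as follows. This is a minimal Python illustration, assuming the server stores a salted PBKDF2 hash rather than the password itself; the account store <code>accounts</code>, the function names, and the iteration count are choices made for the example, not a description of any specific bank's system.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def make_record(password):
    """Create the (salt, digest) pair the server stores instead of the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password, record):
    """Recompute the digest from the submitted password and compare."""
    salt, expected = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)  # constant-time comparison


# Account setup: the user chooses a password.
accounts = {"alice": make_record("correct horse")}


def login(username, password):
    """Grant access only if the account exists and the password matches."""
    record = accounts.get(username)
    return record is not None and verify(password, record)
```

For example, <code>login("alice", "correct horse")</code> is granted, while a wrong password or an unknown user name is denied. Storing only salted hashes limits, but does not remove, the damage of the server breaches mentioned above.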

Authentication systems are generally used only when users want some privacy on the web server, or when they wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or merit) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in an abstract sense, privacy is inversely related to attribution. When designing an attribution system, one needs not only to decide on this ratio for a particular case, but also to allow the ratio to change dynamically depending on the case.

Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be put into use.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification, and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should we deal with misleading data sources that hide behind botnets and conceal identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing a client's identity in online banking and for identifying the parties involved in an e-commerce transaction, and marketers can take advantage of it for more targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address printed on each packet, determining the geographical location of that address, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, there are problems with trying to identify a source by its IP address. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes the link between an IP address and the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of making an attack anonymous: the attacker relays it through multiple random public agents (the stepping stones) to reach the victim, concealing the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.

We have separated these requirements into three groups. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society. Practice requirements define the way the system works, behaves, and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the immediate source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.

It is easy to imagine a system that leaves no room for crime and misuse, and many writers and film directors exploit this idea in futuristic, science-fiction, and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some of them are not perfect, but they are widely accepted and most have reasons to exist. The attribution system that we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to design a system in the abstract; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will be under a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the network it works in.

As mentioned above (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. An incremental way of embedding an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still at a small scale), so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to different sets of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases, while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns where to draw the line between dynamic adaptability and universality. Fortunately, that level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g., from DoS or identity theft).

==Practice==

Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to derive the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between attribution and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a good idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to every body that uses the network; when a crime is committed or, more generally, when the agent of some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (a government, a corporation, the police, some public-good-like system, etc.). This mapping should be revealed only when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of the information in one place.

Of course, a body trusted by everyone does not always exist, but generally there are governments or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and does not allow either side to misuse the system.
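The separation described in this section can be sketched in Python. The sketch is only an illustration of the interface we have in mind: the class names <code>TraceLog</code> and <code>TrustedRegistry</code>, the method names, the sample IDs, and the boolean <code>warrant_approved</code> flag are all hypothetical.

```python
class TraceLog:
    """Public side: maps an act to a pseudonymous ID, never to a name."""

    def __init__(self):
        self._by_act = {}

    def record(self, act_id, agent_id):
        self._by_act[act_id] = agent_id

    def who_did(self, act_id):
        """Answer 'who did act X?' with a pseudonymous ID (or None)."""
        return self._by_act.get(act_id)

    # Deliberately no method answering 'what did agent X do?'.


class TrustedRegistry:
    """Trusted side: maps pseudonymous IDs to real names."""

    def __init__(self):
        self._names = {}

    def register(self, agent_id, name):
        self._names[agent_id] = name

    def reveal(self, agent_id, warrant_approved):
        """Reveal a name only when there is enough evidence to do so."""
        if not warrant_approved:
            raise PermissionError("identity is revealed only with sufficient evidence")
        return self._names[agent_id]


log = TraceLog()
registry = TrustedRegistry()
registry.register("id-7f3a", "Example Person")
log.record("act-001", "id-7f3a")

suspect_id = log.who_did("act-001")
name = registry.reveal(suspect_id, warrant_approved=True)
```

Hiding the reverse query behind an interface is not enough on its own: anyone who can read the whole log can still rebuild it. That is why the requirement above also asks for the traceability information to be distributed so that it can never be collected in one place.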

=Proposed Framework=
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within this section:
* Agent (Ag): the human-device pairing that sits on an end system and transmits and receives packets.
* Machines/Devices (Md): any piece of hardware that has network access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a homemade chip that can communicate externally, by wire or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, worldwide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds new ISs to the GDDB.
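As a rough illustration of the Identification Stamp defined above, the following Python sketch binds a human identifier to a device feature by hashing the two together. The framework does not fix a construction, so the use of SHA-256, the length-prefix encoding, the function name, and the sample identifiers are all assumptions made for the example.

```python
import hashlib


def make_identification_stamp(human_id: bytes, device_id: bytes) -> bytes:
    """Derive a fixed-size stamp binding one human to one device.

    Assumption: SHA-256 over length-prefixed inputs. Each part is prefixed
    with its length so that (b"ab", b"c") and (b"a", b"bc") cannot collide.
    """
    material = b"".join(
        len(part).to_bytes(2, "big") + part for part in (human_id, device_id)
    )
    return hashlib.sha256(material).digest()


# Hypothetical inputs: an encoded biometric template and a NIC's MAC address.
owner = b"fingerprint-template-of-owner"
mac = bytes.fromhex("001a2b3c4d5e")

stamp = make_identification_stamp(owner, mac)
same_stamp = make_identification_stamp(owner, mac)
other_stamp = make_identification_stamp(b"someone-else", mac)
```

Under this assumption the stamp alone does not reveal the owner; the licensing authority would have to keep the record linking each stamp to the human and device it was issued for.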

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications using protocols such as HTTP, FTP, SIP, or RTP (as used in VoIP). He is indirectly responsible when he is running a system in the background that makes external calls over the internet, or that is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, and ICMP packets may be sent to probe the status of a specific host.

The scope of this framework covers only attribution over the internet, not "locally" defined networks (PANs, LANs, MANs, or WANs, as defined in the IEEE standards) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its methodology, and a list of its pros, cons, and vulnerabilities, and wrap up with a discussion of the proposed framework.
==Assumptions==
For starters: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement: the primary inspector and the jurisdiction for regulating all kinds of cyber crime and misbehavior. It may be either centralized or distributed. A centralized entity would be easier to deploy, but would be a single point of failure. A distributed entity would likely perform better, as it could scale with the growth in users and conform to diverse regional laws, regulations, customs, and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, acting as a "database" for storing ISs. Symmetric-key encryption should be used to protect it, as it will be accessed by only two types of users: routers, which should have read-only access, and the trusted entity (defined in the previous assumption), which should have read/write access. Both must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero read latency, as it will be consulted at every hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the communication between GDDB subsystems.

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can be owned by only one person.

Finally: IP packets. Our proposed framework assumes that the network layer adds a header to each IP packet that carries the IS of the Ag owning the packet.
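The framework does not specify the layout of this header. Purely as an illustration, the Python sketch below assumes a fixed 32-byte IS placed in front of the payload; the constant <code>IS_LENGTH</code> and both function names are hypothetical.

```python
import struct

IS_LENGTH = 32  # assumed size of an Identification Stamp, in bytes
HEADER_FORMAT = f"!{IS_LENGTH}s"  # network byte order, one fixed-size field


def add_is_header(stamp: bytes, payload: bytes) -> bytes:
    """Prepend the sender's IS to a payload (hypothetical layout)."""
    if len(stamp) != IS_LENGTH:
        raise ValueError("IS must be exactly IS_LENGTH bytes")
    return struct.pack(HEADER_FORMAT, stamp) + payload


def split_is_header(packet: bytes):
    """Return (stamp, payload); stamp is None if the packet is too short."""
    if len(packet) < IS_LENGTH:
        return None, packet
    (stamp,) = struct.unpack(HEADER_FORMAT, packet[:IS_LENGTH])
    return stamp, packet[IS_LENGTH:]


example_stamp = bytes(range(IS_LENGTH))
packet = add_is_header(example_stamp, b"GET / HTTP/1.1\r\n\r\n")
recovered_stamp, recovered_payload = split_is_header(packet)
```

A fixed-size field keeps parsing at the router trivial, at the cost of 32 extra bytes on every packet under this assumption.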

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
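The three cases above can be expressed as a small check. In this Python sketch the registry is modelled as two sets and a dictionary; these structures, the function name, and the sample identifiers are hypothetical stand-ins for whatever records the trusted entity actually keeps.

```python
def classify_stamp(human_id, device_id, known_humans, known_devices, owner_of):
    """Classify a claimed (human, device) binding against registry records."""
    if device_id not in known_devices:
        return "fake: imaginary device"
    if human_id not in known_humans:
        return "fake: imaginary human"
    if owner_of.get(device_id) != human_id:
        return "fake: misleading binding"
    return "valid"


known_humans = {"human-Y", "human-Z"}
known_devices = {"device-X"}
owner_of = {"device-X": "human-Z"}  # device X really belongs to Z

print(classify_stamp("human-Z", "device-X", known_humans, known_devices, owner_of))
print(classify_stamp("human-Y", "device-X", known_humans, known_devices, owner_of))
```

The second call corresponds to the third case in the list: both the human and the machine exist, but "Y" is not the owner of "X".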

Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a homemade device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS has not been inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. It does so by sending a copy of the IS printed on the packet to the GDDB. If a packet has no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the packet's printed IS is verified. Thus, the packet benefits from the ISS and is routed along its way.
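
The router's decision described above can be sketched as follows. This is a minimal sketch under stated assumptions: the <code>Packet</code> type, the in-memory <code>StubGDDB</code>, and the function <code>handle_packet</code> are hypothetical names introduced for illustration, and a real GDDB would be a remote, distributed service rather than a local set.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Packet:
    """Hypothetical packet: a payload plus the IS printed in its header, if any."""
    payload: bytes
    identity_stamp: Optional[str] = None


class StubGDDB:
    """In-memory stand-in for the GDDB; it only answers valid / invalid."""

    def __init__(self, licensed_stamps: Set[str]):
        self._licensed = set(licensed_stamps)

    def is_valid(self, stamp: str) -> bool:
        return stamp in self._licensed


def handle_packet(packet: Packet, gddb: StubGDDB) -> str:
    """Return "route" for a verified packet and "drop" otherwise."""
    if packet.identity_stamp is None:
        return "drop"  # unattributed packet: no ISS
    if not gddb.is_valid(packet.identity_stamp):
        return "drop"  # GDDB reports the IS as invalid
    return "route"     # IS verified: packet benefits from ISS
```

For example, with <code>StubGDDB({"IS-1234"})</code>, a packet carrying <code>"IS-1234"</code> is routed, while a packet with no IS or with an unregistered IS is dropped.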

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to that achieved in the real world.
* It avoids anonymous attacks since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is available only to trusted entities.
** Hence, it retains personal privacy.
* The system enjoys full automation. According to the system's theory of operation, ISS are either provided or not based on the validation of the IS printed on each packet.
* The system avoids all forms of cyber crimes that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation is not achievable.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewed; however, this is not an easy matter.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, just as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause the whole system to fail. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets under the responsibility of some other Ag, i.e., commit forgery.

==Discussion==
The proposed framework's main focus is to ensure that any packet in transit is moving only because it is known to whom it belongs. Recall that in the real world, a person without an identity (like a social insurance number) cannot benefit from services. For instance, he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently beats that of the Internet. We can say that current internet attribution, in comparison to real-world attribution, is a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it will not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it still allows the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country of the person who committed the crime. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes. That is because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the proposed framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, the traceability information cannot all be collected in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, so a person must be approved by the government to drive one, and each car must be formally linked to an owner who is considered primarily responsible for it.

Meanwhile, the Internet is passing through the same phase. People may blindly deny, refuse, and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and rapidly growing number of applications that are time and data sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technological applications are now built on the Internet as their underlying infrastructure and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes waiting for the proper moment to strike. Hence, Internet attribution is no longer an add-on but an obligation.

In this paper, we have presented some formal definitions of attribution, why attribution is crucial, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of why they survive as well as where they fail. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons, and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:35:35Z

Raghad: /* Background */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around the notion of using machine learning to link articles to humans. Others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not practical to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that holds a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems like IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research in attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role in the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is sadly somewhat neglected in current research.

This paper starts with a quick background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution, namely the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are concerned only with attribution in large, dynamic networks like the Internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 1990s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and have gotten off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed in this paper.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues pertaining to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the request header. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookie contents may be encrypted for data security and information privacy; however, they remain under the user's control, as they can be modified and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all.
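
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is only an illustrative sketch: the cookie name <code>visitor_id</code> and the function <code>handle_request</code> are assumptions made for the example, and requests and responses are modeled as plain header dictionaries rather than real HTTP traffic.

```python
import uuid
from http.cookies import SimpleCookie


def handle_request(request_headers: dict) -> dict:
    """Return the response headers a server might send for this request."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "visitor_id" in cookie:
        # Returning browser: the cookie was resent, nothing new to set.
        return {}
    # First access: issue a fresh identifier for the browser to store.
    new_id = uuid.uuid4().hex
    return {"Set-Cookie": f"visitor_id={new_id}"}


# First visit carries no cookie, so the server issues one.
first_response = handle_request({})
# The browser resends the stored value on its next request.
second_response = handle_request({"Cookie": first_response["Set-Cookie"]})
```

Because the browser alone decides whether to resend, alter, or discard the value, the server cannot rely on it to identify a user who does not wish to be identified.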

A cookie may carry an expiration date, which is the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set how long cookies are stored.

===Cookies as an Attribution System===
Viewed as the type of attribution system we are looking for over the Internet, cookies would let us identify, with high precision, the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, the use of cookies is not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of online devices owned by each user have grown. One of the more common examples of this is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This had been foreseen since the 1990s and spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
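
The address-space arithmetic above can be checked with a short calculation. This is a small sketch using Python's standard <code>ipaddress</code> module; the population figure is a rough approximation for 2011 introduced only for the comparison.

```python
import ipaddress

# Size of each address space, derived from the address width in bits.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

# The standard library reports the same sizes for the all-encompassing networks.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_space
assert ipaddress.ip_network("::/0").num_addresses == ipv6_space

# Rough world population around 2011 (an approximation, not from the paper).
APPROX_WORLD_POPULATION = 7_000_000_000

print(ipv4_space)                            # 4294967296
print(ipv4_space < APPROX_WORLD_POPULATION)  # True
print(ipv6_space // ipv4_space == 2 ** 96)   # True
```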

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one that is newly assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost involved in assigning static IP addresses, and it eases the issue of limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
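
The "sharing" of one address over time can be illustrated with a toy address pool. This is a simplified sketch, not an implementation of DHCP: the class <code>AddressPool</code>, its methods, and the device names are hypothetical, and the address comes from the range reserved for documentation.

```python
class AddressPool:
    """Toy pool that leases addresses to devices and takes them back."""

    def __init__(self, addresses):
        self._free = list(addresses)
        self._leases = {}  # device name -> currently leased address

    def lease(self, device: str) -> str:
        """Give the device an address, reusing its current lease if any."""
        if device in self._leases:
            return self._leases[device]
        if not self._free:
            raise RuntimeError("no free addresses in the pool")
        address = self._free.pop(0)
        self._leases[device] = address
        return address

    def release(self, device: str) -> None:
        """Return the device's address to the pool when it goes offline."""
        self._free.append(self._leases.pop(device))


pool = AddressPool(["192.0.2.10"])
morning = pool.lease("laptop")
pool.release("laptop")
evening = pool.lease("phone")
print(morning == evening)  # True: two devices, one address, different times
```

The same property that saves address space also weakens attribution: the address alone does not say which device held it unless the lease history is recorded.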

===IP Addresses as an Attribution System===
Although Internet addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofing IP addresses will even foil IP traceback efforts.

==Authentication Systems==
For a website to verify the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. Its biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking into the server to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user next visits the website, he is asked to “identify himself”. The user enters his personal login information, and the web server checks it against what it has stored in its database and either grants or denies access to the user's personal page.

Authentication systems are typically used only when users want some privacy on the web server or wish to store some form of information there.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and, as such, are used by many companies. However, authentication would have a serious privacy drawback if it were used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would destroy it completely.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a rather abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but rather to let the ratio change dynamically depending on the case.

Assuming such a ratio is found, another issue arises. Can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be put to use.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification, and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should one deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. The identification property can be useful for establishing a client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for more targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal and undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used.
There are many existing methods that attempt to identify the source of an act, like IP traceback. However, identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets, or other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and the distribution of malware. The stepping stone attack is a common way of anonymizing attacks by using multiple random public agents (as stepping stones) to reach the victim in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.

We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society. Practice requirements define the way the system works, behaves, and interacts with other bodies.

==General==

The first and most obvious requirement is usually stated for the sake of completeness: an internet attribution system needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person and/or an organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow down the search to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which the absence of crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science fiction, and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Embedding an attribution system incrementally should also be more secure (bugs in software and mistakes in design can be fixed while the system is still small in scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.

A very important, but controversial, subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. That level of detail goes beyond the scope of this paper.

Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.) or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. from DoS, identity theft, etc.).

==Practice==

The attribution mapping should be one-directional rather than a bijection: an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did (or does) person X do?". Not only should the system not know the answer, it should not be possible to derive the answer from it; the question that should be answered is "who did act X?". This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put the names of persons into a traceability database. It makes much more sense to record a unique ID for anybody who uses the network. When a crime is committed or, more generally, whenever the agent behind some act has to be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (government, corporation, police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.
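
The separation described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not part of the proposed design: the names <code>TrustedRegistry</code>, <code>record_act</code>, <code>who_did</code> and the <code>has_legal_order</code> flag are hypothetical, and a single in-memory dictionary stands in for what would have to be a distributed store. The sketch shows only that traceability records hold opaque IDs while the ID-to-name mapping stays with the trusted entity; it does not enforce distribution of the records.

```python
import secrets


class TrustedRegistry:
    """Hypothetical trusted entity holding the opaque ID -> real name mapping."""

    def __init__(self):
        self._names = {}  # opaque ID -> real name, never published

    def enroll(self, real_name: str) -> str:
        """Issue a random opaque ID that reveals nothing about the person."""
        opaque_id = secrets.token_hex(16)
        self._names[opaque_id] = real_name
        return opaque_id

    def resolve(self, opaque_id: str, has_legal_order: bool) -> str:
        """Reveal the real name only when disclosure has been authorized."""
        if not has_legal_order:
            raise PermissionError("mapping is revealed only with sufficient evidence")
        return self._names[opaque_id]


# Traceability records are keyed by act and contain opaque IDs only.
act_log = {}


def record_act(act_id: str, opaque_id: str) -> None:
    act_log[act_id] = opaque_id


def who_did(act_id: str, registry: TrustedRegistry, has_legal_order: bool) -> str:
    """Answer 'who did act X?' by combining the log with the trusted registry."""
    return registry.resolve(act_log[act_id], has_legal_order)
```

Without the trusted entity's cooperation, the log alone yields only an opaque ID, which is the property the paragraph above asks for.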

Of course, a body trusted by everyone does not always exist, but in general there are governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that lets them cooperate in time of need and prevents either side from misusing the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global DNS-like world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
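
As a rough illustration of the IS and Licensing definitions above, the following Python sketch derives a stamp from a human identifier and a MAC address and adds it to a toy stand-in for the GDDB. Everything here is an assumption made for illustration: the paper does not specify how an IS is computed, the SHA-256 construction and the names <code>make_identification_stamp</code> and <code>ToyGddb</code> are hypothetical, and the biometric input is assumed to be a stable byte string (real biometric readings are noisy and would need additional processing).

```python
import hashlib


def make_identification_stamp(biometric_template: bytes, mac_address: str) -> str:
    """Bind a human identifier to a device identifier in a single digest.

    Assumption: biometric_template is a stable byte string for one person.
    """
    human_part = hashlib.sha256(biometric_template).digest()
    # Accept "aa:bb:cc:dd:ee:ff" or "AA-BB-CC-DD-EE-FF" and normalize to bytes.
    device_part = bytes.fromhex(mac_address.replace(":", "").replace("-", ""))
    return hashlib.sha256(human_part + device_part).hexdigest()


class ToyGddb:
    """In-memory stand-in for the globally distributed database of ISs."""

    def __init__(self):
        self._stamps = set()

    def license(self, stamp: str) -> None:
        """Licensing: add a new IS so that its packets may receive ISS."""
        self._stamps.add(stamp)

    def is_licensed(self, stamp: str) -> bool:
        return stamp in self._stamps
```

Because the stamp depends on both inputs, the same person with two devices obtains two different stamps, which matches the idea that an IS identifies an Ag (a human-device pairing) rather than a human alone.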

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when he is running a system in the background that performs external (over the internet) system calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, and ICMP packets may be sent to determine the status of a specific host.

This framework only addresses attribution over the internet. It does not address "locally" defined networks that fall under the IEEE standard definitions of PAN, LAN, MAN or WAN and that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of its pros, cons and vulnerabilities, and close with a discussion of the proposed framework.
==Assumptions==
First: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement: it is the primary inspector and also the jurisdiction for regulating all kinds of cyber crimes and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would be expected to perform better, as it can scale with the growth in the number of users and conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric-key encryption should be used to protect that system, as it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency for read operations, as it will be consulted at every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the communication between the GDDB subsystems.

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md, and is the one who would be held accountable if his Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that, within the format of IP packets, the network layer adds a header that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with his IS so that it can be added to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS has not been inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. It does so by consulting the GDDB, that is, by sending a copy of the IS carried by the packet to the GDDB. If the packet carries no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the IS carried by the packet is verified; the packet then benefits from ISS and is routed onward.
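
The per-packet decision described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions rather than a router implementation: the <code>Packet</code> type, the <code>handle_packet</code> function and the use of a plain set of licensed stamps in place of a GDDB query are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Packet:
    payload: bytes
    stamp: Optional[str] = None  # the IS carried in the packet header, if any


def handle_packet(packet: Packet, licensed_stamps: Set[str]) -> str:
    """Return "forward" or "drop" following the three cases in the text."""
    if packet.stamp is None:
        return "drop"  # unattributed packet: no IS present
    if packet.stamp not in licensed_stamps:
        return "drop"  # the (stand-in) GDDB does not know this IS
    return "forward"  # verified IS: the packet receives ISS
```

Note that a membership check of this kind only rejects missing or unknown stamps. It cannot by itself detect a valid IS that has been copied by someone other than its owner, which corresponds to the forgery risk listed among the vulnerabilities.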

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves a level of attribution that is acceptable relative to the one achieved in the real world.
* It avoids anonymous attacks since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available to everyone, only available to trustful entities.
** Hence, it retains personal privacy.
* The system enjoys full automation. According to the system's theory of operation, ISS are either provided or not based on the validation of the IS printed on each packet.
* The system avoids all forms of cyber crimes that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not publicly available, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets entering their machines in order to avoid the distribution of malware and the subsequent formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause the whole system to fail. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets to be the responsibility of some other Ag, i.e., commit forgery.

==Discussion==
The proposed framework's main focus is to ensure that a packet moves only if it is known to whom it belongs. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services. For instance, he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Compared with real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as exists in the real world. We argue that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it does not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-on trusted entity. The framework also fulfills the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it still allows the Internet to operate if the framework is disabled. It is also adaptable to different rules and regulations, since it leaves the decision on punishment to the jurisdiction of the country from which the offender operates. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies the non-bijection requirement. Also, because of the distributed nature of the GDDB, the traceability information cannot be collected in one place. Only the trusted entities generate the IS from personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the ancestor of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, and that a person must therefore be approved by the government to drive one, and that each car must be formally linked to an owner who is considered primarily responsible for it.

Today, the Internet is passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like a driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast, and exponentially growing, number of applications that are time- and data-sensitive and leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industry and technology applications are now built on the Internet as their underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes waiting for the proper moment to strike. In short, Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, explained why attribution is crucial, discussed what level of attribution would be considered acceptable, and identified where the difficulty in achieving such a level lies. Moreover, we have provided background on current attribution systems and briefly discussed the reasons for their survival as well as their points of failure. We have also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we have proposed a potential framework for a system that should fulfill these requirements and achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework have also been discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:35:01Z

Raghad: /* Scope */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans. Other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved effective but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens the security of its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in all cases; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable success level in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, the lack of interoperability between intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets entering their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research in attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them licensed access weakens proper attribution and encourages illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the temptation of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is, sadly, somewhat neglected in current research.

This paper starts with a brief background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution, namely the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for strong attribution over the Internet, mainly because the number of cyber attacks has increased since the Internet came into widespread use in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and getting away scot-free; because of the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed in it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly in the context of identification issues pertaining to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent needed for what each particular system set out to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are small pieces of text that are created by a web server and stored by the web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart contents, and storing site preferences. In practice, they can be used to store any type of information that can be stored as text. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the HTTP request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response; the browser saves it and sends it back with the next request. Cookie contents are often encrypted or otherwise protected for the sake of data security and information privacy; however, they remain under the user's control, as they can be inspected, modified and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all.

Cookies may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow users to set how long cookies are to be stored.
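
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. The helper names <code>build_set_cookie</code> and <code>read_cookie</code> and the cookie name <code>visitor_id</code> are hypothetical and chosen only for illustration; the sketch builds the header a server would send on a first visit and parses the header a browser would send back.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age_seconds=None):
    """Build the response header a server sends to set a cookie.

    With no max_age_seconds the cookie has no expiry, so the browser
    discards it when it is closed (a session cookie).
    """
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age_seconds is not None:
        cookie[name]["max-age"] = max_age_seconds
    return cookie.output(header="Set-Cookie:")


def read_cookie(cookie_header, name):
    """Parse the Cookie request header a browser sends back; None if absent."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None


# First visit: no cookie in the request, so the server issues one.
response_header = build_set_cookie("visitor_id", "abc123", max_age_seconds=3600)

# Later visit: the browser returns the cookie, and the server recognizes it.
returning_visitor = read_cookie("visitor_id=abc123; theme=dark", "visitor_id")
```

Since the value travels through the user's browser, nothing in this exchange stops the user from editing or deleting it.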

===Cookies as an Attribution System===
If cookies were used as the type of attribution system we are looking for over the Internet, we would be able to identify, with high precision, the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of online devices each user owns have grown. One of the more common examples of this is the rise of Internet-ready mobile phones. The addressing system of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The last batch of IPv4 address blocks was allocated to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This had been foreseen since the 1990s, which prompted the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
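
The address-space figures above can be checked with Python's standard <code>ipaddress</code> module; the variable names are chosen only for illustration.

```python
import ipaddress

# The network 0.0.0.0/0 covers the entire IPv4 space, ::/0 the entire IPv6 space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                  # 4294967296, i.e. 2**32
print(ipv6_total == 2 ** 128)      # True
print(ipv6_total // ipv4_total)    # 2**96 times as many addresses as IPv4
```

In practice the number of usable IPv4 addresses is smaller still, since some ranges are reserved for private networks, multicast and other special purposes.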

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one that is assigned anew, typically each time the device boots or connects. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to "share" a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
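
The address-sharing effect of dynamic assignment can be illustrated with a toy lease pool in Python. This is a deliberately simplified sketch and not the DHCP protocol itself: the class name <code>LeasePool</code>, its methods, and the example addresses are hypothetical, and lease timers, renewal and the actual message exchange are omitted.

```python
class LeasePool:
    """Toy model of dynamic address assignment from a fixed pool."""

    def __init__(self, addresses):
        self._free = list(addresses)  # addresses not currently in use
        self._leases = {}             # device identifier -> leased address

    def request(self, device_id):
        """Give the device an address, reusing its current lease if it has one."""
        if device_id in self._leases:
            return self._leases[device_id]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        address = self._free.pop(0)
        self._leases[device_id] = address
        return address

    def release(self, device_id):
        """Return the device's address to the pool when it goes offline."""
        address = self._leases.pop(device_id, None)
        if address is not None:
            self._free.append(address)


pool = LeasePool(["203.0.113.10"])
first = pool.request("laptop")   # the laptop goes online
pool.release("laptop")           # the laptop goes offline
second = pool.request("phone")   # the phone now receives the same address
```

The same address therefore points to different devices at different times, which is one reason an IP address alone is a weak basis for attribution.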

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses can even foil IP traceback efforts.

==Authentication Systems==
To make sure of the identity of whoever is visiting certain pages, a website provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this approach is that attribution can be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers breaking into it to steal that information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server's security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to "identify himself": the user enters his personal login information, and the web server checks it against what it has stored in its database and either grants or denies access to the user's personal page.
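
The set-up and verification steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the function names <code>create_account</code> and <code>verify_login</code> and the iteration count are hypothetical choices, and the use of a salted PBKDF2 hash is a common practice assumed here, not something the text above specifies.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen for illustration only


def create_account(password: str):
    """Return (salt, digest) to store instead of the plain-text password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_login(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest for the submitted password and compare it safely."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_digest)


salt, stored = create_account("correct horse battery staple")
granted = verify_login("correct horse battery staple", salt, stored)  # access granted
denied = verify_login("wrong guess", salt, stored)                    # access denied
```

Storing only a salted digest limits, but does not eliminate, the damage if the server's database is stolen.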

Authentication systems are only ever used when it involves the users wanting some privacy on the webserver, or when the user wishes to store some form of information on the web server.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. This would mean that even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in a transaction are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a fairly abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but also to let the ratio change dynamically depending on the case.

Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be utilized.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping stone authentication, link identification and packet filtering in tying packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should one deal with misleading data sources that hide behind botnets and conceal identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. The identification property can be useful for establishing the client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for more targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal and undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes linking an IP to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and the distribution of malware. The stepping stone attack is a common way of making an attack anonymous: the attacker reaches the victim through multiple random public agents (stepping stones) in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for the sake of completeness: an internet attribution system needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of that body's actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which crime and misuse are simply not an option, and many writers and movie directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As mentioned before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Embedding an attribution system incrementally should also be more secure (bugs in software and mistakes in design can be fixed while the deployment is still small), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases, while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Luckily, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g. from DoS, identity theft, etc.).

==Practice==

Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to know the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to anybody who uses the network. When a crime is committed or, more generally, when the agent of some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (a government, a corporation, the police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.
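The separation between traceability records and real identities can be sketched as follows. This is a simplified illustration under stated assumptions: the <code>TrustedEntity</code> class, the <code>evidence_approved</code> flag and the single in-memory <code>trace_log</code> are hypothetical names, and the log is kept in one place here only for brevity, whereas the requirement above is that it be distributed.

```python
import secrets

class TrustedEntity:
    """Hypothetical escrow holding the only pseudonym -> real name mapping."""

    def __init__(self):
        self._names = {}

    def enroll(self, real_name):
        # A random pseudonym carries no information about the person.
        pseudonym = secrets.token_hex(16)
        self._names[pseudonym] = real_name
        return pseudonym

    def reveal(self, pseudonym, evidence_approved):
        # The mapping is disclosed only when sufficient evidence exists.
        if not evidence_approved:
            raise PermissionError("insufficient evidence to reveal identity")
        return self._names[pseudonym]


# Traceability records kept on the network side: act -> pseudonym only.
trace_log = {}

def record_act(act_id, pseudonym):
    """Record which pseudonymous ID performed an act; no names are stored."""
    trace_log[act_id] = pseudonym
```

Because the network side only ever sees pseudonyms, answering "who did act X?" requires the cooperation of the trusted entity.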

Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing either side from misusing the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
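To make the IS and licensing definitions above concrete, here is one possible sketch. The framework does not specify how the bits of an IS are computed, so the use of SHA-256 over length-prefixed identifiers, the function names and the <code>gddb</code> set are assumptions made purely for illustration.

```python
import hashlib

def make_identification_stamp(human_id: bytes, device_id: bytes) -> str:
    """Bind a human identifier to a device identifier in one digest."""
    h = hashlib.sha256()
    # Length prefixes keep (b"ab", b"c") distinct from (b"a", b"bc").
    for part in (human_id, device_id):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.hexdigest()

# Stand-in for the GDDB: a set of licensed stamps.
gddb = set()

def license_agent(human_id: bytes, device_id: bytes) -> str:
    """Licensing, per the definition above, adds a new IS to the GDDB."""
    stamp = make_identification_stamp(human_id, device_id)
    gddb.add(stamp)
    return stamp
```

For example, <code>license_agent(b"iris-template-digest", b"00:1a:2b:3c:4d:5e")</code> would register a stamp for one (hypothetical) owner and one network card.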

In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., when using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when he is running a system in the background that performs external (over the internet) system calls or is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP, FTP, etc. Indirect responsibility also covers all packets launched by lower-layer protocols that are driven by higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection initiation packets for handshaking, ICMP packets aim to seek and identify the status of a specific host, etc.

The scope of this framework covers only attribution over the internet. It does not cover "locally" defined networks that fall under the IEEE standard definitions of the PAN, LAN, MAN or WAN topologies and that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of its pros, cons and vulnerabilities, and conclude with a discussion of the proposed framework.
==Assumptions==
For starters: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., a government). This entity should act as the Internet's law enforcement, serving as the primary inspector and also as the jurisdiction regulating all kinds of cyber crimes and misbehavior. It may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in system users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access this database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both users must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be heavily relied on for every single hop made by a packet through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the communication between the GDDB subsystems.
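The two access levels assumed for the GDDB can be illustrated with a toy, single-node stand-in. This sketch deliberately omits the encryption, authentication and distribution that the assumption requires; the <code>GDDB</code> class and the plain-string roles are hypothetical and assume the caller has already been authenticated.

```python
class GDDB:
    """Toy, single-node stand-in for the distributed store of ISs."""

    READERS = {"router", "trusted_entity"}
    WRITERS = {"trusted_entity"}

    def __init__(self):
        self._stamps = set()

    def contains(self, role, stamp):
        # Routers and the trusted entity may both read.
        if role not in self.READERS:
            raise PermissionError(f"{role} may not read the GDDB")
        return stamp in self._stamps

    def add(self, role, stamp):
        # Only the trusted entity may append new stamps.
        if role not in self.WRITERS:
            raise PermissionError(f"{role} may not write to the GDDB")
        self._stamps.add(stamp)
```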

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md, and is the one who would be accused if his Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
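The three conditions above can be expressed as a small check. This is only a sketch: it models an IS as a plain (human, device) pair and assumes hypothetical registries of known humans, known devices and device ownership, none of which the framework specifies in this form.

```python
def classify_stamp(human_id, device_id, known_humans, known_devices, owner_of):
    """Return None for a genuine IS, else the reason it counts as fake."""
    if device_id not in known_devices:
        return "imaginary device"
    if human_id not in known_humans:
        return "imaginary human"
    # A device has exactly one owner, so the binding must match it.
    if owner_of.get(device_id) != human_id:
        return "misleading binding"
    return None
```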

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind delivering all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly the same way he treats his credit card numbers and social insurance numbers. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. It does so by consulting the GDDB, that is, by sending a copy of the IS printed on the packet to the GDDB. If a packet is found to have no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is again dropped. If the GDDB replies with a success, the packet's printed IS is verified. The packet then benefits from ISS and is routed onward.
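The per-packet decision made by a router can be sketched as follows. The packet is modeled as a dictionary with an optional <code>"is"</code> field and the GDDB query as a callable returning a boolean; both are assumptions made for illustration rather than part of the proposed packet format.

```python
from enum import Enum

class Verdict(Enum):
    FORWARD = "forward"
    DROP_NO_STAMP = "drop: no IS"
    DROP_INVALID = "drop: invalid IS"

def route_decision(packet: dict, gddb_lookup) -> Verdict:
    """Decide whether a packet may benefit from ISS."""
    stamp = packet.get("is")
    if stamp is None:
        # Unattributed packets are dropped without consulting the GDDB.
        return Verdict.DROP_NO_STAMP
    if not gddb_lookup(stamp):
        # The GDDB does not know this stamp, so it is treated as fake.
        return Verdict.DROP_INVALID
    return Verdict.FORWARD
```

Note that every forwarded packet costs one GDDB lookup per hop in this model, which is the source of the delay listed among the disadvantages.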

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It avoids anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available to everyone; it is only available to trusted entities.
** Hence, it retains personal privacy.
* The system enjoys full automation. According to the system's theory of operation, ISS are either provided or not based on the validation of the IS printed on each packet.
* The system prevents all forms of cyber crime that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* The verification of the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public to everyone, custom content generation cannot be achieved.
* The large number of Mds in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly the same way they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause failure of the whole system. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets to be under the responsibility of some other Ag, i.e., commit forgery.

==Discussion==
The main focus of the proposed framework is to ensure that any leaping packet moves only because it is known to whom it belongs. Recall that in the real world, if a person doesn't have an identity (like a social insurance number), he can't benefit from services. For instance, he can't open a bank account, can't buy a house, can't trade, and can't even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently beats that of the Internet. We can say that current internet attribution, in comparison to real-world attribution, is a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available. More specifically, it is only available to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country in which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the theory of operation of the proposed framework doesn't permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, all traceability information is impossible to collect in one place. The trusted entities are the only ones that generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, a forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, and therefore that a person must be approved by the government to drive one and that each car must be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People will blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and exponentially growing number of applications which are time and data sensitive, and which leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technology-based applications are now built on the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes seeking the proper moment to strike. In short, Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have given background on current attribution systems and a brief discussion of the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and that should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:33:11Z

Raghad: /* Problem Motivation */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans. Other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens the security for its users, because it incites advance users to exploit that feature. The lack of online identification married with bad intentions entices criminals to commit a number of Cyber Crimes without being caught, crimes which include: fraud, theft, forgery, impersonation, the distribution of Malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that constitutes a cornerstone position within internet security. Needless to say, current solutions don't guarantee sufficient attribution nor are considered always applicable in most of the time, hence, current system suffers the lack of a relatively robust attribution mechanism. In the light of this context, we need better methodologies for reaching an acceptable success level for attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could either be a person or a machine. It can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems like IP address spoofing, lack of interoperability in intermediate systems, dynamic nature of IP addresses, unawareness of system users with lots of unknown packets sneaking to their machines and poor efficiency of firewalls and IDSs make this determination operation considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets), and stepping stones aim to inflict vagueness around the correct human source behind the scene.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research works in attribution and discuss their common limitations as well as flaws and what can be done in common to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing the mentioned system would reduce the incentive of irregular behavior as well as remove the blaze of tempting anonymity, putting attackers under the risk of being easily caught. We also discuss how privacy, as a counter force to attribution, plays a big role in the internet and within its users and propose a framework that achieves relatively robust attribution mechanism and retains the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which sadly receives comparatively little attention in current research.

This paper starts with a brief background discussion of the current forms of attribution. Section 3 then presents the attribution dilemma, that is, the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution - binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for strong attribution over the Internet, mainly due to the increased number of cyber attacks since the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, in order to fully understand some of the concepts and terminology within this paper, a small knowledge of computer science or computer systems will be required.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites will sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are small pieces of text that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In actuality, they can be used to store any type of information that can be stored as text. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the HTTP request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which will be saved by the browser and resent on the next request. Cookie contents are sometimes encrypted for data security and information privacy; however, they are still subject to the user's control, as they can be inspected, modified and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all.

Cookies can either have an expiration date or not; this is the date that the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to automatically set how long you want cookies to be stored.
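
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is only an illustration: the cookie name <code>visitor_id</code>, the one-hour lifetime and the function <code>handle_request</code> are assumptions made for the example, not part of any particular web server.

```python
from http.cookies import SimpleCookie
import secrets


def handle_request(cookie_header):
    """Return (visitor_id, set_cookie_value_or_None) for one request.

    cookie_header is the raw value of the request's Cookie header,
    or None when the browser sent no cookie.
    """
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if "visitor_id" in jar:
        # Returning browser: the server only sees what the browser chose to send.
        return jar["visitor_id"].value, None
    # First visit: mint an identifier and ask the browser to store it.
    visitor_id = secrets.token_hex(8)
    out = SimpleCookie()
    out["visitor_id"] = visitor_id
    out["visitor_id"]["max-age"] = 3600  # hypothetical lifetime; omit for a session cookie
    return visitor_id, out["visitor_id"].OutputString()


first_id, set_cookie = handle_request(None)                   # first visit
same_id, _ = handle_request("visitor_id=" + first_id)         # cookie resent
forged_id, _ = handle_request("visitor_id=someone-else")      # user edited the cookie
```

The last call shows why the identifier is only as trustworthy as the browser that returns it: the server has no way, in this sketch, to tell an edited value from a genuine one.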

===Cookies as an Attribution System===
Viewed as the type of attribution system we are looking for over the Internet, cookies can achieve high precision in identifying the computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, the use of cookies is not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g. computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of online devices owned by each user have grown. One of the more common examples of this is the rise of Internet-ready mobile phones. The addressing scheme of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely identify only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The very last batch of IPv4 address blocks was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This exhaustion had been foreseen since the 1990s, which prompted the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
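
The address-space figures above can be checked with a few lines of Python using the standard <code>ipaddress</code> module. The variable names are illustrative only.

```python
import ipaddress

# Size of the whole IPv4 space (32-bit addresses) and IPv6 space (128-bit addresses).
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

# How many complete IPv4-sized address spaces fit inside the IPv6 space.
ratio = ipv6_total // ipv4_total

print(ipv4_total)           # 4294967296, i.e. 2**32
print(ipv6_total == 2**128) # True
print(ratio == 2**96)       # True
```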

IP addresses can either be static or dynamic. A static IP address is an address permanently assigned to a user by configuration. A dynamic IP address is one that is assigned anew, typically each time the device boots or reconnects. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. There are two main advantages to dynamic addressing: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofing IP addresses will even foil the efforts of IP traceback.

==Authentication Systems==
In order for a website to make sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this approach is that attribution can be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers breaking into the server to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”; the user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user's personal page.

Authentication systems are generally used only when users want some privacy on the web server, or when they wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one needs not merely to fix this ratio for one particular case, but to allow the ratio to change dynamically depending on the case.

Assuming such a ratio is found, another issue arises. Can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be utilized.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should one deal with misleading data sources that hide behind botnets and conceal identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing the client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and it can be taken advantage of by marketers for better-targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The issue of lack of attribution on the web mostly arises whenever security is compromised. When you're bombarded with spam, or when a system is under a DoS attack attribution becomes a more appealing notion. Getting a balance between security and privacy is tricky, because once attacks are tracked so will all other traffic. Also, depending on the type of the senders and receivers, different attribution policy will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried by each packet, determining the geographical location of this address, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used.
There are many existing methods that attempt to identify the source of an act, like IP traceback. However, there are problems with trying to identify the source by its IP address. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation between intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means users may be unaware of malware distribution, the creation of botnets and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and for the distribution of malware. The stepping-stone attack is a common way of making an attack anonymous: the attacker reaches the victim through multiple arbitrary public agents (used as stepping stones) in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. While the definition of a good attribution system is not entirely clear, we take into account everything we have discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated the requirements into three groups. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated for the sake of completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of that body's actual structure; it might be one person after all. In other words, even though an action is usually carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the immediate source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which acting without crime or misuse is the only possible way to do things, and many writers and movie directors exploit this idea in futuristic, science fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some of them are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should still remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment should also be more secure, since bugs in software and mistakes in design can be fixed while the system is still at a small scale, so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases, while at the same time remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns where to draw the line between dynamic adaptability and universality. Luckily, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. from DoS, identity theft, etc.).

==Practice==

The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did/does person X do?". Not only should the system not know the answer, it should not be possible to obtain the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to anybody who uses the network. When a crime is committed or, more generally, when the agent of some act needs to be determined, the recorded ID is looked up in a police or government database. Some trusted entity (a government, a corporation, the police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.

Of course, a body trusted by everyone does not always exist, but generally there are governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and does not allow either side to misuse the system.
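
The separation described in this section can be sketched as two independent stores. The sketch below is only an illustration under stated assumptions: the class names <code>TraceLog</code> and <code>TrustedRegistry</code>, the boolean <code>warrant</code> flag and the sample IDs are all hypothetical, and simply omitting a reverse-lookup method does not by itself make reverse queries impossible in a real system.

```python
class TraceLog:
    """Traceability side: answers "who did act X?" with an opaque ID only."""

    def __init__(self):
        self._act_to_id = {}

    def record(self, act_id, agent_id):
        self._act_to_id[act_id] = agent_id

    def who_did(self, act_id):
        # Deliberately no "what did agent X do?" query is offered here.
        return self._act_to_id.get(act_id)


class TrustedRegistry:
    """Trusted entity: holds the ID-to-name mapping and reveals it on demand."""

    def __init__(self):
        self._id_to_name = {}

    def enroll(self, agent_id, real_name):
        self._id_to_name[agent_id] = real_name

    def reveal(self, agent_id, warrant):
        # 'warrant' stands in for "enough evidence or motivation".
        if not warrant:
            raise PermissionError("identity is revealed only with justification")
        return self._id_to_name[agent_id]


log = TraceLog()
registry = TrustedRegistry()
registry.enroll("id-42", "Alice Example")
log.record("act-1", "id-42")

suspect_id = log.who_did("act-1")                    # opaque ID, no name
name = registry.reveal(suspect_id, warrant=True)     # name only via the trusted entity
```

Neither store alone links an act to a real name; the two have to cooperate, which is the property the requirement asks for.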

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a simple home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds new ISs to the GDDB.
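
The definitions above leave the encoding of an IS open. One way to picture it is sketched below, purely as an assumption: the IS is taken to be a SHA-256 digest over a byte string representing the human identifier followed by the 6-byte MAC address, and the GDDB is stood in for by an in-memory set. The function names and sample values are hypothetical.

```python
import hashlib

# Stand-in for the GDDB: the set of licensed Identification Stamps.
gddb = set()


def make_identification_stamp(human_id: bytes, mac_address: str) -> str:
    """Bind a human identifier to a device feature (here, a MAC address)."""
    mac = bytes.fromhex(mac_address.replace(":", "").replace("-", ""))
    if len(mac) != 6:
        raise ValueError("a MAC address must be 6 bytes long")
    # The MAC is a fixed-length suffix, so the pair is encoded unambiguously.
    return hashlib.sha256(human_id + mac).hexdigest()


def license_agent(human_id: bytes, mac_address: str) -> str:
    """Licensing: generate the IS and add it to the GDDB."""
    stamp = make_identification_stamp(human_id, mac_address)
    gddb.add(stamp)
    return stamp


stamp = license_agent(b"fingerprint-template-of-owner", "00:1A:2B:3C:4D:5E")
```

Because the stamp depends on both inputs, the same device registered to a different person, or the same person with a different device, yields a different IS, which matches the description of the IS as an identifier of the Ag rather than of the human or the machine alone.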

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., when using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP or VoIP. The owner is indirectly responsible when he is running a system in the background that performs external (over the internet) calls or is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, and ICMP packets may be sent to determine the status of a specific host.

The scope of this framework only addresses attribution over the internet and not any other "locally" defined networks underlying the IEEE standard definitions of the topologies PAN, LAN, MAN, or a WAN that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.
==Assumptions==
Firstly: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement: it will be deemed the primary inspector and also the jurisdiction for regulating all kinds of cyber crimes and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from being a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in the number of system users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access this database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both kinds of users must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be heavily relied on for every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the way the GDDB subsystems communicate with one another.

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md, and would also be the one accused if his Md were found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that, within the format of IP packets, a header is added by the network layer that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly the same way he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. This is done by consulting the GDDB, that is, by sending a copy of the IS carried by the packet to the GDDB. If a packet is found to have no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the packet's IS is verified; the packet then benefits from ISS and is routed on its way.
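
The per-packet decision described above can be sketched as a single function. This is a simplified illustration under stated assumptions: the packet is represented as a dictionary with an optional <code>"is"</code> field, the GDDB lookup is modelled as membership in a local set rather than a network query, and the names <code>route_decision</code> and <code>licensed</code> are hypothetical.

```python
def route_decision(packet: dict, gddb: set) -> str:
    """Decide whether a router should provide ISS (routing) to a packet."""
    stamp = packet.get("is")
    if stamp is None:
        return "drop: no IS"        # unattributed packet
    if stamp not in gddb:
        return "drop: invalid IS"   # GDDB does not know this stamp
    return "forward"                # IS verified, packet benefits from ISS


licensed = {"stamp-of-alice-laptop"}

print(route_decision({"payload": b"hi"}, licensed))
# drop: no IS
print(route_decision({"is": "made-up-stamp", "payload": b"hi"}, licensed))
# drop: invalid IS
print(route_decision({"is": "stamp-of-alice-laptop", "payload": b"hi"}, licensed))
# forward
```

A membership check of this kind rejects missing and unknown stamps, but it cannot by itself tell whether a valid stamp is being presented by its rightful Ag.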

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available to everyone; it is only available to trusted entities.
** Hence, it retains personal privacy.
* The system enjoys full automation. According to the system's theory of operation, ISS are provided or withheld based on the validation of the IS carried by each packet.
* The system prevents all forms of cyber crime that rely on being executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* The verification of the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public to everyone, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause whole-system failure. If the attacker gains write access, he can append a fabricated IS. If the attacker gains read access, he can place his malicious packets under the responsibility of some other Ag, i.e., forgery.

==Discussion==
The proposed framework's main focus is to ensure that a packet moves only because it is known to whom it belongs. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution definitely beats that of the Internet at present. Compared with real-world attribution, current Internet attribution can be considered a failure. A form of Internet attribution would be acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-upon trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the proposed framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. The trusted entities are the only ones that generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot finalized the invention of the first steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the ancestor of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began realizing that a car could act as a lethal weapon, which therefore must be approved by the government to be driven by a given person, and must also be formally linked to an owner who is considered primarily responsible for it.

Meanwhile, the Internet is passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast, yet exponentially growing, number of applications which are time and data sensitive, and which leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industry and technology based applications are now built on the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes waiting for the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution and explained why attribution is crucial, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival and their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill these requirements and achieve an acceptable level of Internet attribution. The pros, cons, and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:32:08Z

Raghad: /* Problem Motivation */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors in order to secure authorship rights and rapidly identify plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not practical to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could either be a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems like IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is sadly somewhat neglected in current research.

This paper starts with a quick background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution, that is, the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, like the Internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since the Internet's introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology used in it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertained to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what each particular system aims to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are small text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart information, and storing site preferences. In practice, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response; the browser saves it and resends it with the next request. Cookies are often encrypted for data security and information privacy; however, they remain under the user's control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to change the browser settings so that cookies are not accepted at all.
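The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is a minimal sketch under stated assumptions: the cookie name <code>visitor_id</code>, the function <code>handle_request</code>, and the representation of headers as plain dictionaries are hypothetical choices made for illustration, not part of any particular web server.

```python
import secrets
from http.cookies import SimpleCookie


def handle_request(request_headers):
    """Return (visitor_id, response_headers) for a single request.

    request_headers is a dict of header names to values (an assumption
    made for this sketch; real servers expose headers differently).
    """
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "visitor_id" in cookie:
        # Returning browser: the cookie it stored earlier identifies it.
        return cookie["visitor_id"].value, {}

    # First access: create an identifier and ask the browser to store it.
    visitor_id = secrets.token_hex(8)
    fresh = SimpleCookie()
    fresh["visitor_id"] = visitor_id
    return visitor_id, {"Set-Cookie": fresh["visitor_id"].OutputString()}


# First request carries no cookie, so the server issues one.
first_id, response = handle_request({})
# The browser resends the stored cookie on the next request.
second_id, _ = handle_request({"Cookie": response["Set-Cookie"]})
print(first_id == second_id)
```

A user who deletes or edits the stored value simply appears to the server as a new or different visitor, which is the weakness discussed in the next subsection.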

Cookies can either have an expiration date or not; this is the date that the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to automatically set how long you want cookies to be stored.

===Cookies as an Attribution System===
Viewed as the type of attribution system we are looking for over the Internet, cookies allow high precision in identifying computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, the use of cookies is not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers (32 bits long in IPv4) for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which allocate IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user owns have grown; one common example is the rise of Internet-ready mobile phones. The addressing system of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) devices, fewer than the number of people on the planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This exhaustion had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
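The address-space figures above follow directly from the address widths. The short calculation below is only a worked check of that arithmetic; the function name <code>address_space</code> is introduced here for illustration.

```python
def address_space(bits):
    """Number of distinct addresses expressible with the given number of bits."""
    return 2 ** bits


ipv4_space = address_space(32)    # IPv4 uses 32-bit addresses
ipv6_space = address_space(128)   # IPv6 uses 128-bit addresses

print(f"IPv4 addresses: {ipv4_space:,}")
# Each additional bit doubles the space, so IPv6 is 2**96 times larger.
print(f"IPv6 / IPv4 ratio: 2**{(ipv6_space // ipv4_space).bit_length() - 1}")
```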

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through configuration. A dynamic IP address is one that is newly assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space the ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for a few reasons, chiefly that attackers can spoof their IP addresses. Spoofing IP addresses will even foil IP traceback efforts.

==Authentication Systems==
For a website to make sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this approach is that attribution can be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking into the server to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”; he enters his personal login information, and the web server checks this information against what it has stored in its database and either grants or denies access to the user's personal page.
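The account setup and login steps above can be sketched as follows. This is a minimal sketch, not a description of any real banking system: the in-memory dictionary standing in for the server's database, the function names <code>register</code> and <code>login</code>, and the choice of salted PBKDF2 hashing for stored passwords are all assumptions made here, since the text only says that the server stores and verifies the login information.

```python
import hashlib
import hmac
import os

# Hypothetical stand-in for the web server's account database.
_accounts = {}

_ITERATIONS = 100_000


def _hash_password(password, salt):
    # Derive a digest so the plain password itself is never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)


def register(username, password):
    """Set up an account: store a random salt and the password digest."""
    salt = os.urandom(16)
    _accounts[username] = (salt, _hash_password(password, salt))


def login(username, password):
    """Return True (grant access) or False (deny access)."""
    record = _accounts.get(username)
    if record is None:
        return False
    salt, stored_digest = record
    candidate = _hash_password(password, salt)
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(candidate, stored_digest)


register("alice", "correct horse")
print(login("alice", "correct horse"))  # access granted
print(login("alice", "wrong guess"))    # access denied
```

Storing only salted digests limits, but does not remove, the damage of the server breaches mentioned earlier in this section.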

Authentication systems are typically used only when users want some privacy on the web server, or when a user wishes to store some form of information on the web server.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in rather abstract terms, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for a particular case, but also to allow the ratio to change dynamically depending on the case.

Assuming such a ratio is found, another issue arises. Can the use of private information to track or punish a person be completely justified? Especially if it oversteps their privacy. One might think that this question is a little bit out of the scope of our paper. However, such ethical arguments must be addressed prior to designing, because a system that compromises individual privacy and protection can not be utilized.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should one deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Identification can be useful for establishing the client’s identity in online banking and identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for better-targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal and undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
Many existing methods, like IP traceback, attempt to identify the source of an act, but identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are not permanently bound to a single account, which makes the link between an IP address and the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

Some attacks are designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of making an attack anonymous: it uses multiple random public agents (as stepping stones) to reach the victim in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything we have discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.

We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated for the sake of completeness: an internet attribution system needs to attribute or, more formally, any potentially destructive act should be traceable to an agent (a person and/or an organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the party (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, some person or persons, human beings, are responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system that we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with the incremental deployability that we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should still remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before, (and this could be said about any global system on the internet) such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of virtual impossibility to restart or reconfigure the whole internet at once. This incremental way of embedding an attribution system should be more secure (bugs in software and mistakes in design can be fixed while on a small scale), so that by the end of cycle, when the whole internet is wired, the attribution system is field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Luckily, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.) or, in other words, spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).

==Practice==

The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should be impossible to obtain the answer from it; the question the system should answer is "who performed act X?". This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated separately because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the Internet, it might not be a good idea to put the names of persons into the traceability database. It makes much more sense to record a unique ID for anybody who uses the network. If a crime is committed or, more generally, the agent behind some act must be determined, the recorded ID can be looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (government, corporation, police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.

Of course, a body trusted by everyone does not always exist, but generally there are governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that lets them cooperate in time of need and prevents either side from misusing the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". First, we define some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has network access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a homemade chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by that device. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, worldwide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply adds new ISs to the GDDB.
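
The terms above can be made concrete with a minimal Python sketch. The paper does not specify how an IS is derived or how the GDDB is laid out, so everything here is an assumption made for illustration: the names <code>make_identification_stamp</code>, <code>GDDB</code>, and <code>license_agent</code> are hypothetical, SHA-256 stands in for whatever binding function would actually be used, and a plain in-memory set stands in for the distributed, encrypted lookup table.

```python
import hashlib


def make_identification_stamp(human_id: bytes, device_feature: bytes) -> str:
    """Bind a human identifier to a device feature (e.g. a MAC address).

    Assumption: the binding is a SHA-256 digest over both values; the
    paper only says the IS is "a series of bits" binding the two.
    """
    digest = hashlib.sha256()
    # Length-prefix each field so (b"ab", b"c") and (b"a", b"bc") differ.
    for field in (human_id, device_feature):
        digest.update(len(field).to_bytes(4, "big"))
        digest.update(field)
    return digest.hexdigest()


class GDDB:
    """Toy stand-in for the Globally Distributed Database of ISs."""

    def __init__(self) -> None:
        self._stamps = set()

    def add(self, stamp: str) -> None:
        # Write path: in the framework only the trusted entity may do this.
        self._stamps.add(stamp)

    def contains(self, stamp: str) -> bool:
        # Read path: the only operation routers would need.
        return stamp in self._stamps


def license_agent(gddb: GDDB, human_id: bytes, device_feature: bytes) -> str:
    """Licensing: generate the agent's IS and record it in the GDDB."""
    stamp = make_identification_stamp(human_id, device_feature)
    gddb.add(stamp)
    return stamp


# Hypothetical example values for one human-device pairing (an Ag).
gddb = GDDB()
alice_laptop = license_agent(gddb, b"alice-fingerprint", b"00:1a:2b:3c:4d:5e")
```

In this sketch the same device paired with a different human yields a different IS, which is the property the definition of an Ag relies on.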

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. A user is directly responsible when running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting HTTP, FTP, SIP, RTP, VoIP, etc. A user is indirectly responsible when running a system in the background that performs external (over the Internet) calls, or that is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, and ICMP packets may be sent to probe the status of a specific host.

The scope of this framework is limited to attribution over the Internet. It does not address "locally" defined networks that fall under the standard definitions of PAN, LAN, MAN, or WAN topologies and that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its methodology, a list of its pros, cons, and vulnerabilities, and finally a discussion of the proposed framework.
==Assumptions==
First: jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement: the primary inspector and the jurisdiction for regulating all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would likely perform better, as it could scale with the growth of the user base and conform to diverse regional laws, regulations, customs, and traditions.

Secondly: GDDB. We assume that a GDDB is deployed to act as a "database" for storing ISs. Symmetric-key encryption should be used to protect it, since it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both kinds of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency for read operations, as it will be consulted at every single hop a packet makes through the Internet's intermediate systems. A standard protocol would be required to define the syntax and semantics of the communication between the GDDB subsystems.

Thirdly: ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that the network layer adds a header to each IP packet that carries the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
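
The three cases above can be illustrated with a short Python sketch. It assumes, hypothetically, that the trusted entity keeps a set of known humans and a device-to-owner map; the names <code>StampStatus</code>, <code>classify_stamp</code>, <code>known_humans</code>, and <code>owner_of</code> are invented for this example and do not come from the framework itself.

```python
from enum import Enum
from typing import Dict, Set


class StampStatus(Enum):
    VALID = "valid"
    UNKNOWN_DEVICE = "false device identifier"
    UNKNOWN_HUMAN = "false human identifier"
    WRONG_BINDING = "misleading human-to-machine binding"


def classify_stamp(
    human_id: str,
    device_id: str,
    known_humans: Set[str],
    owner_of: Dict[str, str],
) -> StampStatus:
    """Classify a claimed (human, device) binding.

    owner_of maps each registered device to its single official owner,
    so a device missing from it is treated as imaginary.
    """
    if device_id not in owner_of:
        return StampStatus.UNKNOWN_DEVICE
    if human_id not in known_humans:
        return StampStatus.UNKNOWN_HUMAN
    if owner_of[device_id] != human_id:
        return StampStatus.WRONG_BINDING
    return StampStatus.VALID


# Hypothetical registry: one person may own several machines,
# but each machine has exactly one owner.
known_humans = {"alice", "bob"}
owner_of = {"laptop-1": "alice", "phone-7": "alice", "desktop-3": "bob"}

ok = classify_stamp("alice", "laptop-1", known_humans, owner_of)
forged = classify_stamp("bob", "laptop-1", known_humans, owner_of)
```

Here <code>ok</code> is <code>StampStatus.VALID</code>, while <code>forged</code> is <code>StampStatus.WRONG_BINDING</code>, because both the human and the device exist but the device belongs to someone else.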

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.

The system can be described in chronological order as follows. First, any newly bought machine, or even a homemade device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with the IS so that it can be added to the header of the user's outgoing packets. The user should keep this unique IS secret and treat it exactly like a credit card number or social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. It does so by consulting the GDDB, sending it a copy of the IS carried by the packet. If the packet has no IS, it is denied ISS and is simply dropped. If the GDDB reports that the IS is invalid, the packet is likewise dropped. If the GDDB replies with success, the packet's IS is verified; the packet then benefits from ISS and is routed onward.
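
The per-hop decision described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: <code>Packet</code>, <code>handle_packet</code>, and <code>lookup</code> are hypothetical names, and the GDDB query is modelled as a simple callable rather than a network request.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Packet:
    payload: bytes
    stamp: Optional[str] = None  # the IS carried in the added header, if any


def handle_packet(packet: Packet, is_registered: Callable[[str], bool]) -> str:
    """Return "route" or "drop" for one packet at one hop.

    is_registered stands in for the read-only GDDB query.
    """
    if packet.stamp is None:
        return "drop"  # unattributed packet: no ISS
    if not is_registered(packet.stamp):
        return "drop"  # GDDB reports the IS as invalid
    return "route"  # IS verified: packet benefits from ISS


# Hypothetical GDDB contents for the example.
registered = {"is-alice-laptop"}


def lookup(stamp: str) -> bool:
    return stamp in registered


decisions = [
    handle_packet(Packet(b"hello", "is-alice-laptop"), lookup),
    handle_packet(Packet(b"hello"), lookup),
    handle_packet(Packet(b"hello", "is-forged"), lookup),
]
```

With the values above, only the first packet is routed; the unstamped packet and the packet with an unregistered IS are both dropped.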

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves a level of attribution that is acceptable relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is only available to trusted entities.
** Hence, it retains personal privacy.
* The system is fully automated. According to its theory of operation, ISS are provided or withheld based on the validation of the IS carried by each packet.
* The system prevents all forms of cyber crime that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewable; however, this is not an easy problem.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets entering their machines in order to prevent the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause the whole system to fail. If the attacker gains write access, the attacker can append an imaginary IS. If the attacker gains read access, the attacker can forge packets, declaring malicious packets to be the responsibility of some other Ag.

==Discussion==
The main focus of the proposed framework is to ensure that any packet in transit is moving only because it is known to whom it belongs. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services: such a person cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal at criminal tracing and law enforcement; however, its level of attribution currently exceeds that of the Internet. Compared to real-world attribution, current Internet attribution can be considered a failure. A form of Internet attribution would be acceptable if it provided at least as much attribution as the real world does. We argue that the proposed framework would provide such a level.

The proposed framework fulfills the general requirements. Any potentially destructive act is either traceable to an Ag or does not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; it is only available to the agreed-upon trusted entity. The framework also fulfills the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is disabled. It is also adaptable to different rules and regulations, since it leaves the decision on punishment to the jurisdiction of the country from which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily bounded. As for the practice requirements, the framework's theory of operation does not permit mapping an Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, the traceability information cannot be collected in one place. Only the trusted entities generate the IS from personal data; hence, they are the only ones holding that information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, and therefore that driving must be approved by the government and that each car must be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse, and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like a driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and rapidly growing number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technology-based applications are now built on the Internet as their underlying infrastructure and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes waiting for the proper moment to strike. In short, Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why attribution is crucial, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have given background on current attribution systems, with a brief discussion of why they persist and where they fail. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill these requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons, and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:20:43Z

Raghad: /* The attribution dilemma */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems has not yet been defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans. Others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the Internet. Authentication, as a means of attribution, has proved its efficiency, but it is not practical to authenticate every single packet hopping across the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the Internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach to performing attribution over the Internet.

=Introduction=
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, Internet attribution is a highly sensitive field that occupies a cornerstone position within Internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable success rate in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets entering their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the Internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them LICENSED access weakens attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the Internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address the broader problem of attributing all possible actions to agents, which has unfortunately received little attention in current research.

This paper starts with a brief background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution: the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the Internet. In Section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in Section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, attributing an act to an agent (software, device, etc.) and then attributing that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the Internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the Internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increase in cyber attacks since its introduction in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology used in it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent needed for what each particular system aims to achieve.
This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server’s cookie in the request header. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats the request as the browser's first access to the server and sends a cookie as part of the response; the browser saves it and sends it back with the next request. Cookies are often encrypted for data security and information privacy; however, they remain under the user's control, as they can be inspected, modified, and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all.
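
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is a simplified illustration, not a real web server: the function <code>respond</code>, the cookie name <code>visitor_id</code>, and the <code>X-Visitor</code> response header are hypothetical, and headers are modelled as plain dictionaries.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict


def respond(request_headers: Dict[str, str]) -> Dict[str, str]:
    """Return response headers for one request, issuing a cookie if needed."""
    cookie = SimpleCookie()
    cookie.load(request_headers.get("Cookie", ""))
    if "visitor_id" in cookie:
        # Returning browser: the cookie came back, so nothing new is set.
        return {"X-Visitor": cookie["visitor_id"].value}
    # First visit: mint an identifier and ask the browser to store it.
    new_cookie = SimpleCookie()
    new_cookie["visitor_id"] = secrets.token_hex(8)
    return {"Set-Cookie": new_cookie["visitor_id"].OutputString()}


# First request carries no cookie, so the server issues one.
first = respond({})
# The browser stores the name=value pair and sends it back next time.
second = respond({"Cookie": first["Set-Cookie"]})
# A user who deletes the cookie looks like a brand-new visitor again.
after_delete = respond({})
```

The last call shows why this mechanism is weak for attribution: once the cookie is removed, the server has no way to link the new request to the earlier visitor.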

Cookies may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you configure how long cookies are stored.

===Cookies as an Attribution System===
Viewed as the kind of attribution system we are looking for over the Internet, cookies can identify computers that access a web server with high precision. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers for devices (e.g., computers, printers, scanners) on a network; in IPv4 they are 32 bits long. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user owns have grown. One of the more common examples is the rise of Internet-ready mobile phones. The addressing system of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely identify only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This had been foreseen since the 1990s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
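
The size of the two address spaces can be checked directly with Python's standard <code>ipaddress</code> module. The variable names are chosen for this example only.

```python
import ipaddress

# The /0 networks cover every address of each protocol version.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                            # 4294967296
print(ipv4_total == 2 ** 32)                 # True
print(ipv6_total == 2 ** 128)                # True
print(ipv6_total // ipv4_total == 2 ** 96)   # True
```

The last line shows the scale of the change: the IPv6 space is 2<sup>96</sup> times larger than the IPv4 space.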

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one where a new address may be assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administrative costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
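
The address-sharing effect of dynamic assignment can be illustrated with a toy lease pool in Python. This is not an implementation of DHCP; the class <code>AddressPool</code>, its methods, and the device names are hypothetical, and the address comes from a range reserved for documentation.

```python
from typing import Dict, List, Optional


class AddressPool:
    """Toy lease pool: a few addresses shared by many devices over time."""

    def __init__(self, addresses: List[str]) -> None:
        self._free = list(addresses)
        self._leases: Dict[str, str] = {}  # device id -> leased address

    def connect(self, device: str) -> Optional[str]:
        """Lease an address to the device, or return None if none is free."""
        if device in self._leases:
            return self._leases[device]
        if not self._free:
            return None  # pool exhausted
        address = self._free.pop(0)
        self._leases[device] = address
        return address

    def disconnect(self, device: str) -> None:
        """Return the device's address to the pool."""
        address = self._leases.pop(device, None)
        if address is not None:
            self._free.append(address)


pool = AddressPool(["203.0.113.10"])
morning = pool.connect("laptop")   # the laptop gets the only address
pool.disconnect("laptop")
evening = pool.connect("phone")    # later, the phone gets the same address
blocked = pool.connect("tablet")   # nothing left while the phone is online
```

Because <code>morning</code> and <code>evening</code> are the same address used by two different devices, an IP address alone does not identify a device, let alone a person.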

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofing can even foil IP traceback efforts.

==Authentication Systems==
For a website to verify the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which attackers may break into in order to steal that information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”: the user enters his login information, and the web server checks it against what it has stored in its database and either grants or denies access to the user's personal page.

Authentication systems are typically used only when users want some privacy on the web server or wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and/or infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or merit) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a rather abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not merely to fix this ratio for one particular case, but to make it change dynamically depending on the case.

Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed prior to design, because a system that compromises individual privacy and protection cannot be utilized.

There are other questions that an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How should one deal with misleading data sources that hide behind botnets and conceal identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. The identification property can be useful for establishing the client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and it can be used by marketers for more targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web becomes an issue mostly when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of senders and receivers, different attribution policies will be required.

In the ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried in each packet, determining the geographical location of this IP address, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation of intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and distribution of malware. A stepping-stone attack is a common way of making an attack anonymous: the attacker reaches the victim through multiple random public agents (the stepping stones) in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer (or at least to try to answer) before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated these requirements into three groups. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for the sake of completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system that eliminates crime and misuse by permitting only one acceptable way of doing things, and many writers and movie directors exploit this idea in futuristic, science-fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment should also be more secure (bugs in software and mistakes in design can be fixed while the system is still small), so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Fortunately, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.) or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses caused by the current lack of attribution (e.g. DoS, identity theft, etc.).

==Practice==

Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to find out the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put the names of persons into a traceability database. It makes much more sense to assign a unique ID to any body that uses the network. When a crime is committed or, more generally, when the agent of some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (government, corporation, police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.
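
The separation between pseudonymous IDs and real names can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not part of the requirements themselves: the class name <code>TrustedRegistry</code>, its methods, and the boolean <code>authorized</code> flag (standing in for whatever legal process justifies disclosure) are all hypothetical.

```python
import secrets


class TrustedRegistry:
    """Hypothetical trusted entity holding the ID-to-name mapping."""

    def __init__(self):
        # Private mapping: pseudonymous ID -> real name
        self._names = {}

    def enroll(self, real_name: str) -> str:
        """Issue a random ID that reveals nothing about the person."""
        pseudonym = secrets.token_hex(16)
        self._names[pseudonym] = real_name
        return pseudonym

    def reveal(self, pseudonym: str, authorized: bool) -> str:
        """Disclose the real name only when disclosure has been authorized."""
        if not authorized:
            raise PermissionError("disclosure requires sufficient evidence")
        return self._names[pseudonym]
```

Under this sketch, traceability records would store only the value returned by <code>enroll</code>, so a record on its own does not identify a person.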

Of course, a body trusted by everyone does not always exist, but generally there are governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and prevents either side from misusing the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources, nor can it be routed, if it is anonymously bound". We start by defining some terms that will be used within this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, over a wired or wireless link, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), for example routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of giving intermediate systems permission to provide ISS to all packets launched from the agent requesting the license. This process simply adds new ISs to the GDDB.
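
To make the IS and licensing definitions more concrete, the following Python sketch shows one possible reading of them. It assumes, purely for illustration, that an IS is a SHA-256 digest over the human identifier and the device feature, and that the GDDB can be modelled as an in-memory set. The names <code>make_identification_stamp</code> and <code>ToyGDDB</code> are hypothetical; a real GDDB would be distributed and encrypted as described above.

```python
import hashlib
from dataclasses import dataclass, field


def make_identification_stamp(human_id: bytes, device_feature: bytes) -> str:
    """Bind a human identifier to a device feature as one fixed-length stamp."""
    # Length-prefix the first part so that different (human, device) pairs
    # cannot produce the same byte string by plain concatenation.
    payload = len(human_id).to_bytes(4, "big") + human_id + device_feature
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ToyGDDB:
    """In-memory stand-in for the Globally Distributed Database."""

    stamps: set = field(default_factory=set)

    def license(self, stamp: str) -> None:
        # Licensing is modelled as simply adding the IS to the database.
        self.stamps.add(stamp)

    def is_licensed(self, stamp: str) -> bool:
        return stamp in self.stamps
```

For example, a stamp built from a fingerprint template and a MAC address is unknown to the database until <code>license</code> has been called for it.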

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. A user is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP (as used in VoIP), etc. A user is indirectly responsible when he is running a system in the background that performs external (over the internet) calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, ICMP packets may be sent to probe the status of a specific host, etc.

The scope of this framework is limited to attribution over the internet. It does not cover "locally" defined networks (in the sense of the IEEE definitions of PAN, LAN, MAN and WAN) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its methodology of operation, and a list of pros, cons and vulnerabilities of the system, and conclude with a discussion of the proposed framework.
==Assumptions==
First: jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement, serving as the primary inspector and as the jurisdiction for regulating all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would likely perform better, as it would be able to scale with the growth of the user base as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric-key encryption should be used to protect this system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency for read operations, as it will be relied on for every single hop a packet makes through the Internet's intermediate systems. A standard protocol would be required to define the syntax and semantics of the way the GDDB subsystems communicate with one another.

Thirdly: ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that the network layer adds a header to each IP packet that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake IS. A fake IS is one that has any of the following:
* A false unique chip identifier that refers to an imaginary Md.
* A false unique human identifier that refers to an imaginary human.
* A misleading binding of a human to a machine, i.e., a claim that some machine "X" belongs to some human "Y" when, in reality, "Y" is not the owner of "X".
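
The three cases above can be sketched as a small classification function. This is a hedged illustration only: the function name <code>classify_stamp</code>, the set <code>known_humans</code> and the dictionary <code>owner_of</code> (device identifier to its single registered owner, following the ownership assumption) are hypothetical stand-ins for records that the trusted entity would hold.

```python
from typing import Dict, Optional, Set


def classify_stamp(
    human_id: str,
    device_id: str,
    known_humans: Set[str],
    owner_of: Dict[str, str],
) -> Optional[str]:
    """Return None if the binding is genuine, else the reason it is fake."""
    if device_id not in owner_of:
        # Case 1: the chip identifier refers to an imaginary Md.
        return "unknown device"
    if human_id not in known_humans:
        # Case 2: the human identifier refers to an imaginary human.
        return "unknown human"
    if owner_of[device_id] != human_id:
        # Case 3: both exist, but this human does not own this machine.
        return "misleading binding"
    return None
```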

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly the same way he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS has not been inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS by sending a copy of the IS carried by the packet to the GDDB. If the packet has no IS, it is denied ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with success, the packet's IS is verified; the packet then benefits from ISS and is routed onward.
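
The per-packet decision described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a router implementation: a packet is modelled as a dictionary with an optional <code>"is"</code> field, the GDDB lookup is modelled as membership in a local set, and the names <code>Verdict</code> and <code>handle_packet</code> are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    ROUTE = "route"
    DROP_NO_STAMP = "drop: packet carries no IS"
    DROP_INVALID = "drop: IS not found in GDDB"


def handle_packet(packet: dict, licensed_stamps: set) -> Verdict:
    """Decide whether a packet may benefit from ISS."""
    stamp = packet.get("is")
    if not stamp:
        # Unattributed packet: denied ISS and dropped.
        return Verdict.DROP_NO_STAMP
    if stamp not in licensed_stamps:
        # The (modelled) GDDB does not know this IS: dropped.
        return Verdict.DROP_INVALID
    # The IS is verified, so the packet is routed onward.
    return Verdict.ROUTE
```

In a real deployment the set lookup would be a network query to the GDDB made at every hop, which is why the framework requires near-zero read latency.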

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to that achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is available only to trusted entities.
** Hence, it retains personal privacy.
* The system is fully automated. According to the system's theory of operation, ISS are either provided or withheld based on the validation of the IS carried by each packet.
* The system prevents all forms of cyber crime that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* The verification of the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewed; however, this is not an easy matter.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the subsequent formation of botnets.
** Users are responsible for strictly securing their Mds, exactly the same way they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause whole-system failure. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can forge packets, declaring his malicious packets to be under the responsibility of some other Ag.

==Discussion==
The proposed framework's main focus is to ensure that a packet moves only because it is known to whom it belongs. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently beats that of the Internet. Compared to real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be acceptable if it provided at least as much attribution as the real world does. We could say that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it will not take place. The framework also avoids violating privacy-related laws, since attribution information is not publicly available; it is available only to the agreed-upon trusted entity. The framework also fulfills the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but the Internet can still operate if the framework is switched off. It is also adaptable to different rules and regulations, since it leaves the decision on punishment to the jurisdiction of the country in which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of losses due to cyber crime, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the framework's theory of operation does not permit mapping a given Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies the non-bijection requirement. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. Only the trusted entities generate the IS from personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, a forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, which must therefore be approved by the government to be driven by a given person, and must also be formally linked to an owner who is considered primarily responsible for it.

Meanwhile, the Internet is passing through the same phase. People may at first blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast, and exponentially growing, number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technological applications are now built over the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival and their points of failure. We have also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we have proposed a potential framework for a system that should fulfill these requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework have also been discussed.

=References=
<references/>

Internet Attribution: Between Privacy and Cruciality

2011-04-11T20:15:21Z

Raghad: /* The attribution dilemma */

Abstract 
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around using machine learning to link articles to humans. Other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS and bandwidth hogging. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system lacks a robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent can be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior as well as remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is sadly somewhat neglected in current research.

This paper starts with a brief background discussion of the current forms of attribution. Section 3 then presents the dilemma of attribution: resolving the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet. In section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in section 6.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, attributing an act to an agent (software, device, etc.) and then attributing that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" simply as "attribution"; other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since the Internet came into widespread use in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free because, owing to the anonymity of the Internet, they could not be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some basic knowledge of computer science or computer systems is needed to fully understand some of the concepts and terminology used in it.

=Background=

The problem of attribution is not new; it has been around for decades, but it has mostly been addressed as an identification issue for websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent that a particular system aims to achieve.
This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are small pieces of text data that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart contents, and storing site preferences. In fact, they can store any type of information that can be represented as text. When a page is requested, the user's web browser sends the request with the web server’s cookie in the header of the HTTP request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats it as the browser's first access to the server and sends a cookie as part of the response; the browser saves it and sends it back on the next request. Cookie contents are often encrypted or otherwise protected for data security and information privacy; however, they remain under the user's control, as they can be inspected, modified, and even deleted completely. It is also possible for a user to change their browser settings so that cookies are not accepted at all.

Cookies may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let the user set how long cookies are stored.
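The cookie exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is only an illustration: the cookie names (<code>session_id</code>, <code>theme</code>), their values and the helper function names are assumptions made up for the example, not part of any particular website.

```python
from http.cookies import SimpleCookie


def build_set_cookie_headers():
    """Values a server might place in Set-Cookie headers on a first visit."""
    jar = SimpleCookie()
    jar["session_id"] = "abc123"          # no expiry: deleted when the browser closes
    jar["theme"] = "dark"
    jar["theme"]["max-age"] = 2592000     # persistent: kept for 30 days
    return [morsel.OutputString() for morsel in jar.values()]


def parse_cookie_header(header):
    """Parse the Cookie header that the browser sends back on later requests."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


def is_first_visit(header):
    """A request without the (hypothetical) session cookie is treated as a first visit."""
    return "session_id" not in parse_cookie_header(header or "")


if __name__ == "__main__":
    for line in build_set_cookie_headers():
        print("Set-Cookie:", line)
    print(parse_cookie_header("session_id=abc123; theme=dark"))
    print(is_first_visit(None))
```

Because the browser simply echoes back whatever text it was given, a user who edits or deletes the stored value changes what the server sees, which is the weakness discussed in the next subsection.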

===Cookies as an Attribution System===
If we consider cookies as the type of attribution system we are looking for over the Internet, they achieve high precision in identifying the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which allocate IP address blocks to the ISPs in their regions, which in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user brings online have grown; Internet-ready mobile phones are one of the more common examples. The addressing system of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can distinguish only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The last blocks of IPv4 addresses were handed out to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This exhaustion had been foreseen since the 1990s, which prompted the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
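The arithmetic behind these figures can be checked with a few lines of Python. The world population value below is a rough assumption used only for comparison; it is not taken from the cited source.

```python
import ipaddress

IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

# Cross-check with the standard library: the network 0.0.0.0/0 spans all of IPv4.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_addresses

# Rough world population around 2011 (an assumption, for illustration only).
APPROX_WORLD_POPULATION = 7_000_000_000

addresses_per_person = ipv4_addresses / APPROX_WORLD_POPULATION

print(f"IPv4 addresses: {ipv4_addresses:,}")
print(f"IPv4 addresses per person: {addresses_per_person:.2f}")
print(f"IPv6 space is larger by a factor of 2**{IPV6_BITS - IPV4_BITS}")
```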

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is assigned anew, for example at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it eases the problem of limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses can even foil IP traceback efforts.
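Spoofing is possible because the source address is simply a field that the sender writes into the packet header. The following Python sketch builds a minimal 20-byte IPv4 header in memory to illustrate this; it does not send anything, the checksum is left at zero, and the addresses come from the ranges reserved for documentation. The function names are hypothetical.

```python
import socket
import struct


def build_ipv4_header(src, dst, payload_len=0):
    """Pack a minimal IPv4 header; the sender chooses every field, including src."""
    version_ihl = (4 << 4) | 5            # IPv4, header length of 5 x 32-bit words
    total_length = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,                                # type of service
        total_length,
        0,                                # identification
        0,                                # flags and fragment offset
        64,                               # time to live
        socket.IPPROTO_TCP,
        0,                                # checksum, left at zero in this sketch
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def claimed_source(header):
    """Read back the source address a receiver would see (bytes 12 to 15)."""
    return socket.inet_ntoa(header[12:16])


if __name__ == "__main__":
    # The true sender might be 192.0.2.1, yet nothing in the header format
    # stops it from claiming to be 203.0.113.9.
    forged = build_ipv4_header("203.0.113.9", "198.51.100.7")
    print(claimed_source(forged))
```

A receiver, or a router along the path, sees only the claimed address, so any attribution based on it inherits the sender's honesty.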

==Authentication Systems==
To verify the identity of whoever is visiting certain pages, a website provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. Its biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which may be targeted by attackers who break into the server to steal login information.

Login systems are attached to user accounts, which sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, it may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself” and enters his personal login information; the web server checks this information against what it has stored in its database and either grants or denies access to the user's personal page.
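The register-then-verify flow just described can be sketched as follows. This is a minimal illustration, not a description of any real banking site: the class name, the in-memory dictionary standing in for the server's database, and the choice of salted PBKDF2 hashing are all assumptions of the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def _hash_password(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class AccountStore:
    """Stands in for the web server's account database (hypothetical)."""

    def __init__(self):
        self._accounts = {}   # login name -> (salt, password hash)

    def register(self, login, password):
        # Store a salted hash rather than the password itself, so that a
        # stolen database does not directly reveal the passwords.
        salt = os.urandom(16)
        self._accounts[login] = (salt, _hash_password(password, salt))

    def verify(self, login, password):
        """Grant (True) or deny (False) access for the supplied login information."""
        record = self._accounts.get(login)
        if record is None:
            return False
        salt, stored = record
        return hmac.compare_digest(stored, _hash_password(password, salt))


if __name__ == "__main__":
    store = AccountStore()
    store.register("alice", "correct horse")
    print(store.verify("alice", "correct horse"))
    print(store.verify("alice", "wrong guess"))
```

Note that the check works from any computer, which is the advantage over cookies and IP addresses mentioned above, and that the stored records are exactly what an attacker breaking into the server would be after.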

Authentication systems are generally used only when users want some privacy on the web server or wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or merit) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for one particular case, but also to allow the ratio to change dynamically depending on the case.

Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think that this question is somewhat outside the scope of our paper. However, such ethical arguments must be addressed before design begins, because a system that compromises individual privacy and protection cannot be utilized. Several further questions also need to be answered:

* How can we make sure attribution is properly achieved?
* Who should attribute whom (or what), and why?
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents?
* How much can the cooperation of intermediate systems contribute to achieving attribution?
* Should there be consequences when an action is attributed to an agent? If so, what are they (punishment, reward, etc.)?
* How should we deal with misleading data sources that hide behind botnets and conceal identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing a client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for better-targeted web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS and DDoS attacks, computer fraud, forgery and identity theft, sniffing of private traffic, distribution of illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried by each packet, determining the geographical location of that address, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
Many existing methods, such as IP traceback, attempt to identify the source of an act, but identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes the link between an IP address and a person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they may not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of making attacks anonymous: the attacker reaches the victim through multiple arbitrary public agents (the stepping stones) in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated these requirements into three groups. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for the sake of completeness: an internet attribution system needs to attribute or, more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure (it might turn out to be one person after all). In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the party (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, some person or several persons, human beings, are responsible for an action. This is essential because, as practice shows, determining the immediate source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.

It is easy to imagine a system that tolerates no crime or misuse at all, and many writers and movie directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and most exist for good reasons. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to simply design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will be under a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As mentioned above, such a system (and this could be said about any global system on the internet) should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment should also be more secure (software bugs and design mistakes can be fixed while the system is still small in scale), so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns where to draw the line between dynamic adaptability and universality. Fortunately, that level of detail is beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.) or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses that body suffers under the current lack of attribution (e.g., from DoS, identity theft, etc.).

==Practice==

Attribution mapping should not be a bijection; in other words, it should be possible to map an action to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did (or does) person X do?". Not only should the system not know the answer, it should not be possible to find the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but we state it as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a good idea to put the names of persons into the traceability database. It makes much more sense to store a unique ID for anybody who uses the network. When a crime is committed or, more generally, when the agent of some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (a government, corporation, police force, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.
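The separation between acts, IDs and real names can be illustrated with a small Python sketch. Everything here is hypothetical: the class names, the method names and the boolean flag that stands in for "enough evidence". The sketch only shows the shape of the interfaces; a real system would additionally need a storage design in which the reverse lookup (from an ID to all of its acts) is actually infeasible, not merely absent from the interface.

```python
class TraceLog:
    """Held by one network operator: maps act identifiers to pseudonymous IDs."""

    def __init__(self):
        self._pseudonym_by_act = {}

    def record(self, act_id, pseudonym):
        self._pseudonym_by_act[act_id] = pseudonym

    def who_did(self, act_id):
        """Answer "who did act X?" with an ID only, never with a name."""
        return self._pseudonym_by_act.get(act_id)

    # Deliberately no method that answers "what did ID X do?".


class TrustedEntity:
    """Holds the mapping from IDs to real names, kept apart from any trace log."""

    def __init__(self):
        self._name_by_pseudonym = {}

    def enroll(self, pseudonym, real_name):
        self._name_by_pseudonym[pseudonym] = real_name

    def reveal(self, pseudonym, has_legal_order):
        if not has_legal_order:
            raise PermissionError("identity is revealed only with sufficient evidence")
        return self._name_by_pseudonym[pseudonym]


if __name__ == "__main__":
    entity = TrustedEntity()
    entity.enroll("id-7f3a", "Alice Example")
    log = TraceLog()
    log.record("act-001", "id-7f3a")
    suspect_id = log.who_did("act-001")
    print(suspect_id, entity.reveal(suspect_id, has_legal_order=True))
```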

Of course, a body trusted by everyone does not always exist, but generally there are governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and does not allow either side to misuse the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". First, we define some terms that will be used within this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has network access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a homemade chip that can communicate externally, over a wired or wireless link, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by that device. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of granting intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply consists of adding new ISs to the GDDB.
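The definitions above do not say how the bits of an IS are derived. One possible construction, shown here purely as an assumption, is to hash a digitized biometric template together with the device's MAC address. The function names are hypothetical, and the sketch ignores a real difficulty: biometric readings are noisy, so the template would have to be stabilized before hashing.

```python
import hashlib


def parse_mac(mac):
    """Convert a MAC address such as 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly six octets")
    return octets


def make_identification_stamp(biometric_template, mac):
    """Bind a human identifier to a device feature in one fixed-length stamp.

    biometric_template: bytes of a stabilized iris or fingerprint template
    (assumed to be given).
    """
    digest = hashlib.sha256()
    # Length-prefix the template so that the boundary between the two
    # inputs is unambiguous.
    digest.update(len(biometric_template).to_bytes(4, "big"))
    digest.update(biometric_template)
    digest.update(parse_mac(mac))
    return digest.digest()


if __name__ == "__main__":
    stamp = make_identification_stamp(b"example-template", "aa:bb:cc:dd:ee:ff")
    print(stamp.hex())
```

With this construction the same person with two devices obtains two different stamps, which matches the idea that an IS identifies a human-device pairing (an Ag) rather than a human alone.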

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications using protocols such as HTTP, FTP, SIP and RTP, or VoIP applications. The owner is indirectly responsible when he runs a system in the background that makes external calls (over the internet), is automated for periodic communication, or responds automatically to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, and ICMP packets may be sent to probe the status of a specific host.

The scope of this framework is attribution over the internet only. It does not cover "locally" defined networks, i.e., those falling under the IEEE standard definitions of the PAN, LAN, MAN or WAN topologies, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its methodology of operation, and a list of pros, cons and vulnerabilities of the system, and conclude with a discussion of the proposed framework.
==Assumptions==
Firstly: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement: it is deemed the primary investigator and also the authority with jurisdiction over all kinds of cyber crime and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from being a single point of failure. A distributed entity would likely perform better, as it would be able to scale with the growth in the number of users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric-key encryption should be used to protect this system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it for both read and write operations. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be relied on heavily for every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the way the GDDB subsystems communicate with one another.
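The two access levels described above can be sketched as a small in-memory stand-in for the GDDB. This is only an illustration of the access rule: the names <code>Role</code> and <code>GDDB</code> and their methods are hypothetical, the role argument stands in for an already authenticated caller, and encryption, distribution and latency are all left out.

```python
from enum import Enum


class Role(Enum):
    ROUTER = "router"                  # may only read
    TRUSTED_ENTITY = "trusted entity"  # may read and write


class GDDB:
    """In-memory stand-in for the globally distributed database of ISs."""

    def __init__(self):
        self._stamps = set()

    def contains(self, role, stamp):
        """Read operation: allowed for both routers and the trusted entity."""
        if not isinstance(role, Role):
            raise PermissionError("caller is not an authenticated GDDB user")
        return stamp in self._stamps

    def add(self, role, stamp):
        """Write operation: allowed for the trusted entity only."""
        if role is not Role.TRUSTED_ENTITY:
            raise PermissionError("only the trusted entity may write to the GDDB")
        self._stamps.add(stamp)


if __name__ == "__main__":
    db = GDDB()
    db.add(Role.TRUSTED_ENTITY, b"stamp-A")
    print(db.contains(Role.ROUTER, b"stamp-A"))
    try:
        db.add(Role.ROUTER, b"stamp-B")
    except PermissionError as error:
        print("rejected:", error)
```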

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and is the one who would be accused if the Md were found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can be owned by only one person.
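The one-to-many ownership rule can be expressed as a simple data structure: storing exactly one owner per machine enforces the "only one person" side, while nothing limits how many machines point to the same owner. The class and identifier names below are hypothetical.

```python
class OwnershipRegistry:
    """Records which person officially owns which machine (Md)."""

    def __init__(self):
        self._owner_of = {}   # machine id -> owner id (at most one owner)

    def register(self, machine_id, owner_id):
        current = self._owner_of.get(machine_id)
        if current is not None and current != owner_id:
            raise ValueError(f"{machine_id} is already owned by {current}")
        self._owner_of[machine_id] = owner_id

    def owner(self, machine_id):
        """Return the single responsible owner, or None if unregistered."""
        return self._owner_of.get(machine_id)

    def machines_of(self, owner_id):
        """A person may own any number of machines."""
        return sorted(m for m, o in self._owner_of.items() if o == owner_id)


if __name__ == "__main__":
    registry = OwnershipRegistry()
    registry.register("laptop-1", "person-A")
    registry.register("phone-1", "person-A")
    print(registry.machines_of("person-A"))
    print(registry.owner("laptop-1"))
```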

Finally: IP packets. Our proposed framework assumes that the network layer adds a header to the format of IP packets that carries the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine, i.e., claiming that some machine "X" belongs to some human "Y" when in reality "Y" is not the owner of "X".

Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving force behind the delivery of all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a homemade device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with his IS so that it can be added to the header of the packets he launches. The user should keep his unique IS secret and treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. It does so by consulting the GDDB, that is, by sending the GDDB a copy of the IS carried by the packet. If a packet is found to have no IS, it is denied ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with success, the IS carried by the packet is verified; the packet then benefits from ISS and is routed on its way.
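The router's three-way decision can be sketched as follows. The packet is modelled as a plain dictionary and the GDDB lookup as a function passed in by the caller; both are simplifications assumed for the example, as are the names <code>Verdict</code> and <code>route_decision</code>.

```python
from enum import Enum


class Verdict(Enum):
    FORWARD = "forward"
    DROP_NO_IS = "drop: packet carries no IS"
    DROP_INVALID_IS = "drop: IS not found in the GDDB"


def route_decision(packet, gddb_contains):
    """Decide whether a packet benefits from ISS.

    packet: dict that may hold the stamp under the (hypothetical) key "is".
    gddb_contains: callable that returns True if the GDDB knows the stamp.
    """
    stamp = packet.get("is")
    if not stamp:
        return Verdict.DROP_NO_IS          # unattributed packet
    if not gddb_contains(stamp):
        return Verdict.DROP_INVALID_IS     # forged or unlicensed stamp
    return Verdict.FORWARD                 # verified: route the packet


if __name__ == "__main__":
    licensed = {b"stamp-A"}                # stand-in for the GDDB contents
    for packet in ({"is": b"stamp-A"}, {"payload": b"hi"}, {"is": b"forged"}):
        print(route_decision(packet, licensed.__contains__).value)
```

In this sketch a lookup happens once per packet per router, which makes concrete why the framework needs near-zero read latency from the GDDB.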

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves a level of attribution that is acceptable relative to that achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is available only to trusted entities.
** Hence, it retains personal privacy.
* The system is fully automated. According to its theory of operation, ISS are provided or withheld based on the validation of the IS carried by each packet.
* The system prevents all forms of cyber crime that are committed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution information is not public, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.
* For security purposes, licenses should be periodically renewable; however, renewal is not an easy problem.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets entering their machines in order to avoid the distribution of malware and the subsequent formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause failure of the whole system. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets to be the responsibility of some other Ag, i.e., commit forgery.

==Discussion==
The proposed framework's main focus is to ensure that any packet in transit is moving only because it is known to whom it belongs. Recall that in the real world, a person without an identity (such as a social insurance number) cannot benefit from services. For instance, he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently exceeds that of the Internet. Compared with real-world attribution, current Internet attribution can be considered a failure. A form of Internet attribution would be acceptable if it provides at least as much attribution as the real world does. We could say that the proposed framework would guarantee such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else it will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available. More specifically, it is available only to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but the Internet can still operate if the framework is suspended. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes. That is because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the framework's theory of operation does not permit mapping a given Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot Steam Trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the ancestor of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, which must therefore be approved by the government before a person may drive it, and must also be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and rapidly growing number of applications that are time and data sensitive, and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industry and technology applications are now built over the Internet as their underlying infrastructure, and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty of achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival and their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and should be able to achieve an acceptable level of Internet attribution. The advantages, disadvantages and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

A link to the paper

2011-04-11T17:36:37Z

Raghad: /* Why do we need Attribution */

=Title=
Proposed titles:
* Requirements for Attribution on the Internet
* Internet Attribution: Between Privacy and Cruciality

=Abstract=
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans. Other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not practical to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In this context, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research in attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them licensed access to the system weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role in the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which sadly receives little attention in current research.

This paper starts with a brief discussion of the dilemma of attribution, namely resolving the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, together with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, attributing an act to an agent (software, device, etc.) and then attributing the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, like the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology used in it.

=Background=

The problem of attribution is not new; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what each particular system aims to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the header part of the packet. All this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which is saved by the browser and resent on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user's control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to change the browser settings so that cookies are not accepted at all.
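
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is a minimal illustration under stated assumptions: the cookie name <code>visitor_id</code>, the one-hour lifetime and the <code>handle_request</code> function are hypothetical, and no real web server is involved.

```python
from http.cookies import SimpleCookie
import uuid


def handle_request(cookie_header):
    """Return (visitor_id, set_cookie_header_or_None) for one request."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    if "visitor_id" in jar:
        # Returning browser: reuse the identifier it sent back.
        return jar["visitor_id"].value, None
    # First visit: mint an identifier and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    response = SimpleCookie()
    response["visitor_id"] = visitor_id
    response["visitor_id"]["max-age"] = 3600  # browser discards it after one hour
    return visitor_id, response.output(header="Set-Cookie:")


# First request carries no cookie; the second resends what the server issued.
first_id, set_cookie = handle_request(None)
second_id, nothing = handle_request(f"visitor_id={first_id}")
```

The server recognizes the visitor only because the browser chooses to send the value back. A user who deletes or edits the cookie simply appears as a new or different visitor.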

Cookies can either have an expiration date or not; this is the date that the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to automatically set how long you want cookies to be stored.

===Cookies as an Attribution System===
If we consider cookies as the type of attribution system we are looking for over the Internet, we can achieve high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, the use of cookies is not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), each responsible for allocating IP address blocks to the ISPs in its assigned region, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of online devices owned by each user have grown. One of the more common examples of this is the increase in Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2<sup>32</sup> addresses (4,294,967,296), which is less than the number of people on this planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011 [1]. This exhaustion had been foreseen since the 90s, which prompted the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
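
The address-space figures in the paragraph above can be checked with a few lines of Python. The variable names are illustrative only; the <code>ipaddress</code> module is part of the standard library.

```python
import ipaddress

ipv4_total = 2 ** 32   # 32-bit addresses
ipv6_total = 2 ** 128  # 128-bit addresses

# The same totals derived from the standard library's network objects.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == ipv4_total
assert ipaddress.ip_network("::/0").num_addresses == ipv6_total

print(ipv4_total)                # 4294967296
print(ipv6_total // ipv4_total)  # 79228162514264337593543950336, i.e. 2**96
```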

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses can even foil IP traceback efforts.

==Authentication Systems==
For a website to be sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this approach is that attribution can be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking into it to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server's security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself” and enters his personal login information. The web server checks this information against what it has stored in its database and either grants or denies access to the user's personal page.
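
The verification step above can be sketched as follows. This is a minimal sketch, not a description of any particular bank's system: the in-memory <code>accounts</code> dictionary stands in for the server's database, and storing a salted PBKDF2 hash instead of the password itself is an assumption introduced here as common practice, not something the text specifies.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor (assumption)


def make_record(password: str):
    """Create the (salt, digest) pair the server stores for a new account."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify(password: str, record) -> bool:
    """Recompute the digest from the login attempt and compare in constant time."""
    salt, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


accounts = {"alice": make_record("correct horse")}  # hypothetical user
granted = verify("correct horse", accounts["alice"])
denied = verify("wrong guess", accounts["alice"])
```

Keeping only a salted hash limits what an attacker learns from a stolen database, which is relevant to the breach risk noted above.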

Authentication systems are only used when users want some privacy on the web server, or when they wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in rather abstract terms, privacy is inversely proportional to attribution. When designing an attribution system, one needs not merely to fix this ratio for one particular case, but to allow the ratio to change dynamically depending on the case.

Assuming this ratio is found, another question is when to use private information to track or punish a person, that is, when to directly intrude on their privacy. One might think that this question is somewhat outside the scope of our paper. This is true; however, this and many less obviously related questions should be answered before designing, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.

(here go other questions. will show connection to requirements)

* While designing an attribution system one needs to consider balancing between attribution and privacy.
** Sometimes non-attribution is crucial, to protect political dissidents and whistle-blowers.
* When to decide to track a person and when not to (so as not to intrude privacy)?
* How to make sure attribution is properly achieved?
* Who should attribute who/what and why?
* How far can we trust IP traceback, stepping stone authentication, link identification and packet filtering in binding packets to agents?
* How much can intermediate systems' cooperation contribute to achieving attribution?
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing a client's identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for more targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal and undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Balancing security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of the senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, there are problems with trying to identify the source by its IP address. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation of intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets, and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and the distribution of malware. The stepping stone attack is a common way of making an attack anonymous: the attacker uses multiple random public agents (as stepping stones) to reach the victim in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated these requirements into three groups. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for the sake of completeness: an internet attribution system needs to attribute or, more formally, any potentially destructive act should be traceable to an agent (a person and/or an organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, some person or several persons, human beings, are responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system, it is much harder to design a system, deployment of which need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, internet should not depend on it entirely and in case attribution system goes down, underlying network should still remain functional. In other words, attribution system should be loosely coupled to the system it works in.

As discussed before, (and this could be said about any global system on the internet) such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of virtual impossibility to restart or reconfigure the whole internet at once. This incremental way of embedding an attribution system should be more secure (bugs in software and mistakes in design can be fixed while on a small scale), so that by the end of cycle, when the whole internet is wired, the attribution system is field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases, while at the same time remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns the line between dynamic adaptability and universality. Fortunately, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.) or, in other words, spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).

==Practice==

Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to know the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.
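
One way to picture such a one-directional mapping is a ledger that can be queried only by someone who already holds the full description of an act. The sketch below is purely illustrative and is not the mechanism proposed in this paper: the <code>ActLedger</code> class, the hash-derived index and the XOR masking are assumptions introduced here, and the protection holds only if act descriptions are hard to guess.

```python
import hashlib


def _digest(label: bytes, act: bytes) -> bytes:
    """Derive independent values from an act description via labelled hashing."""
    return hashlib.sha256(label + act).digest()


class ActLedger:
    """Hypothetical store answering 'who did act X?' but not 'what did X do?'."""

    def __init__(self):
        self._rows = {}  # hashed act -> masked agent ID

    def record(self, act: bytes, agent_id: bytes) -> None:
        if len(agent_id) != 16:
            raise ValueError("agent_id must be exactly 16 bytes")
        mask = _digest(b"mask:", act)[:16]
        masked = bytes(a ^ m for a, m in zip(agent_id, mask))
        self._rows[_digest(b"index:", act)] = masked

    def who_did(self, act: bytes):
        masked = self._rows.get(_digest(b"index:", act))
        if masked is None:
            return None
        mask = _digest(b"mask:", act)[:16]
        return bytes(a ^ m for a, m in zip(masked, mask))


ledger = ActLedger()
agent = b"agent-0000000001"  # 16-byte hypothetical ID
ledger.record(b"packet 42 sent at t=1000", agent)
found = ledger.who_did(b"packet 42 sent at t=1000")
missing = ledger.who_did(b"some other act")
```

Because the agent ID never appears in the stored rows in the clear, scanning the ledger for a given agent does not list that agent's acts, while a lookup starting from a known act succeeds.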

Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to anybody who uses the network. If a crime is committed or, more generally, the agent behind some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (government, corporation, police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.

Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and does not allow either side to misuse the system.
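
The split between public traceability records and a privately held ID-to-name mapping can be pictured with a minimal Python sketch. Everything here is an illustrative assumption rather than part of the requirement itself: the class name <code>TrustedEntity</code>, the <code>evidence_ok</code> flag standing in for a legal review, and the random 16-hex-character IDs.

```python
import secrets


class TrustedEntity:
    """Hypothetical escrow: the only holder of the ID -> real-name mapping."""

    def __init__(self):
        self._names = {}  # pseudonymous ID -> real name, never published

    def register(self, real_name):
        """Issue a random ID that reveals nothing about the person."""
        pid = secrets.token_hex(8)
        self._names[pid] = real_name
        return pid

    def reveal(self, pid, evidence_ok):
        """Disclose a name only when enough evidence has been presented."""
        if not evidence_ok:
            raise PermissionError("insufficient evidence to reveal identity")
        return self._names[pid]


entity = TrustedEntity()
alice_id = entity.register("Alice Example")

# The public traceability record is keyed by act and holds only the ID,
# so it answers "who did act X?" but never stores a real name.
trace_log = [("act-42", alice_id)]
```

In this sketch the name is recoverable only through <code>reveal</code>, which models the condition that the mapping is disclosed only when there is enough evidence to do so.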

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an Md. For an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by that device. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, worldwide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
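
The definitions above do not say how the bits of an IS are derived. As a minimal sketch under that caveat, one possible construction hashes a biometric template together with the device's MAC address. The function name, the use of SHA-256 and the separator byte are assumptions made purely for illustration.

```python
import hashlib


def make_identification_stamp(biometric_template, mac_address):
    """Bind a human identifier to a device feature (hypothetical scheme).

    biometric_template: bytes derived from an iris scan or fingerprint.
    mac_address: the Md's MAC address, e.g. "AA-BB-CC-DD-EE-FF".
    """
    # Normalize the MAC so the same card always yields the same stamp.
    mac = mac_address.lower().replace("-", ":").encode("ascii")
    # Hash the biometric first so the raw template is not embedded in the IS.
    human_digest = hashlib.sha256(biometric_template).digest()
    return hashlib.sha256(human_digest + b"|" + mac).hexdigest()


stamp = make_identification_stamp(b"iris-template", "AA-BB-CC-DD-EE-FF")
```

The same person paired with a different device, or the same device paired with a different person, yields a different stamp, which matches the idea that an IS identifies one Ag (one human-device pairing).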

In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications using protocols such as HTTP, FTP, SIP and RTP, or VoIP applications. The owner is indirectly responsible when running a system in the background that performs external (over the internet) calls, or that is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, and ICMP packets may be sent to probe the status of a specific host.

The scope of this framework is limited to attribution over the internet. It does not cover "locally" defined networks (PAN, LAN, MAN or WAN topologies, in the IEEE sense) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its methodology, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.
==Assumptions==
First: jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement, serving as the primary inspector and as the jurisdiction for regulating all kinds of cyber crimes and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would perform better, as it would be able to scale with the growing number of system users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric-key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both users must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be relied on heavily for every single hop made by a packet through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics, as well as the way in which the GDDB subsystems communicate with one another.
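
The two access levels in this assumption can be illustrated with a small in-memory Python sketch. It deliberately leaves out encryption, authentication and distribution, and models only the rule that routers may read while the trusted entity may read and write. The class name <code>GDDB</code> and the role strings are hypothetical.

```python
class GDDB:
    """Toy, single-node stand-in for the Globally Distributed Database."""

    ROUTER = "router"                  # read-only role
    TRUSTED_ENTITY = "trusted_entity"  # read/write role

    def __init__(self):
        self._stamps = set()  # the stored ISs

    def contains(self, role, stamp):
        """Read operation: allowed for both recognized roles."""
        if role not in (self.ROUTER, self.TRUSTED_ENTITY):
            raise PermissionError("unknown role")
        return stamp in self._stamps

    def add(self, role, stamp):
        """Write operation: allowed for the trusted entity only."""
        if role != self.TRUSTED_ENTITY:
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)


db = GDDB()
db.add(GDDB.TRUSTED_ENTITY, "is-0001")
print(db.contains(GDDB.ROUTER, "is-0001"))  # True
```

A real deployment would replace the role string with an authenticated credential; the sketch only shows where the read/write boundary sits.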

Thirdly: ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md, and would be held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving power behind the delivery of all packets, malicious or benign, they carry great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with the IS so that it can be added to the header of the packets the user launches. The user should keep this unique IS secret and should treat it exactly as he would his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. This is done by sending a copy of the IS printed on the packet to the GDDB. If the packet has no IS, it is denied ISS and simply dropped. If the GDDB replies that the IS is invalid, the packet is again dropped. If the GDDB replies with a success, the packet's printed IS is verified; the packet then benefits from ISS and is routed along its way.
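
The licensing step and the per-packet check described above can be sketched in Python as follows. This is a simplified model, not a router implementation: the GDDB is reduced to a local set, the <code>Packet</code> type and the function names are hypothetical, and the network round trip to the GDDB is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    stamp: Optional[str] = None  # the IS carried in the assumed extra header


def license_device(gddb, stamp):
    """Licensing: the trusted entity adds a new IS to the GDDB."""
    gddb.add(stamp)
    return stamp


def handle_packet(gddb, packet):
    """Router decision: provide ISS (forward) or withhold it (drop)."""
    if packet.stamp is None:
        return "drop"      # unattributed packet
    if packet.stamp not in gddb:
        return "drop"      # IS unknown to the GDDB, treated as fake
    return "forward"       # verified IS, packet is routed


gddb = set()
owner_stamp = license_device(gddb, "is-0001")

print(handle_packet(gddb, Packet(b"hello", owner_stamp)))  # forward
print(handle_packet(gddb, Packet(b"hello")))               # drop
print(handle_packet(gddb, Packet(b"hello", "is-forged")))  # drop
```

Note that a membership test of this kind catches imaginary stamps, but it cannot by itself detect a valid IS that has been copied by someone other than its owner.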

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is only available to trusted entities.
** Hence, it retains personal privacy.
* The system enjoys full automation. According to the system's theory of operation, ISS are provided or withheld based on the validation of the IS printed on each packet.
* The system prevents all forms of cyber crime that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* The verification of the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation is not achievable.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause whole-system failure. If the attacker succeeds in altering the GDDB, he can append an imaginary IS. If the attacker succeeds in reading it, he can declare his malicious packets under the responsibility of some other Ag, i.e., commit forgery.
* For security purposes, licenses should be periodically renewable; however, this is not an easy problem.

==Discussion==
The proposed framework's main focus is to ensure that any leaping packet moves only because it is known to whom it belongs. Recall that in the real world, a person without a social insurance number cannot benefit from services. For instance, he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently beats that of the Internet. Compared to real-world attribution, current Internet attribution can be considered a failure. A form of Internet attribution would be acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would achieve such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else the act will not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; it is only available to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the costs of deploying the system are, they should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the proposed framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, traceability information cannot be collected in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, a forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, and therefore that driving one must be approved by the government and that each car must be formally linked to an owner who is considered primarily responsible for it.

Today, the Internet is passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like a driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast, and rapidly growing, number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industry and technology applications are now built over the Internet as their underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars

[2] Wikipedia Website

A link to the paper

2011-04-11T17:36:13Z

Raghad: /* Why is it difficult to achieve attribution? */

=Title=
Proposed titles:
* Requirements for Attribution on the Internet
* Internet Attribution: Between Privacy and Cruciality

=Abstract=
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of those works revolve around the notion of using machine learning to link articles to humans. Others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit that feature. The lack of online identification, married with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems like IP address spoofing, the lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which has sadly received little attention in current research.

This paper starts with a quick discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with their flaws and points of failure. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, attributing an act to an agent (software, device, etc.) and then attributing the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks, like the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gotten away scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some basic knowledge of computer science or computer systems is required in order to fully understand some of the concepts and terminology used.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what each particular system aims to achieve.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites will sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server’s cookie in the header of the request. All this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which will be saved by the browser and resent with the next request. Cookies are often encrypted for data security and information privacy; however, they remain under the user's control, as they can be decrypted, modified and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all.

Cookies can either have an expiration date or not; this is the date that the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to automatically set how long you want cookies to be stored.
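
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. The cookie name <code>session_id</code>, its value and the one-hour lifetime are made-up values for illustration; no real server or browser is involved.

```python
from http.cookies import SimpleCookie

# Server side: the first response carries a Set-Cookie header.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
response_cookie["session_id"]["max-age"] = 3600  # omit for a session cookie
set_cookie_header = response_cookie["session_id"].OutputString()
print("Set-Cookie:", set_cookie_header)

# Browser side: later requests send the stored value back in a Cookie header.
request_cookie_header = "session_id=abc123"

# Server side again: parse the incoming header to recognize the visitor.
incoming = SimpleCookie()
incoming.load(request_cookie_header)
visitor = incoming["session_id"].value
print("Recognized visitor:", visitor)
```

Because the browser decides what to put in the <code>Cookie</code> header, a user can alter or drop the value at will, which is the weakness for attribution purposes.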

===Cookies as an Attribution System===
If we consider cookies as the type of attribution system we are looking for over the Internet, we would be able to achieve high precision in identifying computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners, etc.) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user owns have grown. One of the more common examples of this is the rise of Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) devices, which is less than the number of people on this planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011 [1]. This had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
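
The address-space figures can be checked with a few lines of Python using the standard <code>ipaddress</code> module; the variable names are arbitrary.

```python
import ipaddress

# Every possible IPv4 address (32 bits) and IPv6 address (128 bits).
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(ipv4_total)                          # 4294967296, i.e. 2**32
print(ipv6_total == 2**128)                # True
print(ipv6_total // ipv4_total == 2**96)   # True
```

In other words, the 128-bit space holds 2<sup>96</sup> times as many addresses as the 32-bit one.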

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through configuration. A dynamic IP address is one in which a new address may be assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. There are two main advantages to dynamic addressing: it eliminates the administrative cost involved in assigning static IP addresses, and it helps solve the issue of limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
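
How several devices can "share" one address over time can be illustrated with a toy address pool in Python. This is not an implementation of DHCP; the class <code>AddressPool</code>, its methods and the device names are hypothetical, and the address comes from a range reserved for documentation.

```python
class AddressPool:
    """Toy model of dynamic addressing: hand out free addresses on demand."""

    def __init__(self, addresses):
        self._free = list(addresses)
        self._leases = {}  # device name -> address currently held

    def connect(self, device):
        """Give the device an address, reusing its current one if any."""
        if device in self._leases:
            return self._leases[device]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        address = self._free.pop(0)
        self._leases[device] = address
        return address

    def disconnect(self, device):
        """Return the device's address to the pool."""
        self._free.append(self._leases.pop(device))


pool = AddressPool(["203.0.113.7"])
first = pool.connect("laptop")
pool.disconnect("laptop")
second = pool.connect("phone")
print(first == second)  # True: two devices used the same address in turn
```

The same property that saves address space also means an address alone does not identify a device unless the assignment time is known as well.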

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for a few reasons, mainly that attackers can spoof their IP addresses. Spoofing IP addresses will even foil IP traceback efforts.

==Authentication Systems==
In order for a website to verify the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking into the server to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later goes to the website, he is requested to “identify himself”. The user enters his personal login information, and the web server verifies this information against what it has stored in its database and either grants or denies access to the user's personal page.
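
The verification step can be sketched in Python with the standard library. The text does not say how the server stores login information; storing a salted PBKDF2 hash, as done here, is an assumption chosen because it is common practice, and the function names, the iteration count and the in-memory <code>accounts</code> dictionary are likewise illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor


def create_record(password):
    """Account setup: store a random salt and a derived hash, not the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password, salt, digest):
    """Login: recompute the hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)


accounts = {"alice": create_record("correct horse")}

salt, digest = accounts["alice"]
print(verify("correct horse", salt, digest))  # True: access granted
print(verify("wrong guess", salt, digest))    # False: access denied
```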

Authentication systems are typically only used when users want some privacy on the web server, or when a user wishes to store some form of information on the web server.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a fairly abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but also to allow the ratio to change dynamically depending on the case.

Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, as to directly intrude their privacy? One might think that this question is a little bit out of the scope of our paper. This is true, however, these and a lot of less obviously related questions should be answered prior to designing, because in such an important thing as protection and privacy, designing of solution should not make too many assumptions and should guarantee something not only to operators of the system, but for users as well. In other words, even though system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee some law-related and moral principles.

(here go other questions. will show connection to requirements)

* While designing an attribution system one needs to consider balancing between attribution and privacy.
** Sometimes non-attribution is crucial, for example to protect political dissidents and whistle-blowers.
* When to decide to track a person and when not to (so as not to intrude privacy)?
* How to make sure attribution is properly achieved?
* Who should attribute whom or what, and why?
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents?
* How much can intermediate systems' cooperation contribute to achieving attribution?
* Should there be consequences once an action is attributed to an agent? What are they (punishment, reward, etc.)?
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Its identification property can be useful for establishing a client's identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for better-targeted web advertisements.
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of the senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried by each packet, determining the geographical location of this IP address, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes the link between an IP address and a person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets, and other actions taken by their machine without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

There are attacks that are designed specifically to prevent correct attribution. They are used for identity theft and the distribution of malware. The stepping-stone attack is a common way of making an attack anonymous: the attacker reaches the victim through multiple random public agents (the stepping stones) in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated the requirements into three groups. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as large as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for the sake of completeness: an internet attribution system needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the immediate source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which preventing crime and misuse overrides every other consideration, and many writers and movie directors exploit this idea in futuristic, science-fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some of them are not perfect, but they are widely accepted and most have reasons to exist. The attribution system that we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment should also be more secure, since software bugs and design mistakes can be fixed while the system is still on a small scale, so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases, while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Fortunately, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them. Some of these issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g. from DoS, identity theft, etc.).

==Practice==

The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to obtain the answer from it; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated separately, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a good idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to any body that uses the network. If a crime is committed, or, more generally, whenever the agent of some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (government, corporation, police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.
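
The separation between recorded IDs and real names can be sketched as follows. This is only a minimal illustration under stated assumptions, not a design: the class name "TrustedEntity", the "act_log" dictionary and the "evidence_approved" flag are hypothetical names introduced here, and the distribution of traceability data across many nodes is not modelled.

```python
import secrets


class TrustedEntity:
    """Hypothetical trusted body that alone holds the ID-to-name mapping."""

    def __init__(self):
        self._names = {}  # opaque ID -> real name, never published

    def register(self, real_name):
        # Issue a random ID that reveals nothing about the person.
        opaque_id = secrets.token_hex(16)
        self._names[opaque_id] = real_name
        return opaque_id

    def reveal(self, opaque_id, evidence_approved):
        # The mapping is disclosed only when there is enough evidence.
        if not evidence_approved:
            raise PermissionError("not enough evidence to reveal identity")
        return self._names[opaque_id]


entity = TrustedEntity()
alice_id = entity.register("Alice Example")

# The traceability record is keyed by act and stores only the opaque ID,
# so it answers "who did act X?" without containing any real name.
act_log = {"act-001": alice_id}
```

Because "act_log" is keyed by act and holds only opaque IDs, turning a recorded act into a name requires the cooperation of the trusted entity.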

Of course, a body trusted by everyone does not always exist, but generally there are governments or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and prevents either side from misusing the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources, nor can it be routed, if it is anonymously bound". We start by defining some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a simple home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by that device. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g. routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global DNS-like world-wide distributed storage system with an encrypted LUT that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
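
As an illustration of the IS definition above, the sketch below derives a stamp by hashing a human identifier together with a device identifier. The paper does not specify how an IS is constructed, so the use of SHA-256, the length-prefixed encoding, the function name "make_identification_stamp" and the sample inputs are all assumptions made here for illustration.

```python
import hashlib


def make_identification_stamp(human_id: bytes, device_id: bytes) -> str:
    """Bind a human identifier to a device identifier (hypothetical scheme)."""
    # Length-prefix each field so that different (human, device) pairs
    # can never produce the same concatenated byte string.
    material = (
        len(human_id).to_bytes(4, "big") + human_id
        + len(device_id).to_bytes(4, "big") + device_id
    )
    return hashlib.sha256(material).hexdigest()


# Placeholder inputs: a stand-in for a biometric template and a MAC address.
owner_template = b"fingerprint-template-of-owner"
mac_address = bytes.fromhex("001a2b3c4d5e")
stamp = make_identification_stamp(owner_template, mac_address)
```

The same pair always yields the same stamp, while changing either the human or the device yields a different one, which is the binding property the definition asks for.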

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when running a system in the background that performs external (over the internet) system calls, or that is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, ICMP packets may be sent to probe the status of a specific host, and so on.

This framework addresses only attribution over the internet. It does not cover "locally" defined networks (PAN, LAN, MAN or WAN topologies as defined by the IEEE standards) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its methodology, and a list of its pros, cons and vulnerabilities, and conclude with a discussion of the proposed framework.
==Assumptions==
First: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., a government). Such an entity should act as the Internet's law enforcement, serving as the primary inspector and as the jurisdiction for regulating all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in the number of users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect this system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both kinds of user must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be consulted at every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the communication between the GDDB subsystems.
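
The access rule in this assumption (routers may only read, the trusted entity may read and write) can be sketched as below. This is a toy in-memory model under stated assumptions: the class name "GDDB", the role strings "router" and "trusted_entity", and the method names are hypothetical, and the encryption, authentication and distribution that the assumption calls for are deliberately left out.

```python
class GDDB:
    """Toy in-memory stand-in for the globally distributed database."""

    READERS = {"router", "trusted_entity"}
    WRITERS = {"trusted_entity"}

    def __init__(self):
        self._stamps = set()

    def contains(self, role, stamp):
        # Read path: used by routers at every hop, and by the trusted entity.
        if role not in self.READERS:
            raise PermissionError(f"role {role!r} may not read the GDDB")
        return stamp in self._stamps

    def add(self, role, stamp):
        # Write path: licensing, performed only by the trusted entity.
        if role not in self.WRITERS:
            raise PermissionError(f"role {role!r} may not write to the GDDB")
        self._stamps.add(stamp)


gddb = GDDB()
gddb.add("trusted_entity", "stamp-A")
router_sees_a = gddb.contains("router", "stamp-A")
```

In a real deployment the role would be established by the strict authentication the assumption requires, rather than passed in as a plain string.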

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can only be owned by one person.
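
The one-to-many ownership rule can be expressed as a small registry that stores exactly one owner per machine. The class name "OwnershipRegistry" and the sample identifiers are hypothetical and serve only to illustrate the constraint.

```python
class OwnershipRegistry:
    """One owner per machine; an owner may hold many machines."""

    def __init__(self):
        self._owner_of = {}  # machine ID -> owner ID

    def register(self, machine_id, owner_id):
        current = self._owner_of.get(machine_id)
        if current is not None and current != owner_id:
            # A machine can only be owned by one person.
            raise ValueError(f"{machine_id} is already owned by {current}")
        self._owner_of[machine_id] = owner_id

    def owner(self, machine_id):
        return self._owner_of.get(machine_id)

    def machines(self, owner_id):
        return sorted(m for m, o in self._owner_of.items() if o == owner_id)


registry = OwnershipRegistry()
registry.register("laptop-1", "person-Y")
registry.register("phone-1", "person-Y")
```

Keying the mapping by machine makes the "only one owner" rule structural: a second owner for the same machine cannot be stored without replacing the first.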

Finally: IP packets. Our proposed framework assumes that the network layer adds a header to the frame format of each IP packet that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
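
The three ways in which an IS can be fake can be checked one after another, as in the sketch below. The function name "classify_stamp", the returned strings and the in-memory sets and dictionary standing in for the licensing records are assumptions made here for illustration.

```python
def classify_stamp(human_id, device_id, known_humans, known_devices, owner_of):
    """Return "valid" or the reason the (human, device) binding is fake."""
    if device_id not in known_devices:
        return "fake: imaginary Md"
    if human_id not in known_humans:
        return "fake: imaginary human"
    if owner_of.get(device_id) != human_id:
        return "fake: misleading binding"
    return "valid"


# Hypothetical licensing records.
known_humans = {"Y", "Z"}
known_devices = {"X"}
owner_of = {"X": "Z"}

# "Y" claims machine "X", but "X" actually belongs to "Z".
result = classify_stamp("Y", "X", known_humans, known_devices, owner_of)
```

Here "result" is "fake: misleading binding", which corresponds to the third case in the list.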

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB and provides the user with the IS so that it can be added to the header of the user's outgoing packets. The user should keep this unique IS secret and should treat it exactly like credit card numbers and social insurance numbers. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. This is done by consulting the GDDB, that is, by sending it a copy of the IS carried by the packet. If a packet has no IS, it is denied ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is again dropped. If the GDDB replies with success, the packet's IS is verified; the packet then benefits from ISS and is routed onward.
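
The per-packet decision made by a router can be sketched as follows. This is a simplified model under stated assumptions: a packet is represented as a dictionary with an optional "is" field, the GDDB is replaced by a local set of licensed stamps, and the function name "handle_packet" is hypothetical.

```python
def handle_packet(packet, licensed_stamps):
    """Decide whether a router forwards or drops a packet."""
    stamp = packet.get("is")
    if stamp is None:
        # No IS at all: the packet is unattributed and gets no ISS.
        return "drop"
    if stamp not in licensed_stamps:
        # The GDDB lookup reports the IS as invalid.
        return "drop"
    # The IS is verified, so the packet is routed onward.
    return "forward"


licensed_stamps = {"stamp-A"}  # stand-in for the GDDB contents
decisions = [
    handle_packet({"payload": "hello"}, licensed_stamps),
    handle_packet({"is": "stamp-forged", "payload": "hello"}, licensed_stamps),
    handle_packet({"is": "stamp-A", "payload": "hello"}, licensed_stamps),
]
```

Note that this membership test only detects stamps that were never licensed; a stolen but valid IS would pass it, which is one of the vulnerabilities discussed later.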

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is only available to trusted entities.
** Hence, it preserves personal privacy.
* The system is fully automated. According to its theory of operation, ISS are either provided or withheld based on the validation of the IS carried by each packet.
* The system prevents all forms of cyber crime that are carried out by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution information is not public, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the subsequent formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause failure of the whole system. If the attacker gains write access, an imaginary IS can be appended. If the attacker gains read access, malicious packets can be declared under the responsibility of some other Ag, which amounts to forgery.
* For security purposes, licenses should be periodically renewable; however, this is not an easy problem.

==Discussion==
The main focus of the proposed framework is to ensure that any packet in transit is moving only because it is known to whom it belongs. Recall that in the real world, a person without a social insurance number cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Compared with real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be acceptable if it provided at least as much attribution as the real world does. We believe that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else the act will not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-upon trusted entity. The framework also fulfills the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but the Internet can still operate if the framework is switched off. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the cost of deploying the system, it should still be less than the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily bounded. As for the practice requirements, the framework's theory of operation does not permit mapping a given Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. Only the trusted entities generate the IS from personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot Steam Trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the ancestor of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, that a person must therefore be approved by the government to drive it, and that it must be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and exponentially growing number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technological applications are now built on the Internet as their underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes waiting for the proper moment to strike. Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why attribution is crucial, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have given background on current attribution systems and briefly discussed the reasons for their survival as well as their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill these requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars

[2] Wikipedia Website

A link to the paper

2011-04-11T17:34:07Z

Raghad: /* Why is it difficult to achieve attribution? */

=Title=
Proposed titles:
* Requirements for Attribution on the Internet
* Internet Attribution: Between Privacy and Cruciality

=Abstract=
Present and past situations show a need for improved attribution systems, and, arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans. Other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable success level in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the real human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them LICENSED access to the system weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the temptation of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for the purpose of tracking authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a more general level of attribution, that of all possible actions to agents, which unfortunately receives comparatively little attention in current research.

This paper starts with a brief discussion of the dilemma of attribution, that is, the tension between attribution and privacy. Next, section 3 argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, attributing an act to an agent (software, device, etc.) and then attributing that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since the Internet came into widespread public use in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; because of the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some basic knowledge of computer science or computer systems is needed to fully understand some of the concepts and terminology used in it.

=Background=

The problem of attribution is not new; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but each only to the extent needed for what that particular system aims to achieve.
This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are small pieces of text that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can be used to store any type of information that can be stored as text. When a page is requested, the user's web browser sends the request with the web server's cookie in the HTTP request header. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached to it, it treats the request as the browser's first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookie contents are sometimes encrypted or signed by the server for data security and information privacy; however, they remain under the user's control, as they can be inspected, modified, and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all.

Cookies can either have an expiration date or not; this is the date that the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to automatically set how long you want cookies to be stored.
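The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is a minimal illustration under our own assumptions, not how any particular web server implements it: the cookie name <code>visitor_id</code> and the helper functions are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Optional


def issue_cookie(visitor_id: str, max_age: Optional[int] = None) -> str:
    """Server side: build the value of a Set-Cookie response header.

    Without max_age the cookie has no expiration, so the browser
    discards it when it is closed (a session cookie).
    """
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id  # hypothetical cookie name
    if max_age is not None:
        cookie["visitor_id"]["max-age"] = max_age  # lifetime in seconds
    return cookie["visitor_id"].OutputString()


def read_cookie(cookie_header: str) -> Optional[str]:
    """Server side: extract visitor_id from the Cookie request header.

    Returns None when the browser sent no such cookie, which the
    server treats as a first visit.
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get("visitor_id")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    print("Set-Cookie:", issue_cookie("abc123", max_age=3600))
    print("Returning visitor:", read_cookie("visitor_id=abc123; theme=dark"))
    print("First visit:", read_cookie(""))
```

Note that the value arriving in the <code>Cookie</code> header is whatever the client chooses to send, which is why cookies alone cannot serve as reliable attribution.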

===Cookies as an Attribution System===
Viewed as the type of attribution system we are looking for over the Internet, cookies can identify computers that access a web server with high precision. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers (32 bits long in IPv4) for devices (e.g., computers, printers, scanners) on a network. A user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), each responsible for allocating IP address blocks to the ISPs in its assigned region, which in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of internet-capable devices each user owns have grown. One of the more common examples of this is the rise of Internet-ready mobile phones. The addressing scheme of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely identify only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011 [1]. This exhaustion had been foreseen since the 1990s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
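The arithmetic behind the address-space limit can be checked in a few lines of Python. This is only an illustrative calculation; the world population figure is our own rough assumption (about 7 billion in 2011) and does not come from a cited source.

```python
import ipaddress

# A 32-bit address field can take 2**32 distinct values.
IPV4_TOTAL = 2 ** 32
# IPv6 uses a 128-bit address field.
IPV6_TOTAL = 2 ** 128

# Cross-check against the standard library: the networks covering the
# whole IPv4 and IPv6 address spaces.
assert ipaddress.ip_network("0.0.0.0/0").num_addresses == IPV4_TOTAL
assert ipaddress.ip_network("::/0").num_addresses == IPV6_TOTAL

# Rough assumption for illustration only: about 7 billion people in 2011.
ASSUMED_WORLD_POPULATION = 7_000_000_000

if __name__ == "__main__":
    print(f"IPv4 addresses: {IPV4_TOTAL:,}")
    print(f"IPv6 addresses: {IPV6_TOTAL:,}")
    print("Fewer IPv4 addresses than people:",
          IPV4_TOTAL < ASSUMED_WORLD_POPULATION)
```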

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one where a new address may be assigned each time the device boots or connects. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to "share" a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses can even foil IP traceback efforts.

==Authentication Systems==
In order for a website to verify the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. Its biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers breaking into the server to steal login information.

Login systems are attached to user accounts, which sometimes require private information in order to be set up. If the web server's security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to "identify himself": he enters his personal login information, and the web server checks it against what is stored in its database and either grants or denies access to the user's personal page.

Authentication systems are generally used only when users want some privacy on the web server, or when they wish to store some form of information on it.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, both the clients and the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or merit) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in the abstract, privacy and attribution trade off against each other: the more of one, the less of the other. When designing an attribution system, it is not enough to fix this ratio for one particular case; the ratio needs to change dynamically depending on the case.

Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, and thereby directly intrude on their privacy. One might think that this question is somewhat outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before designing a system, because in matters as important as protection and privacy, the design should not make too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.

The following questions arise when designing an attribution system; they motivate the requirements presented later in this paper:

* While designing an attribution system, one needs to balance attribution against privacy.
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers.
* When should a person be tracked, and when not (so as not to intrude on privacy)?
* How can we make sure attribution is properly achieved?
* Who should attribute whom or what, and why?
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents?
* How much can the cooperation of intermediate systems contribute to achieving attribution?
* Should there be consequences upon attributing an action (or actions) to an agent? What are they (punishment, reward, etc.)?
* How should we deal with misleading data sources hiding behind botnets and concealing their identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Identification can be useful for establishing the client's identity in online banking and for identifying the parties involved in e-commerce transactions, and marketers can take advantage of it for more targeted Web advertisements.
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried by each packet, determining the geographical location of that address, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, such as IP traceback. However, there are problems with trying to identify the source by its IP address. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are also not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all the packets received by their machines, which means they would not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very effective. Also, it is not practical to authenticate every single action on the internet.

===Attacks to prevent correct attribution of actions===

* Stepping stone attack: a common way of making an attack anonymous, in which the attacker relays traffic through multiple arbitrary public hosts (as stepping stones) before reaching the victim, in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>
* Forgery
** Identity theft (impersonation)
** Distribution of malware

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.

We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense for a network as large as the internet and for human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for the sake of completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be a single person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, responsible for an action. This is essential because, as practice shows, determining the immediate source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which the absence of crime and misuse is the only acceptable way of doing things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement is closely related to incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to merely design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will be under a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As mentioned earlier (and this could be said of any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment should also be more secure (bugs in software and mistakes in design can be fixed while still on a small scale), so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to particular sets of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns the line between dynamic adaptability and universality. Fortunately, that level of detail is beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., from DoS or identity theft).

==Practice==

The attribution mapping should not be invertible (it should not be a bijection); in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to find the answer; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but we state it as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a good idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to anybody who uses the network. When a crime is committed or, more generally, when the agent of some act has to be determined, the recorded ID is looked up in a police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should be revealed only when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.

Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and prevents either side from misusing the system.
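The separation between traceability records and real identities can be sketched as follows. This is a toy illustration under our own assumptions, not a design the requirements prescribe: the class names, the <code>authorized</code> flag standing in for legal due process, and the in-memory dictionaries are all hypothetical. A dictionary also cannot by itself guarantee that the records are not scanned in reverse; a real system would need to enforce that separately.

```python
import secrets
from typing import Dict, Optional


class TrustedEntity:
    """Holds the only mapping from unique IDs to real names."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}

    def enroll(self, real_name: str) -> str:
        """Issue a random ID that reveals nothing about the person."""
        uid = secrets.token_hex(16)
        self._names[uid] = real_name
        return uid

    def reveal(self, uid: str, authorized: bool) -> str:
        """Disclose a name only when due process has been satisfied.

        The boolean flag is a placeholder for whatever evidence or
        warrant a real system would require.
        """
        if not authorized:
            raise PermissionError("identity disclosure not authorized")
        return self._names[uid]


class TraceabilityLog:
    """Public-side record: which ID performed which act, no names."""

    def __init__(self) -> None:
        self._by_act: Dict[str, str] = {}

    def record(self, act_id: str, uid: str) -> None:
        self._by_act[act_id] = uid

    def who_did(self, act_id: str) -> Optional[str]:
        """Answer "who did act X?" with an ID, never a name."""
        return self._by_act.get(act_id)


if __name__ == "__main__":
    authority = TrustedEntity()
    log = TraceabilityLog()
    uid = authority.enroll("Alice Example")
    log.record("act-42", uid)
    suspect = log.who_did("act-42")
    print("ID behind act-42:", suspect)
    print("Name after authorization:", authority.reveal(suspect, authorized=True))
```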

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". First, we define some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
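To make the notion of an IS more concrete, the following sketch derives one by hashing a human identifier together with a MAC address. The framework only states that an IS is a series of bits binding the two; the use of SHA-256, the byte layout, and the function names below are our own assumptions for illustration, and the human identifier is treated as an opaque byte string (e.g., a digest of a biometric template).

```python
import hashlib


def normalize_mac(mac: str) -> bytes:
    """Convert a MAC address such as '00:1A:2B:3C:4D:5E' to 6 raw bytes."""
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError("not a 48-bit MAC address: %r" % mac)
    return bytes.fromhex(digits)


def make_identification_stamp(human_id: bytes, mac: str) -> str:
    """Bind a human identifier to a device feature as one hex string.

    The length prefix keeps the boundary between the two inputs
    unambiguous, so different (human, device) pairs cannot be
    concatenated into the same byte sequence.
    """
    hasher = hashlib.sha256()
    hasher.update(len(human_id).to_bytes(4, "big"))
    hasher.update(human_id)
    hasher.update(normalize_mac(mac))
    return hasher.hexdigest()


if __name__ == "__main__":
    stamp = make_identification_stamp(b"iris-template-digest", "00:1A:2B:3C:4D:5E")
    print("IS:", stamp)
```

Under this sketch, licensing would amount to the trusted entity computing such a stamp and adding it to the GDDB.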

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, or VoIP. Indirect responsibility applies when a user is running a system in the background that performs external calls (over the internet) or is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection initiation packets for its handshake, and ICMP packets may be sent to probe the status of a specific host.

This framework addresses only attribution over the internet. It does not cover "locally" defined networks, such as a PAN, LAN, MAN, or WAN as defined in the IEEE standards, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its method of operation, and a list of the system's pros, cons and vulnerabilities, and conclude with a discussion of the proposed framework.
==Assumptions==
First: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement, serving as the primary inspector and as the jurisdiction for regulating all kinds of cyber crime and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would likely perform better, as it could scale with the growth in system users and conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect the system, since it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency for read operations, as it will be consulted at every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the way the GDDB subsystems communicate with one another.

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if the Md were found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can be owned by only one person.

Finally: IP packets. Our proposed framework assumes that, within the format of IP packets, the network layer adds a header that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".

Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly the way he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. This is done by consulting the GDDB, that is, by sending it a copy of the IS carried by the packet. If a packet is found to have no IS, it is denied ISS and is simply dropped. If the GDDB reports that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the packet's IS is verified. Thus, the packet benefits from ISS and is routed on its way.
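The per-packet decision described above can be sketched as follows. This is a simplified model under our own assumptions: the GDDB is represented by an in-memory set of licensed ISs rather than a distributed, encrypted store, a packet is reduced to its optional IS field, and the function names are hypothetical.

```python
from typing import Optional, Set

FORWARD = "forward"
DROP = "drop"


def license_agent(gddb: Set[str], stamp: str) -> None:
    """Trusted entity side: licensing adds a new IS to the GDDB."""
    gddb.add(stamp)


def handle_packet(packet_stamp: Optional[str], gddb: Set[str]) -> str:
    """Router side: decide whether a packet receives ISS (routing).

    packet_stamp is the IS carried in the packet header, or None when
    the packet carries no IS at all.
    """
    if packet_stamp is None:
        return DROP      # unattributed packet
    if packet_stamp not in gddb:
        return DROP      # GDDB reports the IS as invalid
    return FORWARD       # IS verified, packet is routed onward


if __name__ == "__main__":
    gddb: Set[str] = set()
    license_agent(gddb, "is-of-licensed-agent")
    for stamp in ("is-of-licensed-agent", "forged-is", None):
        print(repr(stamp), "->", handle_packet(stamp, gddb))
```

The membership test here only checks that an IS exists in the GDDB; it does not detect a valid IS being attached to someone else's packets, which is the forgery risk discussed among the vulnerabilities.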

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available to everyone; it is available only to trusted entities.
** Hence, it retains personal privacy.
* The system is fully automated. According to the system's theory of operation, ISS are provided or withheld based on the validation of the IS carried by each packet.
* The system prevents all forms of cyber crime that are carried out by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause whole-system failure. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can commit forgery by declaring his malicious packets under the responsibility of some other Ag.
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.

==Discussion==
The proposed framework's main focus is to ensure that a packet moves only because it is known to whom it belongs. Recall that in the real world, a person without a social insurance number cannot benefit from services. For instance, he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently beats that of the Internet. Compared to real-world attribution, current Internet attribution can be considered a failure. A form of Internet attribution would be considered acceptable if it provides at least as much attribution as the real world does. We could say that the proposed framework would guarantee such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else the act will not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime was committed. Whatever the costs of deploying the system, they should still be less than the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the proposed framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, the traceability information cannot all be collected in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones that hold this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists any change at first sight. In 1769, Jonathan Holguinisburg finalized the invention of the first Cugnot Steam Trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, which evolved into today's automobiles. In 1903, car licensing began in North America, 134 years after Holguinisburg's invention. Licensing started when people began to realize that a car could act as a lethal weapon, and must therefore be approved by the government to be driven by a given person and be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and exponentially growing number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technology-based applications are now built on the Internet as their underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes seeking the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems and a brief discussion of the reasons for their survival and their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill the mentioned requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars

[2] Wikipedia Website

A link to the paper

2011-04-11T17:33:31Z

Raghad: /* Why is it difficult to achieve attribution? */

=Title=
Proposed titles:
* Requirements for Attribution on the Internet
* Internet Attribution: Between Privacy and Cruciality

=Abstract=
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans. Other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not practical to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that constitutes a cornerstone of internet security. Current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent could be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems like IP address spoofing, lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research in attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which has unfortunately received little attention in current research.

This paper starts with a quick discussion of the dilemma of attribution, resolving the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, together with their flaws and points of failure. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology in it.

=Background=

The problem of attribution is not new; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what each particular system aims to achieve.
This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can be used to store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the request header. All this is an automated process between the web browser and web server.

If the web server receives a request without a cookie attached, it treats it as the browser's first access to the server and sends a cookie as part of the response; the browser saves it and resends it on the next request. Cookie contents are often encrypted for data security and information privacy; however, cookies remain under the user's control, as they can be inspected, modified and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all.

Cookies may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set how long cookies are to be stored.
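
The exchange described above can be sketched with Python's standard <code>http.cookies</code> module. This is only an illustration under stated assumptions: the cookie name <code>visitor_id</code>, the one-hour lifetime and the function <code>handle_request</code> are hypothetical choices, and no real web server is involved.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple


def handle_request(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (visitor id, Set-Cookie value to send, or None if not needed)."""
    received = SimpleCookie()
    if cookie_header:
        received.load(cookie_header)  # parse the request's Cookie header

    if "visitor_id" in received:
        # Returning browser: the cookie identifies it, nothing new to set.
        return received["visitor_id"].value, None

    # First access: create an identifier and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    issued = SimpleCookie()
    issued["visitor_id"] = visitor_id
    issued["visitor_id"]["max-age"] = 3600  # assumed lifetime: one hour
    return visitor_id, issued["visitor_id"].OutputString()


first_id, set_cookie = handle_request(None)
second_id, nothing_to_set = handle_request("visitor_id=" + first_id)
```

The second call recognises the browser only because the browser chose to send the cookie back; deleting or editing the cookie breaks the link between the two visits.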

===Cookies as an Attribution System===
If cookies were used as the kind of attribution system we are looking for over the Internet, we would be able to achieve high precision in identifying the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of online devices each user owns have grown. One of the more common examples of this is the increase in Internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) addresses, which is fewer than the number of people on this planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011 [1]. This had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
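
The address-space figures above can be checked with a few lines of arithmetic. This is a small sketch using Python's standard <code>ipaddress</code> module; the variable names are ours.

```python
import ipaddress

IPV4_BITS = 32
IPV6_BITS = 128

ipv4_total = 2 ** IPV4_BITS  # number of distinct 32-bit addresses
ipv6_total = 2 ** IPV6_BITS  # number of distinct 128-bit addresses

# Cross-check against the size of the whole IPv4 and IPv6 address spaces.
ipv4_space = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_space = ipaddress.ip_network("::/0").num_addresses

print(f"{ipv4_total:,}")  # 4,294,967,296
```

IPv6 therefore offers 2<sup>96</sup> times as many addresses as IPv4.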

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one in which a new address is assigned at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost involved in assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
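
To illustrate how several devices can “share” one address over time, consider the following minimal sketch. It is not an implementation of the DHCP protocol; the class <code>AddressPool</code>, its methods and the device names are hypothetical, and the address is taken from a range reserved for documentation.

```python
from typing import Dict, List, Optional


class AddressPool:
    """Hypothetical pool that hands out addresses to devices while online."""

    def __init__(self, addresses: List[str]) -> None:
        self._free: List[str] = list(addresses)
        self._leases: Dict[str, str] = {}  # device name -> address in use

    def connect(self, device: str) -> Optional[str]:
        if device in self._leases:
            return self._leases[device]
        if not self._free:
            return None  # no address available right now
        address = self._free.pop(0)
        self._leases[device] = address
        return address

    def disconnect(self, device: str) -> None:
        address = self._leases.pop(device, None)
        if address is not None:
            self._free.append(address)  # address becomes reusable


pool = AddressPool(["203.0.113.10"])
laptop_address = pool.connect("laptop")
phone_while_busy = pool.connect("phone")  # pool is exhausted
pool.disconnect("laptop")
phone_address = pool.connect("phone")  # reuses the laptop's old address
```

Here the same address belongs to two different devices at different times, which is one reason why an address alone does not identify a device, let alone a person.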

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, mainly that attackers can spoof their IP addresses. Spoofed IP addresses will even foil the efforts of IP traceback.

==Authentication Systems==
In order for a website to verify the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. Its biggest advantage is that attribution can be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers hacking into the server to steal login information.

Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user next goes to the website, he is asked to “identify himself”: the user enters his personal login information, and the web server verifies it against what it has stored in its database and either grants or denies access to the user's personal page.
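
The steps above can be sketched as follows. This is a minimal illustration, not a description of any particular bank's system: the class <code>AccountStore</code> and its methods are hypothetical, the "database" is a dictionary, and storing a salted PBKDF2 hash instead of the password itself is our assumption, which the text above does not prescribe.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

ITERATIONS = 100_000  # assumed work factor for PBKDF2


def _derive(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


class AccountStore:
    """Hypothetical server-side store of login information."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[bytes, bytes]] = {}  # login -> (salt, hash)

    def set_password(self, login: str, password: str) -> None:
        salt = os.urandom(16)
        self._records[login] = (salt, _derive(password, salt))

    def verify(self, login: str, password: str) -> bool:
        """Grant (True) or deny (False) access to the user's personal page."""
        record = self._records.get(login)
        if record is None:
            return False  # unknown login name
        salt, expected = record
        # Constant-time comparison of the stored and the submitted hash.
        return hmac.compare_digest(_derive(password, salt), expected)


store = AccountStore()
store.set_password("alice", "correct horse")  # made-up credentials
granted = store.verify("alice", "correct horse")
denied = store.verify("alice", "wrong guess")
```

Note that the server attributes the session to whoever knows the password, so stolen login information is attributed to the legitimate account holder.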

Authentication systems are generally used only when the user wants some privacy on the web server, or when the user wishes to store some form of information on the web server.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a fairly abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one needs not merely to fix this ratio for some particular case, but rather to let the ratio change dynamically depending on the case.

Assuming this ratio is found, another question is when to use private information to track or punish a person, and thus directly intrude on their privacy. One might think that this question is somewhat outside the scope of our paper. This is true; however, this and many less obviously related questions should be answered prior to design, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold some legal and moral principles.

(here go other questions. will show connection to requirements)

* While designing an attribution system one needs to consider balancing between attribution and privacy.
** Sometimes non-attribution is crucial, to protect political dissidents and whistle-blowers.
* When to decide to track a person and when not to (so as not to intrude privacy)?
* How to make sure attribution is properly achieved?
* Who should attribute who/what and why?
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents?
* How much can intermediate systems' cooperation contribute to achieving attribution?
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. The identification property can be useful for establishing the client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for more targeted Web advertisements.
Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal and undesirable intrusions.

==Why is it difficult to achieve attribution?==

The problem arises largely from how the Internet is designed. It does not have strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.
The lack of attribution on the web mostly becomes an issue whenever security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. Also, depending on the type of the senders and receivers, different attribution policies will be required.
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used.
There are many existing methods that attempt to identify the source of an act, like IP traceback. There are problems with trying to identify the source by its IP address. For instance, the address can be spoofed, which leads to a misleading or inconclusive geographical location. IP addresses are not permanently bound to a single account, which makes linking an IP address to the appropriate person unreliable. IP traceback can be improved, but that would require global cooperation among intermediate systems, which currently does not exist.
In networks, users are not aware of all the packets received by their machines, which means that users would not be aware of malware distribution, the creation of botnets, and other actions taken by their machines without their approval and triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the internet.

===Attacks to prevent correct attribution of actions===

* Stepping stone attack: a common way of making attacks anonymous by using multiple random public agents (as stepping stones) to reach the victim in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>
* Forgery
** Identity theft (impersonation)
** Distribution of malware

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated for the sake of consistency: an internet attribution system needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person and/or an organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and somebody (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place, some of which are not perfect, but which are widely accepted and mostly have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely, and if the attribution system goes down, the underlying network should still remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Deploying an attribution system incrementally should also be more secure (bugs in software and mistakes in design can be fixed while the deployment is still small), so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns the line between dynamic adaptability and universality. Luckily, this level of detail is beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g., from DoS, identity theft, etc.).

==Practice==

Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should not be possible to obtain the answer from it; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to store a unique ID for anybody who uses the network; when a crime is committed or, more generally, when the agent behind some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (government, corporation, police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.

Of course, a body trusted by everyone does not always exist, but in general there are governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and does not allow either side to misuse the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that are used within this section:
* Agent (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.
* Machines/Devices (Md): any piece of hardware that has network access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a simple home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by that device. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. This process is simply adding new ISs to the GDDB.
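
The definition of the IS above does not prescribe how the bits are derived. As a minimal sketch only, one could assume that the IS is a SHA-256 digest over a hashed biometric template and a normalised MAC address. The function name, the hashing scheme and the sample values are our assumptions for illustration and are not part of the proposed framework.

```python
import hashlib


def make_identification_stamp(human_id: bytes, device_feature: str) -> str:
    """Bind a human identifier to a device feature (hypothetical construction)."""
    # Normalise the MAC address so that formatting differences
    # (upper/lower case, '-' versus ':') do not change the stamp.
    mac = device_feature.lower().replace("-", ":")
    digest = hashlib.sha256()
    # Hash the biometric template first so the raw template never enters the stamp.
    digest.update(hashlib.sha256(human_id).digest())
    digest.update(mac.encode("ascii"))
    return digest.hexdigest()


# The same owner and the same card always yield the same stamp.
stamp_a = make_identification_stamp(b"iris-template-of-alice", "00-1A-2B-3C-4D-5E")
stamp_b = make_identification_stamp(b"iris-template-of-alice", "00:1a:2b:3c:4d:5e")
# A different card owned by the same person yields a different stamp.
stamp_c = make_identification_stamp(b"iris-template-of-alice", "00:1a:2b:3c:4d:5f")
```

Under this assumption one person holds one IS per owned Md, which matches the description of the IS as an identifier for the human-device pairing (the Ag).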

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when he is running a system in the background that performs external (over the internet) system calls, or that is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets may be sent to seek and identify the status of a specific host, etc.

The scope of this framework only addresses attribution over the internet. It does not address "locally" defined networks that fall under the IEEE standard definitions of the PAN, LAN, MAN or WAN topologies and that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.
==Assumptions==
First: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., a government). Such an entity should act as the Internet's law enforcement; it will be deemed the primary inspector and also the jurisdiction for regulating all kinds of cyber crimes and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in the number of users as well as conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric-key encryption should be used to protect that system, as it will only be accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be relied on heavily for every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the way the GDDB subsystems communicate with one another.

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md, and is the one who would be accused if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many; that is, a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that, within the format of IP packets, the network layer adds a header that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
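
The three cases above can be sketched as a small classification routine. This is an illustrative sketch under our own assumptions: the trusted entity is assumed to hold a set of registered humans and a mapping from each registered Md to its single owner, and the names `StampStatus` and `classify_stamp` are hypothetical.

```python
from enum import Enum


class StampStatus(Enum):
    VALID = "valid"
    UNKNOWN_DEVICE = "imaginary Md"
    UNKNOWN_HUMAN = "imaginary human"
    WRONG_BINDING = "misleading human-to-machine binding"


def classify_stamp(human_id, device_id, registered_humans, owner_of_device):
    """Classify the (human, device) pair claimed by an IS."""
    if device_id not in owner_of_device:
        return StampStatus.UNKNOWN_DEVICE   # chip identifier refers to no real Md
    if human_id not in registered_humans:
        return StampStatus.UNKNOWN_HUMAN    # human identifier refers to nobody
    if owner_of_device[device_id] != human_id:
        return StampStatus.WRONG_BINDING    # Y is not the owner of X
    return StampStatus.VALID


# Hypothetical registry contents used only for illustration.
registered_humans = {"alice", "bob"}
owner_of_device = {"mac-1": "alice", "mac-2": "bob"}
```

For example, `classify_stamp("bob", "mac-1", registered_humans, owner_of_device)` reports a misleading binding, because both the human and the machine exist but the ownership claim is false.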

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, is as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with the IS so that it can be added to the header of the packets he launches. The user should keep his unique IS secret and should treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS by sending a copy of the IS carried in the packet to the GDDB. If the packet carries no IS, it is denied ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the IS carried in the packet is verified; the packet then benefits from ISS and is routed onward.
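
The per-packet decision described above can be sketched as follows. This is a simplified illustration, not a router implementation: the GDDB is replaced by a hypothetical in-memory set of licensed stamps, and the names `Packet`, `InMemoryGDDB` and `handle_packet` are our own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    payload: bytes
    stamp: Optional[str] = None  # the IS carried in the packet header, if any


class InMemoryGDDB:
    """Stand-in for the distributed database; routers only ever read from it."""

    def __init__(self, licensed_stamps):
        self._stamps = frozenset(licensed_stamps)

    def is_registered(self, stamp: str) -> bool:
        return stamp in self._stamps


def handle_packet(packet: Packet, gddb: InMemoryGDDB) -> str:
    if packet.stamp is None:
        return "drop: no IS"        # unattributed packets get no ISS
    if not gddb.is_registered(packet.stamp):
        return "drop: invalid IS"   # the GDDB does not know this stamp
    return "forward"                # verified, so the packet is routed onward


gddb = InMemoryGDDB({"stamp-of-alice-laptop"})
decisions = [
    handle_packet(Packet(b"hello", "stamp-of-alice-laptop"), gddb),
    handle_packet(Packet(b"hello", "forged-stamp"), gddb),
    handle_packet(Packet(b"hello"), gddb),
]
```

In a real deployment the lookup would be a network round trip on every hop, which is the source of the delay listed among the disadvantages.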

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It avoids anonymous attacks since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is only available to trusted entities.
** Hence, it retains personal privacy.
* The system enjoys full automation. According to the system's theory of operation, ISS are either provided or not based on the validation of the IS printed on each packet.
* The system avoids all forms of cyber crimes that are executed by unknown Ags.

The proposed framework suffers from the following disadvantages:
* The verification of the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation is not achievable.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would be bound to one single person.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines, in order to avoid the distribution of malware and the subsequent formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause failure of the whole system. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets to be under the responsibility of some other Ag, which amounts to forgery.
* For security purposes, licenses should be periodically renewable; however, this is not an easy topic.

==Discussion==
The proposed framework's main focus is to ensure that any packet in transit is moving because it is known to whom it belongs. Recall that in the real world, a person who does not have a social insurance number cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. Compared with real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be considered acceptable if it provides at least as much attribution as the real world does. We could say that the proposed framework would guarantee such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else the act will not take place. The framework also avoids violating any privacy-related laws, since the attribution information is not publicly available; more specifically, it is only available to the agreed-on trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the costs of deploying the system are, they should still be less than the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, which must therefore be approved by the government before it can be driven, and must also be formally linked to an owner who is considered primarily responsible for it.

The Internet is currently passing through the same phase. People will at first deny, refuse and object to such "wicked" attribution systems, but later on Internet licensing will be part of everyone's life, just like a driving license. The Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and exponentially growing number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technological applications are now built on the Internet as their underlying infrastructure and cannot tolerate being threatened all the time by a completely anonymous person behind the scenes waiting for the right moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, why it is crucial to attribute, what level of attribution would be considered acceptable, and where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems, with a brief discussion of the reasons for their survival and of their points of failure. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill these requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars

[2] Wikipedia Website

A link to the paper

2011-04-11T17:32:35Z

Raghad: /* Why do we need Attribution */

=Title=
Proposed titles:
* Requirements for Attribution on the Internet
* Internet Attribution: Between Privacy and Cruciality

=Abstract=
Present and past situations show a need for improved attribution systems, and, arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of this work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not feasible to authenticate every single packet hopping across the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.

=Introduction=
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it entices advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent can be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attack are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the real human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research in attribution and discuss common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role in the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for the purpose of keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question how crucial attribution is in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is regrettably somewhat neglected in current research.

This paper starts with a quick discussion of the dilemma of attribution, resolving the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, together with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons why a proper attribution system is difficult to achieve. Finally, a conclusion is presented in section 7.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since its popularization in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology used in it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues pertaining to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what each particular system seeks to achieve.
This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart information, and storing site preferences. In fact, they can store any type of information that can be stored in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats it as the browser's first access to the server and sends a cookie as part of the response, which the browser saves and sends back with the next request. Cookies may be encrypted for data security and information privacy; however, they remain under the user's control, as they can be inspected, modified and even deleted completely. It is also possible for users to change their browser settings to not accept cookies at all.
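
The exchange described above can be sketched with Python's standard `http.cookies` module. This is a minimal illustration rather than a web server: the function `handle_request`, the cookie name `visitor_id` and the one-hour lifetime are assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie


def handle_request(cookie_header: str):
    """Return (visitor_id, Set-Cookie value or None) for one request."""
    jar = SimpleCookie()
    jar.load(cookie_header)  # parse the Cookie header sent by the browser
    if "visitor_id" in jar:
        # Returning visitor: the browser sent our cookie back.
        return jar["visitor_id"].value, None
    # First access (or the user deleted the cookie): issue a new identifier.
    visitor_id = secrets.token_hex(8)
    reply = SimpleCookie()
    reply["visitor_id"] = visitor_id
    reply["visitor_id"]["max-age"] = 3600  # assumed lifetime in seconds
    return visitor_id, reply["visitor_id"].OutputString()


first_id, set_cookie = handle_request("")                       # first visit
second_id, nothing = handle_request("visitor_id=" + first_id)   # cookie sent back
third_id, reissued = handle_request("")                         # cookie was deleted
```

The third call shows the weakness discussed in this section: once the user deletes the cookie, the server sees a brand-new visitor and issues a fresh identifier.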

A cookie may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let the user set how long cookies are stored.

===Cookies as an Attribution System===
If we consider cookies as the type of attribution system we are looking for over the Internet, they achieve high precision in identifying computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners, etc.) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), each responsible for allocating IP address blocks to the ISPs in its assigned region, which in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user takes online have grown. One of the more common examples is the rise of Internet-ready mobile phones. The addressing system of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011 [1]. This had been foreseen since the 90s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing system.
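
The address-space figures above follow directly from the address widths. The short calculation below reproduces them; the variable names are our own.

```python
# Number of distinct addresses for each address width.
ipv4_addresses = 2 ** 32    # 32-bit IPv4 addresses
ipv6_addresses = 2 ** 128   # 128-bit IPv6 addresses

# How many times larger the IPv6 space is than the IPv4 space.
growth_factor = ipv6_addresses // ipv4_addresses  # equals 2 ** 96

print(f"IPv4: {ipv4_addresses:,}")
print(f"IPv6: {ipv6_addresses:,}")
```

The first line printed is the 4,294,967,296 figure quoted in the text; the IPv6 space is larger by a factor of 2<sup>96</sup>.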

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is one in which an address is assigned anew at every boot-up. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to “share” a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
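
The way dynamic addressing lets devices share an address can be illustrated with a toy address pool. This sketch deliberately ignores the actual DHCP message exchange and lease timers; the class `AddressPool`, its methods and the sample address are assumptions made for the example.

```python
class AddressPool:
    """Toy model of dynamic address assignment (not the DHCP protocol itself)."""

    def __init__(self, addresses):
        self._free = list(addresses)   # addresses currently unassigned
        self._leases = {}              # device name -> assigned address

    def connect(self, device: str) -> str:
        if device in self._leases:
            return self._leases[device]
        if not self._free:
            raise RuntimeError("address pool exhausted")
        address = self._free.pop(0)
        self._leases[device] = address
        return address

    def disconnect(self, device: str) -> None:
        address = self._leases.pop(device, None)
        if address is not None:
            self._free.append(address)  # the address becomes reusable


pool = AddressPool(["203.0.113.10"])   # a pool holding a single address
laptop_address = pool.connect("laptop")
pool.disconnect("laptop")
phone_address = pool.connect("phone")  # a different device, the same address
```

Because the laptop and the phone end up with the same address at different times, an address on its own does not identify a device, let alone its owner.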

===IP Addresses as an Attribution System===
Although IP addresses can be used to attribute packets to their sender, they fail as an effective attribution system for a few reasons, mainly that attackers can spoof their IP addresses. Spoofing IP addresses will even foil IP traceback efforts.

==Authentication Systems==
For a website to be sure of the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which exposes it to attackers who hack into the server to steal login information.

Login systems are attached to user accounts, which sometimes require private information in order to be set up. If the web server’s security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to “identify himself”: he enters his personal login information, and the web server verifies it against what it has stored in its database and either grants or denies access to the user's personal page.
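
The register-then-verify flow above can be sketched as follows. The text does not say how the server stores login information; storing a salted PBKDF2 hash instead of the password itself is our assumption, as are the class name `AccountStore` and the iteration count.

```python
import hashlib
import hmac
import os


class AccountStore:
    """In-memory stand-in for the web server's login database."""

    def __init__(self):
        self._records = {}  # login name -> (salt, password hash)

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        # Assumed storage scheme: salted PBKDF2-HMAC-SHA256.
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def register(self, login: str, password: str) -> None:
        salt = os.urandom(16)
        self._records[login] = (salt, self._hash(password, salt))

    def verify(self, login: str, password: str) -> bool:
        record = self._records.get(login)
        if record is None:
            return False  # unknown account: deny access
        salt, expected = record
        # Constant-time comparison of the stored and the recomputed hash.
        return hmac.compare_digest(expected, self._hash(password, salt))


store = AccountStore()
store.register("alice", "correct horse battery staple")
granted = store.verify("alice", "correct horse battery staple")
denied = store.verify("alice", "wrong password")
```

Hashing limits the damage of the server breaches mentioned earlier, since stolen records do not directly reveal passwords, but the server still holds whatever private information the account required.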

Authentication systems are generally used only when users want some privacy on the web server or wish to store some form of information there.

===Authentication Systems as an Attribution System===
Authentication systems identify people over the Internet very precisely and are therefore used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological ones. In addition to the technologies and infrastructure available, one must also consider privacy, because achieving strong attribution compromises personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in a transaction are who they claim to be. At the opposite end of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or merit) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in abstract terms, privacy is inversely related to attribution. When designing an attribution system, one needs not only to fix this ratio for some particular case, but to let it change dynamically depending on the case.

Assuming this ratio is found, another question is when to use private information to track or punish a person, that is, when to intrude directly on their privacy. One might think that this question is somewhat outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before design begins. In a matter as important as protection and privacy, a solution should not make too many assumptions, and it should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.

Other open questions, which connect to the requirements discussed later, include:

* While designing an attribution system, one needs to balance attribution against privacy.
** Sometimes non-attribution is crucial, for example to protect political dissidents and whistle-blowers.
* When should a person be tracked, and when not (so as not to intrude on privacy)?
* How can we make sure attribution is properly achieved?
* Who should attribute whom or what, and why?
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents?
* How much can the cooperation of intermediate systems contribute to achieving attribution?
* Should there be consequences once an action is attributed to an agent? What are they (punishment, reward, etc.)?
* How should we deal with misleading data sources that hide behind botnets and conceal their identities via stepping stones?

==Why do we need Attribution==

An attribution system has many useful applications. Identification can be useful for establishing a client’s identity in online banking and for identifying the parties involved in an eCommerce transaction, and marketers can take advantage of it for better-targeted Web advertisements.
Financial matters are not the only incentive for a strong attribution system. A strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of crimes such as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.

==Why is it difficult to achieve attribution?==

The main problem is that the way the Internet is designed makes it possible and relatively easy to act without revealing one's identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

* The issue of lack of attribution on the web mostly arises when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.
* Depending on the type of sender and receiver, a different attribution policy will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each packet, determining the geographical location of this IP, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. Here is what goes wrong:
* IP addresses can be spoofed, which yields a misleading geographical location.
* To avoid that problem, IP traceback can be performed, but it requires global cooperation among intermediate systems, which does not exist.
* IPs are not permanently bound to persons, so inferring the person from the IP is not reliable.
* Network users are not aware of all the packets sneaking into their machines, which allows malware distribution and hence the creation of botnets, which in turn misleads attribution.
* Firewalls and packet filters can be used to mitigate that problem, but they are not 100% effective.
* It is not practical to authenticate every single action on the internet.

===Attacks to prevent correct attribution of actions===

* Stepping stone attack: a common way of anonymizing an attack by relaying it through multiple random public hosts (stepping stones) on the way to the victim, in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>
* Forgery
** Identity theft (impersonation)
** Distribution of malware

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated these requirements into three sections. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for a deployability that makes sense in a network as huge as the internet and in human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated mainly for the sake of completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure. It might turn out to be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the party (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Yet we accept the notion that, at the end of the day, there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which behaving without crime or misuse is the only possible way to do things, and many writers and movie directors exploit this idea in futuristic, science fiction and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and most have good reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to design a system in isolation; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.

As discussed before (and this could be said of any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Embedding an attribution system incrementally should also be more secure (software bugs and design mistakes can be fixed while the deployment is still small), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns where to draw the line between dynamic adaptability and universality. Luckily, that level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. from DoS, identity theft, etc.).

==Practice==

The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?" or "what is person X doing?". Not only should the system not know the answer, it should not be possible to derive the answer from it; the question that should be answered is "who did act X?". This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a good idea to put the names of persons into the traceability database. It makes much more sense to store a unique ID for anybody who uses the network. When a crime is committed or, more generally, whenever the agent behind some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (government, corporation, police, some public-good-like system, etc.). This mapping should be revealed only when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of the information in one place.
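
The split between network-side IDs and a separately held name mapping can be illustrated as follows. This is a rough sketch under our own assumptions: the class <code>TrustedRegistry</code>, the functions <code>record_act</code> and <code>who_did</code>, and the boolean <code>evidence_ok</code> flag (standing in for whatever legal process would authorize disclosure) are all hypothetical.

```python
import secrets


class TrustedRegistry:
    """Held by the trusted entity only: pseudonymous ID -> real name."""

    def __init__(self):
        self._names = {}

    def enroll(self, real_name: str) -> str:
        pseudonym = secrets.token_hex(8)  # random ID that reveals nothing about the name
        self._names[pseudonym] = real_name
        return pseudonym

    def reveal(self, pseudonym: str, evidence_ok: bool) -> str:
        # The mapping is disclosed only when there is enough evidence to justify it.
        if not evidence_ok:
            raise PermissionError("mapping is revealed only with sufficient evidence")
        return self._names[pseudonym]


# Network-side traceability record: keyed by act, holding only the pseudonymous ID.
trace_log = {}


def record_act(act_id: str, pseudonym: str) -> None:
    trace_log[act_id] = pseudonym


def who_did(act_id: str):
    """Answers "who did act X?" with an ID, never with a name."""
    return trace_log.get(act_id)


registry = TrustedRegistry()
alice_id = registry.enroll("Alice Example")
record_act("act-42", alice_id)
```

Note that a single in-memory table like <code>trace_log</code> could still be scanned to list every act of one ID; the requirement that traceability information be distributed and impossible to collect in one place is what would have to rule that out in a real design.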

Of course, a body trusted by everyone does not always exist, but generally there are governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that lets them cooperate in time of need and prevents either side from misusing the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and transmits/receives packets.
* Machines/Devices (Md): any piece of hardware that has access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of permitting intermediate systems to provide ISS to all packets launched by the agent requesting the license. In practice, this process simply adds new ISs to the GDDB.
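
To make the IS and licensing definitions concrete, the following sketch derives a stamp from a human identifier and a device feature and records it in a stand-in for the GDDB. The text does not specify how the binding is computed; hashing with SHA-256, the function names <code>make_is</code> and <code>license_agent</code>, and the assumption that a biometric reading reduces to a stable byte string are all ours.

```python
import hashlib

gddb = set()  # stand-in for the Globally Distributed Database of licensed ISs


def make_is(human_template: bytes, device_feature: str) -> str:
    """Bind a human identifier to a device feature (e.g. a MAC address).

    Assumption: the biometric reading has already been reduced to a stable
    byte string; real biometric readings are not bit-exact.
    """
    human_digest = hashlib.sha256(human_template).digest()
    device_digest = hashlib.sha256(device_feature.lower().encode("ascii")).digest()
    return hashlib.sha256(human_digest + device_digest).hexdigest()


def license_agent(human_template: bytes, device_feature: str) -> str:
    """Licensing as defined above: generate the IS and add it to the GDDB."""
    stamp = make_is(human_template, device_feature)
    gddb.add(stamp)
    return stamp


stamp = license_agent(b"iris-template-of-owner", "00:1A:2B:3C:4D:5E")
print(stamp in gddb)  # the agent is now licensed
```

Because the stamp is a one-way digest of both inputs, changing either the human identifier or the device feature yields a different IS, which matches the idea that an IS identifies one particular human-device pairing.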

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he is running an application that sends requests or initiates communication sessions to another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. He is indirectly responsible when he is running a background system that performs external (over the internet) calls or is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP), or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection-initiation packets for its handshake, ICMP packets may be sent to probe the status of a specific host, and so on.

The scope of this framework is attribution over the internet only. It does not cover "locally" defined networks that fall under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies and that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of its pros, cons and vulnerabilities, and wrap up with a discussion of the proposed framework.
==Assumptions==
First: jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement, serving as the primary inspector and as the jurisdiction for regulating all kinds of cyber crime and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity should perform better, as it can scale with the growth of the user base and conform to diverse regional laws, regulations, customs and traditions.

Second: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect this system, as it will be accessed by only two types of users: routers, which should have read-only access to the database, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee almost zero latency for read operations, as it will be consulted at every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics by which the GDDB subsystems communicate with one another.

Third: ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be the one held accountable if the Md is found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can be owned by only one person.

Finally: IP packets. Our proposed framework assumes that the network layer adds to the IP packet format a header carrying the IS of the Ag that owns the packet.

==Methodology==
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake IS. A fake IS is defined as:
* Either having a false unique chip identifier that refers to an imaginary Md.
* Or having a false unique human identifier that refers to an imaginary human.
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".
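
The three ways an IS can be fake can be expressed as a small classification routine. This is only a sketch: it assumes that an IS can be resolved into a human identifier and a device identifier, and that the set of registered humans and the device-to-owner table (one owner per machine, as in the ownership assumption) are available. The names <code>Verdict</code> and <code>classify</code> are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    VALID = "valid"
    UNKNOWN_DEVICE = "refers to an imaginary Md"
    UNKNOWN_HUMAN = "refers to an imaginary human"
    WRONG_BINDING = "misleading human-to-machine binding"


def classify(human_id: str, device_id: str, humans: set, owner_of: dict) -> Verdict:
    """Classify the (human, device) pair claimed by an IS.

    owner_of maps each registered device to its single official owner.
    """
    if device_id not in owner_of:
        return Verdict.UNKNOWN_DEVICE
    if human_id not in humans:
        return Verdict.UNKNOWN_HUMAN
    if owner_of[device_id] != human_id:
        return Verdict.WRONG_BINDING
    return Verdict.VALID


humans = {"Y", "Z"}
owner_of = {"X": "Z"}  # machine "X" officially belongs to human "Z"
print(classify("Y", "X", humans, owner_of))  # Y claims X, but Y is not the owner
```

Any verdict other than <code>VALID</code> corresponds to a fake IS in the sense of the list above.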

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

In chronological order, the system works as follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with his IS so that he can add it to the header of the packets he launches. The user should keep his unique IS secret and treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. It does so by consulting the GDDB, sending it a copy of the IS printed on the packet. If the packet has no IS, it is denied ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with a success, the packet's printed IS is verified; the packet then benefits from ISS and is routed onward.
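
The router's decision described above reduces to three cases, sketched below. The function <code>handle_packet</code> and the callable <code>gddb_lookup</code> are hypothetical; the lookup stands in for the query a router would send to the GDDB and is modeled here as simple set membership.

```python
from typing import Callable, Optional


def handle_packet(stamp: Optional[str], gddb_lookup: Callable[[str], bool]) -> str:
    """Decide whether a packet receives Intermediate System Services (ISS)."""
    if stamp is None:
        return "drop"       # no IS printed on the packet
    if not gddb_lookup(stamp):
        return "drop"       # the GDDB reports the IS as invalid
    return "route"          # IS verified: the packet is routed onward


licensed = {"is-0001", "is-0002"}   # stand-in for the ISs stored in the GDDB
lookup = licensed.__contains__

print(handle_packet(None, lookup))        # drop
print(handle_packet("is-9999", lookup))   # drop
print(handle_packet("is-0001", lookup))   # route
```

Since this check runs at every hop, the latency of <code>gddb_lookup</code> dominates the cost of the scheme, which is why the GDDB assumption asks for almost zero read latency.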

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves an acceptable level of attribution relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is available only to trusted entities.
** Hence, it retains personal privacy.
* The system is fully automated. According to its theory of operation, ISS are provided or withheld based on validation of the IS printed on each packet.
* The system prevents all forms of cyber crime that are carried out by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution is not public, custom content generation is not achievable.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before they could be used. Normally, in these cases, the Mds would be bound to one single person.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since they are solely responsible for their Mds, they should be aware of all packets sneaking into their machines, so as to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, just as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause whole-system failure. If the attacker gains write access, he can append an imaginary IS. If the attacker gains read access, he can declare his malicious packets to be under the responsibility of some other Ag, i.e., forgery.
* For security purposes, licenses should be periodically renewable; however, this is not an easy matter.

==Discussion==
The main focus of the proposed framework is to ensure that any packet in transit is moving only because it is known to whom it belongs. Recall that in the real world, a person without a social insurance number cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently surpasses that of the Internet. Compared with real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be basically acceptable if it provided at least as much attribution as the real world does. We argue that the proposed framework would provide such a level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag, or else the act will not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-upon trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but the Internet can still operate if the framework is suspended. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country from which the crime originated. Whatever the costs of deploying the system are, they should still be less than the cost of losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot easily be bounded. As for the practice requirements, the framework's theory of operation does not permit mapping a given Ag to a set of actions; it permits only the mapping of a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones holding this piece of information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the forerunner of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, and that it must therefore be approved by the government before a person may drive it and must be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like a driving license. The Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and rapidly growing number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technological applications are now built on the Internet as their underlying infrastructure and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes waiting for the right moment to strike. Internet attribution is no longer an add-on, but an obligation.

In this paper, we have presented formal definitions of attribution, explained why attribution is crucial and what level of attribution would be considered acceptable, and identified where the difficulty in achieving such a level lies. Moreover, we have given background on current attribution systems, with a brief discussion of why they persist and where they fail. We have also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we have proposed a potential framework for a system that should fulfill these requirements and should be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars

[2] Wikipedia Website

A link to the paper

2011-04-11T17:32:04Z

Raghad: /* The attribution dilemma */

=Title=
Proposed titles:
* Requirements for Attribution on the Internet
* Internet Attribution: Between Privacy and Cruciality

=Abstract=
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans. Others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not practical to authenticate every single packet hopping across the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach to performing attribution over the internet.

=Introduction=
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system lacks a reasonably robust attribution mechanism. In this context, we need better methodologies for reaching an acceptable success rate in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act. Within our focus, an agent can be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets reaching their machines, and the limited effectiveness of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the human source behind an attack.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution, discuss the limitations and flaws these works have in common, and consider what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access to the system, weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the temptation of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which, sadly, receives little attention in current research.

This paper starts with a brief discussion of the dilemma of attribution, addressing the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.

==What is Attribution==
''The act of attributing, especially the act of establishing a particular person as the creator of a work of art.''<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, the attribution of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the internet" as "attribution", while other types of attribution will be defined separately.

==Problem Statement==
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.

==Problem Motivation==
In today's world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since the Internet came into wide use in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

==Scope==
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, some basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology used in it.

=Background=

The problem of attribution is not a new one; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but each only to the extent that the particular system in question requires.
This section gives an introduction to three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

==Cookies==
Websites sometimes need to remember information about a visit or a visitor in order to improve the visitor's experience. Cookies are small pieces of text that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart contents, and storing site preferences. In practice, they can be used to store any type of information that can be represented as text. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the HTTP request. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats the request as the browser's first access to the server and sends a cookie as part of the response; the browser saves it and sends it back with the next request. Cookie contents may be encrypted by the server for data security and information privacy; however, cookies remain under the user's control, as they can be inspected, modified, and even deleted completely. It is also possible for users to change their browser settings so that cookies are not accepted at all.

Cookies may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you configure how long cookies are kept.
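
The exchange described above can be sketched with Python's standard `http.cookies` module. This is only an illustration: the cookie name `session_id`, its value, and the one-hour lifetime are assumptions chosen for the example, not taken from any particular website.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age=None):
    """Server side: build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    if max_age is not None:
        # With a lifetime the cookie survives a browser restart;
        # without one it is a session cookie.
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    """Server side: read the cookies a browser sent in its Cookie header."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {key: morsel.value for key, morsel in cookie.items()}


# First visit: no cookie arrived, so the server issues one.
session_cookie = build_set_cookie("session_id", "abc123")
persistent_cookie = build_set_cookie("session_id", "abc123", max_age=3600)

# Later visit: the browser returns whatever it has stored.
returned = parse_cookie_header("session_id=abc123; theme=dark")
```

Note that `parse_cookie_header` simply trusts the text it receives: the client decides what to send back, which is why a cookie alone is weak evidence of identity.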

===Cookies as an Attribution System===
If we consider cookies as the type of attribution system we are looking for over the Internet, we can achieve high precision in identifying the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

==IP Addresses==
IP (Internet Protocol) addresses are numerical identifiers, 32 bits long in IPv4, for devices (e.g., computers, printers, scanners) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which are responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of online devices owned by each user have grown. One of the more common examples of this is the rise of Internet-ready mobile phones. The addressing system of the current Internet Protocol version 4 (IPv4) uses only 32 bits, which means it can uniquely address only 2<sup>32</sup> (4,294,967,296) addresses, fewer than the number of people on the planet today. The last blocks of IPv4 addresses were allocated to the five RIRs in early February 2011 [1]. This had been foreseen since the 1990s, which prompted the development of a new Internet Protocol version, IPv6, which uses 128-bit addresses.
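
The address-space arithmetic can be checked directly. The following short sketch uses Python's standard `ipaddress` module; the variable names are illustrative only.

```python
import ipaddress

# Every possible IPv4 address: 32 bits of address space.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses

# Every possible IPv6 address: 128 bits of address space.
ipv6_total = ipaddress.ip_network("::/0").num_addresses

# How many complete IPv4 address spaces fit into the IPv6 space.
ipv4_spaces_in_ipv6 = ipv6_total // ipv4_total

print(ipv4_total)                      # 4294967296
print(ipv4_spaces_in_ipv6 == 2 ** 96)  # True
```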

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user through configuration. A dynamic IP address is one that may change, typically being assigned anew each time the device boots up or connects. A Dynamic Host Configuration Protocol (DHCP) server is usually responsible for assigning dynamic IP addresses to users. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps with the limited address space by allowing many devices to "share" a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.
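
The "sharing" of one address by several devices can be illustrated with a toy lease pool. This is a simplified sketch, not an implementation of DHCP: the `LeasePool` class, the device names, and the single documentation-range address are all assumptions made for the example.

```python
class LeasePool:
    """Toy model of dynamic address assignment from a fixed pool."""

    def __init__(self, addresses):
        self.free = list(addresses)
        self.leases = {}  # device name -> address currently held

    def acquire(self, device):
        """Hand the device its current address, or the next free one."""
        if device in self.leases:
            return self.leases[device]
        if not self.free:
            raise RuntimeError("address pool exhausted")
        address = self.free.pop(0)
        self.leases[device] = address
        return address

    def release(self, device):
        """Return the device's address to the pool when it goes offline."""
        self.free.append(self.leases.pop(device))


pool = LeasePool(["203.0.113.10"])   # a pool with one address
first = pool.acquire("laptop-A")     # laptop-A goes online
pool.release("laptop-A")             # laptop-A goes offline
second = pool.acquire("phone-B")     # phone-B reuses the same address
```

Here `first` and `second` are the same address held by two different devices at different times, which is one reason an IP address alone does not identify a person.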

===IP Addresses as an Attribution System===
Although Internet addresses can be used to attribute packets to their sender, they fail as an effective attribution system for several reasons, the main one being that attackers can spoof their IP addresses. Spoofed IP addresses can even foil IP traceback efforts.
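
The reason spoofing is possible is that the source address is just a field the sender fills in. The sketch below packs a minimal 20-byte IPv4 header with Python's `struct` module and reads the source field back; nothing is sent on the network. The function names and the documentation-range addresses are illustrative assumptions, and the checksum is left at zero for brevity.

```python
import socket
import struct


def build_ipv4_header(src, dst, payload_len=0):
    """Pack a minimal IPv4 header (no options, checksum left as 0)."""
    version_ihl = (4 << 4) | 5          # version 4, header length 5 words
    total_length = 20 + payload_len
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, total_length,   # version/IHL, TOS, total length
        0, 0,                           # identification, flags/fragment
        64, socket.IPPROTO_TCP, 0,      # TTL, protocol, checksum
        socket.inet_aton(src),          # source: whatever the sender writes
        socket.inet_aton(dst),
    )


def claimed_source(header):
    """Read the source address field (bytes 12 to 15) from the header."""
    return socket.inet_ntoa(header[12:16])


header = build_ipv4_header("198.51.100.7", "192.0.2.1")
print(claimed_source(header))  # 198.51.100.7
```

A receiver that looks only at this field learns what the sender claimed, not where the packet actually came from.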

==Authentication Systems==
In order for a website to verify the identity of whoever is visiting certain pages, it provides an authentication system. This is usually a login name and password, either assigned by the web server or chosen by the user. The biggest advantage of this approach is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which is exposed to attackers breaking into the server to steal login information.

Login systems are attached to user accounts, which sometimes require private information in order to be set up. If the web server's security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Taking a typical web banking authentication system as an example, the process may go as follows. A user requests a web account, or one is automatically assigned to the user. The user sets up a password for accessing the account. When the user later visits the website, he is asked to "identify himself": the user enters his personal login information, the web server checks this information against what it has stored in its database, and it either grants or denies access to the user's personal page.
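
The register-then-verify flow can be sketched as follows. This is a minimal illustration under stated assumptions: the source does not say how the server stores passwords, so the salted PBKDF2 hash, the iteration count, and the `AccountStore` class are choices made for this example only.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, not prescribed by the text


def hash_password(password, salt=None):
    """Derive a salted hash so the server never stores the plain password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


class AccountStore:
    """Hypothetical server-side database of login records."""

    def __init__(self):
        self._accounts = {}  # login name -> (salt, password hash)

    def register(self, login, password):
        self._accounts[login] = hash_password(password)

    def verify(self, login, password):
        """Grant access only if the login exists and the password matches."""
        record = self._accounts.get(login)
        if record is None:
            return False
        salt, expected = record
        _, candidate = hash_password(password, salt)
        return hmac.compare_digest(candidate, expected)


store = AccountStore()
store.register("alice", "correct horse")
print(store.verify("alice", "correct horse"))  # True
print(store.verify("alice", "wrong guess"))    # False
```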

Authentication systems are generally used only when users want some privacy on the web server or wish to store some form of information there.

===Authentication Systems as an Attribution System===
Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system. Virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would completely destroy it.

=The attribution dilemma=

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and infrastructure available, one must also consider the issue of privacy, because strong attribution comes at the expense of personal privacy. Any system must try to find a balance between strong attribution and privacy. The balance is influenced by the application of the system. For instance, in the case of financial institutions, both the clients and the institution will place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that the agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or merit) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a fairly abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not just to fix this ratio for one particular case, but to allow the ratio to change dynamically depending on the case.

Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think that this question is somewhat outside the scope of our paper. This is true; however, this and many less obviously related questions should be answered prior to design, because in matters as important as protection and privacy, the design of a solution should not make too many assumptions and should provide guarantees not only to the operators of the system, but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.

Further questions arise, which connect to the requirements discussed later:

* While designing an attribution system one needs to consider balancing between attribution and privacy.
** Sometimes non-attribution is crucial, for example to protect political dissidents and whistle-blowers.
* When to decide to track a person and when not to (so as not to intrude privacy)?
* How to make sure attribution is properly achieved?
* Who should attribute who/what and why?
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering in binding packets to agents?
* How much can intermediate systems' cooperation contribute to achieving attribution?
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

==Why do we need Attribution==

* For identification purposes
** Web Banking
** eCommerce
** Web advertisements

* For better protection against cyber attacks:
** DoS and DDoS
** Forgery and theft
** Sniffing private traffic
** Distributing illegal content/malware
** Sending spam
** Illegal/undesired intrusion

* For marketing purposes (privacy?)
** Custom (client-based) content generation

==Why is it difficult to achieve attribution?==

The main problem is that the way the Internet is designed makes it possible and relatively easy to act without revealing one's identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

* The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.
* Depending on the type of sender and receiver, a different attribution policy will be required.

In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried by each packet, determining the geographical location of that address, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication would be used. Here is what goes wrong:
* IP addresses can be spoofed, which yields a misleading geographical location.
* To avoid that problem, IP traceback can be performed, but it requires global cooperation of intermediate systems, which does not exist.
* IP addresses are not permanently bound to persons, so identifying the person from the IP address is not reliable.
* Network users are not aware of all the packets reaching their machines, which allows malware distribution and hence the creation of botnets, which in turn mislead attribution.
* Firewalls and packet filters can be used to mitigate that problem, but they are not 100% effective.
* It is not practical to authenticate every single action on the internet.

===Attacks to prevent correct attribution of actions===

* Stepping stone attack: a common way of making attacks anonymous by relaying them through multiple random public hosts (stepping stones) on the way to the victim in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>
* Forgery
** Identity theft (impersonation)
** Distribution of malware

=Requirements for internet attribution system=

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system. Although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution and remains realistic.

We have separated these requirements into three groups. General requirements define the idea and overall goal of the system in high-level, abstract terms. Deployment requirements set ground rules for deployability that make sense for a network as huge as the internet and for human society. Practice requirements define the way the system works, behaves and interacts with other bodies.

==General==

The first and most obvious requirement is stated for the sake of completeness: an internet attribution system needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. It might be one person after all. In other words, even though an action is usually carried out by a single person, that person is not necessarily the one responsible for the decision to carry it out. A good real-world analogy is an assassin and the party (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible parties are the ones discovered. Still, we accept the notion that at the end of the day there is some person or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the immediate source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a system in which reducing crime and misuse overrides every other concern, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some of them are imperfect, but they are widely accepted and most exist for good reasons. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes together with incremental deployability, which we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

==Deployment==
It is relatively easy to design a system in the abstract; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will be under a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the network it operates in.

As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once. Incremental deployment of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while the system is still at a small scale), so that by the end of the cycle, when the whole internet is covered, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adaptation of the system to different sets of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adapt to different cases, while at the same time remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adaptability and universality. Fortunately, this level of detail is beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars due to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.) or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. from DoS attacks or identity theft).

==Practice==

The attribution mapping should not be invertible: an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did (or does) person X do?". Not only should the system not know the answer, it should not be possible to derive the answer from it; the question "who did act X?" is the one that should be answered. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as for surveillance or spying.

Since this global system operates on the internet, it might not be a good idea to put the names of persons into the traceability database. It makes much more sense to assign a unique ID to anybody who uses the network. When a crime is committed or, more generally, whenever the agent of some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (a government, a corporation, the police, some public-good-like system, etc.). This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of the information in one place.
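
The separation between traceability records and real names can be sketched as two independent stores. This is only an illustration of the idea: the class names, the random hexadecimal IDs, and the `warrant_approved` flag (standing in for "enough evidence or motivation") are assumptions made for the example.

```python
import secrets


class TrustedRegistry:
    """Held by the trusted entity: unique ID -> real name."""

    def __init__(self):
        self._names = {}

    def enroll(self, real_name):
        """Issue a random ID that carries no information about the person."""
        unique_id = secrets.token_hex(8)
        self._names[unique_id] = real_name
        return unique_id

    def reveal(self, unique_id, warrant_approved):
        """Disclose the name behind an ID only when disclosure is justified."""
        if not warrant_approved:
            raise PermissionError("disclosure not authorized")
        return self._names[unique_id]


class TraceabilityLog:
    """Network side: act -> unique ID. It offers no lookup by person."""

    def __init__(self):
        self._acts = {}

    def record(self, act_id, unique_id):
        self._acts[act_id] = unique_id

    def who_did(self, act_id):
        return self._acts[act_id]


registry = TrustedRegistry()
log = TraceabilityLog()

alice_id = registry.enroll("Alice Example")
log.record("act-42", alice_id)

suspect_id = log.who_did("act-42")                       # "who did act X?"
name = registry.reveal(suspect_id, warrant_approved=True)
```

The sketch only separates the interfaces. Whoever holds a complete copy of the log could still scan it by ID, which is why the text also requires the traceability information to be distributed so that it cannot be collected in one place.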

Of course, a body trusted by everyone does not always exist, but generally there are governments and/or agencies that we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need and does not allow either side to misuse the system.

=Proposed Framework=
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terms that will be used within the scope of this section:
* Agent (Ag): the human-device pairing that sits on an end system and transmits/receives packets.
* Machines/Devices (Md): any piece of hardware that has network access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a homemade chip that can communicate externally, by wire or wirelessly, to send or receive digital packets.
* Identification Stamp (IS): a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an Md. For an Md such as a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by that device. In other words, it is a unique identifier for an Ag.
* Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
* Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
* Licensing: the process of giving intermediate systems permission to provide ISS to all packets launched by the agent requesting the license. This process simply consists of adding new ISs to the GDDB.
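
To make the notion of an Identification Stamp more concrete, the sketch below derives a fixed-length IS from a human identifier and a MAC address. The text does not specify an encoding, so everything here is an assumption made for illustration: the use of SHA-256, the representation of the biometric as an opaque byte string, and the function names.

```python
import hashlib


def normalize_mac(mac):
    """Put a MAC address into one canonical textual form."""
    return mac.replace("-", ":").lower()


def make_identification_stamp(biometric_template, mac):
    """Bind a human identifier to a device feature in a single value.

    biometric_template: bytes standing in for an iris or fingerprint
    template (assumed to be produced by some enrollment device).
    mac: the device's MAC address as text.
    """
    hasher = hashlib.sha256()
    # Hash the template first so both parts have a fixed length
    # before they are combined.
    hasher.update(hashlib.sha256(biometric_template).digest())
    hasher.update(normalize_mac(mac).encode("ascii"))
    return hasher.hexdigest()


owner = b"example-fingerprint-template"
stamp_laptop = make_identification_stamp(owner, "00-1A-2B-3C-4D-5E")
stamp_phone = make_identification_stamp(owner, "00:1a:2b:3c:4d:5f")
```

The same owner obtains a different IS for each device, consistent with an IS identifying an Ag (a human-device pairing) rather than a human alone.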

In principle, every packet in transit has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when he is running a system in the background that makes external calls (over the internet) or is automated for periodic communication or automatic responses to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones. For example, when a user sends an HTTP request, TCP sends connection initiation packets for its handshake, and ICMP packets may be sent to determine the status of a specific host.

This framework addresses attribution over the internet only. It does not cover "locally" defined networks (PAN, LAN, MAN, or WAN topologies as defined in IEEE standards) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates, its method of operation, and a list of the pros, cons and vulnerabilities of the system, and they conclude with a discussion of the proposed framework.
==Assumptions==
First: Jurisdiction. This framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity should act as the Internet's law enforcement: it is the primary inspector and also the jurisdiction that regulates all kinds of cyber crime and misbehavior. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would perform better, as it would be able to scale with the growth in system users and to conform to diverse regional laws, regulations, customs and traditions.

Secondly: GDDB. We assume that a GDDB is deployed, which acts as a "database" for storing ISs. Symmetric key encryption should be used to protect that system, as it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should have read/write access. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency for read operations, as it will be consulted at every hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics of the way the GDDB subsystems communicate with each other.
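
The two access roles can be sketched as a simple permission check. This is a deliberately reduced, single-process illustration: the `Role` enum and `GDDB` class are hypothetical names, and the encryption, authentication, and distribution that the assumption calls for are omitted.

```python
from enum import Enum


class Role(Enum):
    ROUTER = "router"                  # read-only access
    TRUSTED_ENTITY = "trusted_entity"  # read/write access


class GDDB:
    """Toy stand-in for the Globally Distributed Database of ISs."""

    def __init__(self):
        self._stamps = set()

    def contains(self, role, stamp):
        """Read operation: allowed for routers and the trusted entity."""
        if not isinstance(role, Role):
            raise PermissionError("unauthenticated caller")
        return stamp in self._stamps

    def add(self, role, stamp):
        """Write operation (licensing): trusted entity only."""
        if role is not Role.TRUSTED_ENTITY:
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)


gddb = GDDB()
gddb.add(Role.TRUSTED_ENTITY, "IS-0001")
print(gddb.contains(Role.ROUTER, "IS-0001"))  # True
print(gddb.contains(Role.ROUTER, "IS-9999"))  # False
```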

Thirdly: Ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if the Md were found to misbehave or to launch malicious packets. The ownership relationship between persons and machines is one-to-many. That is to say, a person can officially own one or more machines, but a machine can be owned by only one person.

Finally: IP packets. Our proposed framework assumes that the network layer adds a header to each IP packet that includes the IS of the Ag owning the packet.

==Methodology==
Basically, this framework works by halting the propagation of any packet that is either unattributed or forged with a fake IS. A fake IS is one that:
* Either has a false unique chip identifier that refers to an imaginary Md.
* Or has a false unique human identifier that refers to an imaginary human.
* Or has a misleading binding of a human to a machine, i.e., it claims that some machine "X" belongs to some human "Y" when in reality "Y" is not the owner of "X".
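
The three cases can be expressed as a small classification function. This is a sketch under assumed data structures: the sets of known humans and devices and the `owner_of` mapping (one owner per device, as in the ownership assumption) are hypothetical stand-ins for whatever records the trusted entity keeps.

```python
def classify_stamp(human_id, device_id, known_humans, known_devices, owner_of):
    """Return why an IS is fake, or "valid" if none of the cases apply.

    owner_of maps each device to its single official owner, so one
    human may own many devices but a device has exactly one owner.
    """
    if device_id not in known_devices:
        return "imaginary device"
    if human_id not in known_humans:
        return "imaginary human"
    if owner_of.get(device_id) != human_id:
        return "misleading binding"
    return "valid"


known_humans = {"Y", "Z"}
known_devices = {"X", "W"}
owner_of = {"X": "Z", "W": "Z"}   # Z owns both machines

print(classify_stamp("Z", "X", known_humans, known_devices, owner_of))  # valid
print(classify_stamp("Y", "X", known_humans, known_devices, owner_of))  # misleading binding
```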

Notably, the routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

The system operates, in chronological order, as follows. First, any newly bought machine, or even a homemade device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with his IS so that it can be added to the header of the packets he launches. The user should keep his unique IS secret and treat it exactly as he treats his credit card numbers and social insurance number. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS. This is done by consulting the GDDB, that is, by sending a copy of the IS printed on the packet to the GDDB. If the packet has no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is likewise dropped. If the GDDB replies with success, the packet's printed IS is verified; the packet then benefits from ISS and is routed onward.
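
The router's decision described above can be sketched as a single function. The names `Verdict`, `handle_packet` and `gddb_lookup` are hypothetical, and the GDDB query is replaced by a local callable; a real router would need a networked lookup.

```python
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    DROP = "drop"
    FORWARD = "forward"


def handle_packet(stamp: Optional[bytes],
                  gddb_lookup: Callable[[bytes], bool]) -> Verdict:
    """Decide whether a packet may benefit from ISS.

    `stamp` is the IS carried in the packet header (None if absent);
    `gddb_lookup` stands in for the query sent to the GDDB and returns
    True only when the stamp is registered.
    """
    if stamp is None:           # no IS printed on the packet
        return Verdict.DROP
    if not gddb_lookup(stamp):  # the GDDB reports the IS as invalid
        return Verdict.DROP
    return Verdict.FORWARD      # IS verified: route the packet


licensed = {b"stamp-1"}  # stand-in for the GDDB contents
print(handle_packet(b"stamp-1", licensed.__contains__))
print(handle_packet(None, licensed.__contains__))
```

Passing the lookup as a parameter keeps the decision logic separate from how the GDDB is reached, which is where the delays noted among the disadvantages would arise.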

==Pros, Cons and Vulnerabilities==

The proposed framework enjoys the following advantages:
* It achieves a level of attribution that is acceptable relative to the one achieved in the real world.
* It prevents anonymous attacks, since a non-attributed packet will fail to reach its destination.
* Attribution information is not publicly available; it is available only to trustful entities.
** Hence, it preserves personal privacy.
* The system is fully automated. According to its theory of operation, ISS are either provided or withheld based on the validation of the IS printed on each packet.
* The system prevents all forms of cyber crime that are carried out by unknown Ags.

The proposed framework suffers from the following disadvantages:
* Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
* The framework is not easy to deploy, since its assumptions are relatively complex.
* Since attribution information is not public, custom content generation cannot be achieved.
* The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. must all be licensed before they can be used. Normally, in these cases, the Mds would all be bound to a single person.

The proposed framework is vulnerable to:
* Botnets
** The system requires users to be fully aware of what lies under the hood. Since users are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the later formation of botnets.
** Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
* A successful attack on the GDDB would cause the whole system to fail. If the attacker gains the ability to alter the GDDB, he can append an imaginary IS. If the attacker gains the ability to read it, he can declare his malicious packets under the responsibility of some other Ag, i.e., forgery.
* For security purposes, licenses should be periodically renewable; however, renewal is not an easy matter.

==Discussion==
The proposed framework's main focus is to ensure that any packet in transit moves only because it is known to whom it belongs. Recall that in the real world, a person without a social insurance number cannot benefit from services: he cannot open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal in criminal tracing and law enforcement; however, its level of attribution currently exceeds that of the Internet. Compared with real-world attribution, current Internet attribution can be considered a failure. A form of Internet attribution would be acceptable if it provides at least as much attribution as the real world does. We argue that the proposed framework would provide that level.

The proposed framework fulfills all of the general requirements. Any potentially destructive act is traceable to an Ag; otherwise, the act will not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-upon trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the decision on punishment to the jurisdiction of the perpetrator's country. Whatever the cost of deploying the system, it should still be less than the cost of the losses due to cyber crimes, because the cost of losses due to unknown "future" attacks cannot be easily determined. As for the practice requirements, the framework's theory of operation does not permit mapping a given Ag to a set of actions; it permits only the mapping of a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. Only the trusted entities generate the IS from the personal data; hence, they are the only ones holding this information. To conclude, the framework satisfies all the requirements.

=Conclusion=
Human nature resists change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first Cugnot steam trolley <ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the ancestor of today's automobiles. In 1903, car licensing began in North America, 134 years after Cugnot's invention. Licensing started when people began to realize that a car could act as a lethal weapon, which must therefore be approved by the government before a person may drive it, and must also be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like a driving license. The Internet is becoming more crucial to many applications and, at the same time, more vulnerable to different types of attacks. It is being injected into the "blood" of a vast, and exponentially growing, number of applications that are time- and data-sensitive and that leave no room for cyber crimes, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, many industrial and technology-based applications are now built on the Internet as their underlying infrastructure, and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes waiting for the proper moment to strike. Internet attribution is therefore no longer an add-on, but an obligation.

In this paper, we have presented some formal definitions of attribution, explained why attribution is crucial, discussed what level of attribution would be considered acceptable, and identified where the roots of the difficulty in achieving such a level lie. Moreover, we have provided background on current attribution systems, with a brief discussion of why they survive and where they fail. We also compiled a list of requirements that must be fulfilled by any system aiming to achieve Internet attribution. Finally, we proposed a potential framework for a system that should fulfill these requirements and be able to achieve an acceptable level of Internet attribution. The pros, cons and vulnerabilities of the proposed framework were also discussed.

=References=
<references/>

[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars

[2] Wikipedia Website

A link to the paper

2011-04-03T23:10:52Z

Raghad: /* IP Addresses */

A link to the paper

2011-04-03T23:10:05Z

Raghad: /* IP Addresses */

A link to the paper

2011-04-03T22:59:38Z

Raghad: /* IP Addressing */

A link to the paper

2011-04-03T22:59:03Z

Raghad: /* Cookies */

A link to the paper

2011-04-03T22:56:43Z

Raghad: /* Cookies */

A link to the paper

2011-04-03T22:52:05Z

Raghad: /* Cookies */

A link to the paper

2011-04-03T22:51:08Z

Raghad: /* Cookies */

A link to the paper

2011-04-03T22:28:29Z

Raghad: /* Introduction */

A link to the paper

2011-03-22T16:46:39Z

Raghad: /* Definition */

A link to the paper

2011-03-21T06:49:13Z

Raghad: /* Raghad */

A link to the paper

2011-03-21T06:36:49Z

Raghad: /* The attribution dilemma */

A link to the paper

2011-03-21T03:57:43Z

Raghad: /* Definition */

A link to the paper

2011-03-17T18:23:19Z

Raghad: /* Raghad */

=Title=
Requirements for Attribution on the Internet

=Abstract=

=Introduction=
===Definition===
Binding an act to an agent (person or device)

=The attribution dilemma=
While designing an attribution system one needs to consider balancing between attribution and privacy.

==What is the attribution problem==
===Rakhim===
The main problem I see is that the way the Internet is designed makes it possible and relatively easy to act without revealing one's identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

===Omi===
===Raghad===
The issue of lack of attribution on the web mostly arises whenever security is compromised. When you're bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion.

===AbdelRahman===
In an ideal world, every action on the Internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. Here is what goes wrong:
* IP addresses can be spoofed, which makes the geographical location misleading.
* To avoid that problem, IP traceback can be performed BUT it requires global cooperation of intermediate systems... it is not there!
* IPs are not permanently bound to a person, so inferring the person from the IP is not reliable.
* Network users are not aware of all packets sneaking into their machines, which allows for malware distribution and hence the creation of botnets... misleading attribution!
* Firewalls and packet filters can be used to avoid that problem, but they are not 100% effective.
* It is not practical to authenticate every single action on the Internet.

==Why we need Attribution==
For identifying persons/devices when any of these attacks are detected:
* DoS and DDoS
* Forgery and theft
* Sniffing private traffic
* Distributing illegal content
* Sending spam
* Illegal/undesired intrusion
For marketing purposes (privacy?)
* custom (client-based) content generation

==Attacks to prevent correct attribution of actions ==
* Stepping stone attack
* Forgery
** Identity theft (impersonation)
** Distribution of malware

=Requirements for internet attribution system=
(Unstructured draft)

* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.).
* Traceability should not violate any current privacy-related laws and moral principles.
* The attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.
* Traceability information should be distributed.
* It should be impossible to collect all traceability data in one place.
* Personal data should be stored by trusted authorities (e.g., governments).
* Traceability information and personal data should be kept separate, with the connection between them revealed only when needed.
* The attribution system should be incrementally deployable.
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).
* The attribution system should be adaptable to different sets of rules and principles (laws of countries, organizations' policies, etc.), yet remain universal.
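
Two of the requirements above, the one-way mapping from actions to persons and the separation of traceability information from personal data, can be illustrated with a minimal sketch. The class names, the opaque-token scheme and the `authorised` flag are all hypothetical; the draft does not prescribe any particular design.

```python
class TraceLog:
    """Holds action -> opaque token; knows nothing about persons."""

    def __init__(self):
        self._token_of = {}

    def record(self, action_id: str, token: str) -> None:
        self._token_of[action_id] = token

    def token_for(self, action_id: str) -> str:
        return self._token_of[action_id]


class Authority:
    """Holds token -> person; reveals the link only for an authorised request."""

    def __init__(self, persons_by_token: dict):
        self._person_of = dict(persons_by_token)

    def reveal(self, token: str, authorised: bool) -> str:
        if not authorised:
            raise PermissionError("the connection is revealed only when needed")
        return self._person_of[token]


log = TraceLog()
log.record("action-1", "t-7")
log.record("action-2", "t-7")  # many actions may map to the same token
authority = Authority({"t-7": "person-A"})
print(authority.reveal(log.token_for("action-2"), authorised=True))
```

Because several actions share one token, the mapping is not one-to-one, and neither store offers a lookup from a person to that person's actions.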

=Related Work=
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&arnumber=1437851 This] paper uses both link identification and filtering for achieving IP traceback WITHOUT the presence of high network cooperation.

A link to the paper

2011-03-17T18:15:39Z

Raghad: /* Raghad */

=Title=
Requirements for Attribution on the Internet

=Abstract=

=Introduction=
===Definition===
Binding an act to an agent (person or device)

=The attribution dilemma=
While designing an attribution system one needs to consider balancing between attribution and privacy.

==What is the attribution problem==
===Rakhim===
The main problem I see is that the way the Internet is designed makes it possible and relatively easy to act without revealing one's identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or just deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

===Omi===
===Raghad===
The issue of lack of attribution on the web mostly arises whenever security is compromised.

===AbdelRahman===
In an ideal world, every action on the Internet could be bound to a machine and thus to a person. This would be done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. Here is what goes wrong:
* IP addresses can be spoofed, which makes the geographical location misleading.
* To avoid that problem, IP traceback can be performed BUT it requires global cooperation of intermediate systems... it is not there!
* IPs are not permanently bound to a person, so inferring the person from the IP is not reliable.
* Network users are not aware of all packets sneaking into their machines, which allows for malware distribution and hence the creation of botnets... misleading attribution!
* Firewalls and packet filters can be used to avoid that problem, but they are not 100% effective.
* It is not practical to authenticate every single action on the Internet.

==Why we need Attribution==
For identifying persons/devices when any of these attacks are detected:
* DoS and DDoS
* Forgery and theft
* Sniffing private traffic
* Distributing illegal content
* Sending spam

For marketing purposes (privacy?)

==Attacks to prevent correct attribution of actions ==
* Stepping stone attack
* Forgery
** Identity theft (impersonation)
** Distribution of malware

=Requirements for internet attribution system=
(Unstructured draft)

* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.).
* Traceability should not violate any current privacy-related laws and moral principles.
* The attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.
* Traceability information should be distributed.
* It should be impossible to collect all traceability data in one place.
* Personal data should be stored by trusted authorities (e.g., governments).
* Traceability information and personal data should be kept separate, with the connection between them revealed only when needed.
* The attribution system should be incrementally deployable.
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).
* The attribution system should be adaptable to different sets of rules and principles (laws of countries, organizations' policies, etc.), yet remain universal.

=Related Work=
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&arnumber=1437851 This] paper uses both link identification and filtering for achieving IP traceback WITHOUT the presence of high network cooperation.

A link to the paper

2011-03-15T18:01:01Z

Raghad:

=Title=
Requirements for Attribution on the Internet

=Abstract=

=Introduction=

=The attribution dilemma=
==What is the attribution problem==
==Why we need Attribution==
*DoS
==Attribution Attacks==
* Stepping stone attack
* Forgery
** Identity theft

=Related Work=
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&arnumber=1437851 This] paper uses both link identification and filtering for achieving IP traceback WITHOUT the presence of high network cooperation.

=Requirements=

DistOS-2011W Attribution

2011-03-15T17:38:24Z

Raghad: /* Surveyed Papers */

==Members==
* AbdelRahman Abdou
* Raghad Al-Awwad
* Omi Iyamu
* Rakhim Davletkaliyev

=Meeting Briefings=
==Tuesday, March 1st==
After 20 minutes of brainstorming, we agreed on:
* The current Internet infrastructure lacks the ability to provide a highly scalable and efficient attribution mechanism.
* Attribution must be implemented in a distributed manner, and must be automated and not owned.
* Threats that should be addressed include (but are not limited to):
** Impersonation of computers, individuals and applications.
** All types of electronic spoofing.
* The skeleton of our project will comprise four main aspects:
** Tracing/Tracking: baseline for attribution.
** Human identification: a MUST to include!
** Machine identification: to be combined with human identification.
** Storage: how and where to store data traces and the identification stamps.
==Thursday, March 3rd==
Decided Task Distribution:
* Tracing/Tracking: Omi
* Human identification: Raghad
* Machine identification: AbdelRahman
* Storage: Rakhim
==Thursday, March 10th==
Basic Proposal: 
Upon questioning the capabilities of the currently deployed global network, it was agreed that it lacks the ability to achieve a relatively high level of attribution. By "relatively", we mean in comparison to the "world's" attribution standards (i.e., the percentage of success in binding an act to a person in the real world). Moreover, any system (h/w or s/w) that operates at the end systems is useless because it can be tampered with.
As a result, a basic model was proposed and discussed. It employs the rule: 
"An act cannot use network resources nor can it be routed if it is anonymously bound." 
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Because they are the main driving force behind the delivery of all packets, malicious or benign, they should bear much of the responsibility for achieving a highly reliable attribution mechanism.
The proposed system requires the following:
# One or more globally trustful entities (e.g., governments).
# Any newly bought (or even handmade/privately manufactured) device that has access capabilities must be licensed by the trustful entity (defined in 1); otherwise, it will not be able to benefit from global routing services.
# Licensing works by binding a unique feature of a human (e.g., the intricate structure of the iris) to a unique feature of a machine (e.g., its MAC address), generating a chunk called an identification stamp. (Including the passport number in the identification stamp is still under investigation, for the sake of tracking and punishing the prime offender.)
# A DNS-like, world-wide distributed system is to be encrypted and deployed to act as a database storing all identification stamps. The system is accessible for READ operations ONLY by the routers, and for WRITE operations ONLY by the trustful entity(s) defined in 1.
# Within the frame format of the IP protocol, a header is to be added that includes the identification stamp of the packet owner.
# The attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.
Once these requirements are met, the rule above applies. When a router receives a packet, it should first consult the global database to verify the packet's identification stamp. If the stamp cannot be verified, the router drops the packet.
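
The READ/WRITE restriction on the stamp database can be sketched as a role check. The `StampStore` class and the role strings are hypothetical, and passing the role as a plain argument stands in for whatever authentication a real deployment would need.

```python
class StampStore:
    """Hypothetical stand-in for the distributed identification-stamp database."""

    def __init__(self):
        self._stamps = set()

    def write(self, role: str, stamp: bytes) -> None:
        """Register a stamp; allowed only for the trustful entity."""
        if role != "trusted_entity":
            raise PermissionError("only the trustful entity may write")
        self._stamps.add(stamp)

    def read(self, role: str, stamp: bytes) -> bool:
        """Check whether a stamp is registered; allowed only for routers."""
        if role != "router":
            raise PermissionError("only routers may read")
        return stamp in self._stamps


store = StampStore()
store.write("trusted_entity", b"stamp-1")
print(store.read("router", b"stamp-1"))
print(store.read("router", b"unknown"))
```

A router would forward a packet only when `read` returns true for its stamp, and drop it otherwise.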

As can be seen, the proposed system still leaves much of its functionality undefined. For example, it cannot prevent the creation of botnets, forgery and other similar attacks. In principle, a web server provides a service on behalf of someone; should web servers have permanent identification stamps (as a replacement for certificates)? In addition, factors such as router latencies, DB protection, and whom to elect as the global trustful entity still need to be addressed.

To be done: 
* Strictly define the requirements of a good attribution system.
* Analyze what the currently implemented attribution systems lack.
* (optional) Propose a model that arguably employs attribution.

Attribution Definition: 
"Binding an act to a person" - Prof. Anil

==Tuesday, March 15th==

=Surveyed Papers=

[1] Vladimir Brik, Suman Banerjee, Marco Gruteser, Sangho Oh, Wireless device identification with radiometric signatures, University of Wisconsin at Madison, Madison, WI, USA, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]

*ABSTRACT
We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.

[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, Accurate, scalable in-network identification of p2p traffic using application signatures, AT&T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]

*ABSTRACT
The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.

[3] Roger Clarke, Human Identification in Information Systems: Management Challenges and Public Policy Issues [http://www.emeraldinsight.com/journals.htm?articleid=883434&show=abstract PDF/HTML]

*ABSTRACT
Many information systems involve data about people. In order reliably to associate data with particular individuals, it is necessary that an effective and efficient identification scheme be established and maintained. There is remarkably little in the information technology literature concerning human identification. Seeks to overcome that deficiency by undertaking a survey of human identity and human identification. Discusses techniques including names, codes, knowledge-based and token-based identification, and biometrics. Identifies the key challenge to management as being to devise a scheme which is practicable and economic, and of sufficiently high integrity to address the risks the organization confronts in its dealings with people. Proposes that much greater use be made of schemes which are designed to afford people anonymity, or which enable them to use multiple identities or pseudonyms, while at the same time protecting the organization's own interest. Describes multi-purpose and inhabitant registration schemes, and notes the recurrence of proposals to implement and extend them. Identifies public policy issues. Of especial concern is the threat to personal privacy that the general-purpose use of an inhabitant registrant scheme represents. Speculates that, where such schemes are pursued energetically, the reaction may be strong enough to threaten the social fabric.

[4] Matt Bishop, Carrie Gates and Jeffrey Hunker, The Sisterhood of the Traveling Packets [http://jeffreyhunker.com/gallery/20/nspw09-1.pdf PDF]
*ABSTRACT
From a cyber-security perspective, attribution is considered to be the ability to determine the originating location for an attack. However, should such an attribution system be developed and deployed, it would provide attribution for all traffic, not just attack traffic. This has several implications for both the senders and receivers of traffic, as well as the intervening organizations, Internet service providers and nation-states. In this paper we examine the requirements for an attribution system, identifying all of the actors, their potential interests, and the resulting policies they might therefore have. We provide a general framework that represents the attribution problem, and outline the technical and policy requirements for a solution. We discuss the inevitable policy conflicts due to the social, legal and cultural issues that would surround such a system.

=Milestones=
* Problem definition
* Literature review
* Comparison of literature
* Requirements for a proper attribution scheme
* Discussions
* Conclusion and Future Work

=Paper=
[[A link to the paper]]

=Project Progress=
Coming Soon!

=Requirements=
* incremental deployability
* privacy

=Readings=
''really hard to find anything not from psychology''

DistOS-2011W Attribution

2011-03-15T17:37:15Z

Raghad: /* Surveyed Papers */

==Members==
* AbdelRahman Abdou
* Raghad Al-Awwad
* Omi Iyamu
* Rakhim Davletkaliyev

=Meeting Briefings=
==Tuesday, March 1st==
After 20 minutes of brainstorming, we agreed on:
* The current Internet infrastructure lacks the ability to provide a highly scalable and efficient attribution mechanism.
* Attribution must be implemented in a distributed manner, and must be automated and not owned.
* Threats that should be addressed include (but are not limited to):
** Impersonation of computers, individuals and applications.
** All types of electronic spoofing.
* The skeleton of our project will comprise four main aspects:
** Tracing/Tracking: baseline for attribution.
** Human identification: a MUST to include!
** Machine identification: to be combined with human identification.
** Storage: how and where to store data traces and the identification stamps.
==Thursday, March 3rd==
Decided Task Distribution:
* Tracing/Tracking: Omi
* Human identification: Raghad
* Machine identification: AbdelRahman
* Storage: Rakhim
==Thursday, March 10th==
Basic Proposal: 
Upon questioning the capabilities of the currently deployed global network, it was agreed that it lacks the ability to achieve a relatively high level of attribution. By "relatively", we mean in comparison to the "world's" attribution standards (i.e., the percentage of success in binding an act to a person in the real world). Moreover, any system (h/w or s/w) that operates at the end systems is useless because it can be tampered with.
As a result, a basic model was proposed and discussed. It employs the rule: 
"An act cannot use network resources nor can it be routed if it is anonymously bound." 
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Because they are the main driving force behind the delivery of all packets, malicious or benign, they should bear much of the responsibility for achieving a highly reliable attribution mechanism.
The proposed system requires the following:
# One or more globally trustful entities (e.g., governments).
# Any newly bought (or even handmade/privately manufactured) device that has access capabilities must be licensed by the trustful entity (defined in 1); otherwise, it will not be able to benefit from global routing services.
# Licensing works by binding a unique feature of a human (e.g., the intricate structure of the iris) to a unique feature of a machine (e.g., its MAC address), generating a chunk called an identification stamp. (Including the passport number in the identification stamp is still under investigation, for the sake of tracking and punishing the prime offender.)
# A DNS-like, world-wide distributed system is to be encrypted and deployed to act as a database storing all identification stamps. The system is accessible for READ operations ONLY by the routers, and for WRITE operations ONLY by the trustful entity(s) defined in 1.
# Within the frame format of the IP protocol, a header is to be added that includes the identification stamp of the packet owner.
# The attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.
Once these requirements are met, the rule above applies. When a router receives a packet, it should first consult the global database to verify the packet's identification stamp. If the stamp cannot be verified, the router drops the packet.

As can be seen, the proposed system still leaves much of its functionality undefined. For example, it cannot prevent the creation of botnets, forgery and other similar attacks. In principle, a web server provides a service on behalf of someone; should web servers have permanent identification stamps (as a replacement for certificates)? In addition, factors such as router latencies, DB protection, and whom to elect as the global trustful entity still need to be addressed.

To be done: 
* Strictly define the requirements of a good attribution system.
* Analyze what the currently implemented attribution systems lack.
* (optional) Propose a model that arguably employs attribution.

Attribution Definition: 
"Binding an act to a person" - Prof. Anil

==Tuesday, March 15th==

=Surveyed Papers=

[1] Marco Gruteser, Suman Banerjee, Vladimir Brik, Wireless device identification with radiometric signatures, University of Wisconsin at Madison, Madison, WI, USA, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]

*ABSTRACT
We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.

[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, Accurate, scalable in-network identification of p2p traffic using application signatures, AT&T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]

*ABSTRACT
The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.

[3] Roger Clarke, Human Identification in Information Systems: Management Challenges and Public Policy Issues [http://www.emeraldinsight.com/journals.htm?articleid=883434&show=abstract PDF/HTML]

*ABSTRACT
Many information systems involve data about people. In order reliably to associate data with particular individuals, it is necessary that an effective and efficient identification scheme be established and maintained. There is remarkably little in the information technology literature concerning human identification. Seeks to overcome that deficiency by undertaking a survey of human identity and human identification. Discusses techniques including names, codes, knowledge-based and token-based identification, and biometrics. Identifies the key challenge to management as being to devise a scheme which is practicable and economic, and of sufficiently high integrity to address the risks the organization confronts in its dealings with people. Proposes that much greater use be made of schemes which are designed to afford people anonymity, or which enable them to use multiple identities or pseudonyms, while at the same time protecting the organization's own interest. Describes multi-purpose and inhabitant registration schemes, and notes the recurrence of proposals to implement and extend them. Identifies public policy issues. Of especial concern is the threat to personal privacy that the general-purpose use of an inhabitant registrant scheme represents. Speculates that, where such schemes are pursued energetically, the reaction may be strong enough to threaten the social fabric.

[4] Matt Bishop, Carrie Gates and Jeffrey Hunker, The Sisterhood of the Traveling Packets [http://jeffreyhunker.com/gallery/20/nspw09-1.pdf PDF]
*ABSTRACT
From a cyber-security perspective, attribution is considered to be
the ability to determine the originating location for an attack.
However, should such an attribution system be developed and
deployed, it would provide attribution for all traffic, not just attack
traffic. This has several implications for both the senders and
receivers of traffic, as well as the intervening organizations,
Internet service providers and nation-states. In this paper we
examine the requirements for an attribution system, identifying all
of the actors, their potential interests, and the resulting policies
they might therefore have. We provide a general framework that
represents the attribution problem, and outline the technical and
policy requirements for a solution. We discuss the inevitable
policy conflicts due to the social, legal and cultural issues that
would surround such a system.

=Milestones=
* Problem definition
* Literature review
* Comparison of literature
* Requirements for a proper attribution scheme
* Discussions
* Conclusion and Future Work

=Paper=
[[A link to the paper]]

=Project Progress=
Coming Soon!

=Requirements=
* incremental deployability
* privacy

=Readings=
''really hard to find anything not from psychology''

DistOS-2011W Real-Time Distributed Operating Systems

2011-03-01T23:49:09Z

Raghad: Created page with "Coming Soon"

Coming Soon

Distributed OS: Winter 2011

2011-03-01T23:48:35Z

Raghad: /* Literature review paper (graduate students) */

==Evaluation==

Grades in this class will be determined based on the following criteria.

Undergraduate Students:
* 20% Class participation
* 20% Wiki participation
* 10% Group project oral presentation (April 5th in class)
* 30% Group project written report (Due April 11th)
* 20% Implementation report (Due March 1st)

Graduate Students:
* 15% Class participation
* 20% Wiki participation
* 10% Group project oral presentation (April 5th in class)
* 30% Group project written report (Due April 11th)
* 25% Literature review paper (Due March 1st)

Proposals for Implementation reports & Literature reviews should be emailed to Prof. Somayaji by '''February 1st'''.

==Implementation reports (undergrads)==

An implementation report is a 5-10 page paper that either
# describes in detail one existing software system with distributed OS-like properties,
# compares and contrasts an important characteristic of 3 or more software systems with distributed OS-like properties, or
# reports on experiences setting up and using a software system with distributed OS-like properties.
Topics for an implementation report must be approved by Prof. Somayaji.

Implementation reports for Winter 2011:
* [[DistOS-2011W NTP |NTP]]
* [[DistOS-2011W Globus |Globus Toolkit]]
* [[DistOS-2011W Implementation Template|Implementation Template]]
* [[DistOS-2011W BigTable|BigTable]]
* [[DistOS-2011W Cassandra and Hamachi|Cassandra and Hamachi]]
* [[DistOS-2011W Wuala |Wuala]]
* [[DistOS-2011W FWR |FWR]]
* [[DistOS-2011W Plan 9| Plan 9]]
* [[DistOS-2011W Akamai and CDN| Akamai and CDN]]
* [[DistOS-2011W Diaspora| Diaspora]]
* [[DistOS-2011W Eucalyptus |Eucalyptus]]
* [[DistOS-2011W Jolicloud |Jolicloud]]

Students: please add your report above following the template.

==Literature review paper (graduate students)==

The literature review paper should be an 8-12 page paper that reviews research and well-known commercial work in an area of distributed operating systems research or a closely related area.

Literature Review papers for Winter 2011:
* [[DistOS-2011W Naming and Locating Objects in Distributed Systems|Naming and Locating Objects in Distributed Systems]]
* [[DistOS-2011W Distributed File System Access|Distributed File System Access]]
* [[DistOS-2011W User Controlled Bandwidth: How Social Protocols Affect Network Protocols and Our Need for Speed|User Controlled Bandwidth]]
* [[DistOS-2011W General Purpose Frameworks for Performance-Portable Code|General Purpose Frameworks for Performance-Portable Code]]
* [[DistOS-2011W Distributed Data Structures: a survey|Distributed Data Structures: a survey]]
* [[DistOS-2011W Distributed File System Security|Distributed File System Security]]
* [[DistOS-2011W Real-Time Distributed Operating Systems|Real-Time Distributed Operating Systems]]
Students: please add your paper above.

==Group Projects==
# [[DistOS-2011W Observability & Contracts|Observability & Contracts]]: How do I observe the acts of other agents, particularly "public" acts? How can we make contracts between computers (promises to exchange actions in the present for actions in the future)?
# [[DistOS-2011W Attribution|Attribution]]: How do we know who did what?
# [[DistOS-2011W Reputation|Reputation]]: How do we remember and disseminate knowledge of past actions?
# [[DistOS-2011W Justice|Justice]]: Given that we can gather evidence of misbehavior, how can that evidence be assembled, judged, and the resulting decision enforced?
# [[DistOS-2011W Public Goods|Public Goods]]: How can we build and maintain public goods (e.g., indices, caches)?

==Readings==

===January 13, 2011===
[http://keys.ccrcentral.net/ccr/writing/ CCR] (two papers)

===January 18, 2011===
[http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/oceanstore-sigplan.pdf OceanStore] and [http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/fast2003-pond.pdf Pond]

===February 3, 2011===

*'''[http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=1450841 Robert E. Kahn, "Resource-Sharing Computer Communications Networks" (1972)]:'''
* [http://video.google.com/videoplay?docid=4989933629762859961 Computer Networks - The Heralds of Resource Sharing] (video - optional).

===February 8, 2011===

* Karlin et al. (2008), [http://dx.doi.org.proxy.library.carleton.ca/10.1016/j.comnet.2008.06.012 Autonomous security for autonomous systems].

Optional readings:

* O'Donnell (2009), [http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=5350725 Prolog to A Survey of BGP Security Issues and Solutions]
* Butler et al. (2009), [http://ieeexplore.ieee.org.proxy.library.carleton.ca/xpls/abs_all.jsp?arnumber=5357585 A Survey of BGP Security Issues and Solutions]

===February 10, 2011===

* Savage et al. (2000), [http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-8-4.pdf Practical Network Support For IP Traceback].

===February 15, 2011===

* Satyanarayanan et al. (1990), [http://dx.doi.org.proxy.library.carleton.ca/10.1109/12.54838 Coda: a highly available file system for a distributed workstation environment].
* Ghemawat et al. (2003), [http://labs.google.com/papers/gfs.html The Google File System].

===February 17, 2011===

* Weil et al. (2006), [http://www.usenix.org/events/osdi06/tech/weil.html Ceph: A Scalable, High-Performance Distributed File System].

===March 1, 2011===
* Oda et al. (2008), [http://people.scs.carleton.ca/~soma/pubs/oda-ccs-08.pdf SOMA: Mutual Approval for Included Content in Web Pages].
* Oda & Somayaji (2008), [http://people.scs.carleton.ca/~soma/pubs/oda-asia-08.pdf Content Provider Conflict on the Modern Web].

===Problems to Solve===
*Attack computers with almost no consequences
**DDoS
**botnets
**capture and analyze private traffic
**distribute malware
**tampering with traffic
**Unauthorized access to data and resources
**Impersonate computers, individuals, applications
**Fraud, theft
**regulate behavior

===Design Principles===
*subjects of governance: programs and computers
*bind programs and computers to humans & human organizations, but recognize binding is imperfect
*recognize that "bad" behavior is always possible. "good" behavior is enforced through incentives and sanctions.
*rules will change. Even rules for rule changes will change. Need a "living document" governing how rules are chosen and enforced.

==Scenarios==

===1: Stopping DDoS===
Group members: Seyyed, Andrew Schoenrock, Thomas McMahon, Lester Mundt, AbdelRahman, Rakhim Davletkaliyev

*Have the machine routing packets (possibly the ISP) detect suspicious packets. If the packets are signed, the suspicious packets could be blocked and the sender put on a blacklist.

* (AS) Stopping DDoS against files, services, programs, etc
** (AS) Have file replication built into the system (similar to OceanStore) so that files are always available from different servers
** (AS) If files are not replicated then we could have a tiered messaging system (at the top level would be OS messages) and servers could then prioritize the incoming traffic. If a given server is experiencing an overload, it could send out a distress signal to its neighbours and then distribute what it has to them. The system should have a built-in mechanism to re-balance the overall load after something like this happens. This would then mean that any DDoS attack would result in the service being more available.
*** I like this idea of having service failover
*** Expanding on the idea of file replication and sending distress signals to its neighbours, I could envision a group of servers that would learn to help each other out, lending processing and storage when they are underutilized. They would sort of form a collective, club or gang. Members who didn't contribute (always fully utilized) would eventually be identified and banned. It would be these other computers that the targeted server would rely on for help in this situation. However cool this is, it isn't really a solution, because one could suppose the attackers might utilize the same strategy to recruit additional help in their attack.
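
A minimal sketch of the distress-signal idea discussed above, assuming each server has a known capacity and that load can be moved in arbitrary units. The function name <code>shed_load</code> and the load and capacity dictionaries are hypothetical; nothing here models the messaging itself, only the redistribution step.

```python
def shed_load(loads: dict, capacity: dict, overloaded: str) -> dict:
    """Move excess load from `overloaded` to neighbours with spare capacity.

    Returns a new load mapping; the input mapping is left untouched.
    """
    new = dict(loads)
    excess = new[overloaded] - capacity[overloaded]
    if excess <= 0:
        return new  # not actually overloaded, nothing to do

    # Ask the neighbours with the most spare capacity first.
    by_spare = sorted(new, key=lambda n: capacity[n] - new[n], reverse=True)
    for name in by_spare:
        if name == overloaded or excess <= 0:
            continue
        spare = capacity[name] - new[name]
        if spare <= 0:
            continue  # this neighbour is fully utilized and cannot help
        moved = min(spare, excess)
        new[name] += moved
        new[overloaded] -= moved
        excess -= moved
    return new
```

If the neighbours cannot absorb all of the excess, the overloaded server simply keeps what is left, which matches the concern raised above that this helps availability without being a complete solution.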

* (AS) Stopping DDoS against specific machines
** (AS) I don't think that this should be specifically addressed. I think measures introduced to guard against this will ultimately negatively impact the overall system in terms of performance.
*** I don't like the idea of sacrificing the one for the many though.
**** (AS) The main thing with what I've proposed is that the motivation behind doing a DDoS attack is completely gone (by doing one, a service would either maintain or increase its overall availability). I think eliminating the main result of a DDoS attack would mean that there would be no reason to guard against DDoS attacks on a specific machine.

*Stopping DDoS
** Many DDoS attacks exploit anonymity. These services serve anyone who requests their service. Many DDoS attacks then generate enough traffic that the computer behind the service can no longer cope. If we remove anonymity and only serve 'known' parties, the spurious requests would be ignored. So we need to 'know' who our friends are.
*** This of course requires a form of unspoofable authentication unlike IP.
**** (RD) Serving only 'known' parties reduces the distribution of information, or at least its rate. I was thinking of removing anonymity on a lower level, so that any party that's not anonymous while sending a packet to your machine is considered 'known', and anything unknown (unsigned, unrepresented in some way) is blocked. So, we don't really need to 'know' who our friends are, we just need to know who aren't.
**** (RD) Another thing I had in mind is punishment in case a 'known' party participates in DDoS-attack: not punishing the owner of that machine (who probably is a victim as well), but the software or hardware in some sense.

*Stopping DDoS
** (RD) How about developing such a network topology and protocols that make DDoS attacks less efficient or harder to perform? Some sort of CAPTCHA, but for machines and protocols, to distinguish them from bots, maybe?

*Stopping DDoS
** I'm not sure what is meant by stopping. I don't think we can stop DDoS given the way things are currently run; we can only block it. To my knowledge, most software that stops DDoS does so by blocking, or even by a complete shutdown, as with McColo.

*Stopping DDoS
**One method is to apply the same approach used to mitigate DoS, rejecting requests beyond a specific rate, but applied to requests arriving from unrelated sources.
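
The rate-based rejection mentioned in the last point could look roughly like the sliding-window limiter below. This is a sketch, not a DDoS defence: the class name <code>RateLimiter</code> and its parameters are hypothetical, and the caller supplies the clock value so the behaviour is easy to follow.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_requests` per source within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self._seen = defaultdict(deque)  # source -> timestamps of accepted requests

    def allow(self, source: str, now: float) -> bool:
        q = self._seen[source]
        # Forget requests that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False  # over the rate: reject this request
        q.append(now)
        return True
```

Keying the limiter on a single source is what works against plain DoS. For the distributed case, <code>source</code> would have to be something coarser than one address (a hypothetical grouping such as a network prefix or a verified identity), which is exactly the part the discussion leaves open.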

*How we could stop DDoS would be to have each connection to the internet assigned to a particular identity. This identity would be used to verify who is attempting connections. The reason DDoS works is because currently, IP addresses can be spoofed. The only way to verify an identity is to request a response, but by then the damage is done. With a verified identity, connection attempts being routed can be verified during transmission, so that the request may not necessarily even reach the destination host.

Basically, we need some encryption system using keys so that as the packets are being routed, the identity of the packet's sender can be verified. Ideally the decryption would be trivial so as to prevent noticeable latency. Because identities are verified, spoofed packets would be dropped during routing. If senders with verified identities still attempt a DDoS attack, the attack can be traced back to the attacker.
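
As a rough illustration of verifying the sender while a packet is in transit, the sketch below tags each packet with an HMAC over the sender identity and payload. This is an assumption made to keep the example inside the Python standard library: a real design would more likely use public-key signatures so that routers never hold sender secrets. The names <code>sign</code>, <code>verify_in_transit</code> and the <code>keys</code> table are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, sender: str, payload: bytes) -> bytes:
    """Compute an authentication tag binding the payload to the sender identity."""
    message = sender.encode("utf-8") + b"\x00" + payload
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_in_transit(keys: dict, packet: dict) -> bool:
    """Return True if the packet should be forwarded, False if dropped.

    `keys` maps a verified identity to its key; unknown identities are dropped.
    """
    key = keys.get(packet["sender"])
    if key is None:
        return False
    expected = sign(key, packet["sender"], packet["payload"])
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(expected, packet["tag"])
```

A packet that claims someone else's identity fails the check at the first router that verifies it, so it never reaches the destination host.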

(RD) (I think we're not looking low enough. We're trying to find a solution for this problem assuming the system that made that problem possible is still unchanged. We enforce more security by identification, encryption, etc., but the system is still problem-prone. This will allow us to identify an attacker, but only after the attack has started (or even finished). It's like trying to eliminate theft from a society of poor, unemployed, uneducated people by enforcing more security and punishment, which will help to reduce the rate and motivation, but can't stop the possible attack. It is a pretty stupid analogy, but rather than policing that society, I want to make them rich, employed and educated, so that theft is just not an efficient way of getting goods for them. So, rather than protecting machines from attacks, I want to make a system where DDoS attacks are just inappropriate.)

===2: Stopping phishing===
Group members: Waheed Ahmed, Nicolas Lessard, Raghad Al-Awwad, Tarjit Komal

* A way of automatically checking the signature of a message to make sure it really is from a trusted source.
** ie: "Nation of Banks, did your member TD send me a message to reset my password?"

*There should be filters to verify where the message is coming from. If the message is coming from an unknown source, it should be blocked.
*Don't use the links in an email to get to any web page, if you suspect the message might not be authentic.
*Avoid filling out forms in email messages that ask for personal financial information. Phishers can produce exact copies of the forms found on a financial institution's site.
*Make it so a machine needs to be authorized to use your information -- a machine that you don't own can't use your information to do anything, regardless of whether it has it or not.
*Ensure that any website that requires the filling of personal information be a secure website which can be traced to the original organisation.
*Ensure that whatever browser you are using is up to date with the most recent security patches applied.
*Obviously, report any suspected phishing to the appropriate authorities so that proper action can be taken.
*"three strikes and you're out"
**Each machine is responsible for the messages it releases. When a machine is a repeat offender, it loses access privileges.
*Revamp the security login process to something similar to:
**User enters username and clicks next.
**Server returns a user predefined image to the User.
**If image is the right image then user enters password to logon.
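
The revamped login steps above might be sketched as follows. Everything here is hypothetical scaffolding (the <code>LoginServer</code> class, the image identifiers, the iteration count); the point is only the ordering: the server shows the user's pre-chosen image first, and the password is submitted only if the user recognises that image.

```python
import hashlib
import hmac
import os


class LoginServer:
    """Toy server that stores a chosen image and a salted password hash per user."""

    def __init__(self):
        self._users = {}  # username -> (image_id, salt, password_hash)

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

    def enroll(self, username: str, password: str, image_id: str) -> None:
        salt = os.urandom(16)
        self._users[username] = (image_id, salt, self._hash(password, salt))

    def image_for(self, username: str):
        """Step 2: return the user's predefined image (None if unknown user)."""
        record = self._users.get(username)
        return record[0] if record else None

    def check_password(self, username: str, password: str) -> bool:
        record = self._users.get(username)
        if record is None:
            return False
        _, salt, stored = record
        return hmac.compare_digest(stored, self._hash(password, salt))


def login(server: LoginServer, username: str, expected_image: str, password: str) -> bool:
    """Client side: only send the password if the server showed the right image."""
    if server.image_for(username) != expected_image:
        return False  # wrong image: possibly a phishing site, do not enter the password
    return server.check_password(username, password)
```

The check that matters against phishing happens on the user's side: a fake site that cannot produce the right image never receives the password.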

===3: Limiting the spread of malware===
Group members: keith, Andrew Luczak, David Barrera, Trevor Gelowsky, Scott Lyons
*(KM) Heterogeneous systems - it is much easier to write code to attack a single type of system
*(KM) Individualized security policies
**(AL) A baseline security level would help prevent malware spreading to/from a system with "individual non-security"
*(KM) Identify all programs through digital signatures
*(KM) Peer rating system for programs, customize security policies based on peer ratings
**(SL) Need some way to keep rating system from being "gamed"
***(AL) Maybe a program gets flagged if it experiences a rapid approval increase?
**(AL) Need to protect against benign programs with good ratings being updated into malware
*(KM) System level forensics on program execution and resource/file modification
*(KM) Customizable user and program blacklists
*(SL) Sandboxing with breach management - know what files have been modified by a process
*(SL) Trending - what does the application spend most of its time doing?
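
One way to read (AL)'s suggestion of flagging a program whose approval rises too quickly is a simple windowed check over cumulative approval counts. The function name <code>flag_rapid_increase</code> and both thresholds are hypothetical; choosing good values is part of the open "gaming" problem raised above.

```python
def flag_rapid_increase(approvals: list, window: int, max_jump: int) -> list:
    """Return the time steps at which approvals rose by more than `max_jump`
    over the preceding `window` steps.

    `approvals` holds the cumulative approval count of one program per time step.
    """
    flagged = []
    for i in range(window, len(approvals)):
        if approvals[i] - approvals[i - window] > max_jump:
            flagged.append(i)
    return flagged
```

For example, with counts <code>[0, 2, 4, 6, 60, 62]</code>, a window of 2 steps and a maximum jump of 10, steps 4 and 5 are flagged, since each sits 56 approvals above the count two steps earlier.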

*(DB)Multiple control/chokepoints where malware is looked for. This way, it's more difficult for attackers to take over several control points and for malware to remain unnoticed.
*(DB)Heterogeneous systems help limit the spread of malware too. There are 2 points here. (1) If we're designing this system where we're all masters of our own domains, then we're likely to have different system configurations. However (2), if we want to communicate and interact with other domains, we need some standardized communication layer or mechanism. Standardization is very closely tied to homogeneity.
*(DB)There should be consequences if you harbor malware or if malware originates from within your domain. This could be an incentive to help people be more proactive in terms of security.

===4: Bandwidth hogs===
Group members: Mike Preston, Fahim Rahman, Michael Du Plessis, Matthew Chou, Ahmad Yafawi

*limit bandwidth for each user
*if user has significant bandwidth demands for a certain period of time
**add them to a watch list
**monitor their behaviour
**divert communication to other hosts that can satisfy requests.
***if there are no other hosts that can satisfy the request, then distribute data to other idle and capable hosts. Load is now reduced on the one link.
*QoS
*Tiered Bandwidth Distribution
**The main idea is that the more you give back to the community, the more bandwidth your machine gets.
***It's similar to some trackers and darknet programs, which won't increase your download speed unless you contribute X bytes back to your peers.
**Tier 1, Basic privileges i.e. all machines have minimal bandwidth.
**Tier n, we define some requirements to be met then we increase bandwidth accordingly.
***Drop a Tier if machine doesn't maintain the specified requirements of that specific tier.
***Advantage, monitoring bandwidth on the network is cheap while implementing what is stated above is not.
*As a metaphor to our "real world society", bandwidth control can be treated as we do speed for cars.
**Certain areas need more free flowing traffic, so speed limits are increased. Others require a slower pace which is enforced. These "areas" can be translated to users or programs in our distributed OS model
**There are repercussions to breaking any of these imposed limits
**Throttling provides one possible implementation of these constraints
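
The tiered distribution described above can be sketched as a lookup from contribution to bandwidth. The tier thresholds and bandwidth figures in <code>TIERS</code> are invented for illustration, as are the function names; the list only says that Tier 1 is the baseline and that higher tiers have requirements which must keep being met.

```python
# (minimum bytes contributed back to peers, bandwidth in kbit/s) -- hypothetical values
TIERS = [
    (0, 512),            # Tier 1: basic privileges for every machine
    (1_000_000, 2048),   # Tier 2
    (10_000_000, 8192),  # Tier 3
]


def tier_for(contributed_bytes: int) -> int:
    """Return the highest tier (1-based) whose requirement is currently met."""
    tier = 1
    for number, (minimum, _) in enumerate(TIERS, start=1):
        if contributed_bytes >= minimum:
            tier = number
    return tier


def bandwidth_for(contributed_bytes: int) -> int:
    """Return the bandwidth allowance for the machine's current tier."""
    return TIERS[tier_for(contributed_bytes) - 1][1]
```

Because the tier is recomputed from the current contribution, a machine that stops meeting a tier's requirement drops back automatically the next time it is evaluated.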

====Bandwidth Hog Additional Sources and Information====
1. [http://repository.lib.ncsu.edu/ir/bitstream/1840.16/1197/1/etd.pdf A Solution to Bandwidth Hogs in a Cable Network]
*Starting at page 120 of this thesis is a proposed solution to bandwidth hogs on a cable network. In general, the proposal suggests a solution essentially equal to throttling; however, I did find the description of the solution to be helpful. I feel it may go well with our tiered suggestion if we were to keep the "earned trust" approach to bandwidth access but at the same time allow users in low congestion times to go above their tier. For example, if congestion is low, why not allow the people on the network to occupy much larger bandwidths? The network would include some form of monitoring protocol which can decide how much access a user is allowed. If more bandwidth is available, let them have it if it is needed for their request. On the other hand, if congestion is high, the user will be capped at the upper limit of their bandwidth capacity if they are doing something that requires a large amount of bandwidth. In this manner each user will be guaranteed the amount they have earned at their tier; however, if they do not want to earn a higher level for high usage timeframes, they can instead opt to make use of low congestion timeframes and run their bandwidth heavy applications at that time. The network could also include live data regarding the current bandwidth usage levels as well as trending data so that people can plan when to start bandwidth heavy applications.

2. [http://yuba.stanford.edu/rcp/flowCompTime-dukkipati.pdf Why Flow-Completion Time is the Right Metric for Congestion Control]
*This is a short article which raises an interesting question related to our topic: how should we determine what is considered "bandwidth hogging"? For example, do we look at the strain on the network in some capacity (i.e. dropped packets, usage level of the capacity of the pipe, etc.), which is important information for those who build the network; or do we make use of the time it takes for some transaction to occur when a user requests it? This article argues that from a user's point of view, they do not care how much bandwidth they get as long as the task they are requesting is completed as quickly as possible. In our discussion in class we had talked about how the majority of people currently do not need large amounts of bandwidth for normal transactions ( email, web searching, wikis ;-) ), and a much smaller percentage of the population are the ones who actually eat up the larger bandwidth through hog-like applications. Maybe instead of focusing on the bandwidth as the main issue, we should think about how long it takes to complete tasks. Maybe our tiered system would also incorporate some aspect of this train of thought, i.e. people who only send email and surf the web are at tier 1, people who use online storage and FTP are at tier 2, people who stream movies and other data are at tier 3, etc. Then, we could have each tier cost a separate amount and apply some form of control on the technologies available at each tier so that the restrictions of a tier are adhered to.

3. [http://research.microsoft.com/en-us/people/asellen/pap0209-chetty.pdf Who’s Hogging The Bandwidth?: The Consequences Of Revealing The Invisible In The Home]
*This article is from Microsoft Research and it is an interesting look into controlling bandwidth usage by providing people with a tool to monitor the usage and alter how bandwidth is allocated. This tool essentially boils down to the social control idea that we discussed in class. If you know that your neighbours are hogging the bandwidth for very low priority issues then should you not be able to appeal to their conscience in order to gain usage of resources you need? The article provides some examples of homes they provided this control to and how the household politics factored into the usage of the bandwidth. When usage was no longer hidden it seems as though it became easier to openly discuss how to divide the finite amount of bandwidth. Initial concerns revolved around people just hogging the bandwidth for themselves or playing practical jokes on others in the house by reducing their usage when they were in the middle of some task. Another issue that this type of control brings up is how to prioritize what tasks are "more important". One example given was whether a Skype call to family and friends is more important than watching YouTube videos for a work related task. Interestingly, the field studies provided some other examples of a "bandwidth etiquette" that emerged. For example, it was considered very rude to limit someone's bandwidth when he/she was on a Skype call due to the immediate and negative effect, but it was deemed acceptable to limit bandwidth during a file transfer as it just meant a few extra minutes for the transfer to complete.

Distributed OS: Winter 2011

2011-01-20T17:55:50Z

Raghad: /* 2: Stopping phishing */

==Readings==

January 13, 2011: [http://keys.ccrcentral.net/ccr/writing/ CCR] (two papers)

January 18, 2011: [http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/oceanstore-sigplan.pdf OceanStore] and [http://homeostasis.scs.carleton.ca/~soma/distos/2008-02-25/fast2003-pond.pdf Pond]

==Internet Governance==

===Problems to Solve===
*Attack computers with almost no consequences
**DDoS
**botnets
**capture and analyze private traffic
**distribute malware
**tampering with traffic
**Unauthorized access to data and resources
**Impersonate computers, individuals, applications
**Fraud, theft
**regulate behavior

===Design Principles===
*subjects of governance: programs and computers
*bind programs and computers to humans & human organizations, but recognize binding is imperfect
*recognize that "bad" behavior is always possible. "good" behavior is enforced through incentives and sanctions.
*rules will change. Even rules for rule changes will change. Need a "living document" governing how rules are chosen and enforced.

==Scenarios==

===1: Stopping DDoS===
Group members: Seyyed, Andrew Schoenrock, Thomas McMahon, Lester Mundt, AbdelRahman, Rakhim Davletkaliyev

*Have the machine routing packets (possibly the ISP) detect suspicious packets. If the packets are signed, the suspicious packets could be blocked and the sender put on a blacklist.

* (AS) Stopping DDoS against files, services, programs, etc
** (AS) Have file replication built into the system (similar to OceanStore) so that files are always available from different servers
** (AS) If files are not replicated then we could have a tiered messaging system (at the top level would be OS messages) and servers could then prioritize the incoming traffic. If a given server is experiencing an overload, it could send out a distress signal to its neighbours and then distribute what it has to them. The system should have a built-in mechanism to re-balance the overall load after something like this happens. This would then mean that any DDoS attack would result in the service being more available.
*** I like this idea of having service failover
*** Expanding on the idea of file replication and sending distress signals to its neighbours, I could envision a group of servers that would learn to help each other out, lending processing and storage when they are underutilized. They would sort of form a collective, club or gang. Members who didn't contribute (always fully utilized) would eventually be identified and banned. It would be these other computers that the targeted server would rely on for help in this situation. However cool this is, it isn't really a solution, because one could suppose the attackers might utilize the same strategy to recruit additional help in their attack.

* (AS) Stopping DDoS against specific machines
** (AS) I don't think that this should be specifically addressed. I think measures introduced to guard against this will ultimately negatively impact the overall system in terms of performance.
*** I don't like the idea of sacrificing the one for the many though.
**** (AS) The main thing with what I've proposed is that the motivation behind doing a DDoS attack is completely gone (by doing one, a service would either maintain or increase its overall availability). I think eliminating the main result of a DDoS attack would mean that there would be no reason to guard against DDoS attacks on a specific machine.

*Stopping DDoS
** Many DDoS attacks exploit anonymity. Services serve anyone who sends a request, and the attack generates enough traffic that the computer behind the service can no longer cope. If we remove anonymity and only serve 'known' parties, the spurious requests would be ignored. So we need to 'know' who our friends are.
*** This of course requires a form of unspoofable authentication, unlike IP.
**** Serving only 'known' parties reduces the distribution of information, or at least its rate. I was thinking of removing anonymity on a lower level, so that any party that's not anonymous while sending a packet to your machine is considered 'known', and anything unknown (unsigned, unrepresented in some way) is blocked. So, we don't really need to 'know' who our friends are, we just need to know who isn't.
**** Another thing I had in mind is punishment in case a 'known' party participates in a DDoS attack: not punishing the owner of that machine (who probably is a victim as well), but the software or hardware in some sense.

*Stopping DDoS
**How about developing a network topology and protocols that make DDoS attacks less efficient or harder to perform? Maybe some sort of CAPTCHA, but for machines and protocols, to distinguish them from bots?
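
One possible reading of a "CAPTCHA for machines" is a client puzzle (proof of work): the server hands out a challenge that is costly to solve but cheap to check, so flooding it with requests becomes expensive. The sketch below is only an illustration of that reading, not something the group specified; solve_puzzle, verify_puzzle and the difficulty_bits parameter are assumed names.

```python
import hashlib
import itertools


def solve_puzzle(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: find a nonce whose SHA-256 hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce


def verify_puzzle(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    target = 1 << (256 - difficulty_bits)
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < target
```

Each extra difficulty bit roughly doubles the expected work for the client while the server's check stays at one hash. Note that this does not distinguish bots from legitimate machines; it only raises the cost per request for everyone.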

*Stopping DDoS
** I'm not sure what is meant by stopping. I don't think we can stop DDoS given the way things are currently run; we can only block it. As far as I know, most software that stops DDoS does so by blocking, or even by a complete shutdown, as with McColo.

*Stopping DDoS
**One method is to use the same approach used against DoS, rejecting requests that exceed a specific rate, but applied to requests coming from unrelated sources.
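
A minimal sketch of this rate-based rejection, assuming a sliding time window: the per-source limit is the classic DoS defence, and the total limit applies the same rule across unrelated sources. The RateLimiter class and its parameters are hypothetical.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Rejects requests that exceed a per-source or an overall rate within a time window."""

    def __init__(self, per_source_limit: int, total_limit: int, window: float):
        self.per_source_limit = per_source_limit
        self.total_limit = total_limit
        self.window = window                    # window length in seconds
        self.per_source = defaultdict(deque)    # accepted timestamps, one queue per source
        self.total = deque()                    # accepted timestamps across all sources

    def _expire(self, times: deque, now: float) -> None:
        """Forget accepted requests that fell out of the window."""
        while times and now - times[0] >= self.window:
            times.popleft()

    def allow(self, source: str, now: float) -> bool:
        """Return True and record the request if both limits still have room."""
        recent = self.per_source[source]
        self._expire(recent, now)
        self._expire(self.total, now)
        if len(recent) >= self.per_source_limit or len(self.total) >= self.total_limit:
            return False
        recent.append(now)
        self.total.append(now)
        return True
```

The overall limit also rejects legitimate requests once it is reached, which matches the earlier remark that this kind of defence blocks an attack rather than stopping it.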

*One way to stop DDoS would be to have each connection to the internet assigned to a particular identity. This identity would be used to verify who is attempting connections. DDoS works because, currently, IP addresses can be spoofed. The only way to verify an identity is to request a response, but by then the damage is done. With a verified identity, connection attempts can be verified while they are being routed, so that the request may not even reach the destination host.

Basically, we need some encryption system using keys so that as the packets are being routed, the identity of the packet's sender can be verified. Ideally the decryption would be trivial so as to prevent noticeable latency. Because identities are verified, spoofed packets would be dropped during routing. If all the identities are verified and a DDoS attack is still attempted, the attack can be traced back to the attacker.
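
A rough sketch of per-packet verification during routing. As a stand-in for the unspecified key system it uses an HMAC (a keyed message authentication code from the Python standard library) rather than encryption; a real design would more likely use public-key signatures so that routers do not hold sender secrets. The KEYS registry, sign_packet and router_accepts are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical registry mapping each verified identity to its secret key.
KEYS = {"alice": b"alice-secret-key"}


def sign_packet(identity: str, payload: bytes, key: bytes) -> dict:
    """Sender side: attach an authentication tag computed over identity and payload."""
    tag = hmac.new(key, identity.encode() + payload, hashlib.sha256).digest()
    return {"identity": identity, "payload": payload, "tag": tag}


def router_accepts(packet: dict) -> bool:
    """Router side: drop packets from unknown identities or with a bad tag."""
    key = KEYS.get(packet["identity"])
    if key is None:
        return False  # unknown identity: drop during routing
    message = packet["identity"].encode() + packet["payload"]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, packet["tag"])
```

A packet that claims to come from "alice" but was signed with any other key fails the check, so it can be dropped before reaching the destination host.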

(I think we're not looking low enough. We're trying to find a solution for this problem assuming the system that made that problem possible is still unchanged. We enforce more security by identification, encryption, etc., but the system is still problem-prone. This will allow us to identify an attacker, but only after the attack has started (or even finished). It's like trying to eliminate theft from a society of poor, unemployed, uneducated people by enforcing more security and punishment, which will help to reduce the rate and motivation, but can't stop a possible attack. It is a pretty stupid analogy, but rather than policing that society, I want to make them rich, employed and educated, so that theft is just not an efficient way of getting goods for them. So, rather than protecting machines from attacks, I want to make a system where DDoS attacks are just inappropriate.)

===2: Stopping phishing===
Group members: Waheed Ahmed, Nicolas Lessard, Raghad Al-Awwad, Tarjit Komal

* A way of automatically checking the signature of a message to make sure it really is from a trusted source.
** e.g.: "Nation of Banks, did your member TD send me a message to reset my password?"

*There should be filters to check where the message is coming from. If the message is coming from an unknown source, it should be blocked.
*Don't use the links in an email to get to any web page, if you suspect the message might not be authentic.
*Avoid filling out forms in email messages that ask for personal financial information. Phishers can make exact copies of the forms found on a financial institution's site.
*Make it so a machine needs to be authorized to use your information -- a machine that you don't own can't use your information to do anything, regardless of whether it has it or not.
*Ensure that any website that requires the filling of personal information be a secure website which can be traced to the original organisation.
*Ensure that whatever browser you are using is up to date with the most recent security patches applied.
*Obviously, report any suspected phishing to the appropriate authorities so that proper action can be taken
*"three strikes and you're out"
**Each machine is responsible for the messages it releases. When a machine is a repeat offender it loses access privileges
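
The "three strikes" rule could be sketched as below, assuming some separate mechanism reliably attributes each phishing message to the machine that released it. StrikeRegistry and its method names are hypothetical.

```python
from collections import Counter


class StrikeRegistry:
    """Tracks confirmed offences per machine and revokes access after too many."""

    def __init__(self, max_strikes: int = 3):
        self.max_strikes = max_strikes
        self.strikes = Counter()  # machine id -> number of confirmed offences

    def report(self, machine_id: str) -> None:
        """Record one confirmed phishing message released by this machine."""
        self.strikes[machine_id] += 1

    def has_access(self, machine_id: str) -> bool:
        """A machine keeps its privileges until it reaches the strike limit."""
        return self.strikes[machine_id] < self.max_strikes
```

The sketch leaves open how strikes expire or how a cleaned-up machine regains access.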

===3: Limiting the spread of malware===
Group members: keith, Andrew Luczak, David Barrera, Trevor Gelowsky, Scott Lyons
*(KM) Heterogeneous systems - it is much easier to write code to attack a single type of system
*(KM) Individualized security policies
**(AL) A baseline security level would help prevent malware spreading to/from a system with "individual non-security"
*(KM) Identify all programs through digital signatures
*(KM) Peer rating system for programs, customize security policies based on peer ratings
**(SL) Need some way to keep rating system from being "gamed"
***(AL) Maybe a program gets flagged if it experiences a rapid approval increase?
**(AL) Need to protect against benign programs with good ratings being updated into malware
*(KM) System level forensics on program execution and resource/file modification
*(KM) Customizable user and program blacklists
*(SL) Sandboxing with breach management - know what files have been modified by a process
*(SL) Trending - what does the application spend most of its time doing?
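
The suggestion to flag a program whose approval rises rapidly could look roughly like this. The is_suspicious function, the factor threshold and the min_history parameter are illustrative assumptions, not values anyone in the group proposed.

```python
from statistics import mean


def is_suspicious(daily_approvals: list, factor: float = 5.0, min_history: int = 3) -> bool:
    """Flag a program if its latest daily approval count far exceeds its past average."""
    if len(daily_approvals) <= min_history:
        return False  # not enough history to judge
    *history, latest = daily_approvals
    baseline = max(mean(history), 1.0)  # avoid flagging tiny absolute changes
    return latest > factor * baseline
```

A legitimately popular new program would trip the same check, so a flag like this is better treated as a trigger for closer inspection than as a verdict.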

===4: Bandwidth hogs===
Group members: Mike Preston, Fahim Rahman, Michael Du Plessis, Matthew Chou, Ahmad Yafawi

*limit bandwidth for each user
*if user has significant bandwidth demands for a certain period of time
**add them to a watch list
**monitor their behaviour
**divert communication to other hosts that can satisfy requests.
***if there are no other hosts that can satisfy the request, then distribute data to other idle and capable hosts. Load is now reduced on the one link.
*QoS
*bandwidth management/scheduling (similar to OS scheduling)
**utilizing a round robin schedule to allow for periodic increases in bandwidth per user
**priority system that allows for more critical operations being done by a user to take precedence over others
*split the bandwidth evenly across all users and allow users to donate their share for others to use, with the option to revoke the donation at any time
* Tiered Bandwidth Distribution
** The main idea is that the more you give back to the community, the more bandwidth your machine gets. It's similar to some trackers and darknet programs, which won't increase your download speed unless you contribute X bytes back to your peers.
** Tier 1, Basic privileges i.e. all machines have minimal bandwidth.
** Tier n, we define some requirements to be met then we increase bandwidth accordingly.
*** Drop a Tier if machine doesn't maintain the specified requirements of that specific tier.
*** Advantage: monitoring bandwidth on the network is cheap. Disadvantage: implementing the tiers described above is not.
*As a metaphor to our "real world society," bandwidth control can be treated as we do speed for cars.
**Certain areas need more free flowing traffic, so speed limits are increased. Others require a slower pace which is enforced. These "areas" can be translated to users or programs in our distributed OS model
**There are repercussions to breaking any of these imposed limits
**Throttling provides one possible implementation of these constraints
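
The tiered bandwidth idea can be illustrated with a small lookup. The tier thresholds and bandwidth figures in TIERS are made-up numbers for the sake of the example, and tier_for and bandwidth_for are hypothetical names.

```python
# Hypothetical tiers: (minimum bytes contributed in the last period, bandwidth in kbit/s).
TIERS = [
    (0, 256),             # Tier 1: basic privileges for every machine
    (1_000_000, 1024),    # Tier 2
    (10_000_000, 4096),   # Tier 3
]


def tier_for(contributed_bytes: int) -> int:
    """Return the highest 1-based tier whose contribution requirement is met."""
    tier = 1
    for index, (required, _bandwidth) in enumerate(TIERS, start=1):
        if contributed_bytes >= required:
            tier = index
    return tier


def bandwidth_for(contributed_bytes: int) -> int:
    """Bandwidth (kbit/s) granted for the given contribution in the last period."""
    return TIERS[tier_for(contributed_bytes) - 1][1]
```

Because the tier is recomputed from the most recent period's contribution, a machine that stops meeting a tier's requirement drops back automatically at the next evaluation.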

Distributed OS: Winter 2011

2011-01-19T21:21:35Z

Raghad: /* 2: Stopping phishing */