Internet Attribution: Between Privacy and Cruciality

Abstract
Present and past situations show a need for improved attribution systems, and arguably, a scientific basis for properly functioning attribution systems has not yet been defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these efforts revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the Internet. Authentication, as a means of attribution, has proven effective, but it is clearly not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the Internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the Internet.

Introduction

Currently, the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a range of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, and more. Consequently, Internet attribution is a highly sensitive field that occupies a cornerstone position within Internet security. Current solutions neither guarantee sufficient attribution nor apply in most situations; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the human source behind the scenes.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the Internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access, weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the Internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for tracking authorship, i.e., attributing text to authors. In this paper, we don't question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which has received comparatively little attention in current research.

This paper starts with a quick background discussion of current forms of attribution. Section 3 then presents the dilemma of attribution, resolving the tension between attribution and privacy. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the Internet. In Section 5, we propose an abstract framework for achieving attribution that mimics attribution in the real world. Finally, a conclusion is presented in Section 6.

What is Attribution?

The act of attributing, especially the act of establishing a particular person as the creator of a work of art.<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the Internet. For the sake of simplicity, in this paper we will refer to "binding an act to a person on the Internet" as "attribution", while other types of attribution will be defined separately.

Problem Statement

The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.

Problem Motivation

In today's world there is a growing need for strong attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free; due to the anonymity of the Internet, the attackers cannot be identified.

Scope

In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.

Although this is not a technical paper, basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology discussed within this paper.

Background

The problem of attribution is not new; it has been around for decades, though mostly in addressing identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve. This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

Cookies

Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer's experience. Cookies are text files that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart contents, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user's web browser sends the request with the web server's cookie in the header of the packet. All of this is an automated process between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats it as the browser's first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user's control, as they can be decrypted, modified, and even deleted completely. A user can also change their browser settings to refuse cookies altogether.

A cookie may or may not carry an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let the user set how long cookies are stored.
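
As a minimal sketch of this exchange (the handler, port, and "visitor_id" cookie name are our own illustrative choices, not part of the original text), such a server can be written with Python's standard library alone:

  # Minimal sketch of a server issuing and reading a cookie.
  from http.server import BaseHTTPRequestHandler, HTTPServer
  from http.cookies import SimpleCookie
  import uuid

  class CookieHandler(BaseHTTPRequestHandler):
      def do_GET(self):
          cookies = SimpleCookie(self.headers.get("Cookie", ""))
          self.send_response(200)
          if "visitor_id" in cookies:
              # Returning visitor: the browser resent the cookie automatically.
              body = "Welcome back, visitor " + cookies["visitor_id"].value
          else:
              # First access: issue a cookie in the response; the browser
              # stores it and attaches it to every later request to us.
              body = "First visit: setting a cookie"
              self.send_header("Set-Cookie", "visitor_id=" + uuid.uuid4().hex)
          self.send_header("Content-Type", "text/plain")
          self.end_headers()
          self.wfile.write(body.encode())

  if __name__ == "__main__":
      HTTPServer(("localhost", 8000), CookieHandler).serve_forever()

Visiting http://localhost:8000 twice from the same browser demonstrates the automated round trip described above.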

Cookies as an Attribution System

If cookies were used as the kind of attribution system we are seeking for the Internet, we could identify with high precision the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies do not make an effective attribution system.

IP Addresses

IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (computers, printers, scanners, etc.) on a network. The user's Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to their regions' ISPs, which in turn allocate them to their users.

Any device that goes online and communicates using IP needs an IP address. Over the years, the number of users going online, and the number of devices each user takes online, has kept growing; one of the more common examples is the rise of Internet-ready mobile phones. The addressing scheme used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address just 2^32 addresses (4,294,967,296), fewer than the number of people on this planet today. The very last batch of IPv4 addresses was assigned to the five RIRs in early February 2011<ref>http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars</ref>. This had been foreseen since the 1990s, which spurred the development of a new Internet Protocol version, IPv6, which uses a 128-bit addressing scheme.
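
The address-space arithmetic above is easy to check with Python's standard ipaddress module (the sample addresses below are reserved documentation addresses, chosen here purely for illustration):

  import ipaddress

  print(f"IPv4 addresses: {2 ** 32:,}")    # 4,294,967,296
  print(f"IPv6 addresses: {2 ** 128:,}")   # about 3.4 * 10^38

  # Every device speaking IP is identified by such an address:
  a4 = ipaddress.ip_address("192.0.2.1")    # 32-bit IPv4 address
  a6 = ipaddress.ip_address("2001:db8::1")  # 128-bit IPv6 address
  print(a4.version, a6.version)             # 4 6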

IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it helps mitigate the limited address space by allowing many devices to "share" a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.

IP Addresses as an Attribution System

Although IP addresses can be used to attribute packets to their senders, they fail as an effective attribution system for several reasons, chiefly that attackers can spoof their IP addresses. Spoofed IP addresses even foil IP traceback efforts.

Authentication Systems

In order for a website to verify the identity of whoever visits certain pages, it provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage of this is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking in to steal login information.

Login systems are attached to user accounts that sometimes require private information to set up. If the web server's security is not good enough, security breaches may in turn lead to identity theft.

The process behind authentication systems is simple. Using a typical web banking authentication system as an example, it may go as follows: a user requests a web account, or one is automatically assigned, and the user sets up a password for accessing the account. When the user later visits the website, he is asked to "identify himself"; he enters his personal login information, and the web server checks it against what it has stored in its database and either grants or denies access to the user's personal page.
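
A minimal sketch of that server-side verification step follows. The in-memory table and function names are our own assumptions for illustration; a production system would use a vetted password-hashing library, but the shape of the check is the same:

  # Sketch: store only a salted hash so a database leak does not
  # directly expose passwords; compare in constant time.
  import hashlib, hmac, os

  users = {}  # illustrative "database": login -> (salt, hash)

  def register(login: str, password: str) -> None:
      salt = os.urandom(16)
      digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
      users[login] = (salt, digest)

  def authenticate(login: str, password: str) -> bool:
      if login not in users:
          return False
      salt, stored = users[login]
      attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
      return hmac.compare_digest(attempt, stored)

  register("alice", "correct horse battery staple")
  print(authenticate("alice", "correct horse battery staple"))  # True
  print(authenticate("alice", "guess"))                         # False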

Authentication systems are only used when users want some privacy on the web server, or when the user wishes to store some form of information on the web server.

Authentication Systems as an Attribution System

Authentication systems are very precise in identifying people over the Internet and as such are used by many companies. However, they would have a serious privacy drawback if used as a global identification system: virtually every web server would need to hold enough information about you to be able to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would destroy it completely.

The attribution dilemma

There are many facets to designing an attribution system besides the technological aspects. In addition to the technologies and/or infrastructure available, one must also consider the issue of privacy, because achieving strong attribution compromises personal privacy. Any system must try to strike a balance between strong attribution and privacy, and that balance is influenced by the application of the system. For instance, in the case of financial institutions, the clients as well as the institution place more emphasis on attribution. Such institutions would like to establish unassailable authorization and authentication systems, so as to guarantee (to some degree) that agents involved in transactions are who they claim to be. On the opposite side of the spectrum are situations where privacy takes precedence. Political dissidents and whistle-blowers are relatively protected because there is no strong attribution system in place, which allows them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one must not only decide on this ratio for a particular case but also allow the ratio to change dynamically depending on the case.

Assuming such a ratio is found, another issue arises: can the use of private information to track or punish a person be completely justified, especially if it oversteps their privacy? One might think this question is slightly out of the scope of our paper. However, such ethical arguments must be addressed before designing, because a system that compromises individual privacy and protection cannot be adopted.

There are other questions an attribution system must answer. Who should have the authority to attribute? What information can they attribute, and why do they need it? How is attribution achieved or measured? How accurate are IP traceback, stepping-stone detection, link identification, and packet filtering at tying packets to agents? How much can the cooperation of intermediate systems contribute to achieving attribution? How do we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

Why do we need Attribution?

An attribution system has many useful applications. Its identification property can be useful for establishing a client's identity in online banking and for identifying the parties involved in an e-commerce transaction, and it can be exploited by marketers for better-targeted Web advertisements.

Financial matters are not the only incentive for a strong attribution system. Establishing a strong identification mechanism can provide better protection against cyber attacks. When the source of an attack can be recognized, the proper authorities can prosecute the perpetrators of such crimes as DoS, DDoS, computer fraud, forgery and identity theft, sniffing private traffic, distributing illegal traffic and malware, spam, and illegal or undesirable intrusions.

Why is it difficult to achieve attribution?

The problem arises largely from how the Internet is designed: it lacks strong identification mechanisms, which makes it relatively anonymous for users. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can completely prevent destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

The issue of the lack of attribution on the web mostly arises whenever security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. Also, depending on the types of senders and receivers, different attribution policies will be required.

In an ideal world, every action on the Internet could be bound to a machine and thus to a person: examine the source IP printed on each moving packet, locate the geographic position of that IP, consult the ISP covering the location, and identify the person. Where an act requires strict attribution (like checking and sending email), authentication is used. Many existing methods attempt to identify the source of an act, such as IP traceback, but identifying the source by its IP address is problematic. For instance, the address can be spoofed, which leads to misleading or inconclusive geolocation. IP addresses are not permanently bound to a single account, which makes linking an IP address to the right person unreliable. IP traceback could be improved, but that would require global cooperation among intermediate systems, which currently does not exist.

In networks, users are not aware of all packets received by their machines, which means users would not be aware of malware distribution, the creation of botnets, and other actions taken by their machine without their approval, triggered by another network user. Firewalls and packet filters can be used to address such problems, but they are not very efficient. Also, it is not practical to authenticate every single action on the Internet.

There are attacks designed specifically to prevent correct attribution; they are used for identity theft and the distribution of malware. A stepping-stone attack is a common way of concealing the attacking source by routing through multiple random public agents (as stepping stones) before reaching the victim. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>

Requirements for an Internet attribution system

It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; while the definition of a good attribution system is not precise, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.

We have separated these requirements into three groups: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as large as the Internet and human society; practice requirements define the way the system works, behaves, and interacts with other bodies.

General

The first and most obvious requirement for any system is usually stated for the sake of consistency, and the main requirement for an Internet attribution system is simple: it needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (person and/or organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for the act(s) regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (person or group) paying him: a good attribution system should not lead to the assassin alone, but rather should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.

It is easy to imagine a world in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction, and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.

In general, an attribution system should be universal and global, and details of these terms will be discussed later.

Deployment

It is relatively easy simply to design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the Internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works within.

As discussed before (and this could be said about any global system on the Internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole Internet at once; incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still on a small scale), so that by the end of the cycle, when the whole Internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.

A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make is where to draw the line between dynamic adoptability and universality. Luckily, this level of detail goes beyond the scope of our paper.

Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).

Practice

Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not know the answer, it should be impossible to know it; the question "who did act X?" is the one to be answered. This could be considered part of the requirement about not violating current laws and moral principles, but it is stated separately because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way, for surveillance, spying, etc.

Since this global system operates on the Internet, it might not be a great idea to put persons' names into the traceability database. It makes much more sense to store a unique ID for any body that uses the network; in case a crime is committed, or, in general, whenever the agent of some act must be determined, the recorded ID can be looked up in a police or government database. Some trusted entity (government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.
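
A minimal sketch of this split follows; the class and record layout are our own illustrative assumptions, not a specification from the text:

  # Sketch: traceability records hold only opaque IDs; the ID -> name
  # mapping lives solely with a trusted entity, revealed on demand.
  import uuid

  class TrustedEntity:
      def __init__(self):
          self._registry = {}  # opaque ID -> real name (held only here)

      def issue_id(self, real_name: str) -> str:
          opaque = uuid.uuid4().hex
          self._registry[opaque] = real_name
          return opaque

      def reveal(self, opaque_id: str, has_warrant: bool) -> str:
          # Revealed only with sufficient evidence or motivation.
          if not has_warrant:
              raise PermissionError("no legal basis to reveal identity")
          return self._registry[opaque_id]

  # The public traceability log maps acts to opaque IDs, never to names,
  # and deliberately keeps no index from an ID back to its acts.
  trace_log = []

  entity = TrustedEntity()
  alice = entity.issue_id("Alice Example")
  trace_log.append(("act-42", alice))

  act, who = trace_log[0]
  print(entity.reveal(who, has_warrant=True))  # "Alice Example"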

Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing misuse of the system from either side.

Proposed Framework

In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework works under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terminology used within the scope of this section:

  • Agent (Ag): the human-device pairing that sits on an end system and transmits/receives packets.
  • Machines/Devices (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a Network Interface Card, or even a mere homemade chip that can communicate externally, wired or wirelessly, to send or receive digital packets.
  • Identification Stamp (IS): a series of bits that binds a unique human identifier (such as an intricate iris structure or a fingerprint) to a unique feature of an Md; for an Md like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device. In other words, it is a unique identifier for an Ag (a sketch of generating such a stamp follows this list).
  • Intermediate System Services (ISS): services provided by intermediate systems (routers), e.g., routing (the main service), error checking, etc.
  • Globally Distributed Database (GDDB): a global, DNS-like, world-wide distributed storage system with an encrypted lookup table (LUT) that has relatively fast retrieval and update capabilities. It will be used to store ISs.
  • Licensing: the process of giving intermediate systems permission to provide ISS to all packets launched from the agent requesting the license. This process simply adds new ISs to the GDDB.
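
As a rough illustration of how an IS might be formed, one could hash the pairing of a biometric digest and the device's MAC address. This particular construction is our own assumption for the sketch; the framework itself only requires that the binding be unique:

  # Hypothetical sketch of generating an Identification Stamp (IS) by
  # binding a human identifier to a device identifier.
  import hashlib

  def make_is(biometric_digest: bytes, mac_address: str) -> str:
      return hashlib.sha256(biometric_digest + mac_address.encode()).hexdigest()

  # In practice the digest would come from an iris or fingerprint
  # scanner; here we fabricate one purely for illustration.
  fake_fingerprint = hashlib.sha256(b"alice-fingerprint-sample").digest()
  print(make_is(fake_fingerprint, "00:1a:2b:3c:4d:5e"))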

In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. He is directly responsible when he runs an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, or VoIP. He is indirectly responsible when he runs a system in the background that performs external (over-the-Internet) system calls or is automated for periodic communication or automatic response to incoming requests, e.g., system clock synchronization (NTP) or the server side of protocols such as HTTP and FTP. Indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones: e.g., when a user sends an HTTP request, TCP sends connection-initiation packets for handshaking, ICMP packets probe the status of a specific host, etc.

The scope of this framework covers attribution over the Internet only, and not other "locally" defined networks (PANs, LANs, MANs, or WANs under the IEEE standard definitions of these topologies) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.

The following sections present the assumptions under which this framework operates and the methodology of its operation, list the pros, cons, and vulnerabilities of the system, and wrap up with a discussion of the proposed framework.

Assumptions

For starters: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity acts as the Internet's law enforcement and is deemed the primary inspector as well as the jurisdiction for regulating all kinds of cyber crimes and misbehavior. The entity may be either centralized or distributed. A centralized entity would be easier to deploy but would suffer from a single point of failure. A distributed entity would obviously perform better, as it could scale with the growth of system users and conform to diverse regional laws, regulations, customs, and traditions.

Secondly: the GDDB. We assume that a GDDB is deployed, acting as a "database" for storing ISs. Symmetric-key encryption should protect the system, since it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it for read/write operations. Both must be strictly authenticated to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be relied on heavily for every single hop a packet makes through the Internet's intermediate systems. A standardized protocol would be required to define the syntax and semantics, as well as the manner in which the GDDB subsystems communicate.

Thirdly: ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be the one accused if his Md were found to misbehave or to launch malicious packets. The owning relationship between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.

Finally: IP packets. Our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header field that carries the IS of the Ag owning the packet.

Methodology

Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake IS. A fake IS is defined as:

  • Either having a false unique chip identifier that refers to an imaginary Md.
  • Or having a false unique human identifier that refers to an imaginary human.
  • Or having a misleading binding of a human to a machine. i.e., claiming that some machine "X" belongs to some human "Y", but in reality, "Y" is not the owner of "X".

Noticeably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they bear a great responsibility in achieving a highly reliable attribution mechanism.

A description of the system, in chronological order, follows. First, any newly bought machine, or even a homemade device, must be licensed by the trusted entity. The trusted entity generates the IS, adds it to the GDDB, and provides the user with his IS so that he can add it to the headers of his outgoing packets. The user should keep his unique IS secret and treat it exactly as he does his credit card and social insurance numbers. If a device is not licensed (i.e., its IS was not inserted into the GDDB), it does not benefit from ISS.

From the intermediate system's perspective, when a router receives a packet, it verifies the packet's IS by consulting the GDDB, sending it a copy of the IS printed on the packet. If the packet carries no IS, it is denied ISS and simply dropped. If the GDDB replies that the IS is invalid, again, the packet is dropped. If the GDDB replies with success, the packet's printed IS is verified; the packet then benefits from ISS and is routed onward.
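
The per-packet check reduces to a small decision procedure, sketched below. The GDDB lookup is mocked in memory and every name is our own illustration; a real deployment would perform an authenticated, encrypted query:

  # Sketch of the router's per-packet verification in the framework.
  from dataclasses import dataclass
  from typing import Optional

  LICENSED_STAMPS = {"stamp-alice-laptop", "stamp-bob-phone"}  # mock GDDB

  @dataclass
  class Packet:
      payload: bytes
      identification_stamp: Optional[str] = None  # IS header field

  def gddb_has(stamp: str) -> bool:
      """Read-only GDDB query (mocked here)."""
      return stamp in LICENSED_STAMPS

  def route(packet: Packet) -> bool:
      """True if the packet receives ISS and is forwarded; else dropped."""
      if packet.identification_stamp is None:
          return False                              # unattributed: drop
      if not gddb_has(packet.identification_stamp):
          return False                              # invalid IS: drop
      return True                                   # verified: forward

  print(route(Packet(b"data", "stamp-alice-laptop")))  # True: routed
  print(route(Packet(b"data", "stamp-forged")))        # False: dropped
  print(route(Packet(b"data")))                        # False: dropped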

Pros, Cons and Vulnerabilities

The proposed framework enjoys the following advantages:

  • It achieves an acceptable level of attribution relative to that achieved in the real world.
  • It prevents anonymous attacks, since a non-attributed packet fails to reach its destination.
  • Attribution information is not publicly available to everyone, only to trusted entities.
    • Hence, it retains personal privacy.
  • The system enjoys full automation: according to its theory of operation, ISS is either provided or withheld based on the validation of the IS printed on each packet.
  • The system prevents all forms of cyber crime executed by unknown Ags.

The proposed framework suffers from the following disadvantages:

  • Verifying the IS on each packet creates undesirable delays and potential bottlenecks at the routers.
  • The framework is not considered easy to deploy, since its assumptions are relatively complex.
  • Since attribution information is not public, custom content generation is not achievable.
  • The large numbers of Mds in university laboratories, corporations, hospitals, schools, etc. would all have to be licensed before use, even though in these settings Mds would normally be bound to a single person.
  • For security purposes, licenses should be periodically renewable; however, this is not a simple matter.

The proposed framework is vulnerable to:

  • Botnets
    • The system requires full user awareness of what lies under the hood. Since users are solely responsible for their Mds, they should be aware of all packets sneaking into their machines in order to avoid the distribution of malware and the later formation of botnets.
    • Users are responsible for strictly securing their Mds, exactly as they lock their car after leaving it in a car park.
  • A successful attack on the GDDB would cause whole-system failure. If the attack gains write access, the attacker can append an imaginary IS; if it gains read access, the attacker can declare his malicious packets to be the responsibility of some other Ag, i.e., forgery.

Discussion

The proposed framework's main focus is to ensure that any leaping packet moves only because it is known to whom it belongs. Recall that in the real world, a person without an identity (like a social insurance number) cannot benefit from services: he can't open a bank account, buy a house, trade, or even get a job. The proposed system mimics this behavior of the real world. Of course, the real world is not ideal at criminal tracing and law enforcement; however, its level of attribution currently far exceeds that of the Internet. We can say that current Internet attribution, compared to real-world attribution, is a failure. A form of Internet attribution would be considered acceptable if it provides at least as much attribution as the real world does, and we argue that the proposed framework would guarantee that level.

The proposed framework fulfills all of the general requirements. Clearly, any potentially destructive act is traceable to an Ag, or else it does not take place. The framework also avoids violating privacy-related laws, since the attribution information is not publicly available; more specifically, it is available only to the agreed-upon trusted entity. The framework also fulfills all of the deployment requirements. The more areas the system is deployed in, the better for the public good; hence, it is incrementally deployable. The framework is not very loosely coupled, but it can still allow the Internet to operate if it is suppressed. It is also adaptable to different rules and regulations, since it leaves the punishment decision to the jurisdiction of the country where the crime's committer resides. Whatever the cost to deploy the system, it should still be less than the cost of losses due to cyber crimes, especially since the cost of losses due to unknown "future" attacks cannot easily be determined. As for the practice requirements, the framework's theory of operation does not permit mapping a certain Ag to a set of actions; it only permits mapping a set of actions to an Ag, which satisfies non-bijection. Also, because of the distributed nature of the GDDB, it is impossible to collect all traceability information in one place. Only the trusted entities generate the IS from personal data; hence, they alone hold that piece of information. To conclude, the framework satisfies all the requirements.

Conclusion

Human nature resists any change at first sight. In 1769, Nicolas-Joseph Cugnot completed the first steam-powered trolley<ref>Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&printsec=frontcover&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false </ref>, the ancestor of today's automobiles. In 1903, car licensing began in North America, 134 years after that invention. Licensing started when people realized that a car could act as a lethal weapon: a person must therefore be approved by the government to drive one, and the car must be formally linked to an owner who is considered primarily responsible for it.

The Internet is now passing through the same phase. People may blindly deny, refuse, and object to such "wicked" attribution systems, but later on, Internet licensing will be part of everyone's life, just like their driving license. Needless to say, the Internet is becoming more crucial to many applications and at the same time more vulnerable to different types of attacks. It is being injected into the "blood" of a vast and exponentially growing number of applications that are time- and data-sensitive, and that leave no room for cyber crime, unauthorized intrusion, traffic tampering, bandwidth hogging, etc. In addition, much of industry now builds its technology on the Internet as the underlying infrastructure and cannot tolerate being constantly threatened by a completely anonymous person behind the scenes, waiting for the proper moment to strike. Internet attribution is thus no longer an add-on, but an obligation.

In this paper, we have presented formal definitions of attribution, explained why it is crucial to attribute, described what level of attribution would be considered acceptable, and identified where the roots of the difficulty of achieving that level lie. We have also provided background on current attribution systems, with a brief discussion of why they survive and where they fail. We then compiled a list of requirements that any system aiming to achieve Internet attribution must fulfill. Finally, we proposed a potential framework that should fulfill those requirements and achieve an acceptable level of Internet attribution, and discussed its pros, cons, and vulnerabilities.

References

<references/>