A link to the paper

From Soma-notes
Revision as of 16:33, 10 April 2011 by Omi (talk | contribs)

Title

Proposed titles:

  • Requirements for Attribution on the Internet
  • Internet Attribution: Between Privacy and Cruciality

Abstract

Past and present incidents show a need for improved attribution systems, and the scientific basis for properly functioning attribution systems is arguably not yet defined. Much research has focused on attributing documents to authors in order to secure authorship rights and rapidly identify plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system for the Internet. Authentication, as a means of attribution, has proved effective, but it is not practical to authenticate every single packet hopping across intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the Internet. It reviews current attribution technologies and their limits, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach for performing attribution over the Internet.

Introduction

Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a range of cyber crimes without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence the creation of botnets), traffic tampering, DoS, and bandwidth hogging. Consequently, Internet attribution is a highly sensitive field that occupies a cornerstone position within Internet security. Current solutions neither guarantee sufficient attribution nor apply in most situations; hence, the current system lacks a reasonably robust attribution mechanism. In this context, we need better methodologies to reach an acceptable success rate in attributing actions to persons.

In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as "determining the identity or location of an attacker or an attacker’s intermediary"<ref> [Institute for Defense Analyses, 2003]</ref>. Problems such as IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users' unawareness of the many unknown packets reaching their machines, and the limited effectiveness of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the human source behind an attack.

In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the Internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access to the system, weakens proper attribution and encourages illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the temptation of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a major role on the Internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.

Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which current research has unfortunately largely neglected.

This paper starts with a brief discussion of the attribution dilemma: resolving the tension between attribution and privacy. Next, Section 3 argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the Internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented attribution systems, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, Section 7 presents a conclusion.

What is Attribution

The act of attributing, especially the act of establishing a particular person as the creator of a work of art.<ref> The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.</ref>

We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, attributing an act to an agent (software, device, etc.) and then attributing that agent to a person. Narrowing the problem further, we are concerned only with attribution in large, dynamic networks such as the Internet. For the sake of simplicity, in this paper we refer to "binding an act to a person on the Internet" as "attribution"; other types of attribution will be defined separately.

Problem Statement

The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and other people with malicious intent.

Problem Motivation

In today's world there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since the Internet came into widespread public use in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gone scot-free: because of the anonymity of the Internet, the attackers cannot be identified.

Scope

In this paper we address the issue of attribution by providing a list of requirements that must be met in order to have a stable and efficient attribution system over the Internet.

Although this is not a technical paper, some basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology used.

Background

The problem of attribution is not new; it has been around for decades, though mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent that a particular system required. This section introduces three of today's attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.

Cookies

Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are small pieces of text data that are created by a web server and stored by a web browser on the user's computer. Cookies are used for many purposes, mainly authentication, remembering shopping cart contents, and storing site preferences. In fact, they can store any type of information that can be represented as text. When a page is requested, the user's web browser sends the request with the web server's cookie in the HTTP request header. This is an automated exchange between the web browser and the web server.

If the web server receives a request without a cookie attached, it treats the request as the browser's first access to the server and sends a cookie as part of the response; the browser saves it and sends it back with the next request. Cookie contents are sometimes encrypted or signed for data security and information privacy; however, cookies remain under the user's control, as they can be inspected, modified, or deleted completely. A user can also change their browser settings to not accept cookies at all.

A cookie may or may not have an expiration date, which is the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set how long cookies are stored.
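The exchange described above can be sketched in a few lines. This is a minimal illustration using Python's standard `http.cookies` module, not a description of how any particular web server works; the cookie name `visitor_id`, the one-hour lifetime, and the `handle_request` helper are assumptions made for the example.

```python
from http.cookies import SimpleCookie
import secrets


def handle_request(cookie_header, known_visitors):
    """Return (visitor_id, set_cookie_value).

    set_cookie_value is None when the browser already presented a
    cookie that this server issued earlier.
    """
    cookie = SimpleCookie()
    if cookie_header:
        cookie.load(cookie_header)

    morsel = cookie.get("visitor_id")
    if morsel is not None and morsel.value in known_visitors:
        return morsel.value, None  # returning visitor

    # No (valid) cookie: treat this as a first visit and issue a new one.
    visitor_id = secrets.token_hex(8)  # hypothetical identifier format
    known_visitors.add(visitor_id)
    reply = SimpleCookie()
    reply["visitor_id"] = visitor_id
    reply["visitor_id"]["max-age"] = 3600  # expires after one hour
    return visitor_id, reply["visitor_id"].OutputString()
```

In this sketch, a browser that deletes its cookie is simply issued a fresh `visitor_id` on its next request, so the server sees it as a new visitor.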

Cookies as an Attribution System

Viewed as the kind of attribution system we are looking for on the Internet, cookies can identify with high precision the computers (more precisely, the browsers) that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.

IP Addresses

IP (Internet Protocol) addresses are numerical identifiers for devices (e.g., computers, printers, scanners) on a network; an IPv4 address is 32 bits long. The user's Internet Service Provider (ISP) assigns this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), which allocate IP address blocks to the ISPs in their regions, who in turn allocate them to their users.
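To make the "32-bit number" and "address block" ideas concrete, here is a small sketch using Python's standard `ipaddress` module. The `describe` helper is hypothetical, and the addresses come from the 192.0.2.0/24 range reserved for documentation, so they do not correspond to any real allocation.

```python
import ipaddress


def describe(address, block):
    """Summarise an IPv4 address and whether it falls inside a block."""
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(block)
    return {
        "as_integer": int(addr),            # the number behind the dotted form
        "bit_length": addr.max_prefixlen,   # 32 for an IPv4 address
        "in_block": addr in net,            # does the block contain the address?
        "block_size": net.num_addresses,    # how many addresses the block holds
    }


info = describe("192.0.2.17", "192.0.2.0/24")
```

Note that membership in a block only indicates which registry or ISP an address was allocated to; it says nothing about which person was using the address at a given time.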


Pros

Cons

Authentication Systems

Pros

Cons


The attribution dilemma

Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. The hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions and their clients are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers can operate primarily because there is no 100% effective attribution system in place: they can distribute information (regardless of its actual usefulness or merit) and keep their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for some particular case, but also to let the ratio change dynamically depending on the case.

Assuming this ratio is found, another question is when to use private information to track or punish a person, that is, when to directly intrude on their privacy. One might think that this question is somewhat outside the scope of our paper. This is true; however, this and many less obviously related questions should be answered before design begins, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.

(here go other questions. will show connection to requirements)

  • While designing an attribution system one needs to consider balancing between attribution and privacy.
    • Sometimes non-attribution is crucial, for example to protect political dissidents and whistle-blowers.
  • When to decide to track a person and when not to (so as not to intrude privacy)?
  • How to make sure attribution is properly achieved?
  • Who should attribute whom (or what), and why?
  • How far can we trust IP traceback, stepping-stone authentication, link identification, and packet filtering in binding packets to agents?
  • How much can intermediate systems' cooperation contribute to achieving attribution?
  • Should there be consequences once an action is attributed to an agent? If so, what are they (punishment, reward, etc.)?
  • How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?

Why do we need Attribution

  • For identification purposes:
    • Web Banking
    • eCommerce
    • Web advertisements
  • For better protection against cyber attacks:
    • DoS and DDoS
    • Forgery and theft
    • Sniffing private traffic
    • Distributing illegal content/malware
    • Sending spam
    • Illegal/undesired intrusion
  • For marketing purposes (privacy?)
    • custom (client-based) content generation

Why is it difficult to achieve attribution?

The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one's identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and "costly" for an attacker.

  • The lack of attribution on the web becomes an issue mostly when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic.
  • Depending on the type of sender and receiver, a different attribution policy will be required.

In an ideal world, every action on the Internet could be bound to a machine and thus to a person. This would be done by examining the source IP address carried in each packet, determining the geographical location of that address, consulting the ISP covering the location, and identifying the person. If an act requires strict attribution (such as checking and sending email), authentication is used. Here is what goes wrong:

  • IP addresses can be spoofed, which makes the derived geographical location misleading.
  • To avoid that problem, IP traceback can be performed, but it requires global cooperation among intermediate systems, which does not exist today.
  • IP addresses are not permanently bound to persons, so identifying a person from an IP address is not conclusive.
  • Network users are not aware of all the packets reaching their machines, which allows malware distribution and hence the creation of botnets, leading to misleading attribution.
  • Firewalls and packet filters can mitigate that problem, but they are not 100% effective.
  • It is not practical to authenticate every single action on the Internet.

Attacks to prevent correct attribution of actions

  • Stepping stone attack: a common way of making attacks anonymous, in which the attacker relays traffic through multiple arbitrary public hosts (stepping stones) on the way to the victim in order to conceal the attacking source. <ref name="ref1">S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.</ref>
  • Forgery
    • Identity theft (impersonation)
    • Distribution of malware

Requirements for internet attribution system

General

The first and most obvious requirement for any system is usually stated for the sake of completeness rather than for the information it carries, and we shall not skip it either: the main requirement for an Internet attribution system is that it needs to attribute or, more formally, that any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the party (a person or a group) paying him. A good attribution system should not lead to the assassin alone; rather, it should be designed so that the responsible parties are the ones discovered.

It is easy to imagine a system in which the absence of crime and misuse is enforced as the only acceptable way of doing things; many writers and film directors exploit this idea in futuristic, science-fiction, and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not advisable, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and mostly exist for good reasons. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.

Deployment

Designing a system is one thing; it is much harder to design a system whose deployment need not be instantaneous and massive. Even though a global attribution system will be under a lot of pressure, the Internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the network it operates in.

As mentioned earlier (and this could be said of any global system on the Internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole Internet at once. Incremental deployment should also be more secure, since software bugs and design mistakes can be fixed at a small scale; by the end of the cycle, when the whole Internet is covered, the attribution system will have been field-tested and analyzed several times.

A very important but controversial subject is the adaptation of the system to a given set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally.

Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.). The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g., from DoS or identity theft).

Practice

Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa. That is, nobody should be able to use the system to answer the question "what did person X do?". Not only should the system not hold the answer to that question, it should not be possible to derive it; the question "who did act X?" is the only one that should be answerable. This can be thought of as part of the requirement not to violate current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance.

Since this global system operates on the Internet, it might not be a good idea to put the names of persons into a traceability database. It makes much more sense to record a unique ID for each body that uses the network; when a crime is committed or, more generally, when the agent of some act must be determined, the recorded ID is looked up in a police or government database. The mapping between IDs and real names should be stored by some trusted entity (a government, a corporation, the police, some public-good-like system, etc.). This mapping should be revealed only when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of it in one place.
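The separation between traceability records and real identities might be sketched as follows. This is only an illustration of the idea: the class names `TrustedRegistry` and `TraceLog`, the random hexadecimal IDs, and the boolean `authorized` flag (standing in for whatever legal process would justify disclosure) are all assumptions, and a real system would additionally need distribution and cryptographic protection.

```python
import secrets


class TrustedRegistry:
    """Hypothetical trusted entity holding the ID -> real-name mapping."""

    def __init__(self):
        self._names = {}

    def enroll(self, real_name):
        unique_id = secrets.token_hex(16)  # opaque ID, reveals nothing by itself
        self._names[unique_id] = real_name
        return unique_id

    def reveal(self, unique_id, authorized):
        # The mapping is disclosed only when there is sufficient justification.
        if not authorized:
            raise PermissionError("identity disclosure requires authorization")
        return self._names[unique_id]


class TraceLog:
    """Act -> ID records; deliberately offers no per-person query."""

    def __init__(self):
        self._by_act = {}

    def record(self, act_id, unique_id):
        self._by_act[act_id] = unique_id

    def who_did(self, act_id):
        # Answers "who did act X?" with an ID, never with a name.
        return self._by_act.get(act_id)
```

The trace log stores only IDs and exposes a single forward lookup, while the name behind an ID can be obtained only through the registry's `reveal` step.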

Proposed Framework

In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework operates under the core principle "An act cannot use network resources nor can it be routed if it is anonymously bound". We start by defining some terminology:

  • "Identification Stamp": An identification stamp is a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device. For a device such as a Network Interface Card, that feature would be the MAC address.
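One possible way to derive such a stamp is sketched below. The use of SHA-256, the 256-bit stamp size, and the `identification_stamp` function are assumptions for illustration only; the sketch also assumes the biometric reading has already been reduced to a stable byte string, which is a non-trivial problem in itself because raw iris or fingerprint scans vary between readings.

```python
import hashlib


def identification_stamp(biometric_template: bytes, mac_address: str) -> bytes:
    """Bind a person's biometric template to a device's MAC address."""
    mac = bytes.fromhex(mac_address.replace(":", "").replace("-", ""))
    if len(mac) != 6:
        raise ValueError("a MAC address is 48 bits (6 bytes)")
    # Length-prefix the template so the two fields cannot run together.
    material = len(biometric_template).to_bytes(4, "big") + biometric_template + mac
    return hashlib.sha256(material).digest()  # 256-bit stamp
```

Under these assumptions, the same person using a different device, or a different person using the same device, yields a different stamp.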

The following sections present the assumptions under which this framework operates, its methodology, a list of the pros, cons, and vulnerabilities of the system, and finally a discussion of the tradeoff between privacy and attribution.

Assumptions

First, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would be a single point of failure. A distributed entity would be expected to perform better, as it could scale with the growth in system users as well as conform to diverse regional laws, regulations, customs, and traditions. However, a standard protocol would be required to define the syntax and semantics of the communication between these distributed sub-systems.

Second, we assume that a DNS-like, worldwide distributed system is deployed. This system acts as a "database" for storing "identification stamps". Symmetric-key encryption should be used to protect it, as it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency for read operations, as it will be consulted at every hop a packet makes through the Internet's intermediate systems.
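The read-only/write-only split between the two types of users can be illustrated with a toy, single-process stand-in for the distributed database. The `StampDirectory` class and the role strings are hypothetical; in particular, the `role` argument merely stands in for the result of a real authentication step, and encryption and distribution are left out entirely.

```python
class StampDirectory:
    """Toy stand-in for the distributed identification-stamp database."""

    def __init__(self):
        self._stamps = set()

    def register(self, role, stamp):
        # Write path: reserved for the trusted entity.
        if role != "trusted_entity":
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)

    def is_registered(self, role, stamp):
        # Read path: reserved for routers.
        if role != "router":
            raise PermissionError("only routers may read")
        return stamp in self._stamps
```

Keeping the stamps in a hash-based set makes each lookup a constant-time operation on average, which hints at why the read path could be made fast; it does not address network latency between a router and the directory.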

Finally, our proposed framework assumes that the network layer adds a header to each IP packet that carries the identification stamp of the packet owner. A packet owner is the person plus the machine that are together responsible for launching the packet.
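As a rough illustration of what carrying a stamp in a header could look like, the sketch below prepends a small fixed-size header to a payload. The layout (a one-byte version, a one-byte length, and a 32-byte stamp) is invented for this example and is not an existing IP option or extension header; the names `add_stamp` and `split_stamp` are likewise hypothetical.

```python
import struct

STAMP_LEN = 32                     # assumed stamp size in bytes
HEADER = struct.Struct("!BB32s")   # version, stamp length, stamp (network byte order)


def add_stamp(stamp: bytes, payload: bytes) -> bytes:
    """Prepend the stamp header to a payload."""
    if len(stamp) != STAMP_LEN:
        raise ValueError("stamp must be exactly 32 bytes")
    return HEADER.pack(1, STAMP_LEN, stamp) + payload


def split_stamp(packet: bytes):
    """Return (stamp, payload), rejecting packets without a valid header."""
    if len(packet) < HEADER.size:
        raise ValueError("packet too short to carry a stamp")
    version, length, stamp = HEADER.unpack_from(packet)
    if version != 1 or length != STAMP_LEN:
        raise ValueError("unrecognised stamp header")
    return stamp, packet[HEADER.size:]
```

With this layout every packet grows by 34 bytes, which gives a feel for the per-packet overhead such a header would add.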

Methodology

Routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As they are the main driving force behind the delivery of all packets, malicious or benign, they should bear great responsibility for achieving a highly reliable attribution mechanism.

  1. Access devices must be licensed by the trusted entity. If not, they will not be able to benefit from global routing services.
  2. Licensing: binding a human's unique feature to a machine's unique feature.
     • Human unique feature: intricate iris structure
     • Machine unique feature: MAC address
  3. Licensing generates identification stamps.
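The forwarding rule implied by this methodology can be sketched as a single decision function. The function name `routing_decision` and the in-memory set of licensed stamps are assumptions for the example; in the framework as described, the router would instead consult the distributed stamp database.

```python
from typing import Optional, Set


def routing_decision(stamp: Optional[bytes], licensed_stamps: Set[bytes]) -> str:
    """Forward only packets whose stamp belongs to a licensed owner."""
    if stamp is None:
        return "drop"      # anonymous packet: carries no stamp at all
    if stamp not in licensed_stamps:
        return "drop"      # stamp present, but never issued by licensing
    return "forward"
```

Because this check runs at every hop, the cost of the stamp lookup directly bounds the forwarding rate of the router.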


Pros, Cons and Vulnerabilities

Cons

  • Delays and bottlenecks due to the licensing system at the routers for consulting the distributed system.
  • Restrictive assumptions (not easily deployable)
  • Different regulative flavors
  • Custom content generation (not found)

Pros

  • Attribution
  • Attack avoidance
  • Attribution not available to anyone
  • Automated. Services are either stopped or continued.
  • Avoids attacks: DDoS, DoS, ...
  • Privacy

Vulnerabilities

  • Botnets
  • Attack on the distributed system, which would cause whole system failure.


Privacy and Attribution Tradeoff

Human nature resists any change at first sight. Consider cars: they were first driven without any need for licensing, and licensing systems were introduced only afterwards. People got used to them, slowly at first and then completely.

Conclusion

References

<references/>