<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Freetonik</id>
	<title>Soma-notes - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://homeostasis.scs.carleton.ca/wiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Freetonik"/>
	<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php/Special:Contributions/Freetonik"/>
	<updated>2026-05-01T19:16:43Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.42.1</generator>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9281</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9281"/>
		<updated>2011-04-11T05:57:55Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Requirements for internet attribution system */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Past and present events demonstrate the need for improved attribution systems, yet, arguably, the scientific basis for a properly functioning attribution system has not yet been defined. Much research has focused on attributing documents to authors in order to secure authorship rights and rapidly identify plagiarism. Many of these efforts revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system on the internet. Authentication, as a means of attribution, has proved effective, but it is clearly not feasible to authenticate every single packet hopping across intermediate systems. This paper presents the limits of and advances in the attribution of actions to agents on the internet. It reviews current attribution technologies and their limitations, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach to performing attribution on the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS attacks, and bandwidth hogging. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most situations; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for attributing actions to persons with an acceptable level of success.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is an entity with the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act: malware distribution (and hence the creation of botnets) and stepping stones, for instance, aim to obscure the &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism on the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a large role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for the sake of tracking authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which sadly remains neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution on the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution on the internet, mainly due to the increasing number of cyber attacks since its introduction in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the internet and getting off scot-free; owing to the anonymity of the internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we address the issue of attribution by providing a list of requirements that must be met in order to have a fully stable and efficient attribution system on the internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology within it.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve.&lt;br /&gt;
This section introduces three current attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart contents, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, or even deleted completely. A user can also change the browser settings to refuse cookies altogether.&lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers also let you set how long cookies are stored.&lt;br /&gt;
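&lt;br /&gt;
The exchange described above can be sketched with Python&#039;s standard-library cookie classes (a minimal illustration; the cookie name &amp;quot;visitor_id&amp;quot; and its value are hypothetical):&lt;br /&gt;
```python
from http.cookies import SimpleCookie

# Server side: create a cookie naming this visitor (hypothetical "visitor_id").
server_cookie = SimpleCookie()
server_cookie["visitor_id"] = "abc123"
server_cookie["visitor_id"]["max-age"] = 3600  # kept for one hour, then deleted by the browser
set_cookie_header = server_cookie["visitor_id"].OutputString()

# Browser side: on every later request, the stored value is echoed back in a
# Cookie header, which the server parses to recognize the returning visitor.
browser_cookie = SimpleCookie()
browser_cookie.load("visitor_id=abc123")
print(browser_cookie["visitor_id"].value)  # abc123
```
A cookie created without the max-age (or expires) attribute would instead be a session cookie, discarded when the browser closes.&lt;br /&gt;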
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewed as the kind of attribution system we are looking for on the internet, cookies would allow high precision in identifying the computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (e.g., computers, printers, scanners) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
&lt;br /&gt;
Any device that goes online and communicates using IP needs an IP address. Over the years, both the number of users going online and the number of devices each user takes online have grown; one of the more common examples is the rise of internet-ready mobile phones. The addressing system used by the current Internet Protocol version 4 (IPv4) contains only 32 bits, which means it can uniquely address only 2^32 addresses (4,294,967,296), fewer than the number of people on the planet today. The very last batch of IP addresses was assigned to the five RIRs in early February 2011 [1]. This was foreseen since the 1990s, which spurred the development of a new Internet Protocol version, IPv6, which uses 128-bit addressing.&lt;br /&gt;
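&lt;br /&gt;
The arithmetic behind the exhaustion can be sketched as follows (the population figure is a rough approximation used only for illustration):&lt;br /&gt;
```python
# IPv4 uses 32-bit addresses; IPv6 uses 128-bit addresses.
ipv4_space = 2 ** 32    # 4,294,967,296 possible addresses
ipv6_space = 2 ** 128   # vastly larger address space

world_population_2011 = 6_900_000_000  # approximate figure, for illustration only

shortfall = world_population_2011 - ipv4_space
print(ipv4_space)  # 4294967296
print(shortfall)   # 2605032704 -- more people than IPv4 addresses
```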
&lt;br /&gt;
IP addresses can be either static or dynamic. A static IP address is permanently assigned to a user by configuration. A dynamic IP address is newly assigned at every boot-up, usually by a Dynamic Host Configuration Protocol (DHCP) server. Dynamic addressing has two main advantages: it eliminates the administrative cost of assigning static IP addresses, and it mitigates the limited address space by letting many devices &amp;quot;share&amp;quot; a single address if they go online at different times. Given the limited address space ISPs have to work with, and to save administration costs, most ISPs assign dynamic IP addresses as standard and offer static IP addresses for a higher fee.&lt;br /&gt;
&lt;br /&gt;
===IP Addresses as an Attribution System===&lt;br /&gt;
Although internet addresses can be used to attribute packets to their senders, they fail as an effective attribution system for several reasons, chief among them that attackers can spoof their IP addresses. Spoofed IP addresses will even foil IP traceback efforts.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
In order to be sure of the identity of whoever is visiting certain pages, a website provides an authentication system, usually a login name and password either assigned by the web server or chosen by the user. The biggest advantage is that attribution can now be performed across different computers. The task of storing and securing login information is left to the web server, which leaves it exposed to attackers hacking in to steal login information.&lt;br /&gt;
&lt;br /&gt;
Login systems are attached to user accounts that sometimes require private information in order to be set up. If the web server&#039;s security is not good enough, security breaches may in turn lead to identity theft.&lt;br /&gt;
&lt;br /&gt;
The process behind authentication systems is simple; using a typical web banking system as an example, it may go as follows. A user requests a web account, or one is automatically assigned. The user sets up a password for accessing the account. When the user later visits the website, he is asked to &amp;quot;identify himself&amp;quot;; he enters his personal login information, and the web server checks it against what is stored in its database, then either grants or denies access to the user&#039;s personal page.&lt;br /&gt;
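&lt;br /&gt;
The verification step can be sketched with salted password hashing (a simplification; the function names and parameters below are hypothetical, and real deployments use a dedicated password-hashing scheme such as bcrypt or Argon2):&lt;br /&gt;
```python
import hashlib, hmac, os

def register(password: str):
    """Store a salted hash, never the password itself (sketch only)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest  # what the web server's database would hold

def verify(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = register("correct horse")
print(verify("correct horse", salt, stored))  # True
print(verify("wrong guess", salt, stored))    # False
```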
&lt;br /&gt;
Authentication systems are only used when users want some privacy on the web server, or when they wish to store some form of information there.&lt;br /&gt;
&lt;br /&gt;
===Authentication Systems as an Attribution System===&lt;br /&gt;
Authentication systems are very precise in identifying people over the internet and, as such, are used by many companies. However, used as a global identification system, they would have a serious privacy drawback: virtually every web server would need to hold enough information about you to identify you as an attacker. Even a user casually searching for a cooking recipe online would need to log in somehow to access the web server. People generally like the anonymity of surfing the web, and a system like this would destroy it completely.&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight; it depends crucially on the application. For instance, large financial institutions and their clients are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist largely because no 100% effective attribution system is in place, so they can distribute information (regardless of its actual usefulness or goodness) while keeping their identities secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in a fairly abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one must not only decide on this ratio for a particular case, but make the ratio dynamically adjustable depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think this question lies slightly outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before design begins, because in matters as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers&lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone authentication, link identification, and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the internet is designed makes it possible, and relatively easy, to act without compromising one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic.&lt;br /&gt;
*Depending on the types of sender and receiver, different attribution policies will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, determine the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt; and hence yield a misleading geographical location.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems... which does not exist!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so inferring the person from the IP is not concrete.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
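&lt;br /&gt;
Several of these failures trace back to one fact: the source address is simply a field the sender writes into the packet header, and nothing in IPv4 verifies it. A minimal sketch (the header values and addresses below are documentation-range placeholders, not real traffic):&lt;br /&gt;
```python
import struct, socket

# A hand-built 20-byte IPv4 header (hypothetical values). The source address
# is just four bytes the sender fills in; the protocol never verifies it.
header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0,        # version/IHL, DSCP
    20, 0x1234,     # total length, identification
    0, 64,          # flags/fragment offset, TTL
    6, 0,           # protocol (TCP), checksum (left zero in this sketch)
    socket.inet_aton("203.0.113.7"),   # claimed (possibly spoofed) source
    socket.inet_aton("198.51.100.9"),  # destination
)

src = socket.inet_ntoa(header[12:16])
print(src)  # 203.0.113.7 -- whatever the sender chose to write
```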
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing attacks by using multiple public random agents (as stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues, complicated dependencies, and many questions to answer, or at least to try to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; while the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, achieves a high degree of attribution, and remains realistic.&lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as huge as the internet and human society; and practice requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency: the main requirement for an internet attribution system is simple: it needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person and/or an organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way of doing things, and many writers and film directors exploit this idea in futuristic, science-fiction, and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to simply design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.&lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said of any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once; incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still at small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies.&lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adoption for different cases while remaining universal and global. It should act as a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns this line between dynamic adoptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber crimes committed against them, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), in other words, by spending more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
The attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to know the answer; the question to be answered is &amp;quot;who did act X?&amp;quot;. This can be considered part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use in any other way, for surveillance, spying, etc.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to store a unique ID for any body that uses the network; in case a crime is committed, or, in general, whenever the agent of some act must be determined, the recorded ID is searched for in a police or government database. Some trusted entity (a government, corporation, police, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
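&lt;br /&gt;
One way to sketch this separation is a keyed hash: the traceability database records only a pseudonymous ID, and only the trusted body, which holds the key, can recompute the mapping from a real identity. The key and identity strings below are hypothetical:&lt;br /&gt;
```python
import hmac, hashlib

# Key held only by the trusted body (government agency, etc.); placeholder value.
TRUSTED_BODY_KEY = b"kept-offline-by-the-trusted-body"

def network_id(real_identity: str) -> str:
    """Pseudonymous ID recorded in the traceability database. Without the key,
    the mapping back to a real name cannot be recomputed."""
    return hmac.new(TRUSTED_BODY_KEY, real_identity.encode(), hashlib.sha256).hexdigest()

alice = network_id("Alice Example, passport K1234567")
print(alice == network_id("Alice Example, passport K1234567"))  # True: deterministic for the key holder
```
The network only ever stores and exchanges the opaque ID; revealing the real name requires the cooperation of the key holder.&lt;br /&gt;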
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing either side from misusing the system.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework works under the core principle &amp;quot;An act cannot use network resources nor can it be routed if it is anonymously bound&amp;quot;. We start by defining some terminology used within the scope of this section:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt; (Ag): the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt; (Md): any piece of hardware with access capability. It can be a PDA, a laptop, a notebook, a PC, a NIC, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt; (IS): a series of bits that binds a human unique identification (iris intricate structure or fingerprint) with a unique feature of an Md. So, for a device like a Network Interface Card, the MAC address would be that feature. This biding is a particular representation for the official owner of the device, and who is deemed the primary responsible for any outgoing packet launched by his owned device. In other words, it is a unique identifier for an Ag.&lt;br /&gt;
* &amp;lt;i&amp;gt;Intermediate System Services&amp;lt;/i&amp;gt; (ISS): Services provided by intermediate systems (routers). For e.g., routing (main service), error checking, etc.&lt;br /&gt;
* &amp;lt;i&amp;gt;Globally Distributed Database&amp;lt;/i&amp;gt; (GDDB): a global DNS-like world-wide distributed system that has relatively fast retrieval and update capabilities.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: a process of giving the permission to intermediate systems to provide ISS to all packets that are launched from the agent that is requesting the license. All licenses are stored in the GDDB.&lt;br /&gt;
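As a rough illustration of how an Identification Stamp might be constructed, the following Python sketch hashes a human identifier together with a device feature. The function name, the hashing scheme, and the sample values are assumptions made purely for illustration; they are not part of the framework&#039;s specification.&lt;br /&gt;

```python
# Hypothetical sketch only: binding a human identifier to a device
# feature with a cryptographic hash. All names and values here are
# illustrative assumptions, not the framework's actual encoding.
import hashlib

def make_identification_stamp(human_id: bytes, device_feature: bytes) -> str:
    """Bind a unique human identifier (e.g. a fingerprint template)
    to a unique device feature (e.g. a MAC address)."""
    return hashlib.sha256(human_id + b"|" + device_feature).hexdigest()

# Example: a made-up fingerprint digest bound to a MAC address.
stamp = make_identification_stamp(b"fingerprint-template-bytes",
                                  b"00:1A:2B:3C:4D:5E")
print(len(stamp))  # 64 hex characters
```

The same (human, device) pair always yields the same stamp, while any change to either component yields a different one, which is the property the binding needs.&lt;br /&gt;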
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, or VoIP. Indirect responsibility arises when a user is running a system in the background that performs external (over-the-internet) calls (such as global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g., NTP, or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones, e.g., TCP connection-initiation and handshaking packets, or ICMP packets that seek to identify the status of a specific host.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, not any other &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of pros, cons, and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
First: jurisdiction. This framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity acts as the Internet&#039;s law enforcement and is deemed the primary inspector and jurisdiction for regulating all kinds of cyber crime and misbehavior. It may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the user base and conform to diverse regional laws, regulations, customs, and traditions.&lt;br /&gt;
&lt;br /&gt;
Second: the GDDB. We assume that a GDDB is deployed, acting as a &amp;quot;database&amp;quot; for storing ISs. Symmetric-key encryption should be used to protect the system, as it is accessed by only two types of users: routers, which may access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which may access it for read/write operations. Both must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems. A standardized protocol would be required to define the syntax and semantics of GDDB communication.&lt;br /&gt;
&lt;br /&gt;
Third: ownership. We assume that every Md is officially owned by a human. This owner is deemed officially responsible for that Md and would be held accountable if his Md were found to misbehave or to launch malicious packets. The ownership relation between persons and machines is one-to-many: a person can officially own one or more machines, but a machine can only be owned by one person.&lt;br /&gt;
&lt;br /&gt;
Finally: IP packets. Our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the &amp;quot;identification stamp&amp;quot; of the packet owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
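The assumed packet format can be sketched as a simplified structure carrying the extra IS field. This is an illustration only; the field names below are assumptions, not a real IP header layout.&lt;br /&gt;

```python
# Illustrative sketch: a simplified "IP packet" carrying the proposed
# identification-stamp (IS) header field. Field names are assumptions
# for illustration, not an actual IP header definition.
from dataclasses import dataclass

@dataclass
class StampedPacket:
    src_ip: str
    dst_ip: str
    identification_stamp: str  # IS of the owning agent (person + machine)
    payload: bytes

# Example packet with a made-up stamp value.
pkt = StampedPacket(src_ip="192.0.2.10", dst_ip="198.51.100.7",
                    identification_stamp="a3f1e9", payload=b"hello")
```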
&lt;br /&gt;
==Methodology==&lt;br /&gt;
In essence, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A &amp;quot;fake identification stamp&amp;quot; is defined as one of the following:&lt;br /&gt;
* a false unique chip identifier that refers to an imaginary device;&lt;br /&gt;
* a false unique human identifier that refers to an imaginary human;&lt;br /&gt;
* a misleading binding of a human to a machine, i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot; when, in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving power behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
A chronological description of the system follows. First, any newly bought machine, or even a home-made device, must be licensed by the trusted entity. The trusted entity accesses the globally distributed database of &amp;quot;identification stamps&amp;quot; and adds the new identification stamp of the agent that requested the license. If a device is not licensed (i.e., its &amp;quot;identification stamp&amp;quot; was not inserted into the distributed database), it does not benefit from ISS.&lt;br /&gt;
&lt;br /&gt;
From the intermediate system&#039;s perspective, when a router receives a packet, it verifies the packet&#039;s IS by consulting the GDDB with a copy of the IS found on the packet. If a packet is found to have no IS, it is prevented from benefiting from ISS and is simply dropped. If the GDDB replies that the IS is invalid, the packet is again dropped. If the GDDB replies with success, the packet&#039;s printed IS is verified; the packet then benefits from ISS and is routed onward.&lt;br /&gt;
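The licensing and per-hop verification steps can be sketched as a minimal Python model. The GDDB is modelled here as an in-memory set, and all names are assumptions made for illustration; a real deployment would involve a distributed, authenticated store.&lt;br /&gt;

```python
# Minimal sketch of the licensing and per-hop verification flow.
# The GDDB is modelled as an in-memory set of licensed stamps;
# this is an illustrative assumption, not a real implementation.

class GDDB:
    """Globally Distributed Database of licensed identification stamps."""
    def __init__(self):
        self._licensed = set()

    def license(self, stamp):          # trusted entity: read/write access
        self._licensed.add(stamp)

    def is_valid(self, stamp):         # routers: read-only access
        return stamp in self._licensed

def route(packet, gddb):
    """Router behaviour: provide ISS only to packets with a verified IS."""
    stamp = packet.get("identification_stamp")
    if stamp is None:                  # unattributed packet: no IS at all
        return "dropped"
    if not gddb.is_valid(stamp):       # forged or unlicensed stamp
        return "dropped"
    return "routed"                    # verified: packet benefits from ISS

gddb = GDDB()
gddb.license("stamp-of-agent-A")       # trusted entity licenses a new agent
print(route({"identification_stamp": "stamp-of-agent-A"}, gddb))  # routed
print(route({"identification_stamp": "forged-stamp"}, gddb))      # dropped
print(route({}, gddb))                                            # dropped
```

The sketch captures the core policy: a packet is delivered only if its stamp is found in the licensing database; everything else is dropped at the router.&lt;br /&gt;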
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
The proposed framework&#039;s main focus is to ensure that any leaping packet moves only because its owner is known. Otherwise, it is prevented from moving.&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulative flavors.&lt;br /&gt;
* Custom content generation (not found).&lt;br /&gt;
* Public PCs (in labs, etc.): bound to whom?&lt;br /&gt;
* Requires full awareness of users of their systems.&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, etc.).&lt;br /&gt;
* Attribution is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system, which would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
Human nature resists change at first sight. In 1769, Jonathan Holguinisburg finalized the invention of the first Cugnot steam trolley &amp;lt;ref&amp;gt;Eckermann, Erik (2001). World History of the Automobile. SAE Press, p.14. [Online]. Available: http://books.google.com/books?id=yLZeQwqNmdgC&amp;amp;printsec=frontcover&amp;amp;source=gbs_ge_summary_r&amp;amp;cad=0#v=onepage&amp;amp;q&amp;amp;f=false&amp;lt;/ref&amp;gt;, the ancestor of today&#039;s automobiles. In 1903, car licensing began in North America, 134 years after Holguinisburg&#039;s invention. Licensing was triggered when people began to realize that a car could act as a lethal weapon, and therefore must be approved by the government before a person may drive it, and must also be formally linked to an owner who is considered primarily responsible for it.&lt;br /&gt;
&lt;br /&gt;
Meanwhile, the Internet is passing through the same phase. The proposed system mimics the behavior of the real world in enforcing the law and tracing criminals. Compared to real-world attribution, current internet attribution can be considered a failure. A form of internet attribution would be acceptable if it provided at least as much attribution as the real world does. Consequently, we argue that our proposed framework would guarantee results at least as precise as those in the real world.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[1] http://arstechnica.com/tech-policy/news/2011/02/river-of-ipv4-addresses-officially-runs-dry.ars&lt;br /&gt;
&lt;br /&gt;
[2] Wikipedia Website&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9237</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9237"/>
		<updated>2011-04-11T01:46:10Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Requirements for internet attribution system */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits of and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit the feature. The lack of online identification, combined with bad intentions, entices criminals to commit a range of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that holds a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are they applicable most of the time; hence, the current system lacks a relatively robust attribution mechanism. In this context, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, lack of interoperability between intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness over the real &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusion and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining user privacy.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for tracking authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which has sadly received little attention in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons why a proper attribution system is difficult to achieve. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we will refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world, there is a growing need for attribution over the Internet, mainly due to the increased number of cyber attacks since the Internet&#039;s widespread adoption in the 1990s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gotten off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology within it. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s current attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart contents, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, and even deleted completely. A user can also change the browser settings to reject cookies entirely. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to set how long cookies are stored.&lt;br /&gt;
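The exchange described above can be illustrated with Python&#039;s standard-library cookie parser. The cookie name, value, and lifetime below are made up for the example.&lt;br /&gt;

```python
# Small illustration of the cookie mechanism, using Python's
# standard-library http.cookies. The cookie name and value here
# are made-up example data.
from http.cookies import SimpleCookie

# Server side: create a cookie to send in a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["max-age"] = 3600   # persists for one hour
header_line = server_cookie["session_id"].OutputString()

# Browser side: on the next request, the stored cookie is sent back
# and the server parses it out of the Cookie request header.
browser_cookie = SimpleCookie()
browser_cookie.load("session_id=abc123")
print(browser_cookie["session_id"].value)  # abc123
```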
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewed as the type of attribution system we are looking for over the Internet, cookies can achieve high precision in identifying the computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users. &lt;br /&gt;
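That an IPv4 address is just a 32-bit number can be shown quickly with Python&#039;s standard ipaddress module; the address used below is a documentation example, not a real host.&lt;br /&gt;

```python
# Illustration that an IPv4 address is a 32-bit number, using the
# standard-library ipaddress module. 192.0.2.1 is a documentation
# example address (TEST-NET-1), not a real host.
import ipaddress

addr = ipaddress.IPv4Address("192.0.2.1")
as_int = int(addr)                            # the underlying 32-bit value
print(as_int)                                 # 3221225985
print(ipaddress.IPv4Address(as_int) == addr)  # True: round-trips exactly
print(as_int < 2**32)                         # True: fits in 32 bits
```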
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and/or infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and it crucially depends on the application. For instance, large financial institutions, as well as their clients, are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so they can distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but rather to make the ratio dynamically adjustable depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, which directly intrudes on their privacy. One might think this question is slightly out of the scope of our paper. That is true; however, this and many less obviously related questions should be answered prior to design, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
** Sometimes non-attribution is crucial, to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone authentication, link identification, and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem I see is that the way the Internet is designed makes it possible, and relatively easy, to act without compromising one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. &lt;br /&gt;
*Depending on the types of sender and receiver, different attribution policies will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP printed on each moving packet, locate the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads the geographical location.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems... which is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so deducing the person from the IP is not concrete.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can be used to mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of concealing the attacking source by using multiple public random agents (as stepping stones) to reach the victim. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we try to define high-level requirements for a good attribution system; since the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems yet remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as huge as the internet; and practical requirements define the way the system works, behaves, and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than useful information, and we shall not avoid it either: the main requirement for an internet attribution system is that it needs to attribute or, more formally, that any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but rather should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that, at the end of the day, there is some person or persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the one responsible, but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way to do things, and many writers and movie directors exploit this idea in futuristic, science-fiction, and anti-utopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but they are widely accepted and most have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global; the details of these terms will be elaborated later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to just design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will be under a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the network it operates in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This matters not only because it is virtually impossible to restart or reconfigure the whole internet at once; incremental deployment is also more secure, since bugs in software and mistakes in design can be fixed while the system is still small, so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within different sets of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global: it should act as a public tool any group can use, but nobody should be able to misuse it or use it illegally. A big decision designers will have to make concerns this line between dynamic adaptability and universality. Fortunately, that level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be mitigated by spending more resources (memory, server bandwidth, etc.), in other words, by spending more money. The overall cost of setting up and maintaining an attribution system for a particular body (person, organization, network) should therefore be considerably less than its average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
The attribution mapping should be one-way: an act should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to obtain it; the only answerable question is &amp;quot;who did act X?&amp;quot;. This could be folded into the requirement about not violating current laws and moral principles, but we state it separately because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it any other way, for surveillance, spying, and the like.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a good idea to put persons&#039; names into the traceability database. It makes much more sense to store a unique ID for every body that uses the network; when a crime is committed, or more generally whenever the agent of some act must be determined, the recorded ID is looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good-like institution, etc.) stores the mapping between IDs and real names, and this mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of it in one place.&lt;br /&gt;
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing either side from misusing the system.&lt;br /&gt;
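The one-way mapping and the public/trusted split described above can be sketched as follows. This is an illustrative assumption of ours, not a design fixed by the paper: the class names, the use of random hex tokens as unique IDs, and the warrant flag standing in for due process are all hypothetical.&lt;br /&gt;

```python
import secrets

class PublicTraceLog:
    """Public side: answers "who did act X?" with a pseudonymous ID only."""
    def __init__(self):
        self._by_act = {}  # act_id -> agent pseudonym

    def record(self, act_id, pseudonym):
        self._by_act[act_id] = pseudonym

    def who_did(self, act_id):
        return self._by_act.get(act_id)
    # Deliberately no index from pseudonym to acts: this API cannot
    # answer "what did person X do?", only "who did act X?".

class TrustedRegistry:
    """Trusted entity: holds pseudonym -> identity, revealed only with cause."""
    def __init__(self):
        self._identities = {}

    def enroll(self, real_name):
        pseudonym = secrets.token_hex(16)  # unique, unlinkable ID
        self._identities[pseudonym] = real_name
        return pseudonym

    def reveal(self, pseudonym, warrant=False):
        # Hypothetical stand-in for "enough evidence or motivation".
        if not warrant:
            raise PermissionError("identity disclosure requires due process")
        return self._identities[pseudonym]
```

Note that the reverse query is prevented structurally (no index by agent), not merely by policy, which matches the requirement that the answer be impossible to obtain rather than simply withheld.&lt;br /&gt;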
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework works under the core principle that &amp;quot;an act cannot use network resources, nor can it be routed, if it is bound anonymously&amp;quot;. First, we define some terminology:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt;: the human-device pairing that sits on an end system, transmitting and receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt;: a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an access-capable device; for a device like a Network Interface Card, that feature would be the MAC address. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by that device. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of granting intermediate systems permission to provide service (mainly routing) to all packets launched by the agent requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt;: within the scope of this section, a machine/device is anything with access capabilities: a PDA, a laptop or notebook, a PC, a NIC, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Distributed DB&amp;lt;/i&amp;gt;: the DNS-like, world-wide distributed database that stores identification stamps (see the Assumptions section).&lt;br /&gt;
&lt;br /&gt;
In principle, every hopping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g. the client side of applications supporting HTTP, FTP, SIP, RTP, VoIP, etc. Indirect responsibility arises when a user is running a background system that performs external (over-the-internet) calls (such as global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g. NTP, or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g. TCP connection-initiation and handshaking packets, or ICMP packets that probe the status of a specific host.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, not any other &amp;quot;locally&amp;quot; defined networks under the IEEE standard topology definitions (PAN, LAN, MAN, or WAN) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of a globally trusted entity (or entities), e.g. a government. This entity may be either centralized or distributed. A centralized entity would be easier to deploy but would suffer from a single point of failure. A distributed entity would perform better, since it could scale with the growth of the user base as well as conform to diverse regional laws, regulations, customs and traditions; however, a standard protocol would be required to define the syntax and semantics, as well as the manner in which these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed, acting as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect this system, since it is accessed by only two types of users: routers, which may access the database ONLY for read operations, and the trusted entity defined in the previous assumption, which may access it ONLY for write operations. Both must be strictly authenticated before being able to decrypt the contents or to append. In addition, this distributed system must guarantee near-zero read latency, as it will be consulted for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
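The asymmetric access pattern assumed here (routers read-only, trusted entity write-only) can be sketched with two capability objects. The class and function names are our own illustrative assumptions, and an in-memory set stands in for the distributed database.&lt;br /&gt;

```python
class StampDatabase:
    """In-memory stand-in for the DNS-like distributed stamp store."""
    def __init__(self):
        self._stamps = set()

    def reader(self):
        # Capability handed to routers: lookups only, no mutation.
        return lambda stamp: stamp in self._stamps

    def writer(self):
        # Capability handed to the trusted entity: append-only writes.
        return self._stamps.add

db = StampDatabase()
append_stamp = db.writer()   # held only by the trusted entity
is_licensed = db.reader()    # held by every router

append_stamp(b"stamp-1")     # the trusted entity licenses an agent
```

Handing each party only the capability it needs approximates the read-only/write-only split without modeling authentication or encryption.&lt;br /&gt;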
&lt;br /&gt;
Thirdly, we assume that a person can officially own multiple machines.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the identification stamp of the packet owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A fake identification stamp is defined as one of the following:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary device.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Notably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packet that is not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# First, access devices must be licensed by the trusted entity.&lt;br /&gt;
#* If a device is not licensed, it cannot benefit from global routing services.&lt;br /&gt;
# Licensing binds a human&#039;s unique feature with a machine&#039;s unique feature:&lt;br /&gt;
#* Human unique feature: the intricate structure of the iris.&lt;br /&gt;
#* Machine unique feature: the MAC address.&lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
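At each router, the methodology above reduces to a simple guard before forwarding. This sketch assumes a plain set of licensed stamps standing in for a read-only view of the distributed database; the function and field names are illustrative.&lt;br /&gt;

```python
def should_forward(packet: dict, licensed_stamps: set) -> bool:
    """Stall any packet that is unattributed or carries an unlicensed stamp."""
    stamp = packet.get("stamp")
    if stamp is None:
        return False   # unattributed: refuse routing service
    if stamp not in licensed_stamps:
        return False   # forged or unlicensed stamp: refuse as well
    return True        # fully attributed: route normally
```

All three kinds of fake stamp listed earlier fail the same membership check, since an imaginary device, an imaginary human, or a misleading binding all produce a stamp that was never licensed into the database.&lt;br /&gt;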
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers, which must consult the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors across regions.&lt;br /&gt;
* Custom (client-based) content generation is no longer available.&lt;br /&gt;
* Public PCs (in labs, etc.): to whom are they bound?&lt;br /&gt;
* Requires full awareness by users of what their systems send.&lt;br /&gt;
&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution is achieved.&lt;br /&gt;
* Attacks such as DoS and DDoS are avoided.&lt;br /&gt;
* Attribution information is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy is retained.&lt;br /&gt;
&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system itself, which would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists change at first sight. But consider cars: they too started without any licensing requirement, and licensing systems were introduced afterwards; people got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9236</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9236"/>
		<updated>2011-04-11T01:44:49Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Requirements for internet attribution system */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems; arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently the internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for those same users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;cyber crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor apply in most situations; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In this light, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent, where an agent is an entity with the ability to commit what constitutes an act; within our focus, an agent is either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker&#039;s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act: for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers users and grants them LICENSED access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for the sake of tracking authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field, but rather address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution, namely the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 examines the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution are defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the internet, mainly due to the increased number of cyber attacks since its introduction in the 90s. Many attackers have succeeded in causing both physical and financial damage to companies over the internet and have gone scot-free; thanks to the anonymity of the internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, fully understanding some of the concepts and terminology within it requires some knowledge of computer science or computer systems. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, though mostly in identification issues pertaining to websites or internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files created by a web server and stored by the web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping-cart contents, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it takes this as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified and even deleted completely. A user can also configure the browser not to accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not carry an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed, and some browsers let you set how long cookies are stored.&lt;br /&gt;
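The round trip described above can be sketched end-to-end. The header names are the real HTTP ones (Cookie and Set-Cookie), but the browser and server here are bare Python functions and dictionaries, not real implementations.&lt;br /&gt;

```python
import itertools

_session_ids = itertools.count(1)  # server-side session counter

def server_respond(request_headers: dict) -> dict:
    """Issue Set-Cookie on a first visit; otherwise recognize the cookie."""
    if "Cookie" not in request_headers:
        # No cookie attached: treat as the browser's first access.
        return {"Set-Cookie": f"session={next(_session_ids)}"}
    return {}  # returning visitor, nothing new to set

def browser_visit(jar: dict) -> dict:
    """Send the stored cookie (if any) and store any Set-Cookie received."""
    request = {"Cookie": jar["cookie"]} if "cookie" in jar else {}
    response = server_respond(request)
    if "Set-Cookie" in response:
        jar["cookie"] = response["Set-Cookie"]
    return response
```

The jar dictionary plays the role of the browser&#039;s cookie store; deleting its entry models the user clearing cookies, which is exactly the manipulability discussed below.&lt;br /&gt;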
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
If we consider cookies as the kind of internet-wide attribution system we are looking for, they can identify computers that access a web server with high precision. However, their biggest drawback is that they can be deleted and manipulated; as such, cookies alone are not an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does so with the help of five Regional Internet Registries (RIRs), which allocate IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users. &lt;br /&gt;
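The 32-bit structure and block-based allocation can be seen with Python&#039;s standard ipaddress module; the specific address and prefix below are arbitrary documentation values, not tied to any real allocation.&lt;br /&gt;

```python
import ipaddress

# An IPv4 address is a 32-bit number, usually written as four octets.
addr = ipaddress.ip_address("192.0.2.17")
assert int(addr) == 0xC0000211  # 192*2**24 + 2*2**8 + 17

# Registries allocate addresses in blocks (CIDR prefixes); an ISP might
# receive such a block and hand individual addresses to its users.
block = ipaddress.ip_network("192.0.2.0/24")
assert addr in block            # the block contains the address
assert block.num_addresses == 256
```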
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must confront the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions and their clients want a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because no 100% effective attribution system is in place, so they can distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. Clearly, no single universal set of rules can satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system one must not merely fix this ratio for some particular case, but rather make it dynamically adjustable depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio can be found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think this question lies slightly outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before design begins, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to balance attribution against privacy. &lt;br /&gt;
**Sometimes non-attribution is crucial, to protect political dissidents and whistle-blowers. &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP-traceback, stepping stone authentications, link identifications and packet filtering in wedging packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem we see is that the internet&#039;s design makes it possible, and relatively easy, to act without compromising one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the types of sender and receiver, different attribution policies will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, locate the geographical position of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which makes the inferred geographical location misleading.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems, and that cooperation is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so inferring the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows malware distribution and hence the creation of botnets: misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% efficient.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of keeping an attack anonymous by routing it through multiple random public hosts (stepping stones) on the way to the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP &#039;95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
It is hard to describe a hypothetical attribution system in detail, because there are many issues and complicated dependencies, and many questions to answer, or at least attempt to answer, before one can even think of implementing such a system. In this section we define high-level requirements for a good attribution system; although the definition of a good attribution system is not entirely clear, we take into account everything discussed above. That is, the following requirements try to define the system in a way that avoids current problems, yet remains realistic. &lt;br /&gt;
&lt;br /&gt;
We have separated these requirements into three sections: general requirements define the idea and overall goal of the system in high-level, abstract terms; deployment requirements set ground rules for deployability that make sense in a network as huge as the internet; and practice requirements define the way the system works, behaves and interacts with other bodies.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than new information, and we shall not avoid this notion either: the main requirement for an internet attribution system is that it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day some person or group of persons, human beings, is responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a world in which such a system eliminates most crime and misuse, and many writers and film directors have exploited this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global, and details of these terms will be added later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to just design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will carry a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This matters not only because it is virtually impossible to restart or reconfigure the whole internet at once; incremental deployment is also more secure (bugs in software and mistakes in design can be fixed while the scale is still small), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns this line between dynamic adaptability and universality. Luckily, that depth goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer; it should not be possible to know the answer. The question to be answered is &amp;quot;who did act X?&amp;quot;. This could be considered part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way: for surveillance, spying, etc.&lt;br /&gt;
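&lt;br /&gt;
This one-way mapping can be sketched in a few lines of Python (all names here are hypothetical illustrations, not part of any proposed implementation): the store answers who performed a given act, but exposes no interface for listing the acts of a given agent.&lt;br /&gt;

```python
# Hypothetical sketch: an attribution store that can answer "who did act X"
# but deliberately keeps no index for "what did person Y do".
class AttributionStore:
    def __init__(self):
        # Maps an act identifier to the responsible agent's opaque ID.
        self._act_to_agent = {}

    def record(self, act_id, agent_id):
        self._act_to_agent[act_id] = agent_id

    def who_did(self, act_id):
        # The only supported query: act -> agent.
        return self._act_to_agent.get(act_id)

    # Note: no method enumerates acts by agent, and the internal dict is
    # never exposed, so the reverse question cannot be answered through
    # the store's interface.

store = AttributionStore()
store.record("packet-123", "agent-42")
print(store.who_did("packet-123"))  # agent-42
```

Since the reverse index is simply never built, the question of what a given agent did cannot be answered through the store&#039;s interface.&lt;br /&gt;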
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put the names of persons into the traceability database. It makes much more sense to store a unique ID for each body using the network; in case a crime is committed or, more generally, whenever the agent of some act must be determined, the recorded ID is looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good institution, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
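&lt;br /&gt;
A minimal sketch of this ID escrow, assuming the trusted entity is modeled as a single object and a boolean warrant stands in for legal authorization (both are simplifications for illustration only):&lt;br /&gt;

```python
# Illustrative sketch (names are hypothetical): traceability records hold only
# opaque IDs; a separate trusted entity holds the ID -> real-name mapping and
# reveals it only when an authorization check passes.
class TrustedEntity:
    def __init__(self):
        self._id_to_name = {}

    def register(self, agent_id, real_name):
        self._id_to_name[agent_id] = real_name

    def reveal(self, agent_id, warrant):
        # The mapping is disclosed only given sufficient evidence/motivation,
        # modeled here as a boolean "warrant".
        if not warrant:
            raise PermissionError("no legal basis to reveal identity")
        return self._id_to_name[agent_id]

entity = TrustedEntity()
entity.register("agent-42", "Alice")
# Traceability databases elsewhere store only "agent-42", never "Alice".
print(entity.reveal("agent-42", warrant=True))  # Alice
```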
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in time of need while preventing either side from misusing the system.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework works under the core principle: &amp;quot;an act cannot use network resources nor can it be routed if it is anonymously bound&amp;quot;. First, we define some terminology:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt;: the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt;: a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device; for a device like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: a process of giving the permission to intermediate systems to provide service (mainly routing services) to all packets that are launched from the agent that is requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt;: within the scope of this section, a machine/device is anything with access capabilities. It can be a PDA, a laptop, a notebook, a PC, a NIC, or even a mere homemade chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
* Distributed DB:&lt;br /&gt;
&lt;br /&gt;
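The identification stamp defined above could, for example, be realized as a cryptographic hash over the digitized human identifier and the device feature. The encoding below is purely an assumption for illustration; the text only requires that the stamp uniquely bind a human to a device.&lt;br /&gt;

```python
import hashlib

# Hypothetical construction of an "identification stamp": a hash binding a
# digitized human identifier (e.g. a fingerprint template) to a device's MAC
# address. The exact encoding is an assumption made for this sketch.
def make_stamp(biometric_template: bytes, mac_address: str) -> str:
    material = biometric_template + mac_address.encode("ascii")
    return hashlib.sha256(material).hexdigest()

stamp = make_stamp(b"\x01\x02fingerprint-template", "00:1A:2B:3C:4D:5E")
print(len(stamp))  # 64 hex characters
```

The same human paired with a different device yields a different stamp, which is what makes the stamp an identifier for the agent (the human-device pairing) rather than for the human alone.&lt;br /&gt;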
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of an application supporting HTTP, FTP, SIP, RTP, VoIP, etc. Indirect responsibility arises when a user is running a background system that performs external (over the internet) system calls (e.g., global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g., NTP or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility covers all packets launched by lower-layer protocols manipulated by higher-layer ones, e.g., TCP connection-initiation and handshaking packets, or ICMP packets that aim to identify the status of a specific host.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks under the IEEE standard topology definitions (PAN, LAN, MAN or WAN) that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions required for this framework to operate, the methodology of its operation, and a list of its pros, cons and vulnerabilities, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from being a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and nature of the communication between these distributed sub-systems.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect the system, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
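&lt;br /&gt;
The assumed access discipline can be sketched as follows (roles and method names are illustrative only, not part of any deployed protocol): routers hold read-only access, the trusted entity write-only access, and any other combination is rejected.&lt;br /&gt;

```python
# Sketch of the assumed access discipline on the stamp database: routers may
# only read, the trusted entity may only write. Authentication and encryption
# are elided; the caller's role stands in for a strictly authenticated identity.
class StampDirectory:
    def __init__(self):
        self._stamps = {}

    def read(self, caller_role, stamp_id):
        if caller_role != "router":
            raise PermissionError("only routers may read")
        return self._stamps.get(stamp_id)

    def write(self, caller_role, stamp_id, record):
        if caller_role != "trusted_entity":
            raise PermissionError("only the trusted entity may write")
        self._stamps[stamp_id] = record

directory = StampDirectory()
directory.write("trusted_entity", "stamp-1", {"device": "00:1A:2B:3C:4D:5E"})
print(directory.read("router", "stamp-1"))
```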
&lt;br /&gt;
Third, we assume that a person can officially own multiple machines.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the format of IP packets, a header added by the network layer includes the identification stamp of the packet&#039;s owner. A packet&#039;s owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of a packet that is either unattributed or forged with a fake identification stamp. A fake identification stamp is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary device.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
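&lt;br /&gt;
The three forgery conditions above can be checked mechanically, as in this sketch (the registries of known devices, known humans and owner bindings are assumptions made for illustration):&lt;br /&gt;

```python
# Illustrative check of the three forgery conditions. The registry structures
# are assumptions for this sketch; the paper does not specify a representation.
def stamp_is_fake(stamp, known_devices, known_humans, owner_of):
    device_id, human_id = stamp  # a stamp modeled as a (device, human) pair
    if device_id not in known_devices:
        return True  # imaginary device
    if human_id not in known_humans:
        return True  # imaginary human
    if owner_of.get(device_id) != human_id:
        return True  # misleading binding: the human does not own the device
    return False

devices = {"mac-X"}
humans = {"human-Y", "human-Z"}
owners = {"mac-X": "human-Y"}
print(stamp_is_fake(("mac-X", "human-Y"), devices, humans, owners))  # False
print(stamp_is_fake(("mac-X", "human-Z"), devices, humans, owners))  # True
```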
&lt;br /&gt;
Noticeably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind the delivery of all packets, malicious or benign, they carry great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# First, access devices must be licensed by the trusted entity.&lt;br /&gt;
#* If not, they will not be able to benefit from global routing services. &lt;br /&gt;
# Licensing: binding a human&#039;s unique feature to a machine’s unique feature.&lt;br /&gt;
#* Human unique feature: the intricate structure of the iris &lt;br /&gt;
#* Machine unique feature: MAC address &lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Cons:&amp;lt;/b&amp;gt;&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulative flavors.&lt;br /&gt;
* Custom content generation (not found).&lt;br /&gt;
* Public PCs (in labs, etc.): bound to whom?&lt;br /&gt;
* Requires full awareness of users about their systems.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Pros:&amp;lt;/b&amp;gt;&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, ...).&lt;br /&gt;
* Attribution information is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Vulnerabilities:&amp;lt;/b&amp;gt;&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But, as with cars, which at first required no licensing, licensing systems were applied afterwards, and people got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9235</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9235"/>
		<updated>2011-04-11T01:38:43Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Practice */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, scientific basis for properly functioning attribution systems are not yet defined. Lots of research have been focusing on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Many of those were revolving around the notion of using machine learning for linking articles to humans. Others proposed text classification and feature selection as a mean of detecting the author of a document. Unfortunately, not that much research is addressing the problem of lack of robust attribution system over the internet. Authentication, as a mean of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable success level in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act: for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness over the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which has sadly received little attention from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution, the tension between attribution and privacy. Section 3 then argues the reasons behind the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. In section 5, we review currently implemented systems that achieve attribution, as well as the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we will refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify cyber attackers and people with malicious intent.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the internet, mainly due to the increasing number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the internet and have gone scot-free; due to the anonymity of the internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology within it. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s current attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping-cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain subject to the user&#039;s control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date, the date on which the browser deletes the cookie. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set how long cookies are stored.&lt;br /&gt;
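&lt;br /&gt;
The set-and-resend cycle described above can be demonstrated with Python&#039;s standard library (the cookie name and value are made up for illustration):&lt;br /&gt;

```python
from http.cookies import SimpleCookie

# Server side: the first response carries a Set-Cookie header, here with a
# one-hour lifetime set through the Max-Age attribute.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["max-age"] = 3600
set_cookie_header = server_cookie.output(header="Set-Cookie:")

# Client side: the browser stores the cookie and replays it in later requests.
browser_jar = SimpleCookie()
browser_jar.load("session_id=abc123")
print(set_cookie_header)
print(browser_jar["session_id"].value)  # abc123
```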
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Considering cookies as the type of attribution system we are looking for over the internet, they would allow high precision in identifying the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP or Internet Protocol addresses are 32-bit (in IPv4) numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs of their assigned regions, who in turn allocate them to their users. &lt;br /&gt;
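&lt;br /&gt;
The 32-bit structure and the block-allocation hierarchy can be illustrated with Python&#039;s standard library (the addresses below come from the documentation range 192.0.2.0/24 and do not reflect real allocations):&lt;br /&gt;

```python
import ipaddress

# An IPv4 address is just a 32-bit integer with a dotted-quad spelling.
addr = ipaddress.IPv4Address("192.0.2.1")
print(int(addr))  # 3221225985

# Allocation of address blocks (IANA -> RIR -> ISP -> user) can be modeled
# as successively narrower CIDR networks.
rir_block = ipaddress.ip_network("192.0.0.0/8")
isp_block = ipaddress.ip_network("192.0.2.0/24")
print(isp_block.subnet_of(rir_block))  # True
print(addr in isp_block)               # True
```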
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and/or infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and depends crucially on the application. For instance, large financial institutions, as well as their clients, are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy these two cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for some particular case, but to make the ratio dynamically adjustable depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think this question is slightly out of the scope of our paper. This is true; however, these and many less obviously related questions should be answered prior to design, because in something as important as protection and privacy, the design of a solution should not make too many assumptions, and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
** Sometimes non-attribution is crucial, to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP-traceback, stepping stone authentications, link identifications and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem, as I see it, is that the way the internet is designed makes it possible and relatively easy to act without compromising one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of lack of attribution on the web mostly arises whenever security is compromised: when you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policy will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This is done by examining the source IP printed on each moving packet, determining the geographical location of this IP, consulting the ISP covering that location and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt; and hence, misleads the geographical location.&lt;br /&gt;
* For avoiding that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed BUT it requires global cooperation of intermediate systems... it is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to personnel, so figuring out the person from the IP is not concrete.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; to their machines, which allows for malware distribution and hence, the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can be used for avoiding that problem, but they are not 100% efficient.&lt;br /&gt;
* It is not applicable to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
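To make the first failure mode concrete, here is a toy Python sketch of the idealized IP-to-person chain described above, and of how a single spoofed source address misdirects it. All addresses, records and names below are invented for illustration:&lt;br /&gt;

```python
# Toy "ISP records": which subscriber each address was assigned to.
# Entirely hypothetical data for illustration.
ISP_RECORDS = {
    "198.51.100.7": "alice",
    "203.0.113.42": "bob",
}

def attribute_by_source_ip(packet):
    """Attribute a packet to a person purely from its source IP field."""
    return ISP_RECORDS.get(packet.get("src_ip"))

# A packet really sent by an attacker, carrying a spoofed source IP:
spoofed = {"src_ip": "198.51.100.7", "payload": b"attack"}

# The naive chain blames whoever the spoofed address was assigned to.
print(attribute_by_source_ip(spoofed))  # blames "alice", not the attacker
```

The point of the sketch is that nothing in the packet itself lets the lookup distinguish a genuine source address from a spoofed one.&lt;br /&gt;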
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of anonymizing attacks by relaying through multiple public, randomly chosen hosts (stepping stones) on the way to the victim in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than useful information, and we shall not avoid this notion either: the main requirement for an internet attribution system is that it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for the act, regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly carried out by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and some body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept that at the end of the day there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time this source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which lawful behavior is the only acceptable way to do things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global; the details of these terms will be added later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to simply design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will be under a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once; incremental deployment is also more secure, since bugs in software and mistakes in design can be fixed while still at a small scale, so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns this line between dynamic adaptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.), or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, an action should map to a person, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible for it to know; &amp;quot;who did act X?&amp;quot; is the question to be answered. This could be considered part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement because it is very important to draw a line between an attribution system and surveillance. The goal is not only to make the attribution system attribute, but also to make it impossible to use it in any other way, such as surveillance or spying.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a good idea to put persons&#039; names into the traceability database. It makes much more sense to store unique IDs for every body who uses the network; in case a crime is committed, or, in general, whenever the agent of some act must be determined, the recorded ID is searched for in a police or government database. Some trusted entity (government, corporation, police, some public-good system, etc.) stores the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
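The split described above can be sketched as follows. This is a hypothetical illustration of the one-way design only; the class name, the random-token ID format and the evidence flag are our own choices, not part of any deployed system:&lt;br /&gt;

```python
import secrets

class TrustedEntity:
    """Holds the only copy of the ID -> real-name mapping."""

    def __init__(self):
        self._id_to_name = {}  # never leaves the trusted entity

    def register(self, real_name):
        """Issue an opaque ID; only the ID is released publicly."""
        uid = secrets.token_hex(8)
        self._id_to_name[uid] = real_name
        return uid

    def reveal(self, uid, evidence_sufficient):
        """Disclose the mapping only when there is enough evidence."""
        if not evidence_sufficient:
            return None
        return self._id_to_name.get(uid)

trace_log = []                      # public side: acts -> opaque IDs only
authority = TrustedEntity()
uid = authority.register("Jane Doe")
trace_log.append(("sent packet", uid))

print(authority.reveal(uid, evidence_sufficient=False))  # None
print(authority.reveal(uid, evidence_sufficient=True))   # Jane Doe
```

The public trace log answers &amp;quot;which ID did act X?&amp;quot;, while the name behind the ID stays with the trusted entity until disclosure is justified.&lt;br /&gt;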
&lt;br /&gt;
Of course, a body trusted by everyone does not always exist, but generally we have governments and/or agencies we trust. It is important to divide the information between the public and the trusted body in a way that allows them to cooperate in times of need while preventing misuse of the system by either side.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework operates under the core principle: &amp;quot;an act can neither use network resources nor be routed if its origin is anonymous&amp;quot;. We start by defining some terms:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt;: the human-device pairing that sits on an end system and transmits/receives packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt;: a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an access-capable device; for a device like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: the process of granting intermediate systems permission to provide service (mainly routing) to all packets launched by the agent requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt;: within the scope of this section, a machine/device is anything with access capabilities. It can be a PDA, a laptop, a notebook, a PC, a NIC, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* Distributed DB:&lt;br /&gt;
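As a minimal sketch of what an identification stamp could look like, assuming a simple hash-based binding of the two identifiers (the digest construction below is our own illustration, not part of the framework):&lt;br /&gt;

```python
import hashlib

def identification_stamp(biometric_id, mac_address):
    """Bind a human identifier and a device identifier into one opaque stamp.

    biometric_id: bytes derived from, e.g., an iris template (hypothetical).
    mac_address:  the device's MAC address as a string.
    """
    material = biometric_id + b"|" + mac_address.encode()
    return hashlib.sha256(material).hexdigest()

stamp = identification_stamp(b"iris-template-bytes", "00:1A:2B:3C:4D:5E")

# The same human-device pairing always yields the same stamp;
# changing either side of the binding changes it.
assert stamp == identification_stamp(b"iris-template-bytes", "00:1A:2B:3C:4D:5E")
assert stamp != identification_stamp(b"iris-template-bytes", "AA:BB:CC:DD:EE:FF")
```

A real scheme would need far more (key material, revocation, resistance to replay), but the sketch shows the basic idea of one identifier standing for an agent, i.e., a human-device pair.&lt;br /&gt;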
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP, VoIP, etc. The owner is indirectly responsible when running a system in the background that performs external (over the internet) system calls (e.g., global clock synchronization) or that is automated for periodic communication or automatic response to incoming requests, e.g., NTP, or the server side of protocols such as HTTP, FTP, etc. In addition, indirect responsibility also covers all packets launched by lower-layer protocols driven by higher-layer ones, e.g., TCP connection initiation and handshaking, or ICMP packets that seek to identify the status of a specific host.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework covers only attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks, under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and nature of the communication among these distributed sub-systems.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect this system, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Third, we assume that a person can officially own multiple machines.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the format of the IP packet, the network layer adds a header that includes the identification stamp of the packet&#039;s owner. A packet&#039;s owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A fake identification stamp is defined as:&lt;br /&gt;
* either having a false unique chip identifier that refers to an imaginary device,&lt;br /&gt;
* or having a false unique human identifier that refers to an imaginary human,&lt;br /&gt;
* or having a misleading binding of a human to a machine, i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot; when, in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Notably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind the delivery of all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# First, access devices must be licensed by the trusted entity. &lt;br /&gt;
** Otherwise, they cannot benefit from global routing services. &lt;br /&gt;
# Licensing binds a human&#039;s unique feature with a machine&#039;s unique feature: &lt;br /&gt;
** human unique feature: the intricate structure of the iris; &lt;br /&gt;
** machine unique feature: the MAC address. &lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
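The core routing rule above can be sketched as follows. The stamp database, packet layout and return values are invented for illustration; a real router would consult the distributed read-only database instead of a local set:&lt;br /&gt;

```python
# Read-only view of licensed identification stamps, standing in for
# the DNS-like distributed database assumed earlier (hypothetical).
LICENSED_STAMPS = {"stamp-of-alice-laptop"}

def route(packet):
    """Forward a packet only if it carries a licensed identification stamp."""
    stamp = packet.get("id_stamp")
    if stamp not in LICENSED_STAMPS:
        return "dropped"      # unattributed or forged stamp: stall it
    return "forwarded"

print(route({"id_stamp": "stamp-of-alice-laptop"}))  # forwarded
print(route({"id_stamp": "forged-stamp"}))           # dropped
print(route({}))                                     # dropped (unattributed)
```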
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Cons:&#039;&#039;&#039;&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulative flavors.&lt;br /&gt;
* Custom content generation (not found).&lt;br /&gt;
* Public PCs (in labs, etc.): bound to whom?&lt;br /&gt;
* Requires full awareness of users about their systems.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Pros:&#039;&#039;&#039;&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DDoS, DoS, etc.).&lt;br /&gt;
* Attribution is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Vulnerabilities:&#039;&#039;&#039;&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first, driving required no license; licensing systems were introduced afterwards, and people got used to them slowly, then thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9233</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9233"/>
		<updated>2011-04-11T01:09:35Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Deployment */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, a scientific basis for properly functioning attribution systems has not yet been defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that constitutes a cornerstone of internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable success level in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking into their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act; for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 examines the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and going scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology within it. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what each particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files that are created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping-cart information, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain subject to the user&#039;s control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to configure the browser not to accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date, the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers let you set automatically how long cookies are stored.&lt;br /&gt;
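The exchange described above can be sketched with Python&#039;s standard http.cookies module; the cookie name and value below are arbitrary examples:&lt;br /&gt;

```python
from http.cookies import SimpleCookie

# Server side: the first request carried no cookie, so issue one
# via a Set-Cookie header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
set_cookie_header = server_cookie["session_id"].OutputString()

# Browser side: store the cookie and resend it with the next request.
browser_jar = SimpleCookie()
browser_jar.load(set_cookie_header)
next_request_header = browser_jar.output(header="Cookie:", sep="; ")

print(set_cookie_header)     # session_id=abc123
print(next_request_header)   # Cookie: session_id=abc123
```

This is exactly the automated round trip the browser and server perform on every page request once the cookie has been set.&lt;br /&gt;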
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Judging cookies as the type of attribution system we are looking for over the Internet, they do achieve high precision in identifying computers that access a web server. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies are not an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit (in IPv4) numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and/or infrastructure available, one must consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions, as well as their clients, are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so they can distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in rather abstract terms, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for a particular case, but to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, thereby directly intruding on their privacy. One might think this question is somewhat out of the scope of our paper. That is true; however, these and many less obviously related questions should be answered prior to designing, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers. &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of lack of attribution on the web mostly arises whenever security is compromised: when you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP stamped on each moving packet, determining the geographic location of that IP, consulting the ISP covering the location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, and hence mislead the geographic location.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so inferring the person from an IP is not reliable.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can be used to mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
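The naive chain just described can be sketched as a lookup pipeline. The sketch below is purely illustrative: the registry dictionaries and the "attribute_packet" function are hypothetical stand-ins of our own, meant only to show where each step can be misled.&lt;br /&gt;

```python
# Hypothetical sketch of the naive IP-to-person attribution chain,
# showing where each lookup step can fail (spoofing, dynamic IPs, botnets).
# All registry names and data here are illustrative, not a real API.

GEO_REGISTRY = {"203.0.113.7": "Ottawa, CA"}        # IP to location (spoofable)
ISP_REGISTRY = {"Ottawa, CA": "ExampleNet"}         # location to covering ISP
ISP_CUSTOMERS = {("ExampleNet", "203.0.113.7"): "subscriber-42"}  # dynamic binding

def attribute_packet(source_ip):
    """Walk the chain: packet, location, ISP, person. Any hop may mislead."""
    location = GEO_REGISTRY.get(source_ip)          # wrong if the IP is spoofed
    if location is None:
        return None
    isp = ISP_REGISTRY.get(location)
    if isp is None:
        return None
    # Even a successful lookup names the subscriber, not necessarily the
    # human who acted (shared machines, malware, botnets).
    return ISP_CUSTOMERS.get((isp, source_ip))

print(attribute_packet("203.0.113.7"))   # subscriber-42 (if nothing lied)
print(attribute_packet("198.51.100.9"))  # None: unknown or spoofed source
```

Each early return corresponds to one of the failure modes in the list above.&lt;br /&gt;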
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of anonymizing attacks by using multiple public, arbitrary agents (as stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than useful information, and we shall not avoid this notion either: the main requirement for an internet attribution system is that it needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, human beings, responsible for an action. This distinction is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable way of doing things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are imperfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global; the details of these terms will be added later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is relatively easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will have a lot of pressure on it, the internet should not depend on it entirely: should the attribution system go down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once; incremental embedding of an attribution system should also be more secure (bugs in software and mistakes in design can be fixed while still on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times by different bodies. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it illegally. The big decision designers will have to make concerns this line between dynamic adaptability and universality. Luckily, this level of detail goes beyond the scope of our paper.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes committed against them, and some issues can be dealt with by spending more resources (memory, bandwidth on servers, etc.), or, in other words, more money. The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g. DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not reveal that answer, it should be impossible to obtain it; the question &amp;quot;who did act X?&amp;quot; is the only one that should be answerable. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance. &lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put persons&#039; names into the traceability database. It makes much more sense to store some unique ID for any body who uses the network; in case a crime is committed, or, in general, whenever the agent of some act must be determined, the recorded ID will be searched for in a police or government database. Some trusted entity (a government, corporation, police force, public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
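The one-way lookup described above, together with the non-bijection requirement, can be sketched as follows. All names, IDs and the "who_did" function are hypothetical, a minimal illustration rather than a proposed implementation.&lt;br /&gt;

```python
# Hypothetical sketch: acts map to pseudonymous IDs, and only a trusted
# entity can resolve an ID to a real name, given sufficient cause.
# All identifiers here are illustrative, not a real system.

ACT_LOG = {"act-9281": "uid-f3a1"}       # distributed traceability records
ID_TO_NAME = {"uid-f3a1": "Jane Doe"}    # held ONLY by the trusted entity

def who_did(act_id, warrant=False):
    """Answer 'who did act X', and only that direction of the mapping."""
    uid = ACT_LOG.get(act_id)
    if uid is None:
        return None
    if not warrant:
        return uid                   # pseudonym only: no warrant, no name
    return ID_TO_NAME.get(uid)       # trusted entity resolves with cause

# Deliberately absent: any acts_of(person) query, so the mapping is not
# a bijection and the system cannot be turned into surveillance.

print(who_did("act-9281"))                # uid-f3a1
print(who_did("act-9281", warrant=True))  # Jane Doe
```

The missing reverse query is the design point: the data structure itself should make &amp;quot;what did person X do?&amp;quot; unanswerable.&lt;br /&gt;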
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework works under the core principle: &amp;quot;An act cannot use network resources, nor can it be routed, if it is not bound to an identity&amp;quot;. First, we define some terminology:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt;: the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt;: a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an access-capable device; for a device like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by his device. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: a process of giving the permission to intermediate systems to provide service (mainly routing services) to all packets that are launched from the agent that is requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt;: within the scope of this section, a machine/device is anything with access capabilities. It can be a PDA, a laptop, a notebook, a PC, a NIC, or even a mere home-made chip that can communicate externally, wired or wireless, to send or receive digital packets.&lt;br /&gt;
* Distributed DB:&lt;br /&gt;
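To make the identification-stamp definition concrete, here is a minimal sketch. The hash-based encoding, SHA-256, and the field layout are our own illustrative assumptions; the framework above does not specify an encoding.&lt;br /&gt;

```python
# Hypothetical sketch of an "identification stamp": a hash binding a
# human's biometric digest to a device's MAC address. SHA-256 and the
# separator layout are our own illustrative choices, not the paper's.
import hashlib

def make_stamp(biometric_digest, mac_address):
    """Bind human and device identifiers into one agent identifier."""
    material = biometric_digest + "|" + mac_address.lower()
    return hashlib.sha256(material.encode("ascii")).hexdigest()

stamp = make_stamp("iris:4f2c9d", "00:1a:2b:3c:4d:5e")
print(len(stamp))  # 64 hex characters: a fixed-size series of bits
```

Hashing keeps the stamp a fixed-size series of bits regardless of the lengths of the underlying identifiers, and the same human-device pair always yields the same stamp.&lt;br /&gt;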
&lt;br /&gt;
In principle, every leaping packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., using the client side of applications supporting protocols such as HTTP, FTP, SIP, RTP or VoIP. Indirect responsibility arises when a user is running a system in the background that performs external (over the internet) system calls (e.g., global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g., NTP, or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility also covers all packets launched by lower-layer protocols being manipulated by higher-layer ones, e.g., TCP connection initiation and handshaking packets, or ICMP packets that aim to identify the status of a specific host.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only covers attribution over the internet, and not other &amp;quot;locally&amp;quot; defined networks, under the IEEE standard definitions of the PAN, LAN, MAN or WAN topologies, that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions required for this framework to operate, the methodology of its operation, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it would scale with the growth of the system&#039;s users as well as conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and nature of the way these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric key encryption should be used to protect the system, as it will be accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before being able to decrypt the contents or to append. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be relied on heavily for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Thirdly, we assume that a person can officially own multiple machines.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the frame format of IP packets, a header is added at the network layer that includes the identification stamp of the packet owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A fake identification stamp is defined as:&lt;br /&gt;
* Either having a false unique chip identifier that refers to an imaginary device.&lt;br /&gt;
* Or having a false unique human identifier that refers to an imaginary human.&lt;br /&gt;
* Or having a misleading binding of a human to a machine. i.e., claiming that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot;, but in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
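A router-side check for the three fake-stamp cases above can be sketched as follows. The registry structures are hypothetical names of our own, standing in for the DNS-like distributed database the framework assumes.&lt;br /&gt;

```python
# Hypothetical router-side check for the three "fake stamp" cases above.
# KNOWN_DEVICES, KNOWN_HUMANS and OWNERSHIP stand in for the DNS-like
# distributed database; all data here is illustrative.

KNOWN_DEVICES = {"00:1a:2b:3c:4d:5e"}
KNOWN_HUMANS = {"iris:4f2c9d"}
OWNERSHIP = {"00:1a:2b:3c:4d:5e": "iris:4f2c9d"}  # device, registered owner

def validate_stamp(human_id, device_id):
    """Return a reason to drop the packet, or None if fully attributed."""
    if device_id not in KNOWN_DEVICES:
        return "imaginary device"      # false unique chip identifier
    if human_id not in KNOWN_HUMANS:
        return "imaginary human"       # false unique human identifier
    if OWNERSHIP.get(device_id) != human_id:
        return "misleading binding"    # Y claims to own X but does not
    return None  # attributed: the router may forward the packet

print(validate_stamp("iris:4f2c9d", "00:1a:2b:3c:4d:5e"))  # None
print(validate_stamp("iris:4f2c9d", "de:ad:be:ef:00:00"))  # imaginary device
```

Each branch corresponds to one bullet in the definition of a fake identification stamp; only a packet passing all three checks would be routed.&lt;br /&gt;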
&lt;br /&gt;
Notably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they bear a great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# First, access devices must be licensed by the trusted entity. &lt;br /&gt;
** If a device is not licensed, it cannot benefit from global routing services. &lt;br /&gt;
# Licensing: binding a human&#039;s unique feature with a machine&#039;s unique feature. &lt;br /&gt;
** Human unique feature: the intricate structure of the iris. &lt;br /&gt;
** Machine unique feature: the MAC address. &lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
- delays and bottlenecks at the routers, due to consulting the distributed licensing system.&lt;br /&gt;
- restrictive assumptions (not easily deployable)&lt;br /&gt;
- different regulative flavors&lt;br /&gt;
- custom content generation (not found)&lt;br /&gt;
- public PCs (in labs...): bound to whom?&lt;br /&gt;
- requires full awareness by users of their systems&lt;br /&gt;
Pros:&lt;br /&gt;
+ attribution&lt;br /&gt;
+ attack avoidance (DoS, DDoS, ...)&lt;br /&gt;
+ attribution not available to just anyone&lt;br /&gt;
+ automated: services are either stopped or continued.&lt;br /&gt;
+ privacy&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
V botnets&lt;br /&gt;
V an attack on the distributed system itself would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first they required no licensing, and licensing systems were applied afterwards; people got used to them slowly, then thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9230</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9230"/>
		<updated>2011-04-11T00:51:20Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* General */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, a scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Much of it revolves around the notion of using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include: fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most cases; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable success level in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior as well as remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution, the tension between attribution and privacy. Section 3 then argues the reasons behind the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet, and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, as well as the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may involve intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we will refer to &amp;quot;binding an act to a person on the internet&amp;quot; simply as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90&#039;s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and going scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some basic knowledge of computer science or computer systems will be required in order to fully understand some of the concepts and terminology within it. &lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly just to the extent of what a particular system aims to achieve. &lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many reasons, mainly authentication, remembering shopping cart contents, and storing site preferences; in actuality, they can store any type of information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All this is an automated process between the web browser and the web server.  &lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on the next request. Cookies are usually encrypted for data security and information privacy; however, they remain subject to the user&#039;s control, as they can be decrypted, modified and even deleted completely. It is also possible for a user to change their browser settings to not accept cookies at all. &lt;br /&gt;
&lt;br /&gt;
A cookie may or may not have an expiration date, the date on which the browser deletes it; cookies without an expiration date are deleted when the browser is closed. Some browsers let you set how long cookies are stored.&lt;br /&gt;
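The set-and-return cycle described above can be demonstrated with Python&#039;s standard http.cookies module; the sketch works on header strings only, with no real network or browser involved.&lt;br /&gt;

```python
# A minimal sketch of the cookie round trip described above, using
# Python's standard http.cookies module (header strings only).
from http.cookies import SimpleCookie

# Server: a first request arrives with no cookie, so the server mints one
# with an expiration (Max-Age) and sends it in the response headers.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
response_cookie["session_id"]["max-age"] = 3600  # deleted after an hour
set_header = response_cookie.output(header="Set-Cookie:")
print(set_header)  # Set-Cookie: session_id=abc123; Max-Age=3600

# Browser: stores the cookie and replays it in the header of the
# next request to the same server.
request_cookie = SimpleCookie()
request_cookie.load("session_id=abc123")
print(request_cookie["session_id"].value)  # abc123
```

The same mechanism is what makes cookies attractive for attribution, and the fact that this round trip is entirely under the browser&#039;s (i.e., the user&#039;s) control is exactly the drawback discussed next.&lt;br /&gt;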
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewing cookies as the type of internet-wide attribution system we are looking for, we would be able to identify, with high precision, computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system. &lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e. computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, who in turn allocate them to their users. &lt;br /&gt;
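The &amp;quot;32-bit numerical identifier&amp;quot; above can be made concrete with Python&#039;s standard ipaddress module: the familiar dotted-quad notation is just a rendering of a single 32-bit integer.&lt;br /&gt;

```python
# An IPv4 address is one 32-bit integer; dotted-quad notation is only
# a human-friendly rendering of it (192.0.2.1 is a documentation address).
import ipaddress

addr = ipaddress.IPv4Address("192.0.2.1")
print(int(addr))                # 3221225985
print(int(addr).bit_length())   # 32: the value fits in 32 bits
```

This is why IPv4 space is finite (at most 2 to the power 32 addresses) and why registries such as IANA and the RIRs must ration its allocation.&lt;br /&gt;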
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and/or infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. The hypothetical line between attribution and privacy is not straight, and depends crucially on the application. For instance, large financial institutions and their clients are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because no 100% effective attribution system is in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but rather to let the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think this question is slightly out of the scope of our paper. This is true; however, these and many less obviously related questions should be answered prior to design, because in something as important as protection and privacy, the design of a solution should not make too many assumptions and should guarantee something not only to the operators of the system, but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is itself crucial, e.g., to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude on privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute whom/what, and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification and packet filtering to tie packets to agents?&lt;br /&gt;
* How much can the cooperation of intermediate systems contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action to an agent? What are they (punishment, reward, etc.)?&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identification purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
* For marketing purposes (privacy?)&lt;br /&gt;
** Custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the design of the Internet makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or merely deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises whenever security is compromised. When you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, determine the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which yields a misleading geographical location.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation among intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to people, so inferring the person from the IP is not reliable.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of keeping attacks anonymous, in which multiple public, arbitrary hosts (stepping stones) are used to reach the victim in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of completeness rather than information, and we shall not avoid it either: the main requirement for an internet attribution system is that it needs to attribute; more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure; it might be one person after all. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (person or group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that at the end of the day there is some person, or several persons, human beings, responsible for an action. This is essential because, as practice shows, determining the source of a DoS attack, for example, is relatively simple, but most of the time that source is not the responsible party but rather a victim itself.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is the only acceptable outcome, and many writers and film directors exploit this idea in futuristic, science-fiction, and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not possible, because many laws and moral principles are already in place; some are not perfect, but most are widely accepted and have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
In general, an attribution system should be universal and global; the details of these terms will be given later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to simply design a system; it is much harder to design one whose deployment need not be instant and massive. Even though a global attribution system will bear a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works within.&lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once; incremental deployment is also more secure (bugs in software and mistakes in design can be fixed on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times.&lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool that any group can use, but nobody should be able to misuse it or use it illegally.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes committed against them, and some issues can only be dealt with by spending more resources (memory, server bandwidth, etc.). The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know that answer, it should be impossible to extract it from the system; &amp;quot;who did act X?&amp;quot; is the only question it should answer. This could be considered part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a good idea to put persons&#039; names into the traceability database. It makes much more sense to store a unique ID for any body that uses the network; when a crime is committed, or, in general, whenever the agent of some act must be determined, the recorded ID can be looked up in a police or government database. Some trusted entity (a government, corporation, police force, some public-good-like system, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
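The one-way mapping described above could be sketched as follows. This is a minimal Python illustration; the TrustedRegistry and TraceLog classes and all identifiers are hypothetical, invented for this sketch.&lt;br /&gt;

```python
import secrets


class TrustedRegistry:
    """Held only by the trusted entity: maps opaque IDs to real names.

    The traceability log below never sees this mapping; it stores IDs only.
    """

    def __init__(self):
        self._id_to_name = {}

    def register(self, real_name):
        # An opaque, random ID reveals nothing about the person.
        uid = secrets.token_hex(16)
        self._id_to_name[uid] = real_name
        return uid

    def reveal(self, uid, has_legal_warrant):
        # The ID-to-name mapping is disclosed only with sufficient cause.
        if not has_legal_warrant:
            raise PermissionError("mapping may only be revealed with cause")
        return self._id_to_name[uid]


class TraceLog:
    """Distributed traceability store: answers "who did act X?" with an
    opaque ID, but deliberately cannot answer "what did person Y do?"."""

    def __init__(self):
        self._act_to_id = {}

    def record(self, act, uid):
        self._act_to_id[act] = uid

    def attribute(self, act):
        return self._act_to_id.get(act)


registry = TrustedRegistry()
log = TraceLog()
alice_id = registry.register("Alice")
log.record("act-42", alice_id)

# Attribution maps the act to an opaque ID...
uid = log.attribute("act-42")
# ...and only the trusted entity, with cause, maps the ID to a person.
name = registry.reveal(uid, has_legal_warrant=True)
```

Note that the log is keyed by acts, not by persons: enumerating everything one ID did would require scanning data that, by the distribution requirement, is never collected in one place.&lt;br /&gt;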
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework operates under the core principle that &amp;quot;an act can neither use network resources nor be routed if it is anonymously bound&amp;quot;. We start by defining some terminology:&lt;br /&gt;
* &amp;lt;i&amp;gt;Agent&amp;lt;/i&amp;gt;: the human-device pairing that sits on an end system and keeps transmitting/receiving packets.&lt;br /&gt;
* &amp;lt;i&amp;gt;Identification Stamp&amp;lt;/i&amp;gt;: a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) with a unique feature of an access-capable device; for a device like a Network Interface Card, the MAC address would be that feature. This binding represents the official owner of the device, who is deemed primarily responsible for any outgoing packet launched by the device he owns. In other words, it is a unique identifier for an agent.&lt;br /&gt;
* &amp;lt;i&amp;gt;Licensing&amp;lt;/i&amp;gt;: a process of granting intermediate systems permission to provide service (mainly routing) to all packets launched by the agent requesting the license.&lt;br /&gt;
* &amp;lt;i&amp;gt;Machines/Devices&amp;lt;/i&amp;gt;: within the scope of this section, a machine/device is anything with network access capabilities: a PDA, a laptop, a notebook, a PC, a NIC, or even a mere home-made chip that can communicate externally, wired or wirelessly, to send or receive digital packets.&lt;br /&gt;
* Distributed DB:&lt;br /&gt;
&lt;br /&gt;
In principle, every travelling packet has a human owner who is either directly or indirectly responsible for it. The owner is directly responsible when he is running an application that sends requests or initiates communication sessions with another end system, e.g., the client side of applications supporting HTTP, FTP, SIP, RTP, VoIP, etc. He is indirectly responsible when he is running a background system that performs external (over-the-internet) system calls (e.g., global clock synchronization) or is automated for periodic communication or automatic response to incoming requests, e.g., NTP, or the server side of protocols such as HTTP and FTP. In addition, indirect responsibility also covers all packets launched by lower-layer protocols on behalf of higher-layer ones, e.g., TCP connection-initiation and handshaking packets, or ICMP packets that seek to identify the status of a specific host.&lt;br /&gt;
&lt;br /&gt;
The scope of this framework only addresses attribution over the internet, and not &amp;quot;locally&amp;quot; defined networks under the IEEE standard definitions of the PAN, LAN, MAN, or WAN topologies that do not use the global intermediate systems as their underlying infrastructure for packet delivery.&lt;br /&gt;
&lt;br /&gt;
The following sections present the assumptions required for this framework to operate, the methodology of its operation, and a list of its pros, cons, and vulnerabilities, and wrap up with a discussion of the tradeoff between privacy and attribution in the proposed framework.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of a globally trusted entity or entities (e.g., a government). This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would obviously perform better, as it could scale with the growth of the system&#039;s users and conform to diverse regional laws, regulations, customs, and traditions. However, a standard protocol would be required to define the syntax and semantics, as well as the manner, in which these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing identification stamps. Symmetric-key encryption should be used to protect it, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
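Under these assumptions, the read/write separation could be sketched like this. This is a toy Python model: shared-secret strings stand in for strict authentication and symmetric-key encryption, and all names are invented for the sketch.&lt;br /&gt;

```python
class StampDatabase:
    """Toy model of the DNS-like distributed store of identification stamps.

    Two roles only: the trusted entity may write, authenticated routers
    may read. A shared-secret key check stands in for real authentication
    and encryption.
    """

    def __init__(self, writer_key, reader_key):
        self._writer_key = writer_key
        self._reader_key = reader_key
        self._stamps = set()

    def append(self, key, stamp):
        # Write path: reserved for the trusted entity.
        if key != self._writer_key:
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)

    def lookup(self, key, stamp):
        # Read path: reserved for routers. In the real system this must be
        # near-zero latency, since it is consulted on every hop.
        if key != self._reader_key:
            raise PermissionError("only authenticated routers may read")
        return stamp in self._stamps


db = StampDatabase(writer_key="entity-secret", reader_key="router-secret")
db.append("entity-secret", "stamp-1")
```

A router holding only the reader key can check whether a stamp is licensed but can never insert a forged one; the trusted entity can insert stamps but has no read path to mine the store.&lt;br /&gt;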
&lt;br /&gt;
Third, we assume that a person can officially own multiple machines.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the frame format of IP packets, the network layer adds a header that includes the identification stamp of the packet&#039;s owner. A packet&#039;s owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Basically, this framework works by stalling the propagation of any packet that is either unattributed or forged with a fake identification stamp. A fake identification stamp is one that:&lt;br /&gt;
* has a false unique chip identifier that refers to an imaginary device;&lt;br /&gt;
* has a false unique human identifier that refers to an imaginary human;&lt;br /&gt;
* or misleadingly binds a human to a machine, i.e., claims that some machine &amp;quot;X&amp;quot; belongs to some human &amp;quot;Y&amp;quot; when, in reality, &amp;quot;Y&amp;quot; is not the owner of &amp;quot;X&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Noticeably, the routers, as primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
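The router-side rule could be sketched as follows. This is a toy Python model: the names are hypothetical, and a simple set stands in for the distributed stamp database.&lt;br /&gt;

```python
# A set stands in for the distributed database of licensed stamps.
KNOWN_STAMPS = {"stamp-of-alice-and-mac-a"}


def route(packet):
    """Forward a packet only if it is fully attributed.

    Returns True when the packet carries a stamp known to the database,
    False when it is unattributed or carries a fake stamp (an unknown
    device, an unknown human, or a misleading human-to-machine binding).
    """
    stamp = packet.get("identification_stamp")
    if stamp is None:
        return False  # unattributed packet: stall it
    if stamp not in KNOWN_STAMPS:
        return False  # fake stamp: stall it
    return True       # fully attributed: route normally


forwarded = route({"identification_stamp": "stamp-of-alice-and-mac-a"})
stalled = route({})
```
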
&lt;br /&gt;
# First, access devices must be licensed by the trusted entity.&lt;br /&gt;
** If a device is not licensed, it cannot benefit from global routing services.&lt;br /&gt;
# Licensing binds a human&#039;s unique feature with a machine&#039;s unique feature.&lt;br /&gt;
** Human unique feature: the intricate structure of the iris.&lt;br /&gt;
** Machine unique feature: the MAC address.&lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
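The licensing steps above could be sketched as follows. This is an illustrative Python sketch only: hashing an encoding of the iris feature together with the MAC address is our own simplification of the binding, not a prescribed construction, and all names are invented.&lt;br /&gt;

```python
import hashlib


def license_agent(iris_feature, mac_address):
    """Bind the human's unique feature (an encoding of the iris structure)
    to the machine's unique feature (its MAC address), deriving the
    identification stamp as a single series of bits."""
    binding = f"{iris_feature}|{mac_address}".encode()
    return hashlib.sha256(binding).hexdigest()


# Licensing the same human-device pair always yields the same stamp...
stamp = license_agent("iris-encoding-of-owner", "00:1a:2b:3c:4d:5e")
# ...while a different device (or a different human) yields a different one.
other = license_agent("iris-encoding-of-owner", "00:1a:2b:3c:4d:5f")
```
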
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulative flavours across regions.&lt;br /&gt;
* Custom content generation (no solution found).&lt;br /&gt;
* Public PCs (in labs, etc.): bound to whom?&lt;br /&gt;
* Requires full awareness of users of their own systems.&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, etc.).&lt;br /&gt;
* Attribution information is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first they could be driven without a licence, and licensing systems were introduced afterwards; people got used to them slowly, then thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9204</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9204"/>
		<updated>2011-04-10T20:55:48Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Practice */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably a scientific basis for properly functioning attribution systems has not yet been defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides its users with partial anonymity. Unfortunately, that anonymity weakens security for those same users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, crimes which include fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are they applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this context, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker&#039;s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues the reasons behind the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we&#039;re only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we&#039;re going to refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it not only provides a high level of privacy, but also makes it hard to identify people with malicious intent and cyber attackers.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since the 1990s. Many attackers have succeeded in causing both physical and financial damage to many companies over the Internet and getting off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, some knowledge of computer science or computer systems will be required to fully understand some of the concepts and terminology within it.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, though mostly in the form of identification issues pertaining to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve.&lt;br /&gt;
This section introduces three of today&#039;s attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewing experience. Cookies are text files that are created by a web server and stored by the web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart contents, and storing site preferences; in actuality, they can store any information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends on subsequent requests. Cookies are usually encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified, or even deleted completely. It is also possible for a user to change their browser settings to reject cookies entirely.&lt;br /&gt;
&lt;br /&gt;
A cookie may or may not carry an expiration date: the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers also allow you to set how long cookies are stored.&lt;br /&gt;
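The round trip described above can be illustrated with Python&#039;s standard http.cookies module. This is a minimal sketch of the header exchange only, not of a full HTTP server; the cookie name, value, and date are arbitrary examples.&lt;br /&gt;

```python
from http.cookies import SimpleCookie

# Server side: the first request arrives without a cookie, so the server
# creates one and emits it as a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["session_id"] = "abc123"
server_cookie["session_id"]["expires"] = "Wed, 01 Jan 2031 00:00:00 GMT"
set_cookie_header = server_cookie.output()  # "Set-Cookie: session_id=abc123; expires=..."

# Browser side: the cookie is stored and sent back with the next request
# as a Cookie request header (name=value only, no attributes).
browser_jar = SimpleCookie()
browser_jar.load(server_cookie["session_id"].OutputString())
cookie_request_header = browser_jar.output(attrs=[], header="Cookie:")
```
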
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewing cookies as the kind of attribution system we are seeking for the Internet, they would let us identify, with high precision, the computers that access a web server. However, the biggest drawback of cookies is that they can be deleted and manipulated. As such, cookies are not an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP (Internet Protocol) addresses are 32-bit numerical identifiers for devices (i.e., computers, printers, scanners, etc.) on a network. The user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) is responsible for managing IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), each responsible for allocating IP address blocks to the ISPs of its assigned region, which in turn allocate them to their users.&lt;br /&gt;
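The 32-bit nature of an IPv4 address, and the block-based allocation just described, can be illustrated with Python&#039;s standard ipaddress module; the address and prefix below are arbitrary examples, not real allocations.&lt;br /&gt;

```python
import ipaddress

# An IPv4 address is just a 32-bit number with a dotted-quad spelling.
addr = ipaddress.IPv4Address("134.117.1.1")
as_int = int(addr)  # the underlying 32-bit value

# Registries and ISPs hand out whole blocks (networks), not single numbers;
# membership of an address in a block is a simple prefix test.
block = ipaddress.ip_network("134.117.0.0/16")
inside = addr in block
```
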
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and depends crucially on the application. For instance, large financial institutions, as well as their clients, are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, as to directly intrude their privacy? One might think that this question is a little bit out of the scope of our paper. This is true, however, these and a lot of less obviously related questions should be answered prior to designing, because in such an important thing as protection and privacy, designing of solution should not make too many assumptions and should guarantee something not only to operators of the system, but for users as well. In other words, even though system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee some law-related and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is very crucial,to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone identification, link identification and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem, as we see it, is that the way the Internet is designed makes it possible and relatively easy to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
* Depending on the type of sender and receiver, a different attribution policy will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, determine the geographical location of that IP, consult the ISP covering that location and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads the geographical lookup.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so inferring the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
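The ideal-world chain above (source IP to allocation to ISP to person) can be made concrete with a toy sketch. All data here is hypothetical: real resolution goes through WHOIS/RIR databases and ISP subscriber records, not hard-coded tables.

```python
# Toy illustration of the ideal attribution chain: packet source IP
# to ISP allocation to subscriber. The tables below are invented
# stand-ins for RIR/WHOIS data and ISP subscriber records.
ALLOCATIONS = {
    "203.0.113.0/24": "ExampleNet ISP",   # TEST-NET-3 block, fictional ISP
}
SUBSCRIBERS = {
    ("ExampleNet ISP", "203.0.113.7"): "Subscriber 42",
}

def prefix_of(ip, length=24):
    """Return the /24 prefix string for a dotted-quad IPv4 address."""
    octets = ip.split(".")
    return ".".join(octets[:3]) + ".0/" + str(length)

def attribute(ip):
    """Walk IP to ISP to subscriber; any step can fail (spoofing, NAT, DHCP)."""
    isp = ALLOCATIONS.get(prefix_of(ip))
    if isp is None:
        return None
    return SUBSCRIBERS.get((isp, ip))

print(attribute("203.0.113.7"))   # Subscriber 42
print(attribute("198.51.100.9"))  # None: unknown allocation
```

Every bullet in the list above corresponds to a step in this chain failing: spoofing breaks the first lookup, dynamic addressing breaks the second.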
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing an attack by using multiple random public hosts (as stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of completeness rather than for the information it carries, and we shall not skip it either: the main requirement for an internet attribution system is that it needs to attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act regardless of its actual structure. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept the notion that, at the end of the day, there is some person or group of persons, human beings, responsible for an action.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which less crime and misuse is enforced as the only acceptable way of doing things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is unwise, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and mostly have good reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with incremental deployability, which we discuss later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will bear a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in.&lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because of the virtual impossibility of restarting or reconfiguring the whole internet at once. Incremental embedding of an attribution system should also be safer, since bugs in software and mistakes in design can be fixed on a small scale, so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times.&lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within different sets of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act like a public tool any group can use, but nobody should be able to misuse it or use it in an illegal way.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be dealt with only by spending more resources (memory, server bandwidth, etc.). The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;. Not only should the system not know the answer, it should be impossible to obtain the answer; &amp;quot;who did act X&amp;quot; is the only question that should be answerable. This can be thought of as part of the requirement about not violating current laws and moral principles, but it is stated as a separate requirement, since it is very important to draw a line between an attribution system and surveillance.&lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put persons&#039; names into the traceability database. It makes much more sense to store a unique ID for every body that uses the network; when a crime is committed, or, in general, whenever the agent of some act must be determined, the recorded ID is looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good institution, etc.) should store the mapping between IDs and real names. This mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all the information in one place.&lt;br /&gt;
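The split between the traceability database and the trusted entity can be sketched as follows. This is a minimal illustration; the class names, the `warrant` flag and the enrollment API are our own invention, not part of the proposed design.

```python
# Sketch: traceability records hold only pseudonymous IDs; a separate
# trusted registry resolves an ID to a real name, and only on a
# request backed by sufficient evidence. All names are hypothetical.
import uuid

class TrustedRegistry:
    """Held by a trusted entity (government, police, etc.)."""
    def __init__(self):
        self._id_to_name = {}

    def enroll(self, real_name):
        pseudonym = uuid.uuid4().hex
        self._id_to_name[pseudonym] = real_name
        return pseudonym          # only the pseudonym leaves the registry

    def reveal(self, pseudonym, warrant):
        if not warrant:           # stand-in for a real legal check
            raise PermissionError("no sufficient evidence or motivation")
        return self._id_to_name[pseudonym]

class TraceabilityDB:
    """Distributed store: maps acts to pseudonyms, never to names."""
    def __init__(self):
        self._act_to_id = {}

    def record(self, act, pseudonym):
        self._act_to_id[act] = pseudonym

    def who_did(self, act):
        return self._act_to_id.get(act)

registry = TrustedRegistry()
db = TraceabilityDB()
alice = registry.enroll("Alice Example")
db.record("act-17", alice)
suspect = db.who_did("act-17")            # a pseudonym, not a name
print(registry.reveal(suspect, warrant=True))
```

Note that the traceability store alone can never answer &amp;quot;what did Alice do?&amp;quot;; it holds no names, which is exactly the non-bijection requirement above.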
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it is able to fulfill the requirements listed in the previous section. The proposed framework operates under the core principle: &amp;quot;An act cannot use network resources, nor can it be routed, if its origin is anonymous&amp;quot;. First, we define some terms:&lt;br /&gt;
* &amp;quot;Identification Stamp&amp;quot;: An identification stamp is a series of bits that binds a human unique identification (iris intricate structure or fingerprint) with a unique feature of an access-capable device. So, for a device like a Network Interface Card, the MAC address would be that feature. &lt;br /&gt;
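A minimal sketch of producing such a stamp, assuming the biometric has already been reduced to a stable byte string. The SHA-256 construction is our own illustration; the definition above only requires a unique binding per (human, device) pair.

```python
# Sketch: derive an "identification stamp" by binding a biometric
# template to a device's MAC address. The hash construction is an
# assumption for illustration, not part of the paper's definition.
import hashlib

def identification_stamp(biometric_template, mac):
    """Bind a human's biometric template to a machine's MAC address."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    return hashlib.sha256(biometric_template + mac_bytes).digest()

stamp = identification_stamp(b"toy-iris-template", "00:1a:2b:3c:4d:5e")
print(stamp.hex())  # a 32-byte stamp, hex-encoded
```

Changing either input (a different iris template or a different MAC) yields a different stamp, which is the property the framework relies on.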
&lt;br /&gt;
The following sections give the assumptions under which this framework operates, the methodology of its operation, and a list of pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of system users and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and manner in which these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect this system, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
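The two-role access discipline described above can be sketched as follows. The class, role names and exceptions are hypothetical; a real deployment would enforce this with authenticated protocols, not an in-process check.

```python
# Sketch: a stamp database that enforces the two-role access rule:
# routers may only read, the trusted entity may only write.
class StampDatabase:
    def __init__(self):
        self._stamps = set()

    def read(self, caller_role, stamp):
        if caller_role != "router":
            raise PermissionError("only routers may read")
        return stamp in self._stamps

    def write(self, caller_role, stamp):
        if caller_role != "trusted-entity":
            raise PermissionError("only the trusted entity may write")
        self._stamps.add(stamp)

db = StampDatabase()
db.write("trusted-entity", "stamp-001")
print(db.read("router", "stamp-001"))  # True
```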
&lt;br /&gt;
Finally, our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the identification stamp of the packet owner. A packet owner is the person PLUS the machine that together are responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refuse to route any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they carry great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# Access devices must be licensed by the trusted entity; otherwise, they cannot benefit from global routing services.&lt;br /&gt;
# Licensing binds a human&#039;s unique feature (iris intricate structure) with a machine&#039;s unique feature (MAC address).&lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
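Putting the steps above together, the forwarding decision at a router reduces to a membership test against the licensed-stamp database. This is a sketch: the packet shape, function names and the in-memory set standing in for the distributed database are our own assumptions.

```python
# Sketch: a router drops any packet whose identification stamp is
# not licensed. LICENSED is a stand-in for the distributed database.
LICENSED = {"stamp-001", "stamp-002"}

def route(packet):
    """Forward only fully attributed packets."""
    stamp = packet.get("id_stamp")
    if stamp in LICENSED:
        return "forwarded"
    return "dropped"        # unlicensed device: no global routing service

print(route({"id_stamp": "stamp-001", "payload": "hello"}))  # forwarded
print(route({"payload": "anonymous"}))                        # dropped
```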
&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors.&lt;br /&gt;
* Custom content generation (not supported).&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DDoS, DoS, ...).&lt;br /&gt;
* Attribution data not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system, which would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: they first appeared without any licensing requirement, and licensing systems were applied afterwards. People got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9203</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9203"/>
		<updated>2011-04-10T20:33:32Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* General */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it invites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field occupying a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out precisely to conceal the real agent behind an act: for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to cast vagueness around the real &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, and what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done for the sake of authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may involve intermediate attributions, for example of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
==Problem Statement==&lt;br /&gt;
The anonymity of the Internet makes it virtually impossible, most of the time, to properly identify who is who online. This is a double-edged sword: it provides a high level of privacy, but it also makes it hard to identify cyber attackers and people with malicious intent.&lt;br /&gt;
&lt;br /&gt;
==Problem Motivation==&lt;br /&gt;
In today&#039;s world there is a growing need for attribution over the Internet, mainly due to the increasing number of cyber attacks since its introduction in the 90’s. Many attackers have succeeded in causing both physical and financial damage to companies over the Internet and have gotten off scot-free; due to the anonymity of the Internet, the attackers cannot be identified.&lt;br /&gt;
&lt;br /&gt;
==Scope==&lt;br /&gt;
In this paper we are addressing the issue of attribution by providing a list of requirements that need to be met in order to have a fully stable and efficient attribution system over the Internet.&lt;br /&gt;
&lt;br /&gt;
Although this is not a technical paper, a basic knowledge of computer science or computer systems is required to fully understand some of the concepts and terminology within it.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
The problem of attribution is not one that just came up; it has been around for decades, but mostly to address identification issues as they pertain to websites or Internet service providers. Many different approaches to attribution have been taken, but mainly only to the extent of what a particular system aims to achieve.&lt;br /&gt;
This section introduces three of today&#039;s current attribution systems and discusses their pros and cons as they pertain to the type of global attribution we discuss in this paper.&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Websites sometimes need to remember information about a visit or a visitor in order to improve the viewer&#039;s experience. Cookies are text files created by a web server and stored by a web browser on the user&#039;s computer. Cookies are used for many purposes, mainly authentication, remembering shopping-cart contents and storing site preferences; in actuality, they can store any information that fits in a text file. When a page is requested, the user&#039;s web browser sends the request with the web server&#039;s cookie in the header of the packet. All of this is an automated process between the web browser and the web server.&lt;br /&gt;
&lt;br /&gt;
If the web server receives a request without a cookie attached, it treats it as the browser&#039;s first access to the server and sends a cookie as part of the response, which the browser saves and resends with the next request. Cookies are often encrypted for data security and information privacy; however, they remain under the user&#039;s control, as they can be decrypted, modified or deleted entirely. It is also possible for a user to configure the browser to reject cookies altogether.&lt;br /&gt;
&lt;br /&gt;
A cookie may or may not carry an expiration date, i.e., the date on which the browser deletes it. Cookies without an expiration date are deleted when the browser is closed. Some browsers allow you to set how long cookies should be stored.&lt;br /&gt;
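The set-and-echo cycle described above can be sketched with Python&#039;s standard-library cookie parser. This is a toy client-side simulation, not a real browser or server; the cookie name and value are invented.

```python
# Toy simulation of the cookie exchange described above, using the
# standard-library parser. No network involved; we only model the
# "server sets, browser echoes back" cycle.
from http.cookies import SimpleCookie

# First request carries no cookie, so the server sets one in its
# response (this string is the value of a Set-Cookie header).
response_header = "session=abc123; Max-Age=3600"
jar = SimpleCookie()
jar.load(response_header)          # the browser stores the cookie

# On the next request, the browser sends the cookie back in a header.
request_cookie = jar.output(attrs=[], header="Cookie:").strip()
print(request_cookie)              # Cookie: session=abc123
```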
&lt;br /&gt;
===Cookies as an Attribution System===&lt;br /&gt;
Viewed as the kind of attribution system we are looking for over the Internet, cookies would let us identify computers accessing a web server with high precision. However, their biggest drawback is that they can be deleted and manipulated. As such, cookies do not make an effective attribution system.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
IP or Internet Protocol addresses (32-bit numbers in IPv4) are numerical identifiers for devices (computers, printers, scanners, etc.) on a network; the user&#039;s Internet Service Provider (ISP) provides this number. The Internet Assigned Numbers Authority (IANA) manages IP address space allocation globally. It does this with the help of five Regional Internet Registries (RIRs), responsible for allocating IP address blocks to the ISPs in their assigned regions, which in turn allocate them to their users.&lt;br /&gt;
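The &amp;quot;32-bit number&amp;quot; view and the familiar dotted-quad form are two spellings of the same value, as the standard library shows (the address used is a reserved documentation address):

```python
# An IPv4 address is a 32-bit integer; dotted-quad notation is just a
# human-readable rendering of the same value.
from ipaddress import IPv4Address

addr = IPv4Address("192.0.2.1")     # documentation (TEST-NET-1) address
print(int(addr))                    # 3221225985
print(IPv4Address(3221225985))      # 192.0.2.1
```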
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Pros&lt;br /&gt;
&lt;br /&gt;
Cons&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one needs to confront the controversial question of balancing strong attribution against privacy. The hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions and their clients want a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of a transaction are who they claim to be. On the other hand, political dissidents and whistle-blowers exist largely because no fully effective attribution system is in place: they can distribute information (regardless of its actual usefulness) while keeping their identity secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but to let the ratio change dynamically from case to case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question arises: when is it justified to use private information to track or punish a person, that is, to directly intrude on their privacy? One might think this question is slightly out of the scope of our paper. That is true; however, this and many less obviously related questions should be answered before design begins, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider the balance between attribution and privacy.&lt;br /&gt;
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers.&lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone identification, link identification and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem, as we see it, is that the way the Internet is designed makes it possible and relatively easy to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The lack of attribution on the web mostly becomes an issue when security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
* Depending on the type of sender and receiver, a different attribution policy will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, determine the geographical location of that IP, consult the ISP covering that location and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads the geographical lookup.&lt;br /&gt;
* For avoiding that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed BUT it requires global cooperation of intermediate systems... it is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to personnel, so figuring out the person from the IP is not concrete.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; to their machines, which allows for malware distribution and hence, the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can be used for avoiding that problem, but they are not 100% efficient.&lt;br /&gt;
* It is not applicable to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
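To make the first point concrete, here is a minimal sketch (ours, not from any cited system) showing that the source address in an IPv4 header is just an unauthenticated field: a sender can write any value there and still produce a header with a valid checksum.&lt;br /&gt;

```python
import struct

def ipv4_checksum(header):
    # Ones-complement sum over 16-bit words (RFC 791).
    total = 0
    for i in range(0, len(header), 2):
        total += header[i] * 256 + header[i + 1]
    total = total % 65536 + total // 65536  # fold carries
    total = total % 65536 + total // 65536  # fold once more
    return 0xFFFF - total

def build_header(src, dst):
    # Minimal 20-byte IPv4 header; the source field is whatever we claim.
    src_b = bytes(int(p) for p in src.split("."))
    dst_b = bytes(int(p) for p in dst.split("."))
    hdr = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0, src_b, dst_b)
    checksum = ipv4_checksum(hdr)
    return hdr[:10] + struct.pack("!H", checksum) + hdr[12:]

# A header claiming to come from 10.0.0.1, an address the real sender
# never owned; the checksum still verifies, so routers cannot spot the lie.
spoofed = build_header("10.0.0.1", "192.168.1.5")
```

Actually injecting such a packet requires raw-socket access, but the point stands: nothing in the header itself ties the claimed source to the actual sender.&lt;br /&gt;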
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of achieving anonymity by using multiple public, random agents (as stepping stones) to reach the victim, concealing the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of completeness rather than information, and we shall not break with that convention: an internet attribution system must attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. Still, we accept that at the end of the day there is some person, or several persons (human beings), responsible for an action.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which reduced crime and misuse is the only acceptable way to do things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying such ideas to the real world today is not realistic, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and mostly exist for good reasons. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This requirement goes hand in hand with the incremental deployability discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system would be under great pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This matters not only because restarting or reconfiguring the whole internet at once is virtually impossible; incremental deployment is also more secure (bugs in software and mistakes in design can be fixed on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act as a public tool that any group can use, but nobody should be able to misuse it or use it illegally.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.). The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection: actions should map to persons, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;; the only question the system should answer is &amp;quot;who did act X?&amp;quot;. This could be folded into the requirement about not violating current laws and moral principles, but it is stated separately because it is very important to draw a line between an attribution system and surveillance. &lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put persons&#039; names into the traceability database. It makes much more sense to store a unique ID for every body that uses the network; when a crime is committed or, more generally, whenever the agent of some act must be determined, the recorded ID can be looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good system, etc.) should store the mapping between IDs and real names, and this mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of it in one place.&lt;br /&gt;
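One way to realize the &amp;quot;impossible to collect in one place&amp;quot; property is secret sharing. The sketch below is an illustrative assumption on our part, not a prescribed design: it splits an ID-to-name record into two shares held by different trusted entities, so that neither share alone reveals the name and only cooperation reconstructs it.&lt;br /&gt;

```python
import secrets

def split_mapping(name):
    # Split the record into two XOR shares; each share alone is
    # indistinguishable from random bytes.
    data = name.encode("utf-8")
    share_a = secrets.token_bytes(len(data))
    share_b = bytes(x ^ y for x, y in zip(share_a, data))
    return share_a, share_b

def reveal(share_a, share_b):
    # Only when both entities cooperate (e.g., under a court order)
    # can the real name be reconstructed.
    return bytes(x ^ y for x, y in zip(share_a, share_b)).decode("utf-8")

share_a, share_b = split_mapping("Alice Example")
```

Here each trusted entity would store one share in its own jurisdiction; compromising a single store yields nothing.&lt;br /&gt;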
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework operates under the core principle that &amp;quot;an act cannot use network resources, nor can it be routed, if it is anonymously bound&amp;quot;. First, we define some terminology:&lt;br /&gt;
* &amp;quot;Identification Stamp&amp;quot;: a series of bits that binds a unique human identifier (such as the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device. For a device like a Network Interface Card, the MAC address would be that feature. &lt;br /&gt;
&lt;br /&gt;
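A concrete (purely illustrative) way to realize such a stamp is to hash the two features together, so the stamp is stable for a given person-device pair but reveals neither input:&lt;br /&gt;

```python
import hashlib

def identification_stamp(biometric_template, mac_address):
    # Hypothetical construction: hash the biometric template together with
    # the device MAC so the stamp binds person and machine but exposes neither.
    material = biometric_template + b"|" + mac_address.encode("ascii")
    return hashlib.sha256(material).hexdigest()

# Same person plus same device always yields the same stamp;
# changing either input changes the stamp completely.
stamp = identification_stamp(b"iris-template-bytes", "00:1a:2b:3c:4d:5e")
```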
The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of its pros, cons and vulnerabilities, and wrap up with a discussion of the tradeoff between privacy and attribution.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of a globally trusted entity (or entities), e.g., a government. This entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the user base and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax and semantics, as well as the way these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect this system, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before being able to decrypt the contents or to append to them. In addition, this distributed system must guarantee almost zero latency on read operations, as it will be consulted for every single hop a packet makes through the internet&#039;s intermediate systems.&lt;br /&gt;
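The assumed access discipline can be sketched as follows; the role names and the API are our own illustration of the read-only/write-only split, not part of the assumption itself.&lt;br /&gt;

```python
class StampDatabase:
    # Sketch of the assumed DNS-like store: routers may only read,
    # the trusted entity may only write (role checks stand in for the
    # strict authentication the assumption requires).

    def __init__(self):
        self._stamps = set()

    def write(self, role, stamp):
        if role != "trusted-entity":
            raise PermissionError("only the trusted entity may register stamps")
        self._stamps.add(stamp)

    def is_registered(self, role, stamp):
        if role != "router":
            raise PermissionError("only routers may query stamps")
        return stamp in self._stamps

db = StampDatabase()
db.write("trusted-entity", "stamp-123")
```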
&lt;br /&gt;
Finally, our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the identification stamp of the packet owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refuse to route any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# Access devices must be licensed by the trusted entity; an unlicensed device cannot benefit from global routing services.&lt;br /&gt;
# Licensing binds a human&#039;s unique feature (the intricate structure of the iris) with a machine&#039;s unique feature (the MAC address).&lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
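The licensing and routing steps above might fit together as in this sketch (function names and data structures are hypothetical):&lt;br /&gt;

```python
def license_device(registry, person_id, mac):
    # Licensing binds the human and machine features and records the
    # resulting identification stamp with the trusted entity.
    stamp = f"{person_id}:{mac}"
    registry.add(stamp)
    return stamp

def route_packet(registry, packet):
    # Routers refuse to forward any packet without a registered stamp.
    if packet.get("stamp") in registry:
        return "forwarded"
    return "dropped"

registry = set()
stamp = license_device(registry, "iris-hash-42", "00:1a:2b:3c:4d:5e")
```

In a real deployment the registry lookup would be a read against the distributed stamp database, not a local set.&lt;br /&gt;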
&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system&lt;br /&gt;
* Restrictive assumptions (not easily deployable)&lt;br /&gt;
* Different regulatory flavors&lt;br /&gt;
* Custom content generation is not supported&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution&lt;br /&gt;
* Attack avoidance (DoS, DDoS, ...)&lt;br /&gt;
* Attribution is not available to just anyone&lt;br /&gt;
* Automated: services are either stopped or continued&lt;br /&gt;
* Privacy&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets&lt;br /&gt;
* An attack on the distributed system would cause whole-system failure&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first they operated without any licensing, and licensing systems were introduced afterwards. People got used to them slowly, then thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9201</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9201"/>
		<updated>2011-04-10T19:54:43Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Practice */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. A lot of research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of it revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, not much research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;cyber crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence, botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable most of the time; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act: for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the real &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research works in attribution and discuss their common limitations as well as flaws and what can be done in common to enhance such schemes. We also argue that the lack of a globally deployed registration system that registers system users and grants them LICENSED access to the system enfeebles proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing the mentioned system would reduce the incentive of irregular behavior as well as remove the blaze of tempting anonymity, putting attackers under the risk of being easily caught. We also discuss how privacy, as a counter force to attribution, plays a big role in the internet and within its users and propose a framework that achieves relatively robust attribution mechanism and retains the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for keeping track of authorship, i.e., attributing text to authors. In this paper, we don&#039;t question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution, resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example of an act to an agent (software, device, etc.) and then of the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Cookies are text files that a web browser stores on a user&#039;s computer. They are used to authenticate a user, remember items in shopping carts, keep site preferences, and hold any kind of information that can be stored in a text file.&lt;br /&gt;
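For illustration, Python&#039;s standard library shows the mechanics: the server emits a Set-Cookie header, and the browser sends the value back on subsequent requests (the values here are hypothetical).&lt;br /&gt;

```python
from http.cookies import SimpleCookie

# Server side: issue a session cookie via an HTTP response header.
cookie = SimpleCookie()
cookie["session_id"] = "abc123"  # hypothetical session token
header_line = cookie.output(header="Set-Cookie:")

# Browser side: the cookie comes back in later requests, letting the
# server recognize the user without re-authenticating every action.
returned = SimpleCookie("session_id=abc123")
```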
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
An IP address is a unique numerical identifier given to a device (i.e., a computer, printer, scanner, etc.) on a network.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. The hypothetical line between attribution and privacy is not straight, and depends crucially on the application. For instance, large financial institutions and their clients are interested in a strong attribution system, which would solve many authorization and authentication problems and guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place; it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think this question is slightly out of the scope of our paper. That is true; however, this and many less obviously related questions should be answered prior to design, because for something as important as protection and privacy, the design of a solution should not make too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
**Sometimes non-attribution is crucial, to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering in linking packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises when security is compromised: when you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, different attribution policies will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP printed on each packet, determine the geographical location of that IP, consult the ISP covering that location and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, misleading the geographical lookup.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires global cooperation of intermediate systems... and that cooperation is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so inferring the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of achieving anonymity by using multiple public, random agents (as stepping stones) to reach the victim, concealing the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of completeness rather than information, and we shall not break with that convention: an internet attribution system must attribute. More formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. &lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which reduced crime and misuse is the only acceptable way to do things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying such ideas to the real world today is not realistic, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and mostly exist for good reasons. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This requirement goes hand in hand with the incremental deployability discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system would be under great pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This matters not only because restarting or reconfiguring the whole internet at once is virtually impossible; incremental deployment is also more secure (bugs in software and mistakes in design can be fixed on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times. &lt;br /&gt;
&lt;br /&gt;
A very important but controversial subject is the adoption of the system within some set of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should allow easy adaptation to different cases while remaining universal and global. It should act as a public tool that any group can use, but nobody should be able to misuse it or use it illegally.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some issues can be dealt with by spending more resources (memory, server bandwidth, etc.). The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
&lt;br /&gt;
Attribution mapping should not be a bijection: actions should map to persons, but not vice versa. That is, nobody should be able to use the system to answer the question &amp;quot;what did person X do?&amp;quot;; the only question the system should answer is &amp;quot;who did act X?&amp;quot;. This could be folded into the requirement about not violating current laws and moral principles, but it is stated separately because it is very important to draw a line between an attribution system and surveillance. &lt;br /&gt;
&lt;br /&gt;
Since this global system operates on the internet, it might not be a great idea to put persons&#039; names into the traceability database. It makes much more sense to store a unique ID for every body that uses the network; when a crime is committed or, more generally, whenever the agent of some act must be determined, the recorded ID can be looked up in a police or government database. Some trusted entity (a government, corporation, police force, public-good system, etc.) should store the mapping between IDs and real names, and this mapping should only be revealed when needed and when there is enough evidence or motivation to do so. Traceability information (namely, the unique IDs) should be distributed, and it is crucial to make it impossible to collect all of it in one place.&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The framework rests on the core principle: &amp;quot;an act can neither use network resources nor be routed if it is anonymously bound&amp;quot;. We start by defining some terminology:&lt;br /&gt;
* &amp;quot;Identification stamp&amp;quot;: a series of bits that binds a person&#039;s unique identification (e.g., the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device. For a device like a Network Interface Card, that feature would be the MAC address. &lt;br /&gt;
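As a rough illustration, an identification stamp could be derived by hashing the concatenation of the two features; the use of SHA-256 and the byte layout below are purely our assumptions:&lt;br /&gt;

```python
# Hypothetical sketch of an "identification stamp": a digest binding a
# human's biometric feature to a device feature. The use of SHA-256 and
# this byte layout are our assumptions, not part of any standard.

import hashlib

def identification_stamp(biometric_template, mac_address):
    """Bind a biometric template (e.g. an iris scan) to a NIC's MAC
    address, yielding one fixed-size stamp."""
    material = biometric_template + b"|" + mac_address.encode()
    return hashlib.sha256(material).hexdigest()

stamp = identification_stamp(b"iris-template-bytes", "00:1A:2B:3C:4D:5E")
```

The same person on a different device, or a different person on the same device, yields a different stamp, which is exactly the binding the definition above asks for.&lt;br /&gt;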
&lt;br /&gt;
The following sections state the assumptions under which this framework operates, describe the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
To start with, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s user base and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and manner in which these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing identification stamps. Symmetric-key encryption should be used to protect the system, as it is only accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems.&lt;br /&gt;
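The read/write separation could be enforced along these lines; this is a minimal, non-normative sketch in which authentication is reduced to a role string:&lt;br /&gt;

```python
# Hypothetical sketch of the stamp database's access policy: routers
# are read-only users, the trusted entity is the only writer, and any
# other caller is rejected. Authentication is reduced to a role string.

class StampDatabase:
    """DNS-like distributed store of identification stamps (toy model)."""
    def __init__(self):
        self.stamps = set()

    def read(self, caller_role):
        if caller_role != "router":
            raise PermissionError("only routers may read")
        return frozenset(self.stamps)

    def write(self, caller_role, stamp):
        if caller_role != "trusted_entity":
            raise PermissionError("only the trusted entity may write")
        self.stamps.add(stamp)
```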
&lt;br /&gt;
Finally, our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the identification stamp of the packet owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
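A toy encoding of such a header might look as follows; the field names and sizes are illustrative assumptions, not a wire-format proposal:&lt;br /&gt;

```python
# Hypothetical sketch: a network-layer header carrying the packet
# owner's identification stamp in front of the payload. Field sizes
# are illustrative only, not a wire-format proposal.

import struct

STAMP_LEN = 32  # e.g. the length of a SHA-256 digest

def add_stamp_header(stamp, payload):
    """Prepend a 2-byte payload length and the owner's stamp."""
    if len(stamp) != STAMP_LEN:
        raise ValueError("bad stamp length")
    return struct.pack("!H", len(payload)) + stamp + payload

def parse_stamp_header(packet):
    """Recover (stamp, payload) from a stamped packet."""
    (length,) = struct.unpack("!H", packet[:2])
    stamp = packet[2:2 + STAMP_LEN]
    payload = packet[2 + STAMP_LEN:2 + STAMP_LEN + length]
    return stamp, payload
```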
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving force behind delivering every packet, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# Access devices must be licensed by the trusted entity; an unlicensed device cannot benefit from global routing services.&lt;br /&gt;
# Licensing means binding a human&#039;s unique feature (the intricate structure of the iris) to a machine&#039;s unique feature (the MAC address).&lt;br /&gt;
# Licensing generates the identification stamps.&lt;br /&gt;
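The per-hop rule the routers enforce can be sketched as follows (all names hypothetical):&lt;br /&gt;

```python
# Hypothetical sketch of the per-hop rule: a router forwards a packet
# only if its identification stamp is found among the licensed stamps;
# otherwise the packet is dropped and never uses global routing services.

def route(packet_stamp, licensed_stamps, forward, drop):
    """Forward attributed packets; drop everything else."""
    if packet_stamp in licensed_stamps:
        return forward()
    return drop()

licensed = {"stamp-a", "stamp-b"}
result_ok = route("stamp-a", licensed, lambda: "forwarded", lambda: "dropped")
result_bad = route("stamp-x", licensed, lambda: "forwarded", lambda: "dropped")
```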
&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, etc.).&lt;br /&gt;
* Attribution data is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors across regions.&lt;br /&gt;
* No custom content generation.&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system itself, which would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first they required no license to drive; licensing systems were applied afterwards, and people got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9200</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9200"/>
		<updated>2011-04-10T19:42:34Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Deployment */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, yet, arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents the limits of and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most situations; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the true &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and invites illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues for the essentiality of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example from an act to an agent (software, device, etc.) and then from that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Cookies are small text files that a web browser stores on a user&#039;s computer. They are used to authenticate a user, remember items in shopping carts, store site preferences, and hold any other kind of information that can be kept in a text file.&lt;br /&gt;
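For example, using Python&#039;s standard http.cookies module, a server sets an identifier and later recognizes the returning user by it (the session value here is, of course, made up):&lt;br /&gt;

```python
# Minimal sketch of cookie-based identification with Python's standard
# http.cookies module. The session value is, of course, made up.

from http.cookies import SimpleCookie

# Server side: place an identifier in the Set-Cookie response header.
response = SimpleCookie()
response["session_id"] = "abc123"
set_cookie_value = response["session_id"].OutputString()

# Client side: the browser echoes the cookie back; the server parses it.
request = SimpleCookie()
request.load("session_id=abc123")
user_session = request["session_id"].value
```

This round trip is what lets a site attribute successive requests to the same browser, though not, by itself, to a person.&lt;br /&gt;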
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
An IP address is a unique numerical identifier given to a device (e.g., a computer, printer or scanner) on a network.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions and their clients want a strong attribution system, which would solve many authorization and authentication problems and guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place: it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in a rather abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one needs not only to decide on this ratio for a particular case, but to make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to decide to use private information to track or punish a person, that is, to directly intrude on their privacy. One might think this question lies slightly outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered prior to design, because in a matter as important as protection and privacy, the design of a solution should not make too many assumptions, and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to consider the balance between attribution and privacy.&lt;br /&gt;
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers.&lt;br /&gt;
* When should a person be tracked, and when not (so as not to intrude on privacy)?&lt;br /&gt;
* How can we make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute whom/what, and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone authentication, link identification and packet filtering to tie packets to agents?&lt;br /&gt;
* How much can the cooperation of intermediate systems contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action to an agent? What are they (punishment, reward, etc.)?&lt;br /&gt;
* How do we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
* For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without compromising one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The issue of the lack of attribution on the web mostly arises when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks are tracked, so is all other traffic. &lt;br /&gt;
* Depending on the type of sender and receiver, different attribution policies will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP printed on each moving packet, find the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, and hence mislead the geographical location.&lt;br /&gt;
* To avoid that problem, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, BUT it requires the global cooperation of intermediate systems... which is not there!&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so figuring out the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets... misleading attribution!&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing attacks by using multiple public, random agents (as stepping stones) to reach the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than useful information, and we shall not avoid this custom either: the main requirement for an internet attribution system is that it needs to attribute; or, more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him. A good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. &lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a world in which less crime and misuse is the only acceptable way of doing things, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not wise, because many laws and moral principles are already in place; some are not perfect, but they are widely accepted and mostly have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability discussed later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. Even though a global attribution system will bear a lot of pressure, the internet should not depend on it entirely: if the attribution system goes down, the underlying network should remain functional. In other words, the attribution system should be loosely coupled to the system it works in. &lt;br /&gt;
&lt;br /&gt;
As discussed before (and this could be said about any global system on the internet), such a system should be incrementally deployable, so that smooth, step-by-step, subnetwork-by-subnetwork integration is possible. This is important not only because it is virtually impossible to restart or reconfigure the whole internet at once: deploying an attribution system incrementally is also more secure (bugs in software and mistakes in design can be fixed on a small scale), so that by the end of the cycle, when the whole internet is wired, the attribution system has been field-tested and analyzed several times. &lt;br /&gt;
&lt;br /&gt;
An important but controversial subject is the adoption of the system within existing sets of rules or laws (state laws, government regulations, corporate rules and principles, etc.). The system should be easy to adopt for different cases while remaining universal and global. It should act as a public tool that any group can use, but nobody should be able to misuse it or employ it illegally.&lt;br /&gt;
&lt;br /&gt;
Companies and organizations sometimes lose millions of dollars to attacks and other cyber-crimes, and some of these issues can be dealt with by spending more resources (memory, server bandwidth, etc.). The overall cost of setting up and maintaining the attribution system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g., DoS, identity theft, etc.).&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
This part discusses the operation of the system itself.&lt;br /&gt;
&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, an act should map to persons, but not vice versa &lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The framework rests on the core principle: &amp;quot;an act can neither use network resources nor be routed if it is anonymously bound&amp;quot;. We start by defining some terminology:&lt;br /&gt;
* &amp;quot;Identification stamp&amp;quot;: a series of bits that binds a person&#039;s unique identification (e.g., the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device. For a device like a Network Interface Card, that feature would be the MAC address. &lt;br /&gt;
&lt;br /&gt;
The following sections state the assumptions under which this framework operates, describe the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
To start with, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s user base and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and manner in which these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing identification stamps. Symmetric-key encryption should be used to protect the system, as it is only accessed by two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the identification stamp of the packet owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving force behind delivering every packet, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# Access devices must be licensed by the trusted entity; an unlicensed device cannot benefit from global routing services.&lt;br /&gt;
# Licensing means binding a human&#039;s unique feature (the intricate structure of the iris) to a machine&#039;s unique feature (the MAC address).&lt;br /&gt;
# Licensing generates the identification stamps.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, etc.).&lt;br /&gt;
* Attribution data is not available to just anyone.&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors across regions.&lt;br /&gt;
* No custom content generation.&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system itself, which would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first they required no license to drive; licensing systems were applied afterwards, and people got used to them slowly but thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9199</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9199"/>
		<updated>2011-04-10T19:23:13Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Requirements for internet attribution system */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, yet, arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Many of these works revolve around using machine learning to link articles to humans; others propose text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but, needless to say, it is not feasible to authenticate every single packet hopping over the intermediate systems. This paper presents the limits of and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently, the Internet infrastructure provides users with partial anonymity. Unfortunately, that anonymity weakens security for its users, because it incites advanced users to exploit it. The lack of online identification, married with bad intentions, entices criminals to commit a number of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught: fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Needless to say, current solutions neither guarantee sufficient attribution nor are applicable in most situations; hence, the current system suffers from the lack of a relatively robust attribution mechanism. In light of this, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability in intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the poor efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to inflict vagueness around the true &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers users and grants them LICENSED access to the system, undermines proper attribution and encourages illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the temptation of anonymity, putting attackers at risk of being easily caught. Finally, we discuss how privacy, as a counter-force to attribution, plays a large role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining users&#039; privacy.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for tracking authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is unfortunately neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 examines the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution are defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Cookies are small text files that a web browser stores on a user&#039;s computer at a web site&#039;s request. They are used to authenticate a user, remember items in shopping carts, record site preferences, and hold any other kind of information that can be stored in a short text file.&lt;br /&gt;
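As an illustration, here is a minimal sketch of the cookie round trip using Python&#039;s standard http.cookies module; the session value is invented for the example.&lt;br /&gt;

```python
# Minimal sketch of how a cookie is issued and read back,
# using only Python's standard library (http.cookies).
from http.cookies import SimpleCookie

# Server side: issue a session cookie for a Set-Cookie header.
cookie = SimpleCookie()
cookie["session_id"] = "abc123"          # illustrative value
cookie["session_id"]["httponly"] = True  # keep it out of page scripts
header = cookie["session_id"].OutputString()

# Client side: the browser sends the cookie back; the server parses it.
received = SimpleCookie()
received.load("session_id=abc123")
print(received["session_id"].value)  # -> abc123
```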
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
An IP address is a unique numerical identifier given to a device (e.g., computer, printer, scanner) on a network.&lt;br /&gt;
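A short sketch making the numeric nature of an IP address explicit, using Python&#039;s standard ipaddress module (addresses drawn from the TEST-NET-1 documentation range):&lt;br /&gt;

```python
# An IP address is just a number with a dotted notation; Python's
# ipaddress module makes that identity explicit.
import ipaddress

addr = ipaddress.ip_address("192.0.2.10")   # TEST-NET-1 documentation range
net = ipaddress.ip_network("192.0.2.0/24")

print(int(addr))    # the underlying 32-bit integer: 3221225994
print(addr in net)  # True: network membership is a simple range check
```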
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must confront the controversial question of balancing strong attribution against privacy. The line between attribution and privacy is not straight and depends crucially on the application. For instance, large financial institutions and their clients want a strong attribution system, which would solve many authorization and authentication problems and guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because no 100% effective attribution system is in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in an abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one must not only decide on this ratio for a particular case, but make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to use private information to track or punish a person, and thereby directly intrude on their privacy. One might think this question lies somewhat outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before designing the system, because in a matter as important as protection and privacy, a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to balance attribution against privacy. &lt;br /&gt;
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP-traceback, stepping stone authentications, link identifications and packet filtering in wedging packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem we see is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, a different attribution policy will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, determine the geographical location of this IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which misleads the geographical lookup.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to individuals, so deducing the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows malware distribution and hence the creation of botnets, misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are far from 100% effective.&lt;br /&gt;
* It is not feasible to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
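The idealized chain above (source IP, then location, then ISP, then person) can be sketched as a lookup pipeline. The geolocate function and the isp_records table are hypothetical stand-ins, not real services; the comments mark where items in the list break the chain.&lt;br /&gt;

```python
# Illustrative sketch of the idealized attribution chain from the text.
# geolocate() and isp_records are hypothetical stand-ins, not real services.

def geolocate(ip):
    # Hypothetical geolocation table; real lookups can be misled by spoofing.
    return {"203.0.113.7": "Ottawa"}.get(ip)

isp_records = {("203.0.113.7", "Ottawa"): "subscriber #4415"}

def attribute(packet):
    ip = packet["src"]     # breaks if the source IP is spoofed
    where = geolocate(ip)  # breaks if the IP was dynamically reassigned
    if where is None:
        return None
    # breaks if the machine is a bot acting without its owner's knowledge
    return isp_records.get((ip, where))

print(attribute({"src": "203.0.113.7"}))   # -> subscriber #4415
print(attribute({"src": "198.51.100.9"}))  # -> None (the chain breaks)
```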
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of achieving anonymity by using multiple public, arbitrary agents (as stepping stones) to reach the victim, concealing the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of consistency rather than useful information, and we shall not avoid this convention either: the main requirement for an internet attribution system is that it needs to attribute; more formally, any potentially destructive act should be traceable to an agent (a person, organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered. &lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which compliance is the only acceptable way to do things and crime and misuse are thereby reduced; many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying such ideas to the real world today is not a good idea, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and mostly have reasons to exist. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to just design a system; it is much harder to design a system whose deployment need not be instant and massive. A good attribution system should be designed in a way that allows the following:&lt;br /&gt;
&lt;br /&gt;
* Attribution system should be loosely coupled to the system (if attribution fails, system should still work)&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* Attribution system should be adaptable to different sets of rules and principles (laws of countries, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
* Cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than average losses under current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
This part concerns the operation of the system itself.&lt;br /&gt;
&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa &lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
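A minimal sketch of the last two requirements, separating traceability data from personal data behind an opaque token; the HMAC construction and all names are our illustration, not a prescribed protocol.&lt;br /&gt;

```python
# Sketch of the separation requirement: the traceability log holds only
# opaque tokens, while the person registry is held by a separate trusted
# authority. Keys and identifiers are illustrative.
import hashlib
import hmac

AUTHORITY_KEY = b"held-only-by-the-trusted-authority"

def opaque_token(person_id: str) -> str:
    # One-way: actions map to tokens, but a token alone reveals no person.
    return hmac.new(AUTHORITY_KEY, person_id.encode(), hashlib.sha256).hexdigest()

# Distributed traceability log: action -> token (no personal data stored).
trace_log = {"act-042": opaque_token("alice")}

# Separate registry, consulted only when disclosure is warranted.
person_registry = {opaque_token("alice"): "alice"}

def reveal(action_id: str, warrant: bool):
    if not warrant:
        return None  # privacy preserved by default
    return person_registry.get(trace_log[action_id])

print(reveal("act-042", warrant=False))  # -> None
print(reveal("act-042", warrant=True))   # -> alice
```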
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it fulfills the requirements listed in the previous section. The proposed framework works under the core principle that &amp;quot;an act can neither use network resources nor be routed if it is anonymously bound&amp;quot;. First, we define some terminology:&lt;br /&gt;
* &amp;quot;Identification Stamp&amp;quot;: a series of bits that binds a unique human identifier (the intricate structure of the iris, or a fingerprint) to a unique feature of an access-capable device. For a device such as a Network Interface Card, the MAC address would be that feature. &lt;br /&gt;
&lt;br /&gt;
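Assuming SHA-256 as the binding function (the text does not fix one), an identification stamp could be sketched as:&lt;br /&gt;

```python
# Minimal sketch of an "identification stamp": a digest binding a human
# identifier (here, a hash of an iris template) to a device's MAC address.
# The exact encoding is our illustration, not specified by the framework.
import hashlib

def identification_stamp(biometric_digest: bytes, mac: str) -> str:
    material = biometric_digest + mac.lower().encode()
    return hashlib.sha256(material).hexdigest()

bio = hashlib.sha256(b"iris-template-of-user").digest()  # stand-in biometric
stamp = identification_stamp(bio, "00:1A:2B:3C:4D:5E")
print(len(stamp))  # 64 hex digits: stable for the same person + device
```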
The following sections present the assumptions under which this framework operates, the methodology of its operation, and a list of its pros, cons and vulnerabilities, and wrap up with a discussion of the tradeoff between privacy and attribution.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy, but it would suffer from a single point of failure. A distributed entity would perform better, since it could scale with the growth of the user base and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and nature of the communication among these distributed sub-systems.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect this system, since it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both types of users must be strictly authenticated before they can decrypt the contents or append to them. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be relied on heavily for every single hop a packet makes through the Internet&#039;s intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that, within the IP packet format, the network layer adds a header that includes the identification stamp of the packet&#039;s owner. A packet&#039;s owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they carry great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# Access devices must be licensed by the trusted entity; otherwise, they cannot benefit from global routing services.&lt;br /&gt;
# Licensing binds a human&#039;s unique feature (the intricate structure of the iris) to a machine&#039;s unique feature (the MAC address).&lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
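The routing rule in step 1 can be sketched as follows; the in-memory set of licensed stamps stands in for the DNS-like distributed store assumed earlier, and all values are illustrative.&lt;br /&gt;

```python
# Sketch of the core routing rule: a packet is forwarded only if it carries
# an identification stamp known to the (here: in-memory) licensing store.
# The set stands in for the DNS-like distributed database the text assumes.

licensed_stamps = {"stamp-alice-dev1", "stamp-bob-dev2"}  # illustrative values

def route(packet: dict) -> bool:
    stamp = packet.get("id_stamp")
    if stamp not in licensed_stamps:
        return False  # unattributed traffic is dropped, not forwarded
    return True       # read-only lookup succeeded; forward the packet

print(route({"id_stamp": "stamp-alice-dev1", "payload": "hi"}))  # -> True
print(route({"payload": "hi"}))                                  # -> False
```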
&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at the routers due to consulting the distributed licensing system&lt;br /&gt;
* Restrictive assumptions (not easily deployable)&lt;br /&gt;
* Different regulatory flavors&lt;br /&gt;
* Custom content generation (not found)&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution&lt;br /&gt;
* Attack avoidance (DDoS, DoS, ...)&lt;br /&gt;
* Attribution not available to just anyone&lt;br /&gt;
* Automated: services are either stopped or continued&lt;br /&gt;
* Privacy&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets&lt;br /&gt;
* An attack on the distributed system would cause whole-system failure&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists any change at first sight. But consider cars: at first they could be driven without a licence, and licensing systems were applied only afterwards. People got used to it slowly, and then thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9198</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=9198"/>
		<updated>2011-04-10T19:22:58Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* General */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, scientific basis for properly functioning attribution systems are not yet defined. Lots of research have been focusing on attributing documents to authors for the sake of securing authorship rights and rapid identification of plagiarism. Many of those were revolving around the notion of using machine learning for linking articles to humans. Others proposed text classification and feature selection as a mean of detecting the author of a document. Unfortunately, not that much research is addressing the problem of lack of robust attribution system over the internet. Authentication, as a mean of attribution, has proved its efficiency but, needless to say, it is not applicable to authenticate every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as the limits of those technologies. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Currently the Internet infrastructure provides its users with partial anonymity. Unfortunately, that anonymity weakens security for everyone, because it invites technically sophisticated users to exploit it. The lack of online identification, combined with bad intentions, entices criminals to commit a range of &amp;lt;i&amp;gt;Cyber Crimes&amp;lt;/i&amp;gt; without being caught, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS attacks and bandwidth hogging. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee sufficient attribution nor are applicable in most situations; the internet therefore lacks a relatively robust attribution mechanism. In this context, we need better methodologies for reaching an acceptable level of success in attributing actions to persons.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is any entity capable of committing what constitutes an act; within our focus, an agent may be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt; [Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act: malware distribution (and hence the creation of botnets) and stepping stones, for instance, aim to obscure the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers users and grants them LICENSED access to the system, undermines proper attribution and encourages illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the temptation of anonymity, putting attackers at risk of being easily caught. Finally, we discuss how privacy, as a counter-force to attribution, plays a large role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining users&#039; privacy.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for tracking authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher level of attribution, that of all possible actions to agents, which is unfortunately neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief discussion of the dilemma of attribution: resolving the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 examines the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example of an act to an agent (software, device, etc.) and then of an agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution are defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
Cookies are small text files that a web browser stores on a user&#039;s computer at a web site&#039;s request. They are used to authenticate a user, remember items in shopping carts, record site preferences, and hold any other kind of information that can be stored in a short text file.&lt;br /&gt;
&lt;br /&gt;
==IP Addresses==&lt;br /&gt;
An IP address is a unique numerical identifier given to a device (e.g., computer, printer, scanner) on a network.&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must confront the controversial question of balancing strong attribution against privacy. The line between attribution and privacy is not straight and depends crucially on the application. For instance, large financial institutions and their clients want a strong attribution system, which would solve many authorization and authentication problems and guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because no 100% effective attribution system is in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in an abstract fashion, privacy is inversely proportional to attribution. When designing an attribution system, one must not only decide on this ratio for a particular case, but make the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question is when to use private information to track or punish a person, and thereby directly intrude on their privacy. One might think this question lies somewhat outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before designing the system, because in a matter as important as protection and privacy, a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to balance attribution against privacy. &lt;br /&gt;
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers &lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP-traceback, stepping stone authentications, link identifications and packet filtering in wedging packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem we see is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
*The issue of the lack of attribution on the web mostly arises when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic. &lt;br /&gt;
*Depending on the type of sender and receiver, a different attribution policy will be requested.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, determine the geographical location of this IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, and a spoofed address misleads geolocation.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation among intermediate systems, which does not currently exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to individuals, so inferring the person behind an IP address is unreliable.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate this problem, but they are not fully effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
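To make the gap concrete, the ideal-world pipeline can be sketched as a chain of lookups; every function and name below is hypothetical, invented for illustration, and each step corresponds to one of the failure points listed above.&lt;br /&gt;

```python
# Hypothetical sketch of the "ideal world" attribution pipeline.
# Every step can fail in practice, as the list above explains.

def geolocate(ip):
    # Stand-in for a geolocation lookup; misled if the IP is spoofed.
    return "Ottawa" if ip.startswith("134.") else "unknown"

def isp_covering(location):
    # Stand-in for finding the ISP serving a location; needs cooperation.
    return "ExampleISP" if location != "unknown" else None

def isp_lookup(isp, ip):
    # Stand-in for the ISP's subscriber records; IPs are dynamic, so the
    # answer may be stale, and the subscriber may be a botnet victim.
    return "subscriber-42" if isp else None

def attribute(src_ip):
    location = geolocate(src_ip)      # step 1: IP -> location
    isp = isp_covering(location)      # step 2: location -> ISP
    return isp_lookup(isp, src_ip)    # step 3: ISP records -> person

print(attribute("134.117.0.1"))  # subscriber-42
print(attribute("10.0.0.1"))     # None: the chain breaks
```

The sketch returns an answer only when every lookup succeeds, which is exactly why spoofing or stale ISP records anywhere in the chain defeats attribution.&lt;br /&gt;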
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing an attack by routing it through multiple public intermediate hosts (stepping stones) before reaching the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(semi-structured semi-draft)&lt;br /&gt;
&lt;br /&gt;
We have decided on basic requirements for a universal attribution system. The requirements are divided into three parts.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
&lt;br /&gt;
The first and most obvious requirement for any system is usually stated for the sake of completeness rather than for the information it carries, and we shall not avoid that convention either: the main requirement for an internet attribution system is that it needs to attribute, or, more formally, any potentially destructive act should be traceable to an agent (a person and/or organization, group, etc.). It is important to consider the different natures of agents, because the goal of an attribution system is not necessarily to narrow the search down to one particular person, but rather to find the body responsible for an act, regardless of its actual structure. In other words, even though actions are mostly performed by a single person, that person is not necessarily the one responsible for the decision to act. A good real-world analogy is an assassin and the body (a person or a group) paying him: a good attribution system should not lead to the assassin alone, but should be designed so that the responsible bodies are the ones discovered.&lt;br /&gt;
&lt;br /&gt;
It is easy to imagine a system in which crime and misuse are all but impossible, and many writers and film directors exploit this idea in futuristic, science-fiction and dystopian plots. Unfortunately, applying ideas of this sort to the real world today is not feasible, because many laws and moral principles are already in place; some are imperfect, but they are widely accepted and mostly exist for good reason. The attribution system we are looking for should take legal and moral issues into account and, naturally, should not violate or contradict any of them. This important requirement goes hand in hand with the incremental deployability that we discuss later.&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to simply design a system; it is much harder to design a system whose deployment need not be instant and massive. A good attribution system should be designed in a way that allows the following:&lt;br /&gt;
&lt;br /&gt;
* The attribution system should be loosely coupled to the underlying system (if attribution fails, the system should still work)&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The attribution system should be adaptable to different sets of rules and principles (national laws, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
This part describes the operation of the system itself.&lt;br /&gt;
&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection between them revealed only when needed&lt;br /&gt;
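The separation requirements above can be pictured with a minimal sketch: traceability records and personal data live in different stores, and only a trusted authority holds the link between them. All names and data here are hypothetical.&lt;br /&gt;

```python
# Sketch: traceability data and personal data are kept separate;
# only a trusted authority holds the linking key. This illustrates
# the requirement, not a concrete design.

trace_store = {"stamp-7f3a": ["2011-03-28 DoS probe", "2011-03-29 spam burst"]}
personal_store = {"person-0042": {"name": "J. Doe", "isp": "ExampleISP"}}

# Held only by the trusted authority (e.g., a government registry).
authority_link = {"stamp-7f3a": "person-0042"}

def reveal(stamp, warrant_ok):
    """Join the two stores only when a legitimate need exists."""
    if not warrant_ok:
        return None  # privacy preserved by default
    person_id = authority_link.get(stamp)
    return personal_store.get(person_id)

print(reveal("stamp-7f3a", warrant_ok=False))  # None
print(reveal("stamp-7f3a", warrant_ok=True))   # {'name': 'J. Doe', ...}
```

Because no single store contains both halves, collecting all traceability data in one place is impossible by construction, which is exactly the non-bijection property required.&lt;br /&gt;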
&lt;br /&gt;
=Proposed Framework=&lt;br /&gt;
In this section, we propose a potential framework and argue that it can fulfill the requirements listed in the previous section. The proposed framework operates under the core principle: &amp;quot;an act can neither use network resources nor be routed if its origin is anonymous&amp;quot;. We start by defining some terminology:&lt;br /&gt;
* &amp;quot;Identification stamp&amp;quot;: a series of bits that binds a unique human identifier (e.g., the intricate structure of the iris, or a fingerprint) to a unique feature of a network-capable device. For a device like a Network Interface Card, that feature would be the MAC address.&lt;br /&gt;
&lt;br /&gt;
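One plausible way to realize such a stamp, offered here as a sketch rather than as the construction this draft specifies, is to hash the biometric template together with the device feature, so that the raw biometric never needs to be stored:&lt;br /&gt;

```python
import hashlib

def identification_stamp(biometric_template, mac_address):
    """Derive a stamp binding a human feature to a device feature.

    Hashing means the registry stores neither the raw biometric nor
    the raw pairing, only the derived stamp. Hypothetical sketch.
    """
    material = biometric_template + mac_address.encode("ascii")
    return hashlib.sha256(material).hexdigest()

stamp = identification_stamp(b"<iris-template-bytes>", "00:1a:2b:3c:4d:5e")
print(len(stamp))  # 64 hex characters (256 bits)
```

The same person on the same device always yields the same stamp, while either feature changing yields a different one, which is what makes the stamp usable as a routing-time identifier.&lt;br /&gt;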
The following sections present the assumptions under which this framework operates and the methodology of its operation, list the pros, cons and vulnerabilities of the system, and wrap up with a discussion of the tradeoff between privacy and attribution.&lt;br /&gt;
==Assumptions==&lt;br /&gt;
For starters, this framework assumes the presence of one or more globally trusted entities (e.g., governments). Such an entity may be either centralized or distributed. A centralized entity would be easier to deploy but would suffer from a single point of failure. A distributed entity would perform better, as it could scale with the growth of the system&#039;s user base and conform to diverse regional laws, regulations, customs and traditions. However, a standard protocol would be required to define the syntax, semantics and transport by which these distributed sub-systems communicate.&lt;br /&gt;
&lt;br /&gt;
Second, we assume that a DNS-like, world-wide distributed system is deployed. This system acts as a &amp;quot;database&amp;quot; for storing &amp;quot;identification stamps&amp;quot;. Symmetric-key encryption should be used to protect that system, as it is accessed by only two types of users: routers, which should be able to access the database ONLY for read operations, and the trusted entity (defined in the previous assumption), which should be able to access it ONLY for write operations. Both must be strictly authenticated before they can decrypt or append contents. In addition, this distributed system must guarantee near-zero latency on read operations, as it will be consulted for every single hop a packet makes through the Internet&#039;s intermediate systems.&lt;br /&gt;
&lt;br /&gt;
Finally, our proposed framework assumes that the network layer adds a header to each IP packet containing the identification stamp of the packet&#039;s owner. A packet owner is the person PLUS the machine that are together responsible for launching the packet.&lt;br /&gt;
&lt;br /&gt;
==Methodology==&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refuse to route any data packet that is not fully attributed. Since they are the main driving force behind delivering every packet, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
&lt;br /&gt;
# Access devices must be licensed by the trusted entity; an unlicensed device cannot benefit from global routing services.&lt;br /&gt;
# Licensing binds a human&#039;s unique feature (e.g., iris intricate structure) to a machine&#039;s unique feature (e.g., MAC address).&lt;br /&gt;
# Licensing generates identification stamps.&lt;br /&gt;
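Under these steps, a router&#039;s forwarding decision might look like the following sketch. All names are hypothetical; the set of licensed stamps stands in for the DNS-like read-only store described in the assumptions.&lt;br /&gt;

```python
# Sketch: a router refuses to forward packets whose identification
# stamp is absent or unknown. "licensed_stamps" stands in for the
# DNS-like read-only stamp database; all names are hypothetical.

licensed_stamps = {"stamp-7f3a", "stamp-91bc"}

def should_route(packet):
    stamp = packet.get("id_stamp")
    return stamp is not None and stamp in licensed_stamps

print(should_route({"id_stamp": "stamp-7f3a", "payload": b"..."}))  # True
print(should_route({"payload": b"..."}))                            # False
```

In a deployment, the membership check would be a (near-zero-latency) query to the distributed stamp database rather than a local set lookup.&lt;br /&gt;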
&lt;br /&gt;
&lt;br /&gt;
==Pros, Cons and Vulnerabilities==&lt;br /&gt;
&lt;br /&gt;
Cons:&lt;br /&gt;
* Delays and bottlenecks at routers due to consulting the distributed licensing system.&lt;br /&gt;
* Restrictive assumptions (not easily deployable).&lt;br /&gt;
* Different regulatory flavors across jurisdictions.&lt;br /&gt;
* Custom (client-based) content generation is lost.&lt;br /&gt;
Pros:&lt;br /&gt;
* Attribution, yet not available to just anyone.&lt;br /&gt;
* Attack avoidance (DoS, DDoS, ...).&lt;br /&gt;
* Automated: services are either stopped or continued.&lt;br /&gt;
* Privacy.&lt;br /&gt;
Vulnerabilities:&lt;br /&gt;
* Botnets.&lt;br /&gt;
* An attack on the distributed system itself, which would cause whole-system failure.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Privacy and Attribution Tradeoff==&lt;br /&gt;
Human nature resists change at first sight. But consider cars: they first appeared without any need for licensing, and licensing systems were introduced only afterwards. People got used to them slowly, then thoroughly.&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8980</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8980"/>
		<updated>2011-03-29T15:05:58Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Requirements for internet attribution system */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not applicable to authenticating every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Internet users prefer partial anonymity while surfing the internet. Unfortunately, some internet users have bad intentions and exploit such anonymity to commit various types of &amp;lt;i&amp;gt;electronic crimes&amp;lt;/i&amp;gt;, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee efficient attribution nor are they applicable most of the time; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for attributing actions to persons with an acceptable level of success.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity with the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt;Institute for Defense Analyses, 2003&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and encourages illegitimate intrusions and irregular behavior. We show that such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the attribution dilemma: the tension between attribution and privacy. Section 3 then argues why proper attribution systems are essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may include intermediate attributions, for example, of an act to an agent (software, device, etc.) and then of that agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks like the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
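The chain of intermediate attributions can be pictured as a composition of partial mappings; the sketch below is illustrative only, and all data in it is hypothetical.&lt;br /&gt;

```python
# Sketch: attribution as a composition of partial mappings,
# act -> agent (software/device) -> person. All data is hypothetical.

act_to_agent = {"http-request-123": "browser@00:1a:2b:3c:4d:5e"}
agent_to_person = {"browser@00:1a:2b:3c:4d:5e": "J. Doe"}

def attribute(act):
    agent = act_to_agent.get(act)           # intermediate attribution
    return agent_to_person.get(agent) if agent else None

print(attribute("http-request-123"))  # J. Doe
print(attribute("unknown-act"))       # None: the chain is only partial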
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
==IP Addressing==&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and/or infrastructure available, one needs to consider the controversial question of balancing strong attribution against privacy. This hypothetical line between attribution and privacy is not straight, and depends crucially on the application. For instance, large financial institutions, as well as their clients, are interested in a strong attribution system, which would solve many authorization and authentication problems and would guarantee (to some degree) that the agents of transactions are who they claim to be. On the other hand, political dissidents and whistle-blowers exist primarily because there is no 100% effective attribution system in place, so it is possible for them to distribute information (regardless of its actual usefulness or goodness) while keeping their identity secret. It is clear that a single universal set of rules cannot satisfy both cases. It is also clear that, in an abstract sense, privacy is inversely proportional to attribution. While designing an attribution system, one needs not only to decide on this ratio for a particular case, but to let the ratio change dynamically depending on the case.&lt;br /&gt;
&lt;br /&gt;
Assuming this ratio is found, another question arises: when should private information be used to track or punish a person, thereby directly intruding on their privacy? One might think this question is slightly out of the scope of our paper. That is true; however, these and many less obviously related questions should be answered prior to design, because in matters as important as protection and privacy, a solution should not make too many assumptions and should guarantee something not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and guarantee certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to balance attribution against privacy.&lt;br /&gt;
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers.&lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification and packet filtering to link packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
* For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the Internet&#039;s design makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with the consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The issue of the lack of attribution on the web mostly arises whenever security is compromised: when you&#039;re bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
* Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person. This would be done by examining the source IP address stamped on each packet, determining the geographical location of that IP, consulting the ISP covering that location, and identifying the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, and a spoofed address misleads geolocation.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation among intermediate systems, which does not currently exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to individuals, so inferring the person behind an IP address is unreliable.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate this problem, but they are not fully effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing an attack by routing it through multiple public intermediate hosts (stepping stones) before reaching the victim, in order to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(semi-structured semi-draft)&lt;br /&gt;
&lt;br /&gt;
We have decided on basic requirements for a universal attribution system. The requirements are divided into three parts.&lt;br /&gt;
&lt;br /&gt;
==General==&lt;br /&gt;
This part covers the most fundamental requirements: an attribution system should attribute, and it should do so within the law.&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* The system should not violate any current privacy-related laws and moral principles&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Deployment==&lt;br /&gt;
It is easy to simply design a system; it is much harder to design a system whose deployment need not be instant and massive. A good attribution system should be designed in a way that allows the following:&lt;br /&gt;
&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The attribution system should be adaptable to different sets of rules and principles (national laws, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Practice==&lt;br /&gt;
This part describes the operation of the system itself.&lt;br /&gt;
&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection between them revealed only when needed&lt;br /&gt;
&lt;br /&gt;
=System Proposals=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8979</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8979"/>
		<updated>2011-03-29T14:58:48Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* The attribution dilemma */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably, the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved its efficiency, but it is not applicable to authenticating every single packet hopping over the intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits. It also identifies the requirements of a proper attribution system and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Internet users prefer partial anonymity while surfing the internet. Unfortunately, some internet users have bad intentions and exploit such anonymity to commit various types of &amp;lt;i&amp;gt;electronic crimes&amp;lt;/i&amp;gt;, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee efficient attribution nor are they applicable most of the time; hence, the current system lacks a relatively robust attribution mechanism. In light of this, we need better methodologies for attributing actions to persons with an acceptable level of success.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity with the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt;Institute for Defense Analyses, 2003&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out specifically to conceal the real agent behind an act. For instance, malware distribution (and hence the creation of botnets) and stepping stones aim to create vagueness around the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them LICENSED access, enfeebles proper attribution and encourages illegitimate intrusions and irregular behavior. We show that such a system would reduce the incentive for irregular behavior and remove the lure of tempting anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counter-force to attribution, plays a big role on the internet and among its users, and propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution done to keep track of authorship, i.e., attributing text to authors. In this paper, we do not question the cruciality of attribution in that field; rather, we address a higher level of attribution, of all possible actions to agents, which is sadly somewhat neglected from the current research perspective.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a quick discussion of the attribution dilemma: the tension between attribution and privacy. Section 3 then argues why proper attribution systems are essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving attribution. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed papers. Section 6 discusses the reasons behind the difficulty of achieving a proper attribution system. Finally, a conclusion is presented in section 7.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may involve intermediate attributions, for example, binding an act to an agent (software, device, etc.) and then the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; simply as &amp;quot;attribution&amp;quot;; other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
==IP Addressing==&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must confront the controversial question of balancing strong attribution against privacy. The hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions and their clients want a strong attribution system, which would solve many authorization and authentication problems and guarantee (to some degree) that the parties to a transaction are who they claim to be. On the other hand, political dissidents and whistle-blowers exist largely because no 100% effective attribution system is in place, so they can distribute information (regardless of its actual usefulness or merit) while keeping their identities secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in an abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one must not merely fix this ratio for a particular case, but allow the ratio to change dynamically from case to case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio can be found, another question arises: when should private information be used to track or punish a person, thereby directly intruding on their privacy? One might think this question lies somewhat outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before design begins, because in a matter as important as protection and privacy, a solution should not rest on too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
(here go other questions. will show connection to requirements)&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to consider the balance between attribution and privacy.&lt;br /&gt;
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers.&lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP-traceback, stepping stone authentications, link identifications and packet filtering in wedging packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the design of the Internet makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The lack of attribution on the web becomes an issue mostly when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
* Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, determine the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which makes the inferred geographical location misleading.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation among intermediate systems, and that cooperation does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so inferring the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which enables malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
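The ideal-world chain above (source IP to location to covering ISP to person) can be sketched in a few lines. The lookup tables and names here are hypothetical stand-ins for GeoIP data and ISP subscriber records, not a real implementation; the point is that every hop trusts an attacker-controlled field.

```python
# Toy sketch of the naive attribution pipeline described above:
# source IP -> geographic region -> covering ISP -> subscriber.
# All tables are hypothetical stand-ins for GeoIP and ISP records.

GEO_DB = {"203.0.113.7": "Ottawa"}           # IP -> region (GeoIP-style)
ISP_DB = {"Ottawa": "ExampleNet"}            # region -> covering ISP
SUBSCRIBERS = {("ExampleNet", "203.0.113.7"): "subscriber-42"}

def attribute_packet(src_ip):
    """Follow the ideal-world chain; any hop may fail to resolve."""
    region = GEO_DB.get(src_ip)
    if region is None:
        return "unknown"                     # spoofed or unmapped source
    isp = ISP_DB.get(region)
    person = SUBSCRIBERS.get((isp, src_ip))
    return person or "unknown"

# The src_ip field is attacker-controlled: a spoofed packet simply
# carries someone else's address, so the chain attributes the act
# to the wrong subscriber, or to nobody at all.
print(attribute_packet("203.0.113.7"))   # honest packet -> subscriber-42
print(attribute_packet("198.51.100.9"))  # spoofed/unmapped -> unknown
```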
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing an attack by routing it through multiple public, effectively random hosts (the stepping stones) on the way to the victim, so as to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
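The cited work detects stepping stones by comparing connection content ("thumbprints"); a simpler, timing-based flavor of the same idea can be sketched as follows. The function, thresholds, and timestamps are illustrative assumptions, not the cited algorithm: a host is flagged as a relay when most packets on its outgoing connection closely follow packets on an incoming one.

```python
# Minimal timing-correlation sketch of stepping-stone detection.
# A relay tends to emit an outgoing packet shortly after each
# incoming packet it forwards; unrelated connections do not.

def looks_relayed(inbound, outbound, max_lag=0.5, threshold=0.9):
    """inbound/outbound: sorted lists of packet timestamps (seconds)."""
    if not outbound:
        return False
    matched = 0
    for t_out in outbound:
        # an outgoing packet "echoes" an incoming one if some inbound
        # packet arrived at most max_lag seconds earlier
        if any(max_lag >= t_out - t_in >= 0 for t_in in inbound):
            matched += 1
    return matched / len(outbound) >= threshold

inbound  = [1.00, 2.10, 3.05, 4.20]
relay    = [1.12, 2.19, 3.18, 4.31]   # each follows an inbound packet
separate = [0.50, 5.00, 7.30, 9.90]   # unrelated traffic

print(looks_relayed(inbound, relay))     # True
print(looks_relayed(inbound, separate))  # False
```

Real detectors must also cope with attacker-inserted jitter and chaff traffic, which is one reason the paper treats stepping stones as a serious obstacle to attribution.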
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, it should be possible to map an action to a person, but not to enumerate a person&#039;s actions &lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* Cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than average losses under current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
* Attribution system should be adaptable to different sets of rules and principles (laws of countries, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
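Several of the requirements above (a non-bijective mapping, separated traceability and personal data, a connection revealed only when needed) can be illustrated with a toy split-database sketch. All class names, the per-session token scheme, and the warrant check are hypothetical illustrations, not a proposed protocol:

```python
# Sketch of the separation requirements: traceability records hold
# only opaque tokens; the token -> person mapping lives with a
# separate trusted authority and is revealed only when authorized.

import secrets

class TraceLog:
    """Distributed side: stores (token, action) pairs, no identities."""
    def __init__(self):
        self.records = []
    def record(self, token, action):
        self.records.append((token, action))

class IdentityAuthority:
    """Trusted side (e.g. a government): holds the only token mapping."""
    def __init__(self):
        self._tokens = {}
    def issue_token(self, person):
        token = secrets.token_hex(8)   # fresh token per session, so
        self._tokens[token] = person   # actions do not link to each other
        return token
    def reveal(self, token, warrant_ok):
        # the connection is revealed only when needed (e.g. a warrant)
        if not warrant_ok:
            raise PermissionError("no authorization to de-anonymize")
        return self._tokens[token]

authority = IdentityAuthority()
log = TraceLog()

tok = authority.issue_token("alice")
log.record(tok, "sent packet to example.org")

# The log alone attributes the act only to an opaque token...
assert all("alice" not in token for token, _ in log.records)
# ...and mapping it back requires the authority's cooperation.
print(authority.reveal(tok, warrant_ok=True))   # alice
```

Because the log and the identity store can live with different bodies, no single party can collect all traceability data in one place, matching the distribution requirement above.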
&lt;br /&gt;
=System Proposals=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8978</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8978"/>
		<updated>2011-03-29T14:56:21Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* The attribution dilemma */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors in order to secure authorship rights and rapidly identify plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved effective, but it is clearly infeasible to authenticate every single packet hopping across the intermediate systems. This paper presents the limits of, and advances in, attributing actions to agents over the internet. It reviews current attribution technologies and their limits, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Internet users value partial anonymity while surfing the internet. Unfortunately, some users exploit that anonymity to commit various kinds of &amp;lt;i&amp;gt;electronic crime&amp;lt;/i&amp;gt;, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee efficient attribution nor apply in most situations; hence, the current system lacks a relatively robust attribution mechanism. In this context, we need better methodologies for attributing actions to persons with an acceptable level of success.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity able to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt;[Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, the lack of cooperation among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited effectiveness of firewalls and IDSs make this determination considerably difficult. In addition, some attacks are carried out specifically to conceal the real agent behind an act; for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution, discuss its common limitations and flaws, and consider what can be done to strengthen such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access, weakens proper attribution and invites illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at real risk of being caught. Finally, we discuss how privacy, as a counterforce to attribution, shapes the internet and its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining user privacy.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for the purpose of tracking authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that setting; rather, we address a higher-level problem, the attribution of arbitrary actions to agents, which is largely neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief discussion of the attribution dilemma, the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented attribution systems, along with the flaws and points of failure of the surveyed work. Section 6 examines why a proper attribution system is difficult to achieve. Finally, section 7 concludes.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may involve intermediate attributions, for example, binding an act to an agent (software, device, etc.) and then the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; simply as &amp;quot;attribution&amp;quot;; other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
==IP Addressing==&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
Designing an attribution system is not a trivial task because, regardless of the technologies and infrastructure available, one must confront the controversial question of balancing strong attribution against privacy. The hypothetical line between attribution and privacy is not straight, and it depends crucially on the application. For instance, large financial institutions and their clients want a strong attribution system, which would solve many authorization and authentication problems and guarantee (to some degree) that the parties to a transaction are who they claim to be. On the other hand, political dissidents and whistle-blowers exist largely because no 100% effective attribution system is in place, so they can distribute information (regardless of its actual usefulness or merit) while keeping their identities secret. Clearly, a single universal set of rules cannot satisfy both cases. It is also clear that, in an abstract sense, privacy is inversely proportional to attribution. When designing an attribution system, one must not merely fix this ratio for a particular case, but allow the ratio to change dynamically from case to case.&lt;br /&gt;
&lt;br /&gt;
Assuming such a ratio can be found, another question arises: when should private information be used to track or punish a person, thereby directly intruding on their privacy? One might think this question lies somewhat outside the scope of our paper. That is true; however, this and many less obviously related questions should be answered before design begins, because in a matter as important as protection and privacy, a solution should not rest on too many assumptions and should offer guarantees not only to the operators of the system but to its users as well. In other words, even though the system should be dynamic and adaptable to all potential use cases, it should remain universal to some extent and uphold certain legal and moral principles.&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to consider the balance between attribution and privacy.&lt;br /&gt;
** Sometimes non-attribution is crucial, e.g., to protect political dissidents and whistle-blowers.&lt;br /&gt;
* When to decide to track a person and when not to (so as not to intrude privacy)?&lt;br /&gt;
* How to make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute who/what and why?&lt;br /&gt;
* How far can we trust IP-traceback, stepping stone authentications, link identifications and packet filtering in wedging packets to agents?&lt;br /&gt;
* How much can intermediate systems&#039; cooperation contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing an action(s) to an agent? What are they? (punishment, rewarding, etc)&lt;br /&gt;
* How to deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDos&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
*For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the design of the Internet makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and work within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The lack of attribution on the web becomes an issue mostly when security is compromised: when you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
* Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, determine the geographical location of that IP, consult the ISP covering that location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which makes the inferred geographical location misleading.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation among intermediate systems, and that cooperation does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so inferring the person from the IP is not conclusive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all the packets sneaking&amp;lt;/b&amp;gt; onto their machines, which enables malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate that problem, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping-stone attack: a common way of anonymizing an attack by routing it through multiple public, effectively random hosts (the stepping stones) on the way to the victim, so as to conceal the attacking source. &amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, it should be possible to map an action to a person, but not to enumerate a person&#039;s actions &lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* Cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than average losses under current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
* Attribution system should be adaptable to different sets of rules and principles (laws of countries, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
&lt;br /&gt;
=System Proposals=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8977</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8977"/>
		<updated>2011-03-29T14:37:08Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* What is Attribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, and arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors in order to secure authorship rights and rapidly identify plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved effective, but it is clearly infeasible to authenticate every single packet hopping across the intermediate systems. This paper presents the limits of, and advances in, attributing actions to agents over the internet. It reviews current attribution technologies and their limits, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach for performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Internet users value partial anonymity while surfing the internet. Unfortunately, some users exploit that anonymity to commit various kinds of &amp;lt;i&amp;gt;electronic crime&amp;lt;/i&amp;gt;, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee efficient attribution nor apply in most situations; hence, the current system lacks a relatively robust attribution mechanism. In this context, we need better methodologies for attributing actions to persons with an acceptable level of success.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity able to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt;[Institute for Defense Analyses, 2003]&amp;lt;/ref&amp;gt;. Problems such as IP address spoofing, the lack of cooperation among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited effectiveness of firewalls and IDSs make this determination considerably difficult. In addition, some attacks are carried out specifically to conceal the real agent behind an act; for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the actual &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scenes.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do that, we review past research on attribution, discuss its common limitations and flaws, and consider what can be done to strengthen such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access, weakens proper attribution and invites illegitimate intrusions and irregular behavior. We show that deploying such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at real risk of being caught. Finally, we discuss how privacy, as a counterforce to attribution, shapes the internet and its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining user privacy.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for the purpose of tracking authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that setting; rather, we address a higher-level problem, the attribution of arbitrary actions to agents, which is largely neglected in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief discussion of the attribution dilemma, the tension between attribution and privacy. Section 3 then argues why implementing proper attribution systems is essential. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented attribution systems, along with the flaws and points of failure of the surveyed work. Section 6 examines why a proper attribution system is difficult to achieve. Finally, section 7 concludes.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
&#039;&#039;The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&#039;&#039;&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may involve intermediate attributions, for example, binding an act to an agent (software, device, etc.) and then the agent to a person. Narrowing the problem further, we are only concerned with attribution in large, dynamic networks such as the internet. For simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; simply as &amp;quot;attribution&amp;quot;; other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
==IP Addressing==&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to balance attribution against privacy.&lt;br /&gt;
** Sometimes non-attribution is crucial, e.g. to protect political dissidents and whistle-blowers.&lt;br /&gt;
* When should a person be tracked, and when not (so as not to intrude on privacy)?&lt;br /&gt;
* How can we make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute whom/what, and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification, and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can the cooperation of intermediate systems contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing actions to an agent? What are they (punishment, reward, etc.)?&lt;br /&gt;
* How do we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
* For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The issue of lack of attribution on the web mostly arises whenever security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
* Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, locate the geographical location of that IP, consult the ISP covering the location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which yields a misleading geographical location.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so identifying the person behind an IP is not definitive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate this, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
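The naive chain just described, packet source IP to location to ISP to person, can be sketched as follows. This is a minimal illustration, not a real implementation: the lookup tables are hypothetical stand-ins for geolocation and ISP databases, and each step can fail for exactly the reasons listed above.

```python
# Naive attribution pipeline: packet -> IP -> location -> ISP -> person.
# All lookup tables are hypothetical stand-ins for real databases.

GEO_DB = {"203.0.113.7": "Ottawa, CA"}                    # IP -> location
ISP_DB = {"Ottawa, CA": "ExampleNet"}                     # location -> ISP
ISP_CUSTOMERS = {("ExampleNet", "203.0.113.7"): "Alice"}  # ISP records

def attribute(source_ip):
    """Return the person behind source_ip, or None at any broken link.

    Each step mirrors a failure mode from the list above: the IP may be
    spoofed (wrong or missing entry), the IP may have been dynamically
    reassigned, or the ISP may simply not cooperate.
    """
    location = GEO_DB.get(source_ip)
    if location is None:
        return None  # spoofed or unmapped IP
    isp = ISP_DB.get(location)
    if isp is None:
        return None  # no cooperating ISP for this region
    return ISP_CUSTOMERS.get((isp, source_ip))  # None if lease expired

print(attribute("203.0.113.7"))   # -> Alice in this toy database
print(attribute("198.51.100.9"))  # -> None: chain breaks at geolocation
```

Every `None` return in the sketch corresponds to one of the bullet points above; in practice an attacker only needs to break one link in the chain.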
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of anonymizing attacks by routing them through multiple public intermediate agents (stepping stones) to reach the victim, thereby concealing the attacking source.&amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. DoS, identity theft)&lt;br /&gt;
* The attribution system should be adaptable to different sets of rules and principles (national laws, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
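Several of the requirements above, in particular the separation of traceability information from personal data with the connection revealed only when needed, can be illustrated with a minimal sketch. All names and stores here are hypothetical; in a deployed system the two stores would be held by different parties (e.g. network operators and a trusted authority).

```python
# Sketch: traceability records and personal data live in separate stores;
# the link between them is revealed only through an authorized request.

TRACE_STORE = {"act-42": "pseudonym-7"}    # act -> pseudonymous agent id
PERSONAL_STORE = {"pseudonym-7": "Alice"}  # held by a trusted authority

def attribute_act(act_id, warrant=False):
    """Map an act to a person only when a warrant authorizes the join."""
    pseudonym = TRACE_STORE.get(act_id)
    if pseudonym is None:
        return None          # no traceability record for this act
    if not warrant:
        return pseudonym     # attribution stops at the pseudonym
    return PERSONAL_STORE.get(pseudonym)  # connection revealed on demand

print(attribute_act("act-42"))                # -> pseudonym-7
print(attribute_act("act-42", warrant=True))  # -> Alice
```

Note that the mapping is deliberately one-way: an act resolves to a person, but nothing here enumerates the acts of a given person, matching the non-bijection requirement.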
&lt;br /&gt;
=System Proposals=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8976</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8976"/>
		<updated>2011-03-29T14:36:50Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* What is Attribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Proposed titles:&lt;br /&gt;
* Requirements for Attribution on the Internet&lt;br /&gt;
* Internet Attribution: Between Privacy and Cruciality&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
Present and past situations show a need for improved attribution systems, yet arguably the scientific basis for properly functioning attribution systems is not yet defined. Much research has focused on attributing documents to authors for the sake of securing authorship rights and rapidly identifying plagiarism. Much of that work revolves around using machine learning to link articles to humans; other work proposes text classification and feature selection as a means of detecting the author of a document. Unfortunately, little research addresses the lack of a robust attribution system over the internet. Authentication, as a means of attribution, has proved effective, but it is not practical to authenticate every single packet hopping over intermediate systems. This paper presents limits and advances in the attribution of actions to agents over the internet. It reviews current attribution technologies as well as their limits, identifies the requirements of a proper attribution system, and proposes a distributed (yet cooperative) approach to performing attribution over the internet.&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
Internet users prefer partial anonymity while surfing the internet. Unfortunately, some internet users have bad intentions and exploit such anonymity to commit various types of &amp;lt;i&amp;gt;electronic crimes&amp;lt;/i&amp;gt;, including fraud, theft, forgery, impersonation, the distribution of malware (and hence botnets), traffic tampering, DoS, bandwidth hogging, etc. Consequently, internet attribution is a highly sensitive field that occupies a cornerstone position within internet security. Current solutions neither guarantee efficient attribution nor are applicable in most situations; hence, the current system lacks a relatively robust attribution mechanism. In this context, we need better methodologies for attributing actions to persons with an acceptable level of success.&lt;br /&gt;
&lt;br /&gt;
In principle, attribution can be defined as the mechanism of binding a system-defined act to an agent. An agent is typically an entity that has the ability to commit what constitutes an act; within our focus, an agent can be either a person or a machine. Attribution can also be defined as &amp;quot;determining the identity or location of an attacker or an attacker’s intermediary&amp;quot;&amp;lt;ref&amp;gt;Institute for Defense Analyses, 2003&amp;lt;/ref&amp;gt;. Problems like IP address spoofing, the lack of interoperability among intermediate systems, the dynamic nature of IP addresses, users&#039; unawareness of the many &amp;lt;i&amp;gt;unknown&amp;lt;/i&amp;gt; packets sneaking onto their machines, and the limited efficiency of firewalls and IDSs make this determination considerably difficult. In addition, some types of attacks are carried out to conceal the real agent behind an act: for instance, malware distribution (and hence the creation of botnets) and stepping stones aim to obscure the correct &amp;lt;i&amp;gt;human&amp;lt;/i&amp;gt; source behind the scene.&lt;br /&gt;
&lt;br /&gt;
In this paper, we focus on defining what it takes to achieve an acceptably working attribution mechanism over the internet. To do so, we review past research on attribution and discuss its common limitations and flaws, as well as what can be done to enhance such schemes. We also argue that the lack of a globally deployed registration system, one that registers system users and grants them licensed access, weakens proper attribution and motivates illegitimate intrusions and irregular behavior. We show that employing such a system would reduce the incentive for irregular behavior and remove the lure of anonymity, putting attackers at risk of being easily caught. We also discuss how privacy, as a counterforce to attribution, plays a large role on the internet and among its users, and we propose a framework that achieves a relatively robust attribution mechanism while retaining the privacy of users.&lt;br /&gt;
&lt;br /&gt;
Much of the research in the literature focuses on attribution for the sake of tracking authorship, i.e., attributing text to authors. In this paper, we do not question the importance of attribution in that field; rather, we address a higher-level problem, the attribution of all possible actions to agents, which has received comparatively little attention in current research.&lt;br /&gt;
&lt;br /&gt;
This paper starts with a brief discussion of the attribution dilemma, the tension between attribution and privacy. Section 3 then argues for the necessity of implementing proper attribution systems. Section 4 presents a fundamental set of requirements for achieving an acceptable level of attribution over the internet and proposes an abstract framework for achieving it. Section 5 reviews currently implemented systems that achieve attribution, along with the flaws and points of failure of the surveyed work. Section 6 examines why a proper attribution system is difficult to achieve. Finally, section 7 concludes.&lt;br /&gt;
&lt;br /&gt;
==What is Attribution==&lt;br /&gt;
The act of attributing, especially the act of establishing a particular person as the creator of a work of art.&amp;lt;ref&amp;gt; The American Heritage® Dictionary of the English Language, Fourth Edition copyright ©2000 by Houghton Mifflin Company. Updated in 2009. Published by Houghton Mifflin Company.&amp;lt;/ref&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We are concerned with one particular type of attribution: binding an act to a person. This may involve intermediate attributions, for example, attributing an act to an agent (software, device, etc.) and then attributing that agent to a person. Narrowing the problem further, we are concerned only with attribution in large, dynamic networks such as the internet. For the sake of simplicity, in this paper we refer to &amp;quot;binding an act to a person on the internet&amp;quot; simply as &amp;quot;attribution&amp;quot;, while other types of attribution will be defined separately.&lt;br /&gt;
&lt;br /&gt;
=Background=&lt;br /&gt;
&lt;br /&gt;
==Cookies==&lt;br /&gt;
&lt;br /&gt;
==IP Addressing==&lt;br /&gt;
&lt;br /&gt;
==Authentication Systems==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
&lt;br /&gt;
* While designing an attribution system, one needs to balance attribution against privacy.&lt;br /&gt;
** Sometimes non-attribution is crucial, e.g. to protect political dissidents and whistle-blowers.&lt;br /&gt;
* When should a person be tracked, and when not (so as not to intrude on privacy)?&lt;br /&gt;
* How can we make sure attribution is properly achieved?&lt;br /&gt;
* Who should attribute whom/what, and why?&lt;br /&gt;
* How far can we trust IP traceback, stepping-stone detection, link identification, and packet filtering in binding packets to agents?&lt;br /&gt;
* How much can the cooperation of intermediate systems contribute to achieving attribution?&lt;br /&gt;
* Should there be consequences upon attributing actions to an agent? What are they (punishment, reward, etc.)?&lt;br /&gt;
* How do we deal with misleading data sources hiding behind botnets and concealing identities via stepping stones?&lt;br /&gt;
&lt;br /&gt;
==Why do we need Attribution==&lt;br /&gt;
&lt;br /&gt;
* For identifying purposes&lt;br /&gt;
** Web Banking&lt;br /&gt;
** eCommerce&lt;br /&gt;
** Web advertisements&lt;br /&gt;
&lt;br /&gt;
* For better protection against cyber attacks:&lt;br /&gt;
** DoS and DDoS&lt;br /&gt;
** Forgery and theft&lt;br /&gt;
** Sniffing private traffic&lt;br /&gt;
** Distributing illegal content/malware&lt;br /&gt;
** Sending spam&lt;br /&gt;
** Illegal/undesired intrusion&lt;br /&gt;
&lt;br /&gt;
* For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Why is it difficult to achieve attribution?==&lt;br /&gt;
&lt;br /&gt;
The main problem is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
* The issue of lack of attribution on the web mostly arises whenever security is compromised. When you are bombarded with spam, or when a system is under a DoS attack, attribution becomes a more appealing notion. Striking a balance between security and privacy is tricky, because once attacks can be tracked, so can all other traffic.&lt;br /&gt;
* Depending on the type of sender and receiver, different attribution policies will be required.&lt;br /&gt;
&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each moving packet, locate the geographical location of that IP, consult the ISP covering the location, and identify the person. If an act requires strict attribution (like checking and sending email), authentication is used. &amp;lt;b&amp;gt;Here is what goes wrong&amp;lt;/b&amp;gt;:&lt;br /&gt;
* IP addresses can be &amp;lt;b&amp;gt;spoofed&amp;lt;/b&amp;gt;, which yields a misleading geographical location.&lt;br /&gt;
* To counter spoofing, &amp;lt;b&amp;gt;IP traceback&amp;lt;/b&amp;gt; can be performed, but it requires global cooperation of intermediate systems, which does not exist.&lt;br /&gt;
* IPs are &amp;lt;b&amp;gt;not permanently bound&amp;lt;/b&amp;gt; to persons, so identifying the person behind an IP is not definitive.&lt;br /&gt;
* Network users are &amp;lt;b&amp;gt;not aware of all packets sneaking&amp;lt;/b&amp;gt; onto their machines, which allows for malware distribution and hence the creation of botnets, misleading attribution.&lt;br /&gt;
* &amp;lt;b&amp;gt;Firewalls&amp;lt;/b&amp;gt; and packet filters can mitigate this, but they are not 100% effective.&lt;br /&gt;
* It is not practical to &amp;lt;b&amp;gt;authenticate&amp;lt;/b&amp;gt; every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
===Attacks to prevent correct attribution of actions===&lt;br /&gt;
&lt;br /&gt;
* Stepping stone attack: a common way of anonymizing attacks by routing them through multiple public intermediate agents (stepping stones) to reach the victim, thereby concealing the attacking source.&amp;lt;ref name=&amp;quot;ref1&amp;quot;&amp;gt;S. Staniford-Chen and L. T. Heberlein. Holding intruders accountable on the internet. In SP ’95: Proceedings of the 1995 IEEE Symposium on Security and Privacy, page 39, Washington, DC, USA, 1995. IEEE Computer Society.&amp;lt;/ref&amp;gt;&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses incurred under the current lack of attribution (e.g. DoS, identity theft)&lt;br /&gt;
* The attribution system should be adaptable to different sets of rules and principles (national laws, organizations&#039; policies, etc.), yet remain universal&lt;br /&gt;
&lt;br /&gt;
=System Proposals=&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&amp;lt;references/&amp;gt;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8690</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8690"/>
		<updated>2011-03-17T18:16:28Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Why we need Attribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
===Definition===&lt;br /&gt;
Binding an act to an agent (person or device)&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
===Rakhim===&lt;br /&gt;
The main problem I see is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
===Omi===&lt;br /&gt;
===Raghad===&lt;br /&gt;
The issue of lack of attribution on the web mostly arises whenever security is compromised.&lt;br /&gt;
&lt;br /&gt;
===AbdelRahman===&lt;br /&gt;
In the ideal world, every action on the internet could be bound to a machine and thus to a person. This is done by examining the source IP printed on each moving packet, locating the geographical location of this IP, consulting the ISP covering the location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. Here is what goes wrong:&lt;br /&gt;
* IP addresses can be spoofed and hence mislead the geographical location.&lt;br /&gt;
* For avoiding that problem, IP traceback can be performed BUT it requires global cooperation of intermediate systems... it is not there!&lt;br /&gt;
* IPs are not permanently bound to a person, so figuring out the person from the IP is not concrete.&lt;br /&gt;
* Network users are not aware of all packets sneaking to their machines, which allows for malware distribution and hence, the creation of botnets... misleading attribution!&lt;br /&gt;
* Firewalls and packet filters can be used for avoiding that problem, but they are not 100% efficient.&lt;br /&gt;
* It is not applicable to authenticate every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
For identifying persons/devices when any of these attacks are detected:&lt;br /&gt;
* DoS and DDoS&lt;br /&gt;
* Forgery and theft&lt;br /&gt;
* Sniffing private traffic&lt;br /&gt;
* Distributing illegal content&lt;br /&gt;
* Sending spam&lt;br /&gt;
* For marketing purposes (privacy?)&lt;br /&gt;
** custom (client-based) content generation&lt;br /&gt;
&lt;br /&gt;
==Attacks to prevent correct attribution of actions ==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection, in other words action should map to persons, but not vice versa &lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* Cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than average losses under current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
* Attribution system should be adoptable to different set of rules and principles (laws of countries, organizations&#039; policies, etc), yet remain universal&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring high network cooperation.&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8687</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8687"/>
		<updated>2011-03-17T18:11:09Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Requirements for internet attribution system */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
===Definition===&lt;br /&gt;
Binding an act to an agent (person or device)&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
===Rakhim===&lt;br /&gt;
The main problem I see is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
===Omi===&lt;br /&gt;
===Raghad===&lt;br /&gt;
===AbdelRahman===&lt;br /&gt;
In the ideal world, every action on the internet could be bound to a machine and thus to a person. This is done by examining the source IP printed on each moving packet, locating the geographical location of this IP, consulting the ISP covering the location and identifying the person. If an act requires strict attribution (like checking and sending emails), authentication is used. Here is what goes wrong:&lt;br /&gt;
* IP addresses can be spoofed and hence mislead the geographical location.&lt;br /&gt;
* For avoiding that problem, IP traceback can be performed BUT it requires global cooperation of intermediate systems... it is not there!&lt;br /&gt;
* IPs are not permanently bound to a person, so figuring out the person from the IP is not concrete.&lt;br /&gt;
* Network users are not aware of all packets sneaking to their machines, which allows for malware distribution and hence, the creation of botnets... misleading attribution!&lt;br /&gt;
* Firewalls and packet filters can be used for avoiding that problem, but they are not 100% efficient.&lt;br /&gt;
* It is not applicable to authenticate every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
For identifying persons/devices when any of these attacks are detected:&lt;br /&gt;
* DoS and DDoS&lt;br /&gt;
* Forgery and theft&lt;br /&gt;
* Sniffing private traffic&lt;br /&gt;
* Distributing illegal content&lt;br /&gt;
* Sending spam&lt;br /&gt;
&lt;br /&gt;
For marketing purposes (privacy?)&lt;br /&gt;
&lt;br /&gt;
==Attacks to prevent correct attribution of actions ==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc)&lt;br /&gt;
* Traceability should not violate any current privacy-related laws and moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection, in other words action should map to persons, but not vice versa &lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g. governments)&lt;br /&gt;
* Traceability information and personal data should be separated, a connection to be revealed only when needed&lt;br /&gt;
* Attribution system should be incrementally deployable&lt;br /&gt;
* Cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than average losses under current lack of attribution (e.g. DoS, identity theft, etc)&lt;br /&gt;
* Attribution system should be adoptable to different set of rules and principles (laws of countries, organizations&#039; policies, etc), yet remain universal&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring high network cooperation.&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8682</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8682"/>
		<updated>2011-03-17T18:03:55Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Why we need Attribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
===Definition===&lt;br /&gt;
Binding an act to an agent (person or device)&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
While designing an attribution system one needs to consider balancing between attribution and privacy. &lt;br /&gt;
&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
===Rakhim===&lt;br /&gt;
The main problem I see is that the way the Internet is designed makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are based on the same structure and work within the same scope, and thus can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
===Omi===&lt;br /&gt;
===Raghad===&lt;br /&gt;
===AbdelRahman===&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, geolocate that IP, consult the ISP serving that location, and identify the person. If an act requires strict attribution (such as checking or sending email), authentication is used. Here is what goes wrong:&lt;br /&gt;
* IP addresses can be spoofed, which misleads geolocation.&lt;br /&gt;
* IP traceback can counter spoofing, but it requires global cooperation among intermediate systems, which does not exist.&lt;br /&gt;
* IP addresses are not permanently bound to a person, so inferring the person from an IP is unreliable.&lt;br /&gt;
* Network users are unaware of many packets sneaking onto their machines, which enables malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* Firewalls and packet filters mitigate this, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to authenticate every single action on the internet.&lt;br /&gt;
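The spoofing point above can be made concrete with a small sketch (hypothetical, standard library only): the IPv4 header carries whatever source address the sender writes into it, and the header checksum only guards against transmission errors, not forgery, so a forged source yields a header every router will accept.&lt;br /&gt;

```python
import struct

def fold(s: int) -> int:
    """One's-complement fold of a running sum into 16 bits."""
    while s > 0xFFFF:
        s = s % 0x10000 + s // 0x10000
    return s

def ipv4_header(src: str, dst: str) -> bytes:
    """Build a minimal 20-byte IPv4 header claiming 'src' as the sender."""
    hdr = struct.pack("!BBHHHBBH4s4s",
                      0x45, 0, 20,      # version/IHL, TOS, total length
                      0, 0, 64, 6, 0,   # ident, flags/frag, TTL, proto=TCP, checksum=0
                      bytes(map(int, src.split("."))),
                      bytes(map(int, dst.split("."))))
    checksum = 0xFFFF - fold(sum(struct.unpack("!10H", hdr)))
    return hdr[:10] + struct.pack("!H", checksum) + hdr[12:]

def checksum_ok(hdr: bytes) -> bool:
    """Receiver's check: the folded sum over the whole header must be 0xFFFF."""
    return fold(sum(struct.unpack("!10H", hdr))) == 0xFFFF

# Nothing in the header authenticates the claimed source address.
spoofed = ipv4_header("198.51.100.7", "203.0.113.9")  # forged source, valid header
```

The checksum validates regardless of which source address is claimed, which is exactly why source-IP-based attribution fails against a deliberate attacker.&lt;br /&gt;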
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
To identify persons/devices when any of the following is detected:&lt;br /&gt;
* DoS and DDoS&lt;br /&gt;
* Forgery and theft&lt;br /&gt;
* Sniffing private traffic&lt;br /&gt;
* Distributing illegal content&lt;br /&gt;
&lt;br /&gt;
==Attacks that prevent correct attribution of actions==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.)&lt;br /&gt;
* Traceability should not violate any current privacy laws or moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection revealed only when needed&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
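The non-bijection requirement can be sketched as follows (hypothetical names; in-memory dicts stand in for the distributed trace store and the trusted authority): each action is logged under a fresh opaque token, so the trace log alone cannot enumerate a person&#039;s actions, while the authority can still resolve any single action to its person when needed.&lt;br /&gt;

```python
import secrets

class TraceStore:
    """Network-side trace log: actions keyed by opaque per-action tokens."""
    def __init__(self):
        self.log = {}                    # token: action description
    def record(self, token: bytes, action: str):
        self.log[token] = action

class TrustedAuthority:
    """Holds the token-to-person link, separately from the trace log."""
    def __init__(self):
        self._owner = {}
    def issue_token(self, person: str) -> bytes:
        token = secrets.token_bytes(16)  # fresh and unlinkable per action
        self._owner[token] = person
        return token
    def resolve(self, token: bytes) -> str:
        return self._owner[token]        # action-to-person direction: allowed

# Two actions by the same person leave unlinkable traces; only the
# authority can connect a given action back to a person.
authority = TrustedAuthority()
store = TraceStore()
t1 = authority.issue_token("alice")
t2 = authority.issue_token("alice")
store.record(t1, "sent packet")
store.record(t2, "posted comment")
```

Because tokens are random and never reused, holding the trace store alone reveals nothing about which actions share an owner, which matches the requirement that the mapping not be invertible from person to actions.&lt;br /&gt;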
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring a high degree of network cooperation.&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8673</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8673"/>
		<updated>2011-03-17T17:55:32Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* The attribution dilemma */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
When designing an attribution system, one needs to balance attribution against privacy.&lt;br /&gt;
&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
===Rakhim===&lt;br /&gt;
The main problem I see is that the Internet&#039;s design makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and operate within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
===Omi===&lt;br /&gt;
===Raghad===&lt;br /&gt;
===AbdelRahman===&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, geolocate that IP, consult the ISP serving that location, and identify the person. If an act requires strict attribution (such as checking or sending email), authentication is used. Here is what goes wrong:&lt;br /&gt;
* IP addresses can be spoofed, which misleads geolocation.&lt;br /&gt;
* IP traceback can counter spoofing, but it requires global cooperation among intermediate systems, which does not exist.&lt;br /&gt;
* IP addresses are not permanently bound to a person, so inferring the person from an IP is unreliable.&lt;br /&gt;
* Network users are unaware of many packets sneaking onto their machines, which enables malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* Firewalls and packet filters mitigate this, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to authenticate every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
To identify persons/devices when any of the following is detected:&lt;br /&gt;
* DoS and DDoS&lt;br /&gt;
* Forgery and theft&lt;br /&gt;
* Sniffing private traffic&lt;br /&gt;
&lt;br /&gt;
==Attacks that prevent attribution of actions==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft (impersonation)&lt;br /&gt;
** Distribution of malware&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.)&lt;br /&gt;
* Traceability should not violate any current privacy laws or moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection revealed only when needed&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring a high degree of network cooperation.&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8662</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8662"/>
		<updated>2011-03-17T17:39:58Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Rakhim */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
===Rakhim===&lt;br /&gt;
The main problem I see is that the Internet&#039;s design makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and operate within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences. Of course, no system can prevent 100% of destructive attempts, but a good attribution system should make such attempts highly undesirable and &amp;quot;costly&amp;quot; for an attacker.&lt;br /&gt;
&lt;br /&gt;
===Omi===&lt;br /&gt;
===Raghad===&lt;br /&gt;
===AbdelRahman===&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, geolocate that IP, consult the ISP serving that location, and identify the person. If an act requires strict attribution (such as checking or sending email), authentication is used. Here is what goes wrong:&lt;br /&gt;
* IP addresses can be spoofed, which misleads geolocation.&lt;br /&gt;
* IP traceback can counter spoofing, but it requires global cooperation among intermediate systems, which does not exist.&lt;br /&gt;
* IP addresses are not permanently bound to a person, so inferring the person from an IP is unreliable.&lt;br /&gt;
* Network users are unaware of many packets sneaking onto their machines, which enables malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* Firewalls and packet filters mitigate this, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to authenticate every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
*DoS&lt;br /&gt;
==Attribution Attacks==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.)&lt;br /&gt;
* Traceability should not violate any current privacy laws or moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection revealed only when needed&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring a high degree of network cooperation.&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8660</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8660"/>
		<updated>2011-03-17T17:38:00Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Rakhim */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
===Rakhim===&lt;br /&gt;
The main problem I see is that the Internet&#039;s design makes it possible, and relatively easy, to act without revealing one&#039;s identity. Moreover, most current solutions are built on the same structure and operate within the same scope; thus, they can only reduce the number of potentially destructive acts or deal with their consequences.&lt;br /&gt;
&lt;br /&gt;
===Omi===&lt;br /&gt;
===Raghad===&lt;br /&gt;
===AbdelRahman===&lt;br /&gt;
In an ideal world, every action on the internet could be bound to a machine and thus to a person: examine the source IP stamped on each packet, geolocate that IP, consult the ISP serving that location, and identify the person. If an act requires strict attribution (such as checking or sending email), authentication is used. Here is what goes wrong:&lt;br /&gt;
* IP addresses can be spoofed, which misleads geolocation.&lt;br /&gt;
* IP traceback can counter spoofing, but it requires global cooperation among intermediate systems, which does not exist.&lt;br /&gt;
* IP addresses are not permanently bound to a person, so inferring the person from an IP is unreliable.&lt;br /&gt;
* Network users are unaware of many packets sneaking onto their machines, which enables malware distribution and hence the creation of botnets, further misleading attribution.&lt;br /&gt;
* Firewalls and packet filters mitigate this, but they are not 100% effective.&lt;br /&gt;
* It is not feasible to authenticate every single action on the internet.&lt;br /&gt;
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
*DoS&lt;br /&gt;
==Attribution Attacks==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.)&lt;br /&gt;
* Traceability should not violate any current privacy laws or moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection revealed only when needed&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring a high degree of network cooperation.&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8649</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8649"/>
		<updated>2011-03-17T17:26:43Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Rakhem */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
===Rakhim===&lt;br /&gt;
&lt;br /&gt;
===Omi===&lt;br /&gt;
===Raghad===&lt;br /&gt;
===AbdelRahman===&lt;br /&gt;
&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
*DoS&lt;br /&gt;
==Attribution Attacks==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.)&lt;br /&gt;
* Traceability should not violate any current privacy laws or moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection revealed only when needed&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring a high degree of network cooperation.&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8626</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8626"/>
		<updated>2011-03-17T15:00:09Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: Requirements added (draft)&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
==Why we need Attribution==&lt;br /&gt;
*DoS&lt;br /&gt;
==Attribution Attacks==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft&lt;br /&gt;
&lt;br /&gt;
=Requirements for internet attribution system=&lt;br /&gt;
(Unstructured draft)&lt;br /&gt;
&lt;br /&gt;
* Any potentially destructive act should be traceable to a person (and/or organization, group, etc.)&lt;br /&gt;
* Traceability should not violate any current privacy laws or moral principles&lt;br /&gt;
* Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa&lt;br /&gt;
* Traceability information should be distributed&lt;br /&gt;
* It should be impossible to collect all traceability data in one place&lt;br /&gt;
* Personal data should be stored by trusted authorities (e.g., governments)&lt;br /&gt;
* Traceability information and personal data should be kept separate, with the connection revealed only when needed&lt;br /&gt;
* The attribution system should be incrementally deployable&lt;br /&gt;
* The cost of setting up and maintaining the system for a particular body (person, organization, network) should be considerably less than the average losses under the current lack of attribution (e.g., DoS, identity theft)&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring a high degree of network cooperation.&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8571</id>
		<title>A link to the paper</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=A_link_to_the_paper&amp;diff=8571"/>
		<updated>2011-03-15T17:58:34Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Attribution Attacks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Title=&lt;br /&gt;
Requirements for Attribution on the Internet&lt;br /&gt;
&lt;br /&gt;
=Abstract=&lt;br /&gt;
&lt;br /&gt;
=Introduction=&lt;br /&gt;
&lt;br /&gt;
=The attribution dilemma=&lt;br /&gt;
==What is the attribution problem==&lt;br /&gt;
==Attribution Attacks==&lt;br /&gt;
* Stepping stone attack&lt;br /&gt;
* Forgery&lt;br /&gt;
** Identity theft&lt;br /&gt;
&lt;br /&gt;
=Related Work=&lt;br /&gt;
2004: [http://ieeexplore.ieee.org.proxy.library.carleton.ca/stamp/stamp.jsp?tp=&amp;amp;arnumber=1437851 This] paper uses both &amp;lt;i&amp;gt;link identification&amp;lt;/i&amp;gt; and &amp;lt;i&amp;gt;filtering&amp;lt;/i&amp;gt; to achieve IP traceback without requiring a high degree of network cooperation.&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8519</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8519"/>
		<updated>2011-03-15T14:27:36Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Thursday, March 10th */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* AbdelRahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
=Meeting Briefings=&lt;br /&gt;
==Tuesday, March 1st==&lt;br /&gt;
After 20 minutes of brainstorming, we agreed on:&lt;br /&gt;
* The current internet infrastructure lacks the ability to achieve a highly scalable and efficient attribution mechanism.&lt;br /&gt;
* Attribution must be implemented in a distributed manner and must be automated and not owned.&lt;br /&gt;
* Threats that should be addressed include (but are not limited to):&lt;br /&gt;
** Impersonation of computers, individuals, and applications&lt;br /&gt;
** All types of electronic spoofing.&lt;br /&gt;
* The skeleton of our project will constitute four main aspects:&lt;br /&gt;
** Tracing/Tracking: baseline for attribution.&lt;br /&gt;
** Human identification: a MUST to include!&lt;br /&gt;
** Machine identification: to be dissolved with human identification.&lt;br /&gt;
** Storage: how and where to store data traces and the identification stamps.&lt;br /&gt;
==Thursday, March 3rd==&lt;br /&gt;
&amp;lt;b&amp;gt;Decided Task Distribution:&amp;lt;/b&amp;gt;&lt;br /&gt;
* Tracing/Tracking: Omi&lt;br /&gt;
* Human identification: Raghad&lt;br /&gt;
* Machine identification: AbdelRahman&lt;br /&gt;
* Storage: Rakhim&lt;br /&gt;
==Thursday, March 10th==&lt;br /&gt;
&amp;lt;b&amp;gt;Basic Proposal:&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Upon questioning the capabilities of the currently deployed global network, it was agreed that it lacks the ability to achieve a relatively high degree of attribution. By &amp;lt;i&amp;gt;&amp;quot;relatively&amp;quot;&amp;lt;/i&amp;gt;, we mean in comparison to the real world&#039;s attribution standards (i.e., the rate of success in binding an act to a person in the real world). Moreover, any system (hardware or software) that operates at the end systems is unreliable, because it can be tampered with.&lt;br /&gt;
As a result, a basic model was proposed and discussed. It employs the rule:&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;quot;An act cannot use network resources nor can it be routed if it is anonymously bound.&amp;quot;&amp;lt;/i&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. As the main driving power behind delivering all packets, malicious or benign, they bear great responsibility in achieving a highly reliable attribution mechanism.&lt;br /&gt;
The proposed system requires the following:&lt;br /&gt;
# A globally trusted entity (or entities) (e.g., a government)&lt;br /&gt;
# Any newly bought (or even handmade/privately manufactured) device with network access capabilities must be licensed by the trusted entity (defined in 1); otherwise, it will not benefit from global routing services.&lt;br /&gt;
# Licensing works by binding a human&#039;s unique feature (e.g., the iris&#039;s intricate structure) with a machine&#039;s unique feature (e.g., the MAC address), generating a record called an &amp;lt;i&amp;gt;identification stamp&amp;lt;/i&amp;gt;. (Including the passport number in &amp;lt;i&amp;gt;identification stamps&amp;lt;/i&amp;gt; is still under investigation, for the sake of tracking and punishing the prime committer.)&lt;br /&gt;
# A DNS-like, world-wide distributed system is to be deployed, encrypted, to act as a database storing all &amp;lt;i&amp;gt;identification stamps&amp;lt;/i&amp;gt;. The system is accessible for READ operations only by the routers, and for WRITE operations only by the trusted entity(s) defined in 1.&lt;br /&gt;
# Within the IP packet format, a header field is to be added containing the &amp;lt;i&amp;gt;identification stamp&amp;lt;/i&amp;gt; of the packet&#039;s owner.&lt;br /&gt;
# Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.&lt;br /&gt;
Once these requirements are met, the stated rule applies: when a router receives a packet, it first consults the global database to verify the packet&#039;s &amp;lt;i&amp;gt;identification stamp&amp;lt;/i&amp;gt;; if verification fails, the router drops the packet.&lt;br /&gt;
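The licensing and routing steps above can be sketched as follows (a hypothetical, in-memory stand-in for the distributed stamp database; the SHA-256 binding function and all names are assumptions for illustration, not part of the proposal):&lt;br /&gt;

```python
import hashlib

def make_stamp(human_feature: str, machine_feature: str) -> str:
    """Bind a human feature (e.g. an iris code) to a machine feature
    (e.g. a MAC address) into a single identification stamp."""
    return hashlib.sha256(f"{human_feature}|{machine_feature}".encode()).hexdigest()

class StampDB:
    """DNS-like global store: written only by the trusted entity,
    read only by routers."""
    def __init__(self):
        self._stamps = set()
    def license_device(self, human_feature: str, machine_feature: str) -> str:
        stamp = make_stamp(human_feature, machine_feature)  # trusted entity write
        self._stamps.add(stamp)
        return stamp
    def is_registered(self, stamp: str) -> bool:
        return stamp in self._stamps                        # router read

def route(db: StampDB, packet: dict) -> str:
    """The proposed rule: forward only fully attributed packets."""
    if db.is_registered(packet.get("stamp", "")):
        return "forwarded"
    return "dropped"

db = StampDB()
stamp = db.license_device("iris:a1b2", "aa:bb:cc:dd:ee:ff")
```

A packet carrying the licensed stamp is forwarded; one with a missing or unregistered stamp is dropped, which is the behaviour the rule prescribes for anonymously bound acts.&lt;br /&gt;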
&lt;br /&gt;
&lt;br /&gt;
As can be seen, the proposed system&#039;s functionality is still largely underspecified. For example, it cannot prevent the creation of botnets, forgery, and similar attacks. In principle, a web server provides a service on behalf of someone; should web servers then have permanent identification stamps (as a replacement for certificates)? In addition, factors such as router latency, database protection, and whom to elect as the globally trusted entity still need to be addressed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;To be done:&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
* Strictly define the requirements of a good attribution system.&lt;br /&gt;
* Analyze what currently implemented attribution systems lack.&lt;br /&gt;
* (optional) Propose a model that &amp;lt;i&amp;gt;arguably&amp;lt;/i&amp;gt; employs attribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Attribution Definition:&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;quot;Binding an act to a person&amp;quot; - Prof. Anil&lt;br /&gt;
&lt;br /&gt;
==Tuesday, March 15th==&lt;br /&gt;
&lt;br /&gt;
=Surveyed Papers=&lt;br /&gt;
&lt;br /&gt;
[1] Vladimir Brik, Suman Banerjee, Marco Gruteser, &amp;lt;i&amp;gt;Wireless device identification with radiometric signatures&amp;lt;/i&amp;gt;, University of Wisconsin at Madison, Madison, WI, USA, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.&lt;br /&gt;
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.&lt;br /&gt;
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, &amp;lt;i&amp;gt;Accurate, scalable in-network identification of p2p traffic using application signatures&amp;lt;/i&amp;gt;, AT&amp;amp;T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[3] Roger Clarke, Human Identification in Information Systems: Management Challenges and Public Policy Issues [http://www.emeraldinsight.com/journals.htm?articleid=883434&amp;amp;show=abstract PDF/HTML]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;Many information systems involve data about people. In order reliably to associate data with particular individuals, it is necessary that an effective and efficient identification scheme be established and maintained. There is remarkably little in the information technology literature concerning human identification. Seeks to overcome that deficiency by undertaking a survey of human identity and human identification. Discusses techniques including names, codes, knowledge-based and token-based identification, and biometrics. Identifies the key challenge to management as being to devise a scheme which is practicable and economic, and of sufficiently high integrity to address the risks the organization confronts in its dealings with people. Proposes that much greater use be made of schemes which are designed to afford people anonymity, or which enable them to use multiple identities or pseudonyms, while at the same time protecting the organization&#039;s own interest. Describes multi-purpose and inhabitant registration schemes, and notes the recurrence of proposals to implement and extend them. Identifies public policy issues. Of especial concern is the threat to personal privacy that the general-purpose use of an inhabitant registrant scheme represents. Speculates that, where such schemes are pursued energetically, the reaction may be strong enough to threaten the social fabric.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Milestones=&lt;br /&gt;
* Problem definition&lt;br /&gt;
* Literature review&lt;br /&gt;
* Comparison of literature&lt;br /&gt;
* Requirements for a proper attribution scheme&lt;br /&gt;
* Discussions&lt;br /&gt;
* Conclusion and Future Work&lt;br /&gt;
&lt;br /&gt;
=Paper=&lt;br /&gt;
[[A link to the paper]]&lt;br /&gt;
&lt;br /&gt;
=Project Progress=&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
=Readings=&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8518</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8518"/>
		<updated>2011-03-15T14:27:20Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Thursday, March 10th */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* AbdelRahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
=Meeting Briefings=&lt;br /&gt;
==Tuesday, March 1st==&lt;br /&gt;
After 20 minutes of brainstorming, we agreed on:&lt;br /&gt;
* The current internet infrastructure lacks the ability to support a highly scalable and efficient attribution mechanism.&lt;br /&gt;
* Attribution must be implemented in a distributed manner, and must be automated and not owned.&lt;br /&gt;
* Threats that should be addressed include (but are not limited to):&lt;br /&gt;
** Impersonation of computers, individuals, and applications.&lt;br /&gt;
** All types of electronic spoofing.&lt;br /&gt;
* The skeleton of our project will cover four main aspects:&lt;br /&gt;
** Tracing/Tracking: the baseline for attribution.&lt;br /&gt;
** Human identification: a MUST to include!&lt;br /&gt;
** Machine identification: to be merged with human identification.&lt;br /&gt;
** Storage: how and where to store data traces and the identification stamps.&lt;br /&gt;
==Thursday, March 3rd==&lt;br /&gt;
&amp;lt;b&amp;gt;Decided Task Distribution:&amp;lt;/b&amp;gt;&lt;br /&gt;
* Tracing/Tracking: Omi&lt;br /&gt;
* Human identification: Raghad&lt;br /&gt;
* Machine identification: AbdelRahman&lt;br /&gt;
* Storage: Rakhim&lt;br /&gt;
==Thursday, March 10th==&lt;br /&gt;
&amp;lt;b&amp;gt;Basic Proposal:&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Upon questioning the capabilities of the currently deployed global network, it was agreed that it lacks the ability to achieve a relatively high attribution property. By &amp;lt;i&amp;gt;&amp;quot;relatively&amp;quot;&amp;lt;/i&amp;gt;, we mean in comparison to the &amp;quot;world&#039;s&amp;quot; attribution standards (i.e., the rate of success in binding an act to a person in the real world). Moreover, any system (hardware or software) that is to operate at the end systems is of limited use, because it can be tampered with.&lt;br /&gt;
As a result, a basic model was proposed and discussed. It employs the rule:&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;lt;i&amp;gt;&amp;quot;An act cannot use network resources nor can it be routed if it is anonymously bound.&amp;quot;&amp;lt;/i&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
Notably, routers, as the primary constituents of the intermediate systems, should refrain from routing any data packets that are not fully attributed. Since they are the main driving power behind delivering all packets, malicious or benign, they bear great responsibility for achieving a highly reliable attribution mechanism.&lt;br /&gt;
The proposed system requires the following:&lt;br /&gt;
# A globally trusted entity (or entities) (e.g., a government)&lt;br /&gt;
# Any newly bought (or even handmade/privately manufactured) device with network access capabilities must be licensed by the trusted entity (defined in 1); otherwise, it will not be able to benefit from global routing services.&lt;br /&gt;
# The licensing mechanism works by binding a human&#039;s unique feature (e.g., the intricate structure of the iris) with a machine&#039;s unique feature (e.g., the MAC address), generating a chunk called an &amp;lt;i&amp;gt;identification stamp&amp;lt;/i&amp;gt;. (The inclusion of the passport number in the &amp;lt;i&amp;gt;identification stamps&amp;lt;/i&amp;gt; is still under investigation, for the sake of tracking and punishing the prime committer.)&lt;br /&gt;
# A DNS-like, world-wide distributed system is to be deployed, encrypted, to act as a database storing all &amp;lt;i&amp;gt;identification stamps&amp;lt;/i&amp;gt;. The system can ONLY be accessed for READ operations by the routers, and can ONLY be accessed for WRITE operations by the trusted entity (or entities) defined in 1.&lt;br /&gt;
# Within the frame format of the IP protocol, a header field is to be added carrying the &amp;lt;i&amp;gt;identification stamp&amp;lt;/i&amp;gt; of the packet owner.&lt;br /&gt;
Once these requirements are met, the rule above applies. When a router receives a packet, it should first consult the global database to verify the packet&#039;s &amp;lt;i&amp;gt;identification stamp&amp;lt;/i&amp;gt;. If the stamp cannot be verified, the router drops the packet.&lt;br /&gt;
# Attribution mapping should not be a bijection; in other words, actions should map to persons, but not vice versa.&lt;br /&gt;
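The routing rule above can be sketched in a few lines (all names are hypothetical; the DNS-like global database is modelled here as a plain local set):&lt;br /&gt;

```python
# Sketch of the proposed router-side check. VERIFIED_STAMPS stands in for
# the router's read-only view of the DNS-like global database of
# identification stamps; real packets would carry the stamp in an added
# IP header field.
VERIFIED_STAMPS = {"stamp-ab12"}

def route(packet):
    """Return the next destination for a packet, or None to drop it."""
    stamp = packet.get("identification_stamp")
    if stamp not in VERIFIED_STAMPS:
        return None  # anonymously bound: refuse to route
    return packet["dst"]
```

A packet whose stamp is missing or absent from the database is simply dropped, as the rule requires.&lt;br /&gt;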
&lt;br /&gt;
As can be noticed, the proposed system&#039;s functionality is still largely undefined. For example, it cannot prevent the creation of botnets, forgery, and other similar attacks. In principle, a web server provides a service on behalf of someone; should web servers then have permanent identification stamps (as a replacement for certificates)? In addition, factors like router latency, database protection, and whom to elect as the globally trusted entity still need to be addressed.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;To be done:&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
* Strictly define the requirements of a good attribution system.&lt;br /&gt;
* Analyze what the currently implemented attribution systems lack.&lt;br /&gt;
* (optional) Propose a model that &amp;lt;i&amp;gt;arguably&amp;lt;/i&amp;gt; employs attribution.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;b&amp;gt;Attribution Definition:&amp;lt;/b&amp;gt;&amp;lt;br/&amp;gt;&lt;br /&gt;
&amp;quot;Binding an act to a person&amp;quot; - Prof. Anil&lt;br /&gt;
&lt;br /&gt;
==Tuesday, March 15th==&lt;br /&gt;
&lt;br /&gt;
=Surveyed Papers=&lt;br /&gt;
&lt;br /&gt;
[1] Vladimir Brik, Suman Banerjee, Marco Gruteser, &amp;lt;i&amp;gt;Wireless device identification with radiometric signatures&amp;lt;/i&amp;gt;, University of Wisconsin at Madison, Madison, WI, USA, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.&lt;br /&gt;
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.&lt;br /&gt;
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, &amp;lt;i&amp;gt;Accurate, scalable in-network identification of p2p traffic using application signatures&amp;lt;/i&amp;gt;, AT&amp;amp;T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[3] Roger Clarke, Human Identification in Information Systems: Management Challenges and Public Policy Issues [http://www.emeraldinsight.com/journals.htm?articleid=883434&amp;amp;show=abstract PDF/HTML]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;Many information systems involve data about people. In order reliably to associate data with particular individuals, it is necessary that an effective and efficient identification scheme be established and maintained. There is remarkably little in the information technology literature concerning human identification. Seeks to overcome that deficiency by undertaking a survey of human identity and human identification. Discusses techniques including names, codes, knowledge-based and token-based identification, and biometrics. Identifies the key challenge to management as being to devise a scheme which is practicable and economic, and of sufficiently high integrity to address the risks the organization confronts in its dealings with people. Proposes that much greater use be made of schemes which are designed to afford people anonymity, or which enable them to use multiple identities or pseudonyms, while at the same time protecting the organization&#039;s own interest. Describes multi-purpose and inhabitant registration schemes, and notes the recurrence of proposals to implement and extend them. Identifies public policy issues. Of especial concern is the threat to personal privacy that the general-purpose use of an inhabitant registrant scheme represents. Speculates that, where such schemes are pursued energetically, the reaction may be strong enough to threaten the social fabric.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Milestones=&lt;br /&gt;
* Problem definition&lt;br /&gt;
* Literature review&lt;br /&gt;
* Comparison of literature&lt;br /&gt;
* Requirements for a proper attribution scheme&lt;br /&gt;
* Discussions&lt;br /&gt;
* Conclusion and Future Work&lt;br /&gt;
&lt;br /&gt;
=Paper=&lt;br /&gt;
[[A link to the paper]]&lt;br /&gt;
&lt;br /&gt;
=Project Progress=&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
=Requirements=&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
=Readings=&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8361</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8361"/>
		<updated>2011-03-10T15:54:47Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Surveyed Papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* Abdelrahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
=Meeting Briefings=&lt;br /&gt;
==Tuesday, March 1st==&lt;br /&gt;
After 20 minutes of brainstorming, we agreed on:&lt;br /&gt;
* The current internet infrastructure lacks the ability to support a highly scalable and efficient attribution mechanism.&lt;br /&gt;
* Attribution must be implemented in a distributed manner, and must be automated and not owned.&lt;br /&gt;
* Threats that should be addressed include (but are not limited to):&lt;br /&gt;
** Impersonation of computers, individuals, and applications.&lt;br /&gt;
** All types of electronic spoofing.&lt;br /&gt;
* The skeleton of our project will cover four main aspects:&lt;br /&gt;
** Tracing/Tracking: the baseline for attribution.&lt;br /&gt;
** Human identification: a MUST to include!&lt;br /&gt;
** Machine identification: to be merged with human identification.&lt;br /&gt;
** Storage: how and where to store data traces and the identification stamps.&lt;br /&gt;
==Thursday, March 3rd==&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Surveyed Papers=&lt;br /&gt;
&lt;br /&gt;
[1] Vladimir Brik, Suman Banerjee, Marco Gruteser, &amp;lt;i&amp;gt;Wireless device identification with radiometric signatures&amp;lt;/i&amp;gt;, University of Wisconsin at Madison, Madison, WI, USA, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.&lt;br /&gt;
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.&lt;br /&gt;
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, &amp;lt;i&amp;gt;Accurate, scalable in-network identification of p2p traffic using application signatures&amp;lt;/i&amp;gt;, AT&amp;amp;T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[3] Roger Clarke, Human Identification in Information Systems: Management Challenges and Public Policy Issues [http://www.emeraldinsight.com/journals.htm?articleid=883434&amp;amp;show=abstract PDF/HTML]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;Many information systems involve data about people. In order reliably to associate data with particular individuals, it is necessary that an effective and efficient identification scheme be established and maintained. There is remarkably little in the information technology literature concerning human identification. Seeks to overcome that deficiency by undertaking a survey of human identity and human identification. Discusses techniques including names, codes, knowledge-based and token-based identification, and biometrics. Identifies the key challenge to management as being to devise a scheme which is practicable and economic, and of sufficiently high integrity to address the risks the organization confronts in its dealings with people. Proposes that much greater use be made of schemes which are designed to afford people anonymity, or which enable them to use multiple identities or pseudonyms, while at the same time protecting the organization&#039;s own interest. Describes multi-purpose and inhabitant registration schemes, and notes the recurrence of proposals to implement and extend them. Identifies public policy issues. Of especial concern is the threat to personal privacy that the general-purpose use of an inhabitant registrant scheme represents. Speculates that, where such schemes are pursued energetically, the reaction may be strong enough to threaten the social fabric.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Milestones=&lt;br /&gt;
(Under Construction)&lt;br /&gt;
* Problem definition&lt;br /&gt;
* Literature review&lt;br /&gt;
* ??&lt;br /&gt;
&lt;br /&gt;
=Project Progress=&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
==Requirements==&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8360</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=8360"/>
		<updated>2011-03-10T15:54:38Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Surveyed Papers */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* Abdelrahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
=Meeting Briefings=&lt;br /&gt;
==Tuesday, March 1st==&lt;br /&gt;
After 20 minutes of brainstorming, we agreed on:&lt;br /&gt;
* The current internet infrastructure lacks the ability to support a highly scalable and efficient attribution mechanism.&lt;br /&gt;
* Attribution must be implemented in a distributed manner, and must be automated and not owned.&lt;br /&gt;
* Threats that should be addressed include (but are not limited to):&lt;br /&gt;
** Impersonation of computers, individuals, and applications.&lt;br /&gt;
** All types of electronic spoofing.&lt;br /&gt;
* The skeleton of our project will cover four main aspects:&lt;br /&gt;
** Tracing/Tracking: the baseline for attribution.&lt;br /&gt;
** Human identification: a MUST to include!&lt;br /&gt;
** Machine identification: to be merged with human identification.&lt;br /&gt;
** Storage: how and where to store data traces and the identification stamps.&lt;br /&gt;
==Thursday, March 3rd==&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=Surveyed Papers=&lt;br /&gt;
&lt;br /&gt;
[1] Vladimir Brik, Suman Banerjee, Marco Gruteser, &amp;lt;i&amp;gt;Wireless device identification with radiometric signatures&amp;lt;/i&amp;gt;, University of Wisconsin at Madison, Madison, WI, USA, 2008. [http://portal.acm.org/citation.cfm?id=1409959 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;We design, implement, and evaluate a technique to identify the source network interface card (NIC) of an IEEE 802.11 frame through passive radio-frequency analysis. This technique, called PARADIS, leverages minute imperfections of transmitter hardware that are acquired at manufacture and are present even in otherwise identical NICs. These imperfections are transmitter-specific and manifest themselves as artifacts of the emitted signals. In PARADIS, we measure differentiating artifacts of individual wireless frames in the modulation domain, apply suitable machine-learning classification tools to achieve significantly higher degrees of NIC identification accuracy than prior best known schemes.&lt;br /&gt;
We experimentally demonstrate effectiveness of PARADIS in differentiating between more than 130 identical 802.11 NICs with accuracy in excess of 99%. Our results also show that the accuracy of PARADIS is resilient against ambient noise and fluctuations of the wireless channel.&lt;br /&gt;
Although our implementation deals exclusively with IEEE 802.11, the approach itself is general and will work with any digital modulation scheme.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[2] Subhabrata Sen, Oliver Spatscheck, Dongmei Wang, &amp;lt;i&amp;gt;Accurate, scalable in-network identification of p2p traffic using application signatures&amp;lt;/i&amp;gt;, AT&amp;amp;T Labs-Research, Florham Park, NJ, 2004. [http://portal.acm.org/citation.cfm?id=988672.988742 PDF]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;The ability to accurately identify the network traffic associated with different P2P applications is important to a broad range of network operations including application-specific traffic engineering, capacity planning, provisioning, service differentiation, etc. However, traditional traffic to higher-level application mapping techniques such as default server TCP or UDP network-port based disambiguation is highly inaccurate for some P2P applications. In this paper, we provide an efficient approach for identifying the P2P application traffic through application level signatures. We first identify the application level signatures by examining some available documentations, and packet-level traces. We then utilize the identified signatures to develop online filters that can efficiently and accurately track the P2P traffic even on high-speed network links. We examine the performance of our application-level identification approach using five popular P2P protocols. Our measurements show that our technique achieves less than 5% false positive and false negative ratios in most cases. We also show that our approach only requires the examination of the very first few packets (less than 10 packets) to identify a P2P connection, which makes our approach highly scalable. Our technique can significantly improve the P2P traffic volume estimates over what pure network port based approaches provide. For instance, we were able to identify 3 times as much traffic for the popular Kazaa P2P protocol, compared to the traditional port-based approach.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
[3] Roger Clarke, Human Identification in Information Systems: Management Challenges and Public Policy Issues [http://www.emeraldinsight.com/journals.htm?articleid=883434&amp;amp;show=abstract PDF/HTML]&lt;br /&gt;
&lt;br /&gt;
*ABSTRACT&lt;br /&gt;
&amp;lt;i&amp;gt;Many information systems involve data about people. In order reliably to associate data with particular individuals, it is necessary that an effective and efficient identification scheme be established and maintained. There is remarkably little in the information technology literature concerning human identification. Seeks to overcome that deficiency by undertaking a survey of human identity and human identification. Discusses techniques including names, codes, knowledge-based and token-based identification, and biometrics. Identifies the key challenge to management as being to devise a scheme which is practicable and economic, and of sufficiently high integrity to address the risks the organization confronts in its dealings with people. Proposes that much greater use be made of schemes which are designed to afford people anonymity, or which enable them to use multiple identities or pseudonyms, while at the same time protecting the organization&#039;s own interest. Describes multi-purpose and inhabitant registration schemes, and notes the recurrence of proposals to implement and extend them. Identifies public policy issues. Of especial concern is the threat to personal privacy that the general-purpose use of an inhabitant registrant scheme represents. Speculates that, where such schemes are pursued energetically, the reaction may be strong enough to threaten the social fabric.&amp;lt;/i&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=Milestones=&lt;br /&gt;
(Under Construction)&lt;br /&gt;
* Problem definition&lt;br /&gt;
* Literature review&lt;br /&gt;
* ??&lt;br /&gt;
&lt;br /&gt;
=Project Progress=&lt;br /&gt;
Coming Soon!&lt;br /&gt;
&lt;br /&gt;
==Requirements==&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7854</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7854"/>
		<updated>2011-03-03T02:08:27Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders, and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and one for citizens only. Citizens elect leaders, who decide on the strategy of content generation; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrent, probably) every time the government changes, so that if the new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced via rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts for working with files, databases, and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it. The main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to the server files and databases, but now everyone has that access. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be placed under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the base of a collaboration system, or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen: [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
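&lt;br /&gt;
For example, a single unison command (the host name is a placeholder matching the setup below) syncs the local fwr directory with its counterpart under the server account&#039;s home directory:&lt;br /&gt;
&lt;br /&gt;
 unison /home/username/fwr ssh://username@centralserver.com/fwr -auto -batch&lt;br /&gt;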
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to initiate synchronization, we need to generate a key pair and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; this is the public key.&lt;br /&gt;
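&lt;br /&gt;
A citizen who receives this public key can then authorize it on their own machine, so that the server may connect in. A minimal sketch, assuming the key was saved as id_dsa.pub in the current directory:&lt;br /&gt;
&lt;br /&gt;
 cat id_dsa.pub &amp;gt;&amp;gt; ~/.ssh/authorized_keys&lt;br /&gt;
 chmod 600 ~/.ssh/authorized_keys&lt;br /&gt;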
&lt;br /&gt;
&lt;br /&gt;
3. Add a user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then create the directory where all synced data will be stored:&lt;br /&gt;
&lt;br /&gt;
 mkdir /home/username/fwr&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a key pair for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This allows connecting to the central server without entering a password, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who can then install it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (the OpenSSH daemon, allowing remote connections from clients)&lt;br /&gt;
* httpd (the Apache web server)&lt;br /&gt;
* watcher.py (monitoring script; syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and IP addresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
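&lt;br /&gt;
The fwr_clients file is read line by line by fwr.sh (shown below), so each line should hold one &#039;user@host&#039; entry in the form unison expects. A hypothetical example:&lt;br /&gt;
&lt;br /&gt;
 citizen1@192.168.1.10&lt;br /&gt;
 citizen2@192.168.1.11&lt;br /&gt;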
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization.&lt;br /&gt;
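&lt;br /&gt;
One way to protect the citizen site, as a sketch (file locations are placeholders), is HTTP basic authentication in /home/username/fwr/www/.htaccess, with the password file kept outside the synced directory:&lt;br /&gt;
&lt;br /&gt;
 AuthType Basic&lt;br /&gt;
 AuthName &amp;quot;FWR citizens only&amp;quot;&lt;br /&gt;
 AuthUserFile /home/username/fwr_htpasswd&lt;br /&gt;
 Require valid-user&lt;br /&gt;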
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
Watcher was installed in step 5 of the server setup above; now configure it.&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (client sync) every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the client machine&#039;s hostname (typically accessed locally). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the central server or with other clients.&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 # require exactly one argument&lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 # csync: sync with every client listed in fwr_clients&lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;Client list $CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      while read -r line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$line/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
      done &amp;lt; &amp;quot;$CLIENT_LIST&amp;quot;&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 # sync: sync with the central server&lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$FWR_SERVER_PATH/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=What now?=&lt;br /&gt;
So now we have a sort of pseudo-distributed web server. Does it work? The system was tested on a local wifi network with the following setup:&lt;br /&gt;
* central server: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
* 3 clients, each one: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
&lt;br /&gt;
Each client runs its own httpd, and every modification triggers the fwr.sh script to sync with the central server. This modifies data on the central server, and those modifications in turn trigger fwr.sh to sync with all clients consecutively. This configuration works quite fast, but further increasing the number of clients will lead to major delays and potential inconsistencies. &lt;br /&gt;
&lt;br /&gt;
Each client can create its own list of clients in the &#039;fwr_clients&#039; file and synchronize with them, without touching the central server, by issuing &#039;fwr.sh csync&#039;. Different topologies can be used depending on workload and synchronization frequency. &lt;br /&gt;
&lt;br /&gt;
To deal with inconsistencies and data vandalism, a version control system can be introduced, installed at least on the central server but ideally on every client as well. The configuration should be fairly straightforward, and there is plenty of documentation on combining rsync or unison with SVN or another version control system. A more obvious solution is to replace unison with git. Git can synchronize directories, but it also keeps versions and branches and can merge two or more development histories. You need to install git-core and use something like this to pull in changes:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=…&lt;br /&gt;
 &lt;br /&gt;
 function doGitPull {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #pwd &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #whoami &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #printenv &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git pull --verbose $sPathToRemoteGit HEAD &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the start of a pull in the log&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the end of a pull in the log&lt;br /&gt;
 echo &amp;quot;&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
&lt;br /&gt;
And something like this to push (sync):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=….&lt;br /&gt;
 &lt;br /&gt;
 function doGitPush {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git add --verbose -A &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git commit --verbose -m &amp;quot;bkup&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git push --verbose $sPathToRemoteGit master --receive-pack=&#039;git receive-pack&#039; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the start of a push in the log&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPush &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These two scripts should be called from &#039;fwr.sh&#039; in place of the single unison call.&lt;br /&gt;
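&lt;br /&gt;
As a sketch, assuming the two scripts above are saved as git-push.sh and git-pull.sh (hypothetical names) in the home directory, the &#039;sync&#039; branch of fwr.sh would become:&lt;br /&gt;
&lt;br /&gt;
 if [ $1 == &#039;sync&#039; ]; then&lt;br /&gt;
      # push local changes first, then pull in everyone else&#039;s&lt;br /&gt;
      ~/git-push.sh&lt;br /&gt;
      ~/git-pull.sh&lt;br /&gt;
 fi&lt;br /&gt;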
&lt;br /&gt;
&lt;br /&gt;
We already have a pretty useful system that suits at least one use case: collaborative web development. But what about democracy and the republic? We still need officials to be elected, but what should they do? Since everything now belongs to everyone, the only responsibility left is looking after the central server. This job is important, but not crucial: even if an elected official performs a bad synchronization and then issues &amp;quot;rm -rf /&amp;quot; on the central server, the data is not lost and the system still works (especially if interconnections between clients exist, as described above). So we can elect officials without really worrying about the risks.&lt;br /&gt;
&lt;br /&gt;
The only thing the central server is actually needed for is internet access to the data generated by clients. If the community&#039;s goal does not include this requirement, the central server is no longer needed. In that case it is important to choose the right topology of client interconnections, but the good news is: we no longer need officials!&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
&lt;br /&gt;
I learned that many good and useful things can be built with existing tools and technology. Projects like Facebook, Twitter and Dropbox support this thought: none of them contributed anything truly revolutionary technology-wise, yet they are very successful. I believe there are many possibilities for building great distributed systems; it is largely a question of engineering, not necessarily of invention. &lt;br /&gt;
&lt;br /&gt;
While setting this system up, I understood how easy it is to move towards a more distributed configuration, and everyone should be able to do so, at least for the sake of data safety (backups). Even though the technology is here and available, people still trust companies and corporations to do this for them. There is no good reason for these mediators to stand between people and any useful technology.  &lt;br /&gt;
&lt;br /&gt;
In this particular instance, I can say that the pseudo-distributed web server works quite well with a small number of clients. Minor changes are synced almost seamlessly; major changes from different contributors can be stored separately (assuming some version control system is in place) or merged. There are a lot of ideas to try with this setup. Potentially, if all clients were visible on the internet under a single name, the web server could be truly distributed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
* Debian http://www.debian.org/&lt;br /&gt;
* Unison http://www.cis.upenn.edu/~bcpierce/unison/&lt;br /&gt;
* OpenSSH http://www.openssh.com/&lt;br /&gt;
* Apache http://www.apache.org/&lt;br /&gt;
* Watcher https://github.com/splitbrain/Watcher&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7853</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7853"/>
		<updated>2011-03-03T02:06:06Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, and elect leaders or be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts: a public one and a citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in peer-to-peer fashion (most likely via torrent) every time the government changes, so that if a new government breaks things, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR: every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced via rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts for working with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it. The main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to server files and databases, but now everyone has that access. The only essential task left is moderating the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be placed under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system, or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen: [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to initiate synchronization, we need to generate a key pair and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; this is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then create the directory where all synced data will be stored:&lt;br /&gt;
&lt;br /&gt;
 mkdir /home/username/fwr&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a key pair for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This allows connecting to the central server without entering a password, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who can then install it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (the OpenSSH daemon, allowing remote connections from clients)&lt;br /&gt;
* httpd (the Apache web server)&lt;br /&gt;
* watcher.py (monitoring script; syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and IP addresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
Watcher was installed in step 5 of the server setup above; now configure it.&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (client sync) every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the client machine&#039;s hostname (typically accessed locally). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the central server or with other clients.&lt;br /&gt;
&lt;br /&gt;
 #! /bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;Client list $CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      cat &amp;quot;$CLIENT_LIST&amp;quot; | while read -r line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$line/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
      done&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$FWR_SERVER_PATH&amp;quot; -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=What now?=&lt;br /&gt;
So, now we have some sort of pseudo-distributed web server. Does it work? The system was tested in a local wifi network with the following setup:&lt;br /&gt;
* central server: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
* 3 clients, each one: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
&lt;br /&gt;
Each client ran its own httpd, and every modification triggered the fwr.sh script to sync with the central server. That modified the data on the central server, and those modifications in turn triggered fwr.sh to sync with all clients consecutively. This configuration works quite fast, but a growing number of clients will lead to major delays and potential inconsistencies. &lt;br /&gt;
&lt;br /&gt;
Each client can create its own list of clients in the &#039;fwr_clients&#039; file and synchronize with them, without touching the central server, by issuing &#039;fwr.sh csync&#039;. Different topologies can be used depending on the workload and the synchronization frequency. &lt;br /&gt;
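&lt;br /&gt;
The format of the &#039;fwr_clients&#039; file is never shown above; a plausible layout, matching the ssh://$line/fwr replica URLs that fwr.sh builds, is one user@host target per line. A minimal sketch (the addresses are made up) that dry-runs the csync loop by printing the unison commands instead of executing them:&lt;br /&gt;

```shell
#!/bin/bash
# Dry run of fwr.sh's csync loop. The 'fwr_clients' layout is an assumption
# (the article never shows it): one "user@host" SSH target per line,
# matching the ssh://$line/fwr replica URLs that fwr.sh constructs.
CLIENT_LIST=$(mktemp)
printf '%s\n' 'citizen1@192.168.1.11' 'citizen2@192.168.1.12' > "$CLIENT_LIST"

# Print the unison command that fwr.sh would run for each listed client.
cat "$CLIENT_LIST" | while read -r line; do
    echo "unison /home/username/fwr ssh://$line/fwr -auto -silent -batch"
done
rm -f "$CLIENT_LIST"
```

Replacing echo with the real unison call gives exactly the csync branch of fwr.sh.&lt;br /&gt;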
&lt;br /&gt;
To deal with inconsistencies and data vandalism, a version control system can be introduced, installed at least on the central server but ideally on every client as well. The configuration should be fairly straightforward, and there is plenty of documentation on combining rsync or unison with SVN or another version control system. A more elegant solution is to replace unison with git: git synchronizes directories, but it also keeps versions and branches and can merge two or more development histories. Install git-core and use something like this to pull in changes:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=…&lt;br /&gt;
 &lt;br /&gt;
 function doGitPull {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #pwd &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #whoami &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #printenv &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git pull --verbose $sPathToRemoteGit HEAD &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Visual separator before this run&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Visual separator after this run&lt;br /&gt;
 echo &amp;quot;&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Blank line between runs&lt;br /&gt;
&lt;br /&gt;
And something like this to push (sync):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=….&lt;br /&gt;
 &lt;br /&gt;
 function doGitPush {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git add --verbose -A &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git commit --verbose -m &amp;quot;bkup&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git push --verbose $sPathToRemoteGit master --receive-pack=&#039;git receive-pack&#039; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Visual separator before this run&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of common-push &amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPush &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These two scripts should be called consecutively from &#039;fwr.sh&#039; in place of the single unison call.&lt;br /&gt;
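&lt;br /&gt;
A minimal sketch of how fwr.sh&#039;s &#039;sync&#039; branch might chain the two scripts (the file names git-pull.sh and git-push.sh are assumptions; save the scripts above under any names and adjust the paths):&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical replacement for the 'sync' branch of fwr.sh: pull first so
# the remote history is merged locally, then push our own commits.
PULL_SCRIPT="$HOME/git-pull.sh"   # assumed name for the pull script above
PUSH_SCRIPT="$HOME/git-push.sh"   # assumed name for the push script above

if [ "$1" == 'sync' ]; then
    # Only push if the pull (fetch + merge) succeeded.
    if bash "$PULL_SCRIPT"; then
        bash "$PUSH_SCRIPT"
    fi
fi
```

Pushing without pulling first would be rejected whenever the central repository has commits the client has not seen yet, so the order matters.&lt;br /&gt;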
&lt;br /&gt;
&lt;br /&gt;
We already have a pretty useful system that suits at least one use case: collaborative web development. But what about democracy and the republic? We still need some officials to be elected, but what should they do? Since everything now belongs to everyone, the only responsibility left is looking after the central server. This job is important but not crucial: even if an elected official botches a data synchronization and then issues &amp;quot;rm -rf /&amp;quot; on the central server, the data is not lost and the system keeps working (especially if some interconnections between clients exist, as described above). So we can elect officials without worrying much about the risks.&lt;br /&gt;
&lt;br /&gt;
The only thing the central server is actually needed for is internet access to the data generated by clients. If the community&#039;s goals do not include this requirement, the central server is no longer needed at all. Choosing the right topology of client interconnections becomes important in this case, but the good news is: we no longer need officials!&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
&lt;br /&gt;
I learned that many good and useful things can be built with present tools and technology. Projects like Facebook, Twitter, and Dropbox only support this thought: none of them contributed anything truly revolutionary technology-wise, yet they are very successful. I believe there are many possibilities for building great distributed systems; it is mostly a question of engineering, not necessarily of invention. &lt;br /&gt;
&lt;br /&gt;
While setting this system up, I understood how easy it is to move towards a more distributed configuration, and everyone should be able to do so, at least for the sake of data safety (backups). Even though the technology is here and available, people still trust companies and corporations to do this for them. There is no reason for mediators to stand between people and any useful technology. &lt;br /&gt;
&lt;br /&gt;
In this particular instance, I can say that the pseudo-distributed web server works quite well with a small number of clients. Minor changes are synced almost seamlessly, and major changes from different contributors can be stored separately (assuming a version control system is in place) or merged. There are a lot of ideas to try with this setup. Potentially, if all the clients were visible on the internet under a single name, the web server could become truly distributed.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=7851</id>
		<title>DistOS-2011W Attribution</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_Attribution&amp;diff=7851"/>
		<updated>2011-03-03T02:03:59Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Members==&lt;br /&gt;
* Abdelrahman Abdou&lt;br /&gt;
* Raghad Al-Awwad&lt;br /&gt;
* Omi Iyamu&lt;br /&gt;
* Rakhim Davletkaliyev&lt;br /&gt;
&lt;br /&gt;
==Areas==&lt;br /&gt;
* identification&lt;br /&gt;
* tracing&lt;br /&gt;
* storage&lt;br /&gt;
&lt;br /&gt;
==Requirements==&lt;br /&gt;
* incremental deployability&lt;br /&gt;
* privacy&lt;br /&gt;
&lt;br /&gt;
==Readings==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;really hard to find anything not from psychology&#039;&#039;&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7848</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7848"/>
		<updated>2011-03-03T01:58:59Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of the FWR), communicate freely, elect leaders, and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts: a public one and a for-citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach cannot be described as fully democratic, since the owner of the root password on the server still holds complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute the copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrent, probably) every time government changes, so that if new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to overall distribution of FWR – every copy is fully-functional and can be set as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
CMS is running on the main server, but in read-only mode. The data (files, databases, etc) is synced by rsync from client machines. Clients are citizens, and only they can write to synced directories. &lt;br /&gt;
Local http server on client machine runs fully-functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to the server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with other clients. Also, local and server storage can be put under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen: [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either of them. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
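&lt;br /&gt;
The command-line flags used later can also be kept in a profile, which unison reads from ~/.unison/. A hypothetical profile (the file name, host, and paths are placeholders matching the examples below):&lt;br /&gt;

```
# ~/.unison/fwr.prf  (hypothetical profile name)
root = /home/username/fwr
root = ssh://username@centralserver.com/fwr
auto = true
batch = true
silent = true
```

With this in place, &#039;unison fwr&#039; performs the same synchronization as the explicit command line used in fwr.sh below.&lt;br /&gt;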
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both the server and the clients. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the services.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
inotify-based utility will be monitoring our &#039;fwr&#039; directory on server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny python utility called [https://github.com/splitbrain/Watcher Watcher] which uses Linux kernel&#039;s inotify via [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to hand the key to the central server&#039;s administrator (an elected official), who will then install it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher.py (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and ip-adresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
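&lt;br /&gt;
One possible sketch of that .htaccess, using the HTTP basic-auth directives of the Apache versions of that era (the password-file path and realm name are assumptions):&lt;br /&gt;

```
# /home/username/fwr/www/.htaccess  (sketch; paths are placeholders)
AuthType Basic
AuthName "FWR citizens only"
AuthUserFile /home/username/.fwr_htpasswd
Require valid-user
```

Accounts are created with &#039;htpasswd -c /home/username/.fwr_htpasswd citizen1&#039;. Note that for these directives to take effect, the Directory block above must also list AuthConfig in its AllowOverride line, and the file can be excluded from syncing with unison&#039;s ignore preference (ignore = Name .htaccess).&lt;br /&gt;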
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
inotify-based utility will be monitoring our &#039;fwr&#039; directory on server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny python utility called [https://github.com/splitbrain/Watcher Watcher] which uses Linux kernel&#039;s inotify via [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (clients sync) every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the central server or with other clients.&lt;br /&gt;
&lt;br /&gt;
 #! /bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;Client list $CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      cat &amp;quot;$CLIENT_LIST&amp;quot; | while read -r line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$line/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
      done&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$FWR_SERVER_PATH&amp;quot; -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=What now?=&lt;br /&gt;
So, now we have some sort of pseudo-distributed web server. Does it work? The system was tested in a local wifi network with the following setup:&lt;br /&gt;
* central server: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
* 3 clients, each one: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
&lt;br /&gt;
Each client ran its own httpd, and every modification triggered the fwr.sh script to sync with the central server. That modified the data on the central server, and those modifications in turn triggered fwr.sh to sync with all clients consecutively. This configuration works quite fast, but a growing number of clients will lead to major delays and potential inconsistencies. &lt;br /&gt;
&lt;br /&gt;
Each client can create its own list of clients in the &#039;fwr_clients&#039; file and synchronize with them, without touching the central server, by issuing &#039;fwr.sh csync&#039;. Different topologies can be used depending on the workload and the synchronization frequency. &lt;br /&gt;
&lt;br /&gt;
To deal with inconsistencies and data vandalism, a version control system can be introduced, installed at least on the central server but ideally on every client as well. The configuration should be fairly straightforward, and there is plenty of documentation on combining rsync or unison with SVN or another version control system. A more elegant solution is to replace unison with git: git synchronizes directories, but it also keeps versions and branches and can merge two or more development histories. Install git-core and use something like this to pull in changes:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=…&lt;br /&gt;
 &lt;br /&gt;
 function doGitPull {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #pwd &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #whoami &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #printenv &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git pull --verbose $sPathToRemoteGit HEAD &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Visual separator before this run&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Visual separator after this run&lt;br /&gt;
 echo &amp;quot;&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Blank line between runs&lt;br /&gt;
&lt;br /&gt;
And something like this to push (sync):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=….&lt;br /&gt;
 &lt;br /&gt;
 function doGitPush {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git add --verbose -A &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git commit --verbose -m &amp;quot;bkup&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git push --verbose $sPathToRemoteGit master --receive-pack=&#039;git receive-pack&#039; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Visual separator before this run&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of common-push &amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPush &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These two scripts should be called consecutively from &#039;fwr.sh&#039; in place of the single unison call.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We already have a pretty useful system that suits at least one use case: collaborative web development. But what about democracy and the republic? We still need some officials to be elected, but what should they do? Since everything now belongs to everyone, the only responsibility left is looking after the central server. This job is important but not crucial: even if an elected official botches a data synchronization and then issues &amp;quot;rm -rf /&amp;quot; on the central server, the data is not lost and the system keeps working (especially if some interconnections between clients exist, as described above). So we can elect officials without worrying much about the risks.&lt;br /&gt;
&lt;br /&gt;
The only thing the central server is actually needed for is internet access to the data generated by clients. If the community&#039;s goals do not include this requirement, the central server is no longer needed at all. Choosing the right topology of client interconnections becomes important in this case, but the good news is: we no longer need officials!&lt;br /&gt;
&lt;br /&gt;
=Conclusions=&lt;br /&gt;
&lt;br /&gt;
I learned that many good and useful things can be built with present tools and technology. Projects like Facebook, Twitter, and Dropbox only support this thought: none of them contributed anything truly revolutionary technology-wise, yet they are very successful. I believe there are many possibilities for building great distributed systems; it is mostly a question of engineering, not necessarily of invention. &lt;br /&gt;
&lt;br /&gt;
While setting this system up, I understood how easy it is to move towards a more distributed configuration, and everyone should be able to do so, at least for the sake of data safety (backups). Even though the technology is here and available, people still trust companies and corporations to do this for them. There is no reason for mediators to stand between people and any useful technology. &lt;br /&gt;
&lt;br /&gt;
In this particular instance, I can say that the pseudo-distributed web server works quite well with a small number of clients. Minor changes are synced almost seamlessly, and major changes from different contributors can be stored separately (assuming a version control system is in place) or merged. There are a lot of ideas to try with this setup. Potentially, if all the clients were visible on the internet under a single name, the web server could become truly distributed.&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7707</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7707"/>
		<updated>2011-03-01T16:55:22Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* What now? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of the FWR), communicate freely, elect leaders, and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts: a public one and a for-citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach cannot be described as fully democratic, since the owner of the root password on the server still holds complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrent) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced via rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it. The main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone has that access. The only essential task left is moderating the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with other clients. Also, local and server storage can be placed under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
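The unison calls used below can also be collected in a profile, so a single &#039;unison fwr&#039; runs the whole sync. A sketch only – the paths, hostname and ignore pattern are placeholders for this setup:&lt;br /&gt;

```
# ~/.unison/fwr.prf -- example Unison profile (paths and host are placeholders)
root = /home/username/fwr
root = ssh://username@centralserver.com//home/username/fwr
batch = true
auto = true
ignore = Name .htaccess
```
&lt;br /&gt;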
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to invoke synchronization, we need to generate a key pair and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the daemons.&lt;br /&gt;
Then create the directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher.py (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and IP addresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
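The fwr_clients file is just a plain list of &#039;user@host&#039; entries, one per line, matching what fwr.sh expects. A hypothetical example (the names and addresses are made up):&lt;br /&gt;

```shell
# Build a sample fwr_clients file; each placeholder entry is one citizen machine.
printf '%s\n' 'alice@10.0.0.11' 'bob@10.0.0.12' 'carol@10.0.0.13' > fwr_clients
# fwr.sh reads this file line by line and runs one unison sync per entry.
cat fwr_clients
```
&lt;br /&gt;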
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart apache. &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be ignored during synchronization. This will be described later.&lt;br /&gt;
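One option that the AllowOverride Limit setting above already permits is an IP-based restriction. A hypothetical sketch – the network range is an assumption, not part of this setup:&lt;br /&gt;

```
# /home/username/fwr/www/.htaccess -- hypothetical example (Apache 2.2 syntax):
# allow only clients from the local network to reach the citizen site.
Order deny,allow
Deny from all
Allow from 10.0.0.0/24
```
&lt;br /&gt;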
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (client sync) every time any modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart apache. &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the server or with other clients.&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;Client list $CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      # one unison run per &#039;user@host&#039; line in the client list&lt;br /&gt;
      cat &amp;quot;$CLIENT_LIST&amp;quot; | while read line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$line/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
      done&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$FWR_SERVER_PATH/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=What now?=&lt;br /&gt;
So now we have a sort of pseudo-distributed web server. Does it work? The system was tested in a local wifi network with the following setup:&lt;br /&gt;
* central server: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
* 3 clients, each one: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
&lt;br /&gt;
Each client ran its own httpd, and every modification triggered the fwr.sh script to sync with the central server. This modifies data on the central server, and those modifications trigger fwr.sh to sync with all clients consecutively. This configuration works pretty fast, but a larger number of clients would lead to major delays and potential inconsistencies. &lt;br /&gt;
&lt;br /&gt;
Each client can create its own list of clients in the &#039;fwr_clients&#039; file and synchronize with them without touching the central server by issuing &#039;fwr.sh csync&#039;. Different topologies can be used depending on the workload and frequency of synchronization. &lt;br /&gt;
&lt;br /&gt;
To deal with inconsistencies and data vandalism, a version control system can be installed at least on the central server, but ideally on every client as well. The configuration should be pretty straightforward, and there is plenty of documentation on combining rsync or unison with SVN or another version control system. A more obvious solution is to replace unison with git. Git synchronizes directories, but it also keeps versions and branches and can merge two or more development histories. You need to install git-core and use something like this to pull in changes:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=…&lt;br /&gt;
 &lt;br /&gt;
 function doGitPull {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git pull --verbose $sPathToRemoteGit HEAD &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # mark the start of a pull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 doGitPull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # mark the end of a pull&lt;br /&gt;
 echo &amp;quot;&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # blank separator line&lt;br /&gt;
&lt;br /&gt;
And something like this to push (sync):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=….&lt;br /&gt;
 &lt;br /&gt;
 function doGitPush {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git add --verbose -A &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git commit --verbose -m &amp;quot;bkup&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git push --verbose $sPathToRemoteGit master --receive-pack=&#039;git receive-pack&#039; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # mark the start of a push&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 doGitPush &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # mark the end of a push&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These two scripts should be called consecutively from &#039;fwr.sh&#039; instead of the single unison call.&lt;br /&gt;
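As a minimal sketch of that replacement, assuming the two scripts define doGitPull and doGitPush as above (the stubs here exist only to demonstrate the ordering and are not the real implementations):&lt;br /&gt;

```shell
# The 'sync' branch of fwr.sh would pull remote changes first, then push local ones.
sync_with_git() {
    doGitPull    # merge remote history into the local working copy
    doGitPush    # then commit and publish local modifications
}

# Stubs standing in for the functions from the two scripts above:
doGitPull() { echo "pull"; }
doGitPush() { echo "push"; }

sync_with_git
```
&lt;br /&gt;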
&lt;br /&gt;
&lt;br /&gt;
We already have a pretty useful system that suits at least one use case: collaborative web development. But what about democracy and the republic? We still need some officials to be elected, but what should they do? Since everything now belongs to everyone, the only responsibility left is looking after the central server. This job is important, but not crucial: even if an elected official botches a data synchronization and then issues &amp;quot;rm -rf /&amp;quot; on the central server, the data is not lost and the system still works (especially if some interconnection between clients exists, as described above). So we can elect officials without really worrying about the risks.&lt;br /&gt;
&lt;br /&gt;
The only thing the central server is actually needed for is internet access to the data generated by clients. But if the community&#039;s goals don&#039;t include this requirement, the central server is no longer needed. In that case it is important to choose the right topology of client interconnections, but the good thing is: we no longer need officials!&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7706</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7706"/>
		<updated>2011-03-01T16:54:56Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* What now? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and one for citizens only. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the holder of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrent) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced via rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it. The main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone has that access. The only essential task left is moderating the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with other clients. Also, local and server storage can be placed under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to invoke synchronization, we need to generate a key pair and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the daemons.&lt;br /&gt;
Then create the directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher.py (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and IP addresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart apache. &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be ignored during synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (client sync) every time any modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart apache. &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as a daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the central server or with other clients.&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;Client list $CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      # Sync with every client listed in fwr_clients, one user@host per line&lt;br /&gt;
      while read line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$line/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
      done &amp;lt; &amp;quot;$CLIENT_LIST&amp;quot;&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      # Sync the local copy with the central server&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$FWR_SERVER_PATH/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=What now?=&lt;br /&gt;
So now we have some sort of pseudo-distributed web server. Does it work? The system was tested in a local wifi network with the following setup:&lt;br /&gt;
* central server: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
* 3 clients, each one: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
&lt;br /&gt;
Each client ran its own httpd, and every modification triggered the fwr.sh script to sync with the central server. That in turn modified data on the central server, and those modifications triggered fwr.sh to sync with all clients consecutively. This configuration works reasonably fast, but a further increase in the number of clients will lead to major delays and potential inconsistencies. &lt;br /&gt;
&lt;br /&gt;
Each client can create its own list of clients in an &#039;fwr_clients&#039; file and synchronize with them without touching the central server by issuing &#039;fwr.sh csync&#039;. Different topologies can be used depending on the workload and the synchronization frequency. &lt;br /&gt;
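The &#039;fwr_clients&#039; file itself is just a plain list of user@host entries, one per line, in the form the unison call in fwr.sh expects. As an illustration (the user names and addresses below are made up):&lt;br /&gt;
&lt;br /&gt;
 alice@192.168.1.101&lt;br /&gt;
 bob@192.168.1.102&lt;br /&gt;
 carol@client3.example.org&lt;br /&gt;
&lt;br /&gt;
With such a list, &#039;fwr.sh csync&#039; runs &#039;unison /home/username/fwr ssh://alice@192.168.1.101/fwr ...&#039; for each entry in turn.&lt;br /&gt;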
&lt;br /&gt;
To deal with inconsistencies and data vandalism, a version control system can be introduced, installed at least on the central server but ideally on every client as well. The configuration should be fairly straightforward, and there is plenty of documentation on combining rsync or unison with SVN or another version control system. A more obvious solution is to replace unison with git: git can synchronize directories, but it also keeps versions and branches and can merge two or more development histories. You need to install git-core and use something like this to pull in changes:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=…&lt;br /&gt;
 &lt;br /&gt;
 function doGitPull {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #pwd &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #whoami &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #printenv &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git pull --verbose $sPathToRemoteGit HEAD &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the start of a log entry&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the end of a log entry&lt;br /&gt;
 echo &amp;quot;&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
&lt;br /&gt;
And something like this to push (sync):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=….&lt;br /&gt;
 &lt;br /&gt;
 function doGitPush {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git add --verbose -A &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git commit --verbose -m &amp;quot;bkup&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git push --verbose $sPathToRemoteGit master --receive-pack=&#039;git receive-pack&#039; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the start of a log entry&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPush&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the end of a log entry&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These two scripts must be called consecutively from &#039;fwr.sh&#039; instead of the single unison call.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We already have a pretty useful system that suits at least one use-case: collaborative web development. But what about democracy and the republic? We still need some officials to be elected, but what should they do? Since everything now belongs to everyone, the only responsibility left is to look after the central server. This job is important, but not crucial: even if an elected official does a bad data synchronization and then issues &amp;quot;rm -rf /&amp;quot; on the central server, the data is not lost and the system still works (especially if some interconnection between clients exists, as described above). So we can elect officials without really worrying about the risks.&lt;br /&gt;
&lt;br /&gt;
The only thing the central server is actually needed for is internet access to the data generated by clients. If the community&#039;s goals don&#039;t include this requirement, the central server is no longer needed at all. In that case it is important to choose the right topology of client interconnections, but the good news is: we no longer need officials!&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7705</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7705"/>
		<updated>2011-03-01T16:54:28Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* What now? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed to have two parts: a public one and a for-citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrents, probably) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR: every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure that was in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from the client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So even though we still have a main server for public access, the system does not depend on it; the main server can easily be replaced. &lt;br /&gt;
The system still runs while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to the server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that disallow particular people from syncing with the main server or with clients. Also, local and server storage can be placed under a version control system, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the base of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen: [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both the server and the clients. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user: &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
The inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
Each client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both the server and the clients. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This allows connecting to the central server without entering a password, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who can then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher.py (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and ip-adresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
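For reference, the fwr_clients file is just a plain list of user@host entries, one per line, matching what the unison call in fwr.sh expects (the user names and addresses below are made up):&lt;br /&gt;
&lt;br /&gt;
 alice@192.168.1.101&lt;br /&gt;
 bob@192.168.1.102&lt;br /&gt;
 carol@client3.example.org&lt;br /&gt;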
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be ignored during synchronization.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
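For those who do keep the citizen site on the public server, a minimal sketch of .htaccess protection with HTTP Basic auth might look like this (the password-file path is an example, and the &amp;lt;Directory&amp;gt; block above would also need AuthConfig in its AllowOverride line for these directives to take effect):&lt;br /&gt;
&lt;br /&gt;
 AuthType Basic&lt;br /&gt;
 AuthName &amp;quot;FWR citizens only&amp;quot;&lt;br /&gt;
 AuthUserFile /home/username/.fwr_htpasswd&lt;br /&gt;
 Require valid-user&lt;br /&gt;
&lt;br /&gt;
The password file can be created with &#039;htpasswd -c /home/username/.fwr_htpasswd citizenname&#039;, and unison can be told to skip the file with the preference -ignore &amp;quot;Name .htaccess&amp;quot;.&lt;br /&gt;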
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
The inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as a daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (clients sync) every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as a daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the central server or with other clients.&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;Client list $CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      # Sync with every client listed in fwr_clients, one user@host per line&lt;br /&gt;
      while read line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$line/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
      done &amp;lt; &amp;quot;$CLIENT_LIST&amp;quot;&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      # Sync the local copy with the central server&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; &amp;quot;ssh://$FWR_SERVER_PATH/fwr&amp;quot; -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=What now?=&lt;br /&gt;
So now we have some sort of pseudo-distributed web server. Does it work? The system was tested in a local wifi network with the following setup:&lt;br /&gt;
* central server: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
* 3 clients, each one: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
&lt;br /&gt;
Each client ran its own httpd, and every modification triggered the fwr.sh script to sync with the central server. That in turn modified data on the central server, and those modifications triggered fwr.sh to sync with all clients consecutively. This configuration works reasonably fast, but a further increase in the number of clients will lead to major delays and potential inconsistencies. &lt;br /&gt;
&lt;br /&gt;
Each client can create its own list of clients in an &#039;fwr_clients&#039; file and synchronize with them without touching the central server by issuing &#039;fwr.sh csync&#039;. Different topologies can be used depending on the workload and the synchronization frequency. &lt;br /&gt;
&lt;br /&gt;
To deal with inconsistencies and data vandalism, a version control system can be introduced, installed at least on the central server but ideally on every client as well. The configuration should be fairly straightforward, and there is plenty of documentation on combining rsync or unison with SVN or another version control system. A more obvious solution is to replace unison with git: git can synchronize directories, but it also keeps versions and branches and can merge two or more development histories. You need to install git-core and use something like this to pull in changes:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=…&lt;br /&gt;
 &lt;br /&gt;
 function doGitPull {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #pwd &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #whoami &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #printenv &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git pull --verbose $sPathToRemoteGit HEAD &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the start of a log entry&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the end of a log entry&lt;br /&gt;
 echo &amp;quot;&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
&lt;br /&gt;
And something like this to push (sync):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=….&lt;br /&gt;
 &lt;br /&gt;
 function doGitPush {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git add --verbose -A &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git commit --verbose -m &amp;quot;bkup&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git push --verbose $sPathToRemoteGit master --receive-pack=&#039;git receive-pack&#039; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the start of a log entry&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPush&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of common-push&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Mark the end of a log entry&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These two scripts must be called consecutively from &#039;fwr.sh&#039; instead of the single unison call.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
We already have a pretty useful system that suits at least one use-case: collaborative web development. But what about democracy and the republic? We still need some officials to be elected, but what should they do? Since everything now belongs to everyone, the only responsibility left is to look after the central server. This job is important, but not crucial: even if an elected official does a bad data synchronization and then issues &amp;quot;rm -rf /&amp;quot; on the central server, the data is not lost and the system still works (especially if some interconnection between clients exists, as described above). So we can elect officials without really worrying about the risks.&lt;br /&gt;
&lt;br /&gt;
The only thing the central server is actually needed for is internet access to the data generated by clients. If the community&#039;s goals don&#039;t include this requirement, the central server is no longer needed at all. In that case it is important to choose the right topology of client interconnections, but the good news is: we no longer need officials!&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7704</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7704"/>
		<updated>2011-03-01T16:53:46Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed to have two parts: a public one and a for-citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrents, probably) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR: every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure that was in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from the client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So even though we still have a main server for public access, the system does not depend on it; the main server can easily be replaced. &lt;br /&gt;
The system still runs while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to the server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that disallow particular people from syncing with the main server or with clients. Also, local and server storage can be placed under a version control system, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the base of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen: [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both the server and the clients. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then, create a directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both the server and the client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to hand the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher.py (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and IP addresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
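&lt;br /&gt;
Since fwr.sh reads fwr_clients line by line and splices each line into an ssh URL, each line is expected to be a username@address pair. A hypothetical example (the names and addresses are illustrative, not part of the setup):&lt;br /&gt;

```
# Hypothetical contents of /home/username/fwr_clients:
# one client per line, as username@ip-address, exactly as
# fwr.sh passes it to unison as ssh://USERNAME@ADDRESS/fwr
alice@192.168.1.10
bob@192.168.1.11
```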
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization (unison&#039;s ignore preference can do this, e.g. -ignore &#039;Name .htaccess&#039;).&lt;br /&gt;
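&lt;br /&gt;
A minimal sketch of such protection, assuming HTTP Basic authentication with a password file created by htpasswd; the password-file location is an assumption, and note that the AllowOverride line shown above would need AuthConfig added for these directives to take effect:&lt;br /&gt;

```
# Hypothetical /home/username/fwr/www/.htaccess: restrict the citizen
# site to authenticated users (requires AllowOverride AuthConfig)
AuthType Basic
AuthName "FWR citizens only"
AuthUserFile /home/username/.htpasswd
Require valid-user
```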
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (client sync) every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the server or with the clients.&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;$CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      while read -r line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; ssh://$line/fwr -auto -silent -batch&lt;br /&gt;
      done &amp;lt; &amp;quot;$CLIENT_LIST&amp;quot;&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; ssh://$FWR_SERVER_PATH -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=What now?=&lt;br /&gt;
So, now we have some sort of pseudo-distributed web server. Does it work? The system was tested in a local wifi network with the following setup:&lt;br /&gt;
* central server: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
* 3 clients, each one: Debian Squeeze + Apache + [http://www.pluck-cms.org/?file=kop1.php Pluck CMS]&lt;br /&gt;
&lt;br /&gt;
Each client was running its own httpd, and every modification triggered the fwr.sh script to sync with the central server. This modifies data on the central server, and those modifications in turn trigger fwr.sh to sync with all clients consecutively. This configuration works fairly fast, but a further increase in the number of clients will lead to major delays and potential inconsistencies. &lt;br /&gt;
&lt;br /&gt;
Each client can create its own list of clients in its &#039;fwr_clients&#039; file and synchronize with them, without touching the central server, by issuing &#039;fwr.sh csync&#039;. Different topologies can be used depending on workload and synchronization frequency. &lt;br /&gt;
&lt;br /&gt;
To deal with inconsistencies and data vandalism, a version control system can be introduced, installed at least on the central server but ideally on every client as well. The configuration should be fairly straightforward, and there is plenty of documentation on combining rsync or unison with SVN or another version control system. A more obvious solution is to replace unison with git. Git can synchronize directories, but it also keeps versions and branches and is able to merge two or more development histories. You need to install git-core and use something like this to pull in changes:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=…&lt;br /&gt;
 &lt;br /&gt;
 function doGitPull {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #pwd &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #whoami &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 #printenv &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git pull --verbose $sPathToRemoteGit HEAD &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
 &lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Separator marking the start of a run&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPull&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = ENDING TIME of pull&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Separator marking the end of a run&lt;br /&gt;
 echo &amp;quot;&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Blank line between runs&lt;br /&gt;
&lt;br /&gt;
And something like this to push (sync):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 &lt;br /&gt;
 # GLOBALS&lt;br /&gt;
 sGitLogFile=~/common-git.log&lt;br /&gt;
 sPathToLocalDir=~/common&lt;br /&gt;
 sPathToRemoteGit=….&lt;br /&gt;
 &lt;br /&gt;
 function doGitPush {&lt;br /&gt;
 cd $sPathToLocalDir &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git add --verbose -A &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git commit --verbose -m &amp;quot;bkup&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 git push --verbose $sPathToRemoteGit master --receive-pack=&#039;git receive-pack&#039; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
 echo &amp;quot;{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{{&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Separator marking the start of a run&lt;br /&gt;
 echo $(date +%F[%T]) &amp;quot; = STARTING TIME of common-push &amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the start of processing&lt;br /&gt;
 doGitPush &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1&lt;br /&gt;
 echo -e $(date +%F[%T]) &amp;quot; = ENDING TIME of common-push\n\n\n&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Time stamp the end of processing&lt;br /&gt;
 echo &amp;quot;}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}&amp;quot; &amp;gt;&amp;gt; $sGitLogFile 2&amp;gt;&amp;amp;1 # Separator marking the end of a run&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
These two scripts must be called consecutively from &#039;fwr.sh&#039; instead of the single unison call.&lt;br /&gt;
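&lt;br /&gt;
A minimal sketch of that change (the script names, locations and the FWR_SCRIPTS variable are assumptions, not part of the original setup): the &#039;sync&#039; branch of fwr.sh would run the pull script and then the push script.&lt;br /&gt;

```shell
#!/bin/bash
# Hypothetical replacement for the single unison call in the 'sync'
# branch of fwr.sh, assuming the two git scripts above were saved as
# fwr-git-pull.sh and fwr-git-push.sh. FWR_SCRIPTS defaults to $HOME.
FWR_SCRIPTS=${FWR_SCRIPTS:-$HOME}

sync_with_git() {
    "$FWR_SCRIPTS/fwr-git-pull.sh" || return 1   # merge remote changes first
    "$FWR_SCRIPTS/fwr-git-push.sh"               # then publish local changes
}
```

Pulling before pushing means remote changes are merged locally before local commits are published, which avoids a rejected non-fast-forward push.&lt;br /&gt;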
&lt;br /&gt;
&lt;br /&gt;
We already have a pretty useful system that suits at least one use case: collaborative web development. But what about democracy and the republic? We still need some officials to be elected, but what should they do? Since everything now belongs to everyone, the only responsibility left is to look after the central server. This job is important, but not crucial: even if an elected official does a bad data synchronization and then issues &amp;quot;rm -rf /&amp;quot; on the central server, the data is not lost and the system still works (especially if some interconnection between clients exists, as described above). So we can elect officials without really worrying about the risks.&lt;br /&gt;
&lt;br /&gt;
The only thing the central server is actually needed for is internet access to the data generated by clients. But if the community&#039;s goals do not include this requirement, the central server is no longer needed. It is important to choose the right topology of client interconnections in this case, but the good news is: we no longer need officials!&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7519</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7519"/>
		<updated>2011-02-28T07:16:21Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Configuration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and a for-citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money from ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrent) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure envisioned prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone does. The only essential task left is to moderate the content being synced. This can be done with personal or global filters that stop particular people from syncing with the main server or with clients. Local and server storage can also be placed under version control, so that vandalism can be handled as it is in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can transfer data over sockets or ssh and can be used together with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both the server and the client. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to invoke synchronization, we need to generate a key pair and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then, create a directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, OpenSSH should be installed on both the server and the client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to hand the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher.py (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* fwr_clients (list of clients&#039; usernames and IP addresses)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization (unison&#039;s ignore preference can do this, e.g. -ignore &#039;Name .htaccess&#039;).&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning). &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh csync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;csync&#039; (client sync) every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on client:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. &lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify (optional)===&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==fwr.sh==&lt;br /&gt;
&lt;br /&gt;
This simple script handles synchronization with the server or with the clients.&lt;br /&gt;
&lt;br /&gt;
 #!/bin/bash&lt;br /&gt;
 FWR_PATH=&#039;/home/username/fwr&#039;&lt;br /&gt;
 CLIENT_LIST=&#039;/home/username/fwr_clients&#039;&lt;br /&gt;
 FWR_SERVER_PATH=&#039;username@centralserver.com&#039;&lt;br /&gt;
 &lt;br /&gt;
 if [ $# -ne 1 ]; then&lt;br /&gt;
      echo &amp;quot;Usage: $0 [csync | sync]&amp;quot;&lt;br /&gt;
      exit 1&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;csync&#039; ]; then&lt;br /&gt;
      if [ ! -f &amp;quot;$CLIENT_LIST&amp;quot; ]; then&lt;br /&gt;
         echo &amp;quot;$CLIENT_LIST not found!&amp;quot;&lt;br /&gt;
         exit 1&lt;br /&gt;
      fi&lt;br /&gt;
      while read -r line; do&lt;br /&gt;
         unison &amp;quot;$FWR_PATH&amp;quot; ssh://$line/fwr -auto -silent -batch&lt;br /&gt;
      done &amp;lt; &amp;quot;$CLIENT_LIST&amp;quot;&lt;br /&gt;
 fi&lt;br /&gt;
 &lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; == &#039;sync&#039; ]; then&lt;br /&gt;
      unison &amp;quot;$FWR_PATH&amp;quot; ssh://$FWR_SERVER_PATH -auto -silent -batch&lt;br /&gt;
 fi&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7509</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7509"/>
		<updated>2011-02-28T02:25:55Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Client */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and a for-citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money from ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrent) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced via rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can be replaced easily. &lt;br /&gt;
The system keeps running while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone does. The only essential thing left is to moderate the content being synced. This can be done by adding personal or global filters that keep particular people from syncing with the main server or with clients. Also, local and server storage can be put under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files between two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system. &lt;br /&gt;
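&lt;br /&gt;
As an example, a single one-shot sync between a client and the central server (the hostname here is a placeholder) could look like this:&lt;br /&gt;
&lt;br /&gt;
 unison /home/username/fwr ssh://username@central.server.com//home/username/fwr -batch&lt;br /&gt;
&lt;br /&gt;
The -batch flag makes unison run without interactive prompts; conflicting changes are skipped and reported rather than resolved by hand.&lt;br /&gt;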
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both the server and the client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
File .ssh/id_dsa.pub is created. It is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then, create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
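&lt;br /&gt;
Assuming the account was just created, this could be done as follows (run as root; the path is the same placeholder used above):&lt;br /&gt;
&lt;br /&gt;
 mkdir /home/username/fwr&lt;br /&gt;
 chown username:username /home/username/fwr&lt;br /&gt;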
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
A client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids entering a password when connecting to the central server, so synchronization can be done seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
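&lt;br /&gt;
In the latter case, the administrator appends the received public key to the account&#039;s authorized_keys file by hand, roughly like this:&lt;br /&gt;
&lt;br /&gt;
 cat id_dsa.pub &amp;gt;&amp;gt; /home/username/.ssh/authorized_keys&lt;br /&gt;
 chmod 600 /home/username/.ssh/authorized_keys&lt;br /&gt;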
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install watcher (or another inotify-based utility) (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher (inotify-based monitoring daemon, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be ignored during synchronization. This will be described later.&lt;br /&gt;
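&lt;br /&gt;
As an illustration (not part of the setup above), the .htaccess file could use basic HTTP authentication; the password file path is a placeholder:&lt;br /&gt;
&lt;br /&gt;
 AuthType Basic&lt;br /&gt;
 AuthName "FWR citizens only"&lt;br /&gt;
 AuthUserFile /home/username/.htpasswd&lt;br /&gt;
 Require valid-user&lt;br /&gt;
&lt;br /&gt;
Note that for these directives to be honoured, the AllowOverride line in the main Apache configuration must include AuthConfig.&lt;br /&gt;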
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;sync&#039; every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
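&lt;br /&gt;
A minimal sketch of what fwr.sh could do for the &#039;sync&#039; case (the clients.txt file is hypothetical; it would hold one client host per line):&lt;br /&gt;
&lt;br /&gt;
 #!/bin/sh&lt;br /&gt;
 # Sketch: sync the shared &#039;fwr&#039; tree with every known client.&lt;br /&gt;
 if [ "$1" = "sync" ]; then&lt;br /&gt;
     while read host; do&lt;br /&gt;
         unison /home/username/fwr ssh://$host//home/username/fwr -batch&lt;br /&gt;
     done &amp;lt; /home/username/clients.txt&lt;br /&gt;
 fi&lt;br /&gt;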
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7508</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7508"/>
		<updated>2011-02-28T02:25:13Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Central server */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – public and for-citizens-only. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since whoever owns the root password on the server still holds complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrent, probably) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced via rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can be replaced easily. &lt;br /&gt;
The system keeps running while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone does. The only essential thing left is to moderate the content being synced. This can be done by adding personal or global filters that keep particular people from syncing with the main server or with clients. Also, local and server storage can be put under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files between two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both the server and the client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
File .ssh/id_dsa.pub is created. It is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then, create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master --no-check-certificate &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
A client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids entering a password when connecting to the central server, so synchronization can be done seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install inotify-based utility (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher (inotify-based monitoring daemon, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be ignored during synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
And run watcher as daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;sync&#039; every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7506</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7506"/>
		<updated>2011-02-28T02:24:27Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Client */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – public and for-citizens-only. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since whoever owns the root password on the server still holds complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrent, probably) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced via rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can be replaced easily. &lt;br /&gt;
The system keeps running while offline and syncs everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone does. The only essential thing left is to moderate the content being synced. This can be done by adding personal or global filters that keep particular people from syncing with the main server or with clients. Also, local and server storage can be put under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files between two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both the server and the client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
File .ssh/id_dsa.pub is created. It is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add user &lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then, create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
A client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids entering a password when connecting to the central server, so synchronization can be done seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install inotify-based utility (optional):&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher (inotify-based monitoring daemon, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart Apache: &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be ignored during synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then run Watcher as a daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;sync&#039; every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
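The report stops here, so what follows is only a minimal sketch of what fwr.sh could look like. Everything in it is an assumption: the unison flags, the placeholder host website.com, the account name and the paths are illustrative, not taken from a real deployment.&lt;br /&gt;

```shell
#!/bin/bash
# fwr.sh -- hypothetical sketch; this script is not part of the original
# report. Host, account and paths below are illustrative placeholders.

usage() { echo "usage: fwr.sh sync"; }

do_sync() {
    # -batch: run unison without interactive prompts.
    # -ignore: keep the server-side .htaccess out of synchronization,
    #          as required in the Apache section above.
    unison -batch -ignore 'Name .htaccess' \
        "$HOME/fwr" "ssh://username@website.com//home/username/fwr"
}

case "${1:-}" in
    sync) do_sync ;;
    *)    usage ;;
esac
```

Watcher then invokes this script with the &#039;sync&#039; argument on every change, per the watcher.ini configuration above.&lt;br /&gt;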
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7505</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7505"/>
		<updated>2011-02-28T02:22:22Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Central server */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed to have two parts – a public one and a citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrents) every time the government changes, so that if the new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories.&lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can easily be replaced.&lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online.&lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be put under a version control system, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the base of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It allows syncing files on two machines by issuing one command on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system.&lt;br /&gt;
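A single unison command on either machine reconciles both replicas over ssh. As an illustrative sketch (the account name, the host website.com and the paths are placeholders echoing the examples in this report, not a real deployment):&lt;br /&gt;

```shell
# Two-way reconciliation between the local replica and the central server.
# -batch runs unison without interactive prompts.
# The account, host and paths are illustrative placeholders.
unison -batch /home/username/fwr ssh://username@website.com//home/username/fwr
```

Because unison merges updates from both sides, the same command converges the two copies no matter which machine runs it.&lt;br /&gt;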
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; this is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user:&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install inotify utility:&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor the &#039;fwr&#039; directory on the server side and trigger a sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring.&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Generate a key pair for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the public key to the central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This makes it possible to connect to the central server without entering a password, so that synchronization can be done seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* watcher.py (inotify-based monitoring daemon, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The exact configuration can vary depending on the CMS you want to use.&lt;br /&gt;
Restart Apache:&lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor the &#039;fwr&#039; directory on the server side and trigger a sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring.&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then run Watcher as a daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;sync&#039; every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7503</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7503"/>
		<updated>2011-02-28T02:17:54Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Introduction */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed to have two parts – a public one and a citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrents) every time the government changes, so that if the new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the structure in mind prior to implementation:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories.&lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can easily be replaced.&lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online.&lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be put under a version control system, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the base of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It allows syncing files on two machines by issuing one command on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system.&lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; this is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user:&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install incron (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
This daemon will monitor the fwr directory and synchronize with the clients every time something changes.&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Generate a key pair for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the public key to the central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This makes it possible to connect to the central server without entering a password, so that synchronization can be done seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* incrond (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The exact configuration can vary depending on the CMS you want to use.&lt;br /&gt;
Restart Apache:&lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor the &#039;fwr&#039; directory on the server side and trigger a sync with all clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring.&lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify, python-argparse&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then run Watcher as a daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;sync&#039; every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7501</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7501"/>
		<updated>2011-02-28T02:16:45Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Incron */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed to have two parts – a public one and a citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrents) every time the government changes, so that if the new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the final structure:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories.&lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it; the main server can easily be replaced.&lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online.&lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be put under a version control system, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the base of a collaboration system or simply as a safe web-development environment.&lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It allows syncing files on two machines by issuing one command on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system.&lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; this is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user:&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the servers.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install incron (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
This daemon will monitor the fwr directory and synchronize with the clients every time something changes.&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Generate a key pair for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the public key to the central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This makes it possible to connect to the central server without entering a password, so that synchronization can be done seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* incrond (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The exact configuration can vary depending on the CMS you want to use.&lt;br /&gt;
Restart Apache:&lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the fwr citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===inotify + pyinotify===&lt;br /&gt;
&lt;br /&gt;
An inotify-based utility will monitor our &#039;fwr&#039; directory on the server side and sync with all the clients on every modification. We could use incron, but it cannot monitor directories recursively. There is a tiny Python utility called [https://github.com/splitbrain/Watcher Watcher] which uses the Linux kernel&#039;s inotify via the [http://pyinotify.sourceforge.net/ pyinotify] Python module. Watcher supports everything incron does and adds recursive monitoring. &lt;br /&gt;
&lt;br /&gt;
Install Python, pyinotify and python-argparse:&lt;br /&gt;
&lt;br /&gt;
 sudo apt-get install python python-pyinotify python-argparse&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Download and unpack Watcher:&lt;br /&gt;
&lt;br /&gt;
 wget https://github.com/splitbrain/Watcher/tarball/master &amp;amp;&amp;amp; tar zxvf master&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Make the following changes to watcher.ini:&lt;br /&gt;
&lt;br /&gt;
 watch=/home/username/fwr&lt;br /&gt;
 events=create,delete,attribute_change,write_close,modify&lt;br /&gt;
 command=/home/username/fwr.sh sync&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Then run Watcher as a daemon:&lt;br /&gt;
&lt;br /&gt;
 python watcher.py start -c watcher.ini&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed with the parameter &#039;sync&#039; every time a modification occurs in the &#039;fwr&#039; directory.&lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7482</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7482"/>
		<updated>2011-02-26T18:49:01Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Apache */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and a citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the holder of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrent) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the final structure:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts for working with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it, and the main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be put under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment. &lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
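To make the one-command sync repeatable, each machine can describe the sync in a unison profile. The profile name, paths and server address below are illustrative assumptions:&lt;br /&gt;

```
# ~/.unison/fwr.prf -- hypothetical profile for the FWR tree
root = /home/username/fwr
root = ssh://username@website.com//home/username/fwr
ignore = Name .htaccess
batch = true
```

With this profile in place, the whole tree is synced by running &#039;unison fwr&#039; on either machine.&lt;br /&gt;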
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
File .ssh/id_dsa.pub is created. It is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user:&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the daemons.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install incron (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
This daemon will monitor the fwr directory and synchronize with the clients every time something changes.&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
A client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* incrond (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart apache. &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the fwr citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===Incron===&lt;br /&gt;
&lt;br /&gt;
Incron will be monitoring our &#039;fwr&#039; directory on server side and sync with all the clients on every modification. &lt;br /&gt;
First, we need to add our user to the list of users allowed to use incron. Open /etc/incron.allow and add &#039;username&#039; to it. Then edit incron&#039;s tasks:&lt;br /&gt;
&lt;br /&gt;
 incrontab -e&lt;br /&gt;
&lt;br /&gt;
This will open the task file in the default text editor. Add the following:&lt;br /&gt;
&lt;br /&gt;
 /home/username/fwr/ IN_MODIFY /home/username/fwr.sh -sync&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed every time a modification occurs in the &#039;fwr&#039; directory. &lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7481</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7481"/>
		<updated>2011-02-26T18:47:56Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Client */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and a citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the holder of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrent) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the final structure:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts for working with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it, and the main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be put under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment. &lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
File .ssh/id_dsa.pub is created. It is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user:&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the daemons.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install incron (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
This daemon will monitor the fwr directory and synchronize with the clients every time something changes.&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
A client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* incrond (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart apache. &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the fwr citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===Incron===&lt;br /&gt;
&lt;br /&gt;
Incron will be monitoring our &#039;fwr&#039; directory on server side and sync with all the clients on every modification. &lt;br /&gt;
First, we need to add our user to the list of users allowed to use incron. Open /etc/incron.allow and add &#039;username&#039; to it. Then edit incron&#039;s tasks:&lt;br /&gt;
&lt;br /&gt;
 incrontab -e&lt;br /&gt;
&lt;br /&gt;
This will open the task file in the default text editor. Add the following:&lt;br /&gt;
&lt;br /&gt;
 /home/username/fwr/ IN_MODIFY /home/username/fwr.sh -sync&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed every time a modification occurs in the &#039;fwr&#039; directory. &lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs if possible, give dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7480</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7480"/>
		<updated>2011-02-26T18:46:00Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Central server */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and a citizens-only one. Citizens elect leaders, who decide on the content-generation strategy; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the holder of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (probably torrent) every time the government changes, so that if a new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the final structure:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts for working with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it, and the main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that prevent particular people from syncing with the main server or with clients. Also, local and server storage can be put under version control, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment. &lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It syncs files on two machines with a single command issued on either machine. Unison can use sockets or ssh to transfer data and can be combined with SVN or any other version control system. &lt;br /&gt;
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must be running now. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
File .ssh/id_dsa.pub is created. It is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user:&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the daemons.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install incron (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
This daemon will monitor the fwr directory and synchronize with the clients every time something changes.&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
A client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This avoids having to enter a password when connecting to the central server, so synchronization can happen seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
If we want every change to be synced seamlessly, we can install and configure incron to run the synchronization script on every modification in the fwr directory. Personally, I&#039;d prefer to sync data with the central server manually.&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* incrond (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files should be present in /home/username on central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Restart apache. &lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
&lt;br /&gt;
A much safer approach is to avoid putting the fwr citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===Incron===&lt;br /&gt;
&lt;br /&gt;
Incron will be monitoring our &#039;fwr&#039; directory on server side and sync with all the clients on every modification. &lt;br /&gt;
First, we need to add our user to the list of users allowed to use incron. Open /etc/incron.allow and add &#039;username&#039; to it. Then edit incron&#039;s tasks:&lt;br /&gt;
&lt;br /&gt;
 incrontab -e&lt;br /&gt;
&lt;br /&gt;
This will open the task file in the default text editor. Add the following:&lt;br /&gt;
&lt;br /&gt;
 /home/username/fwr/ IN_MODIFY /home/username/fwr.sh -sync&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed every time a modification occurs in the &#039;fwr&#039; directory. &lt;br /&gt;
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs where possible; include dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7479</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7479"/>
		<updated>2011-02-26T18:45:28Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Central server */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and a for-citizens-only one. Citizens elect leaders, who decide on the strategy of content generation; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrent, probably) every time the government changes, so that if the new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the final structure:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it: the main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that disallow particular people from syncing with the main server or with clients. Also, local and server storage can be put under a version control system, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment. &lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It allows syncing files on two machines by issuing one command on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system. &lt;br /&gt;
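&lt;br /&gt;
As a sketch (the hostname and paths here are placeholders, not part of the actual setup), a citizen could sync a local replica with the central server over ssh with a single command:&lt;br /&gt;
&lt;br /&gt;
 unison /home/username/fwr ssh://username@website.com//home/username/fwr&lt;br /&gt;
&lt;br /&gt;
Unison compares the two replicas and propagates changes in both directions, prompting on conflicts.&lt;br /&gt;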
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user (Debian example):&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the server processes.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install incron (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
This daemon will monitor the fwr directory and synchronize with clients every time something changes.&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This allows connecting to the central server without entering a password, so that synchronization can run seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
If we want every change to be synced seamlessly, we can install and configure incron to start the synchronization script on every modification in the fwr directory. Personally, I&#039;d prefer to sync data with the central server manually.&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* incrond (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Then restart Apache:&lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
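&lt;br /&gt;
As a minimal sketch of such protection (the password-file path here is hypothetical), basic authentication could be configured in /home/username/fwr/www/.htaccess:&lt;br /&gt;
&lt;br /&gt;
 AuthType Basic&lt;br /&gt;
 AuthName &amp;quot;FWR citizens&amp;quot;&lt;br /&gt;
 AuthUserFile /home/username/.htpasswd-fwr&lt;br /&gt;
 Require valid-user&lt;br /&gt;
&lt;br /&gt;
Note that the AllowOverride line above would also need AuthConfig for these directives to take effect, and unison profiles accept a directive such as &#039;ignore = Name .htaccess&#039; to keep the file out of synchronization.&lt;br /&gt;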
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===Incron===&lt;br /&gt;
&lt;br /&gt;
Incron monitors our &#039;fwr&#039; directory on the server side and triggers a sync with all the clients on every modification. &lt;br /&gt;
First, we need to add our user to the list of users allowed to use incron: open /etc/incron.allow and add &#039;username&#039; to it. Then edit incron&#039;s tasks:&lt;br /&gt;
&lt;br /&gt;
 incrontab -e&lt;br /&gt;
&lt;br /&gt;
This will open the task file in the default text editor. Add the following:&lt;br /&gt;
&lt;br /&gt;
 /home/username/fwr/ IN_MODIFY /home/username/fwr.sh -sync&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed every time a modification occurs in the &#039;fwr&#039; directory. &lt;br /&gt;
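&lt;br /&gt;
As a sketch (the client address here is a placeholder), fwr.sh could simply wrap unison in non-interactive batch mode:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/sh&lt;br /&gt;
 # fwr.sh -sync: propagate changes between the local replica and a client&lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; = &amp;quot;-sync&amp;quot; ]; then&lt;br /&gt;
     unison -batch /home/username/fwr ssh://citizen@client.example.org//home/citizen/fwr&lt;br /&gt;
 fi&lt;br /&gt;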
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs where possible; include dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
	<entry>
		<id>https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7476</id>
		<title>DistOS-2011W FWR</title>
		<link rel="alternate" type="text/html" href="https://homeostasis.scs.carleton.ca/wiki/index.php?title=DistOS-2011W_FWR&amp;diff=7476"/>
		<updated>2011-02-26T00:24:31Z</updated>

		<summary type="html">&lt;p&gt;Freetonik: /* Configuration */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Introduction=&lt;br /&gt;
&lt;br /&gt;
The idea behind FWR (First Webocratic Republic) is simple: to create a self-governed community on the web. &lt;br /&gt;
&lt;br /&gt;
The first approach was natural: write a CMS where users can register (become citizens of FWR), communicate freely, elect leaders and be elected. The main goal of such a community is to survive, so some source of income is needed to pay the hosting provider. The CMS needed two parts – a public one and a for-citizens-only one. Citizens elect leaders, who decide on the strategy of content generation; then everybody works on the public part (the tourist site) to attract visitors and earn money through ads, referral links, etc. This approach can&#039;t be described as fully democratic, since the owner of the root password on the server still has complete, god-like power over the FWR.&lt;br /&gt;
&lt;br /&gt;
To fix this issue, another idea was added: distribute a copy of the world (filesystem and DB) to all citizens in some p2p fashion (torrent, probably) every time the government changes, so that if the new government screws things up, everyone has a &amp;quot;backup world&amp;quot;. This also contributes to the overall distribution of FWR – every copy is fully functional and can be set up as a separate &amp;quot;country&amp;quot;. &lt;br /&gt;
&lt;br /&gt;
But this all wasn&#039;t distributed enough. &lt;br /&gt;
&lt;br /&gt;
Here is the final structure:&lt;br /&gt;
&lt;br /&gt;
[[File:FWR Scheme.png]]&lt;br /&gt;
&lt;br /&gt;
The CMS runs on the main server, but in read-only mode. The data (files, databases, etc.) is synced by rsync from client machines. Clients are citizens, and only they can write to the synced directories. &lt;br /&gt;
A local HTTP server on each client machine runs a fully functional CMS which can be accessed locally. There is also a set of scripts to work with files, databases and rsync. Clients can sync with each other too.&lt;br /&gt;
&lt;br /&gt;
So, even though we still have a main server for public access, the system does not depend on it: the main server can easily be replaced. &lt;br /&gt;
The system keeps running while offline and tries to sync everything as soon as it gets back online. &lt;br /&gt;
&lt;br /&gt;
The amount of governance needed has also changed: in the previous model, elected leaders had access to server files and databases, but now everyone has access to them. The only essential task left is to moderate the content being synced. This can be done by adding personal or global filters that disallow particular people from syncing with the main server or with clients. Also, local and server storage can be put under a version control system, so that vandalism can be dealt with as in wikis. Moreover, every citizen can secede at any moment and run their own world.&lt;br /&gt;
&lt;br /&gt;
This scheme can serve as the basis of a collaboration system or simply as a safe web-development environment. &lt;br /&gt;
&lt;br /&gt;
=Setting up=&lt;br /&gt;
&lt;br /&gt;
Using rsync in both directions can lead to inconsistencies and errors, which is why another tool was chosen – [http://www.cis.upenn.edu/~bcpierce/unison/ unison]. It allows syncing files on two machines by issuing one command on either machine. Unison can use sockets or ssh to transfer data and can be used together with SVN or any other version control system. &lt;br /&gt;
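&lt;br /&gt;
As a sketch (the hostname and paths here are placeholders, not part of the actual setup), a citizen could sync a local replica with the central server over ssh with a single command:&lt;br /&gt;
&lt;br /&gt;
 unison /home/username/fwr ssh://username@website.com//home/username/fwr&lt;br /&gt;
&lt;br /&gt;
Unison compares the two replicas and propagates changes in both directions, prompting on conflicts.&lt;br /&gt;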
&lt;br /&gt;
==Central server==&lt;br /&gt;
&lt;br /&gt;
The following steps are required to set up a central server:&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-server&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used. The sshd daemon must now be running. If we want the server to invoke synchronization, we need to generate keys and give the public key to everyone who wants to join the community:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
The file .ssh/id_dsa.pub is created; it is the public key.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Add a user (Debian example):&lt;br /&gt;
&lt;br /&gt;
 adduser --home /home/username username&lt;br /&gt;
&lt;br /&gt;
This user account is for the FWR server only; it will run the server processes.&lt;br /&gt;
Then create a directory /home/username/fwr. This is where all the synced data will be stored.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install incron (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
This daemon will monitor the fwr directory and synchronize with clients every time something changes.&lt;br /&gt;
&lt;br /&gt;
==Client==&lt;br /&gt;
The client machine runs a local FWR server and syncs data with the central server.&lt;br /&gt;
The following steps are required to set up a client machine:&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
1. Install unison (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install unison&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
2. Install open-ssh (optional)&lt;br /&gt;
&lt;br /&gt;
 apt-get install openssh-client&lt;br /&gt;
&lt;br /&gt;
Since we use ssh to transfer data, open-ssh should be installed on both server and client. Alternatively, sockets can be used.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
3. Create a private key for passwordless connections:&lt;br /&gt;
&lt;br /&gt;
 ssh-keygen -t dsa&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
4. Copy the key to central server:&lt;br /&gt;
&lt;br /&gt;
 ssh-copy-id -i .ssh/id_dsa.pub username@remote.machine.com&lt;br /&gt;
&lt;br /&gt;
This allows connecting to the central server without entering a password, so that synchronization can run seamlessly. Of course, it is safer to give the key to the central server administrator (an elected official), who will then upload it without sharing the password of the &#039;username&#039; account on the central server.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
5. Install Apache (Debian example):&lt;br /&gt;
&lt;br /&gt;
 apt-get install apache2&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
6. Install incron (optional):&lt;br /&gt;
&lt;br /&gt;
 apt-get install incron&lt;br /&gt;
&lt;br /&gt;
If we want every change to be synced seamlessly, we can install and configure incron to start the synchronization script on every modification in the fwr directory. Personally, I&#039;d prefer to sync data with the central server manually.&lt;br /&gt;
&lt;br /&gt;
=Configuration=&lt;br /&gt;
==Server==&lt;br /&gt;
&lt;br /&gt;
The following daemons should be running at all times on the central server:&lt;br /&gt;
&lt;br /&gt;
* sshd (open-ssh daemon allowing remote connections from clients)&lt;br /&gt;
* httpd (apache web-server)&lt;br /&gt;
* incrond (monitoring system, syncs data on every modification)&lt;br /&gt;
&lt;br /&gt;
The following files and directories should be present in /home/username on the central server:&lt;br /&gt;
&lt;br /&gt;
* fwr/www (citizen-site, not public)&lt;br /&gt;
* fwr/www_tourist (tourist-site, public)&lt;br /&gt;
* .ssh/authorized_keys (clients&#039; keys)&lt;br /&gt;
* .ssh/known_hosts (clients&#039; hosts)&lt;br /&gt;
&lt;br /&gt;
===Apache===&lt;br /&gt;
Add the following to httpd.conf:&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr /home/username/fwr/www&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
 Alias /fwr_tour /home/username/fwr/www_tourist&lt;br /&gt;
 &amp;lt;Directory /home/username/fwr/www_tourist&amp;gt;&lt;br /&gt;
 Options FollowSymLinks&lt;br /&gt;
 AllowOverride Limit Options FileInfo&lt;br /&gt;
 DirectoryIndex index.php&lt;br /&gt;
 &amp;lt;/Directory&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This can vary depending on the CMS you want to use. &lt;br /&gt;
Then restart Apache:&lt;br /&gt;
&lt;br /&gt;
 /etc/init.d/apache2 restart&lt;br /&gt;
&lt;br /&gt;
Now the contents of /home/username/fwr/www (the main citizen site) are available at http://website.com/fwr, and the contents of /home/username/fwr/www_tourist (the public tourist site) are available at http://website.com/fwr_tour, where &#039;website.com&#039; is the server&#039;s public domain name. The citizen site must be protected on the server side, so appropriate settings must be added to /home/username/fwr/www/.htaccess, and the .htaccess file should be excluded from synchronization. This will be described later.&lt;br /&gt;
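&lt;br /&gt;
As a minimal sketch of such protection (the password-file path here is hypothetical), basic authentication could be configured in /home/username/fwr/www/.htaccess:&lt;br /&gt;
&lt;br /&gt;
 AuthType Basic&lt;br /&gt;
 AuthName &amp;quot;FWR citizens&amp;quot;&lt;br /&gt;
 AuthUserFile /home/username/.htpasswd-fwr&lt;br /&gt;
 Require valid-user&lt;br /&gt;
&lt;br /&gt;
Note that the AllowOverride line above would also need AuthConfig for these directives to take effect, and unison profiles accept a directive such as &#039;ignore = Name .htaccess&#039; to keep the file out of synchronization.&lt;br /&gt;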
&lt;br /&gt;
A much safer approach is to avoid putting the citizen site on the public server altogether (as described in the general scheme at the beginning).&lt;br /&gt;
&lt;br /&gt;
===Incron===&lt;br /&gt;
&lt;br /&gt;
Incron monitors our &#039;fwr&#039; directory on the server side and triggers a sync with all the clients on every modification. &lt;br /&gt;
First, we need to add our user to the list of users allowed to use incron: open /etc/incron.allow and add &#039;username&#039; to it. Then edit incron&#039;s tasks:&lt;br /&gt;
&lt;br /&gt;
 incrontab -e&lt;br /&gt;
&lt;br /&gt;
This will open the task file in the default text editor. Add the following:&lt;br /&gt;
&lt;br /&gt;
 /home/username/fwr/ IN_MODIFY /home/username/fwr.sh -sync&lt;br /&gt;
&lt;br /&gt;
Now the fwr.sh bash script is executed every time a modification occurs in the &#039;fwr&#039; directory. &lt;br /&gt;
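&lt;br /&gt;
As a sketch (the client address here is a placeholder), fwr.sh could simply wrap unison in non-interactive batch mode:&lt;br /&gt;
&lt;br /&gt;
 #!/bin/sh&lt;br /&gt;
 # fwr.sh -sync: propagate changes between the local replica and a client&lt;br /&gt;
 if [ &amp;quot;$1&amp;quot; = &amp;quot;-sync&amp;quot; ]; then&lt;br /&gt;
     unison -batch /home/username/fwr ssh://citizen@client.example.org//home/citizen/fwr&lt;br /&gt;
 fi&lt;br /&gt;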
&lt;br /&gt;
===fwr.sh===&lt;br /&gt;
&lt;br /&gt;
to be continued&lt;br /&gt;
&lt;br /&gt;
=Conclusion=&lt;br /&gt;
&lt;br /&gt;
Summarize the report, point to future work.&lt;br /&gt;
&lt;br /&gt;
=References=&lt;br /&gt;
&lt;br /&gt;
Give references in proper form (not just URLs where possible; include dates of access).&lt;/div&gt;</summary>
		<author><name>Freetonik</name></author>
	</entry>
</feed>