Distributed OS: Winter 2011 Reputation Systems Paper

From Soma-notes

What is reputation?

In the real world, people are generally quite conscious of certain behavioural actions that make. These actions are expected to fall within the social norms and are scrutinized continuously by the people around us. On a daily basis, Individuals build a personal set of judgment values and opinions on others in the society. When we listen to a politician on the news, or interact with a friends, we are updating this image that we have of the individual or group. It is this image we generate that helps us make conclusions as to whether we like the individual, whether we trust the individual, or whether we can relate to the individual. The global opinions that others have on us is known as reputation.

A reputation system's main purpose is to facilitate in providing a means for assumptions to be made about the level of trust one can have for a particular person or situation in executing a task to our liking. It is important to note the importance of the word assumption. With the gathered information, we are able to generate an estimate of their actions. It is by no means accurate. Furthermore, reputation is not a globally accepted view of an entity. In some cases, an individuals reputation can be quite varied between different observers. Some may have encountered contact with the entity in a different context or had a different level of expectation compared to others <ref name="krukow" />. Likewise, some individuals might be falsely persuaded to confirm to specific opinions by large and powerful groups, whereas others have a crystallized and hard-to-change opinion.

How can reputation be used in a distributed environment?

Reputation can be useful in acquiring an understanding of how congruent one's own goals are from another. If we are to accomplish a desired task that requires the cooperation of others, we carefully analyze whether the individuals we choose will be a good fit or whether they will hinder our progress. Or, worse yet, halt our progress completely.

In a more technical and distributed view, reputation is the process of recording, aggregating, and distributing information about an entity's behaviour in distributed applications. Reputation might be based on the entity's past ability to adhere to a mutual contract with another entity <ref name="krukow">Krukow K. et al. A Logical Framework for Reputation Systems and History-based Access Control. School of Electronics and Computer Science University of Southampton, UK [March 3, 2011]</ref>. As stated above, the validity of acquired reputation is largely subjective and unknown. Clearly, if we are to achieve an optimal reputation system we will need a fixed set of rules or norms that are expected to be followed in certain situations. If we look back to the analogy with human's, we are - to a fairly high degree - able to maintain order in some parts of the world by enforcing rules. It is unreasonable to think that we can prevent all wrong-doing. There are always outliers that will oppose the greater society, but eventually the greater community will overcome those outliers and prevent them from being detrimental to society. There is no perfect solution to maintaining social order in reality, and likewise, there is no perfect solution for maintaining good behaviour of computational entities.

The idea of enforcing rules or generating reputation of other entities to use in a decision-making process are both realistic options. This is known as the Emerge vs. Impose problem. Do we maintain records based on a fixed set of imposed rules? Or do we build rules as the system emerges and reputations are formed. In our opinion, we feel the answer is both.

What systems are currently in place?

Reputation systems are used in a wide array of projects and applications, from e-commerce sites to the web as a whole. Currently, existing distributed systems do not have an ideal reputation system in place. We will discuss two forms of existing systems. Peer-based and policy-based systems. Peer-based systems rely on emergent reputation, while policy-based systems rely on imposed rules.

Peer-based systems are ones in which end-users provide reputation information about a certain subject. Sites such as eBay and Youtube utilize rating and comment systems. Particularly, eBay uses an interaction-based form of reputation to provide information about buyers and sellers<ref name="ebayreputation">Reputation Management. Wikipedia. http://en.wikipedia.org/wiki/Reputation_management [March 28, 2011]</ref>.

Policy-based systems can be found in a variety of application frameworks. Two examples include Java and Android. These systems enforce a developer to state the intentions of the application in what's known as a policy file. The stated intentions are required as a security measure for access to crucial parts of the system<ref name="javapolicy">Default Policy Implementation and Policy File Syntax. Oracle. http://download.oracle.com/javase/1.3/docs/guide/security/PolicyFiles.html [March 7, 2011]</ref>. For mobile devices, if an application needs to acquire the GPS location or read/write contact information this must be stated in the policy file<ref name="android">Android. Google. http://developer.android.com/index.html [March 28, 2011]</ref>. Otherwise, an application cannot be deployed. Furthermore, items on this policy file are presented to the user and if a user is suspicious about an application needing access to unnecessary utilities, they can choose to not install the application. For example, a "stop-watch" application might appear extremely suspicious to a user if it requested access to contact information and internet access. Interestingly, Android and other mobile application frameworks such as iOS<ref name="ios">iOS Developer Guide. Apple. http://developer.apple.com/devcenter/ios/index.action [March 28, 2011]</ref> also use an emergent-based reputation system. They provide a means to rate and review applications similar to the buyer-seller reputation systems provided with eBay. The mentality is that if an application is untrustworthy or of poor quality, the greater public opinion will merge and polarize to negative opinions - eventually leaving the application as a non-threat to potential buyers. For trustworthy applications, the result would be quite the opposite.

How can we improve on existing systems?

Existing systems provide an adequate level of accurate reputation information for their purpose. For closed and centralized systems such as the example provided, eBay, this level of sophistication is sufficient. Buyers are able to favour certain sellers over others based on feedback and ratings left by previous sellers. However, to make this decision easier, these sites convert the data into a more readable and comparable form, a numerical scale. This abstraction process, however, prevents one from truly understanding the reasons behind the values. Buyers and sellers are able to bid with a fair degree of certainty and trust; if one party is unsatisfied with the transaction, eBay will step in to provide order<ref name="ebayreputation" />. This level of justice is not easily attainable in large-distributed systems. Although we can assume we have an adequate level of justice, in order for a reputation system to be plausible in such a large system and for justice systems to work, we need to store sets of event-based histories that can be attributed to each entity that interacts in the system. In the case where reputation data fails to protect machines and the individuals behind them, we can fall back on justice systems and provide them with accurate information.

Our assumptions

<Here we can talk about how we originally wanted to have a section on PKI, but changed our minds because it was veering too far from our core problem of reputation>

Generating reputation

How do we represent reputation?

Reputation data can be stored in a variety of different forms and representations. We start with a summary of previous attempts in creating a solution for representing reputation. A frequently used form is one that utilizes a numerical scale for reputation. These are known as EigenTrust systems<ref name="krukow" />. In their essence, they store and aggregate data into a numerical form. These values are easy to compare and because primitive data types can be used, they require very little storage space. Despite these lucrative advantages, there are some significant negative aspects of such a system. Firstly, information is typically lost in the abstraction process. Concrete data is acquired and then converted down to a minimal form. Once this conversion is done, there is little one can do to understand the concrete data that it was generated from. In other words, this abstraction process is irreversible. Likewise, the process can result in ambiguity among data. For example, a reputation of 0 might be interpreted as having no reputation history or having an average reputation rating of 0. And, of course, as a result of the irreversibility of numerical data, we cannot return the data to its original concrete form to better understand the reasons behind the reputation.

Another interesting form of reputation is one that was proposed by Shmatikov and Talcott<ref name="krukow" />. They attributed reputation to encompass the history of entities as a set of time-stamped events. The key difference between EigenTrust and their solution is that we can store data in its concrete form. Additionally, if we modify their solution to allow for the notion of sessions, we can generate a clear view of related actions that correspond to an entity's computational session. This provides a querying entity or a justice system with crucial information to make their respective decision. Clearly, there are some ethical and privacy issues that arrise from this; we tackle this issue more closely in a following section.

How do we gather reputation?

Where do we store reputation?

How do we maintain reputation?

As stated earlier, we need to store an adequate level of information about interactions between entities. This "adequate" level can be quite large in terms of actual storage space. This brings us to the problem of how to maintain reputation history, since in a distributed system this is crucial to the scalability and success of the entire system. A solution here is to use the notion of Dynamic Model-Checking, by Havelund and Rosu<ref name="krukow" />. They came up with a way to re-evaluate stored reputation history and efficiently aggregate and combine eligible data. This can be thought of as a "reduce" function in the sense of Google's Map/Reduce algorithm<ref name="mapreduce">Dean J. et al. MapReduce: Simplified Data Processing on Large Clusters http://labs.google.com/papers/mapreduce.html [March 3, 2011]</ref>. We generate and store sets of events related to particular entities (this is an append function) and use a reduce function to minimize the storage space required. We realize, however, that some data will not be eligible to be "reduced". Significant negative reputation, for instance, such as DDoS attacks will likely need to be retained indefinitely incase justice systems need sufficient proof of a specific incident. This solution will work quite well as we maintain a sufficient amount of useful concrete information, yet still save space by merging and combining certain types of data. If we can assume that space will never be an issue or that processing time for searching through sets of reputation history items is negligible, then we would clearly not have to worry about implementing this type of "reduce" mechanism.

How is reputation disseminated?

~~updating it now one part at a time~~

The dissemination of reputation information is a core concern of reputation systems in general.

In general, there are three common modes of disseminating information of this type that would need to be supported in order to make a reputation system feasible: Hierarchy, Publish/Subscribe, and Peer-to-Peer.

In a hierarchy, there are pre-set, or elected nodes that are responsible for maintaining an authoritative list. A good example of this in practice is the domain name system (or DNS, for short).

Publish/subscribe is a model of dissemination of information that relies on central repositories, which are then queried by each client when an update is needed.

Finally Peer-to-peer is, perhaps, the newest method of disseminating information.

In application, all of these methods of information dissemination would likely need to be supported in some fashion.

How is reputation queried?

Querying reputation is the problem of how one entity in a reputation system acquires reputation data on another entity in the system that it does not already have. There will need to be an established way of requesting, receiving and finally analyzing the reputation data to decide if a connection should be made or not. This needs to be done because depending on the size of the system it's highly unlikely any given entity will know about another given entity if it has never communicated with it before. In a system like the internet it is unreasonable to expect the regular process of information dissemination to provide every entity information on every other entity. It is even more unreasonable to expect an entity in the system to be able to store all this information.

In the greater scheme of a reputation system, querying assumes some systems need to already exist. There needs to be a means of authenticating messages, as to limit the spread of false information and guarantee the integrity of the system. There needs to be a way of maintaining the history of the system, so that reputation events can be recorded and accessed. There needs to be a means of dissemination, as querying in this sense won't be suited for the gradual distribution of information. In short, for there to be querying of reputation, you need to have something worth querying.

But what does a system for querying need to address? It needs to be able to request information on demand, and receive that information quickly and efficiently. Specifically, the system needs to be able to handle any given entity in sending out a request for reputation information, and have other entities process that request and return a response. There needs to be a way for an entity to handle the likely event that there is no reputation information on another entity. Finally, the entity needs a way to process and interpret the information it receives.

As previously mentioned, in this paper, there are two primary layouts for a reputation system: hierarchical and distributed. Both of which will need to interact with each other. In a hierarchical-centralized system, there is a hierarchy of nodes who defer to each other. Any given node in the system will defer to an authority, known as its authority node. Most, if not all, reputation information will go through this node, and as far as their subordinate nodes are concerned, his 'views', or interpretation of the reputation data, will be absolute. In a distributed, peer to peer system, reputation information will be acquired from trusted peers and analyzed to determine whether to connect or not.

The actual process of querying should be fairly simple. A given entity or node in the system, needs to decide it should contact another node in the system. First, it must check it's local representation of reputation data, to see if it already has both enough, and up-to-date, information on a node. If it does, it can move on toward making a decision, discussed later. If however, the information needed is not already held by the node, it's going to need to query.

This brings us back to the two primary types of reputation systems, hierarchical and distributed. In a hierarchical system the process is incredibly simple: ask your superior node, and wait for a response. The superior node might have enough information on hand to decide, or it might ask it's peers or superiors. Either way whatever is sent back is what the original querying node will do.

-almost there now, at least in draft form.

Making decisions

How do we make decisions based on reputation?

Implementation

Can we achieve this through incremental updates?

<possible we can... we can use imposed rules or existing infrastructure if we don't have adequate emergent information. This way we can incrementally update the system and eventually we will have a full-fledged emergent reputation system. Hope this helps someone... I don't quite know enough to write about this.>

Conclusion

References

<references />

DELETE

Why PKI should be ommitted reputation must be trusted = we get this trust through interactions. We BELEIVE this trust because we assume we have attribution!