DistOS-2011W Public Goods: Difference between revisions

From Soma-notes
Aschoenr (talk | contribs)


===Potential Concerns and Disadvantages===
===Relevant Papers & Links===
Background stuff
*[http://en.wikipedia.org/wiki/Web_cache Web Cache on Wikipedia]
*[http://en.wikipedia.org/wiki/Proxy_server Proxy Server on Wikipedia]
*[http://www.visolve.com/squid/whitepapers/ViSolve_Web_Caching.pdf Web Caching - A cost effective approach for organizations to address all types of bandwidth management challenges]
*[http://www.cisco.com/web/about/ac123/ac147/ac174/ac199/about_cisco_ipj_archive_article09186a00800c8903.html The Internet Protocol Journal - Volume 2, No. 3: Web Caching]
*[http://www.web-caching.com/welcome.html Web Caching Overview]
*[http://docforge.com/wiki/Web_application/Caching DocForge: Web application/Caching]
Other
*[http://www.acm.org/sigcomm/ccr/archive/1999/oct99/Jia_Wang2.pdf?searchterm=distributed+web+caching A Survey of Web Caching Schemes for the Internet]
*[http://conferences.sigcomm.org/imc/2005/papers/imc05efiles/karagiannis/karagiannis.pdf?searchterm=internet+caching Should Internet Service Providers Fear Peer-Assisted Content Distribution?]
*[http://www.sigmobile.org/mobihoc/2003/papers/p25-nuggehalli.pdf?searchterm=distributed+web+caching Energy-Efficient Caching Strategies in Ad Hoc Wireless Networks] Could tie in with the infrastructure stuff.
*[http://www.cs.utsa.edu/~sdykes/papers/hicss99.pdf Taxonomy and Design Analysis for Distributed Web Caching]
===Misc Notes===
* look into LAN caching. if we proposed a new infrastructure where neighborhoods are networked, distributed caching can be done here.
'''Web Caching as a public good'''
Why would turning web caching into something controlled by the public be a good thing? How could this be done (possible high level implementation options)?
*The ultimate benefit of web caching is realized by keeping popular data close to the user
*what is closer to the end user than their ISP? their neighbours.
*by distributing the cache among people, web requests can be satisfied even closer (and, as a result, even faster)
*possible ways of doing this:
**option 1
***have each person dedicate a certain amount of disk space and cpu cycles to storing and maintaining the cache.
***The ISPs can control the overall placement and tracking of cached data since it is in their interest.
***as users log in and log off, data can be transferred accordingly
***web requests will go to the ISP, and the ISP will determine where the data resides and will mediate a connection between the two users
**option 2
***since there are large incentives for ISPs to do this, they may want to invest in some specialized hardware to help implement this
***this hardware would replace the end user's modem and would include a general purpose processor and some data storage.
***with the ever decreasing hardware costs, a relatively powerful machine could be built (especially on a large scale) for relatively cheap
***since an end user generally leaves their modem on more than their PC, this option would result in greater reliability and aggregated uptime.
'''local and global benefits'''
Local
*data is even closer to the end user = lower latency
*greater amount of total storage = bigger cache
*if an optimal size is found and increasing the cache size does not improve performance (3) then the data can simply be replicated at a higher degree across the network
**decreases impact of users logging off
**also allows for neighbourhood specific, ultra fast caches
global
*if every ISP implemented these neighbourhood caches, the total amount of wasted bandwidth going across the internet will drastically decrease
*after these have been implemented, a cache hierarchy can be imposed where neighbourhood caches can first talk with each other. If there is still a cache miss, then neighbourhoods of neighbourhoods can satisfy requests
*results in a very large, diverse cache that can satisfy a variety of requests


==DNS (Fahim)==

Revision as of 19:12, 21 March 2011

Members

  • Lester Mundt - lmundt at connect.carleton.ca
  • Fahim Rahman - frahman at connect.carleton.ca
  • Andrew Schoenrock - aschoenr at scs.carleton.ca

Main Goals

Based on the discussion last class, I think the focus of the project should be on what kinds of things fundamentally should be a "public good", as opposed to exactly how to implement them. Some of the ideas we have come up with can be used for various implementations, but I think, in general, we can rely on previous work for most (if not all) of the implementation details. If this assumed direction is correct, then I think we should aim to answer the following questions:

  • What are good candidates for public goods (e.g. DNS, internet cache, physical connections, etc.)? Why should these services be fundamentally controlled by the public? What are the flaws in the way they are currently used, or why should they not be centrally controlled by a single entity? What incentives are there for a given user to participate (willingly or unwillingly)?
  • What would be the net benefit for the local community participating in these public goods?
  • What would be the net impact on the entire internet if all local communities created these public goods (more secure, less bandwidth wasted, etc.)
  • Could there be disadvantages? If so, how do the benefits offset these drawbacks?
  • What would the cost of the public goods be? What sort of tax would the organizers need to levy?
  • After identifying some candidates for public goods, try to determine what these services have in common, both in the problems they face and in the alternatives available. What are some things that are fundamentally different about these goods?

Note to prof: Please let us know if you have any comments on the overall direction we are taking the project.

Definition

A public good is:

  • A commodity typically provided by government that cannot, or would not, be separately parceled out to individuals, since no one can be excluded from its benefits.
  • A good that is non-rivalrous and non-excludable. Non-rivalry means that consumption of the good by one individual does not reduce availability of the good for consumption by others; and non-excludability that no one can be effectively excluded from using the good
  • A good that cannot be charged for in relation to use (like the view of a park or survival of a species), so there is no incentive to produce or maintain the good
  • A good that is provided for users collectively, use by one not precluding use of the same units of the good by others
  • A good or service in which the benefit received by any one party does not diminish the availability of the benefits to others, and where access to the good cannot be restricted.

Potential Topics

  • What else occurs on the internet
    • physical infrastructure (phoneline, cable, satellite, etc)
    • DNS, BGP, ----
    • TCP/IP, UDP
    • HTTP, SMTP, POP, IMAP, FTP, SSH
    • email = SPAM, search, internet caching

Note to prof: This is still a working list, but if you notice anything that we should definitely try to cover that we haven't thought of, please let us know.

Current Work

  • Lester - Physical infrastructure
  • Andrew - Caching
  • Fahim - DNS

Candidates for Public Goods (Use this area to post your ongoing work)

Physical Infrastructure (Lester)

Web Caching (Andrew)

Introduction

In general, the idea behind web caching is the temporary storage of web objects so that they can be used later without having to retrieve the data from the original server again. When a new web request is made, the resulting data is stored in a cache after being delivered to the end user. If another user requests the same data, barring certain conditions, the cached data is returned to the user and the request is not passed on to the originating web server. Many aspects of many websites do not change very often (e.g. logos, static text, pictures, other multimedia) and hence are good candidates for caching <ref name="visolve">Optimized Bandwidth + Secured Access = Accelerated Data Delivery, Web Caching - A cost effective approach for organizations to address all types of bandwidth management challenges. A ViSolve White Paper. March 2009. link</ref>. Web caches can either exist on the end user's machine (in the browser, for instance) or somewhere between the user and the servers they wish to communicate with, on what is known as a proxy server <ref name="webcaching.com"> Web Caching Overview. visited March 2011. link </ref>. Internet Service Providers have a key interest in web caching and in most cases implement their own caches <ref name="visolve"/><ref name="cisco">Geoff Huston. 2000. Web Caching. The Internet Protocol Journal Volume 2, No. 3. link</ref>. There are a variety of incentives for entities on the internet, including ISPs, to use web caches. In general, these advantages can be summarized as follows:

  • Reduced Bandwidth Usage

One of the main incentives for ISPs to use web caching is the reduction of outgoing web traffic, which results in a reduction of overall bandwidth usage <ref name="visolve"/><ref name="webcaching.com"/><ref name="cisco"/><ref name="survey">Jia Wang. 1999. A survey of web caching schemes for the Internet. SIGCOMM Comput. Commun. Rev. 29, 5 (October 1999), 36-46. DOI=10.1145/505696.505701 link</ref><ref name="docforge"> Web application/Caching. visited March 2011. last modified September 2010. link</ref>. For a typical ISP, web-based traffic can account for upwards of 70% of the total bandwidth used and, of this web-based traffic, the level of similarity of requests can be as high as 50%<ref name="cisco"/>. It is also true that, for many ISPs, transmission costs dominate their overall operating costs, so any reduction in the number of requests that must be satisfied outside of the ISP is beneficial<ref name="cisco"/>.

  • Improved End User Experience

Another benefit of web caching is the apparent reduction in latency to the end user <ref name="visolve"/><ref name="webcaching.com"/><ref name="survey"/><ref name="docforge"/>. Instead of web requests traveling all the way to the desired web server, these requests are intercepted by a proxy server, which can return a cached version of the requested data. Because the total distance that the data has to travel is cut down significantly (web caches are intended to be relatively close to the end user), the time needed to deliver the content to the end user can also be cut down significantly. It has been found that small performance improvements made by an ISP through the use of caching can result in a significantly better end user experience<ref name="docforge"/>.

  • Reduced Web Server Load

Web servers providing popular data also benefit from web caching. Popular websites translate into a high number of simultaneous connections and a high bandwidth usage by the providing web server <ref name="cisco"/><ref name="survey"/>. A web cache placed in front of a given web server can reduce the number of connections that need to be passed through by providing data it has stored. This can translate into reduced hardware and support costs<ref name="docforge"/>.

Additional advantages include the added robustness that a web cache adds to the internet, allowing users to access documents even if the supplying web server is down and allowing organizations to analyze internet usage patterns <ref name="survey"/>.
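The request flow described in this section can be illustrated with a minimal sketch. It is not taken from any of the cited sources: the class name `WebCache`, the fixed 60 second lifetime standing in for the "certain conditions" under which cached data may be reused, and the `fake_origin` function standing in for a real web server are all assumptions made for the example.

```python
import time


class WebCache:
    """Toy proxy cache: stores response bodies keyed by URL for a fixed lifetime."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # hypothetical callable: url -> body
        self.ttl_seconds = ttl_seconds              # assumed freshness rule
        self.clock = clock
        self.store = {}                             # url -> (body, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, url):
        entry = self.store.get(url)
        if entry is not None and entry[1] > self.clock():
            # Fresh copy available: the origin server is not contacted.
            self.hits += 1
            return entry[0]
        # Miss or stale copy: fetch from the origin and keep a copy for later.
        self.misses += 1
        body = self.fetch_from_origin(url)
        self.store[url] = (body, self.clock() + self.ttl_seconds)
        return body

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


origin_requests = []  # records every request that actually reaches the origin


def fake_origin(url):
    origin_requests.append(url)
    return "content of " + url


cache = WebCache(fake_origin)
for url in ["/logo.png", "/index.html", "/logo.png", "/logo.png"]:
    cache.get(url)
```

In this run only the first request for each URL reaches the origin; the two repeated requests for `/logo.png` are served from the cache. As a rough back-of-envelope bound using the figures quoted above (up to 70% web traffic, up to 50% similarity between requests), an idealized cache could absorb at most about 0.70 × 0.50 = 35% of an ISP's total traffic; this is an illustrative calculation, not a measured result.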

Web Caching Schemes

Since web caching has been identified as a significant asset to the internet as a whole, it has received its fair share of research. Many different approaches to web caching have been proposed, many of which utilize distributed or hierarchical elements. These approaches will not be examined in depth here, as they are considered merely implementation details. A survey of web caching schemes <ref name="survey"/> identified the main architectures that a large scale web cache can have.

One of these is a hierarchical architecture. In such an architecture, web caches are placed at different levels of a network, starting with the client's machine, followed by a local, then a regional and finally a national level cache. In this type of system, web requests are first sent to the lowest level cache and passed along to higher levels until the request can be satisfied. Once it is satisfied, the data travels back down the hierarchy, leaving a copy at each of the lower levels. Hierarchical web caches use bandwidth efficiently by allowing popular web sites to propagate towards the demand.
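A minimal sketch of this lookup, assuming a hypothetical `CacheLevel` class and a stand-in `origin` function (neither comes from the survey), might look like the following. Each level asks its parent on a miss and keeps a copy of whatever comes back down.

```python
class CacheLevel:
    """One level in a cache hierarchy (e.g. local, regional, national)."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent    # next level up, or None at the top
        self.origin = origin    # hypothetical callable url -> body, used only at the top
        self.store = {}

    def get(self, url):
        """Return (body, name of whoever satisfied the request)."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            body, served_by = self.parent.get(url)       # pass the request upward
        else:
            body, served_by = self.origin(url), "origin"  # top level asks the web server
        self.store[url] = body  # leave a copy at this level on the way back down
        return body, served_by


def origin(url):
    return "content of " + url


national = CacheLevel("national", origin=origin)
regional = CacheLevel("regional", parent=national)
local_a = CacheLevel("local-a", parent=regional)
local_b = CacheLevel("local-b", parent=regional)

first = local_a.get("/news")   # nothing cached anywhere yet
second = local_b.get("/news")  # a different local cache in the same region
third = local_a.get("/news")   # the original local cache again
```

Here the first request travels all the way to the origin, the second is satisfied by the regional cache that kept a copy, and the third never leaves the local cache, which is the "propagate towards the demand" behaviour described above.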

Another potential architecture is distributed web caching. In such a structure there is only one level of caches, which cooperate with each other to satisfy web requests. To do this, each cache retains metadata about the content of all of the other caches it cooperates with and uses it to fulfill web requests it receives from clients. This web caching scheme allows for better load balancing and introduces fault tolerance that is not available in strictly hierarchical structures.
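The cooperation between same-level caches can be sketched as follows. This is only an illustration under stated assumptions: the `PeerCache` class is hypothetical, the metadata is modelled as a simple per-cache directory mapping URLs to the peer believed to hold them, and real schemes exchange this metadata in more compact and less chatty ways.

```python
class PeerCache:
    """One cache in a single-level, cooperating group of caches."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin   # hypothetical callable url -> body
        self.store = {}        # content held locally
        self.peers = []
        self.directory = {}    # metadata: url -> peer believed to hold it

    def connect(self, group):
        self.peers = [p for p in group if p is not self]

    def _announce(self, url):
        # Tell every cooperating cache that this cache now holds the URL.
        for peer in self.peers:
            peer.directory[url] = self

    def get(self, url):
        """Return (body, name of whoever satisfied the request)."""
        if url in self.store:
            return self.store[url], self.name
        holder = self.directory.get(url)
        if holder is not None and url in holder.store:
            return holder.store[url], holder.name  # served by a cooperating peer
        body = self.origin(url)
        self.store[url] = body
        self._announce(url)
        return body, "origin"


origin_calls = []


def origin(url):
    origin_calls.append(url)
    return "content of " + url


a, b, c = (PeerCache(n, origin) for n in ("cache-a", "cache-b", "cache-c"))
for cache in (a, b, c):
    cache.connect([a, b, c])

r1 = a.get("/video")  # first request goes to the origin
r2 = b.get("/video")  # b consults its metadata and is served by a
r3 = c.get("/video")  # c is served by a as well
```

In this sketch a cache that is served by a peer does not keep its own copy, which avoids duplicating data across the group; keeping a copy instead would trade storage for lower latency and is an equally reasonable choice.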

Finally, a third option for large scale web caches is a hybrid architecture. In such a system, a hierarchy of caches exists, however there are a number of caches on each level that cooperate with each other in a distributed fashion. This type of system can benefit from the combination of the different advantages that the hierarchical and distributed architectures provide.

Web Caching as a Public Good

Web caching is of enormous importance to the efficient functioning of the internet, and is therefore vitally important to end users. Web caching ultimately succeeds by keeping relevant data close to the end users. Typically these web caches are currently implemented by ISPs, which do so because it is in their financial interest and not because it is in the interest of their customers. Their customers' satisfaction is important to them, but it is not their top priority. Transitioning ISP controlled web caches into a public good would allow for a balance between the financial and the end user experience aspects of web caching. This can be achieved by the government actually taking over the proxy servers that host the web caches, or through strict regulations on exactly how web caching should be done. A benefit of this is that it allows for the standardization of web caching on all proxies. This doesn't mean that every web cache needs to be implemented in exactly the same way, but it could allow for generic interfaces through which web caches of all types could communicate with one another. This would then allow end users who are customers of one ISP to be serviced by web caches that used to be available only to customers of other ISPs.

Not only would standardizing web caches at the ISP level allow these previously private, uncooperative proxies to act more like distributed web caches, it would also allow for a natural hierarchy to be built. This hierarchy would be based on geography, where the ISP level caches would now work together to service a relatively small region, followed by a level of web caches that would service a larger geographical region, followed by provincial/state level web caches and finally a national level. These would all be standardized to allow regional or provincial caches to serve web requests for users in different regions or provinces. Having formalized and standardized web cache hierarchies would allow for a reduction in wasted bandwidth and an improved end user experience. This would also remove redundant data stored in caches that previously would not or could not communicate with each other, increasing the overall storage capacity as well as improving robustness through greater fault tolerance.

Once web caching becomes a public good, it would also be in the end user's best interest to participate if they could. This would essentially mean turning the lowest level of web caching (currently done on a user's machine) into a distributed web cache. This would allow users to share their caches with each other and allow for the building of neighbourhood specific, ultra fast caches. This could be implemented by each user supplying a small amount of hard drive space as well as some computation cycles, similar to BOINC projects<ref name="boinc">Public Computing: Reconnecting People to Science. David P. Anderson. Conference on Shared Knowledge and the Web. Residencia de Estudiantes, Madrid, Spain, Nov. 17-19 2003. link</ref>. The end users' machines could simply be used as passive storage devices, where the local, publicly owned or ISP controlled proxy server decides what data exists where and points users to other users to satisfy web requests. On the other hand, the users' machines could be active participants in the caching, receiving their user's requests and actually deciding which other users to contact to try to retrieve the data. In such a situation, any privacy concerns could be mediated by the local proxy server.
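The passive variant, in which the proxy decides what data lives where, could be sketched roughly as below. Everything here is a hypothetical illustration: the `NeighbourhoodProxy` class, its method names and the policy of simply forgetting data held by a user who logs off are assumptions, and the sketch ignores privacy, replication and storage limits.

```python
class NeighbourhoodProxy:
    """Toy proxy that tracks which online user's machine holds which cached URL."""

    def __init__(self, origin):
        self.origin = origin   # hypothetical callable url -> body
        self.online = {}       # user -> {url: body} (storage donated by that user)
        self.location = {}     # url -> user currently holding it

    def log_in(self, user):
        self.online[user] = {}

    def log_off(self, user):
        # The user's donated storage disappears, so forget everything it held.
        self.online.pop(user, None)
        self.location = {u: h for u, h in self.location.items() if h != user}

    def request(self, user, url):
        """Return (body, who supplied it) for a request made by an online user."""
        holder = self.location.get(url)
        if holder is not None:
            return self.online[holder][url], holder  # point the user at a neighbour
        body = self.origin(url)
        self.online[user][url] = body                # the requester now hosts the copy
        self.location[url] = user
        return body, "origin"


def origin(url):
    return "content of " + url


proxy = NeighbourhoodProxy(origin)
proxy.log_in("alice")
proxy.log_in("bob")

r1 = proxy.request("alice", "/map")  # fetched from the origin, stored on alice's machine
r2 = proxy.request("bob", "/map")    # satisfied by alice's machine
proxy.log_off("alice")
r3 = proxy.request("bob", "/map")    # alice is gone, so the origin is contacted again
```

The last request shows why users going offline matters for this design: without replicating data across several machines, every log-off turns later requests for that data back into misses.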

Another option to allow for lower level distributed caching would be to extend the capabilities of the currently used cable or DSL modems. These new modems would have a relatively small amount of storage and computing power. This would remove the burden from the users' computers and allow a special purpose device to take over. Since the majority of users would not reset their modems as often as they shut down their computers, this would allow for greater reliability than the previously described solution. As in the previous example, these devices could participate as either active or passive players in the overall web caching scheme, a detail that does not need to be decided upon beforehand and can actually vary from neighbourhood to neighbourhood or even house to house depending on the circumstances. Although this would entail an additional investment on the part of the user, with ever decreasing hardware costs a relatively powerful machine could be built relatively inexpensively, especially on a large scale.

Local and Global Benefits of Having Web Caching as a Public Good

Potential Concerns and Disadvantages

DNS (Fahim)

Introduction

DNS (Domain Name System) is considered by many to be the "switchboard" of the internet. To make the internet more user friendly, a user or application need only supply a name, and the service returns the IP address associated with that hostname. This service is essential to the functionality and usability of the internet.

Given its necessity, the system is a good candidate to be considered a public good. Currently, providing the service is the responsibility of a user's Internet Service Provider (ISP), which maintains the database of names to IP addresses for its users.

Implementation Overview

For the sake of simplicity, it will be assumed that the service works like a giant dynamic database in which a request for a name (for example, the host in a URL) resolves to an IP address.

For a standard user, an ISP takes care of the DNS service. It is understood by the user that all internet requests can be filtered or redirected as the ISP sees fit. For example, two of Canada's biggest providers, Bell Canada and Rogers Communications, offer advertising-based redirects if a user seeks a non-existent URL. This can be seen as helpful (in the event of typos) or as a hindrance (suggestions based on advertising).

More knowledgeable users can configure a setup where their DNS requests are processed via any number of alternative options, such as Google's public DNS project or OpenDNS. This can be a healthy approach to avoiding the ISP issues, but it still places significant trust in another corporation or in "good Samaritans" in a public community.
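Under the simplified "giant database" view adopted above, the difference between a redirecting resolver and a plain one can be sketched as follows. This is a toy model, not how any real DNS server is implemented: the `ToyResolver` class is hypothetical, and the names and addresses are placeholders (the addresses come from the range reserved for documentation).

```python
class ToyResolver:
    """Toy name service: a table of names to IP addresses, with optional redirect."""

    def __init__(self, records, redirect_ip=None):
        self.records = dict(records)
        # If set, unknown names resolve to this address (e.g. an advertising page)
        # instead of producing a "no such name" answer.
        self.redirect_ip = redirect_ip

    def resolve(self, name):
        ip = self.records.get(name.lower().rstrip("."))
        if ip is not None:
            return ip
        return self.redirect_ip  # None means "no such name"


records = {"www.example.org": "192.0.2.10"}

# Hypothetical ISP resolver that redirects failed lookups to its own page.
isp_resolver = ToyResolver(records, redirect_ip="192.0.2.99")
# Hypothetical alternative resolver that reports failures as failures.
plain_resolver = ToyResolver(records)

known_isp = isp_resolver.resolve("www.example.org")
typo_isp = isp_resolver.resolve("www.exmaple.org")      # mistyped name
typo_plain = plain_resolver.resolve("www.exmaple.org")  # same mistyped name
```

Both resolvers agree on names that exist; they differ only in what happens for a name that does not. In either case the user has to trust whoever runs the resolver, which is the concern raised above.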

Issues for further Research and Development

  • With free, public DNS, where is this information about user behaviour going, if anywhere? Is this an example of a good that should be managed by a central/public/democratized authority?

References

<references/>