Difference between revisions of "DistOS-2011W Public Goods"

From Soma-notes

Revision as of 14:11, 21 March 2011

Members

  • Lester Mundt - lmundt at connect.carleton.ca
  • Fahim Rahman - frahman at connect.carleton.ca
  • Andrew Schoenrock - aschoenr at scs.carleton.ca

Main Goals

Based on the discussion last class, I think the focus of the project should be on what kinds of things fundamentally should be a "public good", as opposed to exactly how to implement them. Some of the ideas we have come up with can be used for various implementations, but I think, in general, we can rely on previous work for most (if not all) of the implementation details. If this assumed direction is correct, then I think we should aim to answer the following questions:

  • What are good candidates for public goods (e.g. DNS, internet cache, physical connections, etc.)? Why should these services be fundamentally controlled by the public? What are the flaws in the way they are currently used or why should they not be centrally controlled by a single entity? What incentives are there for a given user to participate (willingly or unwillingly)?
  • What would be the net benefit for the local community participating in these public goods?
  • What would be the net impact on the entire internet if all local communities created these public goods (more secure, less bandwidth wasted, etc.)?
  • Could there be disadvantages? If so, how do the benefits offset these drawbacks?
  • What would the cost of the public goods be? What sort of tax would the organizers need to levy?
  • After identifying some candidates for public goods, try to determine what these services have in common, both in the problems they face and in the alternatives available. What are some things that are fundamentally different about these goods?

Note to prof: Please let us know if you have any comments on the overall direction we are taking the project.

Potential Topics

  • What else occurs on the internet
    • physical infrastructure (phoneline, cable, satellite, etc)
    • DNS, BGP, ----
    • TCP/IP, UDP
    • HTTP, SMTP, POP, IMAP, FTP, SSH
    • email = SPAM, search, internet caching

Note to prof: This is still a working list, but if you notice anything that we should definitely try to cover that we haven't thought of, please let us know.

Current Work

  • Lester - Physical infrastructure
  • Andrew - Caching
  • Fahim - DNS

Candidates for Public Goods (Use this area to post your ongoing work)

Physical Infrastructure (Lester)

Web Caching (Andrew)

Introduction

In general, the idea behind web caching is the temporary storage of web objects so that they can be used later without having to retrieve the data from the original server again. When a new web request is made, the resulting data is stored in a cache after being delivered to the end user. If another user requests the same data, barring certain conditions, the cached data is returned to the user and the request is not passed on to the originating web server. Many parts of many websites do not change very often (e.g. logos, static text, pictures, other multimedia) and hence are good candidates for caching <ref name="visolve">Optimized Bandwidth + Secured Access = Accelerated Data Delivery, Web Caching - A cost effective approach for organizations to address all types of bandwidth management challenges. A ViSolve White Paper. March 2009. link</ref>. Web caches can either exist on the end user's machine (in the browser, for instance) or somewhere between the user and the servers they wish to communicate with, on what is known as a proxy server <ref name="webcaching.com"> Web Caching Overview. visited March 2011. link </ref>. Internet Service Providers have a key interest in web caching and in most cases implement their own caches <ref name="visolve"/><ref name="cisco">Geoff Huston. 2000. Web Caching. The Internet Protocol Journal Volume 2, No. 3. link</ref>. There are a variety of incentives for entities on the internet, including ISPs, to use web caches. In general, these advantages can be summarized as follows:

  • Reduced Bandwidth Usage

One of the main incentives for ISPs to use web caching is the reduction of outgoing web traffic, which results in a reduction of overall bandwidth usage <ref name="visolve"/><ref name="webcaching.com"/><ref name="cisco"/><ref name="survey">Jia Wang. 1999. A survey of web caching schemes for the Internet. SIGCOMM Comput. Commun. Rev. 29, 5 (October 1999), 36-46. DOI=10.1145/505696.505701 link</ref><ref name="docforge"> Web application/Caching. visited March 2011. last modified September 2010. link</ref>. For a typical ISP, web-based traffic can account for upwards of 70% of the total bandwidth used and, of this web-based traffic, the level of similarity of requests can be as high as 50%<ref name="cisco"/>. It is also true that, for many ISPs, transmission costs dominate their overall operating costs, so any reduction in the number of requests that must be satisfied outside of the ISP is beneficial<ref name="cisco"/>.
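As a rough back-of-the-envelope check on the figures above, the following sketch multiplies the two percentages together. It assumes (this is our simplification, not a claim from the cited article) that every "similar" request could be answered from the cache, so the result is only an upper bound. The function name `max_cacheable_share` is made up for this illustration.

```python
def max_cacheable_share(web_share: float, repeat_share: float) -> float:
    """Upper bound on the fraction of an ISP's total traffic a cache could absorb.

    web_share:    fraction of total bandwidth that is web traffic
    repeat_share: fraction of that web traffic made up of repeated requests
    Assumption: every repeated request is a cache hit, which real caches
    will not achieve (expired objects, uncacheable content, etc.).
    """
    if not (0.0 <= web_share <= 1.0 and 0.0 <= repeat_share <= 1.0):
        raise ValueError("shares must be fractions between 0 and 1")
    return web_share * repeat_share


# Figures quoted in the paragraph above: 70% web traffic, 50% similarity.
saving = max_cacheable_share(0.70, 0.50)
print(f"Upper bound on external traffic avoided: {saving:.0%}")
```

Under these assumptions, at most about 35% of the ISP's total traffic would never need to leave its network.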

  • Improved End User Experience

Another benefit of web caching is the apparent reduction in latency to the end user <ref name="visolve"/><ref name="webcaching.com"/><ref name="survey"/><ref name="docforge"/>. Instead of web requests traveling all the way to the desired web server, these requests are intercepted by a proxy server which can return a cached version of the requested data. Because the total distance that the data has to travel is cut down significantly (as web caches are intended to be relatively close to the end user), the time to deliver the content to the end user can also be cut down significantly. It has been found that small performance improvements made by an ISP through the use of caching can result in a significantly better end user experience<ref name="docforge"/>.

  • Reduced Web Server Load

Web servers providing popular data also benefit from web caching. Popular websites translate into a high number of simultaneous connections and a high bandwidth usage by the providing web server <ref name="cisco"/><ref name="survey"/>. A web cache placed in front of a given web server can reduce the number of connections that need to be passed through by providing data it has stored. This can translate into reduced hardware and support costs<ref name="docforge"/>.

Additional advantages include the added robustness that a web cache adds to the internet, allowing users to access documents even if the supplying web server is down and allowing organizations to analyze internet usage patterns <ref name="survey"/>.
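To make the basic behaviour described in this introduction concrete, here is a minimal sketch of a proxy-style cache. It is only an illustration: the class name `WebCache`, the fixed time-to-live standing in for the "certain conditions" under which a cached copy is not reused, and the `fetch_from_origin` callback are all our own assumptions, not part of any of the cited systems.

```python
import time
from typing import Callable, Dict, Tuple


class WebCache:
    """Toy proxy cache: serves a stored copy until it is older than the TTL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin-server call
        self._ttl = ttl_seconds              # assumed freshness rule
        self._clock = clock                  # injectable so it can be tested
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh copy available: the origin server is not contacted.
            self.hits += 1
            return entry[1]
        # Missing or stale: fetch from the origin and keep a copy.
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


def fake_origin(url: str) -> bytes:
    """Stand-in for a real web server, used only for this example."""
    return ("content of " + url).encode()


cache = WebCache(fake_origin, ttl_seconds=60.0)
cache.get("http://example.org/logo.png")   # first request goes to the origin
cache.get("http://example.org/logo.png")   # second request is served locally
print(cache.hits, cache.misses)
```

The hit and miss counters correspond to the bandwidth and server-load savings listed above: every hit is a request the originating server never sees.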

Web Caching Schemes

Since web caching has been identified as a significant asset to the internet as a whole, it has received its fair share of research. Many different approaches to web caching have been proposed, many of which utilize distributed or hierarchical elements. These approaches will not be looked into in depth here, as they will be considered merely implementation details. A survey of web caching schemes <ref name="survey"/> identified the main architectures that a large scale web cache can have.

One of these is a hierarchical architecture. In such an architecture web caches are placed at different levels of a network, starting with the client's machine, followed by a local, then a regional and finally a national level cache. In this type of system, web requests are first sent to the lowest level cache and passed along to higher levels until the request can be satisfied. Once it is satisfied, the data travels back down the hierarchy, leaving a copy at each of the lower levels. Hierarchical web caches benefit from their efficient use of bandwidth by allowing popular web sites to propagate towards the demand.
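The lookup path in a hierarchical architecture could be sketched roughly as follows. This is a simplified illustration, assuming unbounded in-memory caches with no expiry; `CacheLevel`, `hierarchical_get` and `fetch_from_origin` are names invented for the example.

```python
from typing import Callable, Dict, List, Tuple


class CacheLevel:
    """One level of the hierarchy (e.g. client, local, regional, national)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}


def hierarchical_get(levels: List[CacheLevel], url: str,
                     fetch_from_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Look up url from the lowest level upwards.

    `levels` is ordered from lowest (closest to the user) to highest.
    Returns the body and the name of whoever satisfied the request.
    """
    missed: List[CacheLevel] = []
    body = None
    source = "origin"
    for level in levels:
        if url in level.store:
            body = level.store[url]
            source = level.name
            break
        missed.append(level)
    if body is None:
        # No level had the object, so the request reaches the web server.
        body = fetch_from_origin(url)
    # On the way back down, leave a copy at every level that missed.
    for level in missed:
        level.store[url] = body
    return body, source


levels = [CacheLevel(n) for n in ("client", "local", "regional", "national")]
print(hierarchical_get(levels, "http://example.org/", lambda u: b"page")[1])
print(hierarchical_get(levels, "http://example.org/", lambda u: b"page")[1])
```

The second request is answered by the lowest level, because the first response left a copy there on its way down.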

Another potential architecture is distributed web caching. In such a structure there is only one level of caches, which cooperate with each other to satisfy web requests. To do this, each cache retains metadata about the content of all of the other caches it cooperates with and uses it to fulfill web requests it receives from clients. This web caching scheme allows for better load balancing and introduces fault tolerance that was not available to strictly hierarchical structures.
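A minimal sketch of this cooperation is shown below. It assumes each cache keeps an exact directory of which peer holds which URL and that peers announce new objects immediately; a real deployment would more likely exchange compact, periodically refreshed summaries. `PeerCache` and its methods are hypothetical names for illustration only.

```python
from typing import Callable, Dict, Tuple


class PeerCache:
    """One cache in a flat (single-level) group of cooperating caches."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}
        self.peers: Dict[str, "PeerCache"] = {}
        # Metadata about other caches: url -> name of a peer advertising it.
        self.directory: Dict[str, str] = {}

    def join(self, other: "PeerCache") -> None:
        """Start cooperating with another cache and swap content listings."""
        self.peers[other.name] = other
        other.peers[self.name] = self
        for url in self.store:
            other.directory[url] = self.name
        for url in other.store:
            self.directory[url] = other.name

    def _store_and_advertise(self, url: str, body: bytes) -> None:
        self.store[url] = body
        for peer in self.peers.values():
            peer.directory[url] = self.name

    def get(self, url: str,
            fetch_from_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
        if url in self.store:
            return self.store[url], self.name
        owner = self.directory.get(url)
        if owner is not None and url in self.peers[owner].store:
            # A cooperating cache has it: no request leaves the group.
            body, source = self.peers[owner].store[url], owner
        else:
            body, source = fetch_from_origin(url), "origin"
        self._store_and_advertise(url, body)
        return body, source


a, b = PeerCache("a"), PeerCache("b")
a.join(b)
a.get("http://example.org/", lambda u: b"page")             # fetched from origin
print(b.get("http://example.org/", lambda u: b"page")[1])   # served by peer "a"
```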

Finally, a third option for large scale web caches is a hybrid architecture. In such a system, a hierarchy of caches exists, however there are a number of caches on each level that cooperate with each other in a distributed fashion. This type of system can benefit from the combination of the different advantages that the hierarchical and distributed architectures provide.

Web Caching as a Public Good

Web caching is of enormous importance to the efficient functioning of the internet, and is therefore vitally important to end users. Web caching ultimately succeeds by keeping relevant data close to the end users. Currently these web caches are typically implemented by ISPs, and they do so because it is in their financial interest and not because it is in the interest of their customers. Their customers' satisfaction is important, but it is not their top priority. Transitioning ISP-controlled web caches into a public good would allow for a balance between the financial and end user experience aspects of web caching. This can be achieved by the government actually taking over the proxy servers that host the web caches or through strict regulations on exactly how web caching should be done. A benefit of this is that it allows for the standardization of web caching on all proxies. This doesn't mean that every web cache needs to be implemented in exactly the same way, but it could allow for generic interfaces through which web caches of all types could communicate with one another. This would then allow end users who are customers of one ISP to be served by web caches that used to be available only to customers of other ISPs.

Local and Global Benefits of Having Web Caching as a Public Good

Potential Concerns and Disadvantages

Relevant Papers & Links

Background stuff

Other

Misc Notes

  • look into LAN caching. if we proposed a new infrastructure where neighborhoods are networked, distributed caching can be done here.

Web Caching as a public good

Why would turning web caching into something controlled by the public be a good thing? How could this be done (possible high level implementation options)?

  • The ultimate benefit of web caching is realized by keeping popular data close to the user
  • what is closer to the end user than their ISP? their neighbours.
  • by distributing the cache among people, web requests can be satisfied even closer (and, as a result, even faster)
  • possible ways of doing this:
    • option 1
      • have each person dedicate a certain amount of disk space and cpu cycles to storing and maintaining the cache.
      • The ISPs can control the overall placement and tracking of cached data since it is in their interest.
      • as users log in and log off, data can be transferred accordingly
      • web requests will go to the ISP and the ISP will determine where the data resides and will mediate a connection between the two users
    • option 2
      • since there are large incentives for ISPs to do this, they may want to invest in some specialized hardware to help implement this
      • this hardware would replace the end user's modem and would include a general purpose processor and some data storage.
      • with the ever decreasing hardware costs, a relatively powerful machine could be built (especially on a large scale) for relatively cheap
      • since an end user generally leaves their modem on more than their PC, this option would result in greater reliability and aggregated uptime.
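Option 1 above could be sketched along the following lines. This is a rough illustration under our own assumptions: the ISP keeps a simple in-memory directory of which online user holds which object, and when a user logs off any object they were the last online holder of is handed to another online user. The class `IspDirectory`, its methods, and the "pick the alphabetically first user" placement policy are all placeholders invented for the example.

```python
from typing import Dict, List, Optional, Set, Tuple


class IspDirectory:
    """ISP-side tracker for a cache spread across end users' machines."""

    def __init__(self) -> None:
        self.online: Set[str] = set()
        self.holders: Dict[str, Set[str]] = {}   # url -> users storing it

    def log_in(self, user: str) -> None:
        self.online.add(user)

    def register(self, url: str, user: str) -> None:
        """Record that `user` now stores a cached copy of `url`."""
        self.holders.setdefault(url, set()).add(user)

    def locate(self, url: str) -> Optional[str]:
        """Return an online user the requester can be connected to, if any."""
        candidates = self.holders.get(url, set()) & self.online
        return min(candidates) if candidates else None

    def log_off(self, user: str) -> List[Tuple[str, str]]:
        """Mark `user` offline and return the (url, new_holder) transfers
        needed so that objects only they held stay available."""
        self.online.discard(user)
        handoffs: List[Tuple[str, str]] = []
        for url, users in self.holders.items():
            if user in users:
                users.discard(user)
                if not (users & self.online) and self.online:
                    target = min(self.online)    # placeholder placement policy
                    users.add(target)
                    handoffs.append((url, target))
        return handoffs


directory = IspDirectory()
for name in ("alice", "bob", "carol"):
    directory.log_in(name)
directory.register("http://example.org/logo.png", "bob")
print(directory.locate("http://example.org/logo.png"))
print(directory.log_off("bob"))
```

The directory only decides where data lives and who should talk to whom; the actual transfer between the two users is left out of the sketch.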

local and global benefits

Local

  • data is even closer to the end user = lower latency
  • greater amount of total storage = bigger cache
  • if an optimal size is found and increasing the cache size does not improve performance (3) then the data can simply be replicated at a higher degree across the network
    • decreases impact of users logging off
    • also allows for neighbourhood specific, ultra fast caches
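To get a feel for how replication decreases the impact of users logging off, here is a small calculation. It assumes (purely for illustration) that each user is online independently with the same probability, in which case an object stored on k users is reachable with probability 1 - (1 - p)^k. The function names and the example figures are ours, not measurements.

```python
def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` holders is online,
    assuming each is online independently with probability p_online."""
    if not 0.0 <= p_online <= 1.0 or replicas < 0:
        raise ValueError("need 0 <= p_online <= 1 and replicas >= 0")
    return 1.0 - (1.0 - p_online) ** replicas


def replicas_needed(p_online: float, target: float) -> int:
    """Smallest number of copies that reaches the target availability."""
    if not 0.0 < p_online <= 1.0 or not 0.0 <= target < 1.0:
        raise ValueError("need 0 < p_online <= 1 and 0 <= target < 1")
    k = 1
    while availability(p_online, k) < target:
        k += 1
    return k


# Example with made-up numbers: users online half the time, 99% target.
print(replicas_needed(0.5, 0.99))
```

With these example numbers, seven copies are enough, which suggests that even modest replication could hide most of the churn from users logging off.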

global

  • if every ISP implemented these neighbourhood caches, the total amount of wasted bandwidth going across the internet will drastically decrease
  • after these have been implemented, a cache hierarchy can be imposed where neighbourhood caches can first talk with each other. If there is still a cache miss, then neighbourhoods of neighbourhoods can satisfy requests
  • results in a very large, diverse cache that can satisfy a variety of requests

DNS (Fahim)

  • With free, public DNS, where is this information about user behaviour going, if anywhere? Is this an example of a good that should be managed by a central/public/democratized authority?

References

<references/>