DistOS-2011W Public Goods

From Soma-notes
Revision as of 11:19, 21 March 2011

Members

  • Lester Mundt - lmundt at connect.carleton.ca
  • Fahim Rahman - frahman at connect.carleton.ca
  • Andrew Schoenrock - aschoenr at scs.carleton.ca

Main Goals

Based on the discussion in the last class, I think the project should focus on what kinds of things should fundamentally be a "public good", rather than on exactly how to implement them. Some of the ideas we have come up with could be used in various implementations, but in general I think we can rely on previous work for most (if not all) of the implementation details. If this direction is correct, then I think we should try to answer the following questions:

  • What are good candidates for public goods (e.g., DNS, internet caches, physical connections)? Why should these services fundamentally be controlled by the public? What are the flaws in the way they are currently used, or why should they not be centrally controlled by a single entity? What incentives are there for a given user to participate (willingly or unwillingly)?
  • What would be the net benefit for the local community participating in these public goods?
  • What would be the net impact on the entire internet if all local communities created these public goods (e.g., better security, less wasted bandwidth)?
  • Could there be disadvantages? If so, how do the benefits offset these drawbacks?
  • What would the public goods cost? What sort of tax would the organizers need to levy?
  • After identifying some candidates for public goods, try to determine what these services have in common, both in the problems they address and in the alternatives to them. What is fundamentally different about these goods?

Note to prof: Please let us know if you have any comments on the overall direction we are taking the project.

Potential Topics

  • What else occurs on the internet
    • physical infrastructure (phoneline, cable, satellite, etc)
    • DNS, BGP, ----
    • TCP/IP, UDP
    • HTTP, SMTP, POP, IMAP, FTP, SSH
    • email = SPAM, search, internet caching

Note to prof: This is still a working list, but if you notice anything that we should definitely try to cover that we haven't thought of, please let us know.

Current Work

  • Lester - Physical infrastructure
  • Andrew - Caching
  • Fahim - DNS

Candidates for Public Goods (Use this area to post your ongoing work)

Physical Infrastructure (Lester)

Web Caching (Andrew)

Introduction

In general, the idea behind web caching is the temporary storage of web objects so that they can be reused later without retrieving the data from the original server again. When a new web request is made, the resulting data is stored in a cache after being delivered to the end user. If another user requests the same data, then, barring certain conditions, the cached data is returned to that user and the request is not passed on to the originating web server. Many parts of websites do not change very often (e.g., logos, static text, pictures and other multimedia) and hence are good candidates for caching <ref name="visolve">Optimized Bandwidth + Secured Access = Accelerated Data Delivery, Web Caching - A cost effective approach for organizations to address all types of bandwidth management challenges. A ViSolve White Paper, March 2009 http://www.visolve.com/squid/whitepapers/ViSolve_Web_Caching.pdf</ref>. Web caches can exist either on the end user's machine (in the browser, for instance) or somewhere between users and the servers they wish to communicate with, on what is known as a proxy server <ref name="webcaching.com"> http://www.web-caching.com/welcome.html, visited March 2011</ref>. Internet Service Providers have a key interest in web caching and in most cases implement their own caches <ref name="visolve"/><ref name="cisco">Geoff Huston. 2000. Web Caching. The Internet Protocol Journal Volume 2, No. 3. http://www.cisco.com/web/about/ac123/ac147/ac174/ac199/about_cisco_ipj_archive_article09186a00800c8903.html</ref>. There are a variety of incentives for entities on the internet, including ISPs, to use web caches. In general, these advantages can be summarized as follows:

  • Reduced Bandwidth Usage

One of the main incentives for ISPs to use web caching is the reduction of outgoing web traffic, which in turn reduces overall bandwidth usage (3,4)<ref name="visolve"/><ref name="webcaching.com"/><ref name="survey">Jia Wang. 1999. A survey of web caching schemes for the Internet. SIGCOMM Comput. Commun. Rev. 29, 5 (October 1999), 36-46. DOI=10.1145/505696.505701 http://doi.acm.org/10.1145/505696.505701</ref>. For a typical ISP, web-based traffic can account for upwards of 70% of the total bandwidth used and, of this web-based traffic, the proportion of repeated requests can be as high as 50% (3). It is also true that, for many ISPs, transmission costs dominate overall operating costs, so any reduction in the number of requests that must be satisfied outside the ISP is beneficial<ref name="cisco"/>.
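As a rough back-of-envelope sketch, the two figures above can be combined into an estimate of the bandwidth a cache could save. This treats the 50% request-similarity figure as a cache hit rate and assumes cached objects are the same average size as uncached ones (both are our simplifying assumptions, not claims from the cited sources):

```python
def bandwidth_saved(web_share: float, hit_rate: float) -> float:
    """Fraction of an ISP's total external bandwidth avoided by a web cache.

    Assumes every cache hit avoids one upstream transfer of average size,
    i.e. the request hit rate equals the byte hit rate (a simplification).
    """
    if not (0.0 <= web_share <= 1.0 and 0.0 <= hit_rate <= 1.0):
        raise ValueError("shares must be between 0 and 1")
    return web_share * hit_rate


# Figures quoted above: ~70% of traffic is web traffic, up to 50% repeated requests.
saved = bandwidth_saved(0.70, 0.50)
print(f"{saved:.0%} of total bandwidth")  # -> 35% of total bandwidth
```

Under these assumptions, roughly a third of an ISP's total external traffic would no longer need to leave its network.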

  • Improved End User Experience

Another benefit of web caching is the apparent reduction in latency for the end user (4)<ref name="visolve"/><ref name="webcaching.com"/><ref name="survey"/>. Instead of traveling all the way to the desired web server, web requests are intercepted by a proxy server, which can return a cached version of the requested data. Because web caches are intended to be relatively close to the end user, the distance the data has to travel is cut down significantly, and so is the time needed to deliver the content to the end user. It has been found that small performance improvements made by an ISP through the use of caching can result in a significantly better end user experience (4).
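The latency argument can be made concrete with a simple expected-value calculation: if a fraction of requests hit a nearby cache, the average response time is a weighted mix of the cache and origin response times. The timings below are hypothetical placeholders, not measurements:

```python
def expected_latency_ms(hit_rate: float, cache_ms: float, origin_ms: float) -> float:
    """Average response time when a fraction `hit_rate` of requests is served
    from a nearby cache and the rest must go to the origin server.

    Ignores the small extra lookup cost a miss pays at the cache (a simplification).
    """
    return hit_rate * cache_ms + (1.0 - hit_rate) * origin_ms


# Hypothetical timings: 10 ms to an ISP cache, 120 ms to a distant origin server.
no_cache = expected_latency_ms(0.0, cache_ms=10.0, origin_ms=120.0)
with_cache = expected_latency_ms(0.5, cache_ms=10.0, origin_ms=120.0)
print(no_cache, with_cache)  # -> 120.0 65.0
```

The gain grows with both the hit rate and the gap between the cache's distance and the origin's distance from the user.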

  • Reduced Web Server Load

Web servers providing popular data also benefit from web caching. Popular websites must handle a large number of simultaneous connections and use a large amount of bandwidth (5)<ref name="cisco"/>. A web cache placed in front of a given web server can reduce the number of connections that must be passed through to the server by serving data it has already stored. This can translate into reduced hardware and support costs (4).
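A minimal sketch of a cache placed in front of a web server, assuming an in-memory dictionary keyed by URL and a stand-in `fake_origin` function in place of a real HTTP fetch (both hypothetical). It counts how many requests actually reach the server; a real deployment would also need to respect expiry and cache-control rules, which this sketch ignores:

```python
from typing import Callable, Dict


class FrontCache:
    """Hypothetical cache sitting in front of an origin web server."""

    def __init__(self, origin: Callable[[str], bytes]) -> None:
        self.origin = origin                # stands in for a real HTTP fetch
        self.store: Dict[str, bytes] = {}
        self.origin_requests = 0            # requests that were passed through

    def get(self, url: str) -> bytes:
        if url in self.store:               # hit: answered locally, server never sees it
            return self.store[url]
        self.origin_requests += 1           # miss: forward the request to the server
        body = self.origin(url)
        self.store[url] = body              # keep a copy for later requests
        return body


def fake_origin(url: str) -> bytes:
    return f"contents of {url}".encode()


cache = FrontCache(fake_origin)
requests = ["/logo.png", "/index.html", "/logo.png", "/logo.png", "/index.html"]
for url in requests:
    cache.get(url)
print(len(requests), cache.origin_requests)  # -> 5 2
```

Here five client requests produce only two connections to the server, one per distinct object.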

Additional advantages include the robustness that web caches add to the internet, since users can still access documents even if the supplying web server is down, and the ability for organizations to analyze internet usage patterns (5).
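The robustness point can be sketched as a fallback: when the cache checks back with the origin server and the server cannot be reached, it answers with the last copy it stored. This resembles the "serve stale on error" behaviour some proxies offer; the `fetch` parameter and the use of `ConnectionError` to signal an unreachable server are assumptions for illustration:

```python
from typing import Callable, Dict, Optional


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str], bytes],
    store: Dict[str, bytes],
) -> Optional[bytes]:
    """Fetch `url` from the origin, refreshing the cached copy on success.

    If the origin is unreachable (signalled here by ConnectionError), return
    the previously cached copy instead, or None if nothing was cached.
    """
    try:
        body = fetch(url)
    except ConnectionError:
        return store.get(url)   # origin down: fall back to the stored copy
    store[url] = body           # origin up: refresh the cache
    return body


def origin_down(url: str) -> bytes:
    raise ConnectionError("server unreachable")


store = {"/paper.pdf": b"cached copy"}
print(fetch_with_fallback("/paper.pdf", origin_down, store))     # -> b'cached copy'
print(fetch_with_fallback("/missing.html", origin_down, store))  # -> None
```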

Web Caching as a Public Good

Local and Global Benefits of Having Web Caching as a Public Good

Potential Concerns and Disadvantages

Relevant Papers & Links

Background stuff

Other

Misc Notes

  • look into LAN caching. If we proposed a new infrastructure where neighbourhoods are networked, distributed caching could be done there.

Web Caching as a public good

Why would turning web caching into something controlled by the public be a good thing? How could this be done (possible high level implementation options)?

  • The ultimate benefit of web caching is realized by keeping popular data close to the user
  • what is closer to the end user than their ISP? Their neighbours.
  • by distributing the cache among people, web requests can be satisfied even closer (and, as a result, even faster)
  • possible ways of doing this:
    • option 1
      • have each person dedicate a certain amount of disk space and cpu cycles to storing and maintaining the cache.
      • The ISPs can control the overall placement and tracking of cached data since it is in their interest.
      • as users log in and log off, data can be transferred accordingly
      • web requests will go to the ISP, and the ISP will determine where the data resides and mediate a connection between the two users
    • option 2
      • since there are large incentives for ISPs to do this, they may want to invest in some specialized hardware to help implement this
      • this hardware would replace the end user's modem and would include a general purpose processor and some data storage.
      • with ever-decreasing hardware costs, a relatively powerful machine could be built (especially at large scale) relatively cheaply
      • since an end user generally leaves their modem on more than their PC, this option would result in greater reliability and higher aggregate uptime.
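A rough sketch of option 1, assuming the ISP keeps a directory that maps each cached URL to the subscribers currently storing it. All class and method names are hypothetical. When a request arrives, the ISP looks up an online subscriber holding the object and returns it as the peer to connect to; when a subscriber logs off, any object it alone held is reassigned to another online subscriber, if there is one:

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class NeighbourhoodDirectory:
    """Hypothetical ISP-side index of which online subscribers hold which objects."""

    def __init__(self) -> None:
        self.online: Set[str] = set()
        self.holders: Dict[str, Set[str]] = defaultdict(set)  # url -> peers holding it

    def log_in(self, peer: str) -> None:
        self.online.add(peer)

    def store(self, url: str, peer: str) -> None:
        self.holders[url].add(peer)

    def locate(self, url: str) -> Optional[str]:
        """Return an online peer to connect the requester to, or None on a miss."""
        candidates = self.holders.get(url, set()) & self.online
        return min(candidates) if candidates else None  # min() keeps the choice deterministic

    def log_off(self, peer: str) -> None:
        """Drop `peer`; objects it alone held move to another online peer, if any."""
        self.online.discard(peer)
        for peers in self.holders.values():
            if peer in peers:
                peers.discard(peer)
                if not peers and self.online:
                    # Stand-in for actually transferring the data; if nobody is
                    # online, the object simply falls out of the cache.
                    peers.add(min(self.online))


d = NeighbourhoodDirectory()
for p in ("alice", "bob"):
    d.log_in(p)
d.store("/video.mp4", "alice")
print(d.locate("/video.mp4"))  # -> alice
d.log_off("alice")
print(d.locate("/video.mp4"))  # -> bob
```

A real design would pick the receiving peer by free space or proximity rather than by name; `min()` is used here only to keep the example deterministic.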

local and global benefits

Local

  • data is even closer to the end user = lower latency
  • greater amount of total storage = bigger cache
  • if an optimal cache size is found, beyond which increasing the size does not improve performance (3), then the data can simply be replicated to a higher degree across the network
    • decreases impact of users logging off
    • also allows for neighbourhood specific, ultra fast caches
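To see why replication softens the impact of users logging off, assume (a simplification, not a measured result) that each subscriber holding a copy is online independently with probability p. An object replicated on k machines is then reachable with probability 1 - (1 - p)^k, and the number of copies needed for a target availability follows directly. The uptime figures below are hypothetical:

```python
import math


def availability(p_online: float, copies: int) -> float:
    """Probability that at least one of `copies` independent holders is online."""
    return 1.0 - (1.0 - p_online) ** copies


def copies_needed(p_online: float, target: float) -> int:
    """Smallest number of replicas whose availability reaches `target`."""
    if not (0.0 < p_online < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^k >= target  =>  k >= log(1 - target) / log(1 - p)
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_online))


# Hypothetical: a PC that is on 40% of the time vs. an always-on-ish modem at 80%.
print(copies_needed(0.4, 0.99))  # -> 10
print(copies_needed(0.8, 0.99))  # -> 3
```

The second case also illustrates why the modem-based option 2 could need far fewer replicas for the same availability.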

global

  • if every ISP implemented these neighbourhood caches, the total amount of wasted bandwidth going across the internet will drastically decrease
  • after these have been implemented, a cache hierarchy can be imposed in which neighbourhood caches first talk with each other. If there is still a cache miss, then neighbourhoods of neighbourhoods can satisfy requests
  • results in a very large, diverse cache that can satisfy a variety of requests
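The hierarchy described above can be sketched as an ordered list of cache levels that a request walks through before falling back to the origin server. The level names, the dictionary-based caches and `fetch_origin` are placeholders for illustration. A real system would also have to decide where to keep copies on the way back; this sketch assumes a copy is kept only in the requester's own neighbourhood:

```python
from typing import Callable, Dict, List, Tuple


def hierarchical_get(
    url: str,
    levels: List[Tuple[str, Dict[str, bytes]]],
    origin: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Look `url` up level by level (own neighbourhood first, then wider groups).

    Returns the object and the name of the level that satisfied the request.
    The object is then stored in the first (local) level, so the next local
    request for it is a hit.
    """
    for name, cache in levels:
        if url in cache:
            body = cache[url]
            break
    else:
        name, body = "origin", origin(url)  # miss at every level
    if levels:
        levels[0][1].setdefault(url, body)
    return body, name


def fetch_origin(url: str) -> bytes:
    return b"fetched " + url.encode()


local: Dict[str, bytes] = {}
neighbours = {"/news.html": b"<html>news</html>"}
region: Dict[str, bytes] = {}
levels = [("local", local), ("neighbouring", neighbours), ("regional", region)]

print(hierarchical_get("/news.html", levels, fetch_origin)[1])  # -> neighbouring
print(hierarchical_get("/news.html", levels, fetch_origin)[1])  # -> local
print(hierarchical_get("/new.html", levels, fetch_origin)[1])   # -> origin
```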

DNS (Fahim)

  • With free, public DNS, where is this information about user behaviour going, if anywhere? Is this an example of a good that should be managed by a central/public/democratized authority?

References

<references/>