Talk:DistOS-2011W Public Goods

From Soma-notes

Tuesday March 1

Key components:

  • Distributed File System
    • Can use something previously presented in class
  • Distributed computation
  • Administration
    • How much does a person need to contribute to the system?
    • How will users submit (small) services they would like to have run?
    • How can very large services be established (from idea to implementation)?

Todo for March 8th:

    • Find papers on distributed computation and administration

Thursday March 3

  • Seek out papers on specific topic of distributed web cache.
  • Discussed two other interesting public goods or services.
    • Image registry
    • DNA registry - found one article suggesting that the uncompressed human genome takes between 1.5 and 30 terabytes, but that more efficient formats bring it down to 1.5 GB:
http://www.genetic-future.com/2008/06/how-much-data-is-human-genome-it.html 
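
As a rough sanity check on the 1.5 GB figure, here is a back-of-the-envelope calculation. It assumes about 3 billion base pairs per haploid genome, two copies per person, and 2 bits per base (four possible letters). These constants are approximations assumed for illustration, not numbers taken from the notes above.

```python
# Back-of-the-envelope size of one human genome when bit-packed.
# All constants are approximations assumed for illustration.
BASES_PER_HAPLOID_GENOME = 3_000_000_000  # roughly 3 billion base pairs
COPIES = 2                                # one set from each parent
BITS_PER_BASE = 2                         # A, C, G, T -> 4 values -> 2 bits


def packed_genome_bytes(bases=BASES_PER_HAPLOID_GENOME,
                        copies=COPIES,
                        bits_per_base=BITS_PER_BASE):
    """Return the storage needed, in bytes, for a bit-packed genome."""
    return bases * copies * bits_per_base // 8


size = packed_genome_bytes()
print(f"{size / 1e9:.1f} GB")  # prints "1.5 GB"
```

Under these assumptions the packed size is about 1.5 GB per person, which is small enough that a registry is limited more by the number of people than by the size of each record.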


One example paper: Improving Web Server Performance by Caching Dynamic Data

Tuesday March 8

Currently our idea of public goods consists of:

  • Distributed file storage
    • Ceph seems to be an ideal candidate for this
  • Distributed Computation
    • To avoid a tragedy-of-the-commons situation with both the storage and computation resources, only predetermined, agreed-upon computation that has a net benefit to everyone participating will take place. This computation is based on the data stored at each node, and the results (metadata) will be stored locally. It can be done using idle cycles, much like BOINC projects.
    • Must allow for querying of metadata to allow users to effectively search processed data.
    • Maybe we also need to consider the movement of data between client machines to fully utilize available resources. For instance, let's assume that a given machine has a relatively low amount of available storage but a lot of free computation cycles. In this situation, once data has been processed it should be moved to a client machine with available storage, and data from a machine with a relatively low amount of free computation cycles should be moved to the original machine.
  • Administration
  • Discussion
    • Why is a public good a good idea?
      • What value
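
The data-movement idea under Distributed Computation above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `Node` fields and the `plan_moves` function are hypothetical names, resources are plain integers, and only the single most compute-rich, most storage-rich and most compute-poor machines are considered.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A client machine; all fields are hypothetical, for illustration only."""
    name: str
    free_storage: int                 # spare disk, arbitrary units
    free_cycles: int                  # spare CPU, arbitrary units
    unprocessed: list = field(default_factory=list)
    processed: list = field(default_factory=list)


def plan_moves(nodes):
    """Return a list of (item, source, destination) moves."""
    moves = []
    by_cycles = sorted(nodes, key=lambda n: n.free_cycles, reverse=True)
    compute_rich, compute_poor = by_cycles[0], by_cycles[-1]
    storage_rich = max(nodes, key=lambda n: n.free_storage)

    # Processed data leaves the compute-rich machine for the one with spare disk.
    if compute_rich is not storage_rich:
        for item in compute_rich.processed:
            moves.append((item, compute_rich.name, storage_rich.name))

    # Unprocessed data moves from the compute-poor machine to the compute-rich one.
    if compute_poor is not compute_rich:
        for item in compute_poor.unprocessed:
            moves.append((item, compute_poor.name, compute_rich.name))
    return moves


a = Node("A", free_storage=1, free_cycles=10, processed=["d1"])
b = Node("B", free_storage=10, free_cycles=1, unprocessed=["d2"])
print(plan_moves([a, b]))
```

Here machine A has little disk but many idle cycles, so its processed item goes to B and B's unprocessed item comes to A. A real scheduler would also have to weigh the bandwidth cost of each move.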

What should be a public good?

      • What brings value to everyone?
      • Examples: Distributed DNS, Spam Filtering, Policing?

Thursday March 10

I did a fairly big overhaul of our page: I moved all of the discussion items from the main page here and all of the main work over to the main page. Our main direction has changed considerably after a few conversations between us and Prof. Somayaji. In general we are moving away from the "how to do things" (implementation) and towards the "why is it important/better that certain services are in the public's hands". This new direction is now outlined on the main page. Below is an additional note that I couldn't find a place for on the main page.

We are now thinking in analogy. We are trying to identify public goods that are analogous to real-world counterparts. Real-world examples of public goods include roads, parks, police, military, water, and sewers.

Internet as a public good: Ostracism and the provision of a public good: experimental evidence

Discussion: Definition of Public Good

  • Economic Definitions:

In economics, a public good is a good that is non-rivalrous and non-excludable. Non-rivalry means that consumption of the good by one individual does not reduce availability of the good for consumption by others; and non-excludability that no one can be effectively excluded from using the good [1]

A good that cannot be charged for in relation to use (like the view of a park or survival of a species), so there is no incentive to produce or maintain the good [2]

A good that is provided for users collectively, use by one not precluding use of the same units of the good by others [3]

A good or service in which the benefit received by any one party does not diminish the availability of the benefits to others, and where access to the good cannot be restricted. (Source: Millennium Ecosystem Assessment Glossary ) [4]

In economics, a commodity typically provided by government that cannot, or would not, be separately parceled out to individuals, since no one can be excluded from its benefits. Public goods, such as national defense, clean air, and public safety, are neither divisible nor exclusive. ... http://www.semp.us/publications/disaster_dictionary.php

  • Overall Take:

A good that inherently requires some form of centralized authority to manage and provide. Ideally "incorruptible."

Thursday March 17

I talked to the professor before class and he encouraged committing thoughts to the wiki to help me organize my thoughts. So I am kind of adding this as a play by play.

In class we talked a fair bit about my topic (infrastructure), mainly because we want it not to be at odds with the rest of the paper. I had proposed a wireless mesh network as a potential infrastructure change to help do away with ISPs. Right now ISPs hold a potentially significant amount of power over their customers: they can and do shape our packets (limiting the speed at which we do things), and they have the potential to deny access to sites or disallow services through packet inspection. Additionally, as we recently saw in Egypt, they present convenient choke points for slowing or stopping the flow of internet traffic. The mesh provides protection from this with its distributed nature (and it sounds cool).

The rest of the group was concerned about the speed implications of a mesh-style network, particularly if it resided completely on wireless. I suggested that there are algorithms that can make large-scale meshes reasonably efficient, but it was pointed out in counterpoint that, if one ignores fibre, the mesh would have to be dramatically slower than what we have now.

The professor pointed out that people have done research on large-scale meshes and have gotten reasonable efficiency out of them. He also suggested that a mesh in this context could be wired or not; the point is to imagine that it is possible and then ask why it is a good thing and what other things might need to change to support it. The professor also pointed out that the mesh wouldn't have to replace the faster service, which could still come from ISPs.

Ahhhh. This sounds interesting and a little less controversial than a full replacement. Why a mesh overlay? Well, it can provide a slower but extraordinarily robust layer of network communication. We can suggest that having a working network irrespective of the actions of ISPs could be a great thing. Additionally, higher-speed service for luxuries such as video streaming could be paid for through ISPs. The meshes would be organized in urban centres, probably with publicly owned backbones between urban centres. The mesh could self-organize into "neighbourhoods", potentially with publicly provided infrastructure linking these "neighbourhoods" together.

Next we discussed trying to have common threads for our essay. Meshes are robust since many connections need to be severed to disable a mesh. Caching can also add to the robustness: the professor suggested that the concept of a cache could be extended to include caching code as well as data, allowing web apps to survive even when disconnected from the rest of the internet. This suggests that a highly reliable and robust mesh could keep neighbourhoods or even cities up and running even when disconnected from the rest of the internet. So one common thread could be reliability.

Internet caching also provides an increase in speed. Arguably, the robust but slower mesh could free up ISPs so they could provide even greater speed.

Tuesday March 22 (Presentation)

Definitions

  • A commodity typically provided by government that cannot, or would not, be separately parceled out to individuals, since no one can be excluded from its benefits.
  • A good that is non-rivalrous and non-excludable. Non-rivalry means that consumption of the good by one individual does not reduce availability of the good for consumption by others; and non-excludability that no one can be effectively excluded from using the good
  • Resources that are held in common in the sense that no one exercises any property right with respect to these resources or the exclusive right to choose whether the resource is made available to others

Physical Infrastructure

Internet Infrastructure as a public good

For many, the internet is as important in their daily life as the roads we travel upon. The infrastructure of the internet is thus analogous to the roads we drive on and as such should be a public good.

  • Problem

The internet's infrastructure is primarily in the control of private companies (ISPs). The ISPs have the ability to alter speeds (packet shaping) and give preferential treatment to services or websites (net neutrality), and they provide convenient choke points (internet blackout in Egypt). With the infrastructure of the internet as a public good, it would not be controlled by any private company or single person, and these issues could be avoided.

  • possible ways of doing this:
    • option 1
      • legislate the ISPs
    • option 2
      • a public infrastructure - mesh
      • still have ISPs for faster service
      • mesh infrastructure could connect individuals within urban centres
      • urban centres could connect to each other with publicly owned trunks
  • Benefits
    • Would guarantee a level of connectivity for everyone
    • Hard to disrupt with so many points of connection and urban centres could continue to communicate even if they become partitioned from the internet
    • Could provide a speed boost by offloading some types of traffic
  • Challenges
    • Routing via a mesh network with mobile nodes and temporary fixed location nodes
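
To illustrate the "hard to disrupt" benefit above, here is a minimal sketch that compares a hub-and-spoke layout (every home reaches the others only through an ISP) with a small mesh. The two toy topologies and the `reachable` helper are made up for illustration; they are not a model of any real network.

```python
from collections import deque


def reachable(adjacency, start, removed=frozenset()):
    """Return the set of nodes reachable from start, skipping removed nodes."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in removed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hub-and-spoke: homes a-d are connected only through the ISP.
star = {
    "isp": ["a", "b", "c", "d"],
    "a": ["isp"], "b": ["isp"], "c": ["isp"], "d": ["isp"],
}

# Mesh: the same homes linked in a ring, plus one cross link a-c.
mesh = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["a", "c"],
}

print(sorted(reachable(star, "a", removed={"isp"})))  # ['a']
print(sorted(reachable(mesh, "a", removed={"b"})))    # ['a', 'c', 'd']
```

Removing the single hub isolates every home, while removing one mesh node leaves the remaining homes connected. This captures only connectivity; it says nothing about the routing challenge listed above.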

Web Caching

General Web Caching

  • temporary storage of web objects for later use
  • used to reduce wasted bandwidth and to improve end user experience by reducing latency
  • generally implemented by ISPs who have a financial interest in doing so
  • currently caches are implemented separately and don't necessarily work together
  • transitioning web caches into a public good could balance end user experience and efficiency with financial factors

Web Caching as a Public Good

  • If ISP caches were replaced with publicly owned or heavily regulated data centers then they could be standardized
  • Standardization allows for these data centers to cooperate to further reduce wasted bandwidth
  • a real hierarchy of caches can be implemented using this, ranging from local to regional and then provincial or national web caches
  • caches on each level can work together in a distributed fashion


  • once everyday users have an incentive to participate, caches can be extended to a lower level
  • each user would store and process a small amount of a "neighbourhood cache"
  • this can either be done in a distributed manner or by using the local data center as a central intermediary
  • results in keeping data even closer to the end user
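
The cache hierarchy described above (neighbourhood, local, regional, national) could look roughly like this. It is a minimal sketch, assuming each tier simply forwards a miss to its parent and keeps a copy of whatever comes back; the `CacheLevel` class and `fetch_from_origin` function are hypothetical, and eviction and expiry are left out.

```python
class CacheLevel:
    """One tier in a cache hierarchy; a miss is forwarded to the parent tier."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent    # next tier up, or None for the top tier
        self.origin = origin    # callable used only by the top tier
        self.store = {}

    def get(self, url):
        """Return (content, name of the tier that answered)."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            content, source = self.parent.get(url)
        else:
            content, source = self.origin(url), "origin"
        self.store[url] = content  # keep a copy for later requests
        return content, source


def fetch_from_origin(url):
    """Stand-in for contacting the real web server."""
    return f"<content of {url}>"


national = CacheLevel("national", origin=fetch_from_origin)
regional = CacheLevel("regional", parent=national)
local = CacheLevel("local", parent=regional)
neighbourhood = CacheLevel("neighbourhood", parent=local)

print(neighbourhood.get("http://example.org/")[1])  # origin
print(neighbourhood.get("http://example.org/")[1])  # neighbourhood
```

After the first request, a second neighbourhood attached to the same local data center would be answered by the local tier without any traffic leaving the area.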

Extending the Definition of Web Caching

  • Once a standardized, regulated and reliable web caching infrastructure is in place, the idea of web caching can be extended
  • this extension can include web code as well as static data
  • this would allow popular websites to operate closer to the user and reduce server load
  • this would also remove the infrastructure burden from web startups, as popular sites would be distributed around the world
  • in times of natural disaster, this would allow for some parts of the internet to remain up locally

Benefits of Web Caching as a Public Good

  • more efficient use of resources (bandwidth)
  • lower latency/better end user experience
  • more robust/reliable

DNS (Naming, etc)

General

  • DNS (Domain Name System) is considered the "switchboard" of the internet.
    • To make the internet that much more user friendly, a user or application need only supply a name, and the service returns the corresponding IP address (or, given an address, the hostname).
  • Essential for the functionality and usability of the internet to have this service.
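
The "switchboard" role can be shown with a toy lookup table. This is only a sketch: the `ZONE` table, the `resolve` function and the `NameNotFound` exception are hypothetical, the names and addresses come from ranges reserved for documentation, and real DNS is a distributed, hierarchical database rather than one dictionary.

```python
# Hypothetical zone data; 192.0.2.0/24 is reserved for documentation.
ZONE = {
    "www.example.org": "192.0.2.10",
    "mail.example.org": "192.0.2.25",
}


class NameNotFound(Exception):
    """Raised when a name has no entry in the table."""


def resolve(name, zone=ZONE):
    """Map a hostname to an address, ignoring case and a trailing dot."""
    key = name.rstrip(".").lower()
    try:
        return zone[key]
    except KeyError:
        raise NameNotFound(name) from None


print(resolve("WWW.Example.org."))  # 192.0.2.10
```

In a real Python program the same question is normally put to the system resolver through the standard library, for example with `socket.getaddrinfo`.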

DNS as a Public Good

  • The internet does not function the way we are used to without DNS, making it a candidate as an "essential service" or public good
  • Privacy concerns arise when for-profit corporations have control of this service
    • Issues may also abound if a central authority (government) controlled or managed the service
    • Public options exist, but reliance falls on to a user community or other corporation (Google, OpenDNS)

Further Research

  • Alternative Systems, i.e. Cooperative Domain Name System (CoDoNS)
  • Usability scenario to provide basic service (i.e. disaster recovery)

Aspects of General Public Goods

  • robustness/reliability
  • basic guaranteed level of service
  • general speed
  • making user experience a priority over private interests


Aftermath Discussion with Professor

I talked to the Professor afterwards and he liked where we were going, but reminded us that he is most concerned that we define the problem well. I think we are heading towards that, though. He also said he would be mentioning on Thursday that, for both the paper and the presentation, about half the marks would be based on style/polish, and that the reputation group had a good, easy-to-follow presentation.



Personal Notes

Web Caching (Andrew)

Relevant Papers & Links

Background stuff

Other

Infrastructure (Lester)

133 US cities now have their own broadband networks

- Just saw this headline and thought it relevant - Fahim

DNS (Fahim)

Presentation Notes

Web Caching as a public good

Why would turning web caching into something controlled by the public be a good thing? How could this be done (possible high level implementation options)?

  • The ultimate benefit of web caching is realized by keeping popular data close to the user
  • what is closer to the end user than their ISP? their neighbours.
  • by distributing the cache among people, web requests can be satisfied even closer (and, as a result, even faster)
  • possible ways of doing this:
    • option 1
      • have each person dedicate a certain amount of disk space and cpu cycles to storing and maintaining the cache.
      • The ISPs can control the overall placement and tracking of cached data since it is in their interest.
      • as users log in and log off, data can be transferred accordingly
      • web requests will go to the ISP, and the ISP will determine where the data resides and will mediate a connection between the two users
    • option 2
      • since there are large incentives for ISPs to do this, they may want to invest in some specialized hardware to help implement this
      • this hardware would replace the end user's modem and would include a general purpose processor and some data storage.
      • with ever-decreasing hardware costs, a relatively powerful machine could be built (especially on a large scale) relatively cheaply
      • since an end user generally leaves their modem on more than their PC, this option would result in greater reliability and aggregated uptime.
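
Option 1 above, where the ISP tracks placement and mediates connections, might be sketched as a simple directory. The `CacheDirectory` class and its method names are hypothetical; the sketch only tracks who holds what and who is online, and does not transfer data when a user logs off.

```python
class CacheDirectory:
    """ISP-side index of which users hold a copy of each cached URL."""

    def __init__(self):
        self.holders = {}    # url -> set of user ids holding a copy
        self.online = set()  # user ids currently logged in

    def log_in(self, user):
        self.online.add(user)

    def log_off(self, user):
        self.online.discard(user)

    def register(self, url, user):
        """Record that user stores a copy of url."""
        self.holders.setdefault(url, set()).add(user)

    def locate(self, url):
        """Return an online user holding url, or None on a cache miss."""
        candidates = self.holders.get(url, set()) & self.online
        # min() only makes the choice deterministic; a real ISP would
        # presumably pick the closest or least loaded holder.
        return min(candidates) if candidates else None


directory = CacheDirectory()
for user in ("alice", "bob"):
    directory.log_in(user)
    directory.register("http://example.org/", user)

print(directory.locate("http://example.org/"))  # alice
directory.log_off("alice")
print(directory.locate("http://example.org/"))  # bob
```

A miss (`None`) would fall back to the ISP's own cache or the origin server.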

local and global benefits

Local

  • data is even closer to the end user = lower latency
  • greater amount of total storage = bigger cache
  • if an optimal size is found and increasing the cache size does not improve performance (3) then the data can simply be replicated at a higher degree across the network
    • decreases impact of users logging off
    • also allows for neighbourhood specific, ultra fast caches
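
The effect of replicating data at a higher degree can be estimated with a small calculation. It assumes that each user holding a copy is online independently with the same probability, which is a simplification introduced here rather than something from the notes; `availability` is a hypothetical helper.

```python
def availability(p_online, replicas):
    """Probability that at least one of `replicas` holders is online,
    assuming each is online independently with probability p_online."""
    return 1 - (1 - p_online) ** replicas


for k in (1, 2, 3, 5):
    print(k, round(availability(0.5, k), 3))
```

With users online half the time, three copies already give an 87.5% chance that at least one holder can serve the object, which is why extra replication softens the impact of users logging off.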

global

  • if every ISP implemented these neighbourhood caches, the total amount of wasted bandwidth going across the internet will drastically decrease
  • after these have been implemented, a cache hierarchy can be imposed in which neighbourhood caches first talk with each other. If there is still a cache miss, then neighbourhoods of neighbourhoods can satisfy requests
  • results in a very large, diverse cache that can satisfy a variety of requests

Removed Slides

  1. Public goods have a variety of definitions from varying disciplines of thought and study
    • Economic:
      • "a commodity typically provided by government that cannot, or would not, be separately parceled out to individuals, since no one can be excluded from its benefits"
    • Socio-political:
      • "a good or service in which the benefit received by any one party does not diminish the availability of the benefits to others, and where access to the good cannot be restricted"
    • Civil:
      • "A concept referring to the general welfare or common well-being of a community. This concept is central to policy makers and government leaders, and should ultimately guide their decision making"

    • The common theme of the definitions is the benefit and necessity of the good in question for all individuals involved
    • It is increasingly obvious how vital the internet is to all aspects of daily life
      • Work and play are both enhanced by the use of this distributed system, which is both a repository of information and a communication device
    • The internet is a giant for commerce and should not be changed from such.
      • However, there are pieces of the machine that qualify as essential, as a public good
      • These are a combination of hardware and software services that have been created, maintained and enhanced by both public and private interests

Sunday April 10: Stuff removed from front page

Main Goals

Based on the discussion last class, I think the focus of the project should be what kinds of things fundamentally should be a "public good", as opposed to exactly how to implement them. Some of the ideas we have come up with can be used for various implementations, but I think, in general, we can rely on previous work for most (if not all) of the implementation details. If this assumed direction is correct, then I think we should aim to answer the following questions:

  • What are good candidates for public goods (e.g. DNS, internet cache, physical connections)? Why should these services be fundamentally controlled by the public? What are the flaws in the way they are currently used, or why should they not be centrally controlled by a single entity? What incentives are there for a given user to participate (willingly or unwillingly)?
  • What would be the net benefit for the local community participating in these public goods?
  • What would be the net impact on the entire internet if all local communities created these public goods (more secure, less bandwidth wasted, etc.)?
  • Could there be disadvantages? If so, how do the benefits offset these drawbacks?
  • What would the cost of the public goods be? What sort of tax would the organizers need to levy?
  • After identifying some candidates for public goods, try to determine what these services have in common, both in the problem and in the alternative. What are some things that are fundamentally different about these goods?

Note to prof: Please let us know if you have any comments on the overall direction we are taking the project.

Definition

A public good is:

  • A commodity typically provided by government that cannot, or would not, be separately parceled out to individuals, since no one can be excluded from its benefits.
  • A good that is non-rivalrous and non-excludable. Non-rivalry means that consumption of the good by one individual does not reduce availability of the good for consumption by others; and non-excludability that no one can be effectively excluded from using the good
  • A good that cannot be charged for in relation to use (like the view of a park or survival of a species), so there is no incentive to produce or maintain the good
  • A good that is provided for users collectively, use by one not precluding use of the same units of the good by others
  • A good or service in which the benefit received by any one party does not diminish the availability of the benefits to others, and where access to the good cannot be restricted.
  • Resources that are held in common in the sense that no one exercises any property right with respect to these resources or the exclusive right to choose whether the resource is made available to others

Potential Topics

  • What else occurs on the internet
    • physical infrastructure (phoneline, cable, satellite, etc)
    • DNS, BGP, ----
    • TCP/IP, UDP
    • HTTP, SMTP, POP, IMAP, FTP, SSH
    • email = SPAM, search, internet caching

Note to prof: This is still a working list, but if you notice anything that we should definitely try to cover that we haven't thought of, please let us know.

Current Work

  • Lester - Physical infrastructure
  • Andrew - Caching
  • Fahim - DNS