DistOS 2014W Lecture 6
Group Discussion on "The Early Web"
Questions to discuss:
- How do you think the web might have turned out if it had not developed the way it did?
 - What kind of infrastructure changes would you like to make?
 
Group 1
- Relatively satisfied with the present structure of the web; some changes are suggested in the areas below:
 
- Make greater use of the potential of protocols
 - More communication and interaction capabilities.
 - Implementation changes to present payment systems, for example the use of "Micro-computation", a discussion we will return to in future classes. Also, cryptographic currencies.
 - Augmented reality.
 - More emphasis on individual privacy.
 
Group 2
A large portion of the web serves content that is overwhelmingly concerned with presentation rather than with structuring content. Tim Berners-Lee himself bemoaned the death of the semantic web.
- Information to be classified in detail
- Organize things on the web, e.g., Yahoo indexers
 - Suggested considering the Universal Decimal Classification, an idea by Paul Otlet.
 - In the end, it comes down to the semantic web
 
 - Information redundancy
 - Information verification
 
Group 3
- What we want to keep
- Linking mechanisms
 - Minimum permissions to publish
 
 - What we don't like
- Relying on one source for document
 - Privacy links for security
 
 - Proposal
- Peer-to-peer and distributed mechanisms for hosting documents
 - Reverse links with caching - distributed cache
 - More availability for the user: what happens when a system fails?
 - Key management needs to be considered: is a centralized or a distributed mechanism better?
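
Group 3's proposal of reverse links backed by a cache can be sketched roughly as follows. This is a minimal single-process illustration, not a distributed implementation: the `LinkIndex` class, its method names, and the `origin` callable are all hypothetical names chosen for this sketch. It shows two things: reverse links can be derived whenever a page is published, and a cached copy can answer when the single source for a document fails.

```python
from collections import defaultdict


class LinkIndex:
    """Toy index (hypothetical) that tracks forward and reverse links."""

    def __init__(self):
        self.forward = defaultdict(set)  # page -> pages it links to
        self.reverse = defaultdict(set)  # page -> pages that link to it
        self.cache = {}                  # page -> last known content

    def publish(self, url, content, links):
        """Store a copy of the page and update both link directions."""
        new_links = set(links)
        # Remove reverse entries for links the new version dropped.
        for old_target in self.forward[url] - new_links:
            self.reverse[old_target].discard(url)
        self.forward[url] = new_links
        for target in new_links:
            self.reverse[target].add(url)
        self.cache[url] = content

    def backlinks(self, url):
        """Return the pages that currently link to `url`."""
        return sorted(self.reverse[url])

    def fetch(self, url, origin):
        """Try the origin first; fall back to the cached copy on failure."""
        try:
            content = origin(url)
        except ConnectionError:
            return self.cache.get(url)
        self.cache[url] = content
        return content
```

In a real deployment the three dictionaries would have to be partitioned and replicated across peers, which is exactly where the availability and key-management questions above come in.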
 
 
Group 4
- The idea of the web searching for us
 - How the web might have differed had it been implemented by "AI" people
- AI programs searching for data, a notion that Google is already slowly implementing.
 
 - Generate report forums
 - An HTML equivalent inspired by AI communication
 - Higher semantics apart from just indexing the data
- Problem: "How to bridge the semantic gap?"
 - Search for more data patterns
 
 
Group design exercise — The web that could be
- “The web that wasn't” mentioned the moans of librarians.
 - A universal classification system is needed.
 - The training overhead of classifiers (e.g., librarians) is high. See the master's degree that a librarian would need.
 - More structured content, in both classification and organization
 - Current indexing relies on crude brute-force searching for words, etc., rather than on searching metadata
 - Information doesn't have the same persistence; see bitrot and Vint Cerf's talk.
 - Too concerned with presentation now.
 - Tim Berners-Lee bemoaning the death of the semantic web.
 - The problem of information duplication when information gets redistributed across the web. However, we do want redundancy.
 - Too much of it is developed by software developers
 - Too reliant on Google for web structure
- See search-engine optimization
 
 - Problem of authentication (of the information, not the presenter)
- Too dependent at times on the popularity of a site, almost in a sophistic manner.
 - See Reddit
 
 - How do you programmatically distinguish satire from fact?
 - The web's structure is also “shaped by inbound links but would be nice a bit more”
 - Infrastructure doesn't need to change per se.
- The distributed architecture should still stay. Centralized control over what information is allowed and who can access it is a terrible power. See China and the Middle East.
 - Information, for the most part, in itself, exists centrally (as per-page), though communities (to use a generic term) are distributed.
 
 - Need more sophisticated natural language processing.
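
The contrast drawn above between brute-force word search and searching metadata can be illustrated with a small sketch. Everything here is hypothetical: the `Document` type, the `subject` metadata field, and the sample documents are invented for illustration and do not correspond to any real classification scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    # Hypothetical classification fields assigned by a human or a classifier.
    metadata: dict = field(default_factory=dict)


def keyword_search(docs, word):
    """Brute-force search: match any document whose text contains the word."""
    word = word.lower()
    return [d.title for d in docs if word in d.text.lower()]


def metadata_search(docs, **criteria):
    """Structured search: match documents whose metadata equals every criterion."""
    return [
        d.title
        for d in docs
        if all(d.metadata.get(key) == value for key, value in criteria.items())
    ]


docs = [
    Document("Snake care", "The python is a large constrictor.",
             {"subject": "zoology"}),
    Document("Scripting intro", "Python is a programming language.",
             {"subject": "computing"}),
]
```

Here `keyword_search(docs, "python")` returns both titles, while `metadata_search(docs, subject="computing")` returns only the second. The price is the classification overhead noted above: someone or something has to assign the `subject` field.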
 
Class discussion
Focusing on vision, not the mechanism.
- Reverse linking
 - Distributed content distribution (glorified cache)
- Both for privacy and redundancy reasons
 - Suggested centralized content certification, but doesn't address the problem of root of trust and distributed consistency checking.
- Distributed key management is a holy grail
 - What about detecting large-scale subversion attempts, like in China?
 
 
 - What is the new revenue model?
- What was TBL's revenue model (tongue-in-cheek, none)?
 - Organisations like Google monetized the internet, and this mechanism could destroy their ability to do so.
 
 - Search work is semi-distributed. Suggested letting the web do the work for you.
 - Trying to structure content in a manner simultaneously palatable to both humans and machines.
 - Using spare CPU time on servers for natural language processing (or other AI) of cached or locally available resources.
 - Imagine a smushed Wolfram Alpha, Google, Wikipedia, and Watson, and then distributed over the net.
 - The document was TBL's idea of the atom of content, whereas nowadays we really need something more granular.
 - We want to extract higher-level semantics.
 - Google may not be pure keyword search anymore. It is essentially AI now, but we still struggle with expressing what we want to Google.
 - What about the adversarial aspect of content hosters, vying for attention?
 - People do actively try to fool you.
 - Compare to Google News, though that is very specific to that domain. Their vision is a semantic web, but they are incrementally building it.
 - In a scary fashion, Google is one of the central points of failure of the web. Even scarier are the less technically competent people who depend on Facebook for that.
 - There is a semantic gap between how we express and query information, and how AI understands it.
 - Can think of Facebook as a distributed human search infrastructure.
 - A core service of an operating system is locating information. Search is infrastructure.
 - The problem is not purely technical. There are political and social aspects.
- Searching for a file on a local filesystem should have an unambiguous answer.
 - Asking the web is a different thing. “What is the best chocolate bar?”
 
 - Is the web a network database, as understood in COMP 3005, which we consider harmful?
 - For two-way links, there is the problem of restructuring data and all the dependencies.
 - Privacy issues when tracing paths across the web.
 - What about the problem of information revocation?
 - Need more augmented reality, and distributed and micropayment systems.
 - We need distributed, mutually untrusting social networks.
- Now we have the problem of storage and computation, and we also take away some of the monetizable aspect.
 
 - Distribution is not free. It is very expensive in very funny ways.
 - The dream of harvesting all the computational power of the internet is not new.
- Startups have come and gone many times over that problem.
 
 - Google's indexers understand many documents on the web quite well. However, Google only presents a primitive keyword-like interface. It doesn't expose the ontology.
 - Organising information does not necessarily mean applying an ontology to it.
 - The organisational methods we now use don't use ontologies, but rather are supplemented by them.
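
The "glorified cache" and content-certification points from the class discussion can be made a little more concrete with a content-addressed cache, sketched below under stated assumptions. The `ContentCache` class and `content_address` helper are hypothetical names, and SHA-256 is simply one reasonable choice of hash. The sketch only checks that cached bytes match their address. It does not say who vouches for the address in the first place, so the root-of-trust problem raised in the discussion remains open.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the content itself (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class ContentCache:
    """Toy cache (hypothetical) where each blob is stored under its own hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store the data and return the address a reader should ask for."""
        key = content_address(data)
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        """Return the data only if it still hashes to the requested address."""
        data = self._blobs[key]
        if content_address(data) != key:
            raise ValueError("cached copy does not match its address")
        return data
```

Because any untrusted peer can serve a blob and the reader can verify it locally, identical copies stored on many hosts give redundancy without each host needing to be trusted for integrity.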