DistOS 2014W Lecture 6
Group Discussion on "The Early Web"
Questions to discuss:
- How might the web have turned out if it had not developed the way it did?
- What kind of infrastructure changes would you like to make?
Group 1
- Relatively satisfied with the present structure of the web. Changes were suggested in the following areas:
- Make greater use of the potential of protocols
- More communication and interaction capabilities.
- Changes to how present payment systems are implemented, for example using "micro-computation" (a discussion we will return to in future classes) and cryptographic currencies.
- Augmented reality.
- A stronger focus on individual privacy.
Group 2
A large portion of the web serves content that is overwhelmingly concerned with presentation rather than with structuring content. Tim Berners-Lee himself bemoaned the death of the semantic web.
- Information to be classified in detail
- Organize things on the web, e.g., Yahoo's indexers
- Consider the need for something like the Universal Decimal Classification, an idea by Paul Otlet.
- In the end, it comes down to the semantic web
- Information redundancy
- Information verification
Group 3
- What we want to keep
- Linking mechanisms
- Minimum permissions to publish
- What we don't like
- Relying on one source for document
- Privacy links for security
- Proposal
- Peer-to-peer or distributed mechanisms for hosting documents
- Reverse links with caching - distributed cache
- More availability for the user: what happens when the system fails?
- Key management needs to be considered: is a centralized or a distributed mechanism better?
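Group 3's reverse-link proposal can be illustrated with a small sketch. It assumes that a map of forward links is already known (for example, from a crawl) and simply inverts it, so that each page can list the pages that point to it. The function name `build_backlinks` and the example URLs are hypothetical, and the sketch ignores the distributed caching and consistency questions the group raised.

```python
from collections import defaultdict


def build_backlinks(forward_links):
    """Invert {page: [targets]} into {target: sorted list of linking pages}."""
    backlinks = defaultdict(set)
    for source, targets in forward_links.items():
        for target in targets:
            backlinks[target].add(source)
    return {target: sorted(sources) for target, sources in backlinks.items()}


# Hypothetical pages and links, for illustration only.
forward_links = {
    "a.example/home": ["b.example/paper", "c.example/notes"],
    "c.example/notes": ["b.example/paper"],
}

backlinks = build_backlinks(forward_links)
print(backlinks["b.example/paper"])  # ['a.example/home', 'c.example/notes']
```

In a distributed setting, each cache node would presumably hold only part of this inverted map, which is where the availability and key-management questions above come in.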
Group 4
- An idea of web searching for us
- A suggestion of how different the web would be if it had been implemented by "AI" people
- AI programs searching for data, a notion Google is already slowly implementing.
- Generate report forums
- The HTML equivalent would be inspired by AI communication
- Higher semantics apart from just indexing the data
- Problem : "How to bridge the semantic gap?"
- Search for more data patterns
Group design exercise — The web that could be
- “The web that wasn't” mentioned the moans of librarians.
- A universal classification system is needed.
- The training overhead of classifiers (e.g., librarians) is high. Consider the master's degree that a librarian needs.
- More structured content, in both classification and organization
- Current indexing relies on crude brute-force searching for words, etc., rather than searching metadata
- Information doesn't have the same persistence. See bitrot and Vint Cerf's talk.
- Too concerned with presentation now.
- Tim Berners-Lee bemoaning the death of the semantic web.
- The problem of information duplication when information gets redistributed across the web. However, we do want redundancy.
- Too much developed by software developers
- Too reliant on Google for web structure
- See search-engine optimization
- Problem of authentication (of the information, not the presenter)
- Too dependent at times on the popularity of a site, almost in a sophistic manner.
- See Reddit
- How do you programmatically distinguish satire from fact?
- The web's structure is also shaped by inbound links, but it would be nice if it were shaped by them a bit more.
- Infrastructure doesn't need to change per se.
- The distributed architecture should still stay. Centralized control over which information is allowed and who can access it is a terrible power. See China and the Middle East.
- Information, for the most part, in itself, exists centrally (as per-page), though communities (to use a generic term) are distributed.
- Need more sophisticated natural language processing.
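The contrast drawn above between brute-force word searching and searching metadata can be made concrete with a minimal sketch. The documents, the `subject` field, and both function names are invented for illustration. Real search engines are far more sophisticated than either function.

```python
# Hypothetical documents; the "metadata" fields are invented for illustration.
DOCS = [
    {"id": 1, "text": "Apple releases a new phone", "metadata": {"subject": "technology"}},
    {"id": 2, "text": "How to bake an apple pie", "metadata": {"subject": "cooking"}},
    {"id": 3, "text": "Orchard pests and their control", "metadata": {"subject": "agriculture"}},
]


def keyword_search(docs, word):
    """Brute-force scan: return ids of documents whose text contains the word."""
    word = word.lower()
    return [d["id"] for d in docs if word in d["text"].lower().split()]


def metadata_search(docs, field, value):
    """Return ids of documents whose metadata field equals the given value."""
    return [d["id"] for d in docs if d["metadata"].get(field) == value]


print(keyword_search(DOCS, "apple"))                # [1, 2]
print(metadata_search(DOCS, "subject", "cooking"))  # [2]
```

The keyword query cannot tell the company from the fruit, while the metadata query can, but only because someone classified each document first. That is the classification overhead noted above.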
Class discussion
Focusing on vision, not the mechanism.
- Reverse linking
- Distributed content distribution (glorified cache)
- Both for privacy and redundancy reasons
- Centralized content certification was suggested, but it doesn't address the problem of root of trust and distributed consistency checking.
- Distributed key management is a holy grail
- What about detecting large-scale subversion attempts, like in China?
- What is the new revenue model?
- What was TBL's revenue model (tongue-in-cheek, none)?
- Organisations like Google monetized the internet, and this mechanism could destroy their ability to do so.
- Search work is semi-distributed. Suggested letting the web do the work for you.
- Trying to structure content in a manner simultaneously palatable to both humans and machines.
- Using spare CPU time on servers for natural language processing (or other AI) of cached or locally available resources.
- Imagine a smushed Wolfram Alpha, Google, Wikipedia, and Watson, and then distributed over the net.
- The document was TBL's idea of the atom of content, whereas nowadays we really need something more granular.
- We want to extract higher-level semantics.
- Google may not be pure keyword search anymore. It is essentially AI now, but we still struggle with expressing what we want to Google.
- What about the adversarial aspect of content hosters, vying for attention?
- People do actively try to fool you.
- Compare to Google News, though that is very specific to that domain. Their vision is a semantic web, but they are incrementally building it.
- In a scary fashion, Google is one of the central points of failure of the web. Even scarier are the less technically competent people who depend on Facebook for that.
- There is a semantic gap between how we express and query information, and how AI understands it.
- Can think of Facebook as a distributed human search infrastructure.
- A core service of an operating system is locating information. Search is infrastructure.
- The problem is not purely technical. There are political and social aspects.
- Searching for a file on a local filesystem should have an unambiguous answer.
- Asking the web is a different thing. “What is the best chocolate bar?”
- Is the web a network database, as understood in COMP 3005, which we consider harmful?
- For two-way links, there is the problem of restructuring data and all the dependencies.
- Privacy issues when tracing paths across the web.
- What about the problem of information revocation?
- Need more augmented reality, and distributed and micropayment systems.
- We need distributed, mutually untrusting social networks.
- Now we have the problem of storage and computation, but we also take away some of the monetizable aspect.
- Distribution is not free. It is very expensive in very funny ways.
- The dream of harvesting all the computational power of the internet is not new.
- Startups have come and gone many times over that problem.
- Google's indexers understand many documents on the web quite well. However, Google only presents a primitive keyword-like interface. It doesn't expose the ontology.
- Organising information does not necessarily mean applying an ontology to it.
- The organisational methods we now use don't use ontologies, but rather are supplemented by them.
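The "glorified cache" idea from the class discussion, together with the worry about authenticating information rather than its presenter, can be sketched as a content-addressed cache. This is one possible approach, not something the class settled on. Content is named by its SHA-256 digest, so a reader can check a copy from any untrusted cache against the identifier. The class `UntrustedCache` and the function `fetch_verified` are hypothetical names. The sketch leaves the root-of-trust problem open, since the reader still has to obtain the identifier from a source they trust.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Name a piece of content by the SHA-256 digest of its bytes."""
    return hashlib.sha256(data).hexdigest()


class UntrustedCache:
    """A cache node that may hold stale or tampered copies."""

    def __init__(self):
        self._store = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._store[cid] = data
        return cid

    def get(self, cid: str):
        return self._store.get(cid)


def fetch_verified(caches, cid):
    """Return the first cached copy whose digest matches cid, else None."""
    for cache in caches:
        data = cache.get(cid)
        if data is not None and content_id(data) == cid:
            return data
    return None


honest = UntrustedCache()
cid = honest.put(b"original document")

tampered = UntrustedCache()
tampered._store[cid] = b"forged document"  # simulate a misbehaving node

print(fetch_verified([tampered, honest], cid))  # b'original document'
```

Because the identifier changes whenever the content changes, this design makes revocation and updates harder, which echoes the point above that distribution is not free.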