DistOS 2014W Lecture 6: Difference between revisions
Revision as of 02:14, 24 January 2014
Group Discussion on "The Early Web"
Questions to discuss:
- What do you think the web would have looked like if it had not developed the way it did?
- What kind of infrastructure changes would you like to make?
Group 1
- Relatively satisfied with the present structure of the web; changes were suggested in the following areas:
  - Make greater use of the potential of protocols.
  - More communication and interaction capabilities.
  - Changes to how present payment systems are implemented, for example "micro-computation" (a discussion we will return to in future classes) and cryptographic currencies.
  - Augmented reality.
  - More emphasis on individual privacy.
Group 2
A large portion of the web serves content that is overwhelmingly concerned with presentation rather than with structuring content. Tim Berners-Lee himself bemoaned the death of the semantic web.
- Information to be classified in detail
- Organize things on the web, e.g., Yahoo's indexers
- Consider the need for a universal classification scheme along the lines of the Universal Decimal Classification, an idea by Paul Otlet
- In the end it comes down to the semantic web
- Information redundancy
- Information verification
Group 3
- What we want to keep
  - Linking mechanisms
  - Minimum permissions to publish
- What we don't like
  - Relying on one source for a document
  - Privacy links for security
- Proposal
  - Peer-to-peer or distributed mechanisms for storing documents
  - Reverse links with caching (a distributed cache)
  - More availability for the user: what happens when a system fails?
  - Key management needs to be considered: is a centralized or a distributed mechanism better?
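Group 3's reverse links can be sketched as a simple inversion of the forward-link map. This is only an illustration under assumptions: the page names are hypothetical, and a real system would have to keep such an index consistent across many untrusting hosts, which this sketch does not attempt.

```python
from collections import defaultdict


def build_backlinks(forward_links):
    """Invert a page -> outgoing-links map into a page -> inbound-links map."""
    backlinks = defaultdict(set)
    for source, targets in forward_links.items():
        for target in targets:
            backlinks[target].add(source)
    # Sort the sources so the result is deterministic.
    return {page: sorted(sources) for page, sources in backlinks.items()}


# Hypothetical pages, for illustration only.
forward_links = {
    "a.example/home": ["b.example/paper", "c.example/notes"],
    "c.example/notes": ["b.example/paper"],
}

backlinks = build_backlinks(forward_links)
print(backlinks["b.example/paper"])
```

A page that nothing links to, such as `a.example/home` here, simply has no entry in the result.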
Group 4
- The idea of the web searching for us
- A suggestion of how different the web would be had it been implemented by "AI" people
- AI programs searching for data - A notion already being implemented by Google slowly.
- Generate report forums
- HTML equivalent is inspired by the AI communication
- Higher semantics apart from just indexing the data
  - Problem: "How to bridge the semantic gap?"
  - Search for more data patterns
Group design exercise — The web that could be
- “The web that wasn't” mentioned the moans of librarians.
- A universal classification system is needed.
- The training overhead of classifiers (e.g., librarians) is high. Consider the master's degree that a librarian needs.
- More structured content is needed, in both classification and organization.
- Current indexing relies on crude brute-force searching for words, etc., rather than on searching metadata.
- Information doesn't have the same persistence; see bitrot and Vint Cerf's talk.
- Too concerned with presentation now.
- Tim Berners-Lee bemoans the death of the semantic web.
- The problem of information duplication when information gets redistributed across the web. However, we do want redundancy.
- Too much of the web is developed by software developers.
- Too reliant on Google for web structure
  - See search-engine optimization
- Problem of authentication (of the information, not the presenter)
  - Too dependent at times on the popularity of a site, almost in a sophistic manner.
  - See Reddit
- How do you programmatically distinguish satire from fact?
- The web's structure is also “shaped by inbound links but would be nice a bit more”
- Infrastructure doesn't need to change per se.
  - The distributed architecture should still stay. Centralized control of allowed information and access is a terrible power. See China and the Middle East.
  - Information, for the most part, exists centrally in itself (per page), though communities (to use a generic term) are distributed.
- Need more sophisticated natural language processing.
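The contrast drawn above between brute-force word indexing and searching metadata can be illustrated with a toy example. This is a minimal sketch, not how any real search engine works: the two documents are invented, and the `subject` field is a hypothetical stand-in for the classification a librarian would assign.

```python
from collections import defaultdict

# Hypothetical documents; "subject" stands in for librarian-style metadata.
DOCS = {
    1: {"text": "jaguar speed in the wild", "subject": "zoology"},
    2: {"text": "jaguar top speed and engine", "subject": "automobiles"},
}


def build_word_index(docs):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in doc["text"].lower().split():
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def metadata_search(docs, subject):
    """Return ids of documents classified under the given subject."""
    return {doc_id for doc_id, doc in docs.items() if doc["subject"] == subject}


word_index = build_word_index(DOCS)
print(keyword_search(word_index, "jaguar speed"))  # matches both documents
print(metadata_search(DOCS, "zoology"))            # matches only the animal
```

The keyword query cannot tell the animal from the car, while the metadata query can, but only because someone paid the classification cost up front.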
Class discussion
Focusing on vision, not the mechanism.
- Reverse linking
- Distributed content distribution (glorified cache)
  - Both for privacy and redundancy reasons
  - Centralized content certification was suggested, but it doesn't address the problem of root of trust and distributed consistency checking.
    - Distributed key management is a holy grail
    - What about detecting large-scale subversion attempts, like in China?
- What is the new revenue model?
  - What was TBL's revenue model (tongue-in-cheek, none)?
  - Organisations like Google monetized the internet, and this mechanism could destroy their ability to do so.
- Search work is semi-distributed. Suggested letting the web do the work for you.
- Trying to structure content in a manner simultaneously palatable to both humans and machines.
- Using spare CPU time on servers for natural language processing (or other AI) of cached or locally available resources.
- Imagine a smushed Wolfram Alpha, Google, Wikipedia, and Watson, and then distributed over the net.
- The document was TBL's idea of the atom of content, whereas nowadays we really need something more granular.
- We want to extract higher-level semantics.
- Google may not be pure keyword search anymore. It is essentially AI now, but we still struggle with expressing what we want to Google.
- What about the adversarial aspect of content hosters, vying for attention?
- People do actively try to fool you.
- Compare to Google News, though that is very specific to that domain. Their vision is a semantic web, but they are incrementally building it.
- In a scary fashion, Google is one of the central points of failure of the web. Even scarier are the less technically competent people who depend on Facebook for that.
- There is a semantic gap between how we express and query information, and how AI understands it.
- Can think of Facebook as a distributed human search infrastructure.
- A core service of an operating system is locating information. Search is infrastructure.
- The problem is not purely technical. There are political and social aspects.
  - Searching for a file on a local filesystem should have an unambiguous answer.
  - Asking the web is a different thing. “What is the best chocolate bar?”
- Is the web a network database, as understood in COMP 3005, which we consider harmful?
- For two-way links, there is the problem of restructuring data and all the dependencies.
- Privacy issues when tracing paths across the web.
- What about the problem of information revocation?
- Need more augmented reality, and distributed payment and micropayment systems.
- We need distributed, mutually untrusting social networks.
  - Now we have the problem of storage and computation, but we also take away some of the monetizable aspect.
- Distribution is not free. It is very expensive in very funny ways.
- The dream of harvesting all the computational power of the internet is not new.
  - Startups have come and gone many times over that problem.
- Google's indexers understand many documents on the web quite well. However, Google only presents a primitive keyword-like interface. It doesn't expose the ontology.
- Organising information does not necessarily mean applying an ontology to it.
- The organisational methods we now use don't use ontologies, but rather are supplemented by them.
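One narrow piece of the distributed cache discussed above can be sketched with content addressing, where content is named by its hash so that a copy from any cache can be checked. This is a technique swapped in for illustration, not something proposed in class. It only shows that a copy matches its identifier; it does not solve the root-of-trust or key-management problems raised above, since the reader must still learn the identifier from somewhere trustworthy. `UntrustedCache` and `fetch_verified` are hypothetical names.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Name a piece of content by the SHA-256 hash of its bytes."""
    return hashlib.sha256(data).hexdigest()


class UntrustedCache:
    """Stands in for any peer that stores copies of content (hypothetical)."""

    def __init__(self):
        self._store = {}

    def put(self, data: bytes) -> str:
        cid = content_id(data)
        self._store[cid] = data
        return cid

    def get(self, cid: str):
        return self._store.get(cid)

    def tamper(self, cid: str, data: bytes) -> None:
        # Simulates a cache that serves altered content under an old name.
        self._store[cid] = data


def fetch_verified(cache: UntrustedCache, cid: str):
    """Return the cached bytes only if they hash back to the requested id."""
    data = cache.get(cid)
    if data is None or content_id(data) != cid:
        return None
    return data


cache = UntrustedCache()
cid = cache.put(b"lecture notes")
original = fetch_verified(cache, cid)

cache.tamper(cid, b"altered notes")
after_tampering = fetch_verified(cache, cid)
print(original, after_tampering)
```

Because the check needs nothing from the cache except the bytes, the same content can be served redundantly by many mutually untrusting hosts.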