A Reasonable Semantic Web

Paper Title: 
A Reasonable Semantic Web
Authors: 
Pascal Hitzler, Frank van Harmelen
Abstract: 
The realization of Semantic Web reasoning is central to substantiating the Semantic Web vision. However, current mainstream research on this topic faces serious challenges, which forces us to question established lines of research and to rethink the underlying approaches. We argue that reasoning for the Semantic Web should be understood as "shared inference," which is not necessarily based on deductive methods. Model-theoretic semantics (and sound and complete reasoning based on it) functions as a gold standard, but applications dealing with large-scale and noisy data usually cannot afford the required runtimes. Approximate methods, including deductive ones, but also approaches based on entirely different methods like machine learning or nature-inspired computing need to be investigated, while quality assurance needs to be done in terms of precision and recall values (as in information retrieval) and not necessarily in terms of soundness and completeness of the underlying algorithms.
Submission type: 
Other
Responsible editor: 
Krzysztof Janowicz
Decision/Status: 
Accept
Reviews: 

Review 1 by Claudia d'Amato:

The paper argues for the necessity of rethinking the way reasoning is done in the Semantic Web in order to cope with the scaling problem.

The paper addresses a very interesting problem in a very clear way, giving an overview of the adopted approaches and indicating inductive reasoning as a possible solution.

In the following, some detailed comments are reported:
* the argumentation and the example at the end of sect. 1 could be extended and made more explicit
* end of 1st column pp. 2: if A and B do not refer to the same knowledge base, an ontology matching problem would have to be solved/considered in order to count the shared inferences. Some comments on this aspect could be added
* end of sect. 2: can you please clarify why semantics as "shared inference" does not presuppose the use of model theory? How are inferences computed (up to this point, alternative forms of reasoning have not been analyzed)?
* middle of 1st column pp. 4 (sect. 5): "Two of the main obstacles are scalability of the algorithms, and requirements on the input data" --> maybe it is better to anticipate that the desired requirement is the absence of noise (this is written two paragraphs later).
* I find the argumentation for the need of new forms of reasoning, with the parallel to precision and recall in the information retrieval setting, very interesting. However, some experimental results shown in Claudia d'Amato, Nicola Fanizzi, Floriana Esposito: Query Answering and Ontology Population: An Inductive Approach. ESWC 2008:288-302 demonstrated that alternative/additional metrics are necessary due to the Open World Assumption. Discussing the evaluation aspect as well could be interesting
* 2nd paragraph sect. 4: in arguing for the necessity of dynamically handling inconsistency, the following references could be useful (even if the adopted approach focuses on the TBox rather than on the ABox)
(a) Thomas Scharrenbach, Rolf Grütter, Bettina Waldvogel, Abraham Bernstein, Structure Preserving TBox Repair using Defaults, Proceedings of the 23rd International Workshop on Description Logics (DL 2010) 2010, CEUR Workshop Proceedings.
(b) Thomas Scharrenbach, Abraham Bernstein, On the Evolution of Ontologies using Probabilistic Description Logics, Proceedings of the First ESWC Workshop on Inductive Reasoning and Machine Learning on the Semantic Web, June 2009, CEUR Workshop Proceedings.

MINOR:
* beginning of 2nd column pp. 1: "However, as is often the case" -> "However, as it is often the case"
* please recall the definition of instance unification problem with a footnote
* 1st column pp. 3: "Only recently has the semantic web community begun to appreciate" -> "Only recently the semantic web community has begun to appreciate"
* middle of 1st column pp. 3: "to which they they approximate full" -> "to which they approximate full"
* last row pp. 3: "practially" -> "practically"
* 2nd paragraph sect. 6: "Approximate reasoning, under...of further research" -> this sentence should be rephrased

Review 2 by Thomas Lukasiewicz:

Contents: The paper suggests that shared and non-classical inference, which is based on approximate (deductive and non-deductive) techniques, and which should be scalable to the size of the Web and also be able to handle noisy data, are a very promising direction of future research for the Semantic Web. Here, soundness and completeness as quality measures from KRR are very likely to be replaced by precision and recall from IR, respectively.

Evaluation: This is a very nice paper, which I fully support; however, I have some suggestions for improvements: the current content is written in a rather vague way; it would be very good to give some further evidence for the statements about the current developments in the Semantic Web via existing systems or previous publications in the literature. Similarly, it would be good to be more concrete about the suggested future direction of research for the Semantic Web; for example, by taking some representative systems or approaches from the literature. Especially the following two papers match exactly the vision of future research in the Semantic Web laid down in the submission (the first one combines Semantic Web reasoning with standard Web search, giving up the completeness of standard reasoning for the sake of efficiency, while the second one goes even one step further by replacing deductive inference with inductive inference, thus even giving up the correctness of standard reasoning, for the sake of additionally being able to handle inconsistencies, noise, and incompleteness):

-- Bettina Fazzinga, Giorgio Gianforme, Georg Gottlob, Thomas Lukasiewicz: Semantic Web Search Based on Ontological Conjunctive Queries. In Proceedings FoIKS 2010, pp. 153-172, LNCS 5956, Springer, 2010.

-- Claudia d'Amato, Nicola Fanizzi, Bettina Fazzinga, Georg Gottlob, and Thomas Lukasiewicz: Combining Semantic Web Search with the Power of Inductive Reasoning. In Proceedings SUM 2010, LNCS, Springer, 2010 (full paper in press).

Minor comments: The paper contains some typos, which may be easily found by a spell-checker.


Comments

That's a nice position statement on where the challenges and opportunities for reasoning on Web data lie.

I suggest cross-referencing from our article
http://www.semantic-web-journal.net/content/new-submission-can-we-ever-c...

You might find that an interesting read as well, since we have quite a few references in there to analyses of the noise in Web data and of how Web data can currently be dealt with in existing Web of Data systems that do reasoning... some examples:

R. Delbru, A. Polleres, G. Tummarello and S. Decker. Context Dependent Reasoning for Semantic Documents in Sindice. In Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS). Karlsruhe, Germany, 2008.

Aidan Hogan, Andreas Harth, and Axel Polleres. Scalable authoritative OWL reasoning for the web. International Journal on Semantic Web and Information Systems, 5(2):49-90, 2009.

Aidan Hogan, Andreas Harth, Alexandre Passant, Stefan Decker, and Axel Polleres. Weaving the pedantic web. In 3rd International Workshop on Linked Data on the Web (LDOW2010) at WWW2010, Raleigh, USA, April 2010.

... I know, shameless self-promotion, but I couldn't help leaving that comment, would be glad if you had a look.

Axel

EDIT 15/07: Removed some of the comments I had previously made that were actually addressed in the paper... Apologies...

Nice paper. I particularly like the argumentation against the over-emphasis on completeness for practical reasoning systems – adopting (precision/)recall measures could be a first step away from the current fixation on (soundness/)completeness. We certainly need some such metrics. One problem I see with precision/recall is that we need a gold standard, and as you say, formal semantics does not represent a useful gold standard if you're working with the assumption that the data is fallible – the main question that then arises is what form that gold standard takes... Is it application specific? Is it data specific? Does it even make sense to speak of a gold standard?
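To make the precision/recall idea concrete, here is a minimal sketch, assuming (purely for illustration) that both the gold standard and a reasoner's output can be represented as plain sets of triples. The function name `precision_recall` and the example triples are hypothetical and do not come from the paper. Under this reading, a sound reasoner would score precision 1.0 and a complete one recall 1.0 when the gold standard is the full set of entailments.

```python
def precision_recall(inferred, gold):
    """Return (precision, recall) of an inferred triple set against a gold set."""
    inferred, gold = set(inferred), set(gold)
    true_positives = len(inferred & gold)
    # Convention (an assumption): an empty set counts as perfectly precise/complete.
    precision = true_positives / len(inferred) if inferred else 1.0
    recall = true_positives / len(gold) if gold else 1.0
    return precision, recall


# Hypothetical gold standard: the triples we would like a reasoner to return.
gold = {
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:b", "rdf:type", "ex:Person"),
    ("ex:c", "rdf:type", "ex:Person"),
    ("ex:a", "foaf:knows", "ex:b"),
}

# Hypothetical output of an approximate reasoner: two hits, one wrong triple.
inferred = {
    ("ex:a", "rdf:type", "ex:Person"),
    ("ex:a", "foaf:knows", "ex:b"),
    ("ex:a", "rdf:type", "ex:Robot"),
}

p, r = precision_recall(inferred, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

The sketch leaves open exactly the question raised above: where the `gold` set comes from, and whether it is application specific or data specific.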

Minor comments:
Page 1: "Some of the problems...". This paragraph is vague and confusing... to which concrete problems do you refer? Firstly, for example, FOAF and SIOC – as popular web vocabularies – contain inverse-functional properties, class hierarchies, and disjoint classes as you mention. Secondly, the vast majority of Linked Data *is* "annotated with ontologies" – a conservative estimate would put 80%-90% of triples using dereferenceable classes/properties [see the Pedantic Web paper so shamelessly ref'ed by Axel...]; in fact, the Pedantic Web paper comes to the alternate conclusion that most of the "problems plaguing Linked Open Data sets" are from over-specified classes and properties (particularly where the label of the term doesn't signify its underlying semantics).

Page 1: rdf:domain -> rdfs:domain ... also, define scarce: scarce as in not enough, or scarce relative to something else? I think there's more than enough such RDFS terms used in the cloud. The only reason why owl:sameAs is more common is because it is "A-Box": used to relate individuals which are naturally more numerous.

Page 2: I somewhat sympathise with what you're saying with the last sentence of Section 1, but it's a huge exaggeration. Particularly, *often* and *without contemplation* aren't very fair. From your example, there's always a trade-off between how much information is preserved in triplification and how verbose the resulting triples seem – this is a result of using RDF, and not of lack of "contemplation" by publishers; "usefulness" is largely subjective.

Page 3: clasical -> classical

Page 3: (or even obtainable) -> (or even be obtainable)

Page 3: I'd rather that the query "finding all potential terrorist subjects" was 100% sound. :)

Throughout: You should check nesting of closing inverted commas and punctuation: e.g.,
Abstract: "shared inference,"
Page 1: ..."data" and "knowledge."
Page 3: ...for shared inference,"

Thank you very much for the excellent remarks, and in particular for the pointers to relevant literature which we had missed. We incorporated almost all suggestions, and believe that it has improved the paper.

Dear Aidan,

I just wanted to explicitly respond to two points you raised. This is independent of the paper review, but I think this discussion is important.

Concerning the gold standard problem you raise - a critical aspect, in my opinion, is exactly the realization that we *need* a gold standard. If we don't have one, then we are losing a core aspect of the semantic web idea. In fact, I believe if we neglect "formal semantic" aspects, then we're not doing semantic web at all (which doesn't mean it won't be useful or even sellable under the "semantic web" label). The current "gold standard" is the formally defined semantics of the major ontology languages. We argue in the paper that we cannot give up on the idea of a gold standard, but that a too narrow perspective on it (which, e.g., ignores noise as occurring in LOD) is not going to work out. However (and we also argue for this in the paper), the realization that we need a "gold standard" does not necessarily mean that this gold standard needs to be a model-theoretically defined formal semantics. We need to question established methods without throwing out the baby with the bath water...

Concerning your remarks on "Page 1:" I don't see schema knowledge in LOD which is really useful for reasoning - and I think we agree on this. However, I cannot subscribe to the perspective that it's simply over-specification which is causing this (although this is certainly one aspect of the problem). In fact, I believe that we are still a bit away from understanding how to do this right, in a pragmatic sense.

Hi Pascal,

I believe that we're very much on the same page with regard to a gold standard for "pragmatic" reasoning systems. I can see this paper serving as a useful reference for me in the future.

Wrt. the actual usefulness of schema data in Linked Data – again somewhat aside from the paper as you say, but an interesting discussion perhaps – I would not take such a strong view. I agree that *much* of the inferencing currently mandated by the schema data is not so "inspiring", but I would maintain that certain popular LOD vocabularies do enable useful inferences.

Taking a conveniently subjective notion of "useful inferences" based on experiences in SWSE, I can give examples of LOD schema data that enable such inferencing:
* FOAF's inverse-functional properties are very useful for deriving owl:sameAs relations and allow for automatic smushing of people through foaf:homepage, foaf:mbox, etc. (Granted, use of these terms is prone to noise, but that's perhaps a separate issue). Similarly, functional properties are also useful: e.g., foaf:primaryTopic.
* SIOC's use of inverse properties enables easy navigation and query-answering over the data: sioc:next_by_date/sioc:prev_by_date, sioc:reply_of/sioc:has_reply.
* Popular vocabularies often define internal mappings to related vocabularies using subsumption relations; e.g., SIOC -> FOAF; FOAF -> DC; etc.
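As a rough illustration of the first point, smushing via inverse-functional properties can be sketched as follows. This is a minimal sketch under stated assumptions: triples are plain Python tuples, the set `INVERSE_FUNCTIONAL` is an illustrative subset, and the function name `smush` and all example identifiers are hypothetical. It is not how SWSE implements this.

```python
from collections import defaultdict

# Assumption: an illustrative subset of FOAF's inverse-functional properties.
INVERSE_FUNCTIONAL = {"foaf:mbox", "foaf:homepage"}


def smush(triples, ifps=INVERSE_FUNCTIONAL):
    """Group subjects that share a value for an inverse-functional property.

    Returns a list of sets; each set holds identifiers that would be
    related by owl:sameAs.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    first_subject = {}  # (property, value) -> first subject seen with it
    for s, p, o in triples:
        if p not in ifps:
            continue
        key = (p, o)
        if key in first_subject:
            union(first_subject[key], s)
        else:
            first_subject[key] = s
            find(s)  # register the subject

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Singletons carry no owl:sameAs information, so drop them.
    return [group for group in groups.values() if len(group) > 1]


triples = [
    ("ex:alice1", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice2", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice2", "foaf:homepage", "http://example.org/alice"),
    ("ex:alice3", "foaf:homepage", "http://example.org/alice"),
    ("ex:bob", "foaf:mbox", "mailto:bob@example.org"),
    ("ex:alice1", "foaf:name", "Alice"),
]

same_as = smush(triples)
print(same_as)
```

The noise caveat applies directly: a single wrongly shared foaf:mbox value would merge unrelated people, and the merge propagates transitively through the union-find structure.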

I favour the trend of lightweight vocabularies being published on the Web, in a form of bottom-up evolution. I don't think over-simplification is the problem: considering the Web use-case, the first and foremost aim of these vocabularies is to enable and encourage the re-use of terms across different sources. By adding more expressive descriptions of terms, I can see the following problems:
* re-use becomes more difficult as the precise meaning of the term becomes more constrained;
* more noise becomes apparent as people try to put the square term into the round hole;
* extending -- or even just instantiating -- the vocabulary requires an increasing level of expertise.

For me, the priorities for Web vocabularies are (and should be) re-use first, and reasoning second.

In any case, this is somewhat aside from the paper, and maybe a discussion for a later date.

This is not a proper review but just minor corrections.
I enjoyed reading the paper and think it is quite useful. I can't say much more than the already existing reviews but I found the following typos:

The last citation should be:

D. Vrande\v{c}i\'c [et al.] [...] In R. H\'eliot and A. Zimmermann.

Please correct the spellings of these three surnames.

Indeed. Thanks for noticing, Antoine!