Deep Learning for Noise-Tolerant RDFS Reasoning

Tracking #: 2028-3241

Bassem Makni
James Hendler

Responsible editor: 
Guest Editors Semantic Deep Learning 2018

Submission type: 
Full Paper
Since the 2001 envisioning of the Semantic Web (SW) [1] as an extension to the World Wide Web, the main research focus in SW reasoning has been on the soundness and completeness of reasoners. While these reasoners assume the veracity of the input data, the reality is that the Web of data is inherently noisy. Although there has been recent work on noise-tolerant reasoning, it has focused on type inference rather than full RDFS reasoning. The literature contains many techniques for Knowledge Graph (KG) embedding; however, these techniques were not designed for RDFS reasoning. This paper documents a novel approach that applies advances in deep learning to extend noise-tolerance in the SW to full RDFS reasoning; this is a stepping stone towards bridging the Neural-Symbolic gap for RDFS reasoning and beyond. Our embedding technique, which is tailored for RDFS reasoning, consists of layering RDF graphs and encoding them in the form of 3D adjacency matrices where each layer layout forms a graph word. Each input graph and its entailments are then represented as sequences of graph words, and RDFS inference can be formulated as translation of these graph word sequences, achieved through neural machine translation. Our evaluation confirms that deep learning can in fact be used to learn the RDFS inference rules from both synthetic and real-world SW data while demonstrating a noise-tolerance unavailable with rule-based reasoners; learning the inference on the LUBM synthetic dataset achieved 98.4% validation and 98% test accuracy while it achieved 87.76% validation accuracy on a subset of DBpedia.
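The layered encoding described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name `encode_layers`, the toy resources and properties, and the choice to treat each layer's full layout as one "graph word" token are all assumptions made for illustration.

```python
# Hypothetical sketch: encode RDF triples as a 3D adjacency structure with
# one 2D layer per property. All names and the toy data are assumptions.

def encode_layers(triples, resources, properties):
    """Return layers such that layers[p][s][o] == 1 iff (s, p, o) is a triple."""
    r_index = {r: i for i, r in enumerate(resources)}
    p_index = {p: i for i, p in enumerate(properties)}
    n = len(resources)
    # One n x n zero matrix per property.
    layers = [[[0] * n for _ in range(n)] for _ in properties]
    for s, p, o in triples:
        layers[p_index[p]][r_index[s]][r_index[o]] = 1
    return layers


resources = ["alice", "bob", "GraduateStudent", "Person"]
properties = ["rdf:type", "advisor"]
triples = [
    ("alice", "rdf:type", "GraduateStudent"),
    ("alice", "advisor", "bob"),
]

layers = encode_layers(triples, resources, properties)

# Assumption: each layer's layout is frozen into a hashable token, so a graph
# becomes a sequence of such tokens, one per property layer.
graph_words = [tuple(map(tuple, layer)) for layer in layers]
```

Under this reading, an input graph and its entailed graph would each yield one token sequence, which is the form a sequence-to-sequence translation model consumes.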

Minor Revision

Solicited Reviews:
Review #1
By Alessandra Mileo submitted on 19/Nov/2018
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The originality of the approach, its significance for the community, and the quality of the presentation are overall suitable for publication, but there are a few issues I would still invite the authors to consider.
The paper has addressed some of the key concerns raised by reviewers of the previous submission, including the relation to the state of the art, a clear overview of the method, and the evaluation comparison and baseline.
Some aspects however could still be improved.
These include a better motivation of the design choice for the RNN architecture.
Also, in the evaluation, a plot of results for section 7.4 is missing.
Another note is that the absence of a response to the reviewers' comments made it harder to be sure that all the points of concern (especially the minor ones) have been addressed or interpreted correctly. I suggest the authors provide one.
Some of the relevant related work mentioned by reviewers is still missing.

Review #2
By Heiko Paulheim submitted on 29/Nov/2018
Major Revision
Review Comment:

The paper describes a reasoning system which is based on deep learning. As such, it trains a neural network to perform inference on RDF graphs, and it is shown that the proposed system can learn to mimic standard RDFS reasoners.

In general, the paper is clearly written and easy to understand, and it proposes an interesting solution. While the solution is interesting, I miss some details in both the description and the experiments.

One of the main weaknesses (which I think the authors can easily fix) is the completeness of the related work section. I am usually not the type of guy who abuses reviews to request citations of his own work, but we have done some work in the recent past centered around using machine learning for approximate reasoning [1,2]. Moreover, I do not agree that "all previous work in the literature about reasoning with noisy Semantic Web data focuses on type inference" - there is also a large body of work dealing with relation/link prediction and/or validation.

There are some places in which I would appreciate more details when describing the experiments. One of those issues concerns the train/validation/test splitting. The split seems to be done based on resources. My question is: given that resource X ends in train, Y ends in test. Given a triple X P Y, does it end in train, test, or both? If it is "both", does that oversimplify the problem?
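To make the question concrete, here is a minimal sketch of a resource-based split. It is an illustration only, not the authors' procedure: the function `assign_triples` and the toy data are hypothetical. It separates triples whose subject and object both fall in one partition from triples that cross partitions, which are the ones the question above is about.

```python
# Hypothetical sketch: classify triples under a resource-based train/test split.

def assign_triples(triples, train_resources, test_resources):
    """Group triples by the partition(s) their subject and object belong to."""
    assignment = {"train": [], "test": [], "crossing": []}
    for s, p, o in triples:
        ends = {s, o}
        if ends <= train_resources:
            assignment["train"].append((s, p, o))
        elif ends <= test_resources:
            assignment["test"].append((s, p, o))
        else:
            # Subject and object are not both in the same partition
            # (this also catches resources that belong to neither set).
            assignment["crossing"].append((s, p, o))
    return assignment


train_resources = {"X", "A"}
test_resources = {"Y", "B"}
triples = [("X", "P", "A"), ("Y", "P", "B"), ("X", "P", "Y")]

result = assign_triples(triples, train_resources, test_resources)
```

In this toy case the triple X P Y lands in the "crossing" group. Copying such triples into both partitions would let information about test resources leak into training, whereas dropping them would remove evidence from both sides.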

As for the DBpedia example, I wonder why the authors picked a specific class (i.e., Scientists) instead of a random sample. There are two concerns here: (1) the type and distribution of noise observed in the sample may not be representative for all of DBpedia, and (2) the model might overfit to particular inferences that hold for that sample, but not in general.

As far as the experiments are concerned, I would have liked to see a comparison to the baselines. Figures 6 and 7 stand side by side, but it is not clear where the proposed approach outperforms TransH etc., and where it does not. A more thorough comparison and also a discussion of cases where each of those methods is superior would strengthen the paper.

After definition 2, three cases of non-propagable noise are distinguished, and they are supposed to be mutually exclusive. I do not agree here. Specifically, "when the property of a corrupted triple is corrupted to its super property or sibling property" - there may be different domain/range definitions for those super and sibling properties. Also, there may be cases which are not purely the second or the third case: e.g., the original triple generated 3 triples A, B, and C, where A and B are also generated by others. The new triple generates only C.

In the proof for definition 6, I cannot follow why exactly T' and T should have the same representation. As far as I understand, each property has its own layer in the representation, and if T' has one additional property that T does not have, its layered representation should also have one extra layer, and therefore, it cannot be the same representation.

[1] Heiko Paulheim and Heiner Stuckenschmidt. Fast approximate A-box consistency checking using machine learning. In: ESWC 2016.
[2] Christian Meilicke, Daniel Ruffinelli, Andreas Nolle, Heiko Paulheim and Heiner Stuckenschmidt. Fast ABox consistency checking using incomplete reasoning and caching. In: RuleML+RR 2017.

Review #3
By Dagmar Gromann submitted on 04/Dec/2018
Minor Revision
Review Comment:

I would like to thank the authors for addressing many of the issues raised in previous reviews. However, I would also like to point out that the lack of a summary of which issues were addressed, and in what manner, in the resubmission was detrimental to the reviewing process.

In terms of significance, originality, and scope, this paper perfectly matches this special issue and presents an excellent example of how Semantic Web technologies and deep learning can join forces. Significant improvements have been achieved in the related work section compared to the previous submission. The method sections have been successfully decluttered and streamlined and are now easy to follow and well written. Several opaque aspects of the description have been clarified and explained in an easy-to-understand manner. Even the missing baseline experiments have been added. However, some minor aspects could still be improved upon.

The design choices are not yet justified. When looking into neural machine translation, there are many different architectures, of which CNNs have recently performed best. Why is the chosen architecture the best choice for this problem? Could you make the code available? This matters especially since no hyperparameter settings for the training and no test set accuracies are provided, which makes estimating performance drastically more difficult. The necessity to adjust hyperparameters for different datasets was mentioned; however, the hyperparameter settings are not indicated. Several parts of the Conclusion are actually discussions of the results and should thus be in a discussion section.

I think with the inclusion of those truly minor aspects, the paper could be further improved. I encourage the authors to explicitly indicate the addressed aspects in the cover letter of the resubmission of their paper.

Minor comments (in order of appearance)
p. 3 / 19 links prediction => link prediction
low dimension representation => dimensional (several times)
p. 5 / 19 introduction transformations => introducing transformations
p. 5 / 29-30 graphs and trees kernel => graph and tree kernel(s)
p. 7 can though be => ?
p. 9 / 43 "In Table 4, " => please mention that it is in the appendix
p. 11 / 47 tenor creation => tensor
p. 12 image quality of Fig. 4 is very poor
p. 12 / 20 quotation encoding of unknown should be ``unknown" in LaTeX
p. 16 equally as important => equally important