On the relation between keys and link keys for data interlinking

Tracking #: 2535-3749

Manuel Atencia
Jérôme David
Jérôme Euzenat

Responsible editor: 
Aidan Hogan

Submission type: 
Full Paper
Both keys and their generalisation, link keys, may be used to perform data interlinking, i.e. finding identical resources in different RDF datasets. However, the precise relationship between keys and link keys has not yet been fully determined. A common formal framework encompassing both keys and link keys is necessary to ensure the correctness of data interlinking tools based on them, and to determine their scope and possible overlap. In this paper, we provide a semantics for keys and link keys within description logics. We determine under which conditions they can legitimately be used to generate links. We provide conditions under which link keys are logically equivalent to keys. In particular, we show that data interlinking with keys and ontology alignments can be reduced to data interlinking with link keys, but not the other way around.


Solicited Reviews:
Review #1
Anonymous submitted on 07/Sep/2020
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The main contribution of this work resides in formalising the relationship between the different semantics of existing keys and link keys. In this new version, the authors made the effort to clarify the relevance of weak, strong and plain keys. They emphasised that this distinction provides a framework for comparing link keys with keys combined with ontology alignments. However, the authors did not succeed in providing an experimental evaluation showing that the quality of the links obtained using link keys is better than that obtained with keys combined with alignments.

To better convince the reader of the relevance of defining these different kinds of keys, Example 4 presents a new scenario on a new dataset (GeoNames), which shows that weak keys yield better results than keys with alignments.

Even though an extensive experimental evaluation would make the impact of the proposed work even stronger, it remains real progress in the comparison of the different kinds of existing keys in the literature.

Review #2
Anonymous submitted on 08/Oct/2020
Review Comment:

My main concern was that the paper did not demonstrate, on data extracted from existing knowledge graphs, that the different types of keys can really lead to different results (in particular strong keys and link keys). Even though no experiments have been added, the authors have proposed a convincing example (Example 4) based on two real datasets, GeoNames and INSEE, which clearly shows that the results can be very different.
Moreover, I understand that the differences between weak, plain and strong keys can be illustrated (and the added examples are well chosen) but cannot be shown on datasets that do not contain duplicates.

All my minor remarks have been taken into account (or argued against).

I would recommend accepting this version of the paper.