Review Comment:
Data interlinking is a major inference in the setting of the Semantic Web. It allows finding identical resources among different datasets, and also, as a particular case, finding duplicate resources within a single dataset. Data interlinking tools are therefore fundamental for developing the Linked Open Data cloud (LOD), as well as for exploring/navigating within it, based on resources of interest.
Data interlinking builds on the notion of keys, which identify resources within a dataset. These keys are defined for classes of resources by some of the properties they have. Two distinct resources of the same class can then be stated to be equivalent if their descriptions intersect for some properties and/or are equal for some other properties.
Based on these keys, alignments (correspondences) between datasets can be found/set/enriched.
This paper starts by revisiting two main notions of keys that have been used in the literature for data interlinking: S-keys and F-keys. S-keys, called here in-keys, state that two resources of a class are the same if their values on each key property intersect, while F-keys, called here eq-keys, require that their values on each key property be equal.
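To make the contrast between the two notions concrete, here is a minimal sketch, not taken from the paper. It assumes resources are given as dictionaries from property names to sets of values; the function names, the `people` data, and the requirement that eq-key values be non-empty are assumptions made for illustration only.

```python
from itertools import combinations

def in_key_links(resources, key_props):
    """Pairs of resources whose value sets intersect on every key property."""
    return {
        (a, b)
        for a, b in combinations(sorted(resources), 2)
        if all(
            resources[a].get(p, set()) & resources[b].get(p, set())
            for p in key_props
        )
    }

def eq_key_links(resources, key_props):
    """Pairs of resources whose value sets are equal on every key property.

    Assumption: empty value sets never trigger an identification.
    """
    return {
        (a, b)
        for a, b in combinations(sorted(resources), 2)
        if all(
            resources[a].get(p, set())
            and resources[a].get(p, set()) == resources[b].get(p, set())
            for p in key_props
        )
    }

# Hypothetical toy dataset: three resources of one class, one key property.
people = {
    "r1": {"email": {"a@x.org", "a@y.org"}},
    "r2": {"email": {"a@x.org"}},
    "r3": {"email": {"a@x.org", "a@y.org"}},
}

in_links = in_key_links(people, ["email"])
eq_links = eq_key_links(people, ["email"])
```

On this toy data the in-key reading identifies all three pairs, whereas the eq-key reading identifies only `r1` and `r3`, whose value sets coincide.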
Two trivial results are then provided: eq-keys are in-keys, and in-keys are eq-keys when the considered properties are functional.
These notions of in- and eq-keys are then generalized to that of keys, so that class resources can be identified with the behavior of in-keys for some properties, and that of eq-keys for some other properties.
Then, the paper studies data interlinking when keys are used in combination with alignments. It points out two simple results that allow finding links, using in- and eq-keys, between the resources of some class C in one dataset and those of another class D, known to be subsumed by C in the alignment, from another dataset.
Finally, in the main part of the paper, link keys are introduced, compared with keys, and used for data interlinking.
Link keys generalize in- and eq-keys; they are used to establish that a class C resource is the same as a class D resource if their descriptions intersect or are equal wrt some properties. Various flavors of link keys are defined; they differ on whether the sets of properties used for the descriptions of C and D are keys (strong in- and eq-link keys), are not keys (weak in- and eq-link keys), or are not keys for C and D but for the linked resources (plain in- and eq-link keys).
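As a rough illustration of how a link key generates links across two datasets, consider the sketch below. It is not the paper's formalization: it only compares paired properties and does not model the strong/plain/weak distinction. The function name, the `mode` parameter, and the `books`/`works` data are hypothetical.

```python
def link_key_links(left, right, prop_pairs, mode="in"):
    """Link a left resource to a right resource when, for every property
    pair (p, q), the values of p and q intersect (mode="in") or are equal
    and non-empty (mode="eq")."""
    links = set()
    for a, desc_a in left.items():
        for b, desc_b in right.items():
            matches = True
            for p, q in prop_pairs:
                values_a = desc_a.get(p, set())
                values_b = desc_b.get(q, set())
                if mode == "in":
                    pair_ok = bool(values_a & values_b)
                else:
                    pair_ok = bool(values_a) and values_a == values_b
                matches = matches and pair_ok
            if matches:
                links.add((a, b))
    return links

# Hypothetical datasets: class C resources on the left, class D on the right,
# described with different property names.
books = {
    "b1": {"title": {"Dune"}, "author": {"Herbert"}},
    "b2": {"title": {"Emma"}, "author": {"Austen"}},
}
works = {
    "w1": {"name": {"Dune", "Dune (novel)"}, "creator": {"Herbert"}},
    "w2": {"name": {"Emma"}, "creator": {"Austen"}},
}
pairs = [("title", "name"), ("author", "creator")]

in_link_set = link_key_links(books, works, pairs, mode="in")
eq_link_set = link_key_links(books, works, pairs, mode="eq")
```

Here the intersection reading links both books to their counterparts, while the equality reading misses `b1`/`w1` because `w1` carries an extra name value.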
Some simple results then follow: roughly, eq-link keys are in-link keys, and in-link keys are eq-link keys when the considered properties are functional; moreover, strong link keys are plain link keys, and plain link keys are weak link keys.
Expected relationships are then pointed out between link keys and keys.
The paper ends with two simple results that allow finding links, using weak in- and eq-link keys. Interestingly, it is pointed out that data interlinking with link keys is more general than with keys and alignments, due to the use of properties that do not form keys for the class they describe.
Comments:
The paper studies the inference problem of data interlinking, which is central for the Semantic Web, especially for the development and use of the LOD cloud datasets.
The goal of the paper is to provide logical foundations for data interlinking with keys and link keys, but unfortunately, in my opinion, this goal is not met.
First, the paper is hard to read and follow because the writing is very technical, even though the results are simple and easy to understand. For instance, in the Introduction, the problem of data interlinking and the goal of the study, though important, are neither motivated nor exemplified in detail. There is a deluge of variations of key notions, with non-intuitive names and no indication of their meaning or purpose, and of technical results about them, that only specialists on the topic may grasp. I think that some of the good examples used in the paper could be brought into the Introduction to make it more digestible.
Second, the paper does not really provide logical foundations for data interlinking, but rather uses description logics to show some ways of using keys to infer links between data. Further, all the provided technical results are easy to obtain and follow quite directly from the various definitions of keys.
Overall, in my opinion, the paper lacks originality and significance in its results, and also suffers from the way it is written (more below), for a venue like SWJ.
Other comments:
- in- and eq-keys are introduced right from the start (abstract, introduction) with almost no intuition; the reader has to reach Section 4, just before the formal definitions, to learn that in- stands for intersection and eq- for equality. This should be said up front, to convey what in- and eq-keys mean.
- Propositions 1, 4. There is something wrong here. Every eq-key is an in-key, hence the left-hand and right-hand sides of \models should be switched, as every model of an eq-key is a model of the corresponding in-key.
- Propositions 2, 5. There is something wrong here. When properties are functional, an in-key is an eq-key, hence the left-hand and right-hand sides of \models should be switched, as every model of an in-key with functional properties is a model of the corresponding eq-key.
- Page 8. The notation {p_i}_{i=1}^k = {p_1,…,p_k} is introduced although it has already been used.
- Beginning of Section 5. It must be said in the text that A is the alignment at hand.