Review Comment:
Towards Linkset Quality for Complementing SKOS Thesauri
Overall evaluation
Select your choice from the options below and write its number below.
== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject
-1
Reviewer's confidence
Select your choice from the options below and write its number below.
== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)
3
Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
4
Novelty
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
3
Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
4
Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present
4
Clarity and presentation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
3
Review
This paper describes a measure for determining the quality of (VoID) linksets that interlink SKOS thesauri. The introduced measure, termed “linkset importing”, determines whether the linkset will increase the completeness of a selected target dataset. Completeness is determined by whether new values for properties in the target dataset can be incorporated via the linksets. This would be helpful, for example, in determining whether additional language labels for resources would be included. The evaluation is done with respect to a synthetic benchmark. The authors describe the benchmark generator and make it available.
Overall, I like the idea behind the paper. Links are a crucial part of what makes Linked Data, linked data. We see that many of the interconnections are generated by third parties in the form of linksets. Indeed, there may be many different linksets that could potentially be of interest; thus, understanding whether any given linkset enriches a particular dataset is clearly of interest. (On this point, I’m not sure that the authors’ statement that linksets are extremely important in Linked Data is provocative, as claimed in the conclusion.)
Unfortunately, the paper suffers from two major issues: terminological coherence and breadth of evaluation.
## Terminological coherence
While reading the paper I struggled to understand the definitions of important concepts.
Some examples:
- The definition of linkset importing is given in a couple of different ways. One is: “linkset importing, a measure which assesses linksets as good as they improve a dataset with its interlinked entities’ properties.” It’s unclear what “improve” or “interlinked entities’ properties” means here. Another is: “linkset importing which measures the percentage of values that can be imported in subject dataset, from the object dataset, when complementing via a linkset.” It’s not clear what the subject and object datasets are in this context.
- The notion of complementing is central to the paper but it’s presented in a confusing fashion, e.g. “For completeness of dataset obtained by complementing SKOS thesauri with their skos:exactMatch-related information”. What is skos:exactMatch-related information? It took me until Section 3.2 to figure out what was really meant, when the scoring function is presented in terms of the importing potential, defined as “this function evaluates how many new values for a property p, are distinct from those already existing in the subject dataset X and can be reachable through L”.
- RDFEntities is oddly defined as entities “exposed as” resources. I assume the authors just mean all the resources within the datasets considered. Sometimes within the literature there is a distinction between resources and entities within an RDF dataset. It would be helpful to be precise.
- In the definition of the test framework, T is defined as the seed thesaurus which acts as the gold standard, but later, when discussing completeness assessment, T is defined as a thesaurus in one of the test sets: “thesauri T, G ∈ D, where T is a subject thesaurus in one of the test sets, and G is the gold standard”.
These are just some of the examples. I think the whole paper would be improved with more precise terminological definitions. In particular, I think what the authors trip over is the idea that one is “adding” a dataset to another one via a linkset. Specifically adopting the VoID terminology, void:subjectsTarget and void:objectsTarget, would make the paper much less confusing.
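For concreteness, here is how I ended up reading the importing potential from Section 3.2. This is only a sketch of my interpretation, not the authors' definition: the dictionary-based dataset representation, the function name `importing_potential`, and the example identifiers are all hypothetical, and I assume that "new" is judged per linked subject entity rather than over the whole subject dataset.

```python
# Hypothetical toy representation (an assumption of this sketch):
# a dataset is {entity: {property: set_of_values}},
# a linkset is a set of (subject_entity, object_entity) pairs.

def importing_potential(subject, obj, linkset, prop):
    """Count values of `prop` reachable through `linkset` that the
    linked subject entity does not already have."""
    new_values = set()
    for s, o in linkset:
        existing = subject.get(s, {}).get(prop, set())
        reachable = obj.get(o, {}).get(prop, set())
        # Keep only values the subject entity lacks for this property.
        for value in reachable - existing:
            new_values.add((s, value))
    return len(new_values)


# Illustrative data only; none of it comes from the paper.
subject = {"x:water": {"skos:prefLabel": {"water@en"}}}
obj = {"y:acqua": {"skos:prefLabel": {"water@en", "acqua@it"}}}
linkset = {("x:water", "y:acqua")}

potential = importing_potential(subject, obj, linkset, "skos:prefLabel")
```

If this reading is what the authors intend, stating it this explicitly, including whether novelty is judged per entity or per dataset, would remove most of the ambiguity I describe above.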
## Evaluation
In terms of evaluation, given that it is synthetic, I wondered why the evaluation was not run with more seed ontologies. Likewise, not all results were presented. I would prefer more results in the paper rather than the validation architecture picture, especially given that the OAEI framework is used as a reference architecture.
For Figure 3, why is the plot drawn as a connected line chart given that the x-axis shows the test sets considered? It would also be tremendously helpful to have a better connection between Table 2 and Figure 3. Table 2 is fundamental to the paper, yet it is hard to tell from it what the various changes actually are. For test 1 and test 2 it seems to be based on deleting percentages of skos:labels in the test. I’m not sure what modifications were made for the remainder of the test datasets.
Some final thoughts:
In terms of related work, while the summary of the authors’ prior work was fine, what I missed was how this work differs from it. It focuses on SKOS, ok, but what is being reused, added to or changed?
I didn’t understand why the focus was on SKOS. Couldn’t the method be applied to any linkset?
I also wondered about the limitation to measuring just values for properties. What happens if the object-target has properties that aren’t in the subject-target? Don’t those additional properties increase the completeness of datasets?
In summary, I think the core of the research is there, but in the current setup it is difficult to understand, and thus the measure itself cannot yet be built upon by others or compared against.
Minor comments
- “datasets has been exposed and interlinked according to…” - “has been” should be “have been”
- “proposing the linkset importing” - remove “the”
- Can you provide support for this statement: “Linked Data is largely adopted by data producers such as European Environment Agency, US and some EU Governs, whose first ambition is to share (meta)data making their processes more effective and transparent.”
- Add a citation for this quote “Linked Data will evolve the current web data into a Global Data Space”
- “remarkable number of thesauri” - how many?
- “LACT link specification” should be “LATC link specification”