Review Comment:
In this paper, the authors study the problem of quality assessment of RDB2RDF mappings and make three main contributions: (i) defining a quality assessment methodology for mappings from relational databases to RDF; (ii) designing 43 metrics to measure quality; and (iii) implementing the approach and evaluating it on three real-world datasets.
In general, the comprehensiveness of this work impressed me a lot. The authors formally define 43 metrics and implement them one by one; the technical report provided online even runs to 116 pages! The studied problem is also interesting and important, and I am not aware of previous work focusing specifically on RDB2RDF. The readability of the paper is basically fine; however, some lengthy definitions and notations in Section 4 are hard to understand, and I had to read them several times to figure out their exact meanings.
In the following, I first give my three main concerns, which I hope the authors can address in a revision:
1. My biggest concern is the exact problem being addressed. From the paper title and the definitions, I can tell that the authors aim to assess the quality of RDB2RDF mappings. However, many of the proposed metrics, e.g. metric 7, only assess the quality of the resulting linked data; they have nothing to do with the relational-database part or the mapping definitions (views). Since the authors have already published a paper on quality assessment of linked open data [34], I think the new contributions are not sufficient. In the future work, the authors state that a shortcoming of this paper is the lack of SQL parsing. Is that the reason the quality of the relational-database part cannot be assessed? If I am wrong, please correct me.
2. Semantics and reasoning issues are not explicitly considered in the design of the metrics. For example, an equivalence relation can be inferred in many ways, but the authors only deal with owl:sameAs. For another example, how should metric 3 handle the inheritance or compatibility of datatypes, e.g. that xsd:nonNegativeInteger is a subtype of xsd:integer, or that xsd:float and xsd:double are compatible in some sense? In general, I think this issue can be addressed by adding more explanations in the revision.
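To make the datatype-inheritance point concrete, here is a minimal sketch of the kind of check I have in mind (this is my own illustration, not the authors' implementation; the helper name is hypothetical, and the dictionary encodes only a small fragment of the XSD derivation hierarchy):

```python
# Fragment of the XSD datatype derivation hierarchy (child -> parent),
# per the XML Schema Datatypes specification.
XSD_PARENT = {
    "xsd:positiveInteger": "xsd:nonNegativeInteger",
    "xsd:nonNegativeInteger": "xsd:integer",
    "xsd:int": "xsd:long",
    "xsd:long": "xsd:integer",
    "xsd:integer": "xsd:decimal",
}

def is_subtype(actual: str, declared: str) -> bool:
    """True if `actual` equals `declared` or is (transitively) derived from it."""
    while actual is not None:
        if actual == declared:
            return True
        actual = XSD_PARENT.get(actual)  # climb one level; None ends the walk
    return False
```

With such a check, a literal typed xsd:nonNegativeInteger would not be flagged as violating a range of xsd:integer. (Note that xsd:float and xsd:double are separate primitive types in XSD, so their "compatibility" would need a rule beyond this hierarchy.)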
3. I was wondering whether there is a way to combine the scores of the different metrics. Since the authors propose 43 metrics across 11 dimensions, users could get lost in so many assessment results. A better way might be to provide users with one overall quality score and present the details hierarchically. I believe the authors already have some thoughts on this issue (judging from the average scores in Table 4), but adding a discussion would be better. Nevertheless, this is just a suggestion rather than a flaw of the paper.
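As an illustration of what I mean by hierarchical aggregation, consider the following sketch (the dimension names, weights, and scores are invented for the example; this is a suggestion, not something the paper defines):

```python
# Hypothetical assessment results: dimension -> {metric: score in [0, 1]}.
results = {
    "consistency": {"m3": 0.95, "m4": 0.80},
    "completeness": {"m7": 0.70},
}

# User-supplied importance weights per dimension.
weights = {"consistency": 0.6, "completeness": 0.4}

def dimension_score(metrics: dict) -> float:
    """Average the metric scores within one dimension."""
    return sum(metrics.values()) / len(metrics)

def overall_score(results: dict, weights: dict) -> float:
    """Weighted average of dimension scores: one number, drill-down on demand."""
    total = sum(weights[d] for d in results)
    return sum(weights[d] * dimension_score(m) for d, m in results.items()) / total
```

Users would then see a single overall score first (here 0.805) and could expand each dimension to inspect the individual metrics behind it.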
Other minor comments are as follows:
- Footnote 1. "An exception are metrics" -> "An exception is metrics"?
- Section 2 (related work) lacks a technical comparison. While there may be no work directly addressing RDB2RDF quality assessment, I believe some of the metrics proposed in the paper have already been used in the literature. Therefore, I would like to see more technical details.
- Section 2. One of your previous works should be added: Test-driven evaluation of linked data quality. WWW 2014, pp. 747-758.
- Page 3, left column. What does "their conformance with the BCP 47 standard" mean? Readers like me may not know BCP 47; please briefly explain it.
- Page 4, left column. Is an ordered set of columns necessary? As far as I know, columns are not assumed to be ordered in the relational model, and the order does not seem relevant to the remaining definitions.
- Page 4, left column. I am not sure about the meaning of "q_h". Does it denote the variables not in the graph position?
- Page 4, right column. From the example and Fig. 1, I cannot see how blank nodes are dealt with; it seems the authors just assign internal names to them. A more important question: do blank nodes have quality issues, and do they need to be assessed?
- Fig. 3. The difference between the theoretical approach and the empirical approach is still unclear to me. For example, are the metrics proposed in the paper based on your own reasoning? If so, why are they not empirical? A short example would be helpful here.
- Page 10, left column. Is there a reason for choosing these 7 metrics out of the 43 in the paper? If so, please state it.
- Page 10, metric 1. The reverse-engineering literature uses the names "entity relation" and "relationship relation" to differentiate the two meanings; consider adopting them.
- Page 10, metric 1. When can f1(D) be larger than 1?
- f3() and f4(). What does the hat "^" mean?
- Page 13, metric 4. I believe the real meanings of primary keys and functional properties differ, due to closed- vs. open-world semantics, although in practice the two are often mixed up. Please add some clarification here. Additionally, how is the OWL 2 owl:hasKey axiom addressed?
- Page 15. Numbers use either commas (e.g. 100,000) or dots (e.g. 100.000) as thousands separators. Please unify the notation.
- Page 15. Please specify the exact sizes of the three datasets. Otherwise, readers may suspect that the assessment can only be conducted at small scale, which would affect its usability and feasibility.
- Page 17. Can you add a short example of the six deficiencies that were mapping errors?
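To clarify the distinction I raise above for metric 4: under closed-world (database-style) semantics a key violation is an error to report, whereas under open-world OWL semantics a reasoner would instead infer owl:sameAs between the subjects sharing a key value. A toy sketch of the closed-world checks (the data and helper names are hypothetical; owl:hasKey generalizes the second check to tuples of properties):

```python
from collections import defaultdict

# Toy triples as (subject, property, value) -- hypothetical example data.
triples = [
    ("s1", "hasISBN", "978-1"),
    ("s2", "hasISBN", "978-2"),
    ("s3", "hasISBN", "978-2"),  # two subjects share a key value
]

def is_functional(triples, prop):
    """owl:FunctionalProperty, read closed-world: each subject has <= 1 value."""
    values = defaultdict(set)
    for s, p, v in triples:
        if p == prop:
            values[s].add(v)
    return all(len(vs) <= 1 for vs in values.values())

def is_key(triples, prop):
    """Primary-key reading, closed-world: each value identifies <= 1 subject."""
    subjects = defaultdict(set)
    for s, p, v in triples:
        if p == prop:
            subjects[v].add(s)
    return all(len(ss) <= 1 for ss in subjects.values())
```

Here hasISBN passes the functional check but fails the key check, which is exactly the case where the two notions come apart in practice.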