Quality Assurance of RDB2RDF Mappings

Tracking #: 1061-2272

Authors: 
Patrick Westphal
Claus Stadler
Jens Lehmann

Responsible editor: 
Pascal Hitzler

Submission type: 
Full Paper
Abstract: 
Since datasets in the Web of Data stem from many different sources, ranging from automatic extraction processes to extensively curated knowledge bases, their quality also varies. Thus, significant research efforts have been made to measure and improve the quality of Linked Open Data. Nevertheless, these approaches suffer from two shortcomings: First, most quality metrics are insufficiently formalised to allow an unambiguous implementation, which is required to base decisions on them. Second, they do not take the creation process of RDF data into account. A popular extraction approach is the mapping of relational databases to RDF (RDB2RDF). RDB2RDF techniques make it possible to create large amounts of RDF data with only a few mapping definitions. This also means that a single error in an RDB2RDF mapping can affect a considerable portion of the generated data. In this paper we present an approach to assess RDB2RDF mappings that also considers the actual process of the RDB to RDF transformation. This allows problems to be detected and fixed at an early stage, before they result in potentially thousands of data quality issues in the published data. We propose a formal model and methodology for the evaluation of RDB2RDF mapping quality and introduce concrete metrics. We evaluate our assessment framework by applying our reference implementation to different real-world RDB2RDF mappings.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Wei Hu submitted on 05/May/2015
Suggestion:
Major Revision
Review Comment:

In this paper, the authors studied the problem of quality assessment of RDB2RDF mappings, making three main contributions: (i) defining a quality assessment methodology focused on the mapping of relational databases to RDF; (ii) designing 43 metrics to measure quality; and (iii) implementing the approach and evaluating it on three real-world datasets.

In general, the comprehensiveness of this work impressed me a lot. The authors formally defined 43 metrics and implemented them one by one. I checked the technical report provided online, which even has 116 pages! Also, the studied problem is very interesting and important; I am not aware of other previous work specifically focusing on RDB2RDF. The readability of the paper is basically OK; however, some lengthy definitions and notations in Section 4 are a little hard to understand and required several readings to figure out their exact meanings.

In the following, I will first give my three main concerns, which I hope the authors can address in the revision:

1. My biggest concern is the exact problem being addressed. From the paper title and definitions, I can tell that the authors tried to assess the quality of RDB2RDF mappings. However, many of the proposed metrics, e.g. metric 7, only focus on quality assessment of Linked Data; they have nothing to do with the relational database part or the mapping definitions (views). Because the authors have already published a paper on quality assessment of Linked Open Data [34], I think the new contributions are not enough. In the future work, the authors state that a shortcoming of this paper is the lack of SQL parsing. Is this the reason that you cannot assess the quality of the relational database part? Anyway, if I am wrong, please correct me.

2. Semantics and reasoning issues have not been explicitly considered in designing the metrics. For example, the equivalence relation can be inferred in many ways, but the authors only deal with owl:sameAs. As another example, how does metric 3 address the inheritance or compatibility of datatypes, such as non-negative integer being a kind of integer, or float and double being compatible in some sense? In general, I think this issue can be addressed by adding more explanations in the revision.
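
To make this point concrete, here is a minimal sketch in Python of such a datatype compatibility check (my own illustration, not taken from the paper; the XSD hierarchy fragment, the compatibility pairs and all names are assumptions):

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical, partial excerpt of the XSD datatype hierarchy (child -> direct supertype).
SUPERTYPE = {
    XSD + "nonNegativeInteger": XSD + "integer",
    XSD + "positiveInteger":    XSD + "nonNegativeInteger",
    XSD + "integer":            XSD + "decimal",
}

# Datatypes often treated as interchangeable in practice (e.g. float vs. double).
COMPATIBLE = {
    frozenset({XSD + "float", XSD + "double"}),
}

def is_subtype(dt, expected):
    """True if dt equals expected or is a (transitive) subtype of it."""
    while dt is not None:
        if dt == expected:
            return True
        dt = SUPERTYPE.get(dt)
    return False

def datatypes_match(actual, expected):
    """Exact match, subtype match, or explicitly compatible pair."""
    return (is_subtype(actual, expected)
            or frozenset({actual, expected}) in COMPATIBLE)

# A value typed as xsd:nonNegativeInteger should not be flagged when xsd:integer is expected.
assert datatypes_match(XSD + "nonNegativeInteger", XSD + "integer")
assert datatypes_match(XSD + "float", XSD + "double")
assert not datatypes_match(XSD + "string", XSD + "integer")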

3. I was wondering whether there is some way to combine the scores of the different metrics. Because the authors propose 43 metrics in 11 dimensions, users could get lost among so many assessment results. A better way might be to provide users with one overall quality score and present the details hierarchically. Actually, I believe the authors already have some thoughts on this issue (judging from the average scores in Table 4), but adding a discussion would be better. Nevertheless, this is just a suggestion rather than a flaw of the paper.
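
To illustrate this suggestion, here is a minimal sketch in Python of how per-metric scores could be rolled up into per-dimension scores and a single overall score (my own illustration, not from the paper; the dimension grouping, metric names, weights and score values are placeholders, and all metric scores are assumed to be normalised to [0, 1]):

from statistics import mean

# Placeholder metric scores grouped by quality dimension.
scores = {
    "consistency": {"datatype_compatibility": 0.97, "functional_property_violations": 0.95},
    "interpretability": {"typed_resources": 0.82},
    "conciseness": {"no_duplicate_statements": 1.0},
}

def dimension_scores(scores):
    """Average the metric scores within each dimension."""
    return {dim: mean(metrics.values()) for dim, metrics in scores.items()}

def overall_score(scores, weights=None):
    """Weighted average over the dimension scores; equal weights by default."""
    dims = dimension_scores(scores)
    weights = weights or {dim: 1.0 for dim in dims}
    total = sum(weights.values())
    return sum(dims[dim] * weights[dim] for dim in dims) / total

print(dimension_scores(scores))  # hierarchical detail per dimension
print(overall_score(scores))     # single headline number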

Other minor comments are as follows:

- Footnote 1. "An exception are metrics" -> "is"?
- Section 2 (related work) lacks a technical comparison. While there may be no work directly addressing RDB2RDF quality assessment, I believe that some of the metrics proposed in the paper have already been used in the literature. Therefore, I would like to see more technical details.
- Section 2. One of your previous works should be added: Test-driven evaluation of linked data quality. WWW 2014, 747-758.
- Page 3 left column. What does “their conformance with the BCP 47 standard” mean? Readers like me do not know BCP 47; a brief explanation would help.
- Page 4 left column. Is “an ordered set of columns” necessary? As far as I know, in the relational model the columns are not assumed to be ordered, and I think the order is not really relevant to the remaining definitions.
- Page 4 left column. I am not very sure about the meaning of “q_h”. Does it denote the variables not in the graph position?
- Page 4 right column. From the example and Fig. 1, I cannot see how blank nodes are dealt with. Actually, I think that the authors just assign some internal names to them. A more important issue is: do blank nodes have quality issues? Do they need to be assessed?
- Fig. 3. The difference between the theoretical approach and the empirical approach is still unclear to me. For example, the metrics you propose in the paper are based on your own thinking, so why are they not empirical? Maybe a short example would be helpful here.
- Page 10 left column. Do you have any reason for choosing these 7 metrics out of the 43 in the paper? If so, please list the reasons.
- Page 10 metric 1. The reverse engineering literature uses the names "entity relation" and "relationship relation" to differentiate the meanings you intend here.
- Page 10 metric 1. When can f1(D) be larger than 1?
- f3() and f4(). What does the hat “^” mean?
- Page 13 metric 4. I believe that the real meanings of primary keys and functional properties are different, due to the closed-/open-world semantics, but in practice their meanings are often mixed up. Please add some clarification here. Additionally, how do you address the OWL 2 hasKey construct? (A sketch of one possible check is given at the end of this list.)
- Page 15. The numbers use either commas (e.g. 100,000) or dots (e.g. 100.000) as thousands separators. Please unify the notation.
- Page 15. Please specify the exact sizes of the three datasets. Otherwise, I would think that the assessment can only be conducted at small scale, which affects its usability and feasibility.
- Page 17. Can you add a short example of the six deficiencies that were mapping errors?
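
Regarding the comment on page 13, metric 4: below is a minimal sketch in Python of one possible check that flags primary-key columns whose mapped property is not declared functional (or part of an owl:hasKey axiom). This is my own illustration of the closed-/open-world mismatch, not the authors' method; all names and data structures are hypothetical, and the check only flags candidates rather than asserting errors.

def flag_key_mismatches(mappings, primary_keys, functional_properties):
    """Report mapped properties whose source column is a primary key in the RDB
    but which are not declared owl:FunctionalProperty / owl:hasKey in the ontology.

    mappings: iterable of (table, column, property IRI) triples from the RDB2RDF mapping
    primary_keys: set of (table, column) pairs that are primary keys in the schema
    functional_properties: set of property IRIs declared functional or used as keys
    """
    flags = []
    for table, column, prop in mappings:
        if (table, column) in primary_keys and prop not in functional_properties:
            flags.append((table, column, prop))
    return flags

# Illustrative data only.
mappings = [("person", "id", "http://example.org/personId"),
            ("person", "name", "http://example.org/name")]
primary_keys = {("person", "id")}
functional_properties = set()

for table, column, prop in flag_key_mismatches(mappings, primary_keys, functional_properties):
    print("primary key %s.%s mapped to non-key property %s" % (table, column, prop))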

Review #2
Anonymous submitted on 08/May/2015
Suggestion:
Major Revision
Review Comment:

This paper proposes an approach to evaluate the quality of RDB2RDF mappings. Specifically, 43 metrics are defined by considering context information in different scopes. The paper presents moderate originality because the metrics defined in it are mostly extensions of existing work. The evaluation results need a more detailed explanation (see the problems listed in the specific comments). In general, this paper is well organized and written.

Strong points of this paper:
(1) It is the first formal approach to solving the RDF quality assessment problem.
(2) Various metrics are defined; there are 43 metrics in total.

Weak points of this paper:

(1) The quality dimensions are mainly taken from reference [34]; the authors picked 13 out of the original 18 dimensions. I think the categories in Table 2 should follow the same order as in Table 1 for ease of comparison.
(2) Metrics are defined without proper explanations.

Specific comments:

(1) On page 7, there are six numbers in parentheses in the first paragraph; some are at the beginning of sentences, some in the middle, and some at the end. They presumably refer to the six steps in Fig. 4, but this may lead to misunderstanding.

(2) In Definition 4.1, the concept of a “quad” is not defined. Although it is defined in reference [10], it would be better to introduce the definition of “quad” to make this paper self-contained. Definition 4.2 also cites reference [10], so it has the same problem.

(3) Definitions should be concise and not too long. The last paragraph on page 4, left column, is an explanation of Definition 4.1 and should not be part of the body of Definition 4.1.

(4) The caption of Table 4 is too long, and some of the information in the caption is already presented in paragraph 2 on page 15.

(5) Why were not all Consistency metrics evaluated for the LinkedBrainz dataset? This should be explained.

(6) Average scores are computed in Table 4; are all the metrics in the same range?

(7) How are quality deficiencies detected? How are errors in triples specified? More detailed information should be added to the paper.

(8) Is it the case that higher metric scores are better? Why do the bars in Figure 5 grow in different directions? This should be explained.

(9) In the last paragraph of Section 7, it is claimed that the assessment results revealed 6 deficiencies that were clear mapping errors, and that these caused more than 850,000 violations in total. How was this conclusion reached?