Review Comment:
Summary
This submission presents a data scaling approach for benchmarking Ontology-Based Data Access (OBDA). It builds on OWL, SPARQL, and R2RML. The main technical contribution of the paper is the VIG algorithm, which is divided into two phases: first collecting information about the initial data, and then growing the initial data according to a given scale factor based on the preceding analysis. The approach is evaluated with several case studies and technologies.
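To make my reading of the two-phase design concrete, the following is a minimal sketch of how I understand the analyze-then-grow idea (all names and the duplication-ratio heuristic are my own illustration, not the authors' actual VIG implementation):

```python
def analyze(table):
    """Phase 1: collect simple per-column statistics from the initial data."""
    stats = {}
    for col in table[0].keys():
        values = [row[col] for row in table]
        stats[col] = {"distinct": len(set(values)), "total": len(values)}
    return stats

def grow(table, stats, scale_factor):
    """Phase 2: generate new rows so the table grows by scale_factor while
    each column roughly keeps its observed distinct-to-total ratio."""
    n_new = int(len(table) * scale_factor) - len(table)
    grown = list(table)
    for i in range(n_new):
        row = {}
        for col, s in stats.items():
            ratio = s["distinct"] / s["total"]
            # mint a fresh value with the observed probability of distinctness,
            # otherwise reuse an existing value (deterministic for illustration)
            if (i / max(n_new, 1)) < ratio:
                row[col] = f"{col}_new_{i}"
            else:
                row[col] = grown[i % len(table)][col]
        grown.append(row)
    return grown

table = [{"id": 1, "dept": "A"}, {"id": 2, "dept": "A"}, {"id": 3, "dept": "B"}]
scaled = grow(table, analyze(table), 3)
print(len(scaled))  # 9
```

This toy version ignores keys, foreign-key constraints, and mappings, which are precisely the aspects the paper's approach has to handle.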
Evaluation
The paper is well-structured and easy to read. The investigated problem is relevant and extends existing approaches for scaling pure database systems that work solely with relations. The experimental results seem promising and also point to future research lines for improving the presented approach.
While in general I appreciate the approach and presentation, there are also points which should be improved. In the following, I detail these points.
1) The paper is an extension of a previous publication at the BLINK workshop. This is mentioned in the introduction, where the improved evaluation is stated as the main difference in the journal submission. However, it would be very interesting for the reader to learn why the evaluation has been extended. What can we learn from the evaluation in this submission compared to the previous workshop evaluation? I assume some aspects are evaluated in more depth, or that more results were needed to derive specific conclusions. This should also be discussed briefly in the introduction. In particular, the question arises: what does the DBLP case contribute to the evaluation? Is it about the real-world data aspect?
2) Concerning the evaluation, I am impressed by the range of settings used to evaluate the presented approach. However, as the evaluation section is long and discusses many aspects, I would propose introducing research questions for the evaluation section that are to be answered by analyzing the different cases. In Section 5.4, these questions should be explicitly answered and threats to validity should be discussed. Related to my previous point about the additional evaluation cases mentioned in the introduction, it would also be interesting to discuss the characteristics of the cases.
Furthermore, the setup of the evaluation study and the discussion of the results could be separated more clearly. For instance, the discussion of the adaptation of the BSBM case is very detailed, and one gets lost in these details when reading the evaluation section, mostly due to the jumping back and forth between the setup of the study and the results.
3) Related work: The distinction from Rex [4] is not clear. It is only mentioned that Rex has better handling of content for non-key columns. As Rex is presented as a closely related approach, a more detailed discussion is needed.
4) While reading this paper, I wondered why the scaling is done on the database level and not on the ontology level. Would the latter be easier? Currently, the mappings have to be exploited to reason about proper ways of scaling the data. I have the impression that using the ontology directly would allow exploiting more semantics for this purpose. At the very least, a discussion of why the data scaling was performed on the database level would be interesting for the reader. I understand that the data originates in the database, but why not scale the corresponding ontology graph?
Minor issues
Abstract and Introduction: "read-world data" -> "real-world data"
Page 10: data that are -> data that is
Page 12: FactPages ^12 -> please delete the space between the word and the footnote
Page 14: cannot are not taken -> delete cannot