Review Comment:
This is a review of a previously reviewed paper presenting an ontology that aims to represent the shared constructs that practitioners can encounter when using the available declarative mapping languages to build KGs from heterogeneous data sources. In this version, the paper has improved its scope, presentation and clarity, which makes it easier to read and follow. Nevertheless, there are still some minor remarks that should be addressed before publication.
The first thing that surprised me in comparison with the previous version is the argument used for not covering all the expressiveness and flexibility that SPARQL-based solutions offer. It is said that they are procedural (in this context?), so, as the ontological approach is based on RDF (and is declarative), it has some inherited expressiveness problems. Therefore, it does not make much sense to represent these more procedural instructions in the ontology.
I agree that the border between imperative and declarative is a bit fuzzy for some functionalities. Nevertheless, as far as I can tell, SPARQL is a declarative language like other query languages (e.g., SQL). In these cases they offer a bit more flexibility, like the cited for iteration, which, by using the for keyword, is reminiscent of the imperative style. But I could argue that the same can be represented with a functional programming construct like range, which is more declarative but has exactly the same functionality. In a way, I am saying that I want to generate these values, but I am not saying how to generate them. Interestingly, functions used in many examples (which would require an implementation, often an imperative one) are not out of scope. I think you should use another kind of argument, such as waiting until there is more consensus in the community, as there is for functions.
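To illustrate the point, here is a minimal sketch in Python (chosen only for illustration; the function names are hypothetical and taken neither from the paper nor from any mapping language). Both fragments produce the same values, but the second only states which values are wanted and leaves the "how" to the runtime:

```python
# Hypothetical illustration only; not taken from the paper or any mapping language.

def values_imperative(start, stop):
    """Build the values step by step, spelling out how to produce them."""
    result = []
    i = start
    while i < stop:
        result.append(i)
        i += 1
    return result


def values_declarative(start, stop):
    """State which values are wanted; the runtime decides how to produce them."""
    return list(range(start, stop))


# Both formulations yield exactly the same values.
print(values_imperative(1, 5) == values_declarative(1, 5))
```
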
The other interesting argument is that the RDF syntax imposes a limitation on the expressiveness of the language. This is indeed a very interesting discussion that should perhaps be raised in the community, as the vast majority of solutions go this way. But for the given examples I do not see a problem or a limitation in the syntax, leaving aside the verbosity that such a solution would entail.
Another point that caught my attention is that some ShExML listings would not work due to erroneous use of the syntax. I didn’t check all the listings in the paper, but it makes me think that this could be the case for more languages. It seems to me that they were not verified and executed to ensure that they were correct. This endangers the pedagogical aspect of the paper (which, by the way, I really like) but also its reproducibility. I would advise reviewing and executing all the mappings offered in the listings (you can find the corrected one for ShExML below) and also offering the complete mapping rules as supplemental material so other people can check and learn from your findings.
This point leads me to another aspect that in my opinion is rather neglected: implementation details. The idea often seems too abstract, and it is not possible to tell whether it would work in the future. As this is an ontology paper, this is outside the scope of the review process, but I think that it would be very beneficial to give some hints about the future implementation, for example, for accessing fields outside the iteration scope.
As a side note, ShExML has recently added support for functions and conditional statement generation. These changes will soon be reflected in the specification. Should you want to have the most up-to-date information in the paper, you could check them in the ShExML GitHub repo (https://github.com/herminiogg/ShExML) or drop me a line.
References should be thoroughly reviewed. For example, the venue is missing in [4, 8, 9, 19, 21, 22, 25, 29, 30, 31, 32, 33, 44, 46, 47, 48, 50, 51, 53, 54, 57, 60]; [58] is a duplicate of [32]; [12] is from ESWC; [3, 27] are not PhD theses but Research Reports; [37] has been peer-reviewed and formally published; and [42] needs to be completed.
Additionally, I leave below more comments and suggestions per section:
#Title
Integrating → (I don’t see that the ontology is really integrating at this stage, maybe use something like describing or representing)
#Related work
noSQL → NoSQL
is a commercial language used commercially → is a commercial language
providing more complex comparison frameworks → (from the description afterwards I would say they are qualitative comparisons but not necessarily more or less complex)
#Methodology
[23, 49] → (The methodologies followed are reminiscent of a normal software development methodology; it would be nice to add some clarification of why these are more interesting and where they excel)
there are no use cases as this ontology is a mechanism of representation of mapping language’s features → (Is not the representation of mapping languages features a use case?)
#Conceptual Mapping Requirement Specification
W3C Knowledge Graph Construction Community → (add a link?)
neither SPARQL-based languages nor languages based on other schemes consider this feature → (this sentence is a bit ambiguous, if you mean by other formats the rest of the languages neither following RDF nor SPARQL then it is not correct)
such as RML and ShExML [57, 58] → such as RML [57] and ShExML [58]
especially in tree-like data sources → (only JSON is affected due to JSONPath inability, XML using XPath is capable though)
#Conceptual Mapping Implementation
is a subclass of dcat:Distribution → (beware of the long empty gap)
and defines which is the data in the source that is retrieved → and defines which data is retrieved from the source
and can define shapes (shapes) for its restrictions → (what do you mean? this is the first time that we hear about shapes. What are they intended for in the ontology?)
has been created for each taxonomy has been created → has been created for each taxonomy
Listing 8: Data sources → (data sources of what? use longer descriptions in the captions so the reader can better understand what the figure/listing is about)
$.zipcodes.* → (How is this expression supposed to work? As commented earlier some implementation details may be needed to fully understand the solution)
#Conclusion and Future Work
the Conceptual Mapping, an ontology-based mapping language called Conceptual Mapping → the Conceptual Mapping, an ontology-based mapping language
such as SPARQL-Generate of Facade-X → such as SPARQL-Generate or Facade-X
#Appendix A
of the ontology that don’t appear → of the ontology that do not appear
This example show how to describe → This example shows how to describe
dcat:mediaType “text/json” → (but endpointURL is .csv)
ShExML listing corrected. It is also possible to define a QUERY (http://shexml.herminiogarcia.com/spec/#query) and then use the variable like ITERATOR it_cities :
SOURCE cities_rdb jdbc:mysql://localhost:3306/citydb
SOURCE coord_json
ITERATOR it_cities {
FIELD c_city
FIELD population
FIELD year
FIELD zipcode
}
ITERATOR it_coord {
FIELD lat
FIELD long
FIELD loc_city
}