Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
*** Contribution ***
The authors present mapping-template, a tool based on the Apache Velocity template engine for specifying transformations between different data formats. To frame their tool at an abstract level, they introduce a workflow for the generalization of the mapping process and perform a qualitative analysis of existing mapping approaches against this workflow. A comparison, both qualitative and quantitative, of mapping-template against state-of-the-art R2RML/RML engines concludes the paper.
The proposed abstract workflow and the extensions to the tool are valuable contributions to the field, and the qualitative analysis is thorough. However, in my opinion certain claims and comparisons require further clarification and validation, as detailed below.
*** Novelty ***
This aspect of the work raises significant concerns. The submission substantially overlaps with [A], a 19-page paper from the Knowledge Graph Construction Workshop that shares the same title as the present work. Many sections are word-for-word identical, which raises questions about the level of new contribution.
The authors highlight some notable extensions in the abstract:
- Analysis of the state-of-the-art for Knowledge Graph Construction (KGC) within the context of the proposed workflow (Table 1 and Section 3.4).
- Direct support for the execution of RML mapping rules (end of Section 4) in a tool called "mapping-template-rml".
- Inclusion of results from the KGC Challenge 2024 and extensions of tests for mapping-template-rml.
While these extensions are useful, the overlap with [A] should be explicitly addressed, and [A] should be properly cited (I was surprised not to find any citation of this work).
The claim made in the abstract about addressing a gap in the literature, namely support for conversion between different data formats, requires deeper discussion. The statement, "Existing solutions for the declarative lifting of data to RDF are not able to effectively support knowledge conversion towards a generic output," is only partially justified. Specifically, in Section 2.3, the authors acknowledge solutions like [54] and [19], which already address lowering mappings to heterogeneous data sources. Clarifying the precise gap filled by this work is essential to strengthening the novelty claim.
Finally, the use of the term "knowledge conversion" is ambiguous. Clearer terminology or a concrete definition would improve comprehension.
*** Significance of Results ***
The qualitative analysis considering the requirements for declarative mapping languages is detailed and provides useful insights. However, I have a few concerns about the other half of the qualitative analysis, as well as the quantitative analysis:
1) Relevance of the quantitative results for "mapping-template":
- The analysis is undermined by the fact that mapping-template does not support RML, and that to fill this gap the authors manually converted the RML mappings into mappings for their tool.
- The manual conversion raises questions about fairness. Were these mappings validated for equivalence? Could manual optimization during conversion have provided an unfair advantage? Explicit validation and examples of these mappings would clarify this issue.
- Furthermore, the comparison between mapping-template and mapping-template-rml (which natively supports RML) is not thoroughly discussed. For example, Figure 6 suggests differences in performance, but the reasons for these differences are not adequately analyzed.
2) Applicability to Big Data Scenarios:
- Results indicate that mapping-template-rml struggles with memory consumption and large datasets, raising concerns about its scalability. Addressing this limitation would enhance the tool's practical relevance.
3) Omitted Experiments:
- The omission of comparisons with morph-kgc for mapping-template-rml and the lack of tests at larger scales for Figure 6 reduce the comprehensiveness of the analysis. Including these would strengthen the quantitative evaluation.
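To make the equivalence check requested in point 1 concrete, a minimal sketch follows. It assumes that both the RML engine and mapping-template can emit N-Triples and that the outputs contain no blank nodes (otherwise a graph isomorphism check would be needed instead of a plain set comparison). The function names and the way outputs are passed in are hypothetical and are not taken from the authors' repository.

```python
def load_triples(ntriples_text):
    """Collect the statements of an N-Triples document as a set of strings.

    Assumption: no blank nodes, so statements can be compared textually.
    Whitespace runs are collapsed, which also affects whitespace inside
    literals; this is acceptable for a sketch but not for a strict check.
    """
    triples = set()
    for line in ntriples_text.splitlines():
        line = line.strip()
        # Skip empty lines and comment lines.
        if not line or line.startswith("#"):
            continue
        triples.add(" ".join(line.split()))
    return triples


def compare_outputs(rml_output, mtl_output):
    """Return (missing, extra).

    missing: statements produced only by the RML mapping.
    extra:   statements produced only by the manually converted mapping.
    Both sets are empty when the two outputs contain the same statements.
    """
    rml = load_triples(rml_output)
    mtl = load_triples(mtl_output)
    return rml - mtl, mtl - rml
```

Reporting, for each test case, that both returned sets are empty would document that the manually converted mappings produce the same output as the original RML mappings.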
*** Quality of Writing ***
The writing is generally clear but could be improved with better organization and formatting. I provide a few examples:
- Figures and Tables:
-- Figure 1 lacks a detailed description clarifying the meaning of the layers, rectangles, rounded rectangles, and so on.
- Section Structure:
-- I would add a sentence introducing Sections 3.1 to 3.3.
-- I would reformat Section 5 to separate the discussion of requirements and test cases. The structure of Section 6 could be improved in a similar way.
- Minor Issues:
-- Labels like C1, C4, etc., in Section 5 are used without explanation and should be introduced (they were introduced in the workshop publication).
-- "The declarative specification of heterogeneous data sources (Data source specification) is supported to enable several use cases through different extensions" -> What is the subject of this sentence?
-- The template "http://example.com/{class}" on the right-hand side of Fig. 3 should be "http://example.com/{type}" instead.
-- Typo in Footnote 27.
Finally, I think that a couple of statements in the paper should be slightly adjusted. Specifically:
1) "For an average developer, without a deep understanding of RDF, our template-based approach appears to be less verbose and simpler than RML-based solutions": according to my understanding, RML is not designed with that goal in mind. The goal of RML is rather to provide an exchange format for declarative mappings towards heterogeneous sources.
2) "to generate RDF-star, a user knowing MTL should only be able to write RDF-star, while a user knowing RML should learn RML-star": to be fair, one should also acknowledge that RML-star is just a slight extension of RML. Anyone mastering RML should not encounter particular problems in "learning" RML-star.
*** Reproducibility ***
The reproducibility of the work seems appropriate. The authors provide all necessary materials to repeat the experiments in a public GitHub repository, and the tools are distributed under an Apache 2.0 license.
*** Final Considerations ***
The abstract framework introduced by the authors and the tools presented are valuable contributions to the community. However, the following aspects require attention before the work can be considered for publication:
- Explicitly address the overlap with [A] and better articulate the novelty of the current submission.
- Revise the claims made in the abstract and conclusions regarding literature gaps to ensure consistency with the discussion in the main text.
- Provide a more detailed quantitative analysis, including validation of manually converted mappings, scalability experiments, and comparisons with additional tools.
- Improve the paper’s structure, formatting, and clarity to enhance readability.
With these revisions, the paper would make a stronger and more compelling contribution to the field.
[A] Mario Scrocca, Alessio Carenini, Marco Grassi, Marco Comerio, Irene Celino: Not Everybody Speaks RDF: Knowledge Conversion between Different Data Representations. KGCW@ESWC 2024