Review Comment:
Summary:
This paper proposes an approach to translate natural language to SPARQL in order to access legacy databases through a KG-based layer, or rather, a SQL-to-SPARQL and DB-to-KG translation (see below). To this end, the outputs of ValueNet, an existing NL-to-SQL model, are translated to SPARQL.
Overall evaluation:
The abstract and introduction vastly overstate the actual contributions of this submission, making it difficult to properly assess the proposed approach. It seems to translate SQL queries to SPARQL with a deterministic approach and relational databases to KGs (building strongly on the existing Ontop). Natural language questions are translated to SQL queries with an already published approach. Thus, the proposed approach lacks a sufficient level of innovation, and the unclear focus of the submission weakens the line of argumentation. Due to the lack of novelty, I recommend rejecting the submission, since this aspect is hard to address within a reasonable revision period of three months.
Focus:
The focus of the proposed approach is unclear. The argument in the abstract and introduction is that the paper proposes an approach that produces high-quality SPARQL queries able to access simplistic KG layers over legacy databases. However, the evaluation focuses on Spider and LC-QuAD, typical datasets for natural language to SQL/SPARQL that are not related to legacy databases with KG layers. Later on, the contribution shifts to an argument about improving SQL via KG generation/enrichment, and then to translating SQL to SPARQL and relational databases to KGs.
In the abstract, the claim that no NL-SPARQL pairs are required made me expect a zero-shot approach, which turned out to be a deterministic SQL-to-SPARQL translation. The argument that there are few "deep learning based systems for KGs" (p. 2, 24) does not reflect the current state of the literature on NL-to-SPARQL, let alone the very vibrant field of KG embeddings, entity linking, GCNs, etc.
Novelty:
Regarding the novelty and focus, the authors seem to be unsure themselves (p. 8, "Our main focus and novelty of our approach? lie in Step 2"). Step 2 is the deterministic approach to translating SemQL, a neural network-derived intermediate representation from NL-to-SQL, to SPARQL. Since NL-to-SPARQL requires SemQL, which basically requires translating KG queries in LC-QuAD to SQL queries and limiting the KG to the portion relevant for the query, the main contribution of this paper seems to be this Step 2. Another contribution is extending the proposed Ontop method in case of missing constraints and primary keys in the DB.
Design:
The difference between Ontop and MPBoot is fuzzy and in the evaluation MPBoot does not seem to play a role any more. What exactly is the added benefit of MPBoot over Ontop and ODM and how do you define each of these (see also comments to authors below)?
Evaluation:
Why evaluate on the development set and not the test set? If ValueNet saw the development set during training, it is irrelevant that no additional training is performed in this particular submission, since it reuses the previously published and trained model. Presenting an average accuracy (rather than per-database results as in Figure 10) would greatly facilitate comparison.
While the experiment on the LC-QuAD 1.0 dataset is supposed to show the generalizability of the approach, unfortunately this might not be the case. First, the SPARQL queries are translated into pseudo-SQL and then the KG is limited to the "portion of the ontology explicitly involved in each single query". The baseline systems are mentioned in Section 6.4 but are not even named where the actual comparison takes place, let alone described (not even in the Related Work). Hence, their choice is also not justified. Since the test sets used for the three approaches differ in size, the numbers are not directly comparable. Why was LC-QuAD 1.0 chosen over 2.0, published in 2019, and why not DBQNA?
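To illustrate what I mean by an aggregate figure, here is a minimal sketch of the two usual ways to average per-database results. The database names and counts are hypothetical placeholders, not values from the paper. The two averages can differ noticeably when the databases contribute different numbers of questions, so it would help to state which one is reported.

```python
# Hypothetical per-database results: (correct predictions, total questions).
# These names and counts are placeholders, not values from the paper.
per_db = {
    "db_a": (45, 50),
    "db_b": (10, 40),
    "db_c": (9, 10),
}


def macro_accuracy(results):
    """Unweighted mean of the per-database accuracies."""
    return sum(c / t for c, t in results.values()) / len(results)


def micro_accuracy(results):
    """Accuracy over all questions pooled across databases."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct / total


print(f"macro: {macro_accuracy(per_db):.3f}")
print(f"micro: {micro_accuracy(per_db):.3f}")
```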
Presentation and Style of Writing:
Figure 1 is rather confusing and hard to read. The query workload and the relational databases both serve as a basis for the KG enrichment/generation; however, the queries and the relational database have no interconnection at all. The NL query is only depicted as enriching the KG, but has no direct connection to ValueNet4SPARQL, only via SemQL, which relies on the relational database.
Certain key terms that are not self-evident and might be understood quite differently depending on background and specialization are not defined. For instance, workload-based analysis and query workload would benefit from an early definition. In fact, I am not entirely sure why the approach is called a workload-based analysis.
The paper could benefit from an improved structure and line of argumentation. A lot of basic details on Semantic Web and SQL are introduced in the middle of the paper's technical details, which makes it even harder to clearly discern them. A clear preliminaries section that separates the proposed approach from preliminaries could be an option.
Questions/comments to authors:
- Direct Mapping is introduced twice, once in Section 2.2. and once in Section 3.3.
- The presentation of the difference between Ontop, MPBoot, and Ontop Direct Mapping is very fuzzy up until page 8. Maybe it's just me, but I read 4.1. and 4.2. three times only to finally find a partial answer in 4.3; on the distinction between Ontop and MPBoot I am still not sure. How about introducing ODM at the beginning of Section 4, where Ontop is first mentioned, and changing the first sentence of Section 4.3. ("As we have seen, Ontop can be used.." made me think all of Section 4.2. was on Ontop/ODM)? I'd also suggest improving the introduction of MPBoot ("exploits the flexibility provided by R2RML") and ODM ("has the advantage of providing the flexibility to adapt the generation process to specific needs").
- Please make your source code available on a platform dedicated to that purpose, e.g. GitHub, instead of in a Google Drive folder.
Minor comments:
p. 2, 20 adaption => adaptation
p. 4, 24 lcquad and spider => spelling
p. 3 Figure 2: black and white marking and a figure with a reasonable font size are generally recommended
p. 6, 11 R2RML => best to introduce acronyms when first mentioning them