Constructing, Enriching and Querying Knowledge Graphs in Natural Language

Tracking #: 3313-4527

Catherine Kosten
Ursin Brunner
Diego Calvanese
Philippe Cudre-Mauroux
Davide Lanti
Alessandro Mosca
Kurt Stockinger

Responsible editor: 
Guest Editors Interactive SW 2022

Submission type: 
Full Paper

As Knowledge Graphs (KGs) gain traction in both industry and the public sector, more and more legacy databases are accessed through a KG-based layer. Querying such layers requires the mastery of intricate declarative languages such as SPARQL, prompting the need for simpler interfaces, e.g., in natural language (NL). However, translating NL questions into SPARQL and executing the resulting queries on top of a KG-based access layer is impractical for two reasons: (i) automatically generating correct SPARQL queries from NL is difficult as training data is typically scarce and (ii) executing the resulting queries through a simplistic KG layer automatically derived from an underlying relational schema yields poor results. To solve both issues, we introduce ValueNet4Sparql, an end-to-end NL-to-SPARQL system capable of generating high-quality SPARQL queries from NL questions using a transformer-based neural network architecture. ValueNet4Sparql can re-use neural models that were trained on SQL databases and therefore does not require any additional NL/SPARQL-pairs as training data. In addition, our system is able to reconstruct rich schema information in the KG from its relational counterpart using a workload-based analysis, and to faithfully translate complex operations (such as joins or aggregates) from NL to SPARQL. We apply our approach for reconstructing schema information in the KG on the well-known data set Spider and show that it considerably improves the accuracy of the NL-to-SPARQL results (by up to 36%, for a total accuracy of 94%) compared to a standard baseline. Finally, we also evaluate ValueNet4Sparql on the well-known LC-QuAD data set and achieve an F1-score of 85%, which outperforms the state-of-the-art system by 17%.


Solicited Reviews:
Review #1
By Dagmar Gromann submitted on 12/Feb/2023
Review Comment:

This paper proposes an approach to translate natural language to SPARQL to access legacy databases through a KG-based layer or rather a SQL to SPARQL and DB to KG translation (see below). To this end, the outputs of ValueNet, an existing NL-to-SQL model, are translated to SPARQL.

Overall evaluation:
The abstract and introduction vastly overstate the actual contributions of this submission, making it difficult to properly assess the proposed approach. It seems to be translating SQL to SPARQL queries with a deterministic approach and relational databases to KGs (strongly building on the existing Ontop). Natural language questions are translated to SQL queries with an already published approach. Thus, the proposed approach lacks a sufficient level of innovation and the unclear focus of the submission affects the line of argumentation. Due to the lack of novelty, I recommend rejecting the submission, since this is an aspect hard to change within a reasonable revision time period of three months.

The focus of the proposed approach seems rather unclear. The argument in the abstract and introduction is that the paper proposes an approach that produces high-quality SPARQL queries able to access simplistic KG layers over legacy databases. However, the evaluation seems to focus on Spider and LC-QuAD - typical datasets for natural language to SQL/SPARQL not related to legacy databases with KG layers. Later on, the contribution turns into an argument about improving SQL via KG generation/enrichment, and then into translating SQL to SPARQL and relational databases to KGs.

In the abstract, the claim that no NL-SPARQL pairs are required made me expect a zero-shot approach, which turned out to be a deterministic SQL to SPARQL translation. The argument that there are few "deep learning based systems for KGs" (p. 2, 24) does not reflect the current state of literature on NL-to-SPARQL, let alone the very vibrant field of KG embeddings, entity linking, GCNs, etc.

Regarding the novelty and focus, the authors seem to be unsure themselves (p. 8, "Our main focus and novelty of our approach? lie in Step 2"). Step 2 is the deterministic approach to translating SemQL, a neural network-derived intermediate representation from NL-to-SQL, to SPARQL. Since NL-to-SPARQL requires SemQL, which basically requires translating KG queries in LC-QuAD to SQL queries and limiting the KG to the portion relevant for the query, the main contribution of this paper seems to be this Step 2. Another contribution is extending the proposed Ontop method in case of missing constraints and primary keys in the DB.

The difference between Ontop and MPBoot is fuzzy and in the evaluation MPBoot does not seem to play a role any more. What exactly is the added benefit of MPBoot over Ontop and ODM and how do you define each of these (see also comments to authors below)?

Why evaluate on the development set and not the test set? If ValueNet was trained seeing the development set, it is irrelevant that no additional training is performed in this particular submission, since it reuses the previously published and trained model. Presenting an average accuracy evaluation (not per database like Figure 10), would strongly benefit comparison.

While the experiment on the LC-QuAD 1.0 dataset is supposed to show the generalizability of the approach, unfortunately this might not be the case. First, the SPARQL queries are translated into pseudo-SQL and then the KG is limited to the "portion of the ontology explicitly involved in each single query". The baseline systems are mentioned in Section 6.4. but not even named where the actual comparison takes place, let alone described (not even in the Related Work). Hence, the choice is also not justified. Since the dataset utilized for the test across the three approaches is a different size, the numbers are not directly comparable. Why was LC-QuAD 1.0 chosen over 2.0 published in 2019 and why not DBQNA?

Presentation and Style of Writing:
Figure 1 is rather confusing and hard to read. The query workload and the relational databases both serve as a basis for the KG enrichment/generation, however, the queries and the relational database have no interconnection at all. The NL query is only depicted as enriching the KG, but has no direct connection to ValueNet4SPARQL - only via SemQL, which relies on the relational database.

Certain key terms that are not self-evident and might be understood quite differently depending on background and specialization are not defined. For instance, workload-based analysis and query workload might benefit from early specification. In fact, I am not entirely sure why the approach is called a workload-based analysis.

The paper could benefit from an improved structure and line of argumentation. A lot of basic details on Semantic Web and SQL are introduced in the middle of the paper's technical details, which makes it even harder to clearly discern them. A clear preliminaries section that separates the proposed approach from preliminaries could be an option.

Questions/comments to authors:
- Direct Mapping is introduced twice, once in Section 2.2. and once in Section 3.3.
- The presentation of the difference between Ontop, MPBoot, and Ontop Direct Mapping is very fuzzy up until page 8. Maybe it's just me but I read 4.1. and 4.2. three times only to find a partial answer finally in 4.3 - on the distinction between Ontop and MPBoot I am still not sure. How about already introducing ODM at the beginning of Section 4 where Ontop is first mentioned and changing the first sentence of Section 4.3. ("As we have seen, Ontop can be used.." - made me think everything in Section 4.2. was on Ontop/ODM). I'd also suggest improving the introduction of MPBoot ("exploits the flexibility provided by R2RML") and ODM ("has the advantage of providing the flexibility to adapt the generation process to specific needs").
- Please make your source code available on a platform dedicated to that purpose, e.g. GitHub, instead of in a Google Drive folder.

Minor comments:
p. 2, 20 adaption => adaptation
p. 4, 24 lcquad and spider => spelling
p. 3 Figure 2: black and white markings and a figure with a reasonable font size are generally recommended
p. 6, 11 R2RML => best to introduce acronyms when first mentioning them

Review #2
By Isaiah Onando Mulang' submitted on 05/Apr/2023
Minor Revision
Review Comment:

(1) Originality
Rating: 1/4
Even though the paper addresses a long-standing yet unsolved task in NL-KG communication in the form of SPARQL query generation, I found that the approach used to address the problem generally consisted of reusing other solutions. As such, the novelty is not that strong. The novelty of the paper lies mainly in the engineering choices in the preprocessing and post-processing stages of the actual solutions.

On the other hand, due to an improper problem statement and motivation (literally even the Research Questions), the solutions discussed in the paper are not solving the problem of NL-to-SPARQL.
The authors need to reassess the problem they are solving and redesign their research questions.
The actual method for NL-to-SPARQL is a reused approach and not novel.

(2) Significance of the results
The evaluation and results presented in the paper are very impressive.
Particularly, I found the generalisation results on LC-QuAD vital and necessary for the approach.
The authors make the enhanced dataset available online, which will help improve research on the task.

(3) Quality of writing
Rating: 2/4
The paper is well written and organized with proper scientific language and easy to follow. I did not find any challenge with the flow and ease of understanding in the paper as well as the organizations of the different sections.

I found a challenge with the motivation of the problem. The authors give quite a historical perspective of the problem and indicate that it is challenging, but do not dive into details on how it is challenging. I would prefer that the paper detail the exact challenge of the problem and why it is difficult to solve (i.e., NL to SPARQL). Perhaps a motivating example could help bring this forward.

These challenges mentioned in the abstract may need rework and restructuring:
"...However, translating NL questions into SPARQL and executing the resulting queries on top of a KG-based access layer is impractical for two reasons: (i) automatically generating correct SPARQL queries from NL is difficult as training data is typically scarce and (ii) executing the resulting queries through a simplistic KG layer automatically derived from an underlying relational schema yields poor results..."

We may not claim the task to be impractical, as there are already existing and acceptable approaches in the body of knowledge. Likewise, we cannot say that execution of the resulting queries yields poor results, as this is still an open research question; what if tomorrow we have good results?

In this respect, I am confused about the actual problem in the paper, i.e.:
a) Is the paper solving the problem of lack of data in the domain (hence producing an approach to overcome this)?
b) Or the problem of making the produced SPARQL queries retrieve better results?
c) Or the task of end-to-end NL-to-SPARQL translation?

I found a) and b) handled in the paper, but not c), while the authors claim they are solving c).

Resources for Reproduction
I did not find any challenges with the shared resources.
The authors could do with better documentation on the GitHub page.

Review #3
Anonymous submitted on 08/Apr/2023
Review Comment:

(1) Originality
The paper proposes ValueNet4SPARQL - a modification of the ValueNet model that was trained on the Spider dataset for NL (natural language) to SQL translation via an intermediary formal language, SemQL. The two intrinsic challenges in NL-to-SPARQL are: (i) improving the accuracy of NL-to-SemQL translation and (ii) converting the intermediate SemQL to SPARQL. The former is the harder problem to solve, while the latter is an engineering challenge. After carefully studying this paper, I found that the authors have solely focused on the latter. More specifically, in Section 2.3, the authors claim that they are addressing the preprocessing stage of mapping NL query tokens to corresponding KG (Knowledge Graph) classes and properties, but I failed to find where and how that is being addressed. In essence, with SemQL-to-SPARQL being the focus of the paper, the specific algorithms proposed handle joins, aggregate queries, and set operations. However, I again failed to understand why that should be considered something that is research-wise challenging and what research insights we get out of it (I am specifically referring to Algorithm 2 and Algorithm 3 in Sections 5.2.1 and 5.2.2). Also, in Algorithm 2 (line 35 of page 12), I missed how the real problems of synonymy, hypernymy, meronymy, and homonymy are being resolved.

(2) Significance of the results
If we look at this paper only within the scope of the engineering solution, then the results (in terms of accuracy) over different (enriched) knowledge graphs are comparable to the base ValueNet model that was designed for NL-to-SQL. In other words, the authors do an impressive job translating SemQL to SPARQL w.r.t. the knowledge graphs (ref. Fig. 10). Also, it has been established that enriching the KGs is effective, since it boosts the accuracy of ValueNet4SPARQL significantly and makes it comparable to ValueNet.

(3) Quality of writing
Overall the writing was easy to follow. However, I would have liked to have a more detailed description of the base model (i.e., ValueNet) and the introduction of the main contribution (i.e., SemQL-to-SPARQL) right in the beginning. It took me a while to understand that the contribution does not involve NL-to-SemQL in any way. However, it can be just me.

Please also assess the data file provided by the authors under “Long-term stable URL for resources”.

In particular, assess
(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data - Yes

(B) whether the provided resources appear to be complete for replication of experiments, and if not, why – It looks complete, although I did not get the time to replicate the experiments myself.

(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and – NA

(D) whether the provided data artifacts are complete. - Yes