Review Comment:
The work proposes a Knowledge Graph Question Answering (KGQA) system whose distinguishing feature is that it avoids translating natural language questions into formal KG query languages such as SPARQL: the answer is instead extracted and verbalized from the KG properties. This has two key benefits with respect to the majority of state-of-the-art KGQA frameworks:
1) the system does not need training on the specific structure of the particular KG(s);
2) multiple KGs as well as unstructured texts can be combined.
Evidence (in the form of phrases supporting the answer) and a confidence value are emitted together with each answer.
A system prototype has been implemented; code and data are provided as an accompanying resource. Experimental evaluation has been carried out for all the main aspects of the proposed approach: the results show that the proposed system comes very close to state-of-the-art systems in terms of performance, while being easier to set up and more flexible.
The work appears scientifically sound and fairly novel. Individual components often leverage well-known techniques, but the overall system architecture and behavior are distinctive enough with respect to existing proposals.
Results are significant because they demonstrate the feasibility and effectiveness of a different type of KGQA approach from the majority of existing proposals, and at the same time they open up some questions for further investigation.
The manuscript is well structured and the proposed methods are described in adequate technical detail.
Some aspects of the manuscript can be improved, as explained in what follows.
The README in the attached data file explains that the name MuHeQA stands for Multiple and Heterogeneous Question Answering. This explanation is missing from the manuscript: it would be appropriate to add it at the first mention of MuHeQA in the introduction (page 2, line 23).
At the end of the introduction, a short summary of the subsequent article sections would be useful.
In the final example of Section 3.1.1, some part-of-speech (PoS) tags are used without definition, such as VBN and RP. For a clearer and more complete description of the algorithm, it could be useful to list and explain all PoS tags used by the adopted algorithm.
Most importantly, it should be clarified whether the PoS tagging algorithm is new or taken from the literature/software libraries; in the latter case, the source should be referenced. (Looking at the 'requirements.txt' file in the attached zip data file, I would presume the NLTK Python library was used for PoS tagging, but this should be stated in the manuscript; it would also help clarify the meaning of the PoS tags.)
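If NLTK is indeed the library in use, the tags in question are the standard Penn Treebank tags, which would already resolve the ambiguity. A minimal sketch of how such tags would be produced follows (the sentence, the downloaded resources and the exact output are illustrative assumptions on my part, not taken from the manuscript or its code):

    import nltk

    # Resource names vary across NLTK versions (recent releases use
    # 'punkt_tab' and 'averaged_perceptron_tagger_eng'); adjust as needed.
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    tokens = nltk.word_tokenize("The compound was taken up by the patients.")
    print(nltk.pos_tag(tokens))
    # Returns Penn Treebank tags, e.g. ('taken', 'VBN') and ('up', 'RP'),
    # i.e. the tags appearing in the example of Section 3.1.1.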
Finally, in Figure 2 it does not appear correct that a branch can skip the 'Group 2' items: in that case, a keyword could be obtained from just one or two JJ/CC items, which does not match what is described in Section 3.1.1 (and the regular expression represented by the state machine would also trivially match the empty string).
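As a purely illustrative aside (the pattern below is a made-up tag grammar, not the one of Figure 2), any pattern in which every group can be skipped accepts the empty string, and one or two JJ/CC tags alone already constitute a full match:

    import re

    # Hypothetical pattern with all groups optional -- NOT the actual grammar of Figure 2.
    pattern = re.compile(r"((JJ|CC)\s?)*((NN|NNS)\s?)*")
    print(bool(pattern.fullmatch("")))        # True: the empty string is trivially accepted
    print(bool(pattern.fullmatch("JJ CC")))   # True: one or two JJ/CC items alone match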
Why is a 'LIMIT 250' clause present in the SPARQL queries shown in Figure 3? Does the provided implementation extract only 250 properties from Wikidata and DBpedia?
In the semantic similarity example in Section 3.1.2, it is not clear how the outcomes of the property-level and description-level similarity are obtained; more details should be provided.
In Section 3.3, the question 'How many active ingredients does paracetamol have?' does not make sense, as paracetamol itself is an active ingredient, and the reported answer lists four commercial names for paracetamol, not active ingredients. It is not clear whether the example was formulated this way by mistake or on purpose (in the latter case, the purpose should be explained).
In Section 4.2, the adopted metric is not completely clear. In particular, do "for each answer" and "of all answers" (page 10, lines 23-24) actually mean "for each question" and "of all questions", respectively?
Eight references out of 41 are taken from non-peer-reviewed sources such as arXiv. Whenever possible, check whether these works have since been published in a peer-reviewed venue and update the reference accordingly, or decide whether the reference should be kept at all. For example, reference [4] was published as a poster paper at ICLR 2021.
Language and style are generally good. Minor issues:
- Missing punctuation or extra spaces in some places.
- In journal articles the usage of the first person ("we") and contracted forms ("let's") is usually discouraged.
- Page 1, line 39: "Knowledge graph Question Answering (KGQA)" -> "Knowledge Graph Question Answering (KGQA)"
- Page 1, line 42: it would be better to move footnote 2 and "for property graphs" from page 2, line 1 to this point, where Cypher is first mentioned.
- Page 2, line 32: "i.e" -> "i.e."
- Page 7, line 24: "we develop a basic solution" -> "a basic solution has been developed"
- Page 8, line 11: "i.e" -> "i.e."
- Page 9, line 1: "fined-tuned" -> "fine-tuned"
- Page 9, line 29: "aka." -> "a.k.a."
- Page 9, line 22: "question-answer interface" -> "question-answering interface"
- Page 9, line 31: "what means that the structure of the answers are dictated" -> "which means that the structure of the answers is dictated"
- Page 10, lines 17 and 18: "considers valid" -> "considers as valid"
- Page 10, line 20: "selects the three most relevant" -> "selects the three most relevant answers"
- Page 10, line 30: “of a valid answer(s)” -> “of valid answers”
- Page 10, line 40: “(.e.g.” -> “(e.g.”
- Page 13, line 21: “STaF-QA” -> “STaG-QA”
- Page 13, line 30: “highlights” -> “highlight”
DATA FILE
The data file is published on GitHub. It contains a README file with instructions, the source code, data and queries to reproduce the experiments reported in the manuscript. The contents seem complete and sufficient for reproducibility of results.