Review Comment:
This paper surveys Multilingual Question Answering Systems for Knowledge Graphs.
The current version was revised according to the comments of the reviewers.
Many comments were addressed. Relevant information that was previously missing was added, including some recent works, but the paper was not restructured.
On the suggested dimensions for the review, the paper:
(1) Is suitable as an introductory text, even if the structure is not the most reader-friendly;
(2) Is comprehensive and has a balanced coverage, even if the structure is not balanced (Sec 4 uses half of the paper);
(3) Is well-written but, again, would highly benefit from a reorganisation;
(4) Covers material that is important for, and not limited to, the Semantic Web community.
Before some comments on the structure, I note that the (now) 12 papers that were considered despite being excluded by the selection procedure are still not identified. This makes me wonder whether the adopted procedure is reproducible and actually suitable for the task. If it were, would it leave out so many relevant papers?
But most of my comments concern neither the relevance of the survey nor the quality of the covered content. They concern form and organisation, towards better highlighting the contributions according to what I would expect from a survey. In my opinion, the paper was already too long and verbose. Now it is even longer, without necessarily needing to be.
Most of Sec 4.1 is a mere enumeration of systems and their descriptions, almost independently of each other, much like a compilation of paper abstracts.
Table 4 helps, but the authors should take fuller advantage of it. I strongly suggest structuring the section around the columns of this table from the beginning. This would highlight the actual analysis and make the presentation more systematic, thus making it easier to compare systems and identify trends.
The taxonomy in Sec 4.2 would be better supported if the surveyed systems were classified according to it. Again, this would highlight the contribution and make it possible to identify the most common groups (possibly by time period) or overlaps between groups, among others. Figure 1 uses a full page but does not add much.
The following sections suffer from similar problems.
Sec 4.3 seems detached from the preceding sections, with no links to the surveyed systems.
The problem of Sec 5 is similar to the one of Sec 4.1: datasets are described almost independently of each other.
Sec 6 would be more useful if it linked back to the surveyed systems and benchmarks from which the challenges arise.
Sec 7 is a brief summary of the approach, but it offers no actual conclusions and does not summarise the takeaways (even though some already appear in the Discussion).
Minor issues:
- p2: , another example -> . Another example
- p3: therefore utilize the latter one ?
- p13: "results are as follows: 39.29%, 33.02%, 23.74%, and 24.56%" -> what metric?
- p16: used in the NLP -> use in NLP
- Sec 3 has no introductory text.
- The sentences of the last paragraph of Section 4 seem incomplete.
- Secs 5.1 and 5.2: some confusion between benchmarks and series of benchmarks.
- p25: question number?