Review Comment:
The paper describes an approach for question answering over RDF knowledge bases using SPARQL templates. For each supported question type, a template is defined whose variables are filled in with information retrieved from the question using both syntactic and semantic features.
Although the motivation is well described, the emphasis in this section is mainly placed on comparing the proposed framework to other relevant ones, which should be part of the related work section. I would suggest that the authors revise this section, describing the challenges and motivation, also giving a few details on what RDF cubes are (why are RDF cubes important? What problems do they solve? etc.), what approach is followed in this paper, and presenting the contribution of the paper more clearly. A short comparison to the state of the art (shortcomings, different approaches, etc.) could be included in this section, but there is no need for it to be so detailed; more details can be provided in the related work section.
More details are needed here. For example: “which is based on the weighted percentage of Q that is covered by tagged chunks”, “Chunks have weight based on the apriori probability of being tagged and their length”. The authors should formalize these notions so that they can be better understood. Moreover, regarding “the results of the matching…”: what is the logic behind the matching, and how is it performed? Are semantics taken into account?
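To illustrate the kind of formalization I have in mind (the notation below is my own reconstruction, not something taken from the paper), the coverage score could be stated along the lines of:

```latex
w(c) = P_{\mathrm{tag}}(c) \cdot |c|,
\qquad
\mathrm{cov}(Q) =
  \frac{\sum_{c \,\in\, \mathrm{tagged}(Q)} w(c)}
       {\sum_{c \,\in\, \mathrm{chunks}(Q)} w(c)}
```

where $P_{\mathrm{tag}}(c)$ is the a priori probability of chunk $c$ being tagged and $|c|$ its length. Even if the authors' actual definitions differ, making them explicit in some such form would remove the ambiguity.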
As the authors also explain, the system can answer only questions of specific types with certain structural characteristics. It would be very useful to summarize the structural requirements that the framework imposes on questions, e.g. in a table, and also to give examples of questions that are not supported. For example, each pattern in Fig. 3 could be enriched with some example textual questions, as well as with examples of questions that are not supported.
Overall, the framework seems to impose some quite hard restrictions both on the types of questions it supports (e.g. tokens must appear in a specific order) and on the way information is captured in the KB (e.g. each dataset should have a default measure). This hampers flexibility and also makes it difficult to use the framework on top of existing KBs that do not follow the RDF cubes model. The authors should elaborate further on these points and describe how this inflexibility can be overcome.
Also, I believe that Section 3 should be revised in order to better formalize the algorithms and metrics used. Although a verbal description of a methodology is sometimes easier to understand, important details are suppressed here, which makes it difficult to obtain a clear view of the proposed approach.
A worked example is also missing; one would further help the reader understand the approach.
As mentioned earlier, the framework requires a preprocessing phase in order to transform a KB into cubes. Although the authors do not explicitly mention this, it seems that the graphical tool developed for evaluating the framework runs over a preprocessed snapshot of QALD-6. If this is the case, more details are needed on the effort required to transform a dataset into cubes and on any restrictions that apply. For example: can any dataset be transformed into cubes? Is there a generic tool that can be used to achieve this, or do we need to develop a separate tool from scratch for each dataset?
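To give a sense of what such a transformation minimally involves (the code below is my own toy sketch, not a tool from the paper; URIs and the tabular input format are invented for illustration), one could emit one qb:Observation per row of tabular data — though a real tool would also have to generate the data structure definition, type the values, and handle many corner cases:

```python
def to_qb_observations(dataset_uri, rows):
    """Emit RDF Data Cube observations in Turtle for tabular rows.

    rows: list of dicts mapping dimension/measure property URIs to
    already-serialized Turtle values. Toy sketch only: a real converter
    must also produce the qb:DataStructureDefinition, datatypes, etc.
    """
    lines = ["@prefix qb: <http://purl.org/linked-data/cube#> ."]
    for i, row in enumerate(rows):
        obs = f"<{dataset_uri}/obs/{i}>"
        # Each row becomes one observation attached to the dataset.
        lines.append(f"{obs} a qb:Observation ; qb:dataSet <{dataset_uri}> .")
        for prop, value in row.items():
            lines.append(f"{obs} <{prop}> {value} .")
    return "\n".join(lines)

# Hypothetical example input: one row with one dimension and one measure.
turtle = to_qb_observations(
    "http://example.org/ds1",
    [{"http://example.org/dim/year": '"2016"',
      "http://example.org/measure/population": "10000"}],
)
```

Even a sketch like this suggests the per-dataset modeling effort is non-trivial, which is why the question of a generic tool matters.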
Also, the experiments seem to test the ability of the system to find the correct dataset, but not the correct answer. If this is the case, the experimental evaluation seems relevant only to QA systems that also follow the RDF cubes model. So, here I have two questions:
1. Is it possible to compare the system to other general-purpose QA systems that do not follow the RDF cube model (for example, the systems that participate in the QALD challenges)?
2. Why was it not possible to measure precision and recall on the actual results returned by the SPARQL queries, instead of computing them at the level of datasets?
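For reference, answer-level precision and recall per question (as used in the QALD evaluations) are straightforward to compute once gold answer sets are available; a minimal sketch (the function name and structure are mine, purely illustrative):

```python
def precision_recall(system_answers, gold_answers):
    """Answer-level precision/recall for a single question.

    system_answers: answers returned by the generated SPARQL query.
    gold_answers:   gold-standard answers for the question.
    """
    system_answers, gold_answers = set(system_answers), set(gold_answers)
    correct = len(system_answers & gold_answers)
    precision = correct / len(system_answers) if system_answers else 0.0
    recall = correct / len(gold_answers) if gold_answers else 0.0
    return precision, recall

# Example: two of the three returned answers are correct.
p, r = precision_recall({"a", "b", "c"}, {"a", "b", "d", "e"})
# p = 2/3, r = 0.5
```

If the obstacle is that gold answers exist only at the dataset level for the chosen benchmark, this should be stated explicitly.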
Also, the term “statistical question answering” used in the title seems a little misleading. From what I have understood, there is no statistical reasoning involved in the framework, at least not a “heavy” one that would justify classifying the approach as statistical. A title containing “RDF cubes” or similar seems more meaningful to me.