Review Comment:
This paper presents a new system for question answering over linked data, which focuses on ease of adaptation to new languages and new datasets. This is an admirable goal; however, my reservation is that the approach does not seem to be greatly novel. My understanding is that the approach involves an over-generation of queries based on known lexicalizations of properties (e.g., rdfs:label) in the datasets, followed by a 5-feature ranking procedure. This seems like a good generic procedure; however, the choice not to use syntactic tools such as parsers seems to affect performance notably.
Moreover, the authors propose this as an approach that is easily adapted to new languages because it does not rely on a syntactic parser. However, this does not necessarily make the approach more adaptable to new languages: as the authors themselves note, performance on even major languages such as Italian quickly drops due to the lack of labels for terms. In addition, the lack of syntactic analysis may make the system perform much worse on tricky questions (for example, those involving negation), as shown in Section 5.2.1. (This was more clearly presented in the first version of this paper.)
The evaluation is based mostly on recent QALD benchmarks, and the results are quite mixed, even as presented. In fact, for some benchmarks there is quite a difference between the existing state of the art (F=0.72) and the proposed system (F=0.52), underlining the gap with approaches that use more linguistic analysis. I was surprised that some published results on some benchmarks are omitted. For example, QALD-7 reports systems with F=0.75, three times the value reported here, so it is unclear why they are not included in Table 3. It would be best if the authors tried to include some of these existing systems in their benchmark.
(p5) "weights were determined manually": could you expand on this and perhaps provide a more principled reason for the selection of weights?
(p7) On HDT versus traditional databases: I wonder if this could be quantified, e.g., by implementing a similar search using an SPO-style triple store.
The paper is very well written in terms of language and quite clear; there were only a few minor issues:
p5. "examplary" => "example"
Tables should remain inside the margins. In particular, Table 3 should use the full page width and not flow onto the next page. In Table 6 the text crosses over the column line.
"reefication" => "reification"