Review Comment:
The paper presents an approach, based on theory in Computational Linguistics, for semi-automatic conversion of natural language queries to formalised (semantic) queries. The aim is to improve interaction between humans and computers in question answering, especially for non-SW experts.
The paper aims to address one of the topics in the special issue - NLI, with a focus on the challenges non-SW experts face in querying different types of data. I am not convinced, however, that this is done. Quite detailed theory is presented, at a fairly high level, on CL methods for parsing natural language, but the authors never really explain what it is they do, or if the approach described is implemented or evaluated. It is therefore impossible to tell if they are actually able to support their target users. Further, considering the aims of the special issue - to support wider adoption of SW technology, especially by non-experts, I would expect some interaction with these target end users, from eliciting and validating requirements, to evaluation. Ideally, this should be done with HCI or UI experts, or at least, some evidence should be provided that indicates the application of user-centred design. However, beyond stating the well-known challenges non-experts face in using SW technology, no attempt appears to have been made to determine if the approach does result in the generation of UIs that support the target user. In fact, the paper only deals with the (underlying) technology for parsing input, NOT with the UI at all. The title and intro are therefore misleading and the paper is probably targeted at an unsuitable venue.
This is not to say that the approach may not be valid, but that the authors have not proved so. The paper is mostly a (fairly dense) discussion of CL approaches to parsing natural language. It only obliquely addresses requirements of the non-SW or non-CL domain expert end user. If the aim is to focus only on the CL contribution, then I would expect the paper to report novel methods in CL to tackle this challenge. However, this is not the case - the key contribution of the paper is therefore difficult to identify.
S4 contains no discussion, and the conclusion is simply a brief rehash of the authors' initial aims, with no real evidence to support what is said. Finally, for this kind of journal paper, at least some sort of evaluation should have been carried out - this would have prompted the discussion expected to conclude the paper and validated any conclusions drawn.
This paper is very difficult to follow, for three main reasons:
- the presentation of related work, theory, and methodology is at a fairly high level and, for what appears to be a systems/applications paper, lacks a clear mapping to actual implementation or design;
- there is no clear, consistent "story" that helps the reader make connections between the different sections. A number of unrelated examples are used across the paper, and even within single sections only part of an example may be given, presented inconsistently (e.g., Table 1 vs. the text in S3.2.1);
- fairly complex language is used where simple words or terms would suffice, making it difficult to understand what is being said. As a result, some sections simply do not make sense.
S2 & 3 are confounded by the use of multiple, unrelated examples. S2 is particularly difficult to follow, especially given the inconsistent and mostly unrelated examples used to illustrate the combinations. It is also unclear whether this is a proposal or a description of an actual implementation that could be tested and evaluated with real users.
Why is NooJ the tool of choice for implementation? What does it provide over other similar systems? This is especially important, as the authors describe it as a "complex NLP environment".
Further, the examples in S3.2 are in Italian - considering the target audience can only be expected to understand English, these need translation. This is especially important since the examples in the table overlap only partially with those in the text. I would suggest the authors start with one clear set of examples, and use these throughout to illustrate the different aspects of parsing and analysis they carry out. This would make it easier to understand the authors' approach and assess what it provides over other existing work.
Section 3 "Experiment and Results" opens ... "Starting from this NLP theoretical and practical framework, in this project we propose to build an User Interface for KMS ..." - this is contradictory (see also above): if this is a proposal, then it cannot be reported as results. Further, no actual information is provided about the UI.
************** OTHER POINTS
The authors claim to support keyboard-based or voice input - however, how either affects the formalisation of queries, or how whatever differences exist are supported, is never actually addressed.
"Nowadays, humans usually make efforts in “translating” that query into proper keywords, or even into non-acceptable1 sequences of nouns and/or adjective which they never would use in ordinary communication. ..." - examples that illustrate what is meant by "proper keywords" and other ways in which end users formulate queries would be useful. Mapping these to the examples for automatic parsing later in the paper would help the reader to assess what the added value is of this approach.
What is the API used to provide the "ideal solution" to the FST/FSA approach?
"Anyway, our approach is founded on a not statistically-based linguistic formalization which ensures a low degree of ambiguity, a low loss of meaning and an accurate matching between linguistics structures, domain concepts and programming language." - this says what the approach is NOT; however, it is still not clear what the authors actually DO.
The distinction between "computerized" and "electronic" dictionaries should be made when they are first mentioned. The content in footnote 9 probably also belongs there.
Further, saying that "All electronic dictionaries built according to LG descriptive method form the DELA System, ..." is debatable.
S3.1 is, simply, difficult to understand. Wrt the domain modelling - saying the "[OO] semantic model and its terminology are compatible with ... RDF" is redundant - that is the whole point of using ontologies. Further, it is unclear what is being said in the sentence "Actually, this ontology was already available and is constantly developed." - is this work contributing to CIDOC? The discussion about the different levels of ontologies is contradictory, and the conclusion that follows is not obviously related to the rest of the section.
S3.4 - "deletion and reduction, which are present in sentence pairs/triples as: ..." - no examples of triples are given.
********
CITATIONS & REFERENCES
In the intro - "The NLP activities sketched in this research fall inside Lexicon-Grammar (LG) theoretical and practical framework, ..." - this needs a citation.
FIGURES
Fig 2 is a simple linear diagram - it does not need to be represented in what at first glance appears to be a complex flow.
Fig 6 is split across two pages - the two parts should appear on a single page; as is, the reader must do extra work to relate them. Further, what ontology is being referred to in the E29 "Design or Procedure" class? And for "Production" in S3.4, which appears to be related, it is not obvious how the examples in Figs 6 & 7 map to those terms (classes).
LANGUAGE & PRESENTATION
Too much information is placed in footnotes - they should be used only to provide additional information, or, for example, to provide links to more info.
"This aim seems easy to obtain, but the first trial, not yet surmounted, is to digit a query ..." - "digit" here is incorrect. Is this meant to be "digitise"? Even that, while not incorrect, is unusual. I suspect what is meant is the conversion to electronic form?
"1.1. Background
For several years, we will see that similar projects ..." - future tense is incorrect here - the section is presenting existing work.
ALL acronyms and abbreviations should be expanded at first use, in the main text - this is done only in some instances.
A large number of grammatical errors need correcting. A proof-read may help to improve readability.