Influence of event representation patterns on some aspects of natural language querying interfaces to ontologies

Tracking #: 3068-4282

Authors: 
Rita Butkienė
Algirdas Šukys
Linas Ablonskis
Rimantas Butleris

Responsible editor: 
Elena Demidova

Submission type: 
Full Paper
Abstract: 
This paper investigates how the application of alternative event representation patterns affects the complexity and performance-related aspects of a natural language querying interface: the size of the ontology schema and vocabulary, the performance of querying and data import operations, the size of the semantic repository, and query complexity. The results are based on both experimental and analytical investigation in the context of a custom natural language querying system that works on OWL 2 ontologies and employs SBVR vocabularies to bridge lexical and semantic gaps. The workings of the system used for our investigation are typical of a class of similar natural language querying interfaces; thus, the results are applicable to all natural language querying systems with a similar design.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Sherzod Hakimov submitted on 18/Mar/2022
Suggestion:
Reject
Review Comment:

The paper presents an evaluation of ontology schema variants built using the Semantics of Business Vocabulary and Rules (SBVR) vocabulary. The focus of the paper is on analyzing the effect of each artifact corresponding to three ontology variants in terms of query time and data insertion time. The presented method is intended to be applied to question answering systems on event-centric data.

------------------------

Motivation: The paper introduces the question answering field, describing how natural language questions are transformed into executable queries using vocabularies or query patterns. Section 1 proposes using the SBVR vocabulary to create different variants and evaluating their performance in terms of query execution and data insertion times, but it is not clear from this section what the main motivation of the paper is. The paper also lacks an explicit statement of its contributions.

Presented Idea: The main idea of the paper is to use the SBVR vocabulary to translate natural language questions into SPARQL queries. Question answering is used to answer questions about certain aspects of events, such as “what did the person talk about”, “what did the person confirm”, “what did the person emotionally confirm”, “what did the person talk about in a given year”, and “what did the person emotionally confirm in a given year”. These competency questions are explained in Section 4. The presented method is applied to querying a Lithuanian news portal, using the competency questions to find information.

Paper weaknesses: The presented paper has multiple flaws with regard to its novel contributions and the description of the presented idea.

The paper lacks novelty in showing the effect of ontology design on query performance. The paper includes a single knowledge graph and a handful of query types (materialized with the respective event-related entities). One cannot draw a conclusion from these findings. Section 5 also states: "The results of evaluating vocabulary size will not be directly applicable to different vocabulary implementations, however they allow an approximate assessment of how the size of the lexicon depends on the chosen schema." One would expect the paper to include such comparisons between different vocabulary implementations and to determine whether this condition really holds.

Overall: The novel contributions of the paper are limited, for the reasons given above. The manuscript has improved based on the reviewer comments, but it still falls short in identifying an important research question and providing possible solutions or an in-depth analysis of it.

Review #2
Anonymous submitted on 28/Mar/2022
Suggestion:
Major Revision
Review Comment:

The paper presents an evaluation of different ontology representation patterns and how they affect SPARQL query performance, data import operations, the size of a triple store, and the size of a query. The results of the evaluations can be helpful to many but are limited to the GraphDB graph database. I can see the contribution of this work to future research, but there are some inconsistencies in its overall presentation. Although the level of English is good, I found the content hard to follow at times. See below for more details.

The introduction could be more concise (i.e., present the main research motivation, the research questions, and your contribution). It is mentioned that you focus on different events, but later only the event of "talking" and its types are mentioned. It would be good to name the specific types of NLQI systems your work focused on.

In the related work section, it is sometimes unclear whether you review someone else's work or your own previously published work. It should be made clearer what you present from already published work and what you present as a new idea in this manuscript, for example, whether the currently proposed work is an extension of an existing solution. Several references are missing in this section, namely for OWL, OWL2, and OMG. Further, these abbreviations should be introduced in the introduction and used consistently throughout the paper.

The beginning of Section 2.2 seems more suitable for the introduction; it would be good to integrate it there. When reviewing the SEM ontology, it should be explained what "minimum semantic commitment" means. The names of the reviewed ontologies, when abbreviated, should first be spelled out in full; see LODE and its full name in the references. In Section 2.3, what are the pros and cons that have been identified?
In some places where "both ontologies" is used, it is not clear which two ontologies are meant. It would be good to see the actual namespaces of the defined concepts.

In Section 3.1, are the mentioned questions the actual competency questions that were referred to in the introduction of Section 3?

In Section 3.1, first sentence, what fragment of which ontology was reused?

A reference for GraphDB is missing in Section 3.3. Further, it was not clear which version of GraphDB was used: the free or the licensed one? Each supports different functionality. Was it a local instance or one running on a server?

It would be good to present the recommendations more clearly, perhaps even in a separate section or in a table that can be referred to.

Section 5 has a paragraph with only one sentence.

There is a dot after each reference number, which should not be there. Were references [12], [23], [28] all accessed in 2020 (almost 2 years ago)?

The provided GitHub repository is well structured and has a good description.

Overall, the proposed paper is interesting but needs improvements if it is to be published.

Review #3
Anonymous submitted on 08/May/2022
Suggestion:
Reject
Review Comment:

The article aims to investigate how ontology design patterns for event representation impact the performance of natural language (NL) query interfaces. Given a complex event structure, we can expect that different event modelling patterns with various ontologies, such as those reviewed in Section 2.2, will impact the performance of semantic question answering systems that translate NL-queries into SPARQL.

However, the article lacks a novel contribution. The evaluation metrics do not directly measure the effect of the event representation on the performance of the NL-to-SPARQL query mapping, which is the central aspect to investigate in this context. Instead, proxy metrics such as SPARQL execution time are used. However, these proxy metrics refer to SPARQL queries rather than NL queries.

Furthermore, the authors evaluate different event-modelling design decisions in a specific semantic search framework (SBVR). The discussion is tightly connected to, and limited by, the SBVR framework, and the experiments are limited to a few particular event types. It is unclear how far the discussion of performance (e.g., SPARQL query execution time) generalizes to other systems and event types.