Sparklis: An Expressive Query Builder for SPARQL Endpoints with Guidance in Natural Language

Tracking #: 987-2198

Authors: 
Sébastien Ferré

Responsible editor: 
Eero Hyvönen

Submission type: 
Tool/System Report
Abstract: 
Sparklis is a Semantic Web tool that helps users explore and query SPARQL endpoints by guiding them in the interactive building of questions and answers, from simple ones to complex ones. It combines the fine-grained guidance of faceted search, most of the expressivity of SPARQL, and the readability of (controlled) natural languages. No knowledge of the vocabulary and schema is required from users. Many SPARQL features are covered: multidimensional queries, union, negation, optional, filters, aggregations, ordering. Queries are verbalized in either English or French, so that no knowledge of SPARQL is ever necessary. All of this is implemented in a portable Web application, Sparklis, and has been evaluated on many endpoints and questions. No endpoint-specific configuration is necessary, as the data schema is discovered on the fly by the tool. Online since April 2014, the tool has been used by hundreds of users to form thousands of queries over hundreds of endpoints.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Norbert E. Fuchs submitted on 16/Apr/2015
Suggestion:
Accept
Review Comment:

"SPARKLIS: An Expressive Query Builder for SPARQL Endpoints with Guidance in Natural Language" by Sébastien Ferré

Summary of the Contents
-----------------------------
The contents of this paper are best described as a concrete answer to the question “How to explore and query a large unknown [SPARQL] endpoint beyond the most simple queries without reading or writing any SPARQL, and without preprocessing or configuration?” that the author formulates at the end of his paper.

The goal of the author is to access SPARQL endpoints in a manner that combines user guidance and expressivity. Expressivity is provided by a large and powerful subset of SPARQL rendered as phrases of a controlled natural language. User guidance results from the combination of the controlled natural language – the paper describes only the English version, but also mentions a French version – and predictive editing, which allows users not familiar with SPARQL to interactively construct a query by selecting predefined constituents. In this way each query is syntactically correct and – importantly – always generates an answer. Answers are given at each step of the query construction in tabular form using a subset of the controlled natural language.
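To make this safeness property concrete: because suggestions are computed from the data itself, every selectable constituent is guaranteed to match something. A minimal sketch of such a suggestion query over a DBpedia-like endpoint – my own illustration, not necessarily the internal mechanism of SPARKLIS – could look as follows:

PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
# Given the partial query "every film whose director is Tim Burton",
# the properties suggested at the focus ?film are exactly those that
# occur in the data for the current matches, so selecting any of them
# keeps the answer set non-empty.
SELECT DISTINCT ?property WHERE {
  ?film a dbo:Film ;
        dbo:director dbr:Tim_Burton ;
        ?property ?value .
}
LIMIT 200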

The paper describes in detail the architecture of the web-client SPARKLIS that incorporates the above functionality. Furthermore, the paper illustrates by selected screenshots and a table the step-by-step construction of a concrete query together with the respective intermediate answers. The example demonstrates, beyond the ease of query construction, also its flexibility: constituents can be added or inserted in any order and can be removed individually. In each case the answers are automatically adjusted. For those interested, the web-client also shows the query in SPARQL.

Subsequently, the author evaluates the capabilities and the limitations of SPARKLIS along the following dimensions: expressivity; user guidance, specifically with respect to the safeness and completeness of query answering; verbalisation in (controlled) natural language; scalability with respect to the size of the SPARQL endpoints and the response time; and finally portability. I briefly worked with the SPARKLIS web-client and found it immediately understandable and highly usable – though of course in the limited time I could not assess the scope of the SPARQL subset covered, the scalability, or the portability.

In the next section the author evaluates the impact of SPARKLIS, and comes to a very positive result concerning not only the number of users worldwide, but also the continuity of usage, the number of SPARQL endpoints accessed, and the complexity of queries – though shorter queries prevail.

The final section summarises the results and gives an outlook to further work.

Assessment
--------------
(1) In my view SPARKLIS is an important achievement, since it conveniently puts much of the power of SPARQL into the hands of people who may not be familiar with this language. The paper itself provides ample evidence of its positive impact on the respective user community. The implementation of SPARKLIS was done with great care, not only concerning its usability but also its portability.

(2) The paper is a pleasure to read. I found that all questions that I had – specifically with respect to the capabilities and limitations of SPARKLIS – were ultimately addressed in the paper itself. While the section on related work is already extensive, further references to the work of other researchers are found throughout the text. I am specifically impressed by subsection 5.3 "Readability: verbalization in natural language", which shows how thoughtfully the author handles the verbalisation of SPARQL expressions in order to provide guidance to the users. Some proposals for further work – extending the subset of SPARQL, increasing readability – are only natural, while – given the power of natural language – I fail to understand the author's intention to offer graphical visualisations of the results. Perhaps it would be more consistent with the goals of the author to add details in natural language to the tabular representation of the results.

I suggest accepting this paper without any changes.

Review #2
By Eetu Mäkelä submitted on 20/Apr/2015
Suggestion:
Major Revision
Review Comment:

The article is well written and describes an interesting tool. However, the majority of the contents have already been published elsewhere, and the section that is new (section 6 on evaluation and impact) is unfortunately, to my mind, the weakest of them.

Thus, while I actually think the work already deserves to be published, because the revisited sections are written much more clearly here than in their original formulations, I suggest a major revision to fully realize the new potential I see in the evaluation and impact section.

In that section, while a range of statistics is presented, they either do not actually shed light on the tool, or the analysis is too shallow. For example, various usage statistics are presented: the number of loads (is this the number of sessions or the number of page loads?), the number of endpoints used, as well as the number of distinct users and their geographic spread. All these statistics tell us that the tool is used by a large number of people – but nothing else concerning the tool itself. Then, the most popular endpoints, query sizes and session lengths are presented in separate tables and graphs. While these do say something about the tool, they are not analyzed very deeply in the text, and alternative formulations could say much more.

(As an aside, what does the session length graph actually measure? First you talk of session length in sessions, but then move to talking about session length in steps. Are you really talking about the number of query steps per session [without regard to how many actual queries this comprises, and how long they are]?)

As examples of what could be done, it would be interesting, for example, to cross-reference the number of users with the number of sessions to discover how many users are single-time users and how many are repeat users. Similarly, it would be interesting to know whether the power users use SPARKLIS mainly against a single endpoint, or as a general-purpose tool. Further, do the power users create more complex queries (both in terms of length and in the number of advanced features such as aggregates) than one-time queriers (suggesting that users organically learn to use the tool better)? Or, if not, is query complexity an external issue, correlating with the endpoint (thus intimating that the tool itself is truly general-purpose)? In general, how many queries contain filters, aggregates, unions or other complex algebra? At present, this interesting bit is referred to only in the text, while this, if anything, would in my mind warrant a more formal analysis by table or graph, as deeper cross-analysis would point to 1) how important it is for the tool to cover these features and 2) how easy such queries are to manifest using the tool.

Digging even deeper, would it be possible to discover error categories and their frequencies in the query logs? For example, in the example query session given, I can see confusion between constraining by class membership vs. property value. What about the frequency of backtracks (returns to a previous query position, indicating an error)? Does the occurrence of such errors differ between users or user groups?

Review #3
By Vanessa Lopez submitted on 08/Jul/2015
Suggestion:
Major Revision
Review Comment:

This paper presents the system SPARKLIS, which allows users to interactively build questions over a selected endpoint. The approach is very interesting and practical, as it combines auto-completion approaches and faceted search and takes them one step forward. SPARKLIS's goal is to allow users to benefit from the expressivity of SPARQL in a user-friendly way, while providing guidance and an overview over a dataset, and as such avoiding the habitability problem typical of NLIs.

For example, one of the advantages of this approach with respect to previous work is that users can choose any position in the query (instance, class, property or an operator) to be used as a focus to obtain more suggestions and further refine the query. The approach also covers SPARQL features such as union, negation, optional, filters, aggregations and ordering to build more complex queries (with superlatives, aggregations, etc.). The results are verbalised in natural language. To alleviate the user's burden of creating a query, the system presents suggestions in the form of meaningful phrases.
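To make this expressivity concrete, a superlative question such as “give me the 10 countries that have the highest population” combines a class, a property, ordering and a limit; a hypothetical SPARQL rendering over DBpedia vocabulary (my own example, not taken from the paper) would be:

PREFIX dbo: <http://dbpedia.org/ontology/>
# "give me the 10 countries that have the highest population"
SELECT ?country ?population WHERE {
  ?country a dbo:Country ;
           dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10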

The author points out that the approach covers SELECT and ASK queries, and that it does not cover nested queries and aggregations – could you add an example here and perhaps expand on why?
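To illustrate what I mean (my own sketch, assuming DBpedia vocabulary): a question like “what is the average number of films per director?” requires a nested aggregation, i.e., a subquery that first counts films per director before the outer query can average those counts:

PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (AVG(?nbFilms) AS ?avgFilms) WHERE {
  {
    # inner aggregation: number of films per director
    SELECT ?director (COUNT(?film) AS ?nbFilms) WHERE {
      ?film a dbo:Film ;
            dbo:director ?director .
    }
    GROUP BY ?director
  }
}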

Moreover, as the author points out, only a limited number of suggestions can be presented to the users for the approach to scale. I was missing a discussion in the paper of how these suggestions are selected, i.e., how do you rank which suggestions to present to the user?
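One plausible scheme – an assumption on my part, since the paper does not specify the ranking – would be to rank suggestions by their frequency in the data and truncate the list, along these lines:

# Rank candidate classes by their number of instances and keep only
# the most frequent ones as suggestions (hypothetical sketch).
SELECT ?class (COUNT(?x) AS ?freq) WHERE {
  ?x a ?class .
}
GROUP BY ?class
ORDER BY DESC(?freq)
LIMIT 100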

This paper presents a new evaluation based on user logs collected through the online demo. However, the system has already been evaluated in previous publications using the QALD corpus and a usability study. While you do not need to present all of those results again here, I was missing a discussion of the limitations of SPARKLIS based on the current and previous evaluations. For instance, how often were queries unreachable because of partial results? How did scalability affect usability? How do users distinguish between out-of-scope queries and queries that cannot be built because of a very large number of suggestions or because they did not use the right word (synonyms)? This is one of the main issues in faceted and guided interfaces, and it is not clear here how the author tackles it for large and open-domain endpoints such as DBpedia.

While I agree with the author that a query building interface avoids the need for complex NL understanding while reaching comparable expressivity, and that guidance avoids the ambiguity problem for small datasets, I am not convinced it totally avoids the problems of disambiguation, as the user still needs to figure out how the knowledge and vocabulary are modelled in large data sources. The author states that suggestions could also be based on learning or user preferences in future work, but is there any intelligent ranking currently in use for this? Is it limited to DBpedia categories, or do you also include the YAGO hierarchy? What about lexically related words?

The evaluation presented in this paper based on user logs shows some statistics on usage, a list of endpoints users selected, and a nice distribution of query sizes. For instance, in the example given, “show me a drug that has a title..”, it may have been non-obvious to the user that he has to look for the relation “has a title”. The paper states that untrained users need to first learn by trial and error to finally reach a user query (an example is given for a query of size 6 – 14 steps and less than 5 minutes). Besides just giving an example, could you measure this? It would be a nice contribution of this paper (with respect to previous ones) to measure the average times and numbers of failed attempts according to query size, or the learning curve (if not for all, at least for some of the latest logs, for example on DBpedia). An extended evaluation and discussion of real user logs (with respect to benchmarks) could be very interesting. Also, it would be nice if the questions asked by users to the different endpoints and used for this evaluation could be published somewhere online.

Moreover, I would have liked to see an extended discussion on the coverage of the types of queries this system could tackle (now or in the future, following a similar approach): for example, what about more complex spatial or temporal queries? Or analytical or statistical queries which may require additional processing, such as (just to come up with examples) “What is the most common cause of mortality across … ?”, “How many movies in xxx were released in xx year?”, “What is the average ..?”. Some examples extending the discussion would improve the understanding of the capabilities, limitations and potential of this kind of system.
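To make one of these concrete (my own sketch over DBpedia vocabulary), a question like “how many films were released in 2010?” is a counting query with a temporal filter:

PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT (COUNT(DISTINCT ?film) AS ?nbFilms) WHERE {
  ?film a dbo:Film ;
        dbo:releaseDate ?date .
  FILTER (YEAR(?date) = 2010)
}

Whether such a query can be built step by step in the tool, and how naturally it verbalises, is exactly the kind of discussion I am asking for.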

To sum up, the paper is nicely written, the topic it tackles is very relevant, and convincing evidence is provided, but it can improve on conveying the limitations and challenges (e.g., ranking of suggestions, usability vs. scalability, ..) as well as by improving the discussion and analysis of the results from the evaluation, which is why I ask for a major revision (although the revisions are not that major, just an extension).