Exploratory querying of SPARQL endpoints in space and time

Tracking #: 1163-2375

Simon Scheider
Auriol Degbelo
Rob Lemmens
Corne van Elzakker
Peter Zimmerhof
Nemanja Kostic
Jim Jones
Gautam Banhatti

Responsible editor: Guest editors linked data visualization

Submission type: Full Paper
The linked data Web provides a simple and flexible way of accessing information resources in a self-descriptive format. This offers a realistic chance of perforating existing data silos. However, in order to do so, space, time and other semantic concepts need to function as dimensions for effectively exploring, querying and filtering contents. While triple stores, SPARQL endpoints, and RDF were designed for machine access, large burdens are still placed on a user to simultaneously explore and query the contents of a given endpoint according to these dimensions. First, one has to know the semantic concepts and the type of knowledge contained in an endpoint a priori in order to query content effectively. Second, one has to be able to write and understand SPARQL and RDF. And third, one has to understand complex data type literals for space and time. In this article, we propose a way to deal with these challenges by interactive visual query construction, i.e., by letting query results feed back into both (space-time) exploration and filtering, thus enabling exploratory querying. We propose design principles for SPEX (Spatio-temporal content explorer), a tool which helps people unfamiliar with the content of SPARQL endpoints or their syntax to explore that content in space and time. In a preliminary user study on a repository of historical maps, we found that our feedback principles were effective, but that successful question answering still requires improvements regarding space-time filtering, vocabulary explanation and the linking of space-time windows with other displays.


Solicited Reviews:
Review #1
By Mariano Rico submitted on 14/Sep/2015
Review Comment:

Fine to move it to full paper

Review #2
Anonymous submitted on 06/Nov/2015
Review Comment:

The issues raised in my reviews were sufficiently addressed by the authors in the revision of the manuscript. The comparatively long time needed to become familiar with the tool is no longer a serious issue, as the authors changed the target group from "lay users" to "motivated experts". My impression that the tasks of the user study are all of a similar type has been convincingly countered by the authors. Furthermore, the authors added missing references and updated the related work section as requested. However, I would encourage them to go carefully through the list of references before publication, correct minor inconsistencies (e.g., capitalization of terms like SPARQL), and update and complete the bibliographic information. They might need to do this anyway - see SWJ FAQ10: http://www.semantic-web-journal.net/faq#q10

Review #3
By Heiko Paulheim submitted on 10/Nov/2015
Minor Revision
Review Comment:

For this revision, the authors have made a significant effort to take the reviewers' comments into account. In particular, I like the idea of introducing additional use cases, but the authors could devote more space to them, since they are described quite superficially. By this, I do not mean that they should conduct yet another user study for those use cases, but it would be appropriate to discuss which information needs could be better addressed in those scenarios using the space/time visualizations in SPEX.

My main concern, however, still remains. As stated in my review for the previous revision:
>The authors themselves state that their "study was preliminary" and "does not yet allow drawing representative conclusions" - this is not the degree of maturity expected for a journal publication.

I can see that it is difficult and laborious to repeat or extend the user study. On the other hand, the evidence in the paper is not what I expect for a journal publication. In essence, what is shown is that the tool at hand serves a particular use case. It is not shown, however, that it does so better than any state-of-the-art tool. The functional comparison in Table 1 provides no such evidence either (cf.: for many tasks, quite a few people are surprisingly effective with a text editor, despite the existence of much more advanced visual toolkits).

In the current state, my feeling is that this work is a very good conference publication, but lacks the significance of a journal article.

Furthermore, the conclusions in section 6.3 are quite weak. Apart from the very small number of participants, many of the observations can also have other causes. For example, the authors state that the participants who were less successful than others also made use of the space/time visualization less often. Although it is not explicitly stated, the text suggests that the reverse should hold, i.e., using those visualizations leads to more successful task completions. This, however, would not be a valid conclusion.

Another questionable point is the dataset size, made explicit by the authors in this revision. 3,000 triples is not a very large dataset; my assumption is that this corresponds to at most a few hundred maps. There are datasets in the LOD cloud that are several orders of magnitude larger than that, and that may come with very different challenges (both in terms of user interaction and implementation) - in fact, I believe that SPEX could be particularly helpful with such larger datasets, given that it is implemented in a scalable fashion.

Furthermore, I am a bit puzzled that the authors claim that they do not have information about the usage of time and space vocabularies in Linked Open Data. Two standard sources for estimates are:
* LODStats, see http://stats.lod2.eu/ (tab: Vocabularies)
* The latest state of the LOD cloud report, http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/ (section 4.2.1)