Exploratory querying of SPARQL endpoints in space and time

Tracking #: 947-2158

Authors: 
Simon Scheider
Auriol Degbelo
Rob Lemmens
Corne van Elzakker
Peter Zimmerhof
Nemanja Kostic
Jim Jones
Gautam Banhatti

Responsible editor: 
Guest editors linked data visualization

Submission type: 
Full Paper
Abstract: 
The linked data Web provides a simple and flexible way of accessing information resources in a self-descriptive format. This offers a realistic chance of perforating existing data silos. However, in order to do so, space, time and other semantic concepts need to function as dimensions for effectively exploring, querying and filtering contents. While triple stores, SPARQL endpoints, and RDF were designed for machine access, large burdens are still placed on a user to simultaneously explore and query the contents of a given endpoint according to these dimensions. First, one has to know the semantic concepts and the type of knowledge contained in an endpoint a priori in order to query content effectively. Second, one has to be able to write and understand SPARQL and RDF. And third, one has to understand complex data type literals for space and time. In this article, we propose a way to deal with these challenges by interactive visual query construction, i.e., by letting query results feed back into both exploration and filtering, and thus enabling exploratory querying. We propose design principles for SPEX (Spatio-temporal content explorer), a tool which helps people unfamiliar with the content of SPARQL endpoints or their syntax to explore the latter in space and time. We test these principles in a user study on a repository of historical maps.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 14/Feb/2015
Suggestion:
Major Revision
Review Comment:

The paper discusses design challenges and principles for a combined exploration and querying of Linked Data (LD). Based on the proposed design principles, a tool is introduced that supports the spatio-temporal exploration of LD contents. The tool is evaluated in a user study and the study results are discussed and generalized.

Space and time are often key dimensions to LD exploration (not only in the geospatial web) and are therefore of interest to this special issue. Although focusing on space and time is somewhat restrictive, the authors do a good job in generalizing from this context to an extent that is still acceptable. Therefore, one could argue that most of their thoughts and results are not only limited to the geospatial domain. Several thoughts and arguments are interesting and convincing, and I like how the conceptual dimensions are contrasted and discussed (e.g., text search vs. data querying, exploration vs. retrieval, etc.). The authors ask the right questions and identify important challenges from my point of view. They use a convincing and pragmatic approach to address the challenges that is theoretically founded with models and literature references.

I would agree with the authors' observation that the data is often there but the user does not know how to formulate the right query. I also agree that exploration and querying are often tightly integrated tasks and that this fact is not sufficiently addressed by many of the current LD exploration tools. And I agree with the authors that empirical studies are missing in LD research and therefore appreciate that they extensively presented and discussed the results of their user study.

However, there are also some points of criticism that require a major revision from my point of view:
- Unfortunately, it is unclear how this work relates to and is different from previous work, in particular the work referenced in [27] and [28]. This is important information, not only for the reader but also to check whether the manuscript contains sufficient new content to justify another publication in this special issue.
- The summary and discussion of related work is somewhat hidden in Sections 2.3 and 5.1. It should be more explicit and exhaustive and could include further tools, references and links. For instance, literature references for the tools LESS and RelFinder are missing, whereas related approaches like RDF-GL, gFacet or the newer SparqlFilterFlow are not mentioned at all.

Furthermore, there are some critical aspects that would require additional justification in order to recommend an acceptance of the manuscript:
- Since the participants of the user study needed at least 45 minutes to familiarize themselves with the tool, one could question whether the tool is really appropriate for lay users. Can it be expected that lay users are willing to learn a tool like this for nearly an hour in non-controlled settings?
- The tasks of the scenario for the user study all seem to be of a similar type and to follow a similar pattern. Could this have had an effect on the user study? Is a larger variation in task types a problem for the approach and/or developed prototype?
- The current version of the tool looks more like a prototype than mature software. As this could have affected the study results, it should be addressed more prominently, and the text might be adjusted accordingly.

There are also some further questions that the authors should consider addressing in the revision of their article:
- How is the concept of "exploratory querying" different from the concept "exploratory search"?
- What if a user explicitly wants an empty result set (e.g., to confirm that there is really no data)? Is this possible? How is it supported by the presented approach?
- Why are CONSTRUCT and ASK queries as well as the distinction of bound and unbound variables not needed for exploration purposes?
- Why was the eye movement of the participants recorded? To what extent did the eye-tracking data help in the evaluation?
- Why is Rhizomer "less tightly integrated" if "query results feedback into a query only via facets"? I would consider faceted browsing a comparatively integrated approach.

Further suggestions for improvement:
- The readability of Section 4 could be improved (especially the clarity of the third paragraph).
- The structure of the paper (especially the use of headings) should be rethought. For instance, the design of the user study is reported in Sec. 5.3, while I would rather expect it to be part of Sec. 6.
- It would be interesting to get at least an idea of how the approach could be extended to also include UNION, OPTIONAL and other SPARQL constructs.
- It remains somewhat unclear that the approach has been designed especially for librarians, until this is mentioned on page 9 (the restriction is not clear from the scenario and task only).
- Time sliders are one popular way to represent the temporal dimension, but there are also others (calendars, etc.). The recommendation concerning time sliders is therefore a bit too strong and concrete for my taste.
- Fig. 4 has a lot of white space. I would recommend zooming the view in order to use the space more efficiently and to be able to display the texts and graphical elements in larger size.
- Also, the tasks in Section 5.1 could be presented in more condensed form (e.g., as a table).
- "... is made for exploration of lay persons" is maybe not what is meant here.
- The caption of Table 1 seems incorrectly aligned.
- Contractions like "didn't" are usually avoided in scientific papers.
- The list of references must be revised and made more consistent (e.g., add page numbers, add or remove all DOIs, etc.). Some references are incomplete (e.g. [2]...)

Review #2
By Heiko Paulheim submitted on 18/Feb/2015
Suggestion:
Major Revision
Review Comment:

The paper proposes an interface that supports users in querying SPARQL endpoints, in particular those with data holding geographic and/or temporal data. Here, maps and timelines are used for visualization of the results.

While I like the general idea of the paper, and the paper itself is well written, I have two major remarks on the contents. The first is on the presentation of the motivation and the related work, the second is about the user study.

The motivation and related work sections are not well tailored to the problem. The problem statement "successful queries require exploration, and scalable exploration requires queries" holds true for any data, not just for time and space indexed Linked Open Data. Furthermore, most of the related work section (2.1 through 2.4) addresses general LOD interaction, while only a small paragraph (2.5) is devoted to actual problems related to time/space indexed data.

In addition to that, I do not agree with some of the statements made in that section. While the authors state that concatenations of relative clauses are difficult for end users, there is the gFacet tool [1] which allows the visual construction of such queries (and the paper even presents geographic visualizations of facets in such queries). Furthermore, the statement "we know of only one data type standard and triple store technology which allows handling of geospatial data on a level comparable to a GIS" leaves me confused as to why Virtuoso, which has built-in support for geospatial data [2], is not mentioned here. There may be some aspects missing to make it "comparable to a GIS", but this needs to be explained in more detail for the general audience of SWJ. In addition, I miss a reference to Linked Geo Data [3], which, after all, features a map-based browser.

In addition to that, the motivation would benefit from a clearer statement about the need for spatio-temporal interaction support in terms of the datasets that are available as LOD. The authors use a running example, which is fine, but at the same time seems to be a rather borderline case of Linked Open Data. Some general remarks about the number of LOD datasets using spatial and/or temporal dimensions would be appreciated. Further, a clearer statement about which interaction problems exist for such data which do not hold for LOD in general would strengthen the motivation.

In section 5.2, the authors discuss some related tools, while the user study only investigates SPEX. It would have been appropriate to include one of the other tools in the user study and compare the users' performance with both tools. In particular, Rhizomer would have been a good choice, since it shows the same characteristics in the feature matrix in Table 1, except for the support for space and time queries. By that comparison, the authors could have shown that their approach brings an advantage over the state of the art. Without that comparison, all that the user study shows is that users can work with the tool, and that they are able to solve some tasks in some time. This does, on the other hand, not prove that the work presented in this paper is a significant advancement over the state of the art.

I suggest that the authors reorganize the related work section, shortening the parts that address interaction with LOD in general, fleshing out the particular challenges that exist for geospatial linked data, and emphasizing their contribution in the area of spatiotemporal data, and more thoroughly including recent works on linked data interfaces w.r.t. temporal and spatial data. In particular, a recent survey [4] lists quite a few approaches that have special interaction means tailored towards spatial and temporal data which are not mentioned in this paper.

Furthermore, the user study should be extended to compare SPEX with related tools. If that is infeasible, at least a control group should be used which gets the same tasks and the same tool, but with the support for spatial and temporal querying being disabled. Only such a study would unarguably prove the utility of the proposed solution.

Minor remark: in section 2, the authors seem to mix up metadata and schema for LOD, which are usually not used synonymously.

[1] Heim et al.: gFacet: A Browser for the Web of Data.
[2] see, e.g., http://docs.openlinksw.com/virtuoso/rdfsparqlgeospat.html#rdfsparqlgeosp...
[3] Auer et al.: LinkedGeoData: Adding a spatial dimension to the web of data.
[4] Pena et al.: Linked Open Data Visualization Revisited: A Survey

Review #3
By Mariano Rico submitted on 03/Mar/2015
Suggestion:
Minor Revision
Review Comment:

The paper is well written and easy to read, with a clear statement of the problem and a good evaluation. The work is a good review of the state-of-the-art and the solution proposed is original.

However, there are some issues that should be solved or clarified.
- There is a gap in the argumentation in Listing 1. The meaning of Listing 1 is not clear, nor is its relation to the user interface. The reference to Fig. 7 is a very weak link for understanding the syntax of Listing 1.
A detailed step-by-step description of one of the questions in Listing 1 would be helpful. Idea: a two-column table showing the construction of one question, where the left column shows the syntax, the right column shows the user interface, and each row is a step.

- The web application is not running (3rd March 2015, with a Chrome browser). I get the message "No class suggestions found!" when the starting node is clicked. I cannot follow the examples described in the paper.

- I miss a link to the "Help file" you mention on page 12. The help window could have this link.

- I miss a reference to the methodology used in the "user study design" (section 5.3). Are there any previous experiences using this method? Is it an ad hoc method?

- There is no comparison of usability with other tools (like Rhizomer). The only comparison is the functionality table. Perhaps this is out of scope and better suited to a UI-specialized journal.