Exploring User and System Requirements of Linked Data Visualization through a Visual Dashboard Approach

Paper Title: 
Exploring User and System Requirements of Linked Data Visualization through a Visual Dashboard Approach
Authors: 
Suvodeep Mazumdar, Daniela Petrelli, Fabio Ciravegna
Abstract: 
One of the open problems in Semantic Web research is which tools should be provided to users to explore linked data. This is even more urgent now that massive amounts of linked data are being released by governments worldwide. The development of single dedicated visualization applications is increasing, but the problem of exploring unknown linked data to gain a good understanding of what is contained is still open. An effective generic solution must take into account the user’s point of view, their tasks and interaction, as well as the system’s capabilities and the technical constraints the technology imposes. This paper is a first step in understanding the implications of both user and system by evaluating our dashboard-based approach. Though we observe a high user acceptance of the dashboard approach, our paper also highlights technical challenges arising out of complexities involving current infrastructure that need to be addressed while visualising linked data. In light of the findings, guidelines for the development of linked data visualization (and manipulation) are provided.
Submission type: 
Full Paper
Responsible editor: 
Krzysztof Janowicz
Decision/Status: 
Accept
Reviews: 

Review 1 by Christophe Gueret

In this revised version, the authors addressed most of the comments I had on the first submission. I am pleased with the results and suggest accepting this paper after a last few details and typos are fixed.

Things to fix:
* There is a confusion between Open Data and Linked Data in the introduction. Despite our efforts, not everyone willing to publish open data does so following the linked data guidelines. In fact, most of the open data currently available is not linked data. It mostly reaches one or two stars on Tim's 5-star linked data ranking scheme.
* Page 3, for the sake of readability, "JavaScript" should be used instead of "JS".
* Page 7, the resource http://dbpedia.org/ontology/city does not exist. "city" is not defined under the ontology namespace; it is defined as a property.

Review 2 by anonymous reviewer

The authors seem to have addressed sufficiently the issues raised concerning the initial version of the paper.

-

Revised manuscript after an "accept with minor revisions." The reviews for the original submission (under the title "A Visual Dashboard for Linked Data: An Exploration of User and System Requirements") are below.

Review 1 by Christophe Gueret
This paper proposes a visual dashboard to explore data served by a triple store. Compared to other approaches for data exploration, the dashboard metaphor offers several views over the data thereby providing the users with different interpretations of it.

In general this is a very well written manuscript, with a sound test plan and analysis of results that I would warmly recommend for publication.

I have however three concerns:
* The dashboard approach reminds me of the Paggr system that was presented by Benjamin Nowack at ISWC 2008. The generic views also sound like what Sparks proposes. I missed references to and a comparison with both systems and would suggest adding them in a revised version of the article.
* The problems related to the choice of the vocabulary and the general construction of the SPARQL queries are not described in the paper. One can wonder how the two example SPARQL queries can be automatically generated. The interactive selection of properties by the user is surely part of the explanation (only indicated in footnote 19!) but I don't think that, for instance, users will be asked to manually select both a "Place" concept and the specific "point" property to get a location on a map.
* I don't think the in-depth performance comparison with a database (in Figure 10) really makes sense. There are too many aspects that could explain the differences observed. I would suggest summarising this into one or two lines just stating that SQL systems are faster.

Some minor details
* There are too many footnotes and most of them would fit into the main text nicely.
* Page 2, "a list flatten" -> "a list flattens"
* Page 2, "section 3" -> "Section 3"
* Page 2, "available to try graph" sounds weird
* Page 2, "a generic (...) visualizations" singular/plural mismatch
* Page 4, the header from the browser could be removed in Figure 1 to get a result similar to that of Figure 2. BTW, are the two figures showing the same version of the software? The dashboard looks different.
* Page 8, the usage of a footnote after a number is not a good idea. I first thought there were 1090^19 choices, which is indeed a high number! ;-)
* Page 8, "Inspite" -> "In spite"
* Page 9, rename Fig 8 into Tab 1
* The title of Section 7 doesn't match its content
* Please add a reference for the "Likert scale"

Review 2 by anonymous reviewer
A generic visualization tool called .views. is presented, providing several simultaneous visualizations (map, timeline, bar chart, tag cloud etc.) of a Linked Data (LD) repository using SPARQL.
Using the tool, novice users of data would be able to explore, analyse, and create visualizations of LD based on simple filters on properties.

The paper presents an end-user tool evaluation of the first version of the system with some promising results with the given datasets. A key issue here is, however, how well the approach generalizes into a general LD visualization approach: how easy is it for the end-user to adapt it to a novel dataset? For example, if coordinates are missing or incomplete, it does not make sense to use a map visualization, or if data instances have several coordinates (e.g. a book published somewhere and describing some other place), how should one select what to show?

The second evaluation reported in the paper tests the system from a system perspective. The main result is that depending on the query and its result set, different functionalities of a SPARQL-based system like .views. can consume resources in an unpredictable fashion.

The conclusion pulling together the two evaluation studies is that the interplay between the user's and system's views has to be taken into account, and some guidelines based on the two evaluations are presented.

The paper addresses an important aspect of LD usage, visualization and data analysis, and although the notion of providing several simultaneous visualizations is not groundbreaking, the dashboard system showing different views on the same screen with a provision of dynamic filters looks interesting.

I also found the evaluation-based, lessons learned approach of the paper useful and the experiments well done -- there are in general too few evaluations in semantic web research papers.

I think this work is worth publishing. However, the following clarifications are suggested before publishing the paper:

1. The mechanism of (global and local) filters (around page 5, e.g. Fig. 3) is not explained explicitly enough. How are the filters created in terms of the underlying data model in RDF?
How well does the generator actually work from an end-user's perspective using the example data sets and in the general case -- I assume there must be some difficulties involved here?

2. Add more discussion about how well .views. generalizes to arbitrary LD datasets. How is the dashboard adapted if the data is not suitable for visualization in some view? How can the user find out if this is the case?

3. The process of adapting the system to a particular data set should be explicated better. How is it actually done? For example, Fig. 3 presents filters and Fig. 4 automatic suggestions, but their interplay is not well documented.
There is a "" field in Fig. 4, but its meaning is not explained. A more explicit description of how the filters and user input are transformed into SPARQL is needed, in addition to the examples in section 5.

4. There are some minor typos to be corrected in the text, listed below.

p.1
the systems -> the system's
that, -> that
some, -> some
Bad layout in footnote 1.

p.2
flatten -> flattens
For example -> For example,
([8]) -> [8]

p.3
Points of View abbreviation already mentioned on p. 2
familiar, -> familiar and

p.4
twitter -> Twitter
flickr -> Flickr
data set -> dataset
geo information -> geo-information
a numeric data -> numeric data
as tabel -> as a table
Indeed .views., -> Indeed, .views.
GrassPotal( -> GrassPotal (

p.5
Flickr( -> Flickr (
Twitter( -> Twitter (
In Fig. 3 you mention "brackets" but there are no brackets in the figure.
typing pa -> typing "pa"
right query -> right query.

p.6
Fig 4: explain elements there better
fron-tend -> front-end
used too -> used, too
reference-able -> referenceable
In Fig. 5: Linked Data endpoint -> Linked Data Endpoint
In paragraph "Once the user ...": text not fully understandable, something is missing there

p.7
Fig. 6: text is too small to be readable
Heading 6: Capitalize words systematically
end users -> end-users

p.8
etc -> etc.
resultsets -> result sets
Add paragraph break before "New version ..."
and foster -> and to foster
30-40 -> 30--40
responses for -> responses of
criteria(ten) -> criteria (ten)
follow up -> follow-up

p.9
students(Left -> students (left
experts(Right -> experts (right
data for -> data, for
)and -> ) and

p.10
"hard to use" -> 'hard to use'
Heading 7: Capitalize words systematically
linked-data -> linked data
Add paragraph break before "The system evaluation ..."
Select all ... -> "Selects all ..."

p.11
1 - 2200 -> 1--2200, and similarly with other ranges in Fig 9 caption
Paragraph "The high ..." has wrong spacing in the right column
however -> , however,

p.12
etc -> etc.

p.13
discreet -> discrete
etc -> etc.
to this -> to this:
higher instance -> more instance
i.e. -> , i.e.,

p.14
result providing -> results by providing
focus-group -> focus group
equip users -> equip the users

p.15
even them -> even they
positively on -> positively, on

References

There are capitalization errors of names in articles in the following refs (use double brackets in LaTeX):
8, 10, 12, 14

Publisher or other bibl. data missing in several refs, e.g.: 2, 4, 8, 10

Review 3 by Akrivi Katifori

An interesting approach for a tool to support the objective of a usable Web of Data. The paper is very well-written, with a clear description of all aspects of development: design, implementation and special attention to user evaluation issues.
The simple visualization/exploration dashboard approach suggested seems a very valid and effective choice in order to be able to really take advantage of the wealth of data "hidden" in the Semantic Web.
Although the evaluation focus groups were small and results cannot really be considered conclusive, the fact that domain experts were more positive towards the tool than computer scientists, in the second set of focus group evaluations, is very promising and suggests that the tool is in fact intuitive and easy to use by the people who are really interested in working with the data. This result merits further investigation with more users.
The guidelines concluding the paper are precise and pose performance issues that should be taken into account by the semantic web community towards improving infrastructure issues.
