Visualizing Statistical Linked Knowledge Sources for Decision Support

Tracking #: 953-2164

Adrian M.P. Brasoveanu
Marta Sabou
Alexander Hubmann-Haidvogel
Daniel Fischl
Arno Scharl

Responsible editor: 
Guest Editors, Linked Data Visualization

Submission type: 
Full Paper
Abstract:
Most Decision Support Systems (DSS) are tailored to specific domains and use relevant information for certain types of decisions. In today's interconnected world, enriching DSS with external data about events such as financial crises and climate change can improve the decision-making process. One way to build DSS tools that leverage such cross-domain information is to examine summaries of these events as expressed through statistical data. Since the introduction of the RDF Data Cube (QB) standard, the publication of such data and related visualizations has increased, but less effort has been dedicated to integrating these visualizations into analytical platforms that can answer complex questions. After reviewing relevant work in the field of Linked Data Visualization, this paper describes: (i) a methodology for integrating cross-domain statistical data sources by applying selected QB principles (e.g., observations and slicing) to a visual dashboard; (ii) a set of visualization scenarios for cross-domain datasets from multiple sources, including Eurostat and the World Bank; and (iii) a dashboard prototype developed according to these principles and scenarios.
Major Revision

Solicited Reviews:
Review #1
By Bernhard Schandl submitted on 02/Feb/2015
Minor Revision
Review Comment:

This paper describes a Linked Data-based approach for visualizing statistical data from different sources in the context of Decision Support Systems for tourism management. The paper clearly describes the problem that the approach addresses; it is obvious that the authors know what they are talking about. The use case that the authors envision is clearly valid and definitely an important one. The paper also gives a comprehensive overview of related work.

In Section 4, principles for Linked Data visualization are introduced. These principles include a "workflow", but the purpose of this section is not really clear to me. The items mentioned there seem to be a mixture of tasks, documents/artefacts, features, and software development principles. It is not a workflow, as it does not describe sequences of activities or decision points. I recommend slightly rewriting this section and putting more emphasis on the concrete impact of considering these principles.

Section 5 clearly describes the chosen approach and the focus of the work. What is less clear is the basis on which the design decisions were made: "tourism colleagues" are mentioned here, but who are they? Did the authors conduct structured interviews? How many people, and in which positions, were consulted? Some background information would help to justify the information given in the paper.

Section 6 gives a very detailed and informative overview of the implementation, with many details on how users can interact with the system. A major concern here is scalability: the paper correctly mentions that typical data sets in this space may contain millions of data points, but the question remains how such amounts of data affect the usability of the tool. Is it still responsive? How fast can data be crawled, processed, and displayed?

The biggest flaw of this paper is that no user evaluation has been performed. It is unclear whether end users will be able to use the system appropriately, and whether they will understand the complexity that the integration of different data sources imposes. I strongly suggest performing at least a small-scale evaluation (say, with a handful of representative users) in order to obtain a rough estimate of the usability of the proposed implementation.

Several major issues are not addressed at all in the paper. For instance, if I understand it correctly, the prototype uses three data sources (TourMIS, World Bank, Eurostat) that are "hard-coded" in the system. I also assume that most of the tasks related to data preparation (e.g., consolidation, linkage, duplicate detection, cleansing) have been performed beforehand by Linked Data experts. If that is correct, the question remains what additional benefit the user gains from Linked Data if the system does not allow users to integrate new data sources on demand. If the system does let users add new data sets, who will perform the preparation tasks? I doubt that the target audience (tourism managers and executives) is able to perform these tasks.

Overall, the paper is a good contribution, but the lack of any kind of evaluation diminishes its value. Further, since it is not clear how the prototype deals with unknown or new data sources, several questions remain open. If the authors address these issues, it is definitely a valuable paper.

Review #2
By Luc Girardin submitted on 11/May/2015
Minor Revision
Review Comment:

The paper offers an original, important, and well-written attempt to address some of the challenges of Linked Data Visualization. One of its key virtues is that it extends from conceptual considerations all the way to a fully implemented (albeit prototype) system.

Relying on a centralized index (using Elasticsearch) seems to somewhat defeat the purpose of Linked Data, as the data need to be integrated into the index prior to any analysis. While Linked Data does not explicitly suggest that queries can be distributed across multiple sources, there is nevertheless an expectation that it should be possible to retrieve data flexibly from many sources without requiring them to be pre-integrated. I would value a short discussion of this issue in the paper, and would like to hear the authors' view on whether some kind of "Google of Linked Data" is the only way to address this challenge. Moreover, it would be interesting to know more about the size of the index and whether it could scale to large amounts of Linked Data.

OLAP approaches usually address the problem of efficiently computing the data cube through full or partial materialization of the cuboids. I assume there is no such possibility in the proposed system, so all aggregation queries probably need to be fully recomputed every time. It would be good to discuss this topic and assess its performance implications.

While I greatly valued the survey of related work, I believe it should be shortened in this particular paper and could potentially be turned into a full-fledged survey paper. Indeed, I did not find many of the details in Section 3 particularly useful for understanding the following sections, which would benefit from being expanded further.

Review #3
By Emmanuel Pietriga submitted on 13/May/2015
Major Revision
Review Comment:

This paper describes a system that enables subject-matter experts (in the particular case considered, people working in the tourism industry) to visually explore linked datasets relating to tourism and economic indicators such as those published by the World Bank, using interactive visualization components that can be put together and linked (coordinated multiple views) to form a full-fledged visualization UI.

The topic addressed is highly relevant and definitely falls in the scope of this special issue. The overall project is interesting, and the paper generally well-written. However, I have several strong reservations about how the work is presented in the paper, and call for major revisions, as detailed below.

The main issue is that the research contribution is not clear. No research question is clearly formulated, and there is no validation whatsoever of the proposed system. The fact that this might indeed be "the first visual semantic DSS that uses multidomain knowledge in tourism" does not make this work a _research_ contribution. I do believe there is a research contribution in this project, but the paper needs to be significantly revised and focused on this contribution. Actually, I see two potential contributions: 1) the software architecture and general approach (what the authors call the workflow) for generating interactive visual representations from RDF data cubes; and 2) the elements of the user interface that let users configure the views on the underlying linked data. Unfortunately, those two potentially very interesting contributions are barely discussed, and always at too high a level of abstraction to enable the reader to fully understand (let alone reproduce) the approach.

Too much space is spent describing features that are not particularly novel (e.g., coordinated multiple views) and walking the reader through anecdotal examples (Section 6) that do not say anything about the added value of linked data. Most observations made by the hypothetical user in the scenarios could have been made with any visualization system in which the corresponding data from the World Bank and other data providers had been pre-loaded. There is _nothing_ particular about linked data in these examples (given that the data has to be pre-loaded and the system pre-configured for a particular application domain before it can actually be used by subject-matter experts). Thus, there is no validation of the approach. The paper, as it stands now, is little more than an overview description of a system, without enough information to replicate the work. This does not qualify as a research contribution that can be published as a full-length research paper.

The paper has to focus on either or both of the contributions suggested above (or any other I might have missed, as the authors see fit) and provide some validation of these claimed contributions. For the software architecture/workflow part, this can be performance figures and a discussion of the system's scalability (loose claims such as "our solution supports an unlimited number of datasets" without any backing evidence are hardly convincing). For the UI part, it can be a more elaborate scenario that truly illustrates the benefits of having linked data under the hood (as mentioned above, this is not the case in the current examples). Or it can be a more elaborate, more strongly rationalized list of user requirements informed by a user-centered design approach (interviews, observational studies, etc.; here again, the "requirements" identified in the paper are fairly loose, and it is very unclear where they come from: who are these "tourism colleagues" and how were they interviewed?). Or it could even be a user study (controlled lab study, longitudinal study, ...), though it is unlikely that the authors have the time to set up such a study within just a few weeks.

What I really miss about the system's UI design is a better understanding of how users can declare new data sources, and how they can navigate cubes/slices/... to create new visualizations and link them with existing ones in the UI.

A few additional comments:
- The paper keeps referring to "Decision support systems". I do not understand what is specific to "decision support" in the approach presented. To me, this system could be used for any visual analysis task (exploratory visualization included), and I do not understand how it is specifically tailored to "decision support" (as opposed to more generally helping users gain insights about their data through visualization, which has applications far beyond "decision support"). Please explain.
- The paper is generally well-written, but there are a few typos, missing words and grammar mistakes scattered throughout the paper.
- p9, "not covered (or really difficult to cover) by traditional database-style systems and where LD technologies could provide a real benefit": this needs to be elaborated upon.
- For a paper about a visualization system, there are very few illustrations, and Section 6 is sometimes hard to follow because of the lack of backing illustrations. It is hard to get a clear impression about the UI and user experience based on this mostly text-based description.
- p13, "By looking at the charts we can also derive new knowledge, this being the main purpose of designing a visual DSS.": but can this knowledge be captured in the tool (through annotations, or input of more statements)?
- Section 6.3 contains many rather anecdotal observations about the data that are unlikely to be of much interest to the reader, given that, as mentioned earlier, there is nothing specific about the system's interface or linked data in them (the reported observations made by the hypothetical user could have been made with any decent visualization tool).
- p16, "Our solution integrates data analysis and visualization, but also comes of as a hybrid between multiple view coordinated solutions and LDP, and it can be easily reused or adapted.": the latter part of the sentence is a very loose claim. Please provide evidence about this ease of re-use and adaptation.