Creating dashboards and data stories within the Data & Analytics Framework (DAF)

Tracking #: 2129-3342

Authors: 
Michele Petito
Francesca Fallucchi
Ernesto William De Luca

Responsible editor: 
Marta Sabou

Submission type: 
Application Report
Abstract: 
In recent years, many data visualization tools have appeared on the market that can potentially allow citizens and users of the Public Administration (PA) to create dashboards and data stories with just a few clicks, using both open and non-open data from the PA. The Data & Analytics Framework (DAF), a project of the Italian government launched at the end of 2017 and currently being tested, integrates semantic web-based data, data analysis tools, and open source business intelligence products that promise to solve the problems that have prevented the PA from exploiting its enormous data potential. The DAF favors the spread of linked open data (LOD) thanks to the integration of OntoPiA, a network of controlled ontologies and vocabularies that allows us to describe the concepts found in datasets, such as "sex", "organization", "people", "addresses", "points of interest", "events", etc. This paper contributes to the enhancement of the project by introducing a five-step process for creating a dashboard in the DAF, from the dataset search on the data portal to the creation of the actual dashboard through Superset and the related data story. The case study created by the authors concerns tourism in Sardinia (a region of Italy). It is one of the few demonstrations of the DAF on a real case and highlights its ability to transform the analysis of a large amount of data into simple visual representations with clear and effective language.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Adrian M.P. Brasoveanu submitted on 20/Mar/2019
Suggestion:
Reject
Review Comment:

The paper presents two contributions related to semantic services and visualizations: OntoPiA and OntoNetHub.
The authors rarely compare their solution with current state-of-the-art dashboards.
Pictures, while interesting, are rather too small or not well annotated, and do not really help clarify the contribution.

The Related Work section misses a lot of interesting papers.

First of all, the omission of many Linked Data visualization papers from the 2017 Special Issue of the Semantic Web Journal
makes me question the validity of this submission to the SWJ. In the end, when you target a paper to a certain conference or journal,
you do so because similar papers were published there to begin with. I would have expected the authors to at least mention the survey paper:

Aba-Sah Dadzie, Emmanuel Pietriga. Visualisation of Linked Data – Reprise. Semantic Web, Volume 8, Number 1 / 2017, pp. 1-21.

A second serious omission is the fact that Linked Data Platforms are not mentioned at all,
and some of them do have strong visualization components (e.g., ELDA).

A third omission would be the failure to understand some current trends in visualization in general, namely the fact that
we are now slowly moving into design space exploration and parameterized visualizations due to the rise of Data Science
and Deep Learning movements (both related to the data deluge we experienced in the last 10 years).
This means that at least some papers about these trends should have been included.

The whole of Section 3, which was supposed to describe an important part of their innovation (the OntoPiA tool), reads like an extended summary
instead of a substantial descriptive section. This is perhaps because the authors often simply mention lots of developments
in passing, without taking the time to properly explain why it was important to design things that way.

For example, Figure 1 all but hides the role of semantics in this framework. Why?

Another example:

The microservices layer is described in the previous study [38]. [and several sentences that follow after it]

Nobody can reasonably expect most of the readers to actually go and read previous papers. The best bet, if this layer is truly important,
is to offer an executive summary of that paper (somewhat longer than your description).

The next two pages are somewhat better written, but Figure 2 definitely needs to be improved, as it is not directly clear from the figure
what the various elements (e.g., profile, restrictions) represent.

Section 4 is the second section related to the main contribution of the paper (the OntoNetHub visualization tool). It is definitely much better written than Section 3;
however, the title is forced. I would suggest something along the lines of "The Dashboard Construction Process" to enhance readability.
Section 4.2, however, needs to be extended a little, as it is not enough to simply list many visualizations as
being the most used. There should be an explanation of why this is the case.

Section 5 is definitely interesting. However, there does not seem to be a clear visualization contribution here,
as the visualizations look rather like stock D3 visualizations. No visualization operators are described and there is little interactivity.
In my view, any new visualization tool needs to come with some new visualizations and some distinctive branding elements
(e.g., nice tooltips, interactivity, context menus, etc.).

The last section describes the future of the application.

In my view, OntoPiA seems to be a mature tool, whereas OntoNetHub definitely needs more work.
This should be resubmitted when OntoNetHub reaches maturity and also delivers nicely integrated visualizations within the generated
dashboards, otherwise it will not be better than run-of-the-mill Elastic or Spark visualization tools based on D3.

Review #2
By Armin Haller submitted on 10/May/2019
Suggestion:
Reject
Review Comment:

The paper describes the Data & Analytics Framework (DAF), which has been developed for the Open Data portal of the Italian Government. It integrates several open source software tools to allow users to create dashboards from open datasets that have been annotated with metadata using the OntoPiA framework. The proposed tool tackles part of a pressing issue with open Public Administration data, namely that such datasets are largely created in isolation, with little to no linking and integration between them. The presented work aims to ease this integration through a dashboard visualisation process.

Although overall the presented work is very interesting and it is evident that some problems have been solved with the described application in the context of the Italian data portal, the paper lacks focus and it is hard to understand what the actual contributions of the proposed application are. Large parts of the paper also read more like software documentation (minus the screenshots), while several sections feel unfinished (see also minor comments).

Contribution of the authors vs. reused work: Although it is evident that the authors try as much as possible to reuse existing software (e.g. Metabase, Apache Superset, WebVOWL), it is unclear which parts of the DAF have actually been developed by them and which have been proposed by others. For example, a large section of the paper describes the OntoPiA framework, but it is unclear whether it has been developed by the authors or by others. The citation [7] is missing.

Adding metadata to datasets: One of the main issues, also identified as such by the authors, is the reuse of ontologies to describe concepts in open datasets, e.g. "sex", "organization", "people", "addresses", "points of interest", "events". It is stated that it is the user's responsibility to define the ontological information and controlled vocabularies associated with the data structure, by means of the semantic tags. It is unclear, however, whether the DAF allows users to add concepts from external ontologies that are stored in OntoPiA to datasets, or whether the existence of these metadata is a precondition for the dashboards to work.

Lack of detail on how dashboards are created: Section 4 describes the process of building dashboards. This section lacks the detail needed to understand how that can actually be achieved. Some of the screenshots presented in the use case section would help a lot to understand the process here. However, the screenshots are illegible in the first place, and need to be included in the paper in much better quality to be useful to the reader. Section 4 also seems to be missing some content before the paragraph beginning with "of a given administration and accessible according to the relative privacy policy". The use case gives some detail on how the process would work, but it contains several more phases than described in Section 4, while the five phases described in Section 4 are not really apparent in the use case. For phase 1, for example, how are relevant datasets selected? Then there is some mention of a data integration process, which seems to use OpenRefine. Also, Jupyter Notebooks can be used in this phase, but it is totally unclear how, and why there is "no need for an in-depth analysis of the dataset" for the user while doing so. The authors mention that Fig. 5 shows how the Sardinian tourist movements dataset has been linked to the Jupyter and Superset tools; again, it is entirely unclear how. Later it is mentioned that "Superset automatically assigns the correct properties in relation to the type of data declared", and again, there is no mention of how that is achieved. The remainder of this use case description, from page 9 onwards, makes little sense to the reader. It mentions slices and how slices can be created, but it is unclear what is meant by a slice. An RDF Data Cube slice? It apparently has a dimension and a metric, but how they can be defined remains unclear.

In summary, the paper lacks focus on the specific contributions of the DAF framework, while the current description of the application is unclear in too many parts to be acceptable for publication in its current state. As the journal operates a two-strike rule, my suggestion is a reject and resubmit rather than a major revision that may then end up in a final reject.

Minor Comments:
To enhance interoperability DAF make use of -> the DAF
we focus on DAF -> we focus on the DAF
OntoPiA use -> use what? 
provides the public and free -> provides a public and free
This blocking dialogues with the ontology -> ???
a notation property has been created -> annotation property?
the ontology design pattern has been used -> which ontology design pattern? See http://ontologydesignpatterns.org/wiki/Main_Page
The use of exclamation marks throughout the paper needs to be reduced.

The application itself, at https://dataportal.daf.teamdigitale.it, was unavailable at the time of review, i.e. a 503 error code was returned.

Review #3
By Andrea Giovanni Nuzzolese submitted on 19/Jun/2019
Suggestion:
Reject
Review Comment:

The paper presents a solution for creating visualisations of the data managed by the Data and Analytics Framework (DAF). The DAF is a platform for open data ingestion and big data analytics.
The data visualisation solution presented relies on Superset, which is a project currently incubated by the Apache Software Foundation.
The visualisation of semantic data is a central topic for the SWJ. As a matter of fact, a special issue [1] on Linked Data visualisation was published in the SWJ in the recent past.

==== Overall comments ====
The paper needs to be reworked significantly in order to improve its readability and, more generally, its quality.
In fact, the paper is hard to read in most of its parts.
Additionally, most of the solutions and tools introduced in the paper (i.e. OntoPiA, DAF, OntoNetHub, Superset) are described at length even though they are not an original contribution of the authors. Accordingly, those parts should be reduced significantly in order to make more room for the real contribution of the authors, which, to be honest, is not clearly identifiable within the paper.
In my opinion, the paper is not acceptable for the SWJ in its current form (see the weaknesses listed below).

=== Strengths ===
As already argued, data visualisation is a central topic in the semantic web community, especially when applied to large knowledge graphs like those built from the variety of open datasets coming from the Italian Public Administration and semantically enriched and managed by the DAF. The authors fairly point to a couple of articles that identify core requirements that a data visualisation tool should satisfy. Nevertheless, the authors fail to map those requirements onto the functionality provided by their solution (see next comment).

=== Weaknesses ===

+++ Lack of proper architectural design +++
The authors point to a couple of articles that provide core requirements for data visualisation tools. However, those requirements are not contextualised for the design of the visualisation solution. More specifically, it is not clear what design choices the authors made in order to implement the visualisation solution. Why do the authors provide specific dashboards with certain information visualisation paradigms? What requirements does their platform implement? Is there any correlation with the requirements identified by Dadzie and Rowe [14] and Shneiderman [47]? Who is the target user: a tech-user (a user with a good knowledge of Linked Data and Semantic Web technologies) or a lay-user (a user with little knowledge of Linked Data and Semantic Web technologies)?

+++ State of the art +++
The related work section should provide comparisons between existing state-of-the-art solutions and the one presented by the authors, in order to clearly position the work and make its originality understandable.
A good starting point is to include the works published in the SWJ special issue on the visualisation of linked data [1].

+++ Originality +++
A lot of room in the paper is dedicated to the DAF and its related projects (i.e. OntoPiA, OntoNetHub, etc.), which are worth mentioning but are not an original contribution of the authors. The same holds for Apache Superset. Accordingly, it is not clear what the contribution of the paper is. Additionally, no link or reference to the source code is provided. I suggest significantly reworking the paper in order to make more room for the description of the solution, including the design choices and a proper evaluation.

+++ Evaluation +++
Although a use case is presented, the paper lacks a proper evaluation that assesses the quality of the work presented. The evaluation should be carried out as a user study involving real users. Additionally, the evaluation should aim to validate whether the visual application addresses the requirements (which need to be clearly identified).

[1] Dadzie, A.-S., Pietriga, E., 2017. Visualisation of Linked Data – Reprise. Semantic Web 8 (1), 1–21.
[2] Dadzie, A.-S., Rowe, M., 2011. Approaches to visualising Linked Data: A survey. Semantic Web 2 (2), 89–124, DOI: 10.3233/SW-2011-0037.
[3] Shneiderman, B., 1996. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In: Proceedings of the IEEE Symposium on Visual Languages. IEEE, pp. 336–343, DOI: 10.1109/VL.1996.545307.