LinkedDataOps: Quality Oriented End-to-end Geospatial Linked Data Production Governance

Tracking #: 3215-4429

Beyza Yaman
Kevin Thompson
Fergus Fahey
Rob Brennan

Responsible editor: Guest Editors SW for Industrial Engineering 2022

Submission type: Application Report
This work describes the application of semantic web standards to data quality governance of data production pipelines in the architectural, engineering, and construction (AEC) domain for Ordnance Survey Ireland (OSi). It illustrates a new approach to data quality governance based on establishing a unified knowledge graph for data quality measurements across a complex, heterogeneous, quality-centric data production pipeline. It provides the first comprehensive formal mappings between semantic models of data quality dimensions defined by the four International Organization for Standardization (ISO) and World Wide Web Consortium (W3C) data quality standards applied by different tools and stakeholders. It provides an approach to uplift rule-based data quality reports into quality metrics suitable for aggregation and end-to-end analysis. Current industrial practice tends towards stove-piped, vendor-specific and domain-dependent tools to process data quality observations. However, there is a lack of open techniques and methodologies for combining quality measurements derived from different data quality standards to provide end-to-end data quality reporting, root cause analysis or visualization. This work demonstrates that it is effective to use a knowledge graph and semantic web standards to unify distributed data quality monitoring in an organization and to present the results in an end-to-end data dashboard in a data quality standards-agnostic fashion for the Ordnance Survey Ireland data publishing pipeline.

Minor Revision

Solicited Reviews:
Review #1
By Umutcan Serles submitted on 23/Sep/2022
Review Comment:

The paper sufficiently addresses my main concerns about the previous version.

It would still be beneficial to do a final proofread, as I have seen some instances of "et al." written as "etal", and the first paragraph of the evaluation mentions Section 3 twice.

Review #2
By Julian Rojas submitted on 28/Sep/2022
Review Comment:

All my previous comments were addressed, and I think the rewriting applied to the paper has increased its quality and clarity.

I recommend accepting the paper.

A couple of typos found:
- Page 22, Line 46: "...naively..." → "...natively...".
- Page 23, Line 1: Missing closing parenthesis.

Review #3
By David Chaves-Fraga submitted on 04/Oct/2022
Minor Revision
Review Comment:

First of all, I would like to thank the authors for their effort in accommodating my previous comments and improving the quality of the paper. However, in my opinion the paper requires another round of review, as I still have the following concerns:

1) Many sentences are still very long and hard to follow, even though they are key to understanding the paper. For example, the 9th paragraph of the introduction (contribution description) is a single long sentence packed with technical terms. I would encourage the authors to re-review the text and simplify its sentences (better to be clear and concise) to improve readability, and to avoid adding complexity through concepts or ideas that are not well introduced or explained in the text. The paper should be self-contained.

2) A motivating example, or a set of examples, is missing; it would clarify some of the explanations and make them easier to understand. I would suggest adding a specific real example together with the description of the use case in Section 2, so that it can be reused to support other ideas and explanations throughout the rest of the paper.

3) Review all repositories: neither a DOI nor a license is provided, so at present we do not know whether, or how, they can be reused.

4) Review the R2RML mappings for consistency. In some cases rr:class is used in the SubjectMap, while in others the class is declared using rdf:type in the predicate-object map (POM). There are ObjectMaps with templates that aim to generate an IRI but lack rr:IRI (so the engine would generate a literal). Listing 4 still contains RDF errors (e.g., the daq#value object, and the datatype for isEstimate, which is defined in the mapping rules).

5) Fine-grained contributions: in my previous review I was concerned about the number of contributions in the paper, but I was surprised to find that they have been removed in the new version. I would like to see the contributions of the work described in detail, but in a more concise and clear way.

6) A figure showing the general procedure is missing (perhaps Fig. 4 could be improved with more details); it should give an overview of all the steps and processes involved. In addition, other figures (e.g., Fig. 5) could be improved with more detail and better organization (it is difficult to see that there are arrows from data access to data principles).