Towards Provenance-Centric Spatial Data Supply Chains: A Review of Semantic Web Technologies

Tracking #: 4007-5221

Authors: 
Philip Langat
Arjun Neupane
Muhammad Azeem Sadiq

Responsible editor: 
Elena Demidova

Submission type: 
Full Paper
Abstract: 
Spatial data supply chains (SDSCs) require robust provenance mechanisms to ensure data quality, traceability, and interoperability across geospatial workflows. This study presents a systematic review of semantic web–based approaches to provenance modelling in SDSCs, synthesising evidence from 156 studies published between 2001 and 2025. The review evaluates the use of semantic technologies, including RDF, OWL, SPARQL, and GeoSPARQL, and benchmarks existing provenance models against criteria of granularity, scalability, and standards compliance. The findings reveal fragmented lineage practices, limited feature-level provenance representation, and persistent challenges related to real-time processing, interoperability, and scalability. To address these gaps, the study identifies the need for GeoPROV, a minimal and interoperable semantic framework that extends W3C PROV with spatial semantics while maintaining compatibility with ISO lineage standards and emerging catalogue specifications. GeoPROV can enhance trust in real-world spatial data ecosystems. The review concludes by outlining practical implications for operationalising GeoPROV in SDSCs, identifying research priorities for automated provenance capture and big-data scalability, and highlighting the role of semantic reasoning in improving trust, transparency, and reproducibility in spatial data governance.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 08/Apr/2026
Suggestion:
Major Revision
Review Comment:

This paper presents a systematic review of semantic web-based approaches in SDSCs and proposes a framework named GeoPROV that extends W3C PROV. The authors discuss limitations of existing work and provide insights into implications and potential research directions.

Strengths

S1: Good coverage of existing work

S2: Valuable insights into open challenges in the domain

Weaknesses

W1: For a review paper, previous works are not sufficiently introduced or discussed. The introduction relies heavily on tables, but the tables are not very detailed. The columns are not always intuitive and sometimes appear arbitrary without further explanation, and the keywords are unclear. More discussion is needed to give readers a clear overview of the research field.

W2: While the paper lacks an effective overview of existing work, it spends a large portion discussing gaps. Without a solid grounding in prior work, it is difficult to relate to or evaluate these gaps.

W3: The review lacks a clear structure.

W4: GeoPROV, as one of the main contributions, is not clearly described, and its importance and potential impact are not well justified.

More comments

Table 2 presents a list of themes that mixes application domains, techniques, and research challenges. A more structured categorization of research directions would improve clarity.

The paper should first summarize existing approaches and their innovations before discussing their limitations.

The choice of columns in Table 2 is not well motivated.

The labeling system in the “SDSC relevance” column in Table 2 is unclear. It contains tags: Core, High, Central, Moderate, Direct, SDSC case study, Frames SDSC process, etc. They are used without a clear definition or consistent system.

The column names of Table 3 are not clearly described.

In Section 3.2, Table 3 is referenced, but this seems incorrect.

The statement “However, findings summarized in Table …” is difficult to follow, and the connection between text and table needs to be better grounded.

Section 3.3 discusses versioning as one technique within provenance concepts and standards. It is unclear whether this is intended as an example or a main focus. A review paper should provide more balanced coverage of techniques in the field.

In Section 3.4, the first paragraph is too generic. Statements such as “recent developments incorporate big data integration” are not sufficiently informative. More concrete examples and references are needed.

The description in Section 3.5 is also too abstract, and no specific data models or frameworks are clearly explained.

Similarly, Section 4 focuses on limitations without sufficiently reviewing or grounding them in prior work.

The scales used for “Complexity level” in Table 7 and “Compliance level” in Table 8 are not clearly defined.

Many statements lack sufficient citations.

Review #2
Anonymous submitted on 21/Apr/2026
Suggestion:
Reject
Review Comment:

This paper presents a comprehensive systematic review of semantic web–based approaches for provenance modelling in spatial data supply chains (SDSCs), covering 156 studies published from 2001 to 2025. It evaluates the adoption of key semantic technologies such as OWL, RDF, SPARQL, and GeoSPARQL. The review highlights significant fragmentation in current provenance practices, particularly the lack of fine-grained feature-level representation and limitations in interoperability and real-time processing. Based on these findings, the authors motivate the need for a unified framework, proposing GeoPROV as a minimal, interoperable extension of W3C PROV. GeoPROV incorporates spatial semantics while remaining aligned with ISO and emerging metadata standards.

Strong points

1) The paper presents a comprehensive review, refining an initial pool of 734 publications down to 156 relevant studies. This selection provides an overview of existing approaches in the spatial data supply chains (SDSCs) field.
2) The authors introduce GeoPROV, an extension of W3C PROV with explicit spatial semantics while remaining compatible with ISO standards.
3) GeoPROV is designed as a lightweight framework, enabling machine-actionable spatial provenance while supporting reproducibility and cross-domain reuse.

Weak points:

1) The paper is neither well written nor well formatted. Its lack of clear structure makes it difficult to read and follow.
2) The section “methodology and protocol” is not described in sufficient detail, reducing transparency. For instance, the authors mention that some records were removed for “other reasons” during the selection process, but these reasons are not specified. Many records (n = 281) are excluded without a clear justification of the exclusion criteria.
3) The analysis approach lacks clarity, particularly regarding why only specific domains are considered and whether this introduces bias.
4) The rationale for the chosen time periods and binning (2007–2011, 2012–2014, 2015–2017, 2018–2025) in Section 2.2 is not adequately explained, making it unclear how these groupings meaningfully capture thematic shifts.
5) The research contribution is marginal.

Overall, the limitations of the paper outweigh its contributions, and I therefore recommend rejection.