OneForestKB: A Knowledge Base for Global Forest Data Curation and Exploitation Using Geospatial Services

Tracking #: 3841-5055

Authors: 
Felipe Vargas-Rojas
Vincent Armant
Isabelle Mougenot

Responsible editor: 
Guest Editors Geospatial Knowledge Graphs 2025

Submission type: 
Full Paper
Abstract: 
Global forests are critical for carbon sequestration and for facing climate change challenges. Forests are distributed across the planet, and both local and international organizations have led efforts to develop information systems to store and manage forestry data. Such data includes spatio-temporal observations and metadata about tree species and their traits, for example the trunk diameter at breast height. Forestry data is by nature heterogeneous, multi-scale, and multi-source. In practice, however, forest data often remains stored in local installations and is rarely shared as open data, thereby neglecting the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. The Semantic Web community has contributed several standards in related areas, such as phenotypic descriptions of species (OBO/PATO), spatial objects (GeoSPARQL), observations and measurements (SOSA), and units of measurement (QUDT), among others. However, to our knowledge, there is a lack of comprehensive studies demonstrating how these standards can be arranged and combined to facilitate the curation, reuse, and exploitation of forestry datasets. To address this gap, we propose the OneForest Knowledge Base (OneForestKB), which is based on a semantic profile and novel methods to validate and enrich forestry datasets. We follow a strategy of reusing existing ontologies, as the definition of a semantic profile suggests. The relevance of this strategy is demonstrated through two use cases in the Amazonian forest of French Guiana, showing how to improve data quality through spatial validation rules based on the W3C constraint language, SHACL, combined with the OGC standard GeoSPARQL; the SHACL rules are generalised and shared to be applicable to any system using the GeoSPARQL model. In addition, this work provides a set of data enrichment rules relevant to forestry studies, which make it possible to enrich geographic regions with calculated ecological indexes as proxies of biodiversity.
OneForestKB is designed to be extensible, allowing new validations and inferences to be added based on specific use cases.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 03/Jun/2025
Suggestion:
Minor Revision
Review Comment:

This study introduces a knowledge base, OneForestKB, that models and validates heterogeneous forestry data. It meets the scope and requirements of the Semantic Web journal for the following reasons:
The semantic profile is built upon established, widely used ontologies such as SOSA, GeoSPARQL, and OWL-Time, as well as domain-specific ontologies such as the Plant Ontology and the Environment Ontology.
OneForestKB uses spatial validation rules based on SHACL combined with GeoSPARQL, and provides a set of data enrichment rules relevant to forestry studies.
OneForestKB addresses specific challenges in the curation, reuse, and exploitation of forestry datasets.
OneForestKB provides open-access tools such as APIs, SPARQL endpoints, and a web interface.
The two case studies clearly illustrate the spatial validation and data enrichment of OneForestKB.
The data file is well organized and is shared properly.
For these reasons, I would recommend a minor revision for this manuscript with the following comments:
Page 2, line 33: This paragraph is a little abrupt; I recommend moving it after the description of the problems you want to resolve, which appears in the next paragraph.
Page 2, line 42: “But it ignores certain aspects of forestry datasets”: which aspects are these, and how “specialised” is the profile?
Page 10, line 25: Can you explain how? A more detailed introduction to how SHACL works would make this passage more accessible to readers who are not familiar with SHACL.
For Figure 1, I would recommend adding an overall semantic profile diagram that includes all relevant domain-specific ontologies. In addition, a fuller introduction to each of the domain-specific ontologies used would make the profile easier to understand and would help others reuse OneForestKB.
For the conclusions and future work, I would recommend adding a discussion of how OneForestKB could be extended (for example, by integrating more data sources); a more detailed description of how to cover more scenarios would also make the future work clearer.

Review #2
Anonymous submitted on 21/Sep/2025
Suggestion:
Major Revision
Review Comment:

Overall Assessment
This paper introduces OneForestKB, a standards-based semantic knowledge base for forestry data curation and exploitation, integrating GeoSPARQL and SHACL for geospatial validation and ecological enrichment. The two use cases based on BAFOG datasets in French Guiana are well chosen and illustrate the system’s practical value for the environmental sciences. The authors’ decision to build on existing vocabularies, rather than designing a new ontology, is welcome and in line with FAIR principles.
This paper promises to make meaningful contributions to the geospatial semantic web and environmental informatics communities. However, it would benefit significantly from revisions that address several critical modeling, implementation, and clarity issues affecting the soundness and reproducibility of the work. I recommend a major revision to correct these core modeling and reproducibility issues before acceptance. Below I outline the most pressing concerns along with specific suggestions for revision.
Major Comments
• The discussion in Sections 1–2 mentions related efforts such as the TERN ontology, Cross-Forest / Forest Explorer, and FooDS. These comparisons are currently too brief. Since the core contribution of this paper is a “semantic profile” that combines existing standards for forestry, I strongly suggest expanding the Related Work section with a feature comparison table that highlights the coverage (e.g., observations, taxon handling, spatial validation, inference services) across these efforts and clearly shows what OneForestKB adds or improves.

• As currently modeled, trees are typed as gemet:8664 (Listing 2), but GEMET concepts are SKOS terms and not OWL classes. Using rdf:type gemet:8664 is semantically incorrect and will lead to compatibility issues in reasoning. I recommend replacing this usage by assigning trees to more appropriate classes such as sosa:FeatureOfInterest, geo:Feature, or dwc:Organism. If linking to GEMET is still desirable, use dct:subject or skos:closeMatch.

• Listing 1 on page 5 appears to include a sosa:ObservationCollection with no explicit subject and uses it to declare properties such as sosa:resultTime, qudt:unit, and sosa:observedProperty, which are typically attached to individual observations rather than collections. Please provide valid Turtle syntax with an explicit IRI (e.g., guyafor:oc1) or a blank node for the observation collection. Also clarify whether these properties are inherited by member observations. If so, include an example showing this pattern explicitly; otherwise, move these properties back to the sosa:Observation instances to avoid confusion.
• Several SHACL-AF rule listings in Appendix C are incomplete or inconsistent. For instance, Listing 7 is labeled “total trees” but contains a construct that builds ex:hasAaIndex grouped by family. In contrast, the later rule for relative abundance (Listing 11) depends on ex:hasTotalTrees, which is never defined. I recommend that the authors:
* Provide a complete and correct construct for total tree count (ex:hasTotalTrees),
* Clearly separate Aa (absolute abundance) from Ar (relative), and
* Align property usage and grouping levels (species vs. family) with the definitions in the main text.
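To make the requested separation concrete, here is a minimal sketch in plain Python rather than SHACL-AF. It assumes the usual definitions, which the manuscript should confirm: Aa is the number of trees recorded per taxon in a region, and Ar is Aa divided by the total tree count of that region. The function names and the placeholder taxon labels are hypothetical; the variable total_trees only plays the role that ex:hasTotalTrees would play in the rules.

```python
from collections import Counter


def absolute_abundance(taxon_per_tree):
    """Aa (assumed definition): number of trees recorded for each taxon."""
    return Counter(taxon_per_tree)


def relative_abundance(taxon_per_tree):
    """Ar (assumed definition): share of each taxon among all trees."""
    counts = absolute_abundance(taxon_per_tree)
    # Total tree count of the region; stands in for ex:hasTotalTrees.
    total_trees = sum(counts.values())
    if total_trees == 0:
        return {}  # empty region: no shares to report
    return {taxon: n / total_trees for taxon, n in counts.items()}


# Hypothetical region: one taxon label per tree (placeholder names).
region = ["species_a", "species_a", "species_b", "species_c"]
aa = absolute_abundance(region)
ar = relative_abundance(region)
```

The total is computed once and then reused by the relative index, which is the dependency that the rule for Listing 11 needs to have defined. The same functions work at species or family level, provided the labels passed in are consistently of one level.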

• Section 4.1 reports that 279 invalid geo:sfWithin statements were found, while the conclusion states “260 trees … erroneous geospatial facts.” It is unclear whether these figures refer to unique trees or to triples. Please clarify how these numbers are calculated: are they counts of invalid triples, distinct trees, or some other metric? If some filtering or deduplication is applied, briefly describe the criteria used.
• The system uses RDF4J, which does not natively support GeoSPARQL functions such as geof:sfWithin. It is unclear whether a custom function plugin, an external spatial engine, or a fallback is used, and no performance metrics are provided. I suggest explicitly describing how GeoSPARQL functions are executed in your pipeline (e.g., through RDF4J extension libraries or external SPARQL endpoints). Also, as the paper refers to "global forestry data", please include some discussion of performance and scalability: how long do the validation and inference steps take, and how do you plan to scale beyond the current 146k triples?
Minor Comments and Clarifications
• The use of sh:zeroOrMorePath geo:sfWithin assumes that sfWithin is transitively closed, but GeoSPARQL does not appear to define it as transitive. Please clarify this modeling choice and guard against cycles where appropriate.
• The paper states that "knowledge base," "ontology," and "knowledge graph" will be used interchangeably. For clarity, I suggest using “ontology” for the schema and “knowledge graph” for the populated data.
• In Appendix B, Listing 5 uses BIND(1.0 AS ?selected) to flag target nodes. Consider using a boolean value or rdf:type ex:SelectedFeature instead, and update your SHACL shape targeting accordingly.
• Typographical fixes:
* Page 9, Table 1: “total number of threes” → “trees”.
* Section 4.2, Figure 7: “Arecacea” → “Arecaceae”.
* Page 8 refers to “Figures 10 and 11” — this seems to be a typo and should likely refer to Figures 2 and 3.
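
Regarding the minor comment on sh:zeroOrMorePath geo:sfWithin: the sketch below illustrates, in plain Python, what following asserted sfWithin statements over zero or more steps amounts to, with an explicit guard against cycles. The dictionary of asserted statements, the feature names, and the function name are all hypothetical, and this is not a description of how any particular SHACL engine evaluates paths.

```python
def containers(feature, within):
    """Features reachable from `feature` via zero or more asserted sfWithin steps."""
    seen = {feature}  # zero steps: the starting feature itself is included
    stack = [feature]
    while stack:
        current = stack.pop()
        for parent in within.get(current, ()):
            if parent not in seen:  # cycle guard: never revisit a feature
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical asserted statements: key sfWithin each listed value.
within = {
    "tree1": ["subplot1"],
    "subplot1": ["plot1"],
    "plot1": ["subplot1"],  # deliberately erroneous cycle in the data
}
reachable = containers("tree1", within)
```

Because visited features are tracked, the traversal terminates even when the data contain a cycle such as the one above. The result also shows that every asserted chain is treated as containment, which is exactly the transitivity assumption the authors are asked to justify.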