Review Comment:
Overall Assessment
This paper introduces OneForestKB, a standards-based semantic knowledge base for forestry data curation and exploitation, integrating GeoSPARQL and SHACL for geospatial validation and ecological enrichment. The two use cases based on BAFOG datasets in French Guiana are well chosen and illustrate the system’s practical value for the environmental sciences. The authors’ decision to build on existing vocabularies, rather than designing a new ontology, is welcome and in line with FAIR principles.
The paper has the potential to make meaningful contributions to the geospatial semantic web and environmental informatics communities. However, several critical modeling, implementation, and clarity issues currently affect the soundness and reproducibility of the work. I recommend major revision to correct these issues before acceptance. Below I outline the most pressing concerns along with specific suggestions for revision.
Major Comments
•The discussion in Sections 1–2 mentions related efforts such as the TERN ontology, Cross-Forest / Forest Explorer, and FooDS. These comparisons are currently too brief. Since the core contribution of this paper is a “semantic profile” that combines existing standards for forestry, I strongly suggest expanding the Related Work section with a feature comparison table that highlights the coverage (e.g., observations, taxon handling, spatial validation, inference services) across these efforts and clearly shows what OneForestKB adds or improves.
•As currently modeled, trees are typed as gemet:8664 (Listing 2), but GEMET concepts are SKOS terms and not OWL classes. Using rdf:type gemet:8664 is semantically incorrect and will lead to compatibility issues in reasoning. I recommend replacing this usage by assigning trees to more appropriate classes such as sosa:FeatureOfInterest, geo:Feature, or dwc:Organism. If linking to GEMET is still desirable, use dct:subject or skos:closeMatch.
•Listing 1 on page 5 appears to include a sosa:ObservationCollection with no explicit subject and uses it to declare properties such as sosa:resultTime, qudt:unit, and sosa:observedProperty, which are typically attached to individual observations rather than collections. Please provide valid Turtle syntax with an explicit IRI (e.g., guyafor:oc1) or a blank node for the observation collection. Also clarify whether these properties are inherited by member observations. If so, include an example showing this pattern explicitly; otherwise, move these properties back to the sosa:Observation instances to avoid confusion.
• Several SHACL-AF rule listings in Appendix C are incomplete or inconsistent. For instance, Listing 7 is labeled “total trees” but contains a construct that builds ex:hasAaIndex grouped by family. In contrast, the later rule for relative abundance (Listing 11) depends on ex:hasTotalTrees, which is never defined. I recommend that the authors:
* Provide a complete and correct construct for total tree count (ex:hasTotalTrees),
* Clearly separate Aa (absolute abundance) from Ar (relative), and
* Align property usage and grouping levels (species vs. family) with the definitions in the main text.
•Section 4.1 reports that 279 invalid geo:sfWithin statements were found, while the conclusion states “260 trees … erroneous geospatial facts.” It is unclear whether the latter figure refers to unique trees or to triples. Please clarify how these numbers are calculated. Are they counts of invalid triples, distinct trees, or some other metric? If some filtering or deduplication is applied, briefly describe the criteria used.
•The system uses RDF4J, which does not natively support GeoSPARQL functions such as geof:sfWithin. It is unclear whether a custom function plugin, external spatial engine, or fallback is used, and no performance metrics are provided. I suggest explicitly describing how GeoSPARQL functions are executed in your pipeline (e.g., through RDF4J extension libraries, external SPARQL endpoints, etc.). Also, as the paper refers to "global forestry data", please include some discussion of performance and scalability: how long do the validation and inference steps take, and how do you plan to scale beyond the current 146k triples?
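To make the separation requested for Listings 7 and 11 concrete, here is a minimal sketch in plain Python (not SHACL-AF, and not the authors' implementation) of the computation I would expect the rules to express. It assumes that absolute abundance (Aa) is the number of trees per taxon in a plot and that relative abundance (Ar) is Aa divided by the total number of trees in that plot. The function name, the record keys, and the sample plot are hypothetical and should be aligned with the definitions in the main text.

```python
from collections import Counter


def abundance(trees, level="species"):
    """Return (total, Aa, Ar) for one plot.

    trees: iterable of dicts with hypothetical keys "species" and "family".
    level: grouping level; it must match the definition used in the main text.
    """
    counts = Counter(tree[level] for tree in trees)  # Aa: trees per taxon
    total = sum(counts.values())  # plays the role of ex:hasTotalTrees
    if total == 0:
        return 0, {}, {}
    relative = {taxon: n / total for taxon, n in counts.items()}  # Ar = Aa / total
    return total, dict(counts), relative


# Hypothetical plot with four trees (illustrative data only).
plot = [
    {"species": "Dicorynia guianensis", "family": "Fabaceae"},
    {"species": "Dicorynia guianensis", "family": "Fabaceae"},
    {"species": "Eperua falcata", "family": "Fabaceae"},
    {"species": "Euterpe oleracea", "family": "Arecaceae"},
]

total, aa, ar = abundance(plot, level="species")
total_f, aa_f, ar_f = abundance(plot, level="family")
```

The point of the sketch is that the total count is computed once and then reused by the relative measure, and that the grouping level is an explicit parameter rather than something that silently differs between listings.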
Minor Comments and Clarifications
•The use of sh:zeroOrMorePath geo:sfWithin assumes that sfWithin is transitively closed, but GeoSPARQL does not appear to define it as transitive. Please clarify this modeling choice and guard against cycles where appropriate.
• The paper states that "knowledge base," "ontology," and "knowledge graph" will be used interchangeably. For clarity, I suggest using “ontology” for the schema and “knowledge graph” for the populated data.
•In Appendix B, Listing 5 uses BIND(1.0 AS ?selected) to flag target nodes. Consider using a boolean value or rdf:type ex:SelectedFeature instead, and update your SHACL shape targeting accordingly.
•Typographical fixes:
* Page 9, Table 1: “total number of threes” → “trees”.
* Section 4.2, Figure 7: “Arecacea” → “Arecaceae”.
* Page 8 refers to “Figures 10 and 11” — this seems to be a typo and should likely refer to Figures 2 and 3.
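Regarding the minor comment on sh:zeroOrMorePath geo:sfWithin, the following plain-Python sketch illustrates the kind of cycle guard I have in mind. It is an illustration under stated assumptions, not a description of the authors' pipeline or of any SHACL engine. The `within` dictionary and the feature names are hypothetical stand-ins for asserted geo:sfWithin triples, and following those links transitively is exactly the modeling assumption the paper should justify.

```python
def containers(feature, within):
    """Collect every feature reachable from `feature` via asserted sfWithin links.

    within: dict mapping a feature to the features it is asserted to be within
            (hypothetical stand-in for geo:sfWithin triples).
    The starting feature itself is excluded from the result. The `seen` set
    ensures termination even when the assertions form a cycle.
    """
    seen = set()
    stack = [feature]
    while stack:
        current = stack.pop()
        for parent in within.get(current, ()):
            if parent not in seen and parent != feature:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical assertions containing an erroneous cycle between plot1 and subplot1.
within = {
    "tree1": ["subplot1"],
    "subplot1": ["plot1"],
    "plot1": ["subplot1"],
}

reachable = containers("tree1", within)
```

With such a guard, a cyclic pair of assertions is visited once rather than looping, and the cycle itself can be reported as a data-quality finding.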