Pioneering Easy-to-Use Forestry Data with Forest Explorer

Tracking #: 2480-3694

Guillermo Vega Gorgojo
José M. Giménez-García
Cristóbal Ordóñez
Felipe Bravo

Responsible editor: 
Christoph Schlieder

Submission type: 
Tool/System Report
Forest Explorer is a web tool that can be used to easily browse the contents of the Cross-Forest dataset, a resource containing the forestry inventory and land cover map from Spain. The tool can run on any device with a modern browser without requiring any further installation. The user interface completely hides the complexity of RDF, OWL or SPARQL from the user. An interactive map is provided, allowing users to navigate to the area of interest and presenting forestry data with different levels of detail according to the zoom level. Forest Explorer offers different filter controls and is localized to English and Spanish. All the data is retrieved from a SPARQL endpoint containing the latest versions of the Spanish forest inventory and land cover map. Forest Explorer uses a cache and smart geographic querying to limit data exchanges with the endpoint. A live version of the tool is freely available to anyone who wants to try it. Since December 2019, more than 1,800 users have employed Forest Explorer and it has appeared 12 times in the Spanish media. Future work includes the integration of new datasets from other countries and supporting advanced forest management capabilities.

Major Revision

Solicited Reviews:
Review #1
By Erwin Folmer submitted on 03/Jun/2020
Minor Revision
Review Comment:

The Forest Explorer is a web-based tool for the visualisation and analysis of forest data; while it uses Linked Data technology, it suppresses the complexity of the underlying technology (RDF, OWL, SPARQL).
The paper is easy to read, well written and structured. I could have lived without the Spanish Media listing…it doesn’t add much for me.
The tool has a good GUI, and in practice it works well. It seems to be based on (and to reuse) existing linked data standards and vocabularies.
I am impressed by the detail and quality of the dataset (a pity that we don’t have it in the Netherlands). I would have liked to learn more about the implications of the size of the dataset: which triplestore is used, caching, spatial indexing, performance issues, metrics… (I understand that the tool is intended as a front end on top of a SPARQL endpoint, so in a way the data provisioning is not part of the tool, but the tool probably imposes requirements that the SPARQL endpoint must meet.)

I completely agree with the authors that there is a great need for tools for browsing and analyzing linked data without having to be a SPARQL professional. Tools for lay users, data analysts and domain experts are needed, and the Forest Explorer seems a very interesting tool for that purpose.
The real challenge, however, is the ease of applying the same tool to other linked data sets, first of all to forestry datasets from other countries (which is presented as further work). Will it work by just changing the SPARQL endpoint? How much configuration is needed to accommodate changes in the data model of the other forest-related datasets? I don’t find requirements about the general applicability of the tool. I also don’t find any evidence that this tool might be used outside the context of forest data.
In my opinion, this tool has limited relevance (for SWJ) and value if it only works easily on Spanish forest data. In (my) practice, we very often see big issues arise when we try to apply a generic linked data tool to our specific (spatial) linked data sets.

So I don’t question the tool itself for this specific purpose of forest data in Spain: it is excellent. However, I am worried about its general applicability as a linked data explorer tool.
I would strongly encourage the authors to add a part about the general applicability of the tool and, even better, to provide evidence (use cases) of the application of the tool to other spatial data. If examples of spatial SPARQL endpoints are lacking, I would be more than happy to provide a SPARQL endpoint with spatial government data from the Netherlands.

Review #2
Anonymous submitted on 29/Jun/2020
Major Revision
Review Comment:

The authors propose a Web-based tool named Forest Explorer for exploring and visualising forestry data.
The tool heavily uses Semantic Web technologies as an underlying layer, combining forestry data from various sources, but the UI is user-friendly and does not require prior knowledge of SPARQL from the end-users. The tool has already been used by a large number of users.

Main remarks
* The authors mention that "Both datasets have been connected with relevant external sources, such as DBpedia". How? Using an interlinking tool? Which one? Please provide an example showing the linked entities and explain the methodology.

* Only SPARQL templates are mentioned. What about GeoSPARQL?

* I believe that some more information about the underlying technologies used to support the presented architecture is missing. For example, did you use any of the existing tools for converting data into RDF with geospatial support? What is the underlying triple store? Does it include geospatial support? If not, why not? How are the datasets combined/connected/interlinked (see the first comment)?

* Expand Table 1 with some metrics about each dataset (size, number of triples, number of geometries, type of geometries) or create a new table.

* Discussion: Please include a "lessons learned" paragraph about the pros and cons of adopting Semantic Web technologies for the described use case. Some things are already mentioned in the discussion (regarding the use of GISs) but I think that this part is worth expanding.

* What about evaluation?

Minor comments

* "Wikipedia, the Plant list": include the footnote index before the comma.

My general opinion of the presented approach is positive. I believe that the tool is very useful, and the fact that it has already been adopted by a user community is a strong asset. Novelty is borderline, however, and several improvements could be made to the approach (e.g., using state-of-the-art Semantic Web approaches, such as more GeoSPARQL). Although the authors highlight the geospatial dimension as a contribution of this work in comparison to related work, more actual use of geospatial functionalities (e.g., GeoSPARQL queries) is expected. I find the described approach quite limited in this respect, as the only geospatial feature (apart from the map) is filtering features to the map bounds, which is a spatial selection. If more geospatial operations are supported, they should be described in detail; otherwise, the design choices should be explained further.

Review #3
By Martin Tomko submitted on 22/Jul/2020
Major Revision
Review Comment:

This manuscript was submitted as 'Tools and Systems Report' and should be reviewed along the following dimensions: (1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided). (2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

The manuscript, “Pioneering Easy-to-Use Forestry Data with Forest Explorer”, describes a Web-based system exposing the Spanish Cross-Forest dataset. The Cross-Forest dataset is a geocoded data resource capturing multi-faceted data about forests. This dataset appears to be directly implemented as a Linked Data resource, yet this is not entirely clear. The Web-based implementation of the system uses Linked Data resources to present the data from the Cross-Forest dataset to users who are not experts in Semantic Web technologies. As such, this manuscript is in scope for SWJ and has the potential to inform readers facing this common problem: exposing Semantic Web data to domain experts without Semantic Web technology expertise.
I am reviewing this paper as a Tools and Systems Report. As such, my focus is on the novelty and quality of the described technical implementation, and not as much on the novelty of the problem answered or the scientific contribution of the manuscript.

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).

In general, this is a well-written, clearly articulated report. It was easy to follow and to understand how the system works. Yet, I have concerns about the manuscript from the perspective of the readership of the Semantic Web journal (and at times, from the perspective of the Geo-SW readership). In other words, I think this topic and the manuscript have potential, but I feel that the focus needs to be better tailored to the expected readership, following the points below:

1. Articulate clearly what the needs and requirements of the users of the system are. How have these been identified? Currently, the feature requirements of the system are discussed (Section 2, summarized as R1-R6), but not the reason to have the system at all. Who is using it? For what real-life application? How did the introduction of the system benefit the users, possibly in contrast to previous means of undertaking the task? Answering these questions would help the readership evaluate the benefit of Semantic Web technologies for this domain, which is a desirable feature.
2. Better articulate what the purpose of this system is, and why it absolutely must be powered by semantic technology/linked data (or at least, what advantages this brings). I would recommend returning to this in the discussion and relating it to limitations (whether from the perspective of performance, maintainability, or development effort).
3. I am currently also unclear about how this system came to be: was the Cross-Forest dataset already an available Linked dataset, with the system put together to interface with it, or was the dataset adapted from some other legacy dataset into a Linked dataset for the purpose of this project? What was the rationale if the latter is the case? Are there features of Semantic Web technology that the functionality (now or in the future) needs? From what I can tell, there is a single feature that may somewhat warrant this, and it is skimmed over in a single paragraph: the linkage with DBpedia for the tree species. This is actually an exciting and nice demonstration of semantic linkage (and could possibly be further expanded by reasoning, for instance between soil types and species), yet it is very much neglected in the discussion. I believe that this paper can be much more than a simple report on the implementation (which currently does not expose much advantage in using Semantic Web technology). I see that this potential may be exploited further for the multilingual capabilities or the integration with Portuguese datasets (hinted at in the paper).
4. Finally, the spatial capabilities are discussed very briefly, with most of the focus on the query caching and bounding box (BBOX) filtering. The caching is not particularly specific to the Linked Data context. When it comes to BBOX querying, GeoSPARQL provides substantial capabilities (such as sfWithin), yet all the demonstrated BBOX queries are implemented as simple interval queries, which are actually semantically inefficient and inexpressive. Why would this be? What about issues with coordinate transformations: have these been encountered, or is WGS84 the only supported system? Does this affect Spanish practitioners?
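
The contrast drawn in point 4 can be made concrete with a minimal sketch. It builds the same map-bounds selection twice: once as an interval filter on point coordinates and once with the GeoSPARQL function geof:sfWithin. The query shapes, the BBox helper and the choice of the W3C WGS84 vocabulary for the interval variant are assumptions made for illustration only; they are not the templates used by Forest Explorer, and the GeoSPARQL variant presupposes an endpoint with GeoSPARQL support.

```python
from dataclasses import dataclass

# Standard namespace IRIs; everything else below is hypothetical.
PREFIXES = """PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX wgs: <http://www.w3.org/2003/01/geo/wgs84_pos#>
"""


@dataclass(frozen=True)
class BBox:
    """Map bounds in WGS84 decimal degrees (illustrative helper)."""
    west: float
    south: float
    east: float
    north: float

    def to_wkt(self) -> str:
        # Closed ring in longitude/latitude order (CRS84, the default
        # reference system of a geo:wktLiteral without an explicit CRS IRI).
        w, s, e, n = self.west, self.south, self.east, self.north
        return f"POLYGON(({w} {s}, {e} {s}, {e} {n}, {w} {n}, {w} {s}))"


def interval_query(b: BBox) -> str:
    # Bounding box expressed as four numeric comparisons on point coordinates.
    return PREFIXES + f"""SELECT ?feature WHERE {{
  ?feature wgs:lat ?lat ; wgs:long ?lon .
  FILTER(?lat >= {b.south} && ?lat <= {b.north} &&
         ?lon >= {b.west} && ?lon <= {b.east})
}}"""


def geosparql_query(b: BBox) -> str:
    # Same selection stated as a topological relation between geometries.
    return PREFIXES + f"""SELECT ?feature WHERE {{
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{b.to_wkt()}"^^geo:wktLiteral))
}}"""


if __name__ == "__main__":
    bounds = BBox(west=-4.8, south=41.6, east=-4.7, north=41.7)
    print(interval_query(bounds))
    print(geosparql_query(bounds))
```

The interval form only works for features described by a single point, whereas the sfWithin form applies to any geometry type (for example, land cover polygons) and leaves the evaluation strategy, including any spatial index, to the triple store.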

I hope these few points emphasize how this paper could be modified to maximise impact on Linked data practitioners facing the need to implement a similar user-facing Web-based system.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

The overall functionality of the tool is well described. The illustrations are adequate. I feel certain tables are not necessary (such as the one summarizing newspaper feedback). There are a few structural changes I would suggest:

- I would expand on the background section, reviewing current approaches to similar Web-based semantic-powered tools. This is currently done in the discussion section, and I fear it is not the best place. Some of the paragraphs from the discussion (from the paragraph starting “Some systems” on p. 9 up to, but not including, the paragraph starting “To wrap up” on p. 10) could be moved forward in the paper to form a traditional background section. Also note the earlier comment about framing the need for this system.
- I believe that the Abstract should be rewritten: it currently describes, to a large extent, the system’s appearance and its popularity, instead of focusing on the technological aspects and the problem it solves.
- The description of the system usage on page 9 focuses on the number of sessions and average time, instead of on what tasks this system has been able to solve and how it has, say, minimized the efforts of people who were compiling these data before, or of users waiting for experts to process reports. This would demonstrate true impact.
- Figure 1 has a large size for relatively little content. Could it be redone to give more insight into the technical implementation, for instance? What environment is this running in, what is the software stack, how is the messaging done, etc.? I emphasize that the single most interesting paragraph in the manuscript, for this reviewer, was the short paragraph under this figure discussing the linkage with DBpedia.