Observational/Hydrographic data of the South Atlantic Ocean published as LOD

Tracking #: 2583-3797

Authors: 
Marcos Zárate
German Braun
Mirtha Lewis
Pablo R. Fillottrani

Responsible editor: 
Armin Haller

Submission type: 
Dataset Description
Abstract: 
This article describes the publication of occurrences of Southern Elephant Seals Mirounga leonina (Linnaeus, 1758) as Linked Open Data in two environments (marine and coastal). The data constitutes hydrographic measurements of instrumented animals and observation data collected during census between 1990 and 2017. The data scheme is based on the previously developed ontology BiGe-Onto and the new version of the Semantic Sensor Network ontology (SSN). We introduce the network of ontologies used to organize the data and the transformation process to publish the dataset. In the use case, we develop an application to access and analyze the dataset. The linked open dataset and the related visualization tool turned data into a resource that can be located by the international community and thus increase the commitment to its sustainability. The data, coming from Peninsula Valdés (UNESCO World Heritage), is available for interdisciplinary studies of management and conservation of marine and coastal protected areas which demand reliable and updated data.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 23/Oct/2020
Suggestion:
Minor Revision
Review Comment:

Congratulations to the authors: they have greatly improved the description of their dataset. I have just two comments.

I understand the modelling choice proposed by Reviewer 2: define each specific dive as its own Feature of Interest (FoI), and define one specific observable property per FoI. An alternative would be to represent each dive as a sample of a generic FoI.
Reviewer 2's choice is understandable only if the link between each property and its associated FoI is specified. Unfortunately, this link does not appear in Figure 2. Note also that the dataset contains only the isFeatureOfInterestOf link and not its inverse.
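
The missing inverse link could be materialised mechanically. The following is only a minimal sketch in Python, assuming triples are held as plain tuples; the `ex:` identifiers are hypothetical and do not come from the dataset. It relies on `sosa:hasFeatureOfInterest` being the inverse of `sosa:isFeatureOfInterestOf` in SOSA.

```python
SOSA = "http://www.w3.org/ns/sosa/"
IS_FOI_OF = SOSA + "isFeatureOfInterestOf"
HAS_FOI = SOSA + "hasFeatureOfInterest"


def add_inverse_foi_links(triples):
    """Return the input triples plus one sosa:hasFeatureOfInterest triple
    for every sosa:isFeatureOfInterestOf triple found."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate == IS_FOI_OF:
            # (dive isFeatureOfInterestOf obs) implies (obs hasFeatureOfInterest dive)
            result.add((obj, HAS_FOI, subject))
    return result


# Hypothetical identifiers, for illustration only.
dive = "ex:dive/1"
observation = "ex:observation/1"
closed = add_inverse_foi_links({(dive, IS_FOI_OF, observation)})
```

With both directions present, a client can navigate from an observation to its dive without knowing which direction the publisher chose to assert.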

My other concern is the lack of precision in the observable properties: temperature, location, and depth.
A dive takes place over an interval of time.
Temperature, depth, and location measurements are instantaneous: each value is measured at a single instant.
It is therefore not clear what these measurements mean over the interval of the associated dive.
The properties should be average temperature and average depth.
I do not know exactly what the location of a column of water is. Perhaps it should be the centroid of the column, or the average of all the locations measured during the dive.
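
To make the suggestion concrete, here is a minimal sketch in Python of how per-dive averages and a centroid could be derived from instantaneous samples. The function name, the tuple layout, and the sample values are assumptions for illustration, not the authors' pipeline. The centroid is a plain arithmetic mean of longitude and latitude, which is only a reasonable approximation for short tracks that do not cross the antimeridian.

```python
from statistics import fmean


def summarise_dive(samples):
    """Aggregate instantaneous readings taken during one dive.

    samples: list of (temperature_c, depth_m, lon, lat) tuples.
    """
    if not samples:
        raise ValueError("a dive needs at least one sample")
    temps, depths, lons, lats = zip(*samples)
    return {
        "avg_temperature_c": fmean(temps),
        "avg_depth_m": fmean(depths),
        # Planar approximation of the centroid of the sampled positions.
        "centroid_lon_lat": (fmean(lons), fmean(lats)),
    }


# Hypothetical readings, for illustration only.
summary = summarise_dive([
    (10.0, 100.0, -64.0, -42.0),
    (12.0, 300.0, -64.2, -42.2),
])
```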

I understand that the authors have tried to find their observable properties in one of the NERC vocabulary collections.

I found only two URIs concerning depth, and neither of them is the average depth. If the authors cannot find a suitable URI in the NERC vocabulary, they should create their own properties (with precisely defined meanings: average depth, average temperature, and centroid) and link them to the NERC vocabulary using a skos:broader link.

MAX. DEPTH BELOW SEA SURFACE --
URI http://vocab.nerc.ac.uk/collection/P09/current/MAXD/

MIN. DEPTH BELOW SEA SURFACE --
URI http://vocab.nerc.ac.uk/collection/P09/current/MIND/
Identifier () SDN:P09::MIND
Preferred label (en) MIN. DEPTH BELOW SEA SURFACE

Review #2
By Simon Cox submitted on 26/Nov/2020
Suggestion:
Minor Revision
Review Comment:

This paper is a nice description of an operational system for observation data that is delivered as linked-data using a selection of 'standard' ontologies and RDF vocabularies. One of the ontologies - BiGe-Onto - was developed by the authors (and has been previously described elsewhere) but otherwise the paper shows how a sophisticated application can be developed by judicious re-use of existing components, denoted using their original URIs. The focus is on the 'linked data' aspect of the system: there is no description of any reasoning or inferences, and thus no analysis of whether the ontologies in this somewhat heterogeneous collection are consistent with each other. But from an interoperability point-of-view the exercise appears to have been successful. I find the description highly compelling, and I believe it is useful for such a pragmatic yet principled example of ontology re-use to be reported to the semantic web community. While the work is not particularly original in terms of semantics, and the evaluation is rather anecdotal, the descriptions of the use of the ontologies and vocabularies and SPARQL etc. here would be too much for a report to a domain-oriented journal, so I think it is appropriate as a 'practice' paper in SWJ.

The authors have responded well to comments on an earlier version of the manuscript and appear to have fully assimilated the adaptations to the use of standard ontologies and vocabularies suggested by reviewers.

I have just a few additional minor suggestions for improvement:

1. The tabulation of reused ontologies and namespaces (Table 3) appears to be incomplete:
a. it does not mention QUDT, either the schema http://qudt.org/schema/qudt/ or the vocabularies http://qudt.org/vocab/unit/ etc
b. in the text it is clear that the NERC vocabulary S10 is used as well as P01, but this is not in the Table

2. The references to QUDT in the text (p4) appear to be to v1 vocabularies. There have been a lot of improvements in the QUDT catalogue, including the use of a more consistent naming pattern - e.g. http://qudt.org/vocab/unit/M http://qudt.org/vocab/unit/DEG_C

3. In the caption to Table 4 should the description of the second part of the table say 'occurrence' rather than 'observation' data?

4. In two places items from the NERC vocabulary service are denoted by SeaDataNet URNs (e.g. SDN:S10::S106) rather than NERC URIs (e.g. http://vocab.nerc.ac.uk/collection/S10/current/S106/ ). This should be clarified.

5. The data is issued under a CC-BY license. Why not CC0? If the system is configured so that proper metadata is included in every payload when dereferencing the linked-data URI, then 'attribution' is completely under the control of the URI and data provider.

6. SSN and SOSA are credited through the reference to the W3C recommendation. Suggest also citing at least one of the journal papers - https://doi.org/10.1016/J.WEBSEM.2018.06.003 and https://doi.org/10.3233/SW-180320
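
On point 4, the two notations appear to map onto each other mechanically. The sketch below, in Python, assumes the pattern suggested by the examples quoted in these reviews (`SDN:<collection>::<concept>` becoming `http://vocab.nerc.ac.uk/collection/<collection>/current/<concept>/`); the function name is hypothetical, and the pattern should be checked against the NERC Vocabulary Server documentation before being relied on.

```python
import re

# SeaDataNet URN of the form SDN:<collection>::<concept>, e.g. SDN:S10::S106
_SDN_URN = re.compile(r"^SDN:([A-Za-z0-9]+)::([A-Za-z0-9_\-]+)$")


def sdn_urn_to_nerc_uri(urn):
    """Convert a SeaDataNet URN to the corresponding NERC 'current' URI.

    Assumes the URN-to-URI pattern inferred from the examples in the reviews.
    """
    match = _SDN_URN.match(urn.strip())
    if match is None:
        raise ValueError(f"not a SeaDataNet URN: {urn!r}")
    collection, concept = match.groups()
    return f"http://vocab.nerc.ac.uk/collection/{collection}/current/{concept}/"


uri = sdn_urn_to_nerc_uri("SDN:S10::S106")
```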

Review #3
By Dalia E Varanka submitted on 30/Nov/2020
Suggestion:
Minor Revision
Review Comment:

The paper presents a project for publishing linked open data (LOD) from the monitoring of Southern Elephant Seals in marine and coastal environments across time scales and geographical space. It offers high-quality observations at the individual level and propagates those numbers into semantic contexts. The dataset is described in the context of Semantic Web vocabularies and practices for LOD. The project can be explored, and instructions for dereferencing can be found, at: http://linkeddata.cenpat-conicet.gob.ar/. A use case for analysis is presented.
I recommend publication based on my review that follows. I sought three general contexts for evaluating the paper: science, code, and institutional reviews. The paper begins by describing the environmental objectives for the data: physical (hydrographic and bio-observational) and temporal (annual, seasonal, and minute) capture of the data about the agents and activities.
The description of the technical implementation was clear and seemed complete. I praise the inclusion of the manual data collection by humans, which is critical as metadata but often overlooked. Any challenges in mapping human observation to the ontology are particularly important, as these may be particularly informative or, conversely, error-prone, but none are described. The description of the automated sensor representation is included and informative, as is the attribution or publication data.
The following information, which I would want to see about a LOD dataset, was mostly omitted.
** Different vocabularies are reused, but how does that influence the semantics? The selection and linking of ontologies should be justified: for example, why use the Relations Ontology (RO) and GeoSPARQL, yet ignore the spatial reasoning available in GeoSPARQL? Is it not supported? Why that bibliographic ontology? The discussion of DwC in Section 4.1.4 is a good example of addressing the semantic concepts of a vocabulary, not just describing it. Does the ontology support any reasoning?
** How was the scope of ontology linkages determined? For example, I notice in Fig. 2 that some metadata, the sensor brand and model, is stored as owl:comment. A ‘comment’ relation seems very general for something as important as equipment. Was this because of ontology limitations?
** Use of owl:sameAs in ontology linking may be a problem in the dataset because of the known issue that the relation must be reflexive, symmetric, and transitive. For example, in Fig. 4, base:geometry/point_-64.36644_-42.23526 is linked by owl:sameAs to geonames:3863776. However, the coordinates in GeoNames and those stored in the base dataset as geo:asWKT do not match.
** What insights became evident from the study that could improve the development of LOD for this or similar studies in general? This question is important for data-driven science.
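
A coordinate mismatch of the kind noted under owl:sameAs could be caught before publication with a simple distance check. This is a rough sketch in Python, not a description of the authors' tooling: the function names and the 0.1 km tolerance are assumptions, and the second point in the usage example is hypothetical rather than the actual GeoNames coordinate. It assumes WKT points in longitude-latitude order.

```python
import math
import re

# Matches the POINT(...) part of a WKT literal, with or without a CRS prefix.
_POINT = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)", re.IGNORECASE
)


def parse_wkt_point(wkt):
    """Return (lon, lat) from a WKT POINT literal."""
    match = _POINT.search(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres on a spherical Earth."""
    radius_km = 6371.0088
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def plausibly_same_place(wkt_a, wkt_b, tolerance_km=0.1):
    """True if two WKT points lie within the (assumed) tolerance."""
    return haversine_km(*parse_wkt_point(wkt_a), *parse_wkt_point(wkt_b)) <= tolerance_km


base_point = "POINT(-64.36644 -42.23526)"   # from the URI quoted in the review
other_point = "POINT(-64.36644 -43.23526)"  # hypothetical, one degree further south
ok = plausibly_same_place(base_point, other_point)
```

Pairs that fail such a check are candidates for a weaker link than owl:sameAs, or for correction of one of the two geometries.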