Review Comment:
This 25-page paper presents the Glosis ontology, which is used to harmonize the description of soil information and measurements.
The paper presents a large body of work on ontology design for data harmonization in the soil domain. Unfortunately, the ontology does not seem to be finished, and the documentation should be improved.
I have many questions, and some clarifications, examples and diagrams should be added.
I do not understand why some classes taken from the ISO standard are not replaced by equivalent classes in the Glosis ontology. ISO standards are not web ontologies. ISO classes are used in the dataset descriptions, which suggests that the Glosis ontology is not complete.
I am not sure I have understood the authors’ design choices.
I am not sure that the SOSA design pattern is used correctly in the Glosis ontology. Why are only samples described in the Glosis ontology?
Moreover, the authors’ vocabulary should be normalised. It is quite difficult to understand which type of data model is involved.
--------------------------------------------
The introduction presents the motivation, the international project and its stakeholders e.g. FAO.
The goal of the project is to build a common data model to exchange soil data between stakeholders. The soil data should also be harmonized. A global system should be able to query the harmonized datasets to help decision makers.
The ontology is specified to enable the harmonization of local, regional or national datasets describing soil content and measurements.
---
The state of the art presents several international standards for soil data exchange: ANZSoilML, INSPIRE, ISO 28258, OGC Soil IE, WoSIS, SOTER.
Some parts of them, such as controlled vocabularies or code lists, have been sources for the Glosis ontology.
Q1: The ISO 28258 standard was identified as a good input for the Glosis ontology. This choice is quite surprising, since the authors claim in the state of the art that this standard was never used because it seems too abstract. The authors should justify their choice.
The common elements shared between the different soil standards have been identified as part of the Glosis ontology. Unfortunately, these common elements are not documented with textual descriptions in the Glosis ontology. For example, the skos:note annotation property could be used to store the definitions coming from the different standards, and a skos:definition property could store the harmonized definition written by the Glosis project on the basis of those earlier definitions.
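To make this suggestion concrete, here is a minimal sketch of the documentation pattern I have in mind. It only uses the Python standard library and emits Turtle text. The element IRI, the list of source standards and all definition texts are placeholders of mine, not content taken from the Glosis ontology or from the standards.

```python
# Hypothetical sketch: the element IRI and every definition text are placeholders.
SKOS = "http://www.w3.org/2004/02/skos/core#"


def turtle_literal(text, lang="en"):
    """Escape a string as a Turtle literal with a language tag."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def document_element(iri, source_definitions, harmonized_definition):
    """Return Turtle that keeps each source definition as a skos:note
    and the harmonized text as the single skos:definition."""
    lines = [f"<{iri}>"]
    for standard, text in source_definitions.items():
        note = turtle_literal(f"[{standard}] {text}")
        lines.append(f"    <{SKOS}note> {note} ;")
    definition = turtle_literal(harmonized_definition)
    lines.append(f"    <{SKOS}definition> {definition} .")
    return "\n".join(lines)


turtle = document_element(
    "http://example.org/glosis#Horizon",  # placeholder IRI
    {
        "ISO 28258": "placeholder definition from the first standard",
        "INSPIRE": "placeholder definition from the second standard",
    },
    "placeholder harmonized definition",
)
print(turtle)
```

With such a pattern a reader can see both where a definition comes from and which wording the Glosis project finally adopted.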
Q2: I would appreciate it if the authors clearly identified the type of data model for each standard: pure XML data schema, relational database schema, object-oriented schema, graph-based schema or ontology (web data schema based on Semantic Web technologies). UML can be used to specify different types of data schema, so it does not help. Maybe the authors could indicate whether the UML data schema is implemented in a storage system or not. What should a UML data schema without a real implementation in a storage system be called?
For example, SoilIE is presented as an ontology (page 6 column 2 line 40) but then as an XML schema that is not totally compliant with Semantic Web technologies (page 7 column 1 line 11).
The paper uses several closely related expressions, such as domain model, data model, data semantic ontology, ontology and web ontology. Some standards are XML data models or UML data models, others are web ontologies. Thus the authors should normalize their vocabulary to make a clear distinction between UML data model, OWL ontology and SKOS model.
---
The methodology used for the ontology design is NeOn. Several NeOn scenarios were used. As far as I understand the paper, I have only recognised:
* scenario 2, where existing data models were transformed into an OWL ontology to form the basis of the Glosis ontology.
* scenario 3, which reuses existing ontologies. The Glosis ontology reuses well-known ontologies and vocabularies such as SOSA, GeoSPARQL, QUDT and SKOS.
Q3: Unfortunately, I do not see where scenario 7, that is to say the reuse of ontology design patterns, is applied in the ontology development process.
The ontology is modular and belongs to a network. The ontology is developed in an iterative and incremental way.
(Page 3 column 2 line 36) “Glosis domain model and web ontology” means that there exist two models: an abstract one and its implementation as an OWL ontology.
(Page 9 column 1 line 6) “GloSIS domain model was used as the base from which to derive the ontology.”
Q4: The authors should define what the domain model is. I understood only at page 9 that a UML diagram was designed first and then transformed into an OWL ontology.
I would appreciate a diagram that presents the whole design process, including the UML diagrams, the existing ontologies and the data model transformation.
Moreover, I am not sure about the division into modules. The profile class and its components, the layer and horizon classes, are separated into two distinct modules.
If the domain model is a kind of UML diagram, the documentation in the git repository should contain those diagrams.
---
The name of section 4 should be replaced. This section is not only about specification.
The requirements are listed precisely. All the elements from previous standards that need to be reused are identified.
Note that the SOTER data model is not presented in the article even though it is one of the reused standards.
In section 4, the terms “data model” and “ontology” are both used, which confuses me further.
Q5: (Page 8 column 2 line 12) “Codelists/federation of vocabularies/registries (ontologies) shall be developed for linking the data model with explicit soil body properties.”
I do not understand that point. Maybe the authors should mention “SKOS models” or “thesaurus” and differentiate them from web ontologies. To aid understanding, the authors should replace the word “property” with “characteristic”.
Q6: (Page 8 column 2 line 12) “Include vocabularies/registries (ontologies), but in an abstract form. This means that vocabularies may be added/modified/deleted without changing the domain model itself.” I do not understand: what is an abstract form of an ontology?
(Page 3 column 1 line 44) “abstract ontology”. Could you explain what is an abstract ontology?
Q7: I would appreciate it if the authors defined what an “observed property” is in section 4.1. The words “concept”, “attribute” and “observation” are all associated with observed property. The sentence is also confusing (Page 8 column 2 line 31). The next sentence (line 36) seems redundant.
Q8: Figure 3 is illegible. Thus I do not understand what a container class that links a spatial class with an abstract class is.
It seems that the ontology designer wants to make a choice based on the various standards:
Each soil feature has some characteristics (for example the size of a plot) that could be represented either as a datatype property (one RDF triple) or as a sosa:Observation graph (a set of RDF triples), because the value of the characteristic evolves over time or can be evaluated by several methods.
A CSV file is used to summarize all the characteristics of soil features represented in the various standards:
Q9: Each line represents a characteristic and each column represents a feature. The CSV helps to make the decision. But what is the decision? It is not clear to me.
As far as I understand the paper, a decision is reached for each characteristic about its representation: a datatype property or a sosa:Observation graph. Both representations cannot belong to the final Glosis ontology. Or maybe the authors decided to keep only the second representation, the sosa:Observation graph. Section 4.2 is not clear to me. I need an example.
The ontology designer could also adopt another solution: keep the sosa:Observation graph and add an object property that repeats and shortens the path between the feature and the value of the characteristic.
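The kind of example I am asking for could look like the following sketch. Triples are written as plain Python tuples, so no RDF library is assumed. The sosa: terms come from the SOSA vocabulary, while every ex: name, the shortcut property and the values are hypothetical and only serve the illustration.

```python
# Triples as (subject, predicate, object) tuples; all ex:* names are hypothetical.

# Representation 1: the characteristic as a single triple on the feature.
direct = {("ex:plot1", "ex:plotSize", "250.0")}

# Representation 2: the same characteristic as a sosa:Observation graph.
observation = {
    ("ex:obs1", "rdf:type", "sosa:Observation"),
    ("ex:obs1", "sosa:hasFeatureOfInterest", "ex:plot1"),
    ("ex:obs1", "sosa:observedProperty", "ex:plotSizeProperty"),
    ("ex:obs1", "sosa:usedProcedure", "ex:fieldSurvey"),
    ("ex:obs1", "sosa:hasResult", "ex:result1"),
}


def add_shortcuts(triples, shortcut_for):
    """Keep the observation graph and add, for each observation,
    one triple feature --shortcut--> result that shortens the path."""
    by_subject = {}
    for s, p, o in triples:
        # Sketch only: a repeated predicate on one subject would be overwritten.
        by_subject.setdefault(s, {})[p] = o
    shortcuts = set()
    for props in by_subject.values():
        if props.get("rdf:type") != "sosa:Observation":
            continue
        shortcut = shortcut_for.get(props.get("sosa:observedProperty"))
        feature = props.get("sosa:hasFeatureOfInterest")
        result = props.get("sosa:hasResult")
        if shortcut and feature and result:
            shortcuts.add((feature, shortcut, result))
    return triples | shortcuts


enriched = add_shortcuts(observation, {"ex:plotSizeProperty": "ex:hasPlotSize"})
for triple in sorted(enriched):
    print(triple)
```

Representation 1 is compact but loses the procedure and the time of the measurement. The shortcut variant keeps that context and still offers a one-step path from the feature to its value.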
I would appreciate it if the authors changed their vocabulary. In the end, I do not know what the word “property” refers to: RDF object property or datatype property, SOSA observable property, or soil property. The same word represents different notions.
The ShapeChange tool is used to transform the UML diagrams into an OWL ontology. The UML diagram contains classes from other standards that are not web ontologies. Thus, during the transformation, some mapping rules are expressed to indicate which class from an existing ontology will replace each UML class.
From my understanding, the UML diagram designed first is a relational database schema in which many attributes and cardinality constraints are represented. Note that I cannot read the UML diagram, so I have to imagine it. Those UML classes are aligned with OWL classes from existing ontologies.
Most classes are designed with restrictions.
Q10: I am surprised that so many restrictions and constraints are defined when the purpose is to integrate different datasets. What happens to the original datasets when they cannot fulfill those constraints? They cannot be part of the final knowledge graph.
Moreover, some of the constraints list the possible values, defined as skos:Concept instances.
Could the authors explain why they created so many constraints and how the original data are handled when those constraints cannot be fulfilled?
Unfortunately, the classes and instances are not well documented with natural language definitions. In the best case, there is just a label in English, which is not enough to understand the meaning of the OWL/RDF element. Some elements have a definition pointing to an HTML page of an ISO standard that is not freely available, so the link is not useful.
The authors should associate a textual description with each element using the rdfs:comment annotation property or the skos:definition property. Moreover, those natural language definitions should explain the constraints associated with the class, for example by mentioning mandatory object properties and so on. The authors could take inspiration from the definitions provided in the SOSA ontology.
Moreover, none of the design patterns is described in detail, which makes the ontology quite difficult to understand. More diagrams should be provided. If the authors like UML, they could have a look at the CHOWLK language to present some of their design patterns.
I tried to search in the VOWL interface, but when no textual definition is provided it is impossible to understand the graph.
Q11: I do not understand the distinction between the Fragments and Fragments Value classes.
Fragments should be renamed FragmentObservations. Where is the sample defined, that is, the soil fragment on which the chemical analyses are made? A fragment is composed of several horizons or layers.
Figure 4 suggests that all classes related to soil features are subclasses of sosa:Sample. Thus why not name them FragmentSample and so on? This is a strong hypothesis. How is the fact that a physical layer (or a physical horizon or a plot) can generate several samples represented? The SOSA ontology makes the distinction between the ultimate feature of interest (the whole soil area) and the sample extracted from the feature of interest.
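A small sketch of the distinction I mean, again with triples as plain Python tuples. sosa:Sample, sosa:isSampleOf and sosa:hasFeatureOfInterest are SOSA terms. The ex: names, including the ex:Horizon class, are hypothetical and do not come from the Glosis ontology.

```python
from collections import defaultdict

# One physical horizon, two samples taken from it, one observation per sample.
# All ex:* names are hypothetical.
triples = [
    ("ex:horizonA", "rdf:type", "ex:Horizon"),
    ("ex:sample1", "rdf:type", "sosa:Sample"),
    ("ex:sample1", "sosa:isSampleOf", "ex:horizonA"),
    ("ex:sample2", "rdf:type", "sosa:Sample"),
    ("ex:sample2", "sosa:isSampleOf", "ex:horizonA"),
    ("ex:obs1", "sosa:hasFeatureOfInterest", "ex:sample1"),
    ("ex:obs2", "sosa:hasFeatureOfInterest", "ex:sample2"),
]


def samples_by_feature(triples):
    """Group samples under the feature they were extracted from."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        if p == "sosa:isSampleOf":
            grouped[o].append(s)
    return dict(grouped)


print(samples_by_feature(triples))
```

If the horizon class itself is declared a subclass of sosa:Sample, this one-to-many link between a physical horizon and its samples can no longer be expressed in this way.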
Take care with the names of your classes: what does GL mean in GL_Horizon and GL_Layer?
Q12: In the end, I am not sure that the SOSA design pattern is well understood and reused in this ontology. For example, the class “gypsum weight” is defined as a subclass of sosa:Observation, whereas to me it is rather a subclass of sosa:ObservableProperty.
Q13: (Page 13 column 1 line 47) “There are few cases where sosa:observedProperty links the observation with a code-list.” This sentence is unclear to me.
The SOSA ontology does not contain an observedProperty class but an ObservableProperty class.
What is the difference between the authors’ observed property and a sosa:ObservableProperty?
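To illustrate how I read the SOSA pattern, here is a minimal sketch with triples as plain Python tuples. In SOSA, sosa:observedProperty is the property that links a sosa:Observation to a sosa:ObservableProperty. For brevity I model gypsum weight as an individual rather than as a subclass. All ex: names and the result value are hypothetical.

```python
# All ex:* names and the result value are hypothetical.
triples = {
    ("ex:gypsumWeight", "rdf:type", "sosa:ObservableProperty"),
    ("ex:obs1", "rdf:type", "sosa:Observation"),
    ("ex:obs1", "sosa:observedProperty", "ex:gypsumWeight"),
    ("ex:obs1", "sosa:hasFeatureOfInterest", "ex:sample1"),
    ("ex:obs1", "sosa:hasSimpleResult", "12.5"),
}


def untyped_observed_properties(triples):
    """Return the objects of sosa:observedProperty that are not declared
    as sosa:ObservableProperty in the same set of triples."""
    declared = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "sosa:ObservableProperty"
    }
    used = {o for s, p, o in triples if p == "sosa:observedProperty"}
    return used - declared


print(untyped_observed_properties(triples))

# An observation whose observed property points at an undeclared resource,
# for example a code list, is reported by the same check.
with_code_list = triples | {("ex:obs2", "sosa:observedProperty", "ex:someCodeList")}
print(untyped_observed_properties(with_code_list))
```

In this reading, the measured quantity is the observable property and the observation is the act that produces one result for it. The check only reports a code list that is not itself declared as a sosa:ObservableProperty.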
It seems that most of this work consists of transforming code lists into appropriate SOSA classes (feature of interest, observable property, result or procedure), reusing some parts of the SKOS model.
This work is interesting but not well presented and hard to understand. A simple example illustrated by several diagrams should be added.
Q14: Is the SKOS hierarchy of sosa:Procedure linked to or influenced by the SKOS hierarchy of sosa:ObservableProperty?
Finally, the authors should take care to normalize their vocabulary.
The Glosis ontology contains many defined classes and skos:Concept instances. In the end, I am a little confused: what is the ontology (the web data schema), what are the SKOS models (controlled vocabulary lists) and what are the knowledge graphs (the datasets that populate the ontology and reuse the SKOS models)?
All the qualitative values defined in code lists and used as results of sosa:Observation are defined as instances of two classes: skos:Concept and a new domain class.
The ISO 28258 data model is also translated into an OWL ontology in order to provide some alignment between Glosis ontology classes and ISO 28258 classes.
The Glosis ontology is available as TTL files from a git repository. The URIs are defined as permanent URIs.
The documentation is produced by the Widoco tool, but the resulting HTML pages are quite poor: definitions are lacking, class labels have problems, and no static diagrams are available.
----
The maintenance and evolution of the Glosis ontology is based on exchanges between soil experts and OWL experts. To this end, some CSV files are produced to help soil experts propose modifications and review the ontology.
The transformation (CSV ↔ OWL) is performed by a new tool developed during this project.
If CSV is used, it will be easy to ask the soil experts to propose their own definitions of Glosis elements!
---
Section 6 presents some knowledge graphs of soil datasets using the Glosis ontology and related SPARQL queries.
The datasets are: the LUCAS topsoil dataset, the global soil respiration database and the World Soil Information Service dataset (WoSIS). The WoSIS dataset was not precise enough, thus the authors explain their translation choices.
Q15: Why are ISO 28258 classes used in those datasets? I would expect the Glosis ontology to be complete and sufficient to represent all the soil information.
All the datasets are available and published on the web using a SPARQL endpoint.
Some SPARQL queries are presented related to different datasets and a federated query is proposed at the end to query all datasets.
Some REST API services were also developed, based on the previous SPARQL queries, to access the results.
---
Finally, some future work is proposed. The SOSA ontology does not describe the uncertainty of measurements; this is one of the possible extensions.
Misspellings--------------------------
“semantic web” should be replaced by “Semantic Web”.
“Doublin core” should be replaced by “Dublin Core”.
Glosis should be written the same way in the whole document (GLOSIS, GloSIS, ...).
page 2 column 2 line 30: “lack of heterogeneity”: are you sure? I would expect “lack of homogeneity”. If it is not an error, could you explain in detail why a lack of heterogeneity is a problem?
Page 5 column 2 line 43 “INSPIRE ...most used ontology”: the INSPIRE data model is not an ontology, is it?
Page 6 column 2 line 27 “bur”: I do not know this word.
Page 11 column 1 line 2 “xx”: what does XX mean? I imagine that in the “PP_name” expression, PP is the prefix that identifies the standard. Thus what are TM, CQ and DQ?
Page 13 column 2 line 9: “sublass” → subclass?
Page 14 column 1 line 38 “SKOS ontology”: I would prefer that the authors make a distinction between a thesaurus represented as a SKOS model and a data schema (ontology).
Page 19 column 2 line 23: footnote problem
page 20 column 2 line 16: “sport” → support?
Page 22 column 2 line 46: “ the the” remove one
page 25 column 1 line 30 “swathe”: I do not know this word.