Review Comment:
This paper proposes a methodology to build a formal ontology from a thesaurus. The thesaurus is seen as a structured resource that can be harvested to reduce the amount of time needed to build an ontology from scratch. The methodology is generally well organized, mostly motivated and well described; some examples are given by applying it to an existing thesaurus.
The paper is well structured, the presentation is fairly clear, the English is good, and the topic is important and within the journal's scope.
This reviewer finds the approach and most of the results reasonable and interesting with helpful intuitions and examples. However, the authors make some choices that remain undiscussed and at times unclear. Some aspects seem to be inconsistent and need to be better presented/explained. My comments, following the text, are collected below. Some of them are general and impact the soundness of the approach; since I believe these can be quite problematic given the general framework, I request major revisions.
General observation: each step/action not exemplified in the application (for various reasons) needs to be exemplified in some other way. This is not the case now as I will point out in the comments.
From the introduction: “Controlled vocabularies…often incorrectly referred to as terminologies”, please explain and motivate.
“It is only ontologies using OWL that give hope for integrating independently developed ontologies in a logically consistent way and with correspondence to the represented reality.”
Questions:
1) generally OWL is preferred to other logics (in particular FOL) because of its computational properties, but this is not taken as a motivation for your work. The motivation for choosing OWL therefore cannot be just logical consistency (or formal semantics), and the choice is not really justified;
2) the goal of a thesaurus is (often if not always) not to represent reality but to represent how terminology is organized. In particular, the focus is on the conceptual perspective. This is recognized in the paper in many places, e.g. when saying that “such vocabularies […] contain several thousand up to hundreds of thousands of concepts”. One strong consequence of the realist position, at least as presented by the Buffalo school which is here correctly cited, is that even the use of the term ‘concept’ is banned. This shows a need for an explanation: what can we say to claim that the adoption of the realist viewpoint (with respect to a more cognitive, or at least neutral, approach) is correct in this context? As far as I can get from the paper, the argument is that “there are authors […] particularly emphasizing the stance of ontological realism.” This is too weak as a motivation. It seems to me that the paper is mixing two issues: building an ontology from a thesaurus and building an ontology that follows a philosophical setting. It is unclear why these two issues are mixed up.
3) It would be interesting and important to know if and where the structure of a realist ontology can make a difference in the methodology (with respect to a more cognitive/neutral ontological approach). If so, I suggest you emphasize this as a valuable result of your research and illustrate it with examples.
You write “Further, our method aims at developing a semantically adequate ontology[…]” and list: full use of the semantic expressivity of OWL, integration with other ontologies, consistency and reasoning results. These are good points but do not suffice to turn an arbitrary system into a semantically adequate ontology. Since the expression is relative to a goal (to be adequate “for” something), here you should remind the reader what the goal of the sought ontology is.
On the methodology: “The reason why we focus on the reengineering of thesauri is that there are structural differences between different types of vocabularies (e.g. simple lists of terms, thesauri, taxonomies or classification schemes [28]) and their reengineering may differ.”
This is an important observation and I fully agree. However, your methodology does not apply to ISO thesauri only since it starts with a check of the correctness of the thesaurus. Thus, being a thesaurus according to the ISO standard is not necessary. You must now make explicit which constraints a generic resource must satisfy to be used with the described methodology.
From Figure 1 it seems that one could work through all the steps up to position 7 and then be forced back to position 1. This is awkward and casts doubt on the robustness of the methodology. Can you divide the 7 steps into blocks so that, once you move from one block to the next, you know that the steps of the previous block won’t be repeated? Is this possible at all?
The name of step 4 should be improved since it sounds weird to “align” relations.
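To make the question about blocks concrete, here is a minimal sketch of how such a partition could be computed. It treats the steps as nodes of a directed graph and groups them into strongly connected components, so that no transition ever leads from a later block back into an earlier one. The back-transitions used below (3 to 2, 7 to 5) are hypothetical and are not read off Figure 1; the authors would have to supply the actual ones.

```python
from collections import defaultdict


def strongly_connected_blocks(nodes, edges):
    """Group steps into blocks (strongly connected components, Kosaraju's algorithm).

    Blocks are returned in an order where no edge leads from a later block
    back into an earlier one.
    """
    forward = defaultdict(list)
    backward = defaultdict(list)
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)

    # First pass: record nodes in order of depth-first completion.
    visited, finish_order = set(), []

    def visit(node):
        visited.add(node)
        for nxt in forward[node]:
            if nxt not in visited:
                visit(nxt)
        finish_order.append(node)

    for node in nodes:
        if node not in visited:
            visit(node)

    # Second pass: explore the reversed graph in reverse completion order.
    assigned, blocks = set(), []

    def collect(node, block):
        assigned.add(node)
        block.append(node)
        for prev in backward[node]:
            if prev not in assigned:
                collect(prev, block)

    for node in reversed(finish_order):
        if node not in assigned:
            block = []
            collect(node, block)
            blocks.append(sorted(block))
    return blocks


# Hypothetical transitions: the linear flow 1..7 plus two assumed back-transitions.
steps = list(range(1, 8))
linear_flow = [(i, i + 1) for i in range(1, 7)]
print(strongly_connected_blocks(steps, linear_flow + [(3, 2), (7, 5)]))
```

With these assumed transitions the steps fall into four blocks: [1], [2, 3], [4], [5, 6, 7]. If a transition from step 7 back to step 1 really exists, all seven steps collapse into a single block, which would answer my last question in the negative.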
“Such relationships would be considered erroneous in term-based thesauri and should be “transferred” to concept-to-concept relationships, just like the relationships between preferred terms.”
The closing of this sentence is not clear. Do you mean “and a similar change should be applied to the relationships between preferred terms”?
Sect. 3.2, pg.6 “While, in principle, a choice between formal languages can be made, we focus on the popular OWL in its 2nd version”
Part of the content in point (a) does not belong here but in the application example. The methodology should say which language to choose depending on some conditions, or set minimal constraints on the languages one can use.
This section is somewhat of a mix between specific and general considerations. I suggest focusing more on the general description of the step and the principles guiding it and, within this, adding pointers to the following application section for specific examples.
“Since modeling relationships between relationships is not subject of thesaurus work, there will be no use of the object subproperty axioms to assert generic relationships.”
I’m not sure I understand your point. Could you explain or expand this sentence?
In the application section of pg. 8 you say “It turned out to be not useful to follow the actions described for this step in the case of the AGROVOC thesaurus.” This means that the methodology is not general and that you should expand it at least to include your own case.
I was also expecting examples of problems with the hierarchical relationships and of the mentioned workarounds towards the end of the application section, but none is given (you just say that you take the is_a relationship as in AGROVOC). Actually, it is unclear whether the proposed separation between a syntactic step (step 2) and a more semantically oriented step (step 3) is correct, since you also give semantic arguments when discussing concepts and relationships at step 2. There is something unsettled in this part of your methodology: while the distinction is fine in general terms, work does not seem to split in this way in practice. What can you say about this?
“It is also desirable to identify necessary and (jointly) sufficient membership conditions that define a class, because it is only defined classes under which other classes can be subsumed by automated reasoning.” Something is missing in the sentence.
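For readers less familiar with the point the quoted sentence is trying to make, the following toy sketch shows why only defined classes attract inferred subclasses. It is a deliberately simplified stand-in for a description logic reasoner: classes are reduced to sets of atomic conditions, and the class names and conditions are hypothetical illustrations, not taken from the paper or from AGROVOC.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyClass:
    name: str
    conditions: frozenset   # necessary membership conditions (atomic, hypothetical)
    defined: bool = False   # True if the conditions are also jointly sufficient


def inferred_superclasses(cls, ontology):
    """Names of classes under which cls can be subsumed by inference alone.

    A class qualifies only if it is defined (its conditions are sufficient)
    and all of its conditions are among the necessary conditions of cls.
    """
    return sorted(
        other.name
        for other in ontology
        if other is not cls
        and other.defined
        and other.conditions <= cls.conditions
    )


organic_conditions = frozenset({"supplies_plant_nutrients", "contains_carbon"})
compost = ToyClass("Compost", organic_conditions | {"from_decomposed_matter"})

primitive = ToyClass("OrganicFertilizer", organic_conditions, defined=False)
defined = ToyClass("OrganicFertilizer", organic_conditions, defined=True)

print(inferred_superclasses(compost, [primitive, compost]))
print(inferred_superclasses(compost, [defined, compost]))
```

In the first call nothing is inferred, because the conditions of the primitive class are only necessary. In the second call the hypothetical class 'Compost' is placed under 'OrganicFertilizer' because the latter is defined. A real reasoner handles far richer class expressions, but the asymmetry is the same.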
Pg. 9 “Sometimes the terms have even multiple meanings in a single community, particularly if there are different schools of thought. In such cases, an ontology may need to contain several classes for a given term, each for every meaning.”
As said at the beginning, I’m confused about the actual goal: are we ontologizing a thesaurus or using a thesaurus to build a (realist) ontology? If the second, then you need to add an initial step to evaluate whether the thesaurus is suitable, i.e., whether it can be taken to address reality (whatever one takes that to mean). In such a case, your methodology does not apply to just any system classified as a thesaurus by the given ISO standard. If you don’t start with this initial step, you enter an unclear loop where you combine two orthogonal goals: ontologizing a structure and changing its content. I’m afraid that requires much more attention to how and when to change the thesaurus content.
For example, at this point we have already identified a concept for each meaning of the terms (according to the thesaurus). Assume that in another community, perhaps more scientifically or reality oriented, there are more distinctions for the same term. Is this something to consider? How is this taken into account in the methodology? What if there are incompatibilities between your system and the other? This opens a series of problems…
Pg 11 “Since our approach is a scientific one, but also because the AGROVOC thesaurus did not provide any disambiguating hint, we used the reference to carbon to characterize the class ‘organic fertilizer’”
I believe that if this issue were spotted at step 1 you would have introduced two distinct concepts, one for each meaning. Why don’t you do the same here? Later you say “For example, we had to choose between different interpretations of ‘organic fertilizers’ and to decide what to count as ‘plant micronutrient’.” These comments, to be useful, must be combined with criteria or at least suggestions for deciding when one has to choose a meaning out of many and when one has to extend the number of concepts/classes!
Figure 5 is not very informative and can be dropped. What is relevant (the elimination of some terms) is already explained in the text.
Instead, the relationship between these classes and those of dispositions is quite important and should be addressed in detail (this issue also comes up below).
“as well as to their state after they have been applied to the field and bound or solubilized plant nutrients.” Something is missing in the sentence.
Regarding “guidelines for the choice between top-level ontologies”, perhaps you can look at the work of M. Keet from University of Cape Town, South Africa. Although from the foundational viewpoint I don’t consider her approach to be optimal, for the goals of this methodology it could be of help (just a suggestion).
The index of footnote 1 is not correctly displayed (in the text) on pg. 14.
“This may be said a weakness of Protégé,” -> “This may be considered a weakness of Protégé,”
“In many cases, the urge to introduce new relationships is due to an insufficient ontological analysis.”
Agreed, very important point. An example is needed here (since none appears in the following application description); otherwise the observation will not be understood.
An example is needed after the general claim: “The generic relationships in a thesaurus are prima facie candidates for becoming is-a relationships in an ontology. Since they may be mixed with hierarchical whole-part relationships in a thesaurus, organizing thesaurus concepts into an is-a hierarchy may imply re-combining fragments of the thesaurus that are not related by properly applied generic relations. This, in turn, may require introducing new classes to connect these fragments.”
“We must declare ‘fertilizer’ to be such compound since there is hardly any pure fertilizer material in real-life environments.”
Yet, you might want to talk about the substance in isolation (and it can be isolated): the choice of classifying it as a compound must be explained/motivated further.
“to which classes the newly introduced classes” (drop the first occurrence of “classes”)
“It would have been a tremendous advantage, if ChEBI had been more mature in terms of the membership conditions specified for its classes. It would have saved us tremendous time and spared us to deal with amendments of ChEBI.”
This comment is not useful per se. You should rephrase it as a suggestion to prefer mature and well-specified domain ontologies (but how does one judge them?).
“It was not absolutely clear to us, if we should model a fertilizer disposition, a fertilizer function or even a fertilizer role. These distinctions need better clarification and guidance. This problem also applies to BFO [51], [52].”
This is a crucial point for which few guidelines exist, so your experience is important: you should report on your choices and the motivations, even if they are just examples and you cannot give general principles.
“The formal specification of classes is realized by adding the necessary membership conditions identified in step 3 as anonymous superclasses using the subclass axiom. …” This part is too specialized, you can move it to the application section.
“and possibly description logics [in] general.”
The following claim relative to class A is puzzling, given that one can use a relationship as well as its inverse: “any relationship from a class A to another class B has always the role and logical force of a necessary membership condition for the class A”.
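One way to make my puzzlement concrete is the small check below. It assumes the relationship is read as an OWL existential restriction (A SubClassOf r some B) and evaluates it, together with the inverse direction, in a tiny finite interpretation. The individuals and the relation are hypothetical; the point is only that the two directions are separate axioms, so the paper should say which one is intended and why.

```python
def satisfies_some_values_from(domain_class, relation, range_class):
    """Check domain_class ⊑ ∃relation.range_class in a finite interpretation.

    Every member of domain_class must be related to at least one member
    of range_class.
    """
    return all(
        any((x, y) in relation for y in range_class)
        for x in domain_class
    )


def inverse(relation):
    """Return the inverse of a binary relation given as a set of pairs."""
    return {(y, x) for (x, y) in relation}


# Hypothetical interpretation: one instance of A, two instances of B.
A = {"a1"}
B = {"b1", "b2"}
r = {("a1", "b1")}

print(satisfies_some_values_from(A, r, B))            # condition on A
print(satisfies_some_values_from(B, inverse(r), A))   # condition on B via the inverse
```

In this interpretation the restriction holds for A but the inverse restriction fails for B, because "b2" is not related to any instance of A. If the authors intend relationships to constrain only the source class, this should be stated explicitly, together with guidance on when to assert the inverse direction as well.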
There is an interesting paper at the WOP 2014 workshop by Elena Cardillo et al. which discusses the same topic of this paper with a similar approach. Please, include it in the literature review and state what differentiates the two approaches.
Explain the “**” (double star) in Appendix 2, step 3
Step 4, point a: are these one or two distinct choices? If they are distinct, can you give an example? Why is there no specification of what to do with the chosen formal relations?