Review Comment:
The manuscript presents a comprehensive framework for capturing, analysing and optimising the impact of changes in ontologies, and hence addresses the pressing issue of ontology evolution. In addition to taking into account the basic change operations, the framework models the evolution process in terms of dependencies across conceptual layers, types of impact and optimal strategies for dealing with the change impact, and it also provides a method for computing the severity of the impact. The framework is validated in the context of an ontology-based content management system (OCMS), using a particular change operation and several scenarios for computing the severity of its impact. Finally, the authors also perform a usability study.
The authors should be commended for the effort put into this work, as well as into the manuscript. Overall, their framework is probably the most comprehensive change impact analysis solution to date, and, in addition to its intrinsic value, the manuscript can also be seen as a review of the state of the art in the area. On the other hand, there are a few aspects that could be improved, or at least addressed, by the authors, the most important of which are: a) the presentation of the manuscript (which is at times too abstract and at other times too verbose); b) the justification of some of the design decisions, in particular those concerning the severity of the impact and the cost of evolution; c) the evaluation, which is, in my opinion, the weak point of the manuscript, especially since the effort put into developing and describing the framework is not well balanced by the effort put into understanding its capabilities and limitations. Detailed comments are provided below.
1. Introduction
The major flaw of the introduction is the lack of a clear placement of the work in the general context of ontology evolution. The authors start by discussing the use of OCMSs without providing any concrete examples (can the authors name a few well-known OCMSs?) or drawing a line between the application of their framework to ontology evolution in general and its application in the OCMS context. This separation is extremely important because the rest of the technical description relies on a series of assumptions, such as the presence of the layered architecture introduced in the manuscript, which may not hold in a typical ontology authoring / engineering setting. Finally, a more concrete list of the novel aspects introduced by this manuscript is required, i.e., a clear set of contributions in comparison to [12, 13, 14, 15, 16] (here, the authors could include the "literature overview" value of the manuscript).
Some other comments:
* The use of citations throughout the manuscript is fairly confusing. For example, why is [2] cited in the context of the first paragraph ([2] is Gruber's paper on ontologies, while the context set by the paragraph is OCMSs)? Similarly, the authors tend to use "coupled" citations that are not justified, e.g., [5][6] or [7][8] in the 3rd paragraph of the introduction, but also in many other places in the manuscript. Does a general statement such as "a change of one entity may cause many unseen and undesired changes and impacts on dependent entities" require two citations? The same applies to the next statement in the same paragraph.
* The use of DBpedia as an example of a large knowledge source is not really justified in that context, because the real issues appear in complex ontologies, i.e., those with heavy semantics (e.g., deep classification hierarchies, a large number of axioms, multiple inheritance), and not necessarily in large but flat knowledge bases such as DBpedia. The authors should provide a more convincing example here.
* "Moreover, a given change request can be realised using different evolution strategies" - please provide an example.
* The language could be improved throughout the manuscript, e.g., "This is achieved by embedding semantics by annotating the target content using ontologies" -> "This is achieved by leveraging semantics via ontology-based annotation of the content".
2. Related Work
This is, in principle, fine, although the language could be improved:
* 2nd paragraph: "The work … The work …" (repetitive sentence openings)
* 3rd paragraph: "The authors [7][20][21] have proposed …" -> "The research described in [7][20][21] proposes …"
* 3rd paragraph: "Furthermore, the authors give emphasis …" -> "Furthermore, the authors emphasise …"
3. OCMS Principles
Before discussing the actual framework, it is probably worth restating the application or focus of this work. Again, it is not clear whether the framework is targeted strictly towards OCMSs, which in my opinion is quite a limitation since there are not many of them in use in real-world settings, or whether it is generic and hence applicable to any ontology engineering / authoring context.
Most of the first part of Section 3 is unnecessary and can be reduced to one paragraph. The authors should keep the readership of the Journal in mind when writing. For example, the description of the Ontology Layer is, in fact, the general definition of an ontology, and it should be presented as such rather than as a contribution of an ontology layer in the OCMS. Furthermore, given the readership of the Journal, this whole section covers common background knowledge and can be left out.
Some other comments:
* In 3.1 it is not clear whether the provided example is a running system or a toy example created for illustration purposes only.
* Why was it necessary to introduce so many new ontologies to deal with aspects as simple as document structure? Surely the authors could have reused some of the existing ones, see for example DOCO (http://lov.okfn.org/dataset/lov/details/vocabulary_doco.html).
* Does the DocBook ontology really require three citations ([31, 32, 33])?
* The Help and Software ontologies are underspecified. What are they? Why are they really needed? What are some concrete examples of their use? Are they known ontologies, or were they created for the purpose of this exercise?
* "The domain ontology is also known as the application ontology" -> this is a fairly strange statement; could the authors rephrase it?
* The description of this application ontology is also underspecified
* The graph notation and the example described in 3.2 raise a number of debatable aspects. For example, the use of properties as both nodes and edges is confusing, especially later in the manuscript when change operations are applied to them. Furthermore, besides some typos (e.g., instaneOf -> instanceOf), Figure 3 contains the rdfs:instanceOf property, which is not defined in RDF Schema (class membership is normally expressed with rdf:type). Could the authors clarify this aspect?
* In the edge definition (i.e., the large union in the Ontology Graph paragraph), please add a reference to the set definitions given on the following page.
* "The edges are referred as triples" -> it is not quite clear what the authors are trying to state here (the Annotation Graph paragraph on page 5).
* The description of the graph attributes mixes formal definitions with programmatic methods; it would be better if the authors were consistent and used only formal definitions. Moreover, this paragraph is not really needed.
4. Change Request Capturing and Representation
* Firstly, the title could be improved, e.g., "Capturing and Representing Change Requests".
* The authors could improve the readability of the first paragraph by adding a couple of examples. Also, the multi-citation issue is present here again ([5][39] and [40][41]): are these citations really necessary? Can the authors provide a better context for them?
* In 4.1, in the last sentence - "we identify dependencies that are useful for implementing …" - how exactly is this identification performed?
* 4.1.1 could be significantly reduced. Both algorithms presented here are unnecessary: they describe very basic ontology / graph operations that add no value to the manuscript and take up a lot of space. The authors should probably restrict themselves here to providing a simple schematic definition of the three types of dependencies.
* In the definition of the partial dependency: since N1 is defined as partially dependent on N2, i.e., Pdep(N1, N2), why is N2 then mentioned in the existential quantifier?
* In 4.1.2: "Using an empirical study …" -> could the authors provide additional details on this study? A better justification of the list of chosen dependencies is required.
* Concept-Concept dependency: this is where the use of properties as both nodes and edges becomes confusing. The textual definition of this dependency mentions "concept nodes", while the formal definition restricts it to classes. To make the various definitions clearer and easier to follow, the authors could perhaps simplify the graph notation, or start the description of the framework with a terminology subsection that clarifies the use of certain terms, such as 'concept'. Furthermore, a short discussion of the effect of using other properties as the foundation for this type of dependency would be interesting; e.g., in the biomedical domain (in particular contexts) "part-Of" is an equally important relation. How do the authors see this being integrated into their list of dependencies?
* In 4.2, Cascade Strategy: an example is required to help readers understand this strategy, specifically in the context of the statement: "In case of addition, when we add an entity, we need to add all other entities that make the new entity semantically and structurally meaningful."
* In 4.2, Attach-to-Parent / Root Strategy: what does this strategy look like for operations other than "Delete"? Can the authors provide an example? ("… link all affected entities to the parent entity …")
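For instance, even a minimal example of the following kind would make the two delete variants easier to follow. The sketch is hypothetical and is not the authors' method: it assumes a toy subclass hierarchy stored as a child-to-parent map, and the class names and the functions delete_cascade and delete_attach_to_parent are invented for illustration. Cascade removes the deleted class together with its whole subtree, whereas Attach-to-Parent re-links the orphaned subclasses to the parent of the deleted class.

```python
from typing import Optional

# Hypothetical toy model: each class maps to its direct superclass (None = root).
Hierarchy = dict[str, Optional[str]]


def subclasses(h: Hierarchy, node: str) -> list[str]:
    """Direct subclasses of `node`."""
    return [c for c, p in h.items() if p == node]


def delete_cascade(h: Hierarchy, node: str) -> Hierarchy:
    """Cascade: remove `node` and, transitively, all of its subclasses."""
    doomed = {node}
    stack = [node]
    while stack:
        for child in subclasses(h, stack.pop()):
            if child not in doomed:
                doomed.add(child)
                stack.append(child)
    return {c: p for c, p in h.items() if c not in doomed}


def delete_attach_to_parent(h: Hierarchy, node: str) -> Hierarchy:
    """Attach-to-parent: remove `node`, re-link its direct subclasses to its parent."""
    parent = h[node]  # if this is None, the re-linked subclasses become roots
    return {c: (parent if p == node else p) for c, p in h.items() if c != node}


onto: Hierarchy = {
    "Document": None,
    "Manual": "Document",
    "UserGuide": "Manual",
    "InstallGuide": "Manual",
    "Article": "Document",
}
print(sorted(delete_cascade(onto, "Manual")))  # ['Article', 'Document']
print(delete_attach_to_parent(onto, "Manual"))
# {'Document': None, 'UserGuide': 'Document', 'InstallGuide': 'Document', 'Article': 'Document'}
```

What such an example would still need to show is the addition case, i.e., which further entities have to be added to make a new entity "semantically and structurally meaningful".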
5. Change impact analysis process
This section is fairly well structured and written, and it provides a comprehensive overview of all the aspects that need to be captured when analysing change impact. There are only a few things that could be improved:
* some of the impact types could have better names: e.g., 'entity more / less described' reads slightly strangely. The same applies to 'entity incomparable'.
* in 5.2 the reference to Table 2 should probably be a reference to Table 1
* Algorithm 3 is again not needed
* In 5.3 an example of a composite change would improve readability
6. Optimal strategy selection
While the aspects described so far in the manuscript are to a large extent derived from previous work, the computation of the severity of impacts and of the cost of evolution are, in my opinion, the most important contributions of the work but, unfortunately, also the most shallowly treated ones. It is understandable that the authors went for the most straightforward options when creating the computational framework, but then a clear justification is required, together with a discussion of what the framework would look like if one wanted to go beyond the simple linear aggregation of the impact weights or strategy costs.
For example, in 6.1: "… we use heuristics to measure the severity … such as tolerance, […] amount of time and expertise required …". All these elements are important, yet hard to quantify, and it is thus quite disappointing to see everything reduced to a 'heuristically' chosen value of 0.6 or 0.8. If a proper framework for computing these values is not provided, the authors should do a much better job of justifying their choice. This is particularly important because the cost of evolution, and especially the validation process, depend on these values.
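To make the concern concrete, assume, purely for illustration, that the severity of a strategy is the weighted sum of the number of entities affected by each impact type. With invented impact counts for two strategies, merely swapping the 0.8 and 0.6 weights between two impact types reverses which strategy is selected as optimal. All impact-type names, counts and function names below are hypothetical and do not come from the manuscript.

```python
# Hypothetical illustration of a linear (weighted-sum) severity model.
# Impact types and counts are invented; only the 0.8 / 0.6 values echo the manuscript.

def severity(counts: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of impact counts for one evolution strategy."""
    return sum(weights[impact] * n for impact, n in counts.items())


def best_strategy(strategies: dict[str, dict[str, int]], weights: dict[str, float]) -> str:
    """Strategy with the lowest aggregated severity."""
    return min(strategies, key=lambda s: severity(strategies[s], weights))


# Entities affected by each impact type, per strategy (invented numbers).
strategies = {
    "cascade": {"entity_removed": 4, "entity_less_described": 0},
    "attach_to_parent": {"entity_removed": 1, "entity_less_described": 3},
}

w_heuristic = {"entity_removed": 0.8, "entity_less_described": 0.6}
w_swapped = {"entity_removed": 0.6, "entity_less_described": 0.8}

print(best_strategy(strategies, w_heuristic))  # attach_to_parent (2.6 vs 3.2)
print(best_strategy(strategies, w_swapped))    # cascade (2.4 vs 3.0)
```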
7. Validation
The validation description is, unfortunately, the weakest aspect of the manuscript because it provides a very limited view of the full capabilities of the framework. Even assuming that the example provided in the manuscript is enough to demonstrate the application of the framework (although it would have been good to provide at least one other operation, for comparison purposes), a detailed and quantitative (not only empirical) analysis of the choice of weights is crucial. Otherwise, there is really no difference between the weights the authors provided and a random choice. Furthermore, at the beginning of the manuscript the authors motivate their work by the need for a framework that is able to handle large and complex ontologies. However, the validation is performed on a small dataset, which is unrealistic for a proper, real-world context. Hence, unless the authors extend their validation to a much larger scale, they should at least discuss the behaviour of the framework in such a setting.
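One inexpensive form of such a quantitative analysis is a weight-sensitivity check: sample many weight vectors (here uniformly in [0, 1]) and report how often the strategy selected with the heuristic weights remains the optimal one. The sketch below is hypothetical: it assumes a weighted-sum severity model, and the impact counts, names and the agreement_rate function are invented to show the shape of such an analysis, not the authors' data.

```python
import random

# Assumed model: the severity of a strategy is a weighted sum of impact counts.


def severity(counts: dict[str, int], weights: dict[str, float]) -> float:
    return sum(weights[impact] * n for impact, n in counts.items())


def best_strategy(strategies: dict[str, dict[str, int]], weights: dict[str, float]) -> str:
    # The strategy with the lowest aggregated severity is selected.
    return min(strategies, key=lambda s: severity(strategies[s], weights))


def agreement_rate(strategies, reference_weights, samples=10_000, seed=0):
    """Share of uniformly sampled weight vectors that select the same
    strategy as the reference (heuristic) weights."""
    rng = random.Random(seed)
    reference = best_strategy(strategies, reference_weights)
    hits = 0
    for _ in range(samples):
        sampled = {impact: rng.uniform(0.0, 1.0) for impact in reference_weights}
        hits += best_strategy(strategies, sampled) == reference
    return hits / samples


# Invented impact counts per strategy, and heuristic weights.
strategies = {
    "cascade": {"entity_removed": 4, "entity_less_described": 0},
    "attach_to_parent": {"entity_removed": 1, "entity_less_described": 3},
}
heuristic = {"entity_removed": 0.8, "entity_less_described": 0.6}

print(agreement_rate(strategies, heuristic))  # close to 0.5 for these counts
```

For these invented counts, attach_to_parent wins exactly when the weight of entity_less_described is below that of entity_removed, so roughly half of all weightings disagree with the heuristic choice. A rate of this kind, reported for the scenarios in the manuscript, would show whether the specific 0.6 / 0.8 values actually matter.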
8. Evaluation
The usability study detailed in this section is similar in setting to the framework validation, i.e., a low number of participants, underspecified study details and a shallow analysis of the results. The authors should at least provide the complete study design, including aspects such as the time allocated for learning the tools and the participants' previous experience with ontology engineering tools and with the OCMS framework. This should then be complemented with a detailed discussion of the results. For example:
* A score of 3.3 for the last question is not particularly positive; what is the reason behind this result, and how should it be interpreted?
* "The cost estimation is suitable to measure impacts" = 4.0: in my opinion, this is a very surprising result, considering that the participants would need a deep understanding of the weight assignment scheme and of the actual impact of the change operation, an understanding that a general ontology user or a novice user is unlikely to have.
Finally, 8.3 should be expanded with a more detailed discussion of state-of-the-art tools.