Non-ontological sources transformations for Ontology Engineering: knowledge acquisition using redundancy

Tracking #: 741-1951

Authors: 
Fabien Amarger
Jean-Pierre Chanet
Ollivier Haemmerlé
Nathalie Hernandez
Catherine Roussey

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
Non-ontological sources like thesauri or taxonomies are already used as input in ontology development process. Some of them are also published on the LOD. Reusing this type of sources to build a Knowledge Base (KB) is not an easy task. The ontology developer has to face different syntax and different modelling goals. We propose in this paper a methodology to transform several non-ontological sources into a single KB. We claim that our methodology improves the quality of the final KB because we take in account: (1) the quality of the sources, (2) the redundancy of the knowledge extracted from sources in order to discover the consensual knowledge and (3) Ontology Design Patterns (ODP) in order to guide the transformation process.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
[EKAW] reject

Solicited Reviews:
Review #1
Anonymous submitted on 13/Aug/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation: -1 (weak reject)
Reviewer's confidence: 4 (high)
Interest to the Knowledge Engineering and Knowledge Management Community: 4 (good)
Novelty: 3 (fair)
Technical quality: 3 (fair)
Evaluation: 3 (fair)
Clarity and presentation: 4 (good)

Review

The paper presents a three-step methodology to transform non-ontological sources (NOS) into a single KB (extracting mainly taxonomical information), by exploiting a notion of "source quality" / "trust". The paper reports mainly on the first two steps of the methodology.

The approach is based on many established works (NeOn, LogMap, etc.), so the main novel part seems to be the introduction of this notion of source quality and its exploitation in the integration of the contributions coming from the different sources (merging). The applicability of the former part is indeed not clear. For instance, how are the source criteria assessed (especially the first and the third one)? Which guidelines should experts follow to do this? The three criteria are then combined in a straightforward linear combination, although it is not even clear what a score for each of them could be (e.g., value(S,Crit_i) in (1)).
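To make the reviewer's point about value(S, Crit_i) concrete, the linear combination in equation (1) presumably has the form quality(S) = Σ_i w_i · value(S, Crit_i). A minimal sketch under that assumption; all scores and weights below are invented for illustration:

```python
# Hypothetical sketch of the weighted linear combination in equation (1):
# quality(S) = sum_i w_i * value(S, Crit_i). Names and values are invented;
# the paper's actual formula is not reproduced in the review.
def source_quality(criterion_values, weights):
    """criterion_values[i] plays the role of value(S, Crit_i); weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * v for w, v in zip(weights, criterion_values))

# Reputation, freshness, agreement with the module (scores assumed in [0, 1]).
print(source_quality([0.8, 0.5, 0.9], [0.4, 0.2, 0.4]))  # ~0.78
```

The sketch makes the review's question visible: nothing constrains what a "reputation" score of 0.8 means, which is exactly the missing guideline.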

Concerning the aligning and merging step, a sigmoid function appears out of the blue, and, although the rationale is somewhat understandable, it is not clear why it has been defined in exactly that way.
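For reference, a generic parameterized sigmoid (not necessarily the paper's exact definition, which the review does not reproduce) looks like:

```python
import math

# Generic parameterized sigmoid; the slope k and midpoint x0 here are
# illustrative assumptions, not the paper's actual parameterization.
def sigmoid(x, k=1.0, x0=0.0):
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

# The output is 0.5 at the midpoint; a larger k pushes scores toward 0 or 1.
# Justifying the choice of k and x0 is precisely what the review asks for.
print(sigmoid(0.5, k=10.0, x0=0.5))  # 0.5
```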

Finally, the evaluation part is rather preliminary/weak. No details are provided, for instance, on the definition of the quality value of the sources. The evaluation is applied to the task of building two taxa. No quantitative measure of the candidates extracted is provided (3, 30, 300, 3000?). It is also not clear how the approach performs against the output produced by each single input source, that is, whether the methodology actually improves the quality of the resulting KB (as claimed by the authors).

All in all, the authors should work out the approach in more detail and present a more thorough evaluation.

Review #2
Anonymous submitted on 25/Aug/2014
Suggestion:
[EKAW] combined track accept
Review Comment:

Overall evaluation: 1 (weak accept)
Reviewer's confidence: 3 (medium)
Interest to the Knowledge Engineering and Knowledge Management Community: 5 (excellent)
Novelty: 3 (fair)
Technical quality: 4 (good)
Evaluation: 4 (good)
Clarity and presentation: 4 (good)

Review
This paper describes a methodology for combining multiple Non-Ontological
Sources (NOS) into well-defined Knowledge Bases. The proposed method builds
on the NeOn methodology. This methodology is extended through the idea of
linking different NeOn modules with Ontology Design Patterns (ODPs), more
specifically transformation patterns for converting NOS constructs into
ontological constructs. NOS constructs from various data sources are merged
through an alignment process. The paper introduces notions of trust in the
NOS, and a combined trust for the merged constructs is calculated.

The paper presents a running example in the agricultural domain and the
results are evaluated by domain experts. Results indicate that the trust
value is a good predictor for evaluated precision, even though recall is
suboptimal.

The paper has a very clear structure and is clear in its content, both at
the abstract level and through the use of a concrete example. The methodology
is relatively straightforward and intuitive, but based on well-established
methodologies and presented in sufficient detail. I agree with the authors
that much of the actual effort is domain- and context-dependent and that the
method is presented at the right level of abstraction. The state of the art
is well-described and it is clear how the proposed method is an extension of
previous work.

The experiment and the subsequent evaluation are interesting and provide a
good initial validation of the proposed method, although more details should
be given:
- For the two examples, how many ontology constructs were actually created,
and how did the process go? How much effort did it take the Ontologist and
the Domain expert to do this? Who filled these roles in this experiment?
- What is the influence on the evaluation results of the various steps: the
alignment, the transformation patterns, or the exact trust score calculation?
(An extensive exploration here might be something for future research.)

Some other issues I would like to see resolved:

- The proposed methodology assumes that one source receives a single source
quality. It is more intuitive that parts of a source are more trusted than
others. For example, in one agricultural thesaurus I might trust the
label relations (scientific names) completely, but be unsure of the taxonomic
relations, or vice versa. How realistic and influential is this
simplification? Could the authors expand on this?

- Furthermore, for the source quality, three criteria are needed. It is
unclear how values for these, and the corresponding weights, should be
determined. Is this done by the ontologist? A domain expert?

- Many (most) references in the bibliography are incomplete, often with
journal names and volume/issue numbers missing.

- Although the paper structure is very clear, there are a lot of grammar
issues and typos in the text. I suggest the authors do a thorough
proofreading to correct these. Below I've listed a number of errors, but
there are most likely more.

abstract: "also published on the LOD" -> published as Linked Open Data.
p2: "societal and environmental issues, regulatory framework," -> frameworks
(or a regulatory framework)
p3: some metrics to evaluate thesaurus, -> some metrics to evaluate a
thesaurus (or thesauri)
p4: Due to its genericity we choose to work on the Neon methodology. -> what
is meant here, is it an extension of the methodology or is it based on the
methodology?
p4: scnerario -> scenario
p4: this scenario enrich -> enriches
p9: for some project -> projects
p12: which ontological object were -> objects were

In conclusion, I think this is an interesting paper for the EKAW conference,
and I think an extended version of the paper would also be of interest to the
SWJ. These extensions could be in the form of more detailed experiments and
the suggested future work of the authors.

Review #3
Anonymous submitted on 27/Aug/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation: -1 (weak reject)
Reviewer's confidence: 5 (expert)
Interest to the Knowledge Engineering and Knowledge Management Community: 4 (good)
Novelty: 4 (good)
Technical quality: 3 (fair)
Evaluation: 3 (fair)
Clarity and presentation: 2 (poor)

Review

The paper addresses an important and pertinent problem, i.e. the acquisition of knowledge from non-ontological sources in order to bootstrap the ontology construction process for those domains that are rich in structured or semi-structured data, or published as LOD. These non-ontological sources can contribute to different stages of the ontology engineering lifecycle. This paper proposes to start building a new ontology by developing a module through ontological patterns, and details a transformation process that extracts knowledge from multiple non-ontological sources to adapt and enrich the module with ontological elements that are common across the sources, and hence are deemed trusted.

The main issue with the paper is that the presentation is high-level and does not provide sufficient detail to fully assess the relevance of the contribution and the experimental results provided in the evaluation. The proposed process assesses the quality of knowledge that is common to the different non-ontological sources according to three criteria: source reputation, source freshness, and its level of agreement with the module used to bootstrap the ontology construction. These are just some of the many measures proposed in the literature. However, these measures are only described in words; no precise mathematical formulation with their formal characteristics is provided, and the paper only mentions the weighted aggregation function. It would be useful to have a much more precise definition, especially of the similarity between the sources and the given module. This impacts the experiments, because the effect that each of these measures has on the evaluation is not clear, nor whether changing any of the used measures would change the results obtained.

One other issue concerns the merging of the different knowledge bases using LogMap. One of the issues with aggregating redundant knowledge sources is that, when aligning them, the resulting alignment might present several candidate mappings for each entity being aligned. LogMap uses reasoning to resolve some of these cases, but for some mappings it requires some form of validation by the knowledge engineer. Is this an aspect that affects the proposed process, and if so, how?

More detailed comments
Page 8, Source quality definition. The mathematical formulation of the quality measures and criteria proposed is missing.

Page 10, definition of degree. If degree(e_i, e_j) is 0, does this mean that there is no mapping, or that there is no confidence or trust in the mapping between e_i and e_j? These are two different aspects, and the paper should be clearer. If a different alignment system were used, could the value of the degree function be different, and hence could the mapping be considered?
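The distinction the review raises can be sketched in code; the entity names and alignment data below are hypothetical, since the review does not reproduce the paper's definition:

```python
# Sketch of the ambiguity: a degree of 0.0 ("mapping exists but with no
# confidence") vs. no entry at all ("no mapping"). Data is hypothetical.
alignment = {("e1", "e2"): 0.8, ("e1", "e3"): 0.0}

def degree(alignment, ei, ej):
    # None means no mapping was found; 0.0 means a mapping with zero confidence.
    return alignment.get((ei, ej))

print(degree(alignment, "e1", "e3"))  # 0.0 -> mapped, but no confidence
print(degree(alignment, "e1", "e4"))  # None -> no mapping at all
```

Collapsing both cases into 0 is exactly what makes the paper's definition ambiguous.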

Page 10, Figure 4: is x_{i-j} on the arcs of the graph the similarity between x_i and x_j (i, j = 1..3)?

Page 10: definition of the sigmoid function. The sigmoid function is used to normalize the intuition of confidence, but with respect to what?

Page 11: The relation hasHigherRank is extracted but not fully explained. What is its role in the transformation pattern? If this is not always hierarchical it should be defined better: is the measure module-dependent? Is it just about generality?

Page 13, section 5.2: Only candidate with a trust score of 0.9 are extracted. How is this value determined? Have you tried varying the trust score to see whether there is a minimum trust score below which the experimental results degrade? Why not consider a trust score of 1 to consider only the entities that are certain?