Crowd-based Ontology Engineering with the uComp Protege Plugin

Tracking #: 894-2105

Authors: 
Gerhard Wohlgenannt
Marta Sabou
Florian Hanika

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Full Paper
Abstract: 
Crowdsourcing techniques have been shown to provide effective means for solving a variety of ontology engineering problems. Yet, they are mainly being used as external means to ontology engineering, without being closely integrated into the work of ontology engineers. In this paper we investigate how to closely integrate crowdsourcing into ontology engineering practices. Firstly, we show that a set of basic crowdsourcing tasks are used recurrently to solve a range of ontology engineering problems. Secondly, we present the uComp Protege plugin that facilitates the integration of such typical crowdsourcing tasks into ontology engineering work from within the Protege ontology editing environment. An evaluation of the plugin in a typical ontology engineering scenario where ontologies are built from automatically learned semantic structures, shows that its use reduces the working times for the ontology engineers 11 times, lowers the overall task costs with 40% to 83% depending on the crowdsourcing settings used and leads to data quality comparable with that of tasks performed by ontology engineers. Evaluations on a large ontology from the anatomy domain confirm that crowdsourcing is a scalable and effective method: good quality results (accuracy of 89% and 99%) are obtained while achieving cost reductions with 75% from the ontology engineer costs and providing comparable overall task duration.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 06/Jan/2015
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

This paper focuses on the problem of crowd-sourcing a few of the sub-tasks in ontology engineering. Because little is known about the effectiveness of crowd-sourcing for ontology engineering, this is a welcome and novel piece of research that contributes to the literature on methods for ontology construction.

The authors have clearly articulated three research questions: Which ontology engineering tasks can be crowd sourced? How should they be implemented? Can they scale? The authors present a good review of the prior work and suggest three broad areas: building vocabularies, aligning vocabularies and annotating data. Their own work falls in the category of building vocabularies.

They identify four major tasks that concern building vocabularies: verification of term relatedness, verification of relation correctness, verification of relation type, and verification of domain relevance. All four of these are micro tasks. Verification of relation correctness and relation type require a higher degree of knowledge engineering expertise than the other two tasks.
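As a purely illustrative aside (not taken from the paper), these four verification tasks can be pictured as simple question templates instantiated with ontology elements. The sketch below uses hypothetical wording and names; it is not the uComp plugin's actual task format.

```python
# Hypothetical question templates for the four verification micro-tasks
# summarised above (illustrative only; not the uComp plugin's actual wording).
TASK_TEMPLATES = {
    "term_relatedness":     "Is the term '{a}' related to the term '{b}'?",
    "relation_correctness": "Is it correct that '{a}' {relation} '{b}'?",
    "relation_type":        "Which relation best links '{a}' and '{b}': {options}?",
    "domain_relevance":     "Is the concept '{a}' relevant to the domain '{domain}'?",
}

def build_task(task_type: str, **slots) -> str:
    """Fill one template with concrete ontology elements to obtain a micro-task."""
    return TASK_TEMPLATES[task_type].format(**slots)

# Example: a domain-relevance question for a climate change ontology.
print(build_task("domain_relevance", a="greenhouse gas", domain="climate change"))
```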

The authors have implemented a plugin in Protege to crowd-source each of these four tasks. Most of the features in the tool are straightforward and obvious. The feature to perform recursive tasks takes advantage of the taxonomic structure of the ontology to efficiently set up tasks to be crowd-sourced.
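To make the recursive idea concrete, here is a minimal sketch of how one subsumption-verification micro-task per taxonomic edge could be generated by walking a class hierarchy top-down. The toy taxonomy and question wording are assumptions for illustration; this is not the plugin's actual implementation.

```python
# Illustrative only: recursively walk a toy class hierarchy and emit one
# subsumption-verification micro-task per subclass edge.
from typing import Dict, List, Tuple

TAXONOMY: Dict[str, List[str]] = {
    "ClimateChange": ["Cause", "Effect"],
    "Cause": ["GreenhouseGas"],
    "Effect": ["SeaLevelRise"],
    "GreenhouseGas": [],
    "SeaLevelRise": [],
}

def collect_subsumption_tasks(root: str, taxonomy: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (subclass, superclass) pairs for the whole subtree below `root`."""
    tasks: List[Tuple[str, str]] = []
    for child in taxonomy.get(root, []):
        tasks.append((child, root))                                # one task per edge
        tasks.extend(collect_subsumption_tasks(child, taxonomy))   # recurse into subtree
    return tasks

for sub, sup in collect_subsumption_tasks("ClimateChange", TAXONOMY):
    print(f"Is it correct that every '{sub}' is a kind of '{sup}'?")
```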

The evaluation is reasonably well thought out and uses metrics for time, cost, usability and the quality of the ontology produced. The authors use four small ontologies in the domains of finance, wine, tennis and climate change, and one large human anatomy ontology for their scaling experiments.

The results show that by crowdsourcing the authors are able to observe significant reductions in cost and time taken for the domain relevance, subsumption correctness, and instance-of correctness tasks. The usability was observed at approximately 85%. For the relation domain specification, the inter-rater agreement went down. This was expected, as this is a more complex task and the authors did nothing to ensure good accuracy. For the scalability experiments, the authors were able to show similar improvements.

While I found the research reported in the paper to be systematic and thorough, it is still limited to very simple tasks. It is unclear what fraction of the overall ontology development cycle is devoted to each of the tasks studied. For example, if the ontology relevance task takes only 1% of the overall ontology development time, then even a 50% improvement on it makes little contribution to reducing the overall cost.
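Spelling out that back-of-the-envelope argument with the hypothetical 1% / 50% figures above: if the crowdsourced task occupies a fraction p of total development effort and crowdsourcing cuts that task's cost by a factor r, the overall saving is only p times r.

```latex
% Illustrative calculation with the hypothetical figures from the review.
\[
  \text{overall saving} = p \cdot r = 0.01 \times 0.5 = 0.005 = 0.5\% .
\]
```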

The authors seem to be stuck in the old-fashioned paradigm for crowdsourcing in which each micro-task is a low-skill task. It is unclear whether ontology development really lends itself to such decomposition, and whether it will always require higher-quality, and perhaps even paid, crowd labor for the tasks. To see one example of going beyond low-skill micro-tasks, see the recent work on flash teams: http://stanfordhci.github.io/flash-teams/

The authors really need to be thinking beyond the micro-task/low-skill crowd labor model. Given that ontology engineering will require a higher level of skill, could they think of ways in which some of the required expertise could be codified in the tool? Or could the crowd be enlisted to go through some minimal training in exchange for paid work? One possible approach is to build in relation-selection guidelines that provide a framework for users choosing relations. For example, consider the work reported in: http://cogsys.org/pdf/paper-9-3-17.pdf

Review #2
By Benjamin Adams submitted on 29/Mar/2015
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

This is a significantly updated version of the previously submitted EKAW paper on the uComp Protege Plugin, designed to enable easy crowdsourcing to support ontology engineering. The paper describes an original system that is verified through extensive evaluation and is well written apart from some comments described below. The authors have adequately addressed my comments on the previous EKAW version, extended the evaluation, and demonstrated that crowdsourced ontology engineering is a valid approach in a number of cases (providing high-quality results at reduced cost) and is scalable. By developing a working plug-in and performing a strong evaluation of its use, I think this paper makes a very good contribution to the journal. The authors have demonstrated that crowdsourcing can help in a variety of domains, and the paper lays the groundwork for future work on expert sourcing, which I think will be a necessary requirement for many ontology engineers working in specialised domains.

I have some comments regarding the writing, which follow:

In the abstract and introduction in particular I find the language is unnecessarily vague in places and could benefit from some rewriting.

Abstract:

“they are mainly being used as external means to ontology engineering” what is meant by “external means” here?

“costs with 40% to 83%” — “with” should be changed to “by”
“reductions with 75%” - “with” should be changed to “of”

Introduction:
In the first paragraph, what is meant by the Web “changed the context surrounding knowledge rich systems”? What kinds of systems are you talking about? Do the *systems* integrate knowledge sources? The first two sentences could be rewritten so there is less repetition and meaning is clearer.

The shift from the Web and “knowledge rich systems” being distributed to the discussion of WebProtege and ontology engineering seems abrupt. In the first case I get the sense you are talking about content generation on the Web, which is something fundamentally different from ontology engineering.

Also in first paragraph: phrases like “a natural step” should be removed. They don’t add to understanding and are just filler.

Second paragraph: “Similarly to” should be “similar to”

Third paragraph: regarding “re-using existing or automatically derived ontologies”, do you mean re-using existing ontologies or automatically *deriving* ontologies? There should be a citation for the claim that “extracted ontologies” “typically contain questionable or wrong ontological elements …”. The notion of ontology verification being “tedious” for an ontology engineer is somewhat amusing, considering that ontology engineering is that person’s job. This makes me wonder about the quality of any ontology that does not have engineers who are willing to attend to detail.

Top of page 3: “micro-workers” should be defined.

On page 6 (and elsewhere): A different font or italics should be used to differentiate relations like *other*, *is a sub-category of*, *is identical to*, etc. from the normal text.

Page 7 (under T3 and elsewhere): You do not need to repeat citations to the same reference in one sentence — all citations should just go at the end of the sentence.

Bottom of page 9: “Task Specific Information such as the concept selected by the user for validation.” is not a complete sentence.

Figure 8: This question seems very ambiguous to me because the directionality of the relation is not clear, unless the person interprets *subject*, *object*, and *relation* in the correct way. I wouldn’t assume that for average users. It might make sense to also add a validation step for users, with a pre-set number of questions that ensure they understand the task correctly (projects like Galaxy Zoo do this).
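One hedged sketch of the kind of validation the review suggests: before a worker's judgments are counted, check their answers against a small pre-set pool of questions with known answers. The question wording, names, and pass threshold below are made up for illustration and are not part of the paper or plugin.

```python
# Illustrative only: qualify a crowd worker by checking a handful of
# pre-set "gold" questions with known answers before trusting their judgments.
GOLD_QUESTIONS = {
    "Is it correct that every 'Cat' is a kind of 'Animal'?": "yes",
    "Is it correct that every 'Animal' is a kind of 'Cat'?": "no",
    "Is it correct that every 'Rose' is a kind of 'Flower'?": "yes",
}

def passes_qualification(worker_answers: dict, threshold: float = 0.8) -> bool:
    """Accept the worker only if they answer enough gold questions correctly."""
    correct = sum(
        1 for question, expected in GOLD_QUESTIONS.items()
        if worker_answers.get(question, "").strip().lower() == expected
    )
    return correct / len(GOLD_QUESTIONS) >= threshold

# Example: this worker gets 2 of 3 gold questions right and fails the 0.8 threshold.
answers = {
    "Is it correct that every 'Cat' is a kind of 'Animal'?": "yes",
    "Is it correct that every 'Animal' is a kind of 'Cat'?": "yes",
    "Is it correct that every 'Rose' is a kind of 'Flower'?": "yes",
}
print(passes_qualification(answers))  # False
```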