Abstract:
Hybrid annotation techniques have emerged as a promising approach to performing named entity recognition (NER) on noisy microposts. In this paper, we identify a set of content and crowdsourcing-related features (number and type of entities in a post, average tweet length and sentiment, composition of skipped tweets, average time spent completing the tasks, and interaction with the user interface) and analyse their impact on correct and incorrect human annotations. We then study how extended annotation instructions and disambiguation guidelines affect these factors. We conduct all experiments using CrowdFlower and a simple, custom-built gamified NER tool, on three datasets from the related literature and a fourth, newly annotated corpus. Our findings show that crowd workers correctly annotate shorter tweets containing fewer entities, while they skip (or wrongly annotate) longer tweets containing more entities. Workers are also adept at recognising people and locations, but have difficulty identifying organisations and miscellaneous entities, which they likewise skip (or wrongly annotate). Finally, detailed guidelines do not necessarily lead to improved annotation quality. We expect these findings to inform the design of more advanced NER pipelines, guiding how tweets are routed to automatic tools, crowd workers, and nichesourced experts. The experimental results are published as JSON-LD for further use.