Abstract:
We provide a comprehensive survey of the research literature that applies Information Extraction techniques in a Semantic Web setting. Works in the intersection of these two areas can be seen from two overlapping perspectives: using Semantic Web resources (languages/ontologies/knowledge-bases/tools) to improve Information Extraction, and/or using Information Extraction to populate the Semantic Web. In more detail, we focus on the extraction and linking of three elements: entities, concepts and relations. Extraction involves identifying (textual) mentions referring to such elements in a given unstructured or semi-structured input source. Linking involves associating each such mention with an appropriate disambiguated identifier referring to the same element in a Semantic Web knowledge-base (or ontology), in some cases creating a new identifier where necessary. With respect to entities, works involving (Named) Entity Recognition, Entity Disambiguation, Entity Linking, etc. in the context of the Semantic Web are considered. With respect to concepts, works involving Terminology Extraction, Keyword Extraction, Topic Modeling, Topic Labeling, etc., in the context of the Semantic Web are considered. Finally, with respect to relations, works involving Relation Extraction in the context of the Semantic Web are considered. The focus of the majority of the survey is on works applied to unstructured sources (text in natural language); however, we also provide an overview of works that develop custom techniques adapted for semi-structured inputs, namely markup documents and web tables.
Review
This is a well-written survey of information extraction techniques that involve the semantic web. The survey is clearly organized and very comprehensive.
One challenge of a survey like this, where relevant work has been done by researchers from many different communities, each with their own terminology, is to group and label the subject matter in a meaningful and accessible way. The authors’ groupings on the whole make sense (though there were a couple of distinctions that I’m not sure are important from a practical perspective), and they use footnotes to list common synonyms for each major grouping’s label.
Each major section contains a list of currently outstanding research questions in the sub-field discussed in that section. These tend to be somewhat repetitive across the different sub-fields; however, they do highlight issues that I frequently find are overlooked, such as the need for more fine-grained evaluation approaches that avoid treating techniques as monolithic black boxes, and the need to evaluate runtime and memory usage in addition to traditional performance metrics such as F-measure.
The discussion of overall trends in the field near the end of the paper is well done and very useful.
The paper contains an appendix that overviews traditional NLP and information extraction techniques, allowing it to stand alone for its intended audience (Semantic Web researchers and practitioners). I originally thought the references given for more comprehensive surveys of this material were somewhat dated, but in looking further myself I couldn’t find anything more current.
Of minor note, Table 2 indicates that 89 papers were surveyed, but the conclusion says, “In terms of the 109 highlighted papers in this survey…”, which seems inconsistent.
Response to Open Review by Michelle Cheatham
The authors thank Michelle for her comments. Indeed, integrating the terminology used in different papers from different areas was a significant challenge when preparing the survey, both in terms of finding papers, and ultimately in writing the text. We have tried to strike a good balance between being faithful to the terminology used in the literature while adopting a coherent, self-contained terminology in the survey text.
Regarding the open questions that we present, while we acknowledge there is repetitiveness across the sections (where issues like evaluation procedures, scalability, etc., appear in all sections), we ultimately decided that this was a necessary compromise to have an independent list of questions per section (especially since there are important nuances to these common themes that are specific to each section).
Regarding Table 2, these 89 papers include only those works accepting text as input (the main focus of the survey). A further 20 papers that consider semi-structured inputs are added later in Section 5. We will add a footnote to the text describing Table 2 to clarify this issue in the camera-ready version.