Contextual Information Retrieval in Research Articles: Semantic Publishing Tools for the Research Community

Paper Title: 
Contextual Information Retrieval in Research Articles: Semantic Publishing Tools for the Research Community
Authors: 
M.A. Angrosh, Stephen Cranefield, Nigel Stanger
Abstract: 
In recent years, the dramatic increase in academic research publications has gained significant research attention. Research has been carried out exploring novel ways of providing information services using this research content. However, the task of extracting meaningful information from research documents remains a challenge. This paper presents our research work on developing intelligent information systems that exploit online article databases. In this paper, we present a linked data application which uses a new semantic publishing model for providing value-added information services for the research community. The paper presents a conceptual framework for modelling contexts associated with sentences in research articles and discusses the Sentence Context Ontology, which is used to convert the information extracted from research documents into machine-understandable data. The paper reports supervised learning experiments carried out using conditional probabilistic models for achieving automatic context identification. The paper also describes a Semantic Web Application that provides various citation context based information services.
Submission type: 
Full Paper
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/special-issue-new-models-semant...

Revised manuscript after an accept with minor revisions - now accepted for publication.

Review 1 by Sudeshna Das

The authors present a sentence citation context ontology, supervised learning approaches to extract citation contexts and a Linked Data application for articles from the European Semantic Web Conference (ESWC) series. They define their own set of citation contexts stating that existing ones were hard to reuse/adapt, which is ok. The new Sentence Context Ontology is a positive contribution and the machine learning approaches and Linked Data application are methodologically sound as well. The only issue I have is that on the whole there is no compelling use case for understanding the context of citation sentences.

Other minor issues that need to be corrected:
* typo on page 2 "While [4] report s on model-ing the contexts of sentences in related work sections of research articles and supervised learning experi-ments for context identification"
* page 8 - 4.2.9 is not following 4.2.8
* page 34 - 8.1.3 is mixed up
* page 35 - "View Contexts of citation sentences" is mingled with the Discussion section (seems like this occurs every time there is a figure or table spanning the page)
* Figures 9-18 are so blurry that we don't get a good sense of the Linked Data application.

Review 2 by Tim Clark

The authors present an interesting and potentially useful approach for classifying and organizing the contexts of scientific citations. Noting that it "is becoming increasingly difficult to keep abreast of research developments in one's field" due to a dramatic increase in output of published research, the authors suggest that their method of textual analysis and organization can provide "value added information services for the research community."

This is a sound motivation for the work reported upon. I have no issue with the computational methods used and believe the authors are correct to attempt to build an application for scientific readers based upon their methods. The authors used the CORESE semantic web engine and its associated SEWESE server at INRIA (Durville & Gandon 2007) to build an application which is able to query a SPARQL endpoint containing their results and present them in a fairly reasonable-looking way to scientific users via Simile's Exhibit interface (Huynh et al. 2007, Karger et al. 2009, etc.).

This would be a highly publishable and re-usable result if the following were provided as well:

- URL to the application (we cannot see anything except screen shots provided by the authors) and some instructions on how to use it;
- URL to acquire, inspect, execute and potentially re-use the source code, with licensing terms - preferably open source;
- URL to the SPARQL endpoint where their results are available.

In addition, to validate the authors' claim that their application could provide improved approaches to "strategic reading" across large sets of scientific publications, it would be useful to see the

- statement of the general use cases that they were specifically trying to build for; and
- some trial of the application with real scientific users other than the authors themselves; with
- an assessment of the outcome, plusses and minuses, and conclusions.

Unfortunately these are missing. If these were provided I would certainly recommend publication, but as it is I cannot.

Some additional points to consider:

- the work is not as well-referenced as it ought to be.

For example, the references I pointed out above for SEWESE and for Exhibit, are in published conference proceedings. The authors reference, in their place, web pages for informal presentations and the like. This type of issue ought to be cleaned up when and if the authors re-submit a modified version, which I would recommend they do.

Comments

REVIEWER 1

Review Comment: The only issue I have is that on the whole there is no compelling use case for understanding the context of citation sentences.

Our Reply: The last section has been rewritten as a use case.

Review Comment: Other minor issues that need to be corrected:

* typo on page 2 "While [4] report s on model-ing the contexts of sentences in related work sections of research articles and supervised learning experi-ments for context identification"

Our Reply: Corrected

* page 8 - 4.2.9 is not following 4.2.8

Our Reply: Corrected

* page 34 - 8.1.3 is mixed up

Our Reply: Corrected

* page 35 - "View Contexts of citation sentences" is mingled with the Discussion section (seems like this occurs every time there is a figure or table spanning the page)

Our Reply: Corrected

* Figures 9-18 are so blurry that we don't get a good sense of the Linked Data application.

Our Reply: Corrected - Created new screenshots in bitmap format

REVIEWER 2

Reviewer Comment:
- URL to the application (we cannot see anything except screen shots provided by the authors) and some instructions on how to use it;
- URL to acquire, inspect, execute and potentially re-use the source code, with licensing terms - preferably open source;
- URL to the SPARQL endpoint where their results are available.

Our Reply: We have added the link to the website for accessing the Sentence Context Ontology. Regarding the training dataset and the application, please note that they were developed using data available from Springer. We have written to Springer for permission to make the application and the dataset public and are awaiting their response. However, the dataset and the application can be provided to users for experimentation upon request, provided they have appropriate access to the Springer database. If we hear back from Springer before the paper is accepted, we will send an updated version with URLs for the application and data.

Reviewer Comments: In addition, to validate the authors' claim that their application could provide improved approaches to "strategic reading" across large sets of scientific publications, it would be useful to see the

- statement of the general use cases that they were specifically trying to build for; and
Our Reply: The last section has been rewritten and the use cases are included in the Conclusion section.

- some trial of the application with real scientific users other than the authors themselves; with an assessment of the outcome, plusses and minuses, and conclusions.

Our Reply: We are currently working on a web application for ScienceDirect following the methodology and techniques discussed in the paper. An evaluation study will be carried out as future work in order to assess the application. It would be difficult to provide evaluation results in a short time; however, the planned evaluation is noted as future work in our conclusion section.

Some additional points to consider:

Reviewer Comment: the work is not as well-referenced as it ought to be. For example, the references I pointed out above for SEWESE and for Exhibit, are in published conference proceedings. The authors reference, in their place, web pages for informal presentations and the like. This type of issue ought to be cleaned up when and if the authors re-submit a modified version, which I would recommend they do.

Our Reply: References Cleaned Up