Applying Linked Data Approaches to Pharmacology: Architectural Decisions and Implementation

Alasdair J. G. Gray, Paul Groth, Antonis Loizou, Sune Askjaer, Christian Brenninkmeijer, Kees Burger, Christine Chichester, Chris T. Evelo, Carole Goble, Lee Harland, Steve Pettifer, Mark Thompson, Andra Waagmeester, Antony J. Williams
The discovery of new medicines requires pharmacologists to interact with a number of information sources ranging from tabular data to scientific papers, and other specialized formats. In this application report, we describe a linked data platform for integrating multiple pharmacology datasets that form the basis for several drug discovery applications. The functionality offered by the platform has been drawn from a collection of prioritised drug discovery business questions created as part of the Open PHACTS project, a collaboration of research institutions and major pharmaceutical companies. We describe the architecture of the platform focusing on seven design decisions that drove its development with the aim of informing others developing similar software in this or other domains. The utility of the platform is demonstrated by the variety of drug discovery applications being built to access the integrated data.
Submission type: Application Report
Responsible editor: Guest Editors

Revision of a manuscript which was conditionally accepted pending major revisions. Now accepted for publication. First round reviews are beneath the second round reviews.

Review 1 by Boris Villazon-Terrazas

I am happy to see that my comments have been addressed properly. I am
in favor of accepting this publication as it is now.

Review 2 by Rafael Valencia-García

I am satisfied by how the authors have solved the issues that I detected in my previous review.

Review 3 by Jesualdo Tomás Fernández Breis

The authors have addressed all my comments in a satisfactory way. They have updated the content of the paper and included new information that will certainly be of interest for the readers.

I thank the authors for allowing us to access the system for performing this review. I have been able to check that the system provides the functionality described in the paper.

I think the public release of the system by the end of 2012 will attract the interest of many researchers and that it will certainly make an impact in the community.

The paper does not need any linguistic improvement and the quality of the figures is good. Some references might be completed, in particular:
[14] Nucleic Acids Res. 2012 Jan;40(Database issue):D71-5
[19] Nucleic Acids Res. 2012 January; 40(D1): D1100–D1107.
[29] Nucleic Acids Res. 2012 January; 40(D1): D1301–D1307.

First round reviews:

Review 1 by Boris Villazon-Terrazas

This paper is an application report that details the architectural decisions for implementing a LD platform to support drug discovery that integrates available data from semantic web resources.

The authors present the related work and motivate the research questions of the paper. Then, they present the design decisions by providing an architecture overview. Finally, they provide the implementation details of the system.

Minor comments:
- Add a reference to each resource the first time it appears. For example, UniProt (page 2) only has a reference the second time it appears (page 4). Please fix this accordingly.
- Figure 1. I think some component names have to be updated, e.g., the Identify Mapping Service and Identify Resolution Server components. They are not referenced in the text.

Review 2 by Rafael Valencia-García

The paper presents a nice linked data platform called OPS for integrating multiple existing pharmacology datasets. It is well written and fits into the Special Issue.

The architecture of the platform is explained focusing on seven architectural decisions and components that form part of the platform.

The work is sound, and it illustrates the design and implementation decisions for the development of the OPS platform.

There are a few points that might be clarified/addressed:

- To help readers understand the complexity of the work, the questions shown in Section 3 should be explained more thoroughly (i.e. which data have to be integrated and which filtering has to be done).

- Figures 2 and 3 should be described in detail.

- The resolution and visibility of figures 3, 4, 5 and 6 should be improved.

- It would be desirable to extend the related work section explaining some details of the works cited, avoiding multiple citations such as [4,9,16], and to provide a more detailed discussion.

Review 3 by Jesualdo Tomás Fernández Breis

Quality, importance, and impact of the described application

The authors describe in this paper how their platform, based on Linked Data, supports drug discovery processes and applications. The results correspond to a joint effort between academic institutions and pharma companies. Being able to develop useful tools that provide a bridge between both worlds using Linked Data technology is certainly a contribution of interest. The description of the application seems promising and could represent a clear example of how developments should be done in this area to succeed beyond academia. It provides some methodological information relevant for other researchers, and the expected impact in its research area is positively considered in this review, since the approach provides mechanisms for adding data into the platform and different ways of exploiting the linked data repository.

However, the paper neither includes a validation of OpenPhacts nor reports the deployment of the system. The paper describes how the data of the platform is being integrated into third-party applications. It is said that the alpha version of the platform is available only to the members of the consortium, and that it will not be available to the community until October.

Clarity and readability of the describing paper

The paper is well written and is easy to follow and understand, even for researchers with little experience in semantic web technologies. The paper is correctly motivated and the relevant related work is discussed appropriately. The authors provide an explanation for their main architectural decisions, which is very useful for other researchers willing to develop Linked Data systems. However, the content is sometimes too brief, which might also be due to length constraints. Section 4 describes the decisions and Section 5 describes the implementation of OpenPhacts. Given that Section 4 is intended to be generic and the most reusable part, I would not have expected to read OpenPhacts-specific content in this section. This applies specifically to 4.1.

The use of technological solutions such as IRS and IMS seems very interesting for obtaining a flexible, adaptable platform. Regarding IRS, it would be interesting to see how this relates to on-going efforts like in which representatives of OpenPhacts participate.

If I have not misunderstood the text about the CoreAPI, it seems that a series of predetermined parametrized queries can be issued through this API. I wonder to what extent this limits the exploitation of the Linked Data by third-party applications and how such parametrized queries were designed.
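
To make the question about predetermined parametrized queries concrete, the following is a minimal sketch of how such an API could work in general. It is not the OPS CoreAPI: the template name `describe_resource`, the function `build_query`, the SPARQL text, and the example URI are all hypothetical and chosen only for illustration.

```python
import re
from string import Template

# Hypothetical catalogue of predetermined queries. A real platform would
# define its own templates; this one is purely illustrative.
QUERY_TEMPLATES = {
    "describe_resource": Template(
        "SELECT ?p ?o WHERE { <$uri> ?p ?o } LIMIT $limit"
    ),
}

# Reject characters that could break out of the <...> IRI delimiters.
_SAFE_URI = re.compile(r"^[^\s<>\"{}|\\^`]+$")


def build_query(name: str, uri: str, limit: int = 100) -> str:
    """Fill a predetermined template with caller-supplied parameters.

    Callers choose a template by name and supply values for its
    parameters; they cannot submit arbitrary SPARQL.
    """
    template = QUERY_TEMPLATES.get(name)
    if template is None:
        raise KeyError(f"unknown query template: {name}")
    if not _SAFE_URI.match(uri):
        raise ValueError(f"unsafe URI: {uri!r}")
    return template.substitute(uri=uri, limit=int(limit))


if __name__ == "__main__":
    print(build_query("describe_resource", "http://example.org/compound/1", 10))
```

In a design of this kind, a third-party application can vary only the parameters of templates that already exist, so any question not anticipated by a template needs a new one to be added. That is the trade-off the question above is pointing at.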