LOPDF: A Framework for Extracting and Producing Linked Open Data of Scientific Documents

Tracking #: 1669-2881

This paper is currently under review
Ahtisham Aslam
Naif Radi Aljohani
Rabeeh Ayaz Abbasi
Ali Daud
Saeed-Ul Hassan

Responsible editor: Guest Editors LD4IE 2017

Submission type: Full Paper
The results of scientific experiments and research conducted by individuals and organizations are published and shared with the scientific community in various types of scientific documents, such as books, journals and reference works. The metadata of these documents describe important properties, such as has_Author, has_Affiliation, has_Keyword and has_Reference. These properties can be used to find potential collaborators, discover people with common research interests and research work, and explore scientific documents in matching domains. The major obstacle to obtaining these benefits is the lack of this metadata in a well-structured and semantically enriched format. This limits the ability to pose smart queries that support various types of analysis of scientific publication data. To address this problem, we have developed a generic framework named the Linked Open Publications Data Framework (LOPDF). LOPDF can be used to crawl, process, extract and produce machine-understandable data (in RDF format) about scientific publications from various publisher-specific sources, such as web portals, XML exports and digital libraries. In this paper, we present the architecture, process and algorithm used to develop LOPDF. The RDF datasets produced can be used to answer semantically enriched queries expressed in SPARQL. We present quantitative as well as qualitative analyses of LOPDF. Finally, we present the potential usage of semantically enriched RDF data and SPARQL queries for various types of analyses of scientific documents.
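
To make the idea of turning publication metadata into RDF more concrete, the following is a minimal sketch and not the LOPDF implementation. It assumes a hypothetical namespace (`http://example.org/lopdf/`), invented document identifiers and invented sample values; only the property names has_Author and has_Keyword come from the abstract. It serializes metadata as N-Triples and uses a plain Python filter as a stand-in for a simple SPARQL SELECT.

```python
# Hypothetical namespace: the abstract does not give the vocabulary IRIs.
BASE = "http://example.org/lopdf/"


def to_triples(doc_id, metadata):
    """Turn {property: [values]} into (subject, predicate, object) tuples."""
    subject = BASE + "document/" + doc_id
    return [
        (subject, BASE + prop, value)
        for prop, values in metadata.items()
        for value in values
    ]


def to_ntriples(triples):
    """Serialize triples with plain string literals as N-Triples text."""
    lines = []
    for s, p, o in triples:
        # Escape the characters that may not appear raw in an N-Triples literal.
        escaped = (
            o.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


def documents_with(triples, prop, value):
    """Return the subjects that have the given property value.

    Roughly what `SELECT ?doc WHERE { ?doc <prop> "value" }` would return
    on a real SPARQL endpoint; here it is only a list filter.
    """
    return sorted({s for s, p, o in triples if p == BASE + prop and o == value})


# Invented sample records, for illustration only.
triples = to_triples(
    "doc1", {"has_Author": ["A. Author"], "has_Keyword": ["linked data"]}
) + to_triples(
    "doc2", {"has_Author": ["B. Writer"], "has_Keyword": ["linked data"]}
)

print(to_ntriples(triples))
print(documents_with(triples, "has_Keyword", "linked data"))
```

In practice, a dataset like this would be loaded into a triple store and queried with SPARQL itself; the filter above only illustrates the kind of question (for example, which documents share a keyword) that such structured metadata makes answerable.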
Full PDF Version: Under Review