Detecting new and arbitrary relations among Linked Data entities using pattern extraction

Tracking #: 1454-2666

This paper is currently under review
Subhashree Balachandran
P Sreenivasa Kumar

Responsible editor: Claudia d'Amato

Submission type: Full Paper
Although several RDF knowledge bases are available through the LOD initiative, many data entities in these knowledge bases remain isolated, lacking metadata and links to other datasets. Many research efforts focus on establishing that a pair of entities from two different datasets are semantically the same, and many others have proposed solutions for extracting additional instances of an already existing relation. However, the problem of finding new relations (and their instances) between any two given collections of data entities has not been investigated in detail. In this paper, we present DART, an unsupervised solution that enriches the LOD cloud with new relations between two given entity sets. In the first phase, DART discovers prospective relations from a web corpus through pattern extraction; in this process, we use paraphrase detection to cluster text patterns and the WordNet ontology to remove irrelevant patterns. In the second phase, DART performs the actual enrichment by extracting instances of the prospective relations. We have empirically evaluated our approach on several pairs of entity sets and found that the system can indeed be used to enrich existing linked datasets with new relations and their instances. On the datasets used in our experiments, DART generates more specific relations than those already present in DBpedia.
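To make the idea of the first phase more concrete, here is a minimal Python sketch of one plausible form of pattern extraction. It assumes that a pattern is simply the short word sequence appearing between a subject mention and a later object mention in a sentence, and that a pattern counts as a prospective relation once it connects a minimum number of distinct entity pairs. The function names, the `max_gap` and `min_support` parameters, and the toy corpus are all hypothetical; this is not DART's actual implementation, and the paraphrase-based clustering and WordNet filtering steps are not shown.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Optional, Set, Tuple

Pair = Tuple[str, str]


def middle_pattern(sentence: str, subj: str, obj: str, max_gap: int = 5) -> Optional[str]:
    """Return the lowercased words between `subj` and a later `obj`, or None.

    Hypothetical heuristic: only the first occurrence of `subj` is used, and
    the gap must contain between 1 and `max_gap` words.
    """
    start = sentence.find(subj)
    if start < 0:
        return None
    end = sentence.find(obj, start + len(subj))
    if end < 0:
        return None
    words = sentence[start + len(subj):end].split()
    if not 1 <= len(words) <= max_gap:
        return None
    return " ".join(w.lower() for w in words)


def prospective_relations(sentences: Iterable[str], pairs: Iterable[Pair],
                          min_support: int = 2, max_gap: int = 5) -> Dict[str, Set[Pair]]:
    """Map each pattern to the distinct entity pairs it connects, keeping only
    patterns supported by at least `min_support` pairs (an assumed threshold)."""
    pairs = list(pairs)  # iterated once per sentence, so materialize it
    support: Dict[str, Set[Pair]] = defaultdict(set)
    for sentence in sentences:
        for subj, obj in pairs:
            pattern = middle_pattern(sentence, subj, obj, max_gap)
            if pattern is not None:
                support[pattern].add((subj, obj))
    return {p: found for p, found in support.items() if len(found) >= min_support}


# Toy entity sets and corpus, invented for illustration only.
directors = ["Christopher Nolan", "Ridley Scott"]
films = ["Inception", "Alien"]
corpus = [
    "Christopher Nolan directed Inception in 2010.",
    "Ridley Scott directed Alien for Fox.",
    "Christopher Nolan was born in London.",
]
relations = prospective_relations(corpus, product(directors, films))
print(relations)
```

Here the pattern "directed" links two distinct director-film pairs and survives the support threshold. The abstract states that patterns found this way are then clustered by paraphrase detection and filtered with WordNet before the second phase extracts relation instances.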