Extracting Entity-specific Substructures for RDF Graph Embeddings

Tracking #: 2060-3273

Muhammad Rizwan Saeed
Charalampos Chelmis
Viktor K. Prasanna

Responsible editor: 
Guest Editors Knowledge Graphs 2018

Submission type: 
Full Paper
Abstract: 
Knowledge Graphs (KGs) have become useful sources of structured data for information retrieval and data analytics tasks. Enabling complex analytics, however, requires entities in KGs to be represented in a way that is suitable for Machine Learning tasks. Several approaches have recently been proposed for obtaining vector representations of KGs based on identifying and extracting relevant graph substructures using both uniform and biased random walks. However, such approaches lead to representations comprising mostly popular, instead of relevant, entities in the KG. In KGs in which different types of entities often exist (such as in Linked Open Data), a given target entity may have its own distinct set of most relevant nodes and edges. We propose specificity as an accurate measure for identifying the most relevant, entity-specific, nodes and edges. We develop a scalable method based on bidirectional random walks to compute specificity. Our experimental evaluation shows that specificity-based biased random walks extract more meaningful (in terms of size and relevance) substructures compared to the state-of-the-art, and that the graph embeddings learned from the extracted substructures perform well against existing methods in common data mining tasks.
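The biased walks described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the `specificity_biased_walk` function, and the specificity scores below are all hypothetical, assuming only that each outgoing edge carries a precomputed specificity score and that the next hop is sampled with probability proportional to that score.

```python
import random

# Hypothetical toy KG: adjacency list mapping an entity to
# (predicate, neighbor, specificity) triples. The specificity
# values are illustrative, not taken from the paper.
GRAPH = {
    "Film_A": [("director", "Person_X", 0.9),
               ("subject", "Category_Y", 0.2)],
    "Person_X": [("knownFor", "Film_A", 0.8),
                 ("birthPlace", "City_Z", 0.1)],
    "Category_Y": [],
    "City_Z": [],
}

def specificity_biased_walk(graph, start, length, rng):
    """Random walk where each next hop is sampled with probability
    proportional to the edge's specificity score."""
    walk = [start]
    node = start
    for _ in range(length):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: stop the walk early
        weights = [spec for (_, _, spec) in edges]
        pred, nxt, _ = rng.choices(edges, weights=weights, k=1)[0]
        walk.extend([pred, nxt])  # record predicate and entity
        node = nxt
    return walk

rng = random.Random(42)
walk = specificity_biased_walk(GRAPH, "Film_A", 2, rng)
```

Recording both predicates and entities yields the sequence (document-like) representation that language-model-style embedding methods consume; the bias toward high-specificity edges is what keeps the walks entity-specific rather than dominated by popular hub nodes.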


Solicited Reviews:
Review #1
Anonymous submitted on 23/Dec/2018
Review Comment:

The paper is much better than the previous version. The authors have improved the organization of the structure, language, figures, and experiments, and the paper is now easier to read and understand.

The additional Fig. 6 is a good example showing the process of generating the document-like representation, but it would be better to also annotate the edges with their different weights.

Review #2
By Meng Zhao submitted on 11/Feb/2019
Review Comment:

In this paper, the authors propose the hierarchy-adjusted specificity score as a metric for measuring semantic proximity in the context of RDF graphs. In addition to the rigorous definitions and efficient computational algorithms, it might be interesting to include some of the following items in the evaluations/discussions:
1. A comparison between the proposed method and popular ones such as DeepWalk and/or Deep Graph Kernels.
2. A benchmark of the ranking algorithm on popular learning-to-rank (LTR) datasets, as the argument that (director, knownFor) is more relevant to a film than (director, subject) might not be impartial enough.
3. More demonstration might be needed regarding the beta factor. For instance, apart from treating it as a hyperparameter, could there be any benefit to learning its estimate from data?