A Study of Concept Similarity in Wikidata

Tracking #: 3330-4544

Filip Ilievski
Kartik Shenoy
Hans Chalupsky
Nicholas Klein
Pedro Szekely

Responsible editor: 
Harald Sack

Submission type: 
Full Paper
Robust estimation of concept similarity is crucial for applications of AI in the commercial, biomedical, and publishing domains, among others. While the related task of word similarity has been extensively studied, resulting in a wide range of methods, estimating concept similarity between nodes in Wikidata has not been considered so far. In light of the adoption of Wikidata for increasingly complex tasks that rely on similarity, and its unique size, breadth, and crowdsourcing nature, we propose that conceptual similarity should be revisited for the case of Wikidata. In this paper, we study a wide range of representative similarity methods for Wikidata, organized into three categories, and leverage background information for knowledge injection via retrofitting. We measure the impact of retrofitting with different weighted subsets from Wikidata and ProBase. Experiments on three benchmarks show that the best performance is achieved by pairing language models with rich information, whereas the impact of injecting knowledge is most positive on methods that originally do not consider comprehensive information. The performance of retrofitting is conditioned on the selection of high-quality similarity knowledge. A key limitation of this study, similar to prior work, lies in the limited size and scope of the similarity benchmarks. While Wikidata provides an unprecedented possibility for a representative evaluation of concept similarity, effectively doing so remains a key challenge.
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 12/Mar/2023
Minor Revision
Review Comment:

The revised version of the paper reads significantly better than the previous one. All the major concerns raised earlier have been addressed in this version, including a clear statement of the paper contributions, detailed information on the design choices and implementations as well as improved figures and interesting discussion on results.

There are a few minor comments and suggestions for the current version, as follows:

- Section 1, page 2, line 26-27 --> This statement is missing a reference to the previous methods that have combined the structure and semantics. Please add them.

- Section 1, Figure 1 seems to be a bit misplaced here. Since the reader has not yet been introduced to the details of the implementation, the different metrics already shown here are confusing and the figure cannot be fully understood. Perhaps a simpler variation of the figure, or a part of the figure, could be shown here as an example.

- Section 1, page 3, line 10 - the evaluation of entity similarity in KGs --> do the authors mean concept similarity here?

- Section 2.2 Related work --> This section needs some restructuring. While the paragraph heading is Word similarity, there are mentions of concept similarity as well in paragraphs 1-3, and then the text returns to word similarity in paragraph 4. Perhaps a separate heading for works on concept similarity could be added?

- Section 3.1, page 7, line 16 --> Were all the nodes in Wikidata mapped to an abstract in DBpedia? If not, what is the overlap and how is the missing information handled?

- Section 3.1, page 7, line 34 --> The explanation for the Class similarity seems to be a bit vague. Could the authors add an example here to elaborate on how this similarity measure is obtained?

- Section 3.3, page 9, line 33 --> What is the rationale behind choosing the lowest numeric ID here? Does this choice make any significant difference, or is it inconsequential?

- There are several different links to Google Drive for the different datasets as footnotes in the paper. Are these dataset links also included in the main GitHub repository? If so, then perhaps a single reference to the repository where one can find the dataset links (clearly specified) would be better. If not, can the links all be added to the README file for better organization and ease of reuse?

Review #2
Anonymous submitted on 19/Jun/2023
Review Comment:

I would like to thank the authors for considering the raised concerns and providing the necessary details as was pointed out in the previous review.
They have mindfully addressed the concerns raised, namely the difference between the existing research and their proposed approach, and hence I would like to accept the paper.
However, I would like one more addition to the manuscript: a brief description of the application areas of this concept similarity measure, how general the proposed approach is, and how the results obtained in this paper (i.e., the similar concepts identified in Wikidata) can be used in further research. It would therefore be interesting to include future research directions for this work.

Review #3
Anonymous submitted on 30/Jul/2023
Minor Revision
Review Comment:

This paper addresses an interesting topic which is a study of concept similarity in Wikidata. It analyses the impact of retrofitting on both KG and text-based embeddings of concepts/entities.

Review Comments:

- The reason for selecting TransE and ComplEx among many other KGE methods is not discussed.

- Similarly, the reason for selecting DeepWalk is not justified.

- The different problems, such as skewed relations, duplicate relations, and so on, that may arise when preparing datasets to train KGE methods are not considered.

- Even though the retrofitting idea is interesting, I have two concerns with the work: i) The sizes of the datasets presented in Table 1 are too small. ii) The authors conclude that combining structure with text does not provide any benefits over using only text-based embeddings. However, they used TransE and ComplEx, which do not use any literals (text or numeric). Instead, they could have used other KGE methods that utilize textual descriptions/abstracts of entities in their architecture. This way, it would be fair to compare KGEs with LMEs.

- Suggestion: Add the statistics of the datasets used to train the KG embedding models and the Node embedding methods in a table.