A Complex Network Model for Knowledge Graphs' Relationships

Tracking #: 3666-4880

Authors: 
Hassan Abdallah
Beatrice Markhoff
Arnaud Soulet

Responsible editor: 
Elena Demidova

Submission type: 
Full Paper
Abstract: 
When dealing with the structure, content, and quality of Knowledge Graphs (KGs), the focus is generally on entities. We show here that modeling individual relationships, along with their evolution, is also possible. This brings new opportunities for conducting various analyses on KGs or for improving benchmarks. Relationships matter: we present KRELM, the first – simple yet powerful – graph generative model able to (i) closely mimic a large set of crowdsourced KG relationships and (ii) simulate their evolution well. In particular, for crowdsourced KGs, we show that the decentralized process of crowdsourcing is able to produce distribution patterns that are reproducible using KRELM. In this model, the facts of a relationship are added one by one, either by adding new entities or by describing existing ones, with asymmetric attachment between subjects and objects. The theoretical analysis of KRELM enables us to understand the fundamental dynamics of a knowledge graph, where the distribution of facts for each relationship follows an exponential law for subjects and a power law for objects. Our experiments on several major KGs show that KRELM perfectly reproduces the structure of a large part of their relationships. Moreover, a longitudinal study of Wikidata underlines our model’s relevance in predicting this structure’s evolution.
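The generative process described in the abstract can be illustrated with a minimal sketch. This is not the authors' KRELM implementation. It assumes, as one plausible reading of "asymmetric attachment", that the subject of each new fact is either a new entity or an existing subject drawn uniformly at random, while the object is either a new entity or an existing object drawn in proportion to the number of facts it already appears in. The function names and the parameters `p_new_subject` and `p_new_object` are hypothetical.

```python
import random
from collections import Counter


def simulate_relationship(n_facts, p_new_subject=0.5, p_new_object=0.2, seed=0):
    """Grow the facts of one relationship one by one (illustrative only).

    Assumptions (not taken from the paper):
    - subjects: new with probability p_new_subject, else uniform choice
      among existing subjects;
    - objects: new with probability p_new_object, else chosen in proportion
      to their current number of facts (preferential attachment).
    Duplicate (subject, object) pairs are not filtered out.
    """
    rng = random.Random(seed)
    subjects = []       # one entry per distinct subject -> uniform choice
    object_slots = []   # one entry per fact -> degree-proportional choice
    n_objects = 0
    facts = []
    for _ in range(n_facts):
        if not subjects or rng.random() < p_new_subject:
            s = len(subjects)          # introduce a new subject
            subjects.append(s)
        else:
            s = rng.choice(subjects)   # describe an existing subject
        if not object_slots or rng.random() < p_new_object:
            o = n_objects              # introduce a new object
            n_objects += 1
        else:
            o = rng.choice(object_slots)  # reuse an already popular object
        object_slots.append(o)
        facts.append((s, o))
    return facts


def degree_histogram(facts, position):
    """Map each degree to the number of entities having that degree.

    position is 0 for subjects and 1 for objects.
    """
    degrees = Counter(fact[position] for fact in facts)
    return Counter(degrees.values())


if __name__ == "__main__":
    facts = simulate_relationship(10_000)
    print("subjects:", sorted(degree_histogram(facts, 0).items())[:5])
    print("objects: ", sorted(degree_histogram(facts, 1).items())[:5])
```

In standard network growth models, uniform attachment tends to produce an exponentially decaying degree distribution and preferential attachment a heavy-tailed one. That is why this reading was chosen for the sketch; the actual attachment rules of KRELM may differ.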
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 16/Sep/2024
Suggestion:
Major Revision
Review Comment:

The paper presents a novel approach to understanding and simulating the evolution of a knowledge graph's structure. While the paper is quite interesting to read, I find it difficult to follow. The authors have produced a significant piece of work; however, its presentation as a paper lacks structure and flow between the different sections. The resources and code are openly available and documented, which is a plus.

The abstract does not give a good overview of the main research question, what motivates the work, and why a new approach is needed. Currently, the authors devote only two sentences to this at the start of the abstract. I suggest they provide more context here.

Crowdsourcing does not directly result in a knowledge graph. Social editing is only one aspect of crowdsourcing. The start of the introduction needs to be rephrased, as it lacks flow and the facts cannot be so easily generalised.

The role of semantics needs to be better elaborated in the introduction.

It would be better if the paper first presented the main motivation of the work and then the research questions.

"This approach has gained momentum in fields as diverse as computer networks and biology."-> Provide references to examples

"To answer these two research questions, this paper explores a novel approach that has never yet been applied to KGs: the modelling of complex network structure and dynamics" -> this does not clearly present the approach. What is the approach exactly?

Missing definition of "bipartite".

In the contributions outline, what do the authors mean by "generative model"? Which model exactly? Clarify what the main contributions are: an approach, an AI model, or an algorithm? What are the constraints on implementing your approach?

The start of the related work is good; however, the authors should also mention here what the subsections present. Conclusions can be kept for the end of this section.
Subsection 2.1 then discusses deep learning methods, and I am missing the connection to what the authors are doing. The introduction was mostly about crowdsourcing and the research questions. I suggest better interconnecting and aligning the work presented thus far in the paper.

Missing references to the global and sequential approaches mentioned on page 3, line 51. In this section, I believe the authors are confusing crowdsourced data with crowdsourced knowledge graphs. It should be clarified what the authors mean.

A section briefly presenting the methodology followed and the approach itself is missing. I would advise the authors to check similar papers accepted in this journal. Usually, such a section is needed after the related work and before the implementation details of the approach.

Why should one note the implementation language of the approach? Does it matter whether Python or Java is used? If yes, explain why.

"In this way, all subsequent experimental calculations were based on SQL queries on a local RDBMS, 50
51 which is highly efficient and not harmful to the planet." This needs to be paraphrased. Do you mean sustainability and the environmental impacts of the developed solution?

Define what "ablation study" entails.

Overall, the evaluation strategy and methodology need to be better presented at the start.

The conclusions should start with a summary of what the paper presented, followed by the evaluation findings, current limitations and directions for future work. The conclusion is currently too long and detailed. It needs to be more concise and to the point, summarising and concluding the paper rather than introducing new knowledge. The findings, which are presented in the conclusion, can be put in a separate section before the conclusion.

Minor comments:
It is not common practice for sentences to start with a reference. If the authors want a sentence to start this way, it should begin with the names of the cited authors, followed by the reference.

Double-check for consistency in the capitalisation of terms.

Avoid jargon such as "craze".

Terms should be defined and abbreviated when first mentioned. Later on, the abbreviations should be used consistently. This is not the case now.

Missing introduction to section 4 and its subsections.

Missing definition of RDBMS.

Review #2
Anonymous submitted on 30/Sep/2024
Suggestion:
Major Revision
Review Comment:

This paper describes a complex network generation approach proposed to simulate the statistical structure of KGs. The approach focuses on replicating the degree of subject/object nodes and the coverage of relations.

The approach seems sound and is explained in detail. Experiments have been performed to demonstrate the effectiveness of the approach.

I have a few questions and concerns that I would like to see addressed:
- Although I think the assumptions and theoretical framework make sense in general, I do not see much consideration of the specifics of KGs. In other words, how do you distinguish this approach from general graph/complex network generation approaches? The authors claim that this is the first approach for KGs, in which case why should we consider it important for KGs rather than for general networks?
- No baseline is considered. As mentioned in the previous point, I do not see why general complex network generation approaches in other domains cannot be used as baselines.
- Please make clear throughout the paper whether the approach replicates the entire KG structure or one specific property at a time.
- Please clarify early in the paper which aspects of the structure the approach focuses on.
- Please further justify how the attachment assumptions can apply to the relations in a KG. A more complete data analysis than the two example properties would be useful.
- Please go through the paper and make sure the notations are clearly described/defined before they are used, e.g. p_e.
- It would be interesting to see some evaluation results for the relations, e.g. a bin chart showing the number of relations that fall into different ranges of Prec and Cov. Correspondingly, some discussion and error analysis on the relations that do not result in good performance would be useful.