A Complex Network Model for Knowledge Graphs’ Relationships

Tracking #: 3780-4994

Authors: 
Hassan Abdallah
Beatrice Markhoff
Arnaud Soulet

Responsible editor: 
Elena Demidova

Submission type: 
Full Paper
Abstract: 
When dealing with the structure, content, and quality of Knowledge Graphs (KGs), most analyses focus on entities, overlooking the significance of relationships and their evolution. In this paper, we introduce KRELM, a novel and efficient graph model that mimics the accumulation of facts in crowdsourced KGs and accurately simulates the evolution of their structure. By modeling the decentralized process of crowdsourcing, KRELM reproduces key distribution patterns found in relationships, demonstrating that the facts in a KG can be generated incrementally, either by adding new entities or by further describing existing ones. Our theoretical analysis of KRELM reveals that the distribution of facts for relationships follows an exponential law for subjects and a power law for objects, enabling a deeper understanding of knowledge graph dynamics. Experimental validation on major KGs shows that KRELM successfully captures a large part of the structure of real-world relationships, and a longitudinal study of Wikidata confirms its effectiveness in predicting relationship evolution. This work opens new avenues for analyzing and benchmarking KGs.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 27/Dec/2024
Suggestion:
Accept
Review Comment:

The authors have thoroughly revised the manuscript in line with the reviews, and my comments have been properly addressed. In my view, the paper is now ready for publication.

Review #2
Anonymous submitted on 19/Feb/2025
Suggestion:
Minor Revision
Review Comment:

First, I would like to thank the authors for addressing my previous comments. The revisions for them are more or less sufficient. However, I am still not entirely convinced the paper is ready for publication in its current form. The paper's structure and writing style still need improvement.

My comments are as follows:

p 1, l 43 - would be good to provide examples of such crowdsourcing techniques (e.g. X, Y)
p 2, l 48 - 49 - this statement needs a bit more elaboration. Why is it not possible? Provide a reason.
p 3, l 8 - knowledge semantics -> using only "semantics" here is sufficient
p 3, l 32 - should be "to better understand..."

The introduction to section 2's content is good; however, it is missing an overview of what the section will present. Some of the statements sound more like a summary, which should be either in each paragraph that supports them or at the end of the section in a separate subsection. It would be good to start by elaborating why you overview related work on deep learning etc. in this starting paragraph instead of directly stating facts.

p 4, l 40 - Do you mean "fortunately"? The sentence is contradictory.
p 4, l 34 - Why do these authors ignore these assertions, and what do they focus on instead? This currently sounds like a half statement and needs more substance.
p 4, l 37-37 - This reads like the start of a paragraph rather than belonging where it is located now. Further, what has changed since the first methods? Inserting such statements without explanation or further context is not helpful to the reader.

Since this is a paper submitted to the Semantic Web journal, I do not think it is necessary to provide 2 paragraphs of definitions of terms such as knowledge graph. A sentence in the introduction, when the term is first mentioned is sufficient. It will also be good to acknowledge one or a few of the already existing definitions commonly used as standard within the Semantic Web community. The same comment stands for bipartite graph.

Having this in mind, the methodology should be a separate section and not a subsection. In the same paragraph, the phrase "in the next sections" is incorrect. The section should also first present an overview of the methodology and then elaborate on each step.

p 6, l 32-35 these statements need references.

"It is easy to see that Algorithm 1" - not necessarily. It is better to elaborate or reference.

At times the paper reads not so much as a scientific research paper but rather as a chapter from a (text) book or a lecture. I would suggest avoiding starting sentences with "Remember that", "Of course", "Let us come back to...", "Notice that".

The source code is online and well documented which is as expected and overall a plus for the paper.

Section 5 should start again with an overview of the sections explaining what the reader will learn in the section.

The following statement - "Notice that after loading and filtering the dumps, we stored all the relationships’ statistical features in a relational database. As this preprocessing is very costly, all subsequent experimental calculations were based on a local relational database management system (RDBMS), which is highly efficient and not harmful to the planet" - is unclear and, in my opinion, does not add value to the paper. This is not the usual writing style in papers. Further, the provided facts (about sustainability, if that is what the authors mean, and about cost (in what sense?)) are also unclear.

"The experimental study in the previous section has validated the ability of our model to reproduce real-world crowdsourced KGs at a given point in time, but also longitudinally." This needs to be better phrased.

The discussion section is interesting and well presented. It adds value to the paper.

The extensive experimental study on four KGs, with a longitudinal study concerning Wikidata, also yields clear lessons - such as???
The limitations of the work in the conclusions section are not clear (or even missing).

Minor comments:
Mixing up British and American English spelling of some words.
Inconsistent capitalisation in references' titles and publication venues (e.g. [2][3][38]).
Terms such as RECOIN should be defined and abbreviated and then used consistently in the whole paper.
Figure 2 should be closer to where it is described in the text.
There is a missing reference to the Gini coefficient.
Terms should be defined, abbreviated and used consistently in the paper (e.g. knowledge graph and KG)