Diversified Top-k query research on temporal knowledge graph

Tracking #: 3504-4718

Authors: 
Li Song
Wang Zhe
Zhang Liping

Responsible editor: 
Elena Demidova

Submission type: 
Full Paper
Abstract: 
In order to quickly query the results with high weight and low redundancy on the temporal knowledge graph, this paper proposes a diversified Top-k query algorithm for the temporal knowledge graph. Firstly, the storage scheme and index structure of the temporal knowledge graph combining snapshot and log mode are given. A unit matching algorithm and an overall matching algorithm based on snapshot mode are proposed to obtain a set of snapshot matching results. A matching algorithm based on log modes is proposed to obtain a set of log matching results. Then, the TgTop algorithm, a diversified Top-k query algorithm, is proposed, and the results with high weight and low redundancy are obtained. In addition, an efficient pruning algorithm, TgTop-Pruning algorithm, is proposed to reduce the time overhead of diversified Top-k processing. Finally, experimental results on four real datasets show the effectiveness and efficiency of the proposed query algorithm.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Simon Gottschalk submitted on 24/Aug/2023
Suggestion:
Reject
Review Comment:

The authors propose a set of index structures and algorithms for the task of diversified Top-k querying in temporal knowledge graphs. There are several issues with writing/paper style, redundancies as well as formalism, intuition and evaluation, plus a missing attribution to a re-used figure.

=== Intuition ===
- Abstract: The abstract mentions several concepts (snapshot and log mode, unit matching) and algorithm names (TgTop, TgTop-Pruning) which are not explained in a way that the reader gets an intuition about the approach while reading the abstract. Similar for the description of the second contribution in Section 2.
- Example: Fig. 1 as an example should be larger, and node and edge labels should be added directly to the figure.
- Snapshot and log mode: In abstract and introduction, the terms “snapshot and log mode” are introduced but not properly explained. In Section 4, authors mention “RDF graphs include snapshots or incremental logs” which still does not well explain these concepts. Formally, the term “snapshot” should be mentioned in Definition 2. Logs are denoted as “Lt_i^j” in Section 4 but not introduced before.
- Node weights: One of the main objectives of the approach is to return results with high weights. However, the idea behind weights is only insufficiently introduced. Authors mention “high-weight results”, “importance of the results” and node weight is later computed based on node degree, but given how important this concept is, I would like to see more discussion about node weights.

=== Formalism ===
- Members of a set should be in lower case letters (e.g., in Definition 5: “nodes {V_1, V_2, …}”).
- Weight matrix: As the weights of nodes change over time, W shoud be defined to map from VxT to R.
- Temporal Matching (Definition 4 and Section 6.4): I do not understand the need for introducing temporal matching in such details. According to the definition, query time and graph time match if they overlap or have the inclusion relationship. However, if one time interval includes another, they also overlap. The resulting time interval “[max(t_s,t_a),min(t_e ,t_b)]” is valid for all cases. Consequently, I do not see the need for Definition 4 (specifically, for Table 1) and for some computations in Section 6.2.
- The adjacency table in Section 4 mentions IDs and labels of nodes that should be introduced earlier. The sentence “We first classify each Label according to its type” is unclear.
- Problem Statement: Equation (8) itself is not the problem; the problem is to find the set M that maximises F(M). The equation is taken from [31] which needs to be mentioned.

=== Approach ===
- Overview: Since the approach introduces 10 algorithms and three types of matching (unit, overall and log), I would appreciate to get an overview of these components first.
- Node types: Section 5 mentions “there are not many node types”. What does node type refer to here? Is this a characteristic of the dataset?
- Labels: The different indexes use the notion of “labels” but it is not clear how these labels are used.
- Redundancy: Section 6.2.1 is highly redundant: The texts before and after Fig. 9 are repetitive, and Algorithms 1-6 are highly similar. I would suggest only to have one algorithms where only specific lines are changed w.r.t. the pattern.
- Is the (S,P,O) a SPARQL “ASK” query?
- Contributions: The paper solves the same problem as [31], but on a temporal knowledge graph. Since results are similar (except for the proposed pruning approach which is clearly faster), a more detailed comparison to [31] should be provided. This way, the authors should make clear the novelty and contributions of this paper, which seem low given the text repetitions and the similarity to [31].

=== Evaluation ===
- Datasets: It is not clear how exactly the evaluation datasets are created. For example, Wikidata only has 1.1M nodes, less than YAGO, although Wikidata has many more nodes. It is also unclear how annual update interval is created and which years are considered. Are the annual Wikidata snapshots historic versions of Wikidata or are they created w.r.t. the temporal properties in a recent Wikidata snapshot?
- Queries: No information is given how the queries in Section 7.3 were created and no examples are given. For the overall matching algorithm in Section 7.4, more details are given (“randomly”) but not sufficient.
- Linearity: The text states that the proposed algorithm “does not increase the query time linearly”. But Fig. 13 shows a linear trend.

=== Presentation ===
- Redundancy in text: Several parts of the article are rather repetitive. An example are the repetitions of “In this paper, … datasets is selected for the study of temporal knowledge graphs” in Section 7.2 and the sentence “In this paper, the diversified Top-k query problem of the temporal knowledge graph is referred to as the diversified Top-k query (DTQ) problem.” in Definition 8.”
- Chinese characters in diagrams: The diagram labels in Fig. 12 and the caption in Fig. 12 f) are Chinese.
- Text style: The font and formatting changes regularly throughout the paper (e.g., in the last paragraph of page 1 and in Section 7.5).
- Citations: References to literature should be formatted properly. Not “Zheng W[5] et al”, but “Zheng et al. [5]”. Also not “Literature [22]”.
- There are several sentences which are hard to read and thus it is hard to understand some core messages of the article. Examples include “A temporal knowledge graph matching Q is called if and only if the following conditions hold. denoted as M_i [t_s,t_e ](t_s

Review #2
Anonymous submitted on 08/Oct/2023
Suggestion:
Reject
Review Comment:

In this paper, the authors have proposed various algorithms for indexing and matching for diversified top-k querying in the temporal knowledge graphs.
The related work mentions current methods, but none of them are used as baselines to compare the work.
One of the major issues is the citations. The work is based on the baseline paper ('Diversified Top-k Querying in Knowledge Graphs') but it is not referred to in the problem statement section. Figure 1 is a replica of the figure from the baseline paper with some minor changes. There are many claims that have not been cited.
The writing is not up to par with the standards of the journal (uneven font, unclear figures, uneven spacing).
Some of the information is in Chinese and cannot be comprehended.
On the technical level, the contributions, such as storage, indexing, and matching algorithms, are not compared to any state-of-the-art methods.
No supplementary material is provided.
In its current state, I would suggest rejecting the paper.