ChronoGrapher: Event-centric Knowledge Graph Construction via Informed Graph Traversal

Tracking #: 3725-4939

Authors: 
Ines Blin
Ilaria Tiddi
Remi van Trijp
Annette ten Teije

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
Event-centric knowledge graphs bring coherence to otherwise fragmented and overwhelming data by establishing causal and temporal connections using relevant data. We address the challenge of automatically constructing event-centric knowledge graphs from generic ones. We present ChronoGrapher, a two-step system to build an event-centric knowledge graph for grand events such as the French Revolution. First, an informed graph traversal retrieves connected sub-events from large, open-domain knowledge graphs. We define event-centric filters to prune the search space and a heuristic ranking to prioritise nodes like events. Second, we combine a rule-based system and information extraction from text to build event-centric knowledge graphs. ChronoGrapher demonstrates adaptability across datasets like DBpedia and Wikidata, outperforming approaches from the literature. To evaluate the utility of these graphs, we conduct a preliminary user study comparing different prompting techniques for event-centric question-answering. Our results demonstrate that prompts enriched with event-centric knowledge graph triples yield more factual answers than those enriched with generic knowledge graph triples or base prompts, achieving groundedness scores of 2.85, 2.24, and 1.11 respectively, while preserving succinctness and relevance.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 05/Nov/2024
Suggestion:
Major Revision
Review Comment:

In their article, the authors present ChronoGrapher – an approach for the automated creation of event-centric knowledge graphs. Since the semantic modelling of events is a challenging task required for the creation of event knowledge graphs, this is highly relevant. The authors show how to extract and model relevant information from generic knowledge graphs, evaluate ChronoGrapher through, amongst others, a RAG-based user study, and have made the code to re-run the experiments available. However, I see several issues with the clarity of the goal and problem statement, the evaluation, and the methodology, as listed in the following.
1. Clarity of goal: The general goal of the KG extraction process, specifically the graph traversal, is not described clearly. For example, step 1 (also referred to as RQ1 or the “first component” in Section 3.2) is variously described in Section 1 as “searching cues in a massive memory”, “event subgraph extraction”, “How to extract relevant events from a generic KG?” and “extracting relevant content from the generic KG”, and in Section 3.2 as “relevant content extraction” and “Informed graph traversal”.
2. Related Work: In the first paragraph, the connection of “(i)”, “(ii)”, and “(iii)” is rather unclear; specifically, for “(ii)”. In “(i)”, a stronger motivation should be given to distinguish between “manual” KG creation and automatic KG creation (also see my comment 7). The connection of the different methods in (ii) to the task of event-centric KG creation should be made more explicit. In (iii), the impression is that RAG would help to understand the implicit knowledge in LLMs, while it rather does the opposite (explicitly input knowledge).
3. Problem statement: Section 3 and maybe also 3.2 could benefit from a more formal problem statement. For example, I struggle with “searches all sub-events”. If it searches for all, why is any filtering needed? And doesn’t the graph also contain non-events such as persons? Even in the case of events, are they all sub-events (e.g., what about preceding events)?
4. Definitions: Several definitions lack motivation (specifically, Definitions 8 and 11), possibly some intuitive examples (Definition 8), and some definitions are unclear: if, according to Definition 5, expanding a node is about finding all its ingoing and outgoing nodes (-> Definitions 3 and 4), why does Definition 6 suddenly turn this into a ranking problem (what do scores even mean in this context)?
5. Filters: The role of the filters should be motivated more clearly. At first, it seems counterintuitive to skip locations and persons since these entities typically are the core elements of an event (I assume this means these nodes are not further expanded but remain relevant parts of the event graph).
6. Extraction from text: RQ2 combines two tasks (semantic modelling and information extraction from text) which seem rather unrelated. In general, the extraction from text plays a very minor role before Fig. 5 and should be better motivated already at the beginning. Two questions: (i) In an event-centric KG, what is the fraction of triples generated during the extraction from text compared to the other triples? (ii) Is this step solely following what was done in [7] or does it include original research?
7. Evaluation: Many parts of the evaluation are performed as a comparison to EventKG. Here, it would be good to better describe how exactly the precision, recall and F1 scores are computed. Also, while it is stated that EventKG was created “manually” (which is confusing since EventKG is also extracted automatically from different generic KGs), it can be assumed not to be perfect/complete, so I would like to see a more in-depth comparison of an event represented in EventKG and with ChronoGrapher. This comparison could, on the one hand, show that ChronoGrapher misses out on relevant information, but, on the other hand, also show that it succeeds in skipping non-relevant information. This should motivate why ChronoGrapher is favoured over (or at least complementary to) EventKG in some settings.
8. Evaluation (RQ3): While I like the setting of Section 4.3 since it actually tests the applicability of the KG in realistic settings, I miss lots of details (specifically, in the first paragraph of Section 4.3): (i) How are the metrics defined? (ii) How are the prompts created; what prompting strategy do they follow? (iii) Are there examples of the 6 types of questions?
9. Evaluation (Data): All examples and all events in Table 6 are about conflicts, which leads to the question how the approach performs on, for example, political and sports events.
10. Data: It would be great to see an actual example KG file in the repository.
11. Methodology: In general, the whole methodology (except for the extraction from text, which is, however, just the application of a pre-trained model) is rather basic and heavily relies on heuristics. While I do not require that AI methods be part of each paper, there is a need for a strong motivation why the selected approach is superior to learning-based methods.

Minor:
- Algorithm 1 needs some reworking. The part with the input parameters is strangely formatted, and some more mathematical/pseudo-code notation would help (e.g., instead of “add node in N”). The comments in curly brackets are confusing and should be formatted differently. “to_extract” is not used.
- Fig. 1: It is unclear if “French Directory” and “French Consulate” are really events. This should, at least, be discussed.
- Notations: Try to use a more standardised notation: For example, “e={e_1, …}” is confusing; a set should be upper-case (“E”). Also, the double-use of “n” for “node” and “number of iterations” is confusing.
- Fig. 3: A proper description of what is going on in Fig. 3 would be helpful. Since Section 3.2 gives for each of the four stages a description (Page 7, lines 25-30), an algorithm description (lines 30-36) and an example sub-image in Fig. 3, maybe restructure it and describe it stage by stage with these three elements?
- Definition 11 comes out of nowhere and should be better introduced/motivated.
- The notation in the evaluation section is sometimes unclear. For example, the parameter “domain_range”, the notion of “who = 1”.

Very minor:
- Abstract: The three groundedness scores in the abstract are a bit too detailed / non-intuitive if the metric is not known to the reader.
- Page 4, line 42f: I don’t understand the “output …is regions” sentence.
- Fig.3: “DB” is mentioned in the caption but nowhere else.
- Page 7, line 17: Sentence “For each pattern…” is unclear.
- Page 7, Lines 25-35: Consistently use “Stage” and don’t switch to “step”.
- Page 7, Line 33: “ ; “
- Page 19, Line 2: “and and”
- Page 19, Line 47: “[are] available on GitHub”
- Table 8: Runtime formatted differently in bold.

Review #2
By Luis-Daniel Ibáñez submitted on 21/Nov/2024
Suggestion:
Minor Revision
Review Comment:

This paper presents an approach to generate event-centric KGs, understood as those constructed with the SEM and NIF ontologies, from generic KGs. The approach comprises two steps: (1) a breadth-first search to extract events from the input KG and (2) a rule-based translation of the extracted events into an event-centric KG.
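As a rough illustration of step (1) only — a minimal sketch under stated assumptions, not the authors' implementation: the `get_neighbours` and `is_event` helpers are hypothetical placeholders for the paper's event-centric filters and heuristic ranking:

```python
from collections import deque

def bfs_events(seed, get_neighbours, is_event, max_hops=2):
    """Collect event nodes reachable from a seed entity within max_hops."""
    visited, events = {seed}, set()
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in get_neighbours(node):  # ingoing and outgoing neighbours
            if neighbour in visited:
                continue
            visited.add(neighbour)
            if is_event(neighbour):
                events.add(neighbour)
            queue.append((neighbour, depth + 1))
    return events
```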

The experiments are extensive, including quantitative and qualitative dimensions.

I think the motivation for Digital Humanities is fair enough, and I wonder how this can be extended to other types of events or to things like the PROV vocabulary (which, IMHO, also yields event-centric graphs).

Things I believe should be improved for publication:

* In your preliminaries you formally define "events", but you don't define "sub-events"

* You mention in the introduction that your objective is to "extract sequences of events as a plausible answer of a question", but a question is never an input to any of your algorithms; you only use it as part of an evaluation of the usefulness of the KGs

* I don't think you answer RQ3 ("how to evaluate the utility of event-centric KGs?"). You propose a single method to do it; to me, answering that question would require comparing different methods of utility evaluation. Following from my two previous remarks, I feel that what you did is a third component where you feed the event-centric KG to an LLM to provide the plausible answer to a question, and you are evaluating whether this adds value over using the plain LLM. I would suggest amending RQ3 accordingly.

* Even if the editor overrules my previous suggestion, section 4.3 lacks detail about the experimental setup (in stark contrast with the quantitative experiment). It is not clear how the prompts were constructed; we are able to reconstruct the base ones by looking at the forms, but we don't know how the context triples were provided. Fortunately we can find out by inspecting the repository, so first I suggest replacing the links to the forms with links to the versions in the repository (which I presume are more stable), and second explaining the rationale of the particular prompt style you used (which seems to amount to appending to the base prompt "you should consider the following context triples" followed by a serialisation of the KG); see the sketch after this list.

* Section 3.3, the second component, should have an algorithm just like the other component. Furthermore, from Figure 5 and Table 2 I can't understand how this part can assign the correct Cause and Effect relationships (or any at all, as this is not something you extract from the events)

* I believe the paper would also benefit from a discussion (which can be embedded in the conclusion) about:
1) The types of questions that you can answer using the ECKG; because of the ontological constraints and the presence of cause-effect properties, I have the impression that this could be quite limited.
2) How an end user, who you seem to imply is not an expert, could set up the filters of your approach, or who does it and based on what. In your experiments, you already know the predicates that relate to the SEM ontology; is identifying these a precondition for applying your method to a different KG?

* On Algorithm 1, you define start and end dates and a to_extract set as input, but they are not used in the algorithm
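Regarding the prompt-style remark a few bullets above, a minimal sketch of how such a prompt could be assembled (the function name, wording, and example triple are illustrative assumptions, not taken from the paper or its repository):

```python
# Hypothetical reconstruction of the prompt style described in the review:
# base prompt + "you should consider the following context triples" + serialised KG.
def build_prompt(question: str, triples: list[tuple[str, str, str]]) -> str:
    context = "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)
    return (
        f"{question}\n"
        "You should consider the following context triples:\n"
        f"{context}"
    )

# Illustrative usage with a made-up triple:
print(build_prompt(
    "What caused the French Revolution?",
    [("French Revolution", "sem:hasBeginTimeStamp", "1789-05-05")],
))
```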

Review #3
By Dylan Van Assche submitted on 14/Mar/2025
Suggestion:
Major Revision
Review Comment:

# Summary

ChronoGrapher creates event-centric KGs from existing KGs like DBpedia or Wikidata. Through these event-centric KGs, applications such as LLMs get better context about the events instead of having to search a huge dataset.

# General comments

I really liked reading this paper, it is very clear, written nicely, and covers all the parts I look for in a paper!

## Originality, quality, importance, and impact
This work is of high quality: it properly introduces the topic, extensively covers the state of the art, formulates research questions, explains the approach, and covers the evaluation.

## Clarity and readability
The paper reads really well, with almost no typos; the authors put a lot of attention into these details, which makes it pleasant to read.

## Provided data and its sustainability
The source code is available on GitHub. Ideally, this would have been linked to a DOI on Zenodo. A release tag and in-depth instructions are available in the README of the GitHub repository. The authors are very subtle about this in the paper, but did a good job there. This could be highlighted more with an additional sentence in the paper, for example: ‘ChronoGrapher and instructions to set up the experiments are available on GitHub under the GPL-v3 license.’

# Detailed comments

## Introduction
The introduction is clear and has several research questions (RQs) listed. The proposed solution is immediately mentioned for each RQ. It would improve the introduction even further if a hypothesis were given for each RQ before proposing how it is addressed in this work, to better understand why certain decisions were made.

On page 3: ‘we include information extraction from text’. This is a little bit vague; could it be highlighted more clearly what kind of information extraction is meant, with some examples, and how it is performed? Maybe it is mentioned in the work, but I definitely missed it then.

## Related Work
‘adapt and improve existing work to harness the potential of event-centric KGs’ -> Further in the section it becomes clear what kind of existing work is meant, with references, but adding some high-level methods here would make this more concrete.

Linked-traversal based methods: I think work around Comunica is missing here, such as ‘Taelman R. et al., Link traversal query processing over decentralized environments with structural assumptions’ or ‘Eschauzier R. et al., How does the link queue evolve during traversal-based query processing?’

Could it be made clearer why EventKG is used as the gold standard?

Typos
- We therefore -> Therefore, we
- To assess the usefulness of event-centric KGs, we lastly present -> Lastly, we present …. to assess … (brings the most important part of the sentence to the front)
- XML date -> XML data?

## ChronoGrapher

The algorithm for the informed graph traversal is discussed here. The text points to the algorithm, but the algorithm is only shown a few pages later. Ideally, it would come much earlier to make it easier to read and understand.

ChronoGrapher RQ2 uses Wikidata and DBpedia data:
- Is ChronoGrapher flexible enough to apply it to other datasets?
- What happens if the datasets are updated? Do you need to reconstruct the event-centric KG from scratch, or can it be updated incrementally with the changes? This seems important when building event-centric KGs of live sports events, for example.

## Evaluation
The experiments were conducted on 2 different machines with different hardware according to the evaluation.

- Why use different machines?
- Which machines were used for which experiments?
- How to compare experiments ran on different machines?

Previously (in the introduction), it was stated that EventKG was constructed automatically, while here it is mentioned that it was constructed manually. Can you clarify this?
Also, how is the EventKG gold standard constructed here?

The evaluation uses HDT in particular, is there a specific reason for using HDT or can you use any format?

Several metrics are mentioned that are computed in the evaluation, such as F1, precision, and recall. Could you clarify why these were chosen and how these metrics address the RQs?
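For reference, if the comparison is the usual set-based overlap between the triples extracted by ChronoGrapher and the EventKG gold standard (an assumption; how matching is actually done is exactly what this comment asks to clarify), the metrics would read:

```latex
P   = \frac{|T_{CG} \cap T_{gold}|}{|T_{CG}|}, \qquad
R   = \frac{|T_{CG} \cap T_{gold}|}{|T_{gold}|}, \qquad
F_1 = \frac{2 \cdot P \cdot R}{P + R}
```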

For RQ3, ChatGPT is used; is there a specific reason why GPT-4 is used and not another LLM? It would be good to discuss in the evaluation section why this LLM was used.

Typos

- 40 CPU -> 40 CPU cores
- A missing introduction paragraph between 4.2 and 4.2.1
- Footnotes 12 and 13 after ‘2 forms’ have a lot of white space. Maybe one footnote with 2 links in the footnote?

# Decision

Major Revision