INK: Knowledge graph representation for efficient and performant rule mining

Tracking #: 3387-4601

Authors: 
Bram Steenwinckel
Filip De Turck
Femke Ongenae

Responsible editor: 
Guest Editors NeSy 2022

Submission type: 
Full Paper
Abstract: 
Semantic rule mining can be used to derive either task-agnostic or task-specific information within a Knowledge Graph (KG). Underlying logical inferences to summarise the KG or fully interpretable binary classifiers predicting future events are common results of such a rule mining process. The current methods to perform task-agnostic or task-specific semantic rule mining operate, however, on completely different KG representations, making them less suitable to perform both tasks or incorporate each other’s optimizations. This also results in the need to master multiple techniques for both exploring and mining rules within KGs, as well as losing time and resources when converting one KG format into another. In this paper, we use INK, a KG representation based on neighbourhood nodes of interest to mine rules for improved decision support. By selecting one or two sets of nodes of interest, the rule miner created on top of the INK representation will either mine task-agnostic or task-specific rules. In both subfields, the INK miner is competitive with the current state-of-the-art semantic rule miners on 14 different benchmark datasets within multiple domains.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 01/Apr/2023
Suggestion:
Minor Revision
Review Comment:

Review:
Thank you for addressing the previous issues with the paper. The revised version is much better written and easier to understand. However, the paper's contribution and significance still need to be clarified.

Writing:
The writing in the revised version is much better than the original version. 

Originality and Significance:
As noted in the previous review, the proposed method is not original (considering you mentioned this in the introduction of the paper already), and the reported results and baseline comparisons need to demonstrate a significant improvement over existing methods. There is still considerable overhead regarding time and memory, and the feasibility of applying the proposed methods to larger datasets still needs to be determined. To improve the paper's originality and significance at this stage, I would suggest you discuss the limitations and future work in more detail. It would be better to describe how the size of the datasets affects the overall performance of the proposed method and to go into more depth on the cases where the baselines perform significantly better than INK.

Overall, the paper can be improved by addressing the issues raised in this review. I look forward to seeing the revised version.

Review #2
By Sahar Vahdati submitted on 17/Apr/2023
Suggestion:
Accept
Review Comment:

The authors addressed the concerns I had and edited the manuscript properly. I believe the community will benefit from this work and suggest its acceptance.

Review #3
Anonymous submitted on 19/May/2023
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (4) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

I would like to thank the authors for the efforts they have made, which have significantly improved the paper overall.

I particularly appreciated the extension of section 3 and section 4, which now helps a lot in making the main contribution of the paper clearer, as well as the more extensive discussion of the experimental results.

Whilst some concerns related to the advance on the state of the art remain, I confirm that the experimental results appear promising and are now accompanied by a much improved version of the paper.

Nevertheless some aspects can still be improved. Details are reported in the following.

- Original Comment: "Additionally, the approaches reported are significantly inspired by the ILP literature but they do not solve task-specific problems even if they can be applied also for that. As such, also the consideration concerning the second category of approaches introduced in section 1 results not very precise." --> Reply and action from the authors: The authors agree that the techniques suggested by Reviewer 3 are influenced by the ILP domain and can be used to solve some task-specific problems. We added them to a separate subsection at the end of our related work. "The advancements within the ILP domain also resulted in task-agnostic techniques that use the available schema information within the knowledge graph to mine generic rules [21]. They can even be used for scheme completion or find faults within this schema level [22]. Those techniques are not optimized to solve task-specific problems, but can be applied for this when limiting, e.g., the search space to a specific predicate" --> The claim on page 2, 1st column, rows 37-51 is still too strong. It should be changed in agreement with the modification reported in the related work section.

- page 2, 2nd column: "The combination of these novel and adapted techniques into one framework, combined with the explainable capabilities of the INK representation led to new task-specific and task-agnostic rule mining capabilities without the need to change the KG internal representation." --> Whilst the representation is now clearly provided, this sentence is still a bit vague.

- Beginning of section 2: "ILP techniques deduce logical rules from ground facts and require negative statements as counter-examples. Both techniques were already applied in the context" --> This is not fully correct: AMIE, as well as other ILP-inspired solutions already suggested in the first review round, do not need counter-examples. This is partially addressed in section 2.3, but it needs to be made clear earlier; otherwise the overall message gets confusing.

- The methods listed in sect. 2.2 actually perform concept learning. If you would like to keep this section because these methods can be viewed as a way of learning hypotheses, you need to make clear that they do not actually learn rules as defined in section 2.1.

- Section 6.2: The argument for why having non-closed rules would add value remains vague.

- I find the reply to comment 11 a bit vague and not really related to the comment provided in the first round of review.

MINOR:

- page 2, 2nd column, row 31: "from within the" --> "from the"

- page 8, 2nd column, rows 41-42: "Frequent itemsets defined by the patterns defined in (11) and (12) might be harder to calculate" --> "defined" is repeated