A systematic literature review and classification of approaches for keyword search over graph-shaped data

Tracking #: 3505-4719

Authors: 
Leila Feddoul
Frank Löffler
Sirko Schindler

Responsible editor: 
Mehwish Alam

Submission type: 
Survey Article
Abstract: 
Knowledge graphs provide machine-interpretable data that allow automatic data understanding and deduction of new facts. However, machines are not the only consumers of such semantic data. Human users could also benefit from graph-structured data by browsing and exploring it to detect interesting associations and draw conclusions. To achieve that, methods that allow for search over knowledge graphs are highly sought after. Keyword search is an intuitive and common way to retrieve relevant data (e.g., documents) and can also be leveraged to search over knowledge graphs. In this survey paper, we derive the typical architecture of a system for keyword search over graph-shaped data, we formally define the problem, we highlight related challenges, and we compare to existing relevant surveys to identify the gaps. We conduct a comprehensive review of studies dealing with the topic of keyword search over graph-shaped data (e.g., knowledge graphs) following a systematic method. Based on that, we derive and define different aspects for classifying existing works. We also give an overview about how those systems are evaluated and highlight possible future research directions.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Stefano De Giorgis submitted on 01/Sep/2023
Suggestion:
Minor Revision
Review Comment:

The paper presents a survey on the keyword search over graph-shaped data, and in particular knowledge graphs.
The paper is well written and well structured. The topic is presented in a clear way, and previous works are described in detail following a meaningful structure, focusing on methodologies and evaluation criteria.
In the following, I provide a general review according to the Semantic Web Journal "survey paper" criteria, followed by a detailed review addressing some minor but specific issues.

---- General Review ----

(1) Suitability as introductory text:
The paper is well written, and, to the best of my knowledge, it covers the topic in a satisfying way, referring to previous relevant work and describing them properly.

(2) Comprehensiveness and balancedness in the presentation and coverage:
To the best of my knowledge, relevant previous works (including already existing surveys) are mentioned, and it is clearly stated what is added and why a new survey is necessary. Furthermore, a specific section addresses the criteria followed in the inclusion-exclusion decision.

(3) Readability and clarity of the presentation:
The structure of the paper is clear, although some minor refinements could improve the overall impact; more details are given in the "Detailed Review" section.
The references already include all the DOIs and they seem properly formatted.

(4) Importance of the covered material to the broader Semantic Web community:
Graph investigation and querying methodologies are clearly in scope for the Semantic Web Journal.

---- Detailed Review ----

Page 3, Section 2: The Preliminaries section includes formal definitions such as "Graph model", "Keyword query", etc., but according to point 1 of the Semantic Web Journal "survey paper" criteria, I think it would benefit a lot from some practical examples. Including examples, in particular of keyword queries and query answers, could clarify doubts for newcomers to the domain.

Page 4, line 6-11: this list is not necessary. Since this is a "Survey paper", there is no need to declare what the good general practices for conducting survey research are.

Page 5, line 20: "already read before..." --> I would not say this, at least not in this way. You can't include some work because "you have already read it"; it has to be relevant for the domain or for other works. Therefore, I would delete this sentence and state that they were relevant and explicitly mentioned in subsequent works that built on them.

Page 6, line 24: "Profile-based": I don't understand this point, could you rephrase it or elaborate on it, please?

Section 5, starting at page 7, line 22: for the whole section you use "they" for "the authors". It is used often and several times in the same paragraphs as "they state...", "they present...", etc. I think the work would benefit from substituting all these "they" with periphrases such as "the authors state that...", "the work presents a...", "the paper includes...", etc. This has to be done for the whole section, since, for example, on page 9, "They" is used at lines 4, 6, 7, 9, 11, 12, and 13. In my opinion it is worth some polishing and rephrasing to improve its readability.

Page 14, line 36: "a more relaxed method" --> "more relaxed" than which one?

Page 15, line 36: "...the baseline using all metrics." --> "...for all metrics" or "considering all metrics"?

Page 16, line 40: "...within an available ontology." --> I think that with "ontology" here you mean the ontological data schema, but I would be careful in using this term. There are still open debates in the semantic web community about what is a "knowledge graph" and what is an "ontology" and if the difference is only in the presence or absence of a schema. If this is the meaning in which it is used, my advice is to state it clearly in Section "Preliminaries", or at least in the same place where it is used, or as a footnote, to prevent any (malicious or sincere) misunderstanding.

Page 19, line 37: the footnote should go next to "Jamendo", since it is related to this element of the sentence.

Page 21, line 6: if you set a standard for which all the paragraphs start with the work cited as "[number]", please be consistent and use it for all the entries.

Page 21, line 40: "(sorted...)" remove the parentheses, they are unnecessary.

Page 21, line 44: "ontology" --> see comment before about "ontology" vs "knowledge graph" debate

Page 23, line 49: "the number of queries is larger when they..." --> maybe a couple of words could be spent commenting on this datum, e.g. "it is reasonable since text generation (by domain experts or generic users) is a costly activity...", etc.

Page 24, line 4: Move the whole "Metrics" paragraph to Section 4, these measures should be introduced properly before using them.

Review #2
Anonymous submitted on 08/Sep/2023
Suggestion:
Major Revision
Review Comment:

This submission is a survey paper that discusses the research on keyword based search over graph-shaped data including knowledge graphs. The authors describe and discuss the main components of keyword search over graphs and then proceed to present a taxonomy-based classification and elaborate review of the existing literature in this field of research.
The paper is quite well-written and well-reviewed overall, with no grammatical or spelling mistakes; as such, it is a nice read. However, it becomes a bit tedious in Section 5, which gives long and detailed reviews of all existing works in different categories.

Pros

- Well-written paper
- The proposal of a suitable taxonomy for the classification of existing works.
- A useful and comprehensive review of 35 existing works on keyword based graph search.

Cons

- Needs restructuring of Section 5 with the review of existing work to improve readability.

Detailed comments :

- Section 1 and 2 provide a nice overview of the topic to a new reader. The comparison and advantages over existing surveys are a great inclusion.

- Section 3: Survey methodology seems a bit redundant. Unless there is something very specific here that was done differently from other survey papers, this section could just be a paragraph. This is just a suggestion, not criticism.

- Section 4: Taxonomy is well-defined, though it is not made clear to which cases Answer ranking and Answer scoring apply. If these criteria do not apply to all Answer Types, would there be a way to indicate this in the taxonomy and its visualization?

- Section 5: Here is the major and the problematic section of the paper. While it is appreciated that the authors have provided the reader with detailed and structured overviews of all the 35 works that have been considered, this section needs to be reduced in length (shorter texts for the overviews), or the works could be further partitioned into subcategories in order to improve readability.

- Furthermore, while discussing the different papers within a subsection, it is advisable to have a connection or transition between the papers. This is present in many places, such as page 8, but is missing in many others, such as page 9, where there are abrupt jumps between the papers and they feel disjointed. The subsections also need introductory sentences before jumping into the discussion of the works under them. Overall, this section could benefit greatly from a restructuring.

- Section 6: Summary starts with good insights; however, the discussion on page 22 about the count of the different papers with different features seems to be without any takeaway or useful observation.

- Section 7: Conclusion is well-written with interesting insights. Relevant citations should be added to subsections 7.5 (line 27) and 7.6 (line 35) to make the claims stronger.

Review #3
Anonymous submitted on 30/Sep/2024
Suggestion:
Major Revision
Review Comment:

In this paper, the authors propose a survey of the field of keyword search over graph-shaped data. The need for such a survey is clearly motivated by the four existing surveys mentioned, which are either outdated or focus only on specific aspects of keyword search. Overall, the survey is precise in describing related works and also offers convenient summary tables. The authors also identify possible future research directions that are sound w.r.t. the works described. Such a survey is also quite important given the rising interest in RAG and GraphRAG, which could leverage the techniques presented here.

However, I have the following concerns:

- Structure:
>> In my opinion, the structure of this survey could be improved. For example, works are detailed before giving a summary, and also before introducing the evaluation metrics and datasets. I would advise providing the evaluation metrics after the taxonomy overview, then a global summary, and then the detailed description of related works. Otherwise, subsequent sections are often referred to from Section 5.

>> The paragraph positioning this survey w.r.t. other existing surveys in the introduction could be placed and detailed in Section 3 while retaining only a short summary in the introduction. Otherwise, Section 3 often references Section 1.

- Content:
>> The survey focuses on graph-shaped data, but some works related to relational databases are mentioned. These are relevant, but I think additional explanations are needed to better position them w.r.t. graph-based data. A major piece of information that is barely visible is the type of graph considered by each approach. This comes only in Section 6 and is little discussed.

>> Section 4: taxonomy - I think this section in particular could be enhanced by examples, figures, and additional details to make the survey helpful for newcomers to the field. As it is now, I think it requires additional reading and background knowledge to get a clear understanding of the field, which somewhat hinders the impact of the work.
For example, you could illustrate the graph, different types of tree, structure query, and some other types by giving examples. I also wonder about the link and difference between answer ranking and answer scoring, where ranking does not seem to rely on the scores provided.

>> I would also recommend that the authors discuss how keyword search over graph-shaped data could be relevant for RAG and GraphRAG, as they already mention LLMs.

- Minor remarks

>> p2: "This technique could also be adapted to work over KGs" - I found the tense strange as this survey describes such an adaptation

>> p4: "Since the identification of the identification of the need for the review [...] in more detail in the following:" - this sentence is quite long and difficult to understand.

>> I wonder if you could give additional statistics about the most common reason(s) for excluding papers, which allowed you to move from 206 to 35 papers to review.

>> footnote 7: "This approaches aim to diversity" -> diversify