Review Comment:
The manuscript “A Systematic Literature Review on RDF Triple Generation from Natural Language Text” presents a relevant subject for the Semantic Web community: RDF data generation from text. Below the authors can find some comments.
(1) Suitability as an introductory text, targeted at researchers, PhD students, or practitioners to get started on the covered topic.
Across the first two sections (1-Introduction and 2-The Triple Generation Problem), the authors introduce the reader to the topic and problem of RDF data generation from text sources and highlight the relevance of its treatment.
The problem formulation and the typical pipeline for triple generation and KG enhancement based on NL texts provide the appropriate context to understand the topic, and through the "Motivating Example" (section 2.2) the reader can clearly understand the complexity and challenges of the task.
------------------------------
------------------------------
(2) How comprehensive and how balanced is the presentation and coverage?
The review covers a broad and representative range of existing approaches for generating RDF triples from text. The methodology used for the study is based on the work of Budgen and Brereton, a well-known guide to conducting literature reviews.
The search was performed on the most popular databases in the Computer Science area, and the authors clearly explain the search strategy and the inclusion and exclusion criteria used to select the articles relevant to the survey.
The analysis of the literature was carried out in two phases. In the first phase, bar charts summarize certain characteristics of the 150 selected papers, such as articles by year, category, domain, database, and type of evaluation, among others. In the second phase, a more in-depth qualitative analysis is carried out on the 15 articles most closely related to the purpose of the review.
Regarding the methodology and the results obtained, several aspects need to be reviewed/improved.
- Table 3: Expression 1 explains which terms were chosen to carry out the search; therefore, what is the need to detail all combinations of terms in Table 3? This could instead be explained in the text.
- Figures 3-7: It is suggested to use more appropriate colors for the figures, such as neutral colors.
- On page 12 (Reporting phase), it is mentioned that the paper describes "correlations" in the literature. No evidence of this has been found in the paper. The only relationship between categories is the one shown in Figure 8 (Venn diagram). However, the intersections among the categories of the 15 papers analyzed cannot be considered correlations. Regarding Fig. 8, it might be more useful to use a bubble diagram to show how many papers fit into each category or share categories. Another, more appropriate option might be a table describing the 15 papers and the value each has in the categories analyzed (language specificity, ontology and KG enhancement, domain, technical methodology, etc.).
- Figure 2 presents the 15 steps of the methodology that the authors have followed. Section 3 explains how each step was carried out; however, the results of the review (Reporting Phase) are described in sections 4 to 6 and their names do not correspond to the steps indicated in that figure. That is, section 4 should be called Statistical Analysis, section 5 should be Research Questions Analysis, and section 6 should be called Open Challenges. In this way, each of the steps of the methodology would be followed (described in Fig. 2).
- In the description of step 10 of the methodology (page 11), the authors refer to Figure 7. Should Table 7 be referenced instead? Regarding Table 7, it lists metadata that are not later used to summarize the characteristics of the articles, for example, "Country -> nationality of the authors" and Methodology.
------------------------------
------------------------------
(3) Readability and clarity of the presentation.
Unfortunately, although the manuscript addresses a relevant topic, the organization and presentation of the results should be significantly improved to maximize their impact and clarity. Some points related to the methodology and results have already been raised in the previous section; additional points follow:
- Section 5 describes each of the 15 articles in separate subsections, but this does not give the reader a broader or complete view of the advantages and disadvantages of each technical method used to generate RDF graphs from text sources.
- Section 6 briefly answers the research questions, but in several cases it does not provide specific information that could be important for identifying the most valuable works according to some criteria. For example, in RQ-05 it is stated that "we observed that a portion of the studies in our survey are not currently employed in real-world applications." By "a portion", do you mean 20, 50, or 90% of the papers analyzed? Could you add citations to identify them?
In general, the language used in the manuscript is appropriate, but some points require revision. For example: <“Reporting Phase.”.>
------------------------------
------------------------------
(4) Importance of the covered material to the broader Semantic Web community.
Through sections 5 and 6, the authors identify the limitations of the analyzed proposals, highlight relevant gaps in the literature, propose future directions that can guide the development of the field, and identify the research challenges. In addition, the authors could expand the discussion, referring to the practical applicability (implementation) of the methods in real environments and the concrete applications in which KGs created from text could be useful, thus reinforcing the importance of this process.