Review Comment:
This article can be characterised as a survey of research on knowledge representation for fault diagnosis in the context of cyber-physical systems. The authors perform a systematic literature review, describing the methodology in detail with helpful visualisations of the process and of how the collection of papers under review was obtained. Five research questions are identified, which the article intends to answer based on the reviewed literature. These questions pertain to knowledge acquisition, knowledge extraction, knowledge representation, fault diagnosis, and knowledge enhancement; several refer specifically to industry applications. Acquisition and extraction are contrasted by characterising the former as expert-sourced and the latter as document-sourced. Criteria for inclusion and exclusion of papers are explicitly detailed, but the process of analysing the individual papers is not. Some interesting statistics are provided based on the selected papers, although I miss an analysis of the sources of the selected papers, since no explicit source-based filtering was performed (language constraints of course implicitly remove non-English sources). The analysis is performed for the five research question areas in sequence, yielding paper summaries and categorisations of varying quality. This is followed by a discussion which provides a summary of summaries, forming the basis for some conclusions regarding the state of the research field and recommendations for future work. The research questions are never answered explicitly. The conclusions summarise the authors' work and restate the future-work recommendations from earlier.
Major comments
--------------
The article is not ready for publication and needs at least another iteration to repair its shortcomings. I have three main criticisms, which I will go through now, followed by minor comments to help improve the quality of the writing.
1) Deficiencies in the paper selection methodology
As mentioned earlier, the reported methodology for this systematic literature review provides a detailed account of the choices made when selecting papers, often supported by helpful figures. My issue is therefore not with the reporting but with the justification of those choices and an analysis of their consequences, both of which are lacking. The initial paper search was performed by applying the search string visualised in Figure 3 to Scopus. I believe the selection process is too narrow in some areas and too broad in others.
According to the authors, 'NLP' is included in the search string because "[i]n some papers, NLP serves as a substitute for keywords related to information extraction" (p. 6, l. 25). Here an explicit exception is made based on the authors' knowledge of the field. This is not a bad thing, but why is this exception not also applied to tailor the search string to capture more reasoning methods? After all, one of the article's main takeaways is that "[l]ogical, explainable, and probabilistic reasoning methods over KGs would benefit the fault diagnosis research community" (p. 33, l. 48--49), and this takeaway appears to be based on a lack of such reasoning methods being reported in the selected papers. Had a term like 'Description Logic Reasoning' been added, the takeaways might have been very different. Currently, the term 'Description Logic' does not occur anywhere in the article, which is strange because description logics are the formal basis for much of ontology-based reasoning (a toy example follows at the end of this point). A Semantic Web Journal article cannot omit at least a mention of description logics when discussing reasoning methods for ontologies and knowledge graphs, let alone when recommending the development of those methods as future work.
The source of this issue appears to lie at least in part in the handling of related work. A literature review should indeed focus on the papers identified and selected using the selection methodology. However, this article at times disregards research that is fundamental to the field but was not selected, for example because it did not explicitly discuss fault diagnosis or one of the other keywords listed under Goals in Figure 3. Being too narrow and excluding such related work can give a skewed view of the state of the art.
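To make the description logic point concrete, the kind of inference at stake fits in a few axioms; the vocabulary below is my own toy example, not drawn from the article:

    % Toy TBox: a fault taxonomy axiom and a subsumption axiom.
    \[ \mathit{OverheatedBearing} \sqsubseteq \mathit{Fault} \]
    \[ \mathit{Pump} \sqcap \exists \mathit{hasFault}.\mathit{Fault} \sqsubseteq \mathit{FaultyComponent} \]
    % Toy ABox: observed facts about pump p1 and bearing b1.
    \[ \mathit{Pump}(p_1) \qquad \mathit{hasFault}(p_1, b_1) \qquad \mathit{OverheatedBearing}(b_1) \]

A standard DL reasoner derives \( \mathit{FaultyComponent}(p_1) \) from these axioms. This subsumption-based inference is precisely the 'logical, explainable' reasoning over KGs that the authors recommend as future work, so a search term covering it could plausibly have changed the selected set and the takeaways.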
Conversely, I believe the selection criteria may be too broad in that they do not consider the sources of the selected papers. The only criterion seems to be that the paper can be found through Scopus. No distinction is made between flagship conferences/journals and regional conferences or predatory journals; as long as the paper is written in English, it may be included. This may be too permissive. Initiatives like ICORE can help classify conferences. Even if the authors choose not to exclude any conferences or journals, there should at least be an analysis showing the distribution of the selected papers across venues, and a justification that these are representative of the sought industry perspective.
2) Lack of an overarching methodology for reviewing individual papers
There does not appear to be an overarching, structured approach to reviewing the individual papers that were selected. The introduction does indicate that the intention was to provide categorisations and in-depth analyses of the methods covered, and that the reporting would be done in terms of advantages/disadvantages, with figures and tables to facilitate comparison. The goal is for readers to be able to use these results to guide the choice of method in their own research. The resulting summaries, categorisations, figures, and tables should then ideally follow a similar format, but they do not.
The summaries of the analysed papers do not follow a predetermined pattern. In some cases the summaries provide sufficient detail to understand what the paper was about, but in others there is a clear lack of context or even authorship information (e.g., 'Paper [n]'), making it impossible to follow the summary without having read the paper first. For example, the KG model paragraph (p. 20, l. 13--31) suddenly refers to some 'shop floor' without providing context. Sometimes the summaries are too vague; for example, the "Application of the ontology" paragraph (p. 19, l. 14--26) states that one paper constructed an ontology to serve as a corpus, which does not say much. The categorisations are nicely visualised in the figures, albeit in different styles, but the categories are not necessarily used or contrasted. The tables have different structures, and it is not clear how one would use them when choosing a method. Not all approaches are analysed in terms of advantages and disadvantages (or only advantages are stated, as for Bayesian Networks in Table 12), and those that are do not provide the contrasting information needed to facilitate such a choice. Some tables focus on counting papers (even though the counts are sometimes wrong, as in Table 8), which is interesting but offers an overarching analysis rather than information that could support a choice.
3) Inaccuracies in summaries and conclusions
The article tends to focus exclusively on the papers that were selected, which leads to strange conclusions when extrapolating to the field as a whole; I already mentioned the issue with description logic reasoning, for example. Inaccurate or unfounded conclusions can be found in several places. For example, when discussing querying under semantic-based methods for diagnosis, the authors reason as follows: "Notably, none of the reviewed studies explicitly discussed reasoning mechanisms applied during the query process, such as query rewriting or logical inference. Therefore, it can be concluded that this approach primarily serves as an information retrieval technique rather than a reasoning-based method." (p. 23, l. 48--51). This is not necessarily true. SPARQL is one of the query languages listed. While SPARQL itself does not perform description logic inference, it is not uncommon for SPARQL queries to be executed after a materialisation step, during which inferred RDF triples are made explicit.
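A minimal sketch of what I mean, using rdflib and owlrl (my own illustration; the vocabulary is invented and not taken from any reviewed paper):

    # RDFS materialisation before a plain SPARQL query: the query itself
    # performs no inference, yet it retrieves the inferred triple.
    from rdflib import Graph, Namespace, RDF, RDFS
    import owlrl

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.BearingFault, RDFS.subClassOf, EX.Fault))  # schema axiom
    g.add((EX.f1, RDF.type, EX.BearingFault))            # observed fault

    # Materialisation step: make RDFS-entailed triples explicit,
    # here (EX.f1, RDF.type, EX.Fault).
    owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)

    # Plain pattern matching now also returns the inferred instance.
    q = "SELECT ?x WHERE { ?x a <http://example.org/Fault> }"
    print([str(row.x) for row in g.query(q)])  # includes ex:f1

The presence of plain SPARQL in a pipeline therefore does not by itself justify the conclusion that no reasoning took place.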
On several occasions I have been surprised by a paper summary and/or its conclusions because they do not make sense, possibly because the authors misunderstood the papers they were summarising. This misreporting unfortunately lowers my confidence in all of the summaries. While I am by no means a Machine Learning expert, Section 4.2 (Information Extraction) seems to be particularly impacted in this regard and should be rewritten; some examples are as follows:
"While there are no quantitative results regarding the effectiveness of each method, the following qualitative findings have been noted. (...) Azari et al. [2] mentioned that Apriori algorithm (applied in paper [73]) is easy to implement but requires large resources and a long time because it must scan the dataset repeatedly."
=> Apriori is a well-known data-mining algorithm whose time and space complexity are known.
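To illustrate (a toy sketch of my own, not code from any reviewed paper): the level-wise loop below makes one full pass over the transactions per itemset size, which is exactly the repeated-scan behaviour reported above as a qualitative finding.

    # Toy Apriori: one full scan of the dataset per itemset size k.
    def apriori(transactions, min_support):
        frequent = list({frozenset([i]) for t in transactions for i in t})
        k, result = 1, []
        while frequent:
            # Full pass over all transactions at every level k.
            counts = {c: sum(c <= t for t in transactions) for c in frequent}
            frequent = [c for c, n in counts.items() if n >= min_support]
            result += frequent
            k += 1
            # Simplified join step: size-k candidates from level-(k-1) sets.
            frequent = list({a | b for a in frequent for b in frequent
                             if len(a | b) == k})
        return result

    txs = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
    print(apriori(txs, min_support=2))

The cost of these repeated scans is a textbook property of the algorithm and does not need to be sourced as a qualitative observation from one of the selected papers.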
"The use of BERT [31] or ALBERT [30] with/without BiLSTM or CRF is common in the literature for entity-relation extraction tasks [56, 77--79]."
=> I believe ALBERT is a BERT variant. Neither BiLSTM nor CRF is defined. These techniques are not common 'in the literature' but rather in the authors' selected papers; one cannot extrapolate to the (entity-relation extraction) literature in general when the papers were narrowly selected.
"Techniques such as attention mechanisms [19] and reinforcement learning [11] can help address challenges such as ambiguity and noisy input respectively."
=> This is extremely vague. The reference for the attention mechanism does not make much sense; presumably this is a BiLSTM with a transformer architecture for fault detection over time? I did not have time to check. The references also imply that [19] is the primary source for attention mechanisms and [11] for reinforcement learning, which they obviously are not.
"Gong et al. [27] reported that the BERT joint extraction model achieves a better F-measure compared to the BERT+BiLSTM+CRF model for relation extraction."
=> This does not, however, hold in the general case, as is implied here. The context is important to report as well.
"Techniques leveraging BERT, such as fine-tuned BERT [42], pre-trained BERT [68, 76], BERT-BiLSTM-CRF [14, 57], BERT-Decoder-CRF [66] yielded good precision and F-measure. However, reliance on labeled datasets raises challenges."
=> Also extremely vague, but my main issue is with the reporting on the models. I could be wrong, but the authors seem to have listed the terms as they occur in the papers without necessarily understanding what they mean. For example, to the best of my knowledge, 'fine-tuned BERT' is not really a single thing, as models are commonly fine-tuned before scoring; and 'pre-trained BERT' is simply BERT, since BERT is a pre-trained model.
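To illustrate the terminology issue (my own sketch using the Hugging Face transformers API; the checkpoint and label count are arbitrary):

    # "Pre-trained BERT" is simply BERT loaded from its released
    # checkpoint; "fine-tuned BERT" is the same model after ordinary
    # supervised training on the task data: a step, not a distinct model.
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-uncased", num_labels=5)  # five entity tags, illustrative

    # Fine-tuning would follow here, e.g. with transformers.Trainer on a
    # labelled entity/relation dataset; the models in the quoted list are
    # all BERT checkpoints plus such task-specific training and/or extra
    # layers (BiLSTM, CRF, decoder).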
"Entity-relation extraction using a combination of models such as SP-LEAR-BERT-MRC [58] outperform baseline models."
=> Context matters; this is not true for all datasets and depends on the baseline models. This sweeping statement is therefore inaccurate.
In conclusion, I do not think that the article provides a "comparative analysis [that] empowers researchers to make informed decisions about which approach to adopt in future studies" (p. 3, l. 38--40), which was listed as a key contribution. Its summaries and contrasting analyses are insufficient to play such a role. It is also not entirely clear to me why the research questions focus on industry applications if the purpose of the review is to assist future research. The research questions are not answered explicitly either. I do think the presentation is decent, especially the reporting on the methodology used, even if I disagree with some of those decisions.
When it comes to importance to the broader Semantic Web community, I am uncertain. The term Semantic Web occurs only once (in the abstract: "Our findings aim to guide researchers and practitioners in leveraging semantic web technologies for more robust and explainable fault diagnosis in CPS.") and is missing from the search string, yet the review does cover Semantic Web approaches without explicitly stating so. This is part of a recurring problem: if a term or technique does not occur in the selected papers, it does not seem to exist for the purposes of the article. Perhaps the search string is the issue. Either way, the article is first and foremost a Fault Diagnosis paper, from which some Machine Learning and Semantic Web techniques flow. However, I believe this is an issue of presentation that could be resolved, although only through a major rework.
Additional comments
-------------------
- Excessively flowery language that maximises the use of positive adjectives feels artificial, and it sometimes assigns value or importance that should be left to the reader to decide. For example, the authors write: "Moreover, this study provides a *thorough* discussion and *clearly* shows open issues that have not been sufficiently studied and also offers *comprehensible* results by providing *clear* insights into the issues. These *distinctive* features make this review paper a *valuable* resource, providing insight and a foundation for future research." (p. 3, l. 43--47, emphasis added), whereas this reviewer disagrees.
- Please write out counting numbers ten or below, i.e., 'three times', not '3 times'.
- All acronyms should be explained when they first occur.
- Faults refer to machine breakdowns (p. 5, l. 28), but this could be made more explicit since it is a restriction.
- The Goals segment of Figure 3 contains a special case, "prognostics" AND "fault", but it is not explained why.
- The quality screening in the review execution (Section 3.2) is not explained well. There is a mention of Chinese text, but I assume the authors more specifically exclude all non-English papers.
- The definition of CPS is unclear but important for paper selection.
- Forward snowballing (Figure 6) selects papers with at least two references; are these allowed to be self-citations?
- Table 3: The categories do not match the ones in the text.
- Figure 7: Arrow to Output missing.
- Figure 9: Arrow to Bayesian network missing.
- Section 4.4.3 describes statistical methods as being distinct from other approaches because they "rely on mathematical calculations to determine the similarity between the new problem and past problems stored in the knowledge base." This is a questionable definition at best.
- The reference listing needs to be tidied up.
Minor comments
--------------
- P. 3, l. 48: knowledge. [51]. => knowledge [51].
- P. 3, l. 26: Systematic Literature Review (SLR) approach => Systematic Literature Review (SLR) [35] approach
- P. 3, l. 44: comprehensible results => results
- P. 4, l. 15: valuable insights => insights
- Table 1 caption: need to be => are
- Table 1 RQ4: How faults can be => How can faults be
- P. 6, l. 46--47: While we (...) service engineer. => [Incomplete sentence]
- P. 10, l. 27: using Ontomap mapping language => using the Ontomap mapping language
- P. 11, l. 18: [65] Wang et al. [65] => Wang et al. [65]
- P. 11, l. 26: [73] and etc => [73], etc
- Table 4: doesn't => does not (x2)
- Table 6: Airbaus => Airbus
- P. 16, l. 42: that serves as the foundation for fault diagnosis => [Remove]
- P. 17, l. 31: Ontology => Ontologies
- P. 19, l. 14: shows that how => shows how
- P. 21, l. 1--3: Reviewing the papers (...) in table 10. => [Remove; duplicate]
- Section 4.3.4 header text: Advanced Representations => [Remove]
- P. 24, l. 34: Tieming et al. [59] established fuzzy sets => Tieming et al. [59] used fuzzy sets
- P. 25, l. 9--11: The proposed reasoning (...) handling uncertainty. => [Rephrase]
- P. 26, l. 2: Machine Learning (ML) or Deep Learning (DL) => Machine Learning (ML) and Deep Learning (DL)
- P. 31, l. 40: The Naqvi et al. => Naqvi et al.