Scalable Knowledge Representation for Fault Diagnosis of Cyber Physical Systems: a Systematic Literature Review

Tracking #: 3831-5045

Authors: 
Ameneh Naghdipour
Benno Kruit
Jieying Chen
Stefan Schlobach

Responsible editor: 
Eva Blomqvist

Submission type: 
Survey Article

Abstract: 
Fault diagnosis in Cyber-Physical Systems (CPS) is essential for minimizing downtime, ensuring operational safety, and improving system resilience. As CPSs become increasingly interconnected and complex, traditional diagnostic methods struggle to capture their dynamic interactions and dependencies. Semantic technologies, including knowledge graphs and ontologies, offer a powerful solution by enabling structured representation, integration, and reasoning over diverse sources of diagnostic knowledge. This paper provides a comprehensive review of semantic approaches for fault diagnosis in CPS through a Systematic Literature Review (SLR). It covers key stages such as knowledge acquisition from domain experts, knowledge extraction from documents, semantic modeling of domain knowledge and data, and model enhancement. To the best of our knowledge, no prior systematic literature review has covered all these critical aspects. Unlike previous reviews, we systematically analyze and categorize the findings related to each stage. Additionally, we explore the role of available manufacturing data sources and their integration with semantic models. By bridging the gap between fault diagnosis and semantic technologies, this work highlights the potential of semantic representations to enhance interpretability, interoperability, and automation in CPS fault detection. We further discuss open challenges and outline future research directions, emphasizing the role of semantic frameworks in advancing intelligent fault diagnosis. Our findings aim to guide researchers and practitioners in leveraging semantic web technologies for more robust and explainable fault diagnosis in CPS.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 06/Jan/2026
Suggestion:
Major Revision
Review Comment:

This paper presents a Systematic Literature Review (SLR) covering papers published from 1995 to 2024 in the area of knowledge representation for fault diagnosis of CPS. Overall, the paper is well-written, easy to follow and interesting. My main concerns are detailed in the comments below. Addressing them would strengthen the paper and prepare it for publication.

Introduction: The section motivates the work and provides some background. I wonder if sub-sectioning it as the authors have done is actually useful for the reader. It might be better if the text were more integrated and less fragmented. Currently, it is formatted as a report rather than a scientific paper.
The research approach and key steps should be better highlighted (e.g. use of bullet points for each step, enumerate, bold font). Similarly, the concrete contributions should be noted. Who is this survey benefiting? How can it be used to aid others' research/work?

Related Work: The section is sufficient since the paper is a survey itself, and later on discusses other papers.

Methodology: The selected SLR methodology is justified. The authors could provide more information on what motivated each research question. Did the related work section help to form them, or are they driven by a project, a challenge, etc.? It would have been interesting to see some statistics about which combination of keywords resulted in discovering the most relevant papers. Are all keywords from Fig. 3 input simultaneously in Scopus, row by row, or following a specific combination? How can one reproduce the search?
There should be clarification at the beginning of the paper on what CPS domain the authors cover, since on line 47, the medical domain is excluded. For example, is the main focus manufacturing?
A lot of survey papers cover more than one scientific database. The justification is "Scopus was chosen because of its extensive collection of peer-reviewed journals and strong reputation as a comprehensive scientific research database." While this is true, it raises the question of possible papers/work that the authors might have missed if it had not been indexed there. Further to this, it should be clarified what type of work was surveyed - only scientific papers or also reports and whitepapers?
Clarify if the papers were manually analysed or if software was used to aid this task.
The difference between failure prediction, fault diagnosis and predictive maintenance should be discussed earlier on so the scope of the survey is much clearer.

Results and Analysis: The section is quite informative and interesting. It would be helpful if Figure 7 and other such figures (e.g., Figure 10) also included references as examples in each box.

Each table is also described, and conclusions are presented. However, at times these conclusions seem to only be repeating the content from the table and lack deeper analysis. For example, why "The use of BERT [31] or ALBERT [30] with/without BiLSTM or CRF is common in the literature"? Such statements should be better elaborated.

The structure of the tables is somewhat logical. However, I wonder whether they could be more structured, with columns specifying criteria that can be seen as pros or cons. I leave this to the authors to decide.

p. 17, lines 42-43: "Ontology has been widely used in the domain of fault diagnosis. Several key reasons justify the widespread use of this:" - Provide several references to such work as examples.

p. 19, line 20: "9 studies" - reference studies here so the reader can directly make a mental connection. Similarly, for the rest of this paragraph.

p. 19, lines 48-49: Missing references to reinforce this statement.

Discussion: This section summarises and discusses findings, but is missing references to concrete work that motivated the statements. The content across all sections should be more balanced. Currently, Section 5.4 is significantly longer and more detailed than the rest of the sections.

p. 30: "Future work should explore innovative techniques for capturing expert knowledge for fault diagnosis." Such as?

Conclusions: The limitations of the work should be discussed. This section can also reiterate who this survey is aimed at and how it will help them.

There is no mention of generative AI and how it has affected the field. This becomes relevant as the authors aim to cover over 20 years of research work in SLR format.

Minor comments:

The reference style should be consistent throughout the paper. At times, the authors do not even name the authors of the papers they cite, so the referencing style should be checked.

Missing definition of most of the terminology. All abbreviations should be spelt out when first mentioned. There are a lot of unknown terms and names of methods/approaches that the reader is left to figure out on their own.

All tables should have consistent formatting.

Review #2
Anonymous submitted on 09/Mar/2026
Suggestion:
Minor Revision
Review Comment:

**Overview**
The article provides an overview of knowledge representation for fault diagnosis in cyber-physical systems (CPS). The survey is structured around multiple stages of knowledge representation, namely acquisition, extraction, representation, diagnosis, and enhancement. Each of these stages is discussed in detail, with the authors reviewing relevant studies and methods, as well as identifying trends, challenges, and research gaps. Overall, the paper presents a structured and informative synthesis of the topic.

**Suitability as introductory text:**
The article serves well as an introductory text for newcomers to the domain. It provides a broad overview of the field and highlights key approaches and challenges, making it accessible to readers who are not yet familiar with the topic.

**Presentation and coverage**
The presentation and coverage are generally comprehensive. However, the definition of CPS used in the paper appears relatively narrow and is not explicitly clarified at the beginning. This may lead to some confusion for readers, particularly given that broader definitions of CPS exist in the literature (e.g. Müller, 2017 https://ieeexplore.ieee.org/document/8220372). It would be beneficial to clearly define the scope of CPS as intended in this survey early in the paper.

More generally, while the overall goal and focus of the survey become clear as the reader progresses, a more concise and explicit definition of key terms at the beginning would improve clarity. This would also help justify certain methodological decisions made later in the paper.

For example, on page 5, a fault is briefly defined as a machine breakdown. This is a rather restrictive definition, as faults can also manifest in other ways, such as degraded output quality, increased waste, or abnormal operating conditions (e.g., overheating), which do not necessarily correspond to complete breakdowns. Clarifying or broadening this definition would strengthen the conceptual foundation of the survey.

Similarly, the statement that certain domains (e.g., infrastructure, power systems, civil engineering, transportation) do not qualify as CPS may be misleading without further justification. While it is reasonable to narrow the scope of the survey to industrial or manufacturing systems, this restriction should be explicitly stated and motivated, as many established definitions of CPS would include these broader domains.

Regarding the methodology, the paper selection is limited to a single database (Scopus). This may introduce bias or limit coverage, and the authors could strengthen the study by either justifying this choice more clearly or incorporating additional sources. Additionally, the exclusion criteria do not specify whether workshop papers are included or excluded. Clarifying this point would improve transparency and reproducibility.

**Readability and clarity of the presentation:**

The article is generally well written, and the structure is clear. The organization along the different stages of knowledge representation helps guide the reader and supports a coherent presentation of the material.

**Importance to the broader Semantic Web community:**
The topic is important, particularly for the application of Semantic Web technologies in industrial contexts. While the paper may not be equally relevant to all researchers in the Semantic Web community, it represents a valuable contribution for those working on applied or domain-specific use cases, especially in industrial and manufacturing settings.

**Please also assess the data file provided by the authors under “Long-term stable URL for resources”:**
No data file was provided.

Review #3
Anonymous submitted on 10/Mar/2026
Suggestion:
Major Revision
Review Comment:

This article can be characterised as a survey of research on knowledge representation for fault diagnosis in the context of cyber physical systems. The authors perform a systematic literature review, describing the methodology in detail with helpful visualisations showing the process and how the collection of papers to be reviewed is obtained. Five research questions are identified, with the article intending to provide answers based on the reviewed literature. These questions pertain to knowledge acquisition, knowledge extraction, knowledge representation, fault diagnosis, and knowledge enhancement. Many questions refer specifically to industry applications. Acquisition and extraction are contrasted by characterising acquisition as expert-sourced, whereas extraction is characterised as document-sourced. Criteria for inclusion and exclusion of papers are explicitly detailed, but the process of analysing the individual papers is not. Some interesting statistics are provided based on the selected papers, although I miss an analysis of the sources of the selected papers since no explicit source-based filtering---language constraints of course implicitly remove non-English sources---was performed. The analysis is performed for the five research question areas in sequence, yielding paper summaries and categorisations of varying quality. This is followed by a discussion which provides a summary of summaries, forming the basis for some conclusions regarding the state of the research field and recommendations for future work. The research questions are never answered explicitly. The conclusions summarise the authors' work and reiterate the future work recommendations from earlier.

Major comments
--------------

The article is not ready for publication and needs at least another iteration to repair its shortcomings. I have three main criticisms, which I will go through now, followed by minor comments to help improve the quality of the writing.

1) Deficiencies in the paper selection methodology
As mentioned earlier, the reported methodology for this systematic literature review provides a detailed account of the choices made when selecting papers, often using helpful figures in the process. My issue is therefore not with the reporting but rather with the justification of choices made and an analysis of the consequences of those choices, both of which are lacking. The initial paper search was performed with the string visualised in Figure 3 applied to Scopus. I believe the selection process is too narrow in some areas and too broad in others.

According to the authors, 'NLP' is included in the search string because "[i]n some papers, NLP serves as a substitute for keywords related to information extraction" (p. 6, l. 25). Here an explicit exception is made based on the authors' knowledge of the field. This is not a bad thing, but why is this exception not also applied to tailor the search string to capture more reasoning methods? After all, one of the article's main takeaways is that "[l]ogical, explainable, and probabilistic reasoning methods over KGs would benefit the fault diagnosis research community" (p. 33, l. 48--49), and this takeaway appears to be based on a lack of such reasoning methods being reported in the selected papers. If a term like 'Description Logic Reasoning' were added, the takeaways may have been very different. Currently, the term 'Description Logic' does not occur anywhere in the article, which is strange because description logics are the formal basis for much of ontology-based reasoning. I do not think a Semantic Web Journal article can forget to at least mention description logics when discussing reasoning methods for ontologies and knowledge graphs, let alone then also recommend the development of those methods as future work. The source of this issue appears to lie at least in part in the handling of related work. A literature review should indeed focus on the papers identified and selected using the selection methodology. However, this article seems at times to disregard research that is fundamental to the field but that was not selected, for example because it did not explicitly discuss fault diagnosis or one of the other keywords listed under Goals in Figure 3. Being too narrow and excluding such related work can give a skewed view of the state of the art.

Conversely, I believe the selection criteria may be too broad by not considering the sources of the papers selected. The only criterion seems to be that the paper can be found through Scopus. There is no distinction between flagship conferences/journals and regional conferences or predatory journals; as long as the paper is written in English it may be included. This may be too permissive. Initiatives like CORE can help classify conferences. Even if the authors choose not to exclude any conferences or journals, there should at least be an analysis showing the distribution of the selected papers, and a justification that they are representative of the sought industry perspective.

2) Lack of an overarching methodology for reviewing individual papers
There does not appear to be an overarching structured approach to reviewing the individual papers that were selected. The introduction does indicate that the intention was to provide categorisations and in-depth analyses of methods covered, and that the reporting would be done in terms of advantages/disadvantages with the help of figures and tables to facilitate a comparison. The goal is for readers to be able to use these results to guide their choice of method in their own research. The resulting summaries, categorisations, figures, and tables should then ideally follow a similar format, but they do not.

The summaries of the analysed papers do not follow a predetermined pattern. In some cases the summaries provide sufficient detail to understand what the paper was about, but in others there is a clear lack of context or even authorship information (e.g., 'Paper [n]'), making it impossible to follow the summary without having read the paper first. For example, the KG model paragraph (p. 20, l. 13--31) suddenly refers to some 'shop floor' without providing context. Sometimes the summaries are too vague; for example the "Application of the ontology" paragraph (p. 19, l. 14--26) states that one paper constructed an ontology to serve as a corpus, which does not say much. The categorisations are nicely visualised in the figures, albeit in different styles, but the categories are not necessarily used or contrasted. The tables have different structures and it is not clear how one would use them when choosing a method. Not all approaches are analysed in terms of advantages and disadvantages (some state only advantages, like Bayesian Networks in Table 12), and the ones that are analysed do not provide contrasting information to facilitate such a choice. Some tables focus on counting papers (even though sometimes the counts are wrong, such as in Table 8), which is interesting but offers more of an overarching analysis rather than information that could be used to facilitate a choice.

3) Inaccuracies in summaries and conclusions
The article tends to focus exclusively on the papers that were selected, which leads to strange conclusions when extrapolating to the field as a whole. I already mentioned the issue with description logic reasoning, for example. Inaccurate or unfounded conclusions can be found in several places. For example, when discussing querying under semantic-based methods for diagnosis, the authors reason as follows: "Notably, none of the reviewed studies explicitly discussed reasoning mechanisms applied during the query process, such as query rewriting or logical inference. Therefore, it can be concluded that this approach primarily serves as an information retrieval technique rather than a reasoning-based method." (p. 23, l. 48--51). This is not necessarily true. SPARQL is one of the query languages listed. While SPARQL does not perform description logic inference, it is not uncommon for SPARQL queries to be executed after a materialisation step, during which such inferred RDF triples are made explicit.

On several occasions I have been surprised by a paper summary and/or its conclusions because they do not make sense, possibly because the authors misunderstood the papers they were summarising. This misreporting unfortunately lowers my confidence in all of the summaries. While I am by no means a Machine Learning expert, Section 4.2 (Information Extraction) seems to be particularly impacted in this regard and should be rewritten; some examples are as follows:

"While there are no quantitative results regarding the effectiveness of each method, the following qualitative findings have been noted. (...) Azari et al. [2] mentioned that Apriori algorithm (applied in paper [73]) is easy to implement but requires large resources and a long time because it must scan the dataset repeatedly."
=> Apriori is a well-known data-mining algorithm for which the time and space complexity are known.
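For reference (my own minimal sketch, not from either paper under discussion), the repeated-scan behaviour is inherent to the algorithm: each candidate level requires one full pass over the dataset, so the cost is known a priori rather than a qualitative finding.

```python
from itertools import combinations  # not strictly needed; kept for clarity

def apriori(transactions, min_support):
    """Minimal Apriori sketch (no subset-pruning of candidates): each
    level k re-scans every transaction to count the size-k candidates,
    which is exactly the well-known repeated-scan cost."""
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]
    frequent, scans, k = [], 0, 1
    while current:
        counts = {c: 0 for c in current}
        scans += 1                       # one full dataset scan per level
        for t in transactions:
            ts = set(t)
            for c in current:
                if c <= ts:
                    counts[c] += 1
        level = [c for c, n in counts.items() if n >= min_support]
        frequent.extend(level)
        k += 1                           # join step: frequent k-1 itemsets
        current = sorted({a | b for a in level for b in level
                          if len(a | b) == k})
    return frequent, scans
```

Running it on four hypothetical transactions with `min_support=2` takes three scans: one per itemset size tried.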

"The use of BERT [31] or ALBERT [30] with/without BiLSTM or CRF is common in the literature for entity-relation extraction tasks [56, 77--79]."
=> I believe ALBERT is a BERT variant. Neither BiLSTM nor CRF is defined. They are not common 'in the literature' but rather in the authors' selected papers. One cannot extrapolate to the (entity-relation extraction) literature in general when the papers were narrowly selected.

"Techniques such as attention mechanisms [19] and reinforcement learning [11] can help address challenges such as ambiguity and noisy input respectively."
=> This is extremely vague. The reference to the attention mechanism does not make much sense. Presumably this is a BiLSTM with a transformer architecture to do fault detection over time? I did not have time to check. The references also imply that [19] is the primary source of attention and [11] is the primary source of reinforcement learning, which they are obviously not.

"Gong et al. [27] reported that the BERT joint extraction model achieves a better F-measure compared to the BERT+BiLSTM+CRF model for relation extraction."
=> However, this does not hold in the general case, as is implied here. The context is important to report on as well.

"Techniques leveraging BERT, such as fine-tuned BERT [42], pre-trained BERT [68, 76], BERT-BiLSTM-CRF [14, 57], BERT-Decoder-CRF [66] yielded good precision and F-measure. However, reliance on labeled datasets raises challenges."
=> Also extremely vague, but my main issue is with the reporting on the models. I could be wrong, but the authors seem to have listed the terms as they occur in the papers without necessarily understanding what they mean. For example, to the best of my knowledge, fine-tuned BERT is not really a single thing, as models are commonly fine-tuned before scoring; and pre-trained BERT is BERT, since BERT is a pre-trained model.

"Entity-relation extraction using a combination of models such as SP-LEAR-BERT-MRC [58] outperform baseline models."
=> Context matters; this is not true for all datasets and depends on the baseline models. This sweeping statement is therefore inaccurate.

In conclusion, I do not think that the article provides a "comparative analysis [that] empowers researchers to make informed decisions about which approach to adopt in future studies" (p. 3, l. 38--40), which was listed as a key contribution. Its summarisation and contrasting analyses are insufficient to perform such a role. It is also not entirely clear to me why there was a focus on industry applications in the research questions if the purpose of the review was to assist in future research. The research questions are not answered explicitly either. I do think the presentation is decent, especially with the reporting on the methodology used, even if I disagree with some of those decisions. When it comes to importance to the broader Semantic Web community, I am uncertain. The term Semantic Web occurs only once (in the abstract; "Our findings aim to guide researchers and practitioners in leveraging semantic web technologies for more robust and explainable fault diagnosis in CPS."), and is missing from the search string, but the review does cover Semantic Web approaches without explicitly stating so. This is part of a recurring problem; if a term or technique does not occur in the selected papers, it does not seem to exist for the purpose of the article. Perhaps the search string is the issue. Either way, the article is first and foremost a Fault Diagnosis paper, from which some Machine Learning and Semantic Web techniques flow. However, I believe this is an issue of presentation that could be resolved, although only through a major rework.

Additional comments
-------------------

- Excessively flowery language maximising the use of positive adjectives feels artificial but also sometimes assigns value or importance, which should be left to the reader to decide. For example, the authors write: "Moreover, this study provides a *thorough* discussion and *clearly* shows open issues that have not been sufficiently studied and also offers *comprehensible* results by providing *clear* insights into the issues. These *distinctive* features make this review paper a *valuable* resource, providing insight and a foundation for future research." (p. 3, l. 43--47, emphasis added), whereas this reviewer disagrees.
- Please write out counting numbers ten or below, i.e., 'three times', not '3 times'.
- All acronyms should be explained when they first occur.
- Faults refer to machine breakdowns (p. 5, l. 28), but this could be made more explicit since it is a restriction.
- The Goals segment of Figure 3 contains a special case "prognostics" AND "fault" but it is not explained why.
- The quality screening in the review execution (Section 3.2) is not explained well. There is a mention of Chinese text but I assume the authors more specifically exclude non-English papers.
- The definition of CPS is unclear but important for paper selection.
- Forward snowballing (Figure 6) selects papers with at least two references; are these allowed to be self-citations?
- Table 3: The categories do not match the ones in the text.
- Figure 7: Arrow to Output missing.
- Figure 9: Arrow to Bayesian network missing.
- Section 4.4.3 describes statistical methods as being distinct from other approaches because they "rely on mathematical calculations to determine the similarity between the new problem and past problems stored in the knowledge base." This is a questionable definition at best.
- The reference listing needs to be tidied up.

Minor comments
--------------

- P. 3, l. 48: knowledge. [51]. => knowledge [51].
- P. 3, l. 26: Systematic Literature Review (SLR) approach => Systematic Literature Review (SLR) [35] approach
- P. 3, l. 44: comprehensible results => results
- P. 4, l. 15: valuable insights => insights
- Table 1 caption: need to be => are
- Table 1 RQ4: How faults can be => How can faults be
- P. 6, l. 46--47: While we (...) service engineer. => [Incomplete sentence]
- P. 10, l. 27: using Ontomap mapping language => using the Ontomap mapping language
- P. 11, l. 18: [65] Wang et al. [65] => Wang et al. [65]
- P. 11, l. 26: [73] and etc => [73], etc
- Table 4: doesn't => does not (x2)
- Table 6: Airbaus => Airbus
- P. 16, l. 42: that serves as the foundation for fault diagnosis => [Remove]
- P. 17, l. 31: Ontology => Ontologies
- P. 19, l. 14: shows that how => shows how
- P. 21, l. 1--3: Reviewing the papers (...) in table 10. => [Remove; duplicate]
- Section 4.3.4 header text: Advanced Representations => [Remove]
- P. 24, l. 34: Tieming et al. [59] established fuzzy sets => Tieming et al. [59] used fuzzy sets
- P. 25, l. 9--11: The proposed reasoning (...) handling uncertainty. => [Rephrase]
- P. 26, l. 2: Machine Learning (ML) or Deep Learning (DL) => Machine Learning (ML) and Deep Learning (DL)
- P. 31, l. 40: The Naqvi et al. => Naqvi et al.