NORIA-O: An Ontology for Anomaly Detection and Incident Management in ICT Systems

Tracking #: 3334-4548

Authors: 
Lionel Tailhardat
Yoan Chabot
Raphael Troncy

Responsible editor: 
Cogan Shimizu

Submission type: 
Ontology Description
Abstract: 
Large-scale Information and Communications Technology (ICT) systems give rise to difficult situations such as handling cascading failures across multiple platforms and detecting complex malicious activities occurring on multiple services and network layers. For network administrators and supervision teams, managing these situations while ensuring a high standard of quality of service and network security requires a comprehensive view of how communication devices are interconnected and how they are performing. However, the relevant information is spread across heterogeneous log sources and databases, which raises information integration challenges. Several efforts have proposed data models representing computing resources and how they are allocated to host services. However, to date, there is no model describing the multiple interdependencies between the structural, dynamic, and functional aspects of a network infrastructure. In this paper, we propose the NORIA ontology, which reuses and extends well-known ontologies such as SEAS, FOLIO, UCO, ORG, BOT and BBO. NORIA has been developed together with network and cybersecurity experts in order to describe a network infrastructure, its events (e.g., user login, network route priority reconfiguration), and the diagnosis and repair actions (e.g., connectivity check, firmware upgrade) performed during incident management. A use case describing a failure on a fictitious network shows how this ontology can model complex ICT system situations and serve as a basis for anomaly detection and root cause analysis.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Valentina Anita Carriero submitted on 08/Feb/2023
Suggestion:
Minor Revision
Review Comment:

This paper presents an ontology (network) accompanied by a set of controlled vocabularies for representing network infrastructures, incidents and maintenance operations on networks. This ontology directly reuses state-of-the-art ontologies and extends them with specific classes and properties.

The ontology (network) is stored on GitHub with a useful readme file and other materials (e.g. for the evaluation), and has a permanent URI (w3id.org).
Both the ontologies and the individual ontological entities are annotated (labels, comments, provenance, etc.)

The ontology seems relevant to the domain, and extends existing resources.
The main problems I see are in the evaluation.

The paper is very well written, and pretty clear.

In the following, I report more specific comments and questions:

- Introduction: can you already make it clear in the introduction how NORIA-O enriches existing (reused) ontologies, e.g. with an example?
- Section 3: It is clear why you chose to work with competency questions. However, it is not very clear why you chose a specific methodology (or part of it), like the one in [19], rather than other methodologies that also include the formulation of requirements as CQs. Maybe this could be discussed and clarified further.
- Section 3: have the four facets been defined by the ontology designer(s), or also with the help of the domain experts?
- Figure 2: this huge and rich figure was very challenging for me to "read". I would appreciate additional figures corresponding to smaller "pieces" of the model (maybe organized by facet, as you already did), so that it is easier to understand. Moreover, the labels of some properties (e.g. the datatype ones originating from TroubleTicket) are not fully readable, or seem to be absent.
- page 6: "We align with the SEAS CommunicationOntology model through object properties such as networkInterfaceOf and networkLinkTerminationResource." --> can you clarify how this alignment is implemented? (I guess by declaring these properties as subproperties of seas properties, but I would specify it in the text)
- The role of the facets you define is not very clear to me: are they "included" in the ontology, and if so, how? Moreover, what is the relation between these facets and the different sub-ontologies you create (which are not even listed in the paper)? And why not use these facets or sub-ontologies to guide the ontology description in Section 4?
- Additionally, this looks to me as an ontology network: if I understood it correctly, there is a "root" module that imports all other "thematic" modules, returning the whole network. And one of such module (core) contains high-level concepts (even if e.g. agentPreferredContactMethod does not seem that high-level). In that case, I would describe it as an ontology network in the paper. And I would explain better how you identify the different sub-areas for the sub-ontologies (that is, ontology modules) and how they relate with the facets.
- However, the same namespace is used for all "modules", so there is no way, besides looking at the source ttl files, to know which concepts and relations are included in one module rather than another. Strictly speaking, this is declared through rdfs:isDefinedBy. However, if I go to e.g. https://w3id.org/noria/ontology/DocumentOntology, it redirects to the NORIA ontology. I think it would be very useful to have separate (sub)namespaces for the modules, or at least a clearer organization of the network, and to discuss it further in the paper (also motivating the authors' choices). There are other examples of ontology networks with a more or less clear architecture in the literature.
- Can you specify in the paper how existing ontologies are reused? (Judging from the ontology, which does not import them, I guess it is by direct reuse of individual ontological entities.)
- page 8: "Details about the message meaning are managed with the dcterms:type property that refers to a controlled-vocabulary for event type tagging" --> which is the reason why you use dcterms:type here, instead of creating a dedicated property, like you did for resourceType?
- Moreover, some dcterms properties seem too general to me (e.g. dcterms:relation, also used in the example)
- Evaluation: I like the CQ-driven evaluation. However, I am not sure whether it makes sense to include in the evaluation CQs (9 out of 26) that cannot be answered by an ontology alone. It would make sense, IMO, if the authors showed at least one example of a CQ that can be addressed by combining the ontology with an appropriate "AI method"; otherwise, it is not clear whether the ontology can be useful in those cases. Moreover, there are many other measures for evaluating the ontology besides the coverage of the requirements, e.g. a consistency check, or some of the metrics included in "S. Tartir, I.B. Arpinar and A.P. Sheth, Ontological evaluation and validation, in: Theory and applications of ontology: Computer applications, Springer, 2010, pp. 115–130." and other publications about ontology evaluation.
- Evaluation: the fact that the KG is private is a problem in my opinion, since the results of the evaluation are not reproducible. I believe that the same evaluation can be performed on a "fake" dataset, similar to the one already used and not published, i.e. containing example data. In this way, the results of the evaluation could be published, and it would be proven anyway that the ontology can address those CQs even if the data is not real.
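On the page 6 alignment question above: if, as the comment guesses, the alignment is declared with rdfs:subPropertyOf, its practical effect can be sketched as follows. Under the RDFS subproperty rule (rdfs7), every triple that uses a NORIA property also holds for its SEAS super-property, so SEAS-level queries see NORIA data. This is a minimal plain-Python sketch over tuples; the seas:* property names are hypothetical placeholders, since the actual alignment targets are not given here.

```python
# Minimal sketch of RDFS subproperty entailment (rule rdfs7) over plain tuples.
# ASSUMPTION: the seas:* super-property names are hypothetical placeholders;
# the actual SEAS targets of the NORIA alignment are not given in the review.
SUBPROP = "rdfs:subPropertyOf"

alignment = {
    ("noria:networkInterfaceOf", SUBPROP, "seas:hypotheticalInterfaceOf"),
    ("noria:networkLinkTerminationResource", SUBPROP, "seas:hypotheticalTerminationOf"),
}

data = {("ex:eth0", "noria:networkInterfaceOf", "ex:router1")}  # hypothetical instance


def infer_subproperties(triples):
    """Apply rdfs7 until no new triple is produced and return the closed set."""
    closed = set(triples)
    while True:
        sub_of = [(p, q) for (p, rel, q) in closed if rel == SUBPROP]
        new = {
            (s, q, o)
            for (s, pred, o) in closed
            for (p, q) in sub_of
            if pred == p
        } - closed
        if not new:
            return closed
        closed |= new


kg = infer_subproperties(alignment | data)
# A SEAS-level lookup now finds the relation that was asserted with the NORIA property.
print(("ex:eth0", "seas:hypotheticalInterfaceOf", "ex:router1") in kg)  # True
```

Iterating to a fixpoint also covers chains of subproperty declarations, which is why the loop re-reads the subproperty pairs on every pass.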
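On the module/namespace comment above: the point that a shared namespace hides module membership, while rdfs:isDefinedBy still records it, can be sketched in a few lines. Only the base IRI and the DocumentOntology module IRI come from the comment; the CoreOntology IRI and the hypothetical* entity names are assumptions for illustration.

```python
from collections import defaultdict

# ASSUMPTION: the CoreOntology IRI and the hypothetical* entity names are made up;
# only the base IRI and the DocumentOntology module IRI appear in the review.
BASE = "https://w3id.org/noria/ontology/"

triples = [
    (BASE + "agentPreferredContactMethod", "rdfs:isDefinedBy", BASE + "CoreOntology"),
    (BASE + "hypotheticalDocumentTerm", "rdfs:isDefinedBy", BASE + "DocumentOntology"),
    (BASE + "hypotheticalOtherTerm", "rdfs:isDefinedBy", BASE + "DocumentOntology"),
]


def namespace(iri):
    """Return the IRI up to and including its last '/' or '#'."""
    cut = max(iri.rfind("/"), iri.rfind("#"))
    return iri[: cut + 1]


def group_by_module(triples):
    """Group entity IRIs by the module declared with rdfs:isDefinedBy."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:isDefinedBy":
            groups[o].add(s)
    return dict(groups)


# The shared namespace cannot tell the modules apart ...
print({namespace(s) for s, _, _ in triples})  # {'https://w3id.org/noria/ontology/'}
# ... whereas rdfs:isDefinedBy can.
for module, members in sorted(group_by_module(triples).items()):
    print(module, sorted(members))
```

Separate (sub)namespaces per module, as suggested, would make the first grouping informative on its own, without relying on an annotation property.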
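On the suggestion above to use metrics beyond requirement coverage: two schema metrics of the OntoQA kind associated with Tartir et al. can be computed directly from the class axioms. Inheritance richness is read here as the number of subclass relations per class, and relationship richness as the share of non-inheritance relations among all relations between classes. This sketch follows our reading of those definitions and should be checked against the cited chapter; the toy schema is hypothetical apart from the EventRecord/Event subclass axiom reported in Review #2.

```python
SUBCLASS = "rdfs:subClassOf"


def schema_metrics(classes, relations):
    """Compute two OntoQA-style schema metrics (our reading of the definitions).

    classes:   set of class names.
    relations: (subject, relation, object) triples between classes; any relation
               other than rdfs:subClassOf counts as a non-inheritance relationship.
    Returns (inheritance_richness, relationship_richness).
    """
    inheritance = [r for r in relations if r[1] == SUBCLASS]
    other = [r for r in relations if r[1] != SUBCLASS]
    inheritance_richness = len(inheritance) / len(classes) if classes else 0.0
    relationship_richness = len(other) / len(relations) if relations else 0.0
    return inheritance_richness, relationship_richness


# Toy schema: hypothetical except the EventRecord subClassOf Event axiom.
classes = {"Device", "CommunicationDevice", "Event", "EventRecord"}
relations = [
    ("EventRecord", SUBCLASS, "Event"),
    ("EventRecord", "ex:hypotheticalLoggedBy", "Device"),
]
print(schema_metrics(classes, relations))  # (0.25, 0.5)
```

A low inheritance richness would quantify the "very shallow class hierarchy" that Review #2 observes.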
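On the suggestion above to run the evaluation on a "fake" dataset: the idea can be sketched as a small synthetic KG plus CQs encoded as basic graph patterns, where a CQ test passes if the pattern has at least one answer (roughly what a SPARQL ASK query checks). This is a plain-Python sketch; every ex: identifier and the example CQ are hypothetical and not taken from the NORIA test suite.

```python
# Synthetic ("fake") KG; every ex: identifier is hypothetical.
FAKE_KG = {
    ("ex:inc1", "ex:relatedAlarm", "ex:alarm7"),
    ("ex:alarm7", "ex:raisedBy", "ex:router1"),
    ("ex:inc2", "ex:relatedAlarm", "ex:alarm9"),
}


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern (a tiny BGP matcher).

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    head, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(head, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:  # no mismatch: extend this partial solution with the remaining patterns
            yield from match(rest, triples, candidate)


def cq_passes(patterns, triples):
    """A CQ test passes if the pattern has at least one answer (ASK-like)."""
    return next(match(patterns, triples), None) is not None


# Hypothetical CQ: "Which devices raised alarms related to incident ex:inc1?"
cq = [("ex:inc1", "ex:relatedAlarm", "?alarm"), ("?alarm", "ex:raisedBy", "?device")]
print(cq_passes(cq, FAKE_KG))  # True
print([b["?device"] for b in match(cq, FAKE_KG)])  # ['ex:router1']
```

Because the synthetic data can be published, test outcomes like these become reproducible even when the production KG stays private.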

*** minor comments ***
- list of authors: there's a "typo" in the name of the last author
- Where does the name NORIA come from?
- page 2: "for data integration (e.g. RMLMapper [13]," --> missing ")"
- page 2: "the combination of the SEAS and PEP [1, 2] models are useful" --> the combination...is useful
- page 3: not sure whether a recently released ontology can be defined as "consolidated"
- page 4: “anomaly signatures” and others: I would use either the italic or the quotes, not both
- page 5: "or following the sosa:Observation model": introduce the sosa: prefix, same for other prefixes used in other pages (e.g. bot: and seas:). Even the namespace of the noria ontology does not seem to be mentioned in the paper, but can only be found in the GitHub repo
- page 6: correct Protégé "ProtÃl’gÃl’"
- page 11: "connection fails because of an communicationsSubsystemFailure" --> a communicationsSubsystemFailure
- page 11: "that rely on and extend" --> relies on and extends

Review #2
Anonymous submitted on 07/Mar/2023
Suggestion:
Reject
Review Comment:

*The following text is written using basic Markdown syntax.*

# Summary

The paper presents an ontology for Information and Communication Technology (ICT) systems. The ontology is proposed for two separate purposes. First, to help detect anomalies within a network of ICT systems.
Second, to facilitate the analysis of root causes for such anomalies. To serve both purposes, the ontology is said to provide a formal model for a network's infrastructure and activity as well as associated incident situations and maintenance operations. The ontology's design is informed by a set of competency questions elicited from 16 domain experts. Said competency questions are used to derive requirements for the ontology that can be tested automatically using SPARQL queries. An example of the ontology's use is provided in terms of a fictitious use-case scenario involving a connection failure.

# Overall impression

The idea of an ontology that helps with the detection and analysis of network anomalies is interesting.
The collection of competency questions (CQ) from a reasonable number of domain experts provides a solid basis for the ontology's development. These CQs are also likely to be valuable for the wider community interested in ontologies for ICT systems. Furthermore, the goal of reusing existing ontologies can be seen as a contribution to a community-driven effort of solving non-trivial modelling problems for ICT systems.

Most of the computational artefacts and materials discussed in the paper are publicly available in an online repository on GitHub. The repository is well-organised and the available documentation is adequate for the purpose of reproducing what is presented in the paper (The available 'README.md' and 'makefile' files are clear to me. However, I haven't tried to reproduce the materials discussed in the paper).

Many design choices for the ontology as well as its practical use are only described on a high level of abstraction. As a consequence, most statements and claims about the ontology's use and design are vague and therefore hard to test and verify. In addition to the lack of technical detail, there are also more general shortcomings from a methodological point of view. In my more detailed comments below, I will outline a few examples of both issues.

# Comments on the conceptual description of the ontology

The ontology's intended use and design goals are only formulated in broad and general terms. In particular, there are no concrete statements that could be tested or verified. For example, it is said in various places of the paper that the ontology is designed for anomaly detection and root cause analysis in the context of ICT systems.
However, it remains unclear
- what kind of anomalies can be detected,
- what kind of analysis is supposed to be done in terms of root causes,
- and how exactly the ontology would help with either of these tasks.

The presentation of the fictitious use-case in Section 6 also doesn't explain how the ontology is supposed to be used w.r.t. the detection of anomalies or root cause analysis.

Adding to the confusion are formulations suggesting that the ontology does in fact *not* provide support for anomaly detection or root cause analysis. For example, in Section 5 it is written that:

> "[...] these questions require the implementation of more complex AI-based algorithms such as anomaly detection algorithms. For example, to answer CQ#11 (“What was the root cause of the incident?”), the explicit representation of alarms and logs associated with a given incident is not enough and needs to be enhanced with root cause analysis algorithms."

# Comments on the technical presentation of the ontology

The ontology is authored in OWL. Yet, there are no concrete statements about the ontology's design in terms of OWL axioms. This is particularly confusing because of the emphasis on reasoning in various places of the paper. In the introduction, it says the ontology is developed to provide

> "a consolidated semantic model for describing and reasoning on the combination of network infrastructure characteristics [...], network activity [...] and operations [...]."

In Section 3, the authors write

> "we consider incidents as a central notion towards i) computing and reasoning on 'anomaly signatures'".

However, the paper does not provide any details about what kind of inferences are made possible by the ontology or what kind of information these inferences would provide.

An inspection of the OWL file reveals many aspects of the ontology that are not mentioned in the paper. For example, the ontology features a very shallow class hierarchy (most classes have only one superclass/subclass in the inferred class hierarchy). This is not a problem in and of itself. However, there are many existing and non-existing subclass relationships that raise questions.

Here are only a few concrete examples that I don't understand:
- why is a class called "CommunicationDevice" not a subclass of "Device"?
- why is the class "StructuralProperty" not a subclass of the class "Property"? (and why is "StructuralObservable" a subclass of "Property"?)
- why is the class "Service" equivalent to the class "ServiceInstance"?
- why is the class "EventRecord" a subclass of "Event"?

The use of individuals is also not clear to me. For example, why are there five individuals from different namespaces with a local identifier "type" in addition to individuals "operationType" and "commentType"?

Note that these kinds of design choices are relevant for OWL reasoning. So, I would expect details like these to be discussed in the paper.
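To illustrate why these axioms matter for reasoning, the sketch below propagates rdf:type along subclass axioms (RDFS rule rdfs9, iterated to a fixpoint so that subclass chains are also covered) and reads owl:equivalentClass as a pair of mutual subclass axioms. The class names and axioms come from the review; the noria: prefix and the instance data are assumptions for illustration.

```python
SUBCLASS, TYPE, EQUIV = "rdfs:subClassOf", "rdf:type", "owl:equivalentClass"

schema = {
    ("noria:EventRecord", SUBCLASS, "noria:Event"),     # as reported in Review #2
    ("noria:Service", EQUIV, "noria:ServiceInstance"),  # as reported in Review #2
    # No axiom relates noria:CommunicationDevice to noria:Device.
}
instances = {  # hypothetical individuals
    ("ex:log42", TYPE, "noria:EventRecord"),
    ("ex:router1", TYPE, "noria:CommunicationDevice"),
    ("ex:vpn", TYPE, "noria:Service"),
}


def infer_types(triples):
    """Propagate rdf:type along subclass and equivalence axioms until fixpoint."""
    closed = set(triples)
    # An equivalence is read as subclass axioms in both directions.
    sub = {(c, d) for (c, p, d) in closed if p == SUBCLASS}
    sub |= {(c, d) for (c, p, d) in closed if p == EQUIV}
    sub |= {(d, c) for (c, p, d) in closed if p == EQUIV}
    while True:
        new = {
            (x, TYPE, d)
            for (x, p, c) in closed
            if p == TYPE
            for (c2, d) in sub
            if c2 == c
        } - closed
        if not new:
            return closed
        closed |= new


kg = infer_types(schema | instances)
print(("ex:log42", TYPE, "noria:Event") in kg)          # True: inherited via subClassOf
print(("ex:vpn", TYPE, "noria:ServiceInstance") in kg)  # True: via equivalence
print(("ex:router1", TYPE, "noria:Device") in kg)       # False: no subclass axiom
```

A query for all noria:Device instances would therefore miss every communication device, and every event record is also an event; both are consequences the paper could make explicit.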

# Comments on the ontology's evaluation

The paper doesn't make a convincing case that the ontology meets the objectives of its designated purpose effectively and satisfactorily. The presented evaluation of the ontology is limited to test cases that have been derived as part of the formal specification for the ontology's design.

The derivation of test cases follows an ontology authoring approach based on competency questions (see reference 19: "Towards Competency Question-Driven Ontology Authoring" by Yuan Ren, Artemis Parvizi, Chris Mellish, Jeff Z. Pan, Kees van Deemter and Robert Stevens). So, passing the generated tests would only provide some evidence that the ontology has been constructed in accordance with design requirements as derived as part of the chosen ontology authoring approach. In particular, it is not warranted to equate a test-driven development approach with an evaluation of the ontology.

Also, it needs to be highlighted that passing tests are reported for only 16 out of 26 competency questions. Put differently, the ontology only meets about 60% of the formal design requirements derived from the formulated competency questions.
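For reference, the proportion quoted above works out as follows, using only the reviewer's counts (16 passing out of 26 CQs):

```latex
\frac{16}{26} = \frac{8}{13} \approx 0.615 \;(\approx 61.5\%), \qquad
\frac{26 - 16}{26} = \frac{5}{13} \approx 0.385
```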

Overall, the provided evaluation is not sufficient to make a judgement about the ontology's quality, usability, or relevance.

# Recommendation

I do not see a clear pathway for addressing the issues outlined in this review.

Review #3
Anonymous submitted on 02/May/2023
Suggestion:
Accept
Review Comment:

This paper presents an ontology that models and reasons over network topology/infrastructure, network activities, and network operations within the context of anomaly detection and incident management in ICT systems. Overall, this paper is a very interesting read and well written. The related work section shows that existing ontologies only partially model the knowledge domains related to ICT, and that none of them models the fine-grained resources of network topology at the level that is essential for ICT. Nor were they developed to be interoperable with standards such as SOSA or the TMForum Open API. NORIA builds on several well-known existing ontologies, some of which are standards. Overall, this work is well motivated and builds upon previous research. The way the CQs are presented and mapped to archetype patterns and individual components in the ontology framework is neat and clear. While the description of the modeling aspects of the ontology in the paper is not very comprehensive, the authors have published a descriptive ontology document, which is maintained in the provided GitHub repository. The paper also presents a reasonable evaluation using the CQs and a toy KG, which is great. The overall impact of the paper is good, and I believe this ontology development is a first step towards potential future work on combining these results with other AI work for conducting Root Cause Analysis tasks for network anomaly detection.

The GitHub repository is very well organized and has detailed documentation. The CQs, sample data, ontology, and evaluation mentioned in the paper are easy to find and navigate in the repository. There is also a README file and a license file.

Some minor formatting and grammatical issues:
a) page 2, line 34 - missing closing parenthesis after RMLMapper [13]
b) page 3, line 31 - some issue with rendering the reference
c) page 6, line 23 - Protege is misspelt

Review #4
Anonymous submitted on 21/May/2023
Suggestion:
Major Revision
Review Comment:

This submission introduces the NORIA-O ontology, which aims at describing the management of and response to incidents in ICT systems, with an emphasis on networking. The submission is under the "Description of Ontologies" category, in which reviewers are instructed to focus particularly on design principles and methodologies, comparison with the state of the art, and applications or use case experiments.

Unfortunately, I do not believe this work is ready for publication in the Semantic Web Journal at this time.

My concerns:

* The submission does not include a comparison of NORIA with the state of the art. Related work is mentioned, but it is a very cursory listing, not a proper comparison. Consequently the motivation for the work is unclear. What specific system will use this? My thought is drawn to https://xkcd.com/927/
* The principles and method by which this ontology was developed are insufficiently covered: they are summarized in a paragraph or two before the paper delves into the design choices made. E.g., regarding the 16 experts involved: how was their suitability assessed? What do we know about them? The pool of 150 mentioned is entirely irrelevant. Given the heavy reuse, how was such reuse practically implemented, and why? E.g., were designs cloned into the target namespace, were remote references to existing ontologies employed, were local copies of those existing ontologies made, etc.? Why or why not, and which trade-offs were experienced? This is probably also related to the intended use case/application for the ontology, which, again, is unclear.
* There are multiple mentions of how this ontology reuses existing standards and modeling or other approaches, which are referenced bibliographically and using footnotes, but which are not actually described, e.g., "Competence Question archetype mapping", the LOT methodology, the Incident Management Process, ITIL's Problem Management Process, etc. The article needs to be free-standing enough that it can be read without having to look up the fundamentals in 3-4 other sources in parallel. The salient points from those references need to be described here.
* The four facets by which the ontology is designed, appear arbitrary. Why these? Why are temporal aspects of descriptions in two facets? Why are physical and logical structures (”real or patterns”) in one facet, while functional (arguably closely tied to logical and patterns) in another? There are doubtlessly good reasons for this, but those reasons are not described for the reader.
* The section on Modeling Strategy is too murky to understand, as is Figure 1. What is this intended to convey?
* The section on logging and ticketing seems to be less descriptive and more prescriptive. Given how core this is to operations in non-trivially sized IT orgs, it seems unlikely that NORIA would be adopted without adaptation by those orgs. Again, understanding the intended purpose would help clarify, or maybe knowing more about those experts from which requirements were elicited, but as I read this right now, a lot of this seems arbitrary.
* Several terms used as concept names appear quite ambiguous: "Resource" is easily conflated with Resource in the RDF sense of the word, "Locus" seems unclear as a spatial description, "Application" is used as a synonym of "Utilization Category" as opposed to its more common meaning (a piece of software), etc. Furthermore, the use of an object property resourceType instead of rdf:type would need to be motivated.
* The ontology is not resolvable from the namespace provided in the submission (https://w3id.org/noria), nor from base URI of concepts in linked documentation (https://w3id.org/noria/ontology/). As a reviewer or reader, I should not have to dig around the GitHub repo or docs to figure out how the authors thought w.r.t. resolving their ontology.
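On the resourceType point above, one consequence worth motivating can be sketched as follows: when a resource's kind is expressed with an object property pointing to a controlled-vocabulary concept instead of rdf:type, generalization (e.g., "all network devices") no longer comes from the class hierarchy and must be implemented by walking the vocabulary's broader links explicitly. The noria: prefix, the use of skos:broader, and all ex: concepts and resources are assumptions for illustration, not the published vocabulary.

```python
TAG, BROADER = "noria:resourceType", "skos:broader"

triples = {
    # Hypothetical controlled vocabulary: routers and switches are network devices.
    ("ex:Router", BROADER, "ex:NetworkDevice"),
    ("ex:Switch", BROADER, "ex:NetworkDevice"),
    # Hypothetical resources tagged with vocabulary concepts.
    ("ex:r1", TAG, "ex:Router"),
    ("ex:s1", TAG, "ex:Switch"),
    ("ex:srv1", TAG, "ex:Server"),
}


def tagged_with(triples, concept):
    """Return resources tagged with `concept` or with any concept narrower than it."""
    concepts = {concept}
    changed = True
    while changed:  # transitive walk down the skos:broader links
        changed = False
        for s, p, o in triples:
            if p == BROADER and o in concepts and s not in concepts:
                concepts.add(s)
                changed = True
    return {s for s, p, o in triples if p == TAG and o in concepts}


print(sorted(tagged_with(triples, "ex:NetworkDevice")))  # ['ex:r1', 'ex:s1']
```

With rdf:type and rdfs:subClassOf, the same generalization would follow from standard RDFS entailment instead of custom traversal code; the trade-off is what the comment asks the authors to discuss.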