Deriving Semantic Validation Rules from Industrial Standards: an OPC UA Study

Tracking #: 3184-4398

Yashoda Saisree Bareedu
Thomas Frühwirth
Christoph Niedermeier
Marta Sabou
Gernot Steindl
Aparna Saisree Thuluva
Stefani Tsaneva
Nilay Tufek Ozkaya

Responsible editor: 
Guest Editors SW for Industrial Engineering 2022

Submission type: 
Full Paper
Industrial standards provide guidelines for data modeling to ensure interoperability between stakeholders of an industry branch (e.g., robotics). Most frequently, such guidelines are provided in an unstructured format (e.g., PDF documents). This hampers the automated validation of information objects (e.g., data models) that rely on such standards, that is, checking whether they comply with the modeling constraints prescribed by the guidelines. It also raises the risk of costly interoperability errors caused by incorrect use of the standards. There is, therefore, increasing interest in the automatic semantic validation of information objects based on industrial standards. In this paper we focus on an approach to semantic validation that formally represents the modeling constraints from unstructured documents as explicit, machine-actionable rules (to be used for semantic validation) and (semi-)automatically extracts such rules from PDF documents. While our approach aims to be generically applicable, we exemplify an adaptation of the approach in the concrete context of the OPC UA industrial standard, given its large-scale adoption among important industrial stakeholders and the OPC UA internal efforts towards semantic validation. We conclude that (i) it is feasible to represent modeling constraints from the standard specifications as rules, which can be organized in a taxonomy and represented using Semantic Web technologies such as OWL and SPARQL; (ii) we could automatically identify modeling constraints in the specification documents by inspecting the tables (P=87%) and text of these documents (F1 up to 94%); (iii) the translation of the modeling constraints into formal rules could be fully automated when constraints were extracted from tables, but required a human-in-the-loop approach for constraints extracted from text.
Minor Revision

Solicited Reviews:
Review #1
By Jürgen Bock submitted on 15/Sep/2022
Review Comment:

In their revised version, the authors conscientiously addressed all main comments and suggestions for improvement from my initial review. This clarifies the scope and impact of the contribution, rules out misconceptions, and strengthens the article regarding its scientific character and readability. Given this, and the important gap that is addressed by this work, I recommend accepting the manuscript for publication.

Review #2
By Gianfranco Modoni submitted on 07/Oct/2022
Review Comment:

I verified the authors have addressed all of my previous comments.
Although the application of the proposed approach still needs to be improved (the authors plan specific activities to this end), the contribution of this research work is new and significant, and it has promising potential that will be interesting to explore in future phases of the work.
For this reason, I propose the acceptance of the manuscript.

Review #3
Anonymous submitted on 17/Nov/2022
Minor Revision
Review Comment:

After carefully reading through the changes and the comments, I acknowledge that many of the points made in the review have been addressed, at least to some degree. In particular, the related work section has been extended, although the selection of papers is not particularly targeted. At least some acknowledgement of work outside the strict Semantic Web community has been given. Rule generation and the constraint taxonomy are somewhat improved.

Overall, this is still quite an inward-looking paper. Many limitations remain, as do the problems with the specific methods and languages employed. However, these are not sufficient to reject the paper, and it is not clear how they could be addressed without a fundamental change in approach, while the evaluation is acceptable relative to the methods employed. I am inclined to recommend acceptance, with one particular revision.

The paper points out that a full list of rules has not been produced and therefore cannot be evaluated. Given that the paper contains multiple techniques that are evaluated in more detail, such a preliminary aspect is perhaps acceptable, but the claims that accompany this situation are not. In particular, Section 7.2 states that "only a subset of the rules was implemented", followed by the claim that "the approach's feasibility" was demonstrated. This claim is therefore false.

"The approach" in this paper (or rather, this section) is not just the implementation of some rules in SPARQL for extraction purposes. Of course this can be done; there is no innovation in it. Rather, "the approach" is the overall extraction and use of rules for semantic interoperability with OPC UA. Therefore, the FEASIBILITY of the approach would have to be measured relative to that problem. What has been stated, however, is that this is actually work in progress; in particular, scalability (one of the aspects of my original question) cannot be evaluated because a complete effort simply has not been conducted.

So, what can be claimed to have been demonstrated in Section 7.2 is that rules can be created and can then function correctly for their particular application area. The claim that "the feasibility of the approach has been shown", however, MUST be deleted.

Minor comment: Fig. 16 has two occurrences of the typo "Linguistc".