A Benchmark Dataset with Knowledge Graph Generation for Industry 4.0 Production Lines

Tracking #: 3350-4564

Muhammad Yahya
Aabid Ali
Qaiser Mehmood
Lan Yang
John Breslin
Muhammad Intizar Ali

Responsible editor: 
Guest Editors SW for Industrial Engineering 2022

Submission type: 
Dataset Description
Industry 4.0 (I4.0) is a new era in the industrial revolution that emphasizes machine connectivity, automation, and data analytics. The I4.0 pillars, such as autonomous robots, cloud computing, horizontal and vertical system integration, and the industrial internet of things, have increased the performance and efficiency of production lines in the manufacturing industry. Over the past years, efforts have been made to propose semantic models to represent manufacturing domain knowledge; one such model is the Reference Generalized Ontological Model (RGOM). However, like other models, its adaptability has not been demonstrated due to the lack of manufacturing data. Knowledge Graphs (KGs) have emerged as a significant technology for storing the semantics of domain entities, and have been used in a variety of industries, including banking, the automobile industry, oil and gas, pharmaceuticals and health care, publishing, and media. In this paper, we aim to develop a benchmark dataset for knowledge graph generation in I4.0 production lines and to showcase, through ontologies and semantic annotation of data, how the I4.0 industry can benefit from KGs and semantic datasets. This work is the result of collaboration with the production line managers, supervisors, and engineers of a football manufacturing industry to acquire realistic production line data. The data is mapped to RGOM classes and relations using an automated solution based on the Jena API, producing an I4.0 KG that contains more than 2.5 million axioms and about 1 million instances. This KG enables us to demonstrate the adaptability and usefulness of RGOM, and it helps production line staff make timely decisions by exploiting the information embedded in the KG.
In relation to this, RGOM's adaptability is demonstrated with the help of a use case scenario that retrieves required information such as the current temperature at a particular time, the status of a motor, and the tools deployed on a machine.
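The abstract describes an automated, Jena-based pipeline that maps raw production-line records to RGOM classes and relations to produce the KG. As a rough, language-agnostic sketch of that idea (the namespace, class names, property names, and sample record below are illustrative placeholders, not actual RGOM terms or the authors' implementation):

```python
# Hypothetical sketch of mapping one production-line record to RDF-style
# (subject, predicate, object) triples, analogous to the Jena-based
# pipeline described in the abstract. All names here are illustrative.

RGOM = "http://example.org/rgom#"  # placeholder namespace, not the real one

def map_record(record):
    """Turn one sensor reading into a list of triples."""
    machine = f"{RGOM}machine/{record['machine_id']}"
    reading = f"{RGOM}reading/{record['reading_id']}"
    return [
        (machine, f"{RGOM}type", f"{RGOM}Machine"),
        (reading, f"{RGOM}type", f"{RGOM}TemperatureReading"),
        (machine, f"{RGOM}hasReading", reading),
        (reading, f"{RGOM}value", record["temperature"]),
        (reading, f"{RGOM}timestamp", record["timestamp"]),
    ]

row = {"machine_id": "M42", "reading_id": "R1001",
       "temperature": 71.5, "timestamp": "2022-05-01T10:15:00"}
triples = map_record(row)
```

In the actual pipeline such triples would be asserted into a Jena ontology model and serialized, yielding the axioms and instances counted in the abstract.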
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 06/Mar/2023
Review Comment:

The authors have generally implemented the comments from the earlier review phase. The scope of the paper is clear, i.e., providing and evaluating a sample dataset for production lines in Industry 4.0. The authors suggest the use of the dataset or the RGOM data model for cases other than football production.

While the dataset URL is referenced in the paper, some general metadata properties should still be explicitly indicated, as required by the guidelines of the journal for dataset papers:

“name, URL, version date and number, licensing, availability, etc.; topic coverage, source for the data, purpose and method of creation and maintenance, reported usage etc.; metrics and statistics on external and internal connectivity, use of established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language expressivity, growth; examples and critical discussion of typical knowledge modeling patterns used; known shortcomings of the dataset.”

When making the final submission, a long-term stable URL should be added to the submission form, such as GitHub, Figshare, or Zenodo, containing a README to clarify usage of the data. Storing the dataset on Google Drive is not recommended.

The authors claim that the dataset is compliant with the FAIR principles. As suggested in the previous review, it would be beneficial to explicitly evaluate the dataset for FAIR compliance.
Please double-check punctuation and spelling. Also, some sentences are missing articles in the paper.

Review #2
Anonymous submitted on 15/Mar/2023
Review Comment:

Dear Authors,

I appreciate that several of my comments have been addressed; I think the article has improved compared to previous versions.
I accept this paper, trusting that the authors will address the comments below.

- There are several statements in the article that are not accurate. For example, "The semantic web uses an ontology to represent the information in a machine-processable structure [5]". This is not accurate. The Semantic Web uses ontologies (and not just an ontology) to structure knowledge but also relies on other building blocks.

- "The construction of RGOM has reused several classes and properties from previous studies [12], [13], [16] and also modeled a number of missing concepts and relations." -> Although RGOM is presented in another article, it is important that it be better described in this one, since only some references to the reused ontological models are given. According to [15], another important ontological model reused is the one presented in [Franco Giustozzi, Julien Saunier, Cecilia Zanni-Merk, Context Modeling for Industry 4.0: an Ontology-Based Proposal, Procedia Computer Science, Volume 126, 2018, Pages 675-684, ISSN 1877-0509, https://doi.org/10.1016/j.procs.2018.08.001], which is not referenced in this article, even though the figure of the ontology is practically the same, as is the description of many of the concepts in section 4.2.1. Another very important module is the sensor module. From the prefix it is possible to identify that the SSN Ontology is the reused ontology; a brief description of SSN and an explicit statement that it is reused should be included. After all, the dataset includes values collected from sensors, and the semantic description of those values, built on the concepts and relations provided by SSN, is of paramount importance.

- I emphasize my concern regarding the temperature values generated using equation (1): how can one be sure that these values correspond to real situations/conditions? Especially considering that the article claims that the use of real data is a key point and contribution.

- In section 4.2.2, "Mapping Between RGOM and Data", an algorithm is described, but if I understand correctly it simply populates the ontology. This should be stated in the section, or it should at least be explained why "Mapping Between RGOM and Data" is used instead of "Populating RGOM". Is there a difference between mapping and populating?

- I think there is an error in the algorithm's output (output: ontologymodel.write()). What is ontologymodel.write()? Shouldn't the output be the ontology with instances, i.e., the KG?

- At the beginning of section 4.3, "KG is a knowledge base that uses a graph-structured data model or topology to integrate data." -> Is this a definition? Perhaps the definition of KG should be added in the introduction. Can you add a reference?

- It would be very useful to add a file to the resource (https://drive.google.com/drive/folders/15G4tgVheu-gOHg8Ia4VKwF2UgXeZFWzn...) briefly describing the contents of the folder that holds the KGs and the queries, rather than providing the KGs and queries alone.