RDF Device Description Generator (RDDG)

Tracking #: 2241-3454

Mieczyslaw Kokar
Yanji Chen
Jakub J. Moskal

Responsible editor: 
Pascal Hitzler

Submission type: 
Full Paper
This paper describes a program for generating RDF instance data based on a given OWL ontology. We needed this capability in order to compare OWL-based and XML-based descriptions in scenarios in which the capabilities of radios are matched against application needs. Most existing solutions for RDF dataset generation are not generic, i.e. they are locked in to specific class models or ontologies. Existing generic solutions either do not sufficiently support (or totally ignore) the OWL semantics, or rely on large amounts of real-world data. Our experimental results show that RDDG performs reasonably well when generating large numbers of descriptions that are consistent with the input ontology and have good space coverage (variety) over it. The method is expected to be applicable to application scenarios other than the radio communication domain.


Solicited Reviews:
Review #1
By Md Kamruzzaman Sarker submitted on 27/Aug/2019
Review Comment:

This paper proposes an algorithm to generate instance data (A-box data) from an ontology. It focuses on OWL axiom coverage, i.e. the generated axioms are consistent with the ontology and respect the restrictions of the given ontology. The application focus is cognitive radio description data generation. To validate the algorithm, the authors tested it on 5 ontologies.

Synthetic instance data generation is important for the Semantic Web, both for benchmarking Semantic Web tools and for cases where real data is scarce. For the cognitive radio device domain, real data is not widely available. This paper proposes an algorithm to generate instance data, and the experiments presented were done for the cognitive radio domain.

The main contribution of this paper is that it uses the restrictions of the OWL semantics (proper domain/range, cardinality, etc.) to ensure that the created instances also satisfy those restrictions. The paper is well written, and the algorithm is correct, as supported by the experiments.

From the point of view of research or originality, generating instance data is more of an engineering task than a full research problem. I think this paper would be a very good fit for top Semantic Web conferences, but it would be a hard sell for the journal.

Some comments:
1. One thing missing from the paper is a runtime comparison with previous algorithms/tools.
2. Figure 3 can be compressed into just 1 or 2 images instead of 3 different images.
3. Typo (extra "then") in the line "Then instance data are then generated from the model" in Section 2.4.
4. To make the result significant, or to make it a journal paper, the algorithm could be tested on different kinds of ontologies, not only on cognitive radio domain ontologies.

Review #2
By Maria Poveda submitted on 07/Sep/2019
Major Revision
Review Comment:

This paper describes an algorithm for generating RDF data from a given ontology. The work is in general well explained, and it would be applicable in other contexts by members of the Semantic Web community.

The main reasons for my score are:

.- The contribution is not clear from a research point of view: even though the authors explain a number of benchmark generators, it is not clear how the proposed method compares to GAIA, for example. It is said that they covered fewer axioms and the weakly-related requirement, but a validation based on comparing the results of both algorithms would help to measure to what extent the proposed method represents a step forward. In addition, the system could also be compared with "Perry, M. (2005). Tontogen: A synthetic data set generator for semantic web applications. AIS SIGSEMIS Bulletin, 2(2), 46." (which should also be included in the related work).

.- There seems to be a mismatch between the title, the motivation and the actual contribution. While the paper is oriented towards devices, and RF devices in particular, the actual contribution is a generic RDF data generator that is independent of the domain. I am aware of Definition 4 in Section 3, but how does this root class from the ontology affect the application domain? I mean, one could use the algorithm with another root class from another domain and the result would be the same. It seems that this definition is forced by the authors to link the work to the RF domain. Or is the domain knowledge used somewhere in the system?

.- Evaluation: the system is (partially) evaluated against the requirements chosen by the authors but not against existing systems, which makes it difficult, as mentioned, to measure the contribution with respect to the state of the art. The fact that the selection of requirements is not well motivated and could be a deliberate decision makes the evaluation less convincing. Finally, Requirement 4 mentions "Scalability/extensibility". Extensibility is not demonstrated in the paper. In fact, the evaluation is done with a previous SSN version, and it would be interesting to see how the process adapts to the new SSN version. How would it handle the new version, and in particular the fact that the ssn:Device class does not exist in the new release?

.- The selection of the requirements is not explained at all. Why are these 4 requirements relevant to dataset generation? Is there a study about these needs, or were they chosen by the authors?

.- Requirement 2 is unclear regarding the term "generated RDF/OWL device descriptions": 1) If by this term the authors mean the generated data, it would be just "RDF device description", which should follow some ontology that contains the OWL axioms. 2) If the authors mean the ontology used in the generation, then this requirement would be out of scope, as it would be a requirement on the ontology, not on the generation process or its result. Option "1" seems the more likely case.

.- Requirement 4: here it seems that 2 requirements are merged (which has implications for the evaluation).

Other comments:

.- Section 4 does not provide much information.

.- In Section 5.2.1, notations Dp,r(Ci) and Op,r(Ci): what does it mean that D or O is related to a given Ci? Does it mean that Ci is the domain of D or O, or that the given property appears in at least one axiom of Ci? Or something else? This is not clear to me.

.- How many tests are generated for the calculation of "Dataset passing rate"?

.- How are the statistics in Figure 5 generated? How many datasets of size 500, 1000 are generated? Is this a mean/median of N datasets?

Minor comments:

.- Page 8: the SDR ontology has not been introduced before this point.

.- Section 5.3.1: the text starting from "For those that retrieval of ...." is not clear.

.- Some bibliographical references point to websites instead of research contributions and might be better included as footnotes. At least: 6, 7, 15, 17, 18, 38, 42, 43, 44, 45, 47, 48, 50, 53 and 54.