d2kg: An Integrated Ontology for Knowledge Graph-based Representation of Government Decisions and Acts

Tracking #: 3328-4542

Authors: 
Konstantinos Serderidis
Ioannis Konstantinidis
Nick Bassiliades
Georgios Meditskos
Vassilios Peristeras

Responsible editor: 
Karl Hammar

Submission type: 
Ontology Description
Abstract: 
To implement Open Governance, a crucial element is the efficient use of the large amounts of open data produced in the public domain. Public administration is a rich source of data and potentially of new knowledge. It is a data-intensive sector producing vast amounts of information encoded in government decisions and acts, which are nowadays published on the World Wide Web. The knowledge shared on the Web is mostly made available via semi-structured documents written in natural language. To exploit this knowledge, technologies such as Natural Language Processing, Information Extraction, Data Mining and the Semantic Web could be used, embedding into documents explicit semantics based on formal knowledge representations such as ontologies. Knowledge representation can be made possible by the deployment of Knowledge Graphs, collections of interlinked representations of entities, events or concepts, based on underlying ontologies. This paper presents a new ontology, d2kg [d(iavgeia) 2(to) k(nowledge) g(raph)], which integrates in a unique way standard EU ontologies, core and controlled vocabularies to enable the exploitation of publicly available data from government decisions and acts published on the Greek platform Diavgeia, with the aim of facilitating data sharing, reusability and interoperability. It demonstrates a characteristic example of a Knowledge Graph-based representation of government decisions and acts, highlighting its added value in responding to real practical use cases for the promotion of transparency, accountability and public awareness. The developed d2kg ontology is accessible at: http://lpis.csd.auth.gr/ontologies/2022/d2kg/d2kg.owl.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Eva Blomqvist submitted on 11/Mar/2023
Suggestion:
Major Revision
Review Comment:

This ontology paper presents an ontology of government decisions and actions, in the context of Greek public administration. This is the revised version of an earlier submission.

Overall, the paper has been improved, but several issues remain, and not all of them are minor. This leads me to suggest a major revision of the paper once again.

First of all, I cannot assess the resource itself, since the link provided on the front page of the paper is broken. When following the redirect and checking the group's “ontology” pages, I unfortunately cannot find any link to this project or the ontology. This is also an example of bad practice when publishing an ontology, since the ontology should have been given a permanent URI (e.g. through w3id) before publishing it, so that the URI can simply be redirected if the pages/files need to be moved. Other links later in the paper are also broken, e.g. some of the links in section 4.5 about URIs.

Section 2.1 has been improved and is on the right track, but it needs to discuss each ontology in more detail, to make clear exactly what each one is missing for covering the whole set of use cases, and what this work adds to that existing work.

The relation to the previous Diavgeia ontology is still unclear in the paper. The new paragraph added on page 6 is unclear; details should be provided on the comparison between the new and “old” versions. In addition, this paragraph seems to contradict the first paragraph of the methodology section later on the same page, where it sounds as if the “new” version is merely an extension of the previous one. But then, would it not still be monolithic, etc.?

Regarding the CQs of use case 3, all CQs seem to be aggregates - does this imply that the actual data does not need to be stored, but only the aggregates? Or is the list simply incomplete?

Section 4.1 is strangely placed. Is this the motivating scenario from which the CQs have been generated? If yes, move it earlier, before the use cases and CQs. However, if this is a use case scenario for evaluation/application, it should instead be moved later in the paper, after the ontology has been presented. Given that it mentions concrete properties of the ontology (which has not yet been introduced), it seems it should be the latter. But given the method description in 4.2, it seems it should be the former. In summary, this section needs to be moved and probably rewritten. Section 4.2 about the methodology is now also very brief, and does not seem consistent with the use cases and CQs in the previous section. If the ontology is constructed by analysing and annotating documents, then how are the use cases and CQs actually used in the process?

Fig. 2, which illustrates some concepts of the ontology, does not seem to be standard UML, and the legend is still very unclear. It is still not clear what some of the relations mean, e.g. what does it mean that an object property connects two classes: is it domain and range or another axiom? Colors are mixed, and sometimes properties are shown in the boxes, but sometimes subclasses are shown instead, without any obvious explanation.

On page 11 the authors state that the ontology is composed of vocabularies of “mixed formats”, so how is your ontology represented then? Later it sounds as if it is an OWL ontology, but how do you then do this “mixing” in practice? The authors need to describe in more detail how the integration is done, e.g. through imports, or by referencing URIs? Some vocabularies are not in OWL; for instance, the authors say that CPOV is in RDF, so how is that possible? Additionally, what does “integration of controlled vocabularies” mean technically?

Overall, the description of the ontology is not sufficient for an ontology paper. Please see [1] for a suggested minimal set of things to describe! Since the ontology is also not accessible online, the quality of the actual artefact cannot be properly assessed.

In section 4.5 I am not sure what the authors mean by saying the “standard XML representation of a URI”, how can a URI be represented in XML? It seems to be more some data snippet about the thing identified by the URI? But why XML and not RDF if you are using linked data principles and RDF as stated both earlier and later in the paper?

The query of use case 3 on page 18 seems wrong. It does not make sense to count the documents and then display also the names and posts, since there will be several names and posts if the count is more than 1. It only works since there seems to be only one post in the data.
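A minimal sketch of the grouping such a query would need, written in Python over hypothetical result rows rather than in SPARQL against the actual d2kg data (the decision identifiers, signer names and posts below are placeholders, not values from Diavgeia):

```python
from collections import defaultdict

# Hypothetical (decision, signer_name, post) result rows; placeholders only,
# not data taken from Diavgeia or the d2kg graph.
ROWS = [
    ("decision-1", "signer-1", "Mayor"),
    ("decision-2", "signer-1", "Mayor"),
    ("decision-2", "signer-1", "Mayor"),  # same decision bound twice
    ("decision-3", "signer-2", "Deputy Mayor"),
]


def count_decisions_per_signer(rows):
    """Count distinct decisions per (name, post) pair.

    Mirrors a query of the form
    SELECT ?name ?post (COUNT(DISTINCT ?decision) AS ?n) ... GROUP BY ?name ?post,
    where every projected, non-aggregated variable is also a grouping key.
    """
    decisions = defaultdict(set)
    for decision, name, post in rows:
        decisions[(name, post)].add(decision)
    return {key: len(ids) for key, ids in decisions.items()}


if __name__ == "__main__":
    for (name, post), n in sorted(count_decisions_per_signer(ROWS).items()):
        print(f"{name} ({post}): {n}")
```

Grouping by both name and post ties each count to exactly one signer and post. The SPARQL 1.1 specification does not allow projecting a variable that is neither grouped nor aggregated, so a query that counts documents while also listing names and posts only appears to work when the data happens to contain a single post.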

Further, it is unclear whether the inferences described in section 6.2 are actually used in the application of the graph. Or are they only examples for verifying the ontology structure? If they are only for verification, then what reasoning is actually used in the application of the ontology?

Last paragraph of section 6: Are those example instances? Then it does not seem fair to include them at all in the evaluation, since example data should not be considered part of the ontology. If not, i.e. they are actually part of the ontology, then what does it mean that they are not fully covering the classes?

Finally, the paper needs some additional proofreading. For instance, on page 8 the acronym CPV is used in a strange way in the sentences, as if it meant something else, e.g. a type in that vocabulary.

[1] Matentzoglu, N., Malone, J., Mungall, C. et al. MIRO: guidelines for minimum information for the reporting of an ontology. J Biomed Semant 9, 6 (2018). https://doi.org/10.1186/s13326-017-0172-7

Review #2
By Harshvardhan J. Pandit submitted on 20/Mar/2023
Suggestion:
Minor Revision
Review Comment:

Considering that my earlier review was "Minor Revision", I am satisfied with the changes made. I have two minor action points, listed below. Of these, the documentation should be considered a MUST these days, since we have tools such as WIDOCO that do most of the heavy lifting, and, as this is a public project, having documentation would also help dissemination. Please provide the documentation at the long-term link provided.

- The ontology is accessible at http://lpis.csd.auth.gr/ontologies/2022/d2kg/d2kg.owl; however, this is an OWL serialisation that appears to have been generated by Protege. It is conventional good practice to also provide human-oriented documentation (e.g. HTML, PDF), as well as alternate formats for convenience (e.g. Turtle). With tools such as WIDOCO, this can be automated.
- ACTION: Provide ontology documentation intended for human consumption

- ACTION: Provide more details on the settings the reasoner was run with, i.e. what it checked within the ontology for consistency (e.g. were some reasoning options switched on/off?). Also check whether using a different reasoner produces different results or inconsistencies. In practice, you may have checked only the consistency of class hierarchies, but not of properties; a statement to this effect should be sufficient. Reasoners have several options switched off by default in Protege, so are you aware of anything you have not checked that could have been checked?

Review #3
Anonymous submitted on 24/Mar/2023
Suggestion:
Accept
Review Comment:

(NB. The lines starting with > are my feedback to the changes made to the article. The lines above those are my comments to the first version of the submission)

General:
This paper proposes a new ontology which integrates standard EU ontologies, core and controlled vocabularies following W3C recommendations to exploit publicly available Open data following Linked Data principles and thus additionally allow a Knowledge Graph based representation of government decisions and acts.

2. Related work
There are four examples in the related work, which seems rather few, but maybe that is accurate. I know there exist LOD publications of, e.g., parliamentary speeches. The last sentence of this section states “A number of standard ontologies and vocabularies have been …”: does this refer to the four examples given in the list earlier, or could you provide examples of the ontologies or vocabularies you are referring to?
> Thank you for adding an adequate amount of examples to related work

2.1.
Figure 1. I wonder what the difference is between the classes “Membership” and “Post”; e.g. Membership has a property for its duration, time:Interval, but wouldn’t a Post also have a temporal aspect?
> You have updated the Figures, so this comment is now obsolete. The former Figure 1 depicting the W3C Organization Ontology has been removed from your article and is available via the footnote, which is all fine now.

2.4.
I would move some of the explanations, e.g. “("Diavgeia" ..., “Nomothesia” ...)”, into footnotes rather than keeping them in the middle of the chapter (although at the moment all the footnotes provide links to related websites).
> Well, maybe it’s only me, but I find these parts a little hard to read.

4.3. d2kg
The classes person:Person and foaf:Agent are both for modeling people. Could you explain in more detail the difference between the two of these, e.g. why can’t they be merged into a common class?
> The design choices of the ontology are well-founded

4.3.1
Chapters 4.3.1–4.4 provide a good and detailed introduction to the classes and properties in the ontology. However, I wonder whether some parts of these subsections would be more informative in a tabular format instead of body text.
> Figure 4 is updated, now focusing on the essential details

6.1.
> The previously very short subchapter 6.1, Debugging, is now explained in more detail.
> In my earlier review I suggested cropping the previous Figure 11 (Debugging in Protégé) more tightly. Leaving it out of the article completely was perhaps an even better decision.
The related GitHub repositories are well organized and introduced in great detail in the README files.

General evaluation: Accept