d2kg-OWL: An Integrated Ontology for Knowledge Graph-based Representation of Government Decisions and Acts

Tracking #: 3160-4374

Konstantinos Serderidis
Ioannis Konstantinidis
Georgios Meditskos
Vassilios Peristeras
Nick Bassiliades

Responsible editor: 
Karl Hammar

Submission type: 
Ontology Description
Public Administration is a rich source of data and potentially new knowledge. It is a data-intensive sector producing vast amounts of information encoded in government decisions and acts, published nowadays on the World Wide Web. The knowledge shared on the Web is mostly made available via semi-structured documents written in natural language. To exploit this knowledge, technologies such as Natural Language Processing, Information Extraction, Data Mining and the Semantic Web could be used, embedding onto documents explicit semantics based on formal knowledge representations such as ontologies. Knowledge representation can be made possible by the deployment of Knowledge Graphs, collections of interlinked representations of entities, events or concepts, based on underlying ontologies. This paper presents a new ontology, d2kg [d(iavgeia) 2(to) k(nowledge) g(raph)], integrating in a unique way standard EU ontologies, core and controlled vocabularies to enable exploitation of publicly available data from government decisions and acts published on the Greek platform Diavgeia, with the aim to facilitate data sharing, re-usability and interoperability. It demonstrates a characteristic example of a Knowledge Graph-based representation of government decisions and acts, highlighting its added value in responding to real practical use cases for the promotion of transparency, accountability and public awareness. The proposed d2kg ontology is accessible at http://lpis.csd.auth.gr/ontologies/2022/d2kg/d2kg.owl

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 17/Aug/2022
Minor Revision
Review Comment:

The paper describes an ontology, d2kg, for representing government decisions and acts, which are currently published on the portal of the Greek Government’s Programme called Diavgeia in non-machine-readable formats. The d2kg ontology was developed by integrating a set of existing EU ontologies, core and controlled vocabularies. Three key use cases in the domain of public procurement were provided to guide the development of the ontology. A knowledge graph representation based on the ontology was also provided to demonstrate the value of the new ontology.
(1) Quality and relevance of the described ontology (convincing evidence must be provided).
The developed ontology provides significant value-add to existing resources already published on the Greek government’s Diavgeia portal, which has already been recognized as a good practice in Europe in the area of Open Government. The exemplar use cases on transparency in public spending and the extent of publicity of call-for-tender documents are particularly useful. The quality of the ontology is inherently linked to the design approach and the heavy reuse of existing and already mature ontologies, core vocabularies and controlled vocabularies. By also clearly highlighting potentially new and useful relationships in the integrated ontologies, the authors reveal the value and relevance of the d2kg ontology.
(2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.
The paper is very clearly written and provides a reasonably detailed description of the ontology. The rationale for the new ontology was clearly explained, as well as the background information on existing relevant vocabularies (e.g. the Core vocabularies – person, business, location, core public services, core criterion, and evidence) and ontologies (core organization and e-procurement). The description of the methodology for the ontology development could however be better articulated. Although most of the information about how the ontology was built can be found in the paper, the methodology described in section 3 only covers use cases and competency questions. This section should include all aspects of the process for developing the ontology and validating it. For instance, the W3C guideline referenced by the authors as guiding the ontology development process is for linked data production rather than ontology development.
However, as remarked earlier, the description of the ontology itself is very detailed, the authors clearly described the classes and object properties of the base ontologies underpinning the d2kg ontology. URI strategy (based on the persistent URL approach) was also described with examples.
Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess
(A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data,
The README file is well organized and provides the necessary important information about the ontology. Most of the technical information in the paper is provided in the README file.
(B) whether the provided resources appear to be complete for replication of experiments, and if not, why,
The provided resources are complete and all claims in the paper can be replicated. The competency questions and corresponding SPARQL queries were provided.
(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and
The ontology and related resources are publicly accessible on GitHub.
(D) whether the provided data artifacts are complete.
The data artifacts appear complete.

Review #2
By Harshvardhan J. Pandit submitted on 28/Aug/2022
Minor Revision
Review Comment:

## Summary

The article is well written, well presented, and the developed work is accessible online, albeit with limited documentation. The use-cases were clear, as were the developed ontologies. The reuse of existing authoritative resources (especially from EU COM bodies) points towards good practice and interoperability. The work itself appears to be complete. So the authors are encouraged to undertake the below as minor changes, rewrite them into the article, and resubmit.

Drawbacks: The current version has limited related work and state of the art; the developed work needs to be clearly stated separately from reused work (to clear any ambiguity about what exactly the contribution is). The ontology needs to be better explained in terms of how it was developed and evaluated, and in its use of reasoners and metrics as described in their respective sections. The use of screenshots for code and other tasks should be substituted with more informative designs (e.g. listings and tables) for readability.

## Introduction

- Motivation is well written and clear.
- Page 2 line 27 onwards, the paragraph describes the contributions of the paper as well as the objectives of presented work. It can be better explained to assist in identifying where the contributions are novel (regarding this field) and where they are pertinent for the use-case (i.e. they implement an application).
- ACTION: Provide explicit list of objectives / outputs
- Using this list, the article can be better structured to present related work (state of the art), methods for implementation/design, and to evaluate the work
- This minor change will also help emphasise that this article has research components and is not meant as a project implementation report for a particular use-case.

## Related Work

- This section is good on the background. It is weak on competing approaches.
- ACTION: The authors should provide more information on state of the art or competing approaches for the goals/research objectives. Suggestions: eGovernment frameworks, ontologies, Legislative approaches, decision and case law representation

## Ontology Development

- Page 7 Section 3.2 Competency Questions: is this the full list of competency questions? 15 CQs spread across 3 use-cases seem inadequate for the objective and scope of the ontology. Further, are the CQs answered anywhere? i.e. what are the concepts developed from the CQs? Were the CQs utilised later to assess/ensure the ontology has been sufficiently developed? Is there any artefact or evidence of this?
- ACTION: Provide information about competency questions in terms of: completeness, use in development of concepts, use in evaluation of ontology.
- Ontology is accessible at http://lpis.csd.auth.gr/ontologies/2022/d2kg/d2kg.owl - however this is an OWL serialisation that looks to be generated from Protégé. It is conventional good practice to also provide human-oriented documentation (e.g. HTML, PDF), as well as alternate formats for convenience (e.g. Turtle). With tools such as WIDOCO, this can be automated.
- ACTION: Provide ontology documentation intended for human consumption
- For reused ontologies, e.g. Core vocabularies, please provide versions utilised or integrated since these vocabularies are periodically updated
- The ontology contains both OWL and SKOS components together, e.g. skos:broader and owl:subClassOf (complex restrictions). However, the methodology does not mention this or provide an explanation for how this design affects the complexity, or if this was an acceptable practice that the developers decided to adopt based on the requirements of their use-cases as well as reuse of external vocabularies.
- ACTION: Provide information about ontology development and evaluation in terms of mixing OWL and SKOS together, and its effect if any on reasoning complexity, consistency checks, and reuse of developed ontology (e.g. are adopters required to follow practices from OWL, SKOS, both as needed?)
- Page 9 line 45 the terms mentioned have different conventions: 'postCode' and 'date_document' - why this discrepancy? Is this because of reused terms? If yes, it would be better to present these terms using prefixed notations to highlight this inheritance.
- ACTION: Explain why concepts use differing style conventions for terms.
- The ontology (Fig.4 and linked file) is coherent to me in terms of understanding what concepts exist and how they relate to the usecase, and though the style conventions take some time to get used to - they are functional in linking concepts and relations.
- Page 11 4.3.1 d2kg classes - here there are links provided to Diavgeia and ePO ontologies with phrases such as "main entities introduced" or "deployed". To me, this indicates these classes were added by the authors to these ontologies.
- ACTION: Confirm the concepts presented were authored/contributed by the authors to mentioned resources. If this is incorrect, please consider using less ambiguous terms, e.g. "we reused the following classes".
- Page 22 Sec. 6 ontology assessment does not mention CQ coverage (also mentioned earlier). It also does not elaborate on using tools such as OOPS! that check for common pitfalls.
- ACTION: Provide a note regarding OOPS! assessment (or another method of checking for pitfalls or bad ontology design indicators or any other quality assessments used).
- Fig.11 is a screenshot of Protege. This is redundant. You can simply mention you executed a specific reasoner (name + version) using a tool (Protege) and it reported consistency in ontology.
- ACTION: Remove Fig.11
- ACTION: Provide more details for what settings the reasoner was set up to execute with i.e. what did it check within the ontology for consistency (e.g. were some reasoning options set on/off?). Also check whether using a different reasoner provides differing results / inconsistencies.
- Page 22 6.2 Reasoning - the explanation was confusing to understand, e.g. the domain/range aspects. The inferred concepts are presented, but not the context of where/why they were inferred. The example states Fig.12, which is a screenshot of Protege. The details are difficult to make out here.
- ACTION: Rewrite 6.2 reasoning with a running example, and use a table or another format to list the concepts and the reasoned concepts to assist the example. Though not providing these details and simply presenting the reasoned concepts in the example should be sufficient as well. The description and example for reasoning can be used to highlight how the ontology helps identify/create knowledge.
- Page 23 6.3 Metrics - it isn't clear to me how this information is supposed to be useful. For example, there is a large number of axioms, but a low number of class and property counts. Is this indicative of something? Further, how many of these are from imported/reused ontologies? How many are unique to the developed ontology? Similarly, the use of OntoMetrics is fine - if you provide an explanation for what those numbers mean in practical terms. For example, attribute richness is high and this is stated to be okay, while class richness is low and this is also stated to be okay. Given that OntoMetrics is not commonly used alongside ontologies, the description can be written and presented in a better manner.
- ACTION: Provide better explanations for metrics used for ontologies. More specifically, details of axiom breakdowns for reused ontologies if applicable, and what the count of axioms means for ontology users (e.g. they have a high number of facts in a KG to work with?). Also explain what the OntoMetrics outputs mean in terms of ontology sufficiency for your use-case - did you strive towards some specific metric (e.g. richness > 1.0) or did you use these during development to do periodic checks?
- ACTION: The DL expressivity is explained without consequences of its use in applications, please provide an explanation. For example, the ontology is better suited towards querying or efficient reasoning etc. See different flavours or profiles of OWL2.
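As an illustration of the OWL/SKOS mixing raised above (all class and property names here are hypothetical sketches, not taken from d2kg), the pattern in question typically looks like this in Turtle:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/d2kg#> .

# OWL view: a logical axiom that a DL reasoner enforces --
# every ProcurementDecision is a Decision with at least one award criterion.
ex:ProcurementDecision rdfs:subClassOf ex:Decision ;
    rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ex:hasAwardCriterion ;
        owl:minCardinality "1"^^xsd:nonNegativeInteger
    ] .

# SKOS view: a thesaurus-style link between the same two terms.
# skos:broader carries no subsumption semantics, so an OWL reasoner
# draws no instance-level conclusions from it.
ex:ProcurementDecision skos:broader ex:Decision .
```

The point of the ACTION above is that adopters need to know which of these two statements they are expected to rely on, and whether the two are meant to stay synchronised.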

## Writing & Presentation

- The writing is clear, coherent, and well structured.
- The presentation of arguments, material, and resources is good.
- Page 15 line 22 CPV is provided as 4.4.2 Taxonomies. Since it is the only entry, and the section prior to it also relates to taxonomies (Authority tables), CPV can be moved to the end. Given that CPV is published by an EU body, it should be okay structurally. The description of what a taxonomy is can be removed.
- ACTION: Move CPV to 4.4.1, remove taxonomy explanation
- Figures of code are bad for reading (pixelation increases with resolution).
- ACTION: For figures with code (5, 6) provide the RDF as a code listing instead. Consider using Turtle instead of XML for better readability.
- Page 17 line 8 "developed ontology allows to produce data ... via a Semantic Graph Database ..." - This is confusing for me. Aren't all well-formed ontologies expressible as graphs (for that matter, all RDF is)? Furthermore, I don't think there is a specific standard for "Semantic Graph Databases", correct? The most I would know would be SPARQL support. Also, GraphDB is easy on the eyes with its dashboard, but other than that I would advise caution against stating its efficiency without a reference to back this up.
- ACTION: Rephrase above mentioned sentence. Consider simply presenting the fact that you have used a tool and found it adequate (e.g. running SPARQL queries, visually presenting data). Also consider providing a better and clearer diagram of the sample use-case since the GraphDB representation is difficult to read and understand. The nodes are small enough that this can be done in any convenient tool (E.g. draw.io)
- Page 17 The description of Fig.7 has a lot of Greek text, which is difficult to grok in terms of understanding what concepts/relations lead to identification.
- ACTION: Provide the relevant concept/relation for information identified, e.g. highlight that "is created by" is the relation that permits identifying Organisation that issued the decision - since there is a difference in text ('issued') and the relation ('created by')
- Page 19+ Figs. 8, 9 and 10 are just screenshots of SPARQL queries with two rows as results. These are difficult to read, the page orientation has been changed to horizontal, and a lot of space is empty & wasted.
- ACTION: For Fig.8 Instead of providing a screenshot, insert a table with the results below the SPARQL query i.e. Page 18 CQ1. Same for Fig 9 and Fig 10.
- ACTION: Consider rewriting the conclusion to better highlight what the outputs of this work are, what it has made possible i.e. what kinds of use-cases and applications are now possible in the Greek governance bodies. This directly leads to your stated future work where you state expanding the scope with more actors and collaborations. eGovernance is a great example of open interoperable semantic technologies - and its worth highlighting their usefulness.
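To illustrate the Turtle-vs-XML suggestion for Figures 5 and 6 above, a listing of roughly this shape is far easier to typeset and read than an RDF/XML screenshot (the resource name and values below are invented for illustration; the ELI terms are from the published ELI ontology):

```turtle
@prefix eli: <http://data.europa.eu/eli/ontology#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dec: <http://example.org/d2kg/decision/> .

# A hypothetical published decision with a few of its properties
dec:ADA-2022-001 a eli:LegalResource ;
    eli:date_document "2022-01-15"^^xsd:date ;
    eli:is_realized_by dec:ADA-2022-001-expr .
```

A listing like this can be set in a standard code environment at any resolution, unlike a pixelated figure.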

## Minor

- In the abstract, line 34, it says "proposed d2kg ontology": why say proposed when you have a functional ontology? You can say the presented ontology, or developed ontology, etc.
- Page 2 line 31: remove required and change mandatory to mandated
- Page 2 line 35: universal to a certain extent approach sounds confusing, please rephrase. E.g. a universal approach with/for limited scope
- Fig.2 the colours for Diavgeia and ELI are too close together on the same palette. Please consider either spreading them apart to contrasting shades, or using a different colour palette to better highlight developed work.
- Page 1 line 34: URLs should not be in quote
- Page 17 line 3 "can be demonstrated" -- please be assertive, you have done work that has demonstrable outputs in this section. You can say "is demonstrated via ...". The use of "can" refers to a possibility i.e. the reader has to do the work to get to it.

Review #3
By Eva Blomqvist submitted on 15/Sep/2022
Major Revision
Review Comment:

The paper describes an ontology for documenting governmental decisions and actions, in order to support transparency of government. The topic is interesting and relevant, however, the paper is unfortunately not very well written, has an unclear focus, and there are also potential issues in the quality and reusability of the resource itself.

First of all, the paper is overly long to be an ontology paper, which is recommended to be a short paper, typically below 10 pages. This paper is 25 pages in total. Still, this is not the main problem of the paper, since there are quite a few things in the paper that could be cut, see comments below.

What I would instead claim is the main shortcoming of the paper is the unclear purpose and scope of the ontology, as well as the lack of a sufficient evaluation/application of the ontology. As it is presented now, the paper neither makes clear exactly why this ontology is needed, i.e. what gap it fills and what the benefit of having an ontology for this data is, nor what exactly it is to be used for. The title of the paper is pretty generic, and from that it sounds as if the ontology covers all kinds of decisions and acts (does this mean actions or official documents, by the way?), but in the end the only focus seems to be on financial decisions, such as procurement and funding decisions. This delimitation is not made explicit anywhere in the paper but rather can be understood by the reader from studying the CQs of the ontology. Although some generic use cases are presented, the paper would benefit from a concrete application - how has this ontology been used in the end? This matters both for setting the scope of the ontology and for evaluating it through its usage, with actual data. As it stands now, the paper lacks an application-focused evaluation of the ontology. It is debugged, and analysed in terms of its characteristics, but the only things that get close to an evaluation are sections 5 and 6.2. Section 5, though, reads more as an example of structuring data in accordance with the ontology, which is then loaded in a triple store and queried through SPARQL. To me it seems highly unlikely that the end usage is to visualise and query the data through the GraphDB GUI; I assume that this is just a way to verify the CQs? Section 6.2 discusses some inferences of the ontology. But neither section really describes the intended application that should use the ontology, why it is useful and correct, with an appropriate coverage, complexity etc., and how it will benefit the end-users.

This also relates to the fact that the discussion of related work in the paper is poor. Some related work is listed, but mainly in terms of ontologies that are reused by the proposed ontology, and not in terms of alternatives or “competing” ontologies. For instance, how does this ontology relate to other open government initiatives, such as data.gov and their vocabularies? Or the Australian AGRIF ontology (https://raw.githack.com/agldwg/agrif-ont/master/agrif.html)? A search in LOV also reveals a number of other potentially related ontologies (see https://lov.linkeddata.es/dataset/lov/vocabs?&tag=Government ). From a related work section I do not only expect to see a description of the reused ontologies/previous work, but more importantly a discussion on related efforts and why/why not they were reused or taken into account - what are the gaps? Why is this new ontology needed? In particular since this ontology seems to be quite specific, one needs to consider the contribution of publishing this paper. Such a contribution could be if this ontology, although quite specific for Greece at the moment, models something that none of these other government data ontologies do (which could then be useful to apply also in other countries), but this is not clear from the paper. Which in turn makes the contribution of the ontology paper quite unclear - who will benefit from reading this, or reusing the ontology?

With this said, the next thing to consider is the quality of the ontology itself, which is quite hard to judge based on the paper and associated resources. The paper contains a link to the actual ontology, which does contain some comments, but partially in Greek. Also, other links in the paper take you to pages in Greek, and Figure 3 in the paper is completely opaque to someone who does not read Greek and would need some more explanation to understand what the document is actually about. Additionally, here it suddenly sounds as if the usage of the ontology will be some kind of information extraction task from text documents, or is it manual annotation? A proper online documentation of the ontology (e.g. as an HTML page generated from the ontology), at its URI, would also greatly help. Additionally, Figure 4 is quite confusing. The notation of the figure is not explained, e.g. what the boxes and arrows mean in terms of actual implementation in the ontology, i.e. OWL constructs. There is no standard UML notation for OWL as far as I am aware. Nevertheless, I assume that the boxes are classes, and arrows represent object properties? But why are the object properties called “connections”, both in the text and in the figure? I assume “existing” means imported, and “new” means locally defined in the ontology? But what does indirect mean? Inferred? There are also types of arrows used that are not in the legend, e.g. the long dashed one, such as between legalResource and Value. I further assume that the lists inside the boxes are datatype properties, but what does it mean that some have bullets, some have a + and some nothing in front of them? Some things also seem a bit strange and might require some explanation, e.g. why Document is a subClassOf Contract and not the other way around. There is no connection between organisations and agents - does this mean that you don’t see organisations as agents, as in the W3C org ontology, or is it just omitted in the figure?
Figure 2 is actually even more confusing - both in terms of its legend/notation (totally different from Figure 4), but also in terms of the relation between these ontologies. Is this a picture of the same ontology, or a different one? If the former: why have two figures describing the same ontology in two notations? If the latter: what is the relation between the two ontologies? My guess is that the presented ontology is somehow an extension of a previous ontology, which is somewhat hinted at in the text, but the paper needs to describe this relation in detail. Is the old ontology reused (imported?) and extended, or is it replaced and remodelled completely?

Some more detailed comments on parts of the paper, in addition to the discussion above:
- What is the name of the ontology? Just d2kg or d2kg-OWL as in the title?
- Section 1, second to last paragraph: why is an ontology needed in this case? Does it have to do with data integration, i.e. that the organisations represent decisions in different ways? Just having to upload some data does not necessarily require an ontology.
- Introduction to section 2.4: is it Internet or the Web? And is it really 28 million decisions per day??
- Both the use cases on page 7, and the CQs on page 8 are quite vague and ambiguous, and in particular I am not sure about the terminology. What is an economic operator? Is it an organisation or some kind of measure/formula? And what does “top” refer to in CQ1 - receiving most funds? Most recent? Most frequently receiving funds? How does the notion of “contracting authority” in CQ4 relate to the general notion of “organization”, and to “economic operator”? What does appointment mean in CQ5, are these meeting appointments or something else? It seems detached from the other CQs - why is this interesting in terms of transparency of spending? Use case 2 seems to be about tenders, but CQ3 mentions instead decisions/acts, how are they related to tenders? Also in use case 3 there are highly ambiguous CQs, containing terms such as “most popular” (how can this be measured?) and “appointed” (are these employment contracts?).
- What does “document analysis” refer to in section 4.1? Is this a part of the ontology engineering methodology, i.e. the way the ontology has been built, or is it another use case of the ontology, i,e. to annotate documents or do information extraction? Similar for 4.2 - is this really about building the ontology, or about annotating documents using the ontology?
- In section 4.3.1 the authors mention importing other ontologies, but it is unclear to what extent this is actually done. In the ontology file directly linked from the paper there seems to be no imports at all, only external elements referenced by their URIs. However, in the github repository the version of the ontology there seems to import one ontology. It should be made clear what the architecture of the ontology is and how it technically reuses the other ontologies.
- It is unclear what Figures 5 and 6 represent. For sure these do not show the URIs of the two resources. It seems rather like the results of some DESCRIBE query over a specific URI, displayed for some reason in XML rather than in the return format of a SPARQL query.
- What do you mean with Semantic Graph Database in section 5.1?
- Figures 8-10: there is no point in showing screenshots of the query interface. If the authors want to show some example queries and their results this may be relevant, but not in the form of screenshots. Similar arguments hold for figures 11 and 12, which are screenshots from Protégé.
- Section 6.2 needs clarification. Do you mean that you infer a domain restriction for the property “hasAwardCriterion” somehow? Or are you using a domain restriction to perform the inferences? Similarly I do not understand what you mean that you “infer concepts” for the other bullet points. Please clarify.
- Please introduce the OntoMetrics measures in 6.3 briefly, so that the table is interpretable without looking up the reference. Also, I am not sure about the discussion in the bullet points - could this be made more specific? Statements such as “good coverage in the range of concepts” are quite vague and unclear.
- The conclusions section needs to be re-written to state specific conclusions that can be drawn from the presented results. In some parts it now more reads like a general discussion, and several paragraphs are unclear and ambiguous. Paragraph 2 in this section, for instance, is completely unclear to me. Also in paragraph 3 there are unclear statements, such as “The benefit is evidently the scalability of the ontology…” while no such scalability assessment has been done in the paper.
- Reference list can be improved. In many cases the name of the conference is used instead of the full title of the proceedings volume, also volume numbers and series are often missing. Style is sometimes different, i.e. sometimes the year is in parenthesis, sometimes not.
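On the section 6.2 clarification requested above: if the authors mean standard RDFS/OWL domain reasoning, the two directions can be told apart with a minimal sketch (names below are hypothetical, modelled loosely on the property mentioned in the review):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/d2kg#> .

# Schema: a domain axiom asserted on the property
ex:hasAwardCriterion rdfs:domain ex:Contract .

# Data: a resource that uses the property but has no asserted type
ex:decision-42 ex:hasAwardCriterion ex:lowestPrice .

# A reasoner then infers the type from the domain axiom:
#   ex:decision-42 rdf:type ex:Contract .
```

Stating explicitly which direction is intended (asserting the domain restriction vs. exploiting it to infer types) would resolve the ambiguity the review points at.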

Review #4
Anonymous submitted on 26/Sep/2022
Minor Revision
Review Comment:

This paper proposes a new ontology which integrates standard EU ontologies and core and controlled vocabularies, following W3C recommendations, to exploit publicly available open data according to Linked Data principles, and thus to additionally allow a Knowledge Graph-based representation of government decisions and acts.

2. Related work
There are four examples in the related work, which seems quite few, but maybe that is all there is. I know there exist LOD publications of, e.g., parliamentary speeches.
The last sentence of this section states “A number of standard ontologies and vocabularies have been …” – does this refer to the four examples given in the list earlier, or could you provide examples of the ontologies or vocabularies you are referring to?
Figure 1. What I wonder about is the difference between the classes “Membership” and “Post”, e.g. membership has a property for its duration (time:Interval), but wouldn’t a Post have a temporal aspect as well?
I would move some of the explanations, e.g. “("Diavgeia" ..., “Nomothesia” ...)”, to footnotes rather than placing them in the middle of the chapter (although currently all the footnotes provide links to related websites).
4.3. d2kg
The classes person:Person and foaf:Agent are both for modeling people. Could you explain in more detail the difference between the two of these, e.g. why can’t they be merged into a common class?
Generally, why isn’t a person modeled using some of the existing vocabularies (schema.org, FOAF, CIDOC, etc.) instead of introducing a new person:Person resource?
Difference between org:Organization and org:OrganizationalUnit. How do you distinguish between these two classes in the case of, e.g., a deeper organizational hierarchy (e.g. state / municipal government / suborganizations of a municipal government …)?
Chapters 4.3.1–4.4 provide a good, detailed introduction to the classes and properties in the ontology. However, I wonder if some of these subsections would be more informative in the form of tables instead of body text?
Figure 11 could be cropped more tightly; currently ca. 40 per cent of its area is blank.

The related GitHub repositories are well organized and introduced in great detail in the README files.

General evaluation: Minor revisions required