Enhancing Data Use Ontology (DUO) for Health-Data Sharing by Extending it with ODRL and DPV

Tracking #: 3528-4742

Authors: 
Harshvardhan J. Pandit
Beatriz Esteves

Responsible editor: 
Cogan Shimizu

Submission type: 
Ontology Description
Abstract: 
The Global Alliance for Genomics and Health is an international consortium that is developing the Data Use Ontology (DUO) as a standard providing machine-readable codes for automation in data discovery and responsible sharing of genomics data. DUO concepts, which are encoded using OWL, contain only the textual descriptions of the conditions for data use they represent, and do not specify the intended permissions, prohibitions, and obligations explicitly, which limits their usefulness. We present an exploration of how the Open Digital Rights Language (ODRL) can be used to explicitly represent the information inherent in DUO concepts to create policies that are then used to represent conditions under which datasets are available for use, conditions in requests to use them, and to generate agreements based on a compatibility matching between the two. We also address a current limitation of DUO regarding specifying information relevant to privacy and data protection law by using the Data Privacy Vocabulary (DPV), which supports expressing legal concepts in a jurisdiction-agnostic manner as well as for specific laws like the GDPR. Our work supports the existing socio-technical governance processes involving use of DUO by providing a complementary rather than replacement approach. To support this and improve DUO, we provide a description of how our system can be deployed with a proof of concept demonstration that uses ODRL rules for all DUO concepts, and uses them to generate agreements through matching of requests to data offers. All resources described in this article are available at: https://w3id.org/duodrl/repo.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Jaime Delgado submitted on 20/Sep/2023
Suggestion:
Accept
Review Comment:

After the first review round, the paper has been improved, and most of the reviewers' concerns have been addressed.

My only request is that the information in the Appendix on "Analysis of recent Developments" be integrated into the paper itself. This is a very good extension that shows the relevance of the topic and the value of the proposed approach.

Review #2
Anonymous submitted on 08/Oct/2023
Suggestion:
Minor Revision
Review Comment:

Unfortunately, in light of the authors' reply, the matching algorithm appears to be incorrect. Please fix it as explained below, and let me have a quick look at the new version before publication.

DETAILED COMMENTS:

The explanation R3.2 provided by the authors clarifies that their matching algorithm is correct in some cases. Still, there are cases where it returns a wrong answer.

1) prohibitions:
If offer:spatial is EU and request:spatial is Spain, then the algorithm does not reject the request, because it checks whether EU is contained in Spain (which is false). However, a prohibition concerning the EU should also cover Spain, and block the request.

Note that swapping the two terms of the equivalence (i.e. checking whether request:spatial is a special case of offer:spatial) would not solve the problem: if offer:spatial is Spain and request:spatial is EU, then the algorithm would not reject the request because (due to the swap) it would again find that EU is not a special case of Spain. However, the prohibition applying to Spain should block a request that is meant to apply to the *whole* EU (in other words, the request is too general, given the prohibitions in the offer; the request also includes forbidden types of data usage).

The correct solution is to reject the request whenever request:spatial and offer:spatial have common elements (nonempty intersection). As a special case, this happens whenever one of the two terms is a special case of the other, as in the above examples. The rationale is that prohibitions may block the request by denying it either totally or *partially*.

The same considerations apply to the other properties of offers and requests checked in lines 31-36. In particular, purposes are organized in a hierarchy in DPV.
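
The overlap test for prohibitions can be illustrated with a minimal sketch. It assumes a toy model in which each spatial term is expanded to the set of atomic locations it covers; the REGIONS table, its contents, and the function name prohibition_blocks are hypothetical and are not taken from the paper or from DPV.

```python
# Hypothetical toy model: each spatial term maps to the set of atomic
# locations it covers. The table is illustrative and deliberately incomplete.
REGIONS = {
    "EU": frozenset({"ES", "FR", "DE"}),  # truncated on purpose
    "Spain": frozenset({"ES"}),
    "Japan": frozenset({"JP"}),
}


def prohibition_blocks(offer_term: str, request_term: str) -> bool:
    """Return True when the prohibited scope and the requested scope overlap.

    A nonempty intersection means the request is denied at least partially,
    so the prohibition applies regardless of which term is more general.
    """
    return not REGIONS[offer_term].isdisjoint(REGIONS[request_term])


if __name__ == "__main__":
    print(prohibition_blocks("EU", "Spain"))   # offer broader than request
    print(prohibition_blocks("Spain", "EU"))   # request broader than offer
    print(prohibition_blocks("EU", "Japan"))   # disjoint scopes
```

Under this model the check is symmetric in its two arguments, so both orderings of the EU and Spain example are blocked, while disjoint scopes are not.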

2) permissions:
I understand that offer:group and request:group are classes of entities. In order to permit a request, the group of entities involved in the request should be *entirely* contained in offer:group (otherwise the request involves some entities that are not authorized to use the data).
This means that the check between lines 44 and 45 should verify whether request:group is not a subclass of offer:group (i.e. the two terms should be swapped).

Similarly for the test on purposes between lines 45 and 46.
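
The direction of the containment test for permissions can be sketched as follows. This is only an illustration under stated assumptions: the PARENT hierarchy, the class names, and the helper functions are hypothetical, and a real implementation would query the ontology's subclass relation rather than a hand-written table.

```python
from typing import Optional

# Hypothetical class hierarchy (child -> direct parent), for illustration only.
PARENT = {
    "AcademicResearcher": "Researcher",
    "Researcher": "Person",
    "CommercialEntity": "Organisation",
}


def is_subclass_of(cls: Optional[str], ancestor: str) -> bool:
    """Reflexive, transitive subclass test over the toy hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)  # None once the top of the chain is reached
    return False


def permission_covers(offer_group: str, request_group: str) -> bool:
    """A permission covers a request only if the requested group is
    entirely contained in the group authorised by the offer."""
    return is_subclass_of(request_group, offer_group)


if __name__ == "__main__":
    print(permission_covers("Researcher", "AcademicResearcher"))
    print(permission_covers("AcademicResearcher", "Researcher"))
```

Unlike the prohibition check, this test is asymmetric: an offer for Researcher covers a request for AcademicResearcher, but not the other way round.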

Another problem is that the symbols used in the algorithm are misleading and confusing in various ways:
- the standard semantics of the membership operator \in is: "the left-hand side is an instance of the class in the right-hand side", while here it is mapped on owl:sameAs which means "the lhs and the rhs are the same thing". Note that \in is asymmetric while owl:sameAs is symmetric.

- the standard semantics of the equivalence symbol \equiv is that its two terms denote the same class, while here it is mapped on owl:subClassOf which means: "the lhs is included in (and possibly differs from) the rhs". \equiv is symmetric, while owl:subClassOf is antisymmetric.

Thus, in order to improve readability and prevent misunderstandings, the authors should replace \in with "=" and \equiv with \subseteq in the algorithm.
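
With the suggested notation, the two checks discussed in points 1) and 2) might be written as follows. This is a sketch only: $O_p$ and $R_p$ are shorthand introduced here for the values of a property $p$ (e.g. spatial, group, purpose) in the offer and the request, and are not the symbols used in the paper's algorithm.

```latex
% Prohibition: any overlap between requested and prohibited scope blocks the request.
\text{reject} \iff R_p \cap O_p \neq \emptyset
% Permission: the requested scope must lie entirely within the permitted scope.
\text{permit} \iff R_p \subseteq O_p
```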