PrivOnto: a Semantic Framework for the Analysis of Privacy Policies

Tracking #: 1486-2698

Alessandro Oltramari
Dhivya Piraviperumal
Florian Schaub
Shomir Wilson
Norman Sadeh
Joel Reidenberg

Responsible editor: 
Guest Editors Linked Data Security Privacy Policy

Submission type: 
Full Paper
Privacy policies are intended to inform users about the collection and use of their data by websites, mobile apps and other services or appliances they interact with. This also includes informing users about any choices they might have regarding such data practices. However, few users read these often long privacy policies, and those who do have difficulty understanding them because they are written in convoluted and ambiguous language. A promising approach to help overcome this situation revolves around semi-automatically annotating policies, using combinations of crowdsourcing, machine learning and natural language processing. In this article, we introduce PrivOnto, a semantic framework to represent annotated privacy policies. PrivOnto relies on an ontology developed to represent issues identified as critical to users and/or legal experts. PrivOnto has been used to analyze a corpus of over 23,000 annotated data practices, extracted from 115 privacy policies. We introduce a collection of 57 SPARQL queries to extract information from the PrivOnto knowledge base, with the dual objective of (1) answering privacy questions of interest to users and (2) supporting researchers and regulators in the analysis of privacy policies at scale. We present an interactive online tool using PrivOnto to help users explore our corpus of 23,000 annotated data practices. Finally, we outline future research and open challenges in using semantic technologies for privacy policy analysis.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 22/Dec/2016
Review Comment:

Previous review comments have been implemented to a considerable level.

Review #2
By Luca Costabello submitted on 03/Jan/2017
Review Comment:

Thanks for addressing major remarks in the rebuttal.

The newly added section 2 sheds some light on how the contribution of this paper fits in the bigger picture, including the role of NLP.

I appreciate the “Semantic Search” functionality described in sec 5 and 6 (Note to authors: link [1] in your cover letter does not work).

Thanks for sharing the knowledge base and the PrivOnto Ontology.

- Garbled text in refs. 51 and 52 in the bibliography.

Review #3
By Pompeu Casanovas submitted on 20/Jan/2017
Minor Revision
Review Comment:

A short comment this time, because some of my comments have not been addressed. As I already noted in my first review: (i) “Grau” should be cited as “Cuenca-Grau” [15]; (ii) there is no risk analysis; (iii) the authors equate policies and jurisdictions rather than differentiating them (especially the difference between US and EU conceptions of data protection and privacy). This is a bit surprising, as one of them has written extensively on this subject.
The paper shows some overconfidence in the results, assuming that the ontology will be effectively used. But no evidence is offered yet. The ontology validation process is still to be done. The description of the state of the art could benefit from “Ontologies for Privacy Requirements Engineering: A Systematic Literature Review” (by Gharibi et al.). The same holds for the ontology-based consistency of legal and policy argumentation reasoning on privacy.
The reader cannot easily grasp the conceptual framework in which the annotation process took place: what counts as “relevant information”, and specifically “legal relevant information”? The annotation process is interesting in itself and could be explained a bit more. Doing so would also make it possible to state more explicitly the boundaries (and not only the expected benefits) of the project.
To give one example, the authors write on the creation of the website privacy policy corpus (in a previous paper): “We excluded the “World” sector and limited the “Regional” sector to the “U.S.” sub-sector in order to ensure that all privacy policies in our corpus are subject to the same legal and regulatory requirements”. Moreover: “For each selected website, we manually verified that it had an English-language privacy policy and that it pertained to a US company (based on contact information and WHOIS entry) before downloading its privacy policy”. They reproduce the first passage in this article, but not the second. They should: this is not only a feature of the corpus, it also defines the scope and implementation of the analysis and the related ontology. That is, the selected policies belong only to “US companies”. Other languages, legal cultures, drafters, and markets are set aside. All of this should be made explicit in the final version of the article.