What is in Your Cookie Box? Explaining Ingredients of Web Cookies with Knowledge Graphs

Tracking #: 3144-4358

Geni Bushati
Sven Carsten Rasmusen
Anelia Kurteva
Anurag Vats
Petraq Nako
Anna Fensel

Responsible editor: 
Guest Editors Interactive SW 2022

Submission type: 
Full Paper
The General Data Protection Regulation (GDPR) has imposed strict requirements for data sharing, one of which is informed consent. A common way to request consent online is via cookies. However, commonly, users accept online cookies unaware of the meaning of the given consent and the following implications. Once consent is given, the cookie "disappears", and one forgets that consent was given in the first place. Retrieving cookies and consent logs becomes challenging, as most information is stored in the specific internet browser’s logs. To make users aware of the data sharing implied by cookie consent and to support transparency and traceability within systems, we present a knowledge graph (KG) based tool for personalised cookie consent information visualisation. The KG is based on the OntoCookie ontology, which models cookies in a machine-readable format and supports data interpretability across domains. Evaluation results confirm that the users’ comprehension of the data shared through cookies is vague and insufficient. Furthermore, our work has resulted in an increase of 47.5% in the users’ willingness to be cautious when viewing cookie banners before giving consent. These and other evaluation results confirm that our cookie data visualisation tool helps increase users’ awareness of cookies and data sharing.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 01/Sep/2022
Major Revision
Review Comment:

# Overall
This paper presents a set of contributions aimed at making cookies more transparent to the end-user: an ontology for cookies (called OntoCookie), a KG for cookies, and a tool for visualizing cookie information (and an evaluation thereof).

This work is timely: data privacy is an important, current problem, and educating consumers / users to fight against surveillance capitalism is important.

To my knowledge, the work is also novel -- I know of no ontology for cookies and the related work is convincing in pointing out similar -- but distinct -- semantic web resources for this purpose.

In general, the quality of writing is good. I would carefully proofread for consistent use of the Oxford comma and for comma splices, and ensure that a comma follows every use of "e.g." and "i.e.".

The manuscript is organized around the contributions: the ontology, the KG, and the visualization. My concerns are as follows.

* The Ontology
The manuscript has not made a convincing case that an ontology is needed (beyond machine-interpretability). The ontology does not use any particularly rich semantics (only domain and range), acting more as a schema than something to aid in inference/reasoning. Are there functional relations, or existential?

I am also not sure how well this could be implemented. For example, dates and times are implemented as classes, and there is no data property. What does the actual KG look like? I can't find a way to look at the exact triples being generated.

* The KG
In line with the above, there is no way to actually inspect the KG -- or it is not apparent in the GitHub repository.

* The Tool and Evaluation
The link for the OpenID generator does not work within the tool. (9/1/22 - ~17:00 EDT)

The results of the evaluation should be presented in a table. The prose format made it difficult to follow the results or to compare answers before and after tool use.

The metrics evaluated in the tool do not seem to be reasonable proxies for the conclusion. Where did the 47.5% increase in caution come from?

Section 6 (conclusions) should be separated into a discussion of the results (i.e., discourse related to how the tool helped, etc., and a reasonable discussion of the effectiveness of the proxies) and an actual summary of the paper (conclusion).

Finally, it was not exactly clear to me why using an ontology and KG is useful or effective. In the most uncharitable case - this work could be replicated using ad-hoc json files and some python scripting. I would like to know why the (admittedly quite heavy) overhead of an ontology and triplestore (e.g., graphdb) is necessary for this.

It is this reviewer's opinion the paper would benefit from significantly extended discussion of the value that semantic web / knowledge graph technologies bring to the table, or how they enable significant added value in future work (i.e., that this is the foundation upon which future laurels may rest).

Review #2
Anonymous submitted on 03/Dec/2022
Major Revision
Review Comment:

This paper introduces an approach to the very important problem of cookies and their management through semantic web techniques, and is quite relevant to the Special Issue on Interactive Semantic Web. The paper describes all the steps of the approach in detail, performs a user-based evaluation, and provides links to the source code and to the web application. On the other hand, some parts should be revised: the novelty of the presented work should be stated more clearly, including in comparison with the related work, and more statistics and efficiency measurements should be added. For this reason, I suggest a major revision, since I believe that the required changes can be addressed. More details are presented below.

Strong points
S1. An interactive web application for the very important issue of cookies and their management is presented by using semantic web technologies.
S2. A web application and the code are available online, and a new ontology that can also be important for other researchers has been released.
S3. A user-based evaluation has been performed and analyzed.

Weak Points
W1. Some sections, but especially the related work section, should be reorganized.
W2. In some parts, more details should be provided, as mentioned below, e.g., the novelty should be described in more detail, and a comparison with related work is missing.
W3. Experiments including statistics and the efficiency of the presented workflow are missing.

Abstract & Introduction

The abstract is well written. Concerning the introduction, the motivation is well presented, i.e., the key problems that are related to cookies and privacy are discussed. However, in my opinion, the paragraph “The use of semantics, namely KGs … discussed in [22, 23]” should be moved to the related work section. In the introduction, please try to focus on motivation, contribution and novelty. In particular, provide more details about your contribution and the novelty of your approach. For instance, you just mention “A cookie KG”; more details should be included.

Related Work

The related work section is hard to follow; please divide it into subsections, e.g., Cookies and Privacy, Cookies and Semantic Web solutions, visualizations, etc. Moreover, you should position the presented work relative to the related approaches, i.e., describe more clearly your novelty, the differences from these works, etc.

Section 3

This section presents the main methodology; my only comment is the following:
More details should be given for the statement “Existing solutions for cookies (with and without the use of semantics) were also reviewed”, e.g., by briefly mentioning these solutions.

Section 4.

The ontology is well described; it reuses existing standards, and statistics are mentioned. Concerning 4.2, it would be interesting to provide the SPARQL queries that you use for the insert and the select case. Also, as I mention below, statistics about the number of triples and their creation/retrieval time are missing, and it would be good to provide both these statistics and the queries that are used.

Section 5.

Concerning the evaluation section, in my opinion several things are missing. I agree with the user-based evaluation; however, it would also be good to provide statistics/metrics about the effectiveness and efficiency of the presented workflow of Fig. 3. For example, some ideas are listed below:
Number of triples per website/cookie type/etc.
Execution time for inserting/retrieving the data to/from your KG and for the whole workflow of Fig. 3.
Regarding the user-based evaluation, it is quite interesting and well presented.
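As a rough illustration of the two measurements suggested above, the following minimal sketch counts triples per website and times an insert and a retrieve call. It assumes a hypothetical in-memory store (`InMemoryStore`) standing in for the actual KG back end; the class, its methods, and the example triples are illustrative only and are not taken from the paper or its repository.

```python
from collections import Counter
from time import perf_counter


class InMemoryStore:
    """Hypothetical stand-in for the KG back end (not the tool's real storage)."""

    def __init__(self):
        # Each entry is (website, subject, predicate, object).
        self.triples = set()

    def insert(self, site, triples):
        for s, p, o in triples:
            self.triples.add((site, s, p, o))

    def retrieve(self, site):
        return [t[1:] for t in self.triples if t[0] == site]

    def triples_per_site(self):
        # Metric 1: number of triples per website.
        return Counter(t[0] for t in self.triples)


def timed(fn, *args):
    """Metric 2: run fn(*args) and return (result, elapsed seconds)."""
    start = perf_counter()
    result = fn(*args)
    return result, perf_counter() - start


store = InMemoryStore()
_, insert_seconds = timed(
    store.insert,
    "a.example",
    [("cookie1", "type", "Cookie"), ("cookie1", "name", "sid")],
)
rows, retrieve_seconds = timed(store.retrieve, "a.example")
```

With a real triplestore, the same `timed` wrapper could be placed around the calls that send the insert and select queries, repeated several times and averaged.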


Please reorganize it, first describe in brief what you presented, then the evaluation results, and finally, the future work.

Additional issues

When trying to use the web app, I encountered two different errors.
a) I received a CORS policy error.
b) After turning off CORS, I got the following error:
Failed to load resource: the server responded with a status of 503 (Service Unavailable) main.dart.js:33263 Uncaught TypeError: Cannot read properties of null (reading 'i')

Minor issues
emphasise → emphasize
higher level of → higher levels of
comprises of a → comprises a
indivdiuals’ → individuals’
a button that give → a button that gives
if agreed → if they agreed
of the the users willingness → of the users’ willingness

Review #3
Anonymous submitted on 05/Feb/2023
Review Comment:

The paper presents and discusses a KG-based tool aiming to improve the awareness of web users with respect to the cookies installed on their devices.
The tool relies on the OntoCookie ontology and, by starting from it, it creates a KG and a custom viewer to ease the presentation of information to end-users.

The overall idea behind a tool like this is nice.
However, from both the scientific and technological perspectives the contribution of this manuscript seems poor.

First, concerning the semantic layer, what is the actual difference between the KG and the OntoCookie ontology?
They are presented as separate contributions, but from the paper, it is not clear how these two elements contribute to the work in two different ways.

Second, concerning the tool, it seems that the implementation "simply" instantiates the OntoCookie ontology with cookie data but, besides a straightforward visualization, no further operations are performed on such data (e.g., reasoning).
Hence, it is unclear whether such a tool is more than a good exercise in adopting semantic technologies in a real-world use case, without particularly innovative aspects.
Indeed, testing on only 4 websites is quite limited.
Moreover, using such well-known domains may lead to the guess that their cookies are also well-formed.
But what would happen if the tool managed cookies from lesser-known websites?
Here, it would be interesting to observe whether null nodes would be created in the knowledge graph and how user awareness may change.

Finally, the evaluation is appropriate and quite well-defined.


This paper was submitted for consideration in the Special Issue: “Special Issue on Interactive Semantic Web”.

Hi. Thanks for acknowledging DPV within the paper. I cannot review the paper as it would constitute a conflict of interest, but I'm making general comments that hopefully are helpful, and some corrections to references.

# Comments regarding cookie ontology

AFAIK, this is the first work to systematise cookies as an ontology (there have been several categorisations over the years). So well done on picking this up, running it as a user study and putting the code and resources online. It's always good to see accessible and reproducible work.

What is interesting is that your design of the ontology differs from the apparently common cookie dialogues and in relation to legal requirements (specifically ePrivacy Directive and GDPR). I recall a short discussion on creating a taxonomy for cookies within DPVCG, as well as tangential conversations about cookie categorisations with browser devs and W3C Privacy WG over the past ~3 years - but nothing happened. I will only comment about DPVCG: we decided against such an explicit taxonomy only for cookies because the terms were already covered by existing concepts (i.e. Purpose, Necessity, and Technology).

The same concepts are also useful for explaining the common web dialogues with the following variations (for EU):
a) necessity: necessary & purpose: providing requested service;
b) necessity: optional & purpose: personalisation or optimisation based on persisted preferences and choices;
c) necessity: optional & purpose: analytics;
d) necessity: optional & purpose: marketing/profiling/tracking.

So this can be used to provide categorisations based on:
1) purposes - as above;
2) actor-roles - first party, third-party, same-site, etc.;
3) persistence - ephemeral, persistent, fixed-duration;
4) modality - HTTP-only, HTTPS/encrypted;
5) necessity - did the user have a choice in accepting it? This can also be phrased as what was the lawful basis, e.g. consent;
6) anything else I've missed.
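To make some of these dimensions concrete, here is a minimal sketch that derives actor-role, persistence and modality from a raw `Set-Cookie` value using Python's standard `http.cookies` module. The function name `categorise`, the label strings, and the simple suffix check for first-party versus third-party are assumptions made for illustration; purpose and necessity cannot be read from the header and would have to come from the cookie notice or a vocabulary such as DPV.

```python
from http.cookies import SimpleCookie


def categorise(set_cookie_value: str, site_host: str) -> dict:
    """Classify each cookie in a Set-Cookie value along three dimensions.

    Illustrative only: the first-party test is a plain domain-suffix check
    and ignores the public suffix list that browsers actually consult.
    """
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    host = site_host.lower()
    result = {}
    for name, morsel in jar.items():
        domain = morsel["domain"].lstrip(".").lower()
        # No Domain attribute means a host-only cookie, treated as first-party.
        first_party = (not domain) or host == domain or host.endswith("." + domain)
        # A cookie without Max-Age or Expires lasts only for the session.
        persistent = bool(morsel["max-age"] or morsel["expires"])
        result[name] = {
            "actor_role": "first-party" if first_party else "third-party",
            "persistence": "persistent" if persistent else "ephemeral",
            "http_only": bool(morsel["httponly"]),
            "secure": bool(morsel["secure"]),
        }
    return result


session = categorise(
    "sid=abc; Domain=example.com; Max-Age=3600; Secure; HttpOnly",
    "shop.example.com",
)
tracker = categorise("trk=1; Domain=ads.example.net", "shop.example.com")
```

The resulting dictionaries could then be mapped onto ontology terms, which is where a shared vocabulary would add value over ad-hoc labels.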

Then there is the question of how privacy/data protection laws like GDPR and their requirements to provide information on cookies via dialogues or notices have shaped users' perceptions as well as the availability of this information to us researchers. #5 regarding necessity is a good link since it states the use of consent. So when asking users about cookies, it would also have been interesting to check whether they recall these notices, or accepting optional purposes, or (more likely) whether they were coerced into accepting all these cookies through the use of deceptive practices (e.g. Accept All is the only easy choice). There's growing literature on these topics, and having a semantics-backed framework for these would be really cool to create some systemic knowledge and tools.

# References

1) The citation for DPV should be: Pandit, H.J. et al. (2019). Creating a Vocabulary for Data Privacy. In: Panetto, H., Debruyne, C., Hepp, M., Lewis, D., Ardagna, C., Meersman, R. (eds) On the Move to Meaningful Internet Systems: OTM 2019 Conferences. OTM 2019. Lecture Notes in Computer Science, vol 11877. Springer, Cham. https://doi.org/10.1007/978-3-030-33246-4_44

2) The footnote #3 https://dpvcg.github.io/dpv-gdpr/#A7-3 refers to a specific concept and not the entire resource. The PURL for DPV is https://w3id.org/dpv and for DPV-GDPR is https://w3id.org/dpv/dpv-gdpr

3) The footnote #2 is a URL for Google but is cited as for Chrome, and this itself is about extensions being browser-specific. Rather than referencing a specific browser, it would be better to refer to a common page that presents support of APIs and extension limitations across major browsers. For example, MDN https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/B...

Thanks for the interesting paper, and I look forward to the reviews.