What is in Your Cookie Box? Explaining Ingredients of Web Cookies with Knowledge Graphs

Geni Bushati
Sven Carsten Rasmusen
Anelia Kurteva
Anurag Vats
Petraq Nako
Anna Fensel

Full Paper
The General Data Protection Regulation (GDPR) has imposed strict requirements for data sharing, one of which is informed consent. A common way to request consent online is via cookies. However, commonly, users accept online cookies unaware of the meaning of the given consent and the following implications. Once consent is given, the cookie "disappears", and one forgets that consent was given in the first place. Retrieving cookies and consent logs becomes challenging, as most information is stored in the specific internet browser’s logs. To make users aware of the data sharing implied by cookie consent and to support transparency and traceability within systems, we present a knowledge graph (KG) based tool for personalised cookie consent information visualisation. The KG is based on the OntoCookie ontology, which models cookies in a machine-readable format and supports data interpretability across domains. Evaluation results confirm that the users’ comprehension of the data shared through cookies is vague and insufficient. Furthermore, our work has resulted in an increase of 47.5% in the users’ willingness to be cautious when viewing cookie banners before giving consent. These and other evaluation results confirm that our cookie data visualisation tool helps increase users’ awareness of cookies and data sharing.
Hi. Thanks for acknowledging DPV within the paper. I cannot review the paper as it would constitute a conflict of interest, but I'm making general comments that hopefully are helpful, and some corrections to references.

# Comments regarding cookie ontology

AFAIK, this is the first work to systematise cookies as an ontology (there have been several categorisations over the years). So well done on picking this up, running it as a user study and putting the code and resources online. It's always good to see accessible and reproducible work.

What is interesting is that the design of your ontology differs from the categories seen in common cookie dialogues and from those implied by legal requirements (specifically the ePrivacy Directive and GDPR). I recall a short discussion on creating a taxonomy for cookies within DPVCG, as well as tangential conversations about cookie categorisations with browser devs and W3C Privacy WG over the past ~3 years - but nothing happened. I will only comment about DPVCG: we decided against such an explicit taxonomy only for cookies because the terms were already covered by existing concepts (i.e. Purpose, Necessity, and Technology).

The same concepts are also useful for explaining the common web dialogues, with the following variations (for EU):
a) necessity: necessary & purpose: providing requested service;
b) necessity: optional & purpose: personalisation or optimisation based on persisted preferences and choices;
c) necessity: optional & purpose: analytics;
d) necessity: optional & purpose: marketing/profiling/tracking.
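
As a rough illustration, the four variations above can be written as a lookup from a (necessity, purpose) pair to the dialogue category. This is only a sketch: the `Necessity` enum, the `DIALOGUE_CATEGORIES` table, and the purpose strings are hypothetical names chosen for this example and are not DPV IRIs or terms from the paper.

```python
from enum import Enum


class Necessity(Enum):
    NECESSARY = "necessary"
    OPTIONAL = "optional"


# Illustrative purpose labels copied from the list above (assumed, not DPV terms).
DIALOGUE_CATEGORIES = {
    (Necessity.NECESSARY, "providing requested service"): "a",
    (Necessity.OPTIONAL, "personalisation"): "b",
    (Necessity.OPTIONAL, "optimisation"): "b",
    (Necessity.OPTIONAL, "analytics"): "c",
    (Necessity.OPTIONAL, "marketing"): "d",
    (Necessity.OPTIONAL, "profiling"): "d",
    (Necessity.OPTIONAL, "tracking"): "d",
}


def dialogue_category(necessity, purpose):
    """Return the variation letter (a to d) for a pair, or None if unlisted."""
    return DIALOGUE_CATEGORIES.get((necessity, purpose.strip().lower()))


print(dialogue_category(Necessity.OPTIONAL, "Analytics"))
```

A pair that is not in the table, such as a "necessary" analytics cookie, returns `None`, which could be a useful flag for a dialogue that mixes categories.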

So these concepts can be used to provide categorisations based on:
1) purposes - as above;
2) actor-roles - first party, third-party, same-site, etc.;
3) persistence - ephemeral, persistent, fixed-duration;
4) modality - HTTP-only, HTTPS/encrypted;
5) necessity - did the user have a choice in accepting it? This can also be phrased as what was the lawful basis, e.g. consent;
6) anything else I've missed.
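
Two of these dimensions, persistence and modality, can be read directly off a `Set-Cookie` header, while purpose, actor-role and necessity need context that the header does not carry. The sketch below uses Python's standard `http.cookies` module; the `CookieProfile` class and `profile_cookie` function are hypothetical names for this example, and treating any cookie with `Max-Age` or `Expires` as "fixed-duration" is an assumed mapping.

```python
from dataclasses import dataclass
from http.cookies import SimpleCookie


@dataclass
class CookieProfile:
    name: str
    persistence: str  # "ephemeral" (session cookie) or "fixed-duration"
    http_only: bool   # not readable from page scripts
    secure: bool      # only sent over HTTPS


def profile_cookie(set_cookie_header):
    """Derive persistence and modality from one Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    name, morsel = next(iter(jar.items()))
    # Assumption: a cookie with no Max-Age/Expires lives only for the session.
    has_lifetime = bool(morsel["max-age"] or morsel["expires"])
    return CookieProfile(
        name=name,
        persistence="fixed-duration" if has_lifetime else "ephemeral",
        http_only=bool(morsel["httponly"]),
        secure=bool(morsel["secure"]),
    )


print(profile_cookie("sid=abc123; Max-Age=3600; Secure; HttpOnly"))
```

The remaining dimensions would have to come from elsewhere, for example the page context for actor-role and the consent dialogue or policy for purpose and necessity.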

Then there is the question of how privacy/data protection laws like GDPR, and their requirements to provide information on cookies via dialogues or notices, have shaped users' perceptions as well as the availability of this information to us researchers. #5 regarding necessity is a good link since it covers the use of consent. So when asking users about cookies, it would also have been interesting to check whether they recall these notices, or recall accepting optional purposes, or (more likely) whether they were coerced into accepting all these cookies through deceptive practices (e.g. Accept All is the only easy choice). There's growing literature on these topics, and having a semantics-backed framework for these would be really cool to create some systemic knowledge and tools.

# References

1) The citation for DPV should be: Pandit, H.J. et al. (2019). Creating a Vocabulary for Data Privacy. In: Panetto, H., Debruyne, C., Hepp, M., Lewis, D., Ardagna, C., Meersman, R. (eds) On the Move to Meaningful Internet Systems: OTM 2019 Conferences. OTM 2019. Lecture Notes in Computer Science, vol 11877. Springer, Cham. https://doi.org/10.1007/978-3-030-33246-4_44

2) The footnote #3 https://dpvcg.github.io/dpv-gdpr/#A7-3 refers to a specific concept and not the entire resource. The PURL for DPV is https://w3id.org/dpv and for DPV-GDPR is https://w3id.org/dpv/dpv-gdpr

3) The footnote #2 is a URL for Google but is cited as being for Chrome, and the point it supports is that extensions are browser-specific. Rather than referencing a specific browser, it would be better to refer to a common page that presents support of APIs and extension limitations across major browsers. For example, MDN https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/B...

Thanks for the interesting paper, and I look forward to the reviews.