Multiset semantics in SPARQL, Relational Algebra and Datalog

Tracking #: 3900-5114

Authors: 
Renzo Angles
Claudio Gutierrez
Daniel Hernandez

Responsible editor: 
Eva Blomqvist

Submission type: 
Full Paper
Abstract: 
The paper analyzes and characterizes the algebraic and logical structure of the multiset semantics for SPARQL patterns involving AND, UNION, FILTER, EXCEPT, and SELECT. To do this, we align SPARQL with two well-established query languages: Datalog and Relational Algebra. Specifically, we study (i) a version of non-recursive Datalog with safe negation extended to support multisets, and (ii) a multiset relational algebra comprising projection, selection, natural join, arithmetic union, and except. We prove that these three formalisms are expressively equivalent under multiset semantics.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Olaf Hartig submitted on 21/Jul/2025
Suggestion:
Minor Revision
Review Comment:

While the revised version of the manuscript addresses all of my earlier comments, some of the fixes are not sufficient.

Especially, related to the main point M1 of my previous review, there are still issues:
- What exactly are "copies of [a tuple] t" (mentioned now in Sec.8.1.1) ? From the text, it seems that each such "copy" is a tuple itself; in fact, it seems that each such "copy" is the tuple t itself. But then, "the set of copies of t" would always be a singleton set because every "copy" is the same tuple.
- As a consequence of the previous issue, it is also not clear what distinguishes the "copies of t" from one another and why/how the IRI iri_t^i "which identifies the tuple[/copy] t^i" can be different for each copy.

I suggest to avoid talking about copies altogether. Instead, for every tuple t in r, you can assume a set of n distinct IRIs (distinct from one another, but also from the IRIs for all the other tuples in the whole MRA database), iri_t^1 ... iri_t^n, where n=card(t,r). I am sure you can write this in a more formal way. Then, you can use these IRIs to construct the RDF triples in the way you already describe before Def.18 now.

Further issues:

Example 11: The notation used to present the sets of attributes of the two example relations is different from the notation introduced in Sec.5.1. This inconsistency needs to be resolved.

My earlier comments m11 and m12 are not properly addressed.
- Def.8 still presents the Ω (which is "a multiset of SPARQL solution mappings") using a set notation in which the elements of the set are pairs consisting of a solution mapping and an integer. The notation for writing multisets as given in the paper is a different one (namely with a card function).
- Def.8 includes ".. μ(?X) .." but does not say what ?X is.

Regarding the fix for my earlier comment m13,
- what is a "SPARQL term" (now mentioned before Def.9)?
- it is still not clear what kind of thing is denoted by p in the formula;
- it is still not clear that the c1 and cn in the formula are meant to be part of the specific fact F mentioned under the union symbol in the formula.

Regarding the fix for my earlier comment m24, now the "where ..." explanation for the first case in Table 6 mentions an IRI u_r but the corresponding SPARQL pattern does not contain any u_r.

Review #2
By Dörthe Arndt submitted on 03/Nov/2025
Suggestion:
Minor Revision
Review Comment:

The authors addressed most of my concerns and I especially appreciate the effort they put into adding more structure (for example introducing the functions they use for their translations from one framework to another at the beginning of each section) and into clarifying the text by adding examples (for example the text about multiset operations). I like the overall paper and its (now well-motivated) topic, but still consider it a little bit too dense in terms of theory introduced. I understand that this has to do with the nature of the paper: there are three frameworks (SPARQL, Relational Algebra and Datalog) which need to be introduced. So, I would like to have even more explanation, but accept if the authors opt against that to save space.
With that, I would and have only a few minor points I would like to see addressed:

- page 2, existential negation: what does “preserving the cardinalities” mean?
- Table 2: P_3 UNION P_4 -> should that be P_1 UNION P_2?
- page 15, Definition 4: The definition starts with introducing the predicate names and “comp” is currently missing, the first bullet makes the effort to determine all terms occurring in D while the last bullet used “for each term a in D”, technically, D is a set of facts, so maybe one could refer to “for each a for which term(a)\inD”?
- page 15, equation 1 uses \phi and \varphi
- page 16, Definition 5: I think the predicate “bound” was never introduced
- page 19, Table 4, AND: are the rules producing “comp” still needed now that there are also facts introduced in Definition 4? I also still have difficulties understanding the where condition as it is stated, maybe that could be clarified? Maybe explaining it in example 5 could also help.
- page 18, Example 5: While I appreciate most of the examples, I think particularly this one should be refined and explained in more detail, maybe by referring to the table. Additionally, there are many typos in the example (e.g. “... people their know”-> “...people they know”, triple predicates missing in the first and the last rule) which need to be fixed.
- page 20, Example 6: what does “the fact p(a,b) contains two copies” mean?
- page 21, line 22: You rewrite the rule R=q(Y,Z)<-p(X,Z,Y) by R’=q(X,Y)<-p(X,Y,Z) and I do not understand that. Why is it not q(X,Y)<-(Z,Y,X)?
- page 21, lines 28ff, it seems that predicates in NRMD^\neg are either intentional or extensional but not both? If so, that should be mentioned when it is introduced. If I am mistaken, what happens in that case?
- page 24, line 38: fro -> from
- page 25 line 51. B_3 -> B_2

Under the condition that the above points, especially Example 5 and Table 4 will be addressed, I recommend acceptance.