Review Comment:
The paper deals with SPARQL query containment. The problem is very important and relevant for various semantic web activities, including but not restricted to querying.
The contributions of this paper are, from my standpoint:
 a querycontainment prover based on first order logic;
 discussion and cleanup of benchmarks.
The first contribution is original in the sense that other solutions attempted to avoid full firstorder logic and have a different and often reduced covereage with respect to the present work. Moreover, the proposed prover, thanks to the use of stateoftheart reasoners seems to perform efficiently.
There are several points which, from my standpoint, prevent to publish the paper in its current state:
 The structure of the paper is quite unbalanced: there are too much known content and to few original one.
 There are some weird problems with the content.
 The main theorems are missing: this is certainly the most serious issue.
 The numerous English language mistakes make the paper difficult to read.
I come back on these issues one by one.
(1) Structure
A lot of the content before Section 7 (p15) is not new content. This is not wellbalanced. It is clear that it is needed to introduce the manipulated concepts. However, this is too much of already published material. Moreover, the presentation provides the basics, with various forward references to extensions (Section 5). I think that it would be good, to (a) simplify the description of SPARQL semantics (with references to other source) and (b) directly cover the whole fragment used in this work to avoid forward references and repetitions. For instance, all figures presenting the grammar Section 5 occupy a lot of space and actually contain only one line worth of information. The discussion on normalisation in 5.1 would benefit from referring to standard work on normalisation in logic and query manipulation.
Another structural problem is that it is natural to discuss query containment (4.2) and query subsumption (6) together. It is not clear why this is not done this way.
The numbering of items is non standard: please ask yourself why it usually does not happen in a paper that two theorems have the same number. This is not acceptable.
(2) Various weird aspects
End of page 5, it is written: 'We follow the standard set semantics...'. As far as I know, but I may be wrong, the standard SPARQL semantics is the bag semantics and not the set semantics. Hence, the sentence is misleading for those who do not know. There is no shame to write: 'Despite the SPARQL semantics being a bag semantics, we follow the standard approach to query containment of using the set semantics \cite{thosewhomadeitstandard}.'
It is amazing that the theory does not take directly into account variable renaming but in the last section (p17) it is added to the implementation! The reason why this is strange is that for theory, this is not a problem (a simple homomorphism of variable vectors which could also account for different arities will do), though for an implementation this could be a serious problem.
In Section 4.2, query projection is suddenly suppressed, though it is actually very present in real world queries. This would deserve to be discussed at least during the evaluation (reading Table 1 and 2 it seems that after all it is supported?) and in relation with complexity, because this is a major source of complexity.
The discussion at the beginning of page 2 is unclear: (a) 'we also consider such queries': do you mean we only consider such queries? (b) 'we assume similar features': do you assume similar restrictions?
Being one of the publisher of one of the benchmarks, I am also embarrassed if you found mistakes in them. However, I would have appreciated being contacted about that (if you did this and we did not answer, then the mistake is on us, that may happen). The main reason for me, is that this could benefit everyone (to agree to have them corrected or any other useful action). The main reason for you, is that it could only make your paper stronger if you find a mistake and have it acknowledged by the benchmark designers before submitting. That is the same when adding part of a benchmark, you may have a case, but if you are alone, it is weaker. Actually, it is clear that there is a mistake in rdfs21, but also in rdfs22, for instance. And this may have impact on the results.
Details:
The notations are not very elegant, in particular the choice of using monospace fonts (tt) for symbols which are not in SPARQL syntax (like IBLe or gp_1) or the use of the ':=' symbol in math. Definition 3.5 the '\mu\modelsp R := curly brace' is not needed (this is a duplicate of the sentence before), it would just be useful to have a curly brace for the three first line for R is E_1=E_2, the if in front of the 'not' should not be here.
 Definition 4.1 (arity): $\alpha\in\mathcal{P}_s and$ is useless (twice).
 The use of $\mathtt{t}$ in $\sigma_{\mathtt{t}}$ (Definition 4.2) is not very judicious, especially that this is used in Definition 4.4 with $\mathtt{t}$ which is now a variable.
 Definition 4.4: \bot does not seem to have been introduced in the logic, moreover it seems like a 0arity predicate. (that's a detail).
 I would be worth indicating the timeout. OK now I see it at 20000ms. But it is not explicit: Table 2 only mentions #TO and the text mentions 'time'. It would be also good to justify why is it this value that has been retained.
(3) Main theorems
Clearly, Theorems 4.1's proof are missing. A theorem is a theorem only when it has been proved. If the proof has been published, then it is fine to refer to it. Otherwise, it should be given or this is not a theorem. This is already a first problem.
A second problem is that Theorem 4.1 seems apparently to cover only what is in Section 4, i.e. none of the extensions that make the value of the contribution, and Theorem 4.1bis what is covered in Section 45? This may be a problem with the structure of the paper (see 1).
Finally, the paper fully lacks any discussion about complexity/decidability. There should, at least, be a discussion on the benefits and drawbacks of choosing of firstorder logic. Ideally, there should be a discussion of the complexity/decidability of the problem and its encoding in FOL.
(4) Language
As an example, there is a constant lack of 'the' article where it is needed.
'The number of datasets', 'verify the equivalence', 'the SPARQL language', 'the query containment problem', 'the semantic web', 'The grammar presented...', 'the select clause', 'the function x', 'the y operator', etc.
Other issues are given below, but this is not extensive.
 'a distributed environments'
 'D that figures in' I do not think that 'figure' can be used in that sense (it is also unclear).
 'the domains of $m$ and $\bar{x}$'
 In that case when' > 'When' (twice)
 Definition 4.4: 'if the argument is a'
 'standarad'
 subsumtion (Table 1)
 stateoftheart with dashes when used as an adjective
