On the interplay between validation and inference in SHACL - an investigation on the Time Ontology

Tracking #: 3704-4918

Authors: 
Livio Robaldo
Sotiris Batsakis

Responsible editor: 
Katja Hose

Submission type: 
Full Paper
Abstract: 
This paper presents a novel framework to validate the Time Ontology (a.k.a. OWL Time, https://www.w3.org/TR/owl-time), which is currently a W3C candidate recommendation draft for representing temporal data in the Semantic Web. The framework is based on SHACL shapes and SHACL-SPARQL rules. These are used together to invalidate knowledge graphs that OWL is unable to identify as such due to its lack of expressivity, specifically its lack of operators to compare and work with temporal data. Besides providing a useful tool to process temporal data encoded in RDF within applications, our research work also sheds some light on how to use SHACL shapes and SHACL-SPARQL rules together in order to capture the proper interplay between validation and inference on knowledge graphs. The SHACL shapes and the SHACL-SPARQL rules that define the proposed framework are freely available on the GitHub repository https://github.com/liviorobaldo/TimeOntologyInSHACL, together with Java programs and clear instructions to process them.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Maxime Jakubowski submitted on 21/Jul/2024
Suggestion:
Minor Revision
Review Comment:

The core of the paper is about validating RDF graphs that use the Time
Ontology. In principle, this could be done by writing SHACL
shapes. However, the authors note that writing such shapes is a
complicated task, given the semantics of the Time Ontology.

The interesting insight here is that with some simple reasoning, you
can already add enough information to validate most of the
properties.

As far as I understand, the inferences/validation that is done with
their SHACL shapes and rules is correct.

* OWL and SHACL for temporal data
In the Introduction, the authors state that OWL is inadequate for
representing temporal data because it lacks support for comparing
values of type xsd:dateTime. It may be interesting to discuss what it
would mean for OWL to support this, would this help your case? It
seems that you mostly want the "validation" part of SHACL, not
necessarily, that it supports xsd:dateTime values. This discussion
regarding OWL's support for xsd:dateTime values does not seem to be
that relevant for this work.

* Your view on SHACL rules
SHACL rules, as defined here (https://www.w3.org/TR/shacl-af/#rules)
simply serve a different purpose than the one necessary for your
use-case. Your claim that these should be executed "before"
validation, in general, does not seem supported by its original
definition. This paper simply uses the syntax of SHACL rules, but
disregards its original meaning.

It seems to me that you want "repeated SPARQL CONSTRUCT queries", and
there is no need to use SHACL rules for that. This is simply something
new, but interesting nonetheless. You would like to add more triples
in order to more easily validate the knowledge graph with
SHACL. However, the problem then becomes complex: especially the
interplay of the rules you add. While SHACL seems like a simple tool
to check integrity constraints on a data graph, it now validates an
extension of the data graph according to some rules.

It is also interesting to note that this idea of doing inference
before validation is also mentioned in the SHACL recommendation
Section 1.5, albeit about RDFS inferencing. Maybe you advocate for
different inferencing, but this should still be mentioned.
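The "infer first, then validate" reading discussed above can be sketched in a few lines. This is a minimal illustration only: the transitivity rule, the irreflexivity constraint, and the instant names are assumptions made for the example and are not taken from the paper's shapes or rules. Triples are plain Python tuples rather than a real RDF graph.

```python
# Sketch: apply rules until no new triple is derived, then validate.
# Rule, constraint and data are illustrative assumptions, not the paper's.

BEFORE = "time:before"

def before_is_transitive(graph):
    """Rule: (a before b) and (b before c) yields (a before c)."""
    derived = set()
    for (a, p1, b) in graph:
        if p1 != BEFORE:
            continue
        for (b2, p2, c) in graph:
            if p2 == BEFORE and b2 == b:
                derived.add((a, BEFORE, c))
    return derived

def saturate(graph, rules):
    """Repeat all rules until a fixpoint is reached (no new triples)."""
    graph = set(graph)
    while True:
        new = set()
        for rule in rules:
            new |= rule(graph) - graph
        if not new:
            return graph
        graph |= new

def violations_of_irreflexivity(graph):
    """Constraint: no instant may be before itself."""
    return sorted(s for (s, p, o) in graph if p == BEFORE and s == o)

# A cycle of three instants: no single triple is wrong on its own.
data = {(":i1", BEFORE, ":i2"), (":i2", BEFORE, ":i3"), (":i3", BEFORE, ":i1")}

violations_raw = violations_of_irreflexivity(data)
saturated = saturate(data, [before_is_transitive])
violations_saturated = violations_of_irreflexivity(saturated)

print(violations_raw)        # constraint checked on the original data
print(violations_saturated)  # constraint checked on the saturated graph
```

The constraint finds nothing on the original data and flags all three instants on the saturated graph. The loop terminates here only because the rule introduces no new nodes; a rule that mints fresh nodes could run forever, which is the termination concern raised in the reviews.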

* SHACL-core and SHACL-SPARQL
Because this paper is about SHACL, I expected more SHACL-core
shapes. In this work, only shape (3) seems to be a SHACL-core
shape. All other shapes are SHACL-SPARQL shapes. This means that most
of the constraints you write are actually SPARQL queries that retrieve
the violations. This should be motivated and explained before you
start writing the shapes. The reason for this is clear to me:
SHACL-core is simply not powerful/expressive enough to write the
shapes you want.

The fact that your shapes are mostly SPARQL queries, also makes the
discussion about SHACL supporting comparisons between xsd:dateTime
values weak. You actually refer to SPARQL's capabilities, which SHACL
allows you to use through SHACL-SPARQL. In short, it is not really
about SHACL, but about SPARQL. Interestingly, SHACL-core does support
simple comparisons between xsd:dateTime values through the sh:lessThan
and related keywords (and to a lesser extent through the
sh:minExclusive and related keywords). However, these are not powerful
enough for your purposes.

Here are some specific comments relating to your shapes:

- Shape (3): this is, strictly speaking, not a syntactically correct
shape. The recommendation states that the object of "sh:property"
should be a well-formed property shape
(https://www.w3.org/TR/shacl/#PropertyConstraintComponent) and this
means that it needs to have a value for sh:path
(https://www.w3.org/TR/shacl/#dfn-property-shape). As I understand
your shape, you probably did not intend to use property shapes
here. Rather, you are defining a node shape that states that the
node must be of datatype xsd:dateTime. Therefore, I suggest this
shape should be written as:

[ a sh:NodeShape ;
sh:targetObjectsOf time:inXSDDateTime ;
sh:datatype xsd:dateTime ;
sh:message "..." ]

This shape states that every object of a triple where the predicate
is time:inXSDDateTime should be of datatype xsd:dateTime.

However, it is unclear to me whether shape (3) is what you intend it
to be. You write before: "...only instances of Instant can be
associated with xsd:dateTime values..." but this is not reflected in
the shape. The next sentence starts with "Therefore...", but I do
not see how that follows.
- Shape (4) can be written without the use of SHACL-SPARQL, using
simply the SHACL sh:pattern feature
(https://www.w3.org/TR/shacl/#PatternConstraintComponent). Rewriting
the shape in that way might be more readable. The sh:pattern feature
looks at the string representation of the RDF Term. So, I believe
this corresponds with your approach.
- Shape (5) is an interesting case and you should mention why simply
using maxCount is not good enough: the string literals of datetime
can be different but referring to the same instant in time. SHACL
sees it as two different literals and counts them as such. This
highlights shortcomings of SHACL and its usage of
datetime. Basically, the counting in SHACL is not "semantic" on the
datatypes. You can also see this when counting numbers: "1" is
semantically the same as "1.0", but SHACL counts them separately.
- Shape (7), the last one, is also an interesting limitation of
SHACL. In SHACL you can check whether all values reachable with
property A are "less than"
(https://www.w3.org/TR/shacl/#LessThanConstraintComponent) all
values reachable with property B, but you cannot check this when A and B
both are path expressions. Same remark for Shape (12). Similar
situation for Shape (33): although the case with "time:before" can
possibly be captured (although not easily) with SHACL-core.
- Shape (13) is also an interesting limitation of the expressiveness
of SHACL-core. Specifically, checking for cycles is part of an
unofficial extension of SHACL-core: DASH. Specifically
dash:nonRecursive
(https://datashapes.org/constraints.html#NonRecursiveConstraintComponent)
- Shape (17) can be written in SHACL-core:

[ sh:targetSubjectsOf time:inside ;
sh:not [ sh:class time:Instant ] ;
sh:message "..." ] .

In summary, there needs to be some discussion about the limitation of
SHACL-core.
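The remark on Shape (5), that counting distinct literals is not the same as counting distinct values, can be illustrated with a small sketch. It uses Python's datetime and Decimal types as stand-ins for the xsd:dateTime and xsd:decimal value spaces, which is an assumption of the example. Only offset-qualified dateTime strings are used, because comparing values without a timezone raises further issues that the sketch does not cover.

```python
from datetime import datetime
from decimal import Decimal

# Two different lexical forms that denote the same instant.
lexical_forms = [
    "2024-01-01T12:00:00+00:00",
    "2024-01-01T13:00:00+01:00",
]

# Counting the strings, as a term-based count would do.
lexical_count = len(set(lexical_forms))

# Counting the values: timezone-aware datetimes compare by instant.
instants = {datetime.fromisoformat(form) for form in lexical_forms}
semantic_count = len(instants)

# The same effect with numbers: "1" and "1.0" are different strings
# but the same decimal value.
number_forms = ["1", "1.0"]
number_lexical_count = len(set(number_forms))
number_semantic_count = len({Decimal(form) for form in number_forms})

print(lexical_count, semantic_count)
print(number_lexical_count, number_semantic_count)
```

A maxCount-style check based on the first count would report two values for the instant, while a value-based comparison, such as the one a SPARQL filter on xsd:dateTime performs, sees only one.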

* SHACL-based version of the Time Ontology
This is listed as the first contribution in the Conclusion. However,
it is unclear to me how this compares with the "original" Time
Ontology. The SHACL-SPARQL + rules approach from this work is too
opaque to be informative about the Time Ontology. This is especially
true because of the extensive use of SPARQL to create shapes and
rules. It does, however, add to the overall functionality for graphs
using the Time Ontology, i.e., validation.

* Writing
The paper is written well in general. Some remarks:
- p2 l10 "... have noT been added ..."
- p3 l5 "... are not enough expressive ..."
- p6 l37-38 "disjunctions" do you mean "disjointness statements"?
- p8 l16-17 "... xsd:datatime IS the single..."
- p8 l17-18 "... recommendation defineS comparison ..."
- p8 l25-26 "Gregorian"
- p10 l21-22 should be Figure 3?
- p15 l18-19 "... are not enough expressive ..." strange wording
- p16 l1 ".. to recognize as invalid ..." strange wording
- p19 l23-24 "beings" -> "begins"

* Summary
Although I have a couple of specific concerns:
- Redefinition of SHACL rules
- Missing discussion of SHACL-core limitations
- Unclear claim about "SHACL-based version of the Time Ontology"
and others mentioned above, I believe this work presents valuable
ideas and clever insights on how to tackle validation in the context
of the Time Ontology.

Review #2
Anonymous submitted on 29/Jul/2024
Suggestion:
Major Revision
Review Comment:

Review OWL Time in SHACL

The manuscript claims two main contributions:

1. A collection of about 50 SHACL-SPARQL constraints and SHACL-SPARQL rules that are presented as “a new version of the Time Ontology based on SHACL”
2. Insight into the interplay of inference (using SHACL-SPARQL) and validation and suggestions on how this should be required by the standards.

I think that the paper has a number of major problems that need to be addressed:

1. The statement of contributions is a bit haphazard and distributed. They do not become very clear in the introduction; then on page 15, we read that the “main finding” of the paper is that SPARQL property path operators are insufficient for some types of constraint checking. Finally, they are reiterated in the conclusion. It would have been preferable to state the contributions clearly in the beginning.
2. The authors claim to give a new formulation of the time ontology in SHACL. The SHACL rules and constraints are a method of inferring inconsistencies, and not an ontology that formalises the domain. This becomes very visible for instance in the technicalities around the formalisation of Allen’s algebra in SHACL. This is not an axiomatisation of the concept of time (as an ontology would be), but rather an attempt at casting a satisfiability checking algorithm into SHACL-SPARQL by means of reification etc.
3. The 50 SHACL constraints and rules effectively constitute a satisfiability checker, an automated reasoning system, a mechanism to systematically detect if a knowledge graph contradicts the intended semantics of the time ontology. Now the intended semantics is never formalised, although a mathematical model for the investigated part of the ontology would not be complicated. Accordingly, the soundness of the rules is only explained verbally. I don’t know if the authors intend to claim completeness of their rules, i.e. that any KG contradicting the intended semantics will be flagged by the rule system. At least for Allen’s interval algebra, I do not think that this is the case, see my next point. Computational complexity of the problem and the proposed rule system are not discussed. Some mechanics to avoid endless loops are implemented, but there is no discussion of the termination of the system as a whole.
In other words: almost all of the basic theoretical discussions we are accustomed to see for a theorem proving system are missing.
4. As far as I could understand, the rules for Allen’s interval algebra implement the propagation of relations. However, as Allen himself already discovered, this is not enough to detect all inconsistent sets of relations. This problem is in fact NP complete, see “Constraint Propagation Algorithm for Temporal Reasoning: A Revised Report” by Marc Vilain, Henry Kautz, and Peter van Beek (1990) for a discussion. I don’t think this exponential search is implemented in the proposed rules, so the method would have to be incomplete. In other words, the proposed set of 50 rules discovers some possible inconsistencies, but not all. Which ones? Are they a “good” subset?
5. The presentation of all rules using SHACL syntax makes them hard to read. Using a terser syntax would have made it possible to present most of the rules on one page. Granted, the possibility of executing the rules in SHACL is interesting but for most of the rules (if triples x,y,z are present, add triples u,v,w) this is unsurprising. A special treatment of encoding in SHACL-SPARQL would be needed only for some rules like e.g. 38.
6. There are various statements in the manuscript to the effect that something cannot be achieved using SHACL or property paths or… E.g., on page 15 “There is no way to extend the SHCAL shapes … to recognize zigzag patterns…” Proofs of non-expressibility are notoriously difficult, but the manuscript gives merely a rough intuition.
7. The authors observe that the expressivity of OWL is not sufficient to formalise the concepts of the Time Ontology. This is correct, but what about the various proposed temporal extensions of OWL (the manuscript mentions relevant literature)? Wouldn’t a temporally extended ontology language be a more natural choice than a constraint language?
8. Concerning the interplay of inference and validation: I would say that the right way depends on the circumstances. Thinking of a set of SHACL shapes as describing the shape of a KG, I may be interested in expressing its shape before the application of any reasoning. E.g. to determine whether inference is necessary. Also the application of rules until saturation may not always be wanted. Depending on the set of rules, it may lead to non-termination. When it does terminate, checking for saturation (i.e. no new facts are derived) may be both costly and not desired.
In other words: the interaction described is necessary for the method presented in this paper to implement a satisfiability checker in SHACL-SPARQL. I am not convinced that this is a worthwhile exercise.
9. More useful, in my opinion, would be to enhance the time ontology with an intended semantics explaining how time intervals correspond to intervals of real numbers. This is a declarative definition like an ontology should be, although outside the expressivity of OWL. The question of how to prove things about this theory (using SHACL-SPARQL or more conventional reasoning mechanisms) can then be treated separately.
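The declarative reading suggested in point 9 can be sketched as follows: an interval is a pair of numbers, each Allen relation is defined by comparing endpoints, and a set of asserted relations is satisfied by an assignment of endpoints exactly when every assertion holds. The function names and relation labels below are hypothetical; they follow Allen's usual terminology, not the property names of the Time Ontology.

```python
def allen_relation(a, b):
    """Return the Allen relation holding from interval a to interval b.

    Intervals are (start, end) pairs of numbers with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("proper intervals need start < end")
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "metBy"
    # From here on the two intervals share more than a single point.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "startedBy"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finishedBy"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlappedBy"

def is_model(endpoints, assertions):
    """True if the endpoint assignment satisfies every asserted relation."""
    return all(
        allen_relation(endpoints[x], endpoints[y]) == rel
        for (x, rel, y) in assertions
    )

endpoints = {"A": (0, 2), "B": (2, 5), "C": (1, 4)}
consistent = [("A", "meets", "B"), ("C", "overlaps", "B")]
contradictory = [("A", "meets", "B"), ("B", "before", "A")]

print(is_model(endpoints, consistent))
print(is_model(endpoints, contradictory))
```

Under this reading a knowledge graph is consistent when at least one such assignment exists. The sketch only checks a given assignment; searching for one is the hard part referred to in point 4.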

To summarise:
* Contribution 1 requires a more thorough formal treatment
* The findings of contribution 2 are relative to the task considered in the manuscript. For other uses of SHACL-SPARQL these recommendations may not be sensible.

Minor comments:

Throughout:

capitalise Section 1, Figure 2, Chapter 3, but lower case in “the following section” etc.

Footnotes after punctuation. “Time Ontology.\footnote{…}”

p2 l3: “finite state machines such as…” NuSMV is a model checker, not a finite state machine.

p2 l38 “temporal management”: should that term be explained? What does it mean?

p3 l5 “enough expressive” → “expressive enough”

p3 l15f: Much of the paper focusses on adding triples using rules and then checking for inconsistencies using constraints. SHACL is also used to ensure the presence of information, not only the absence of conflicts. How does that relate to your observations about the interplay between inference and constraint validation?

p3 l30: “Nor the available… execute” → “Nor do the available … execute”

p3 footnote 2 “extends” → “extend”

p4 l5 “focuses” → “focus”

p4 l9 “its aims”: it’s not really clear at this point what those aims are.

p5 l10 do you have any support for the claim that SWRL is not widely supported? https://en.wikipedia.org/wiki/Semantic_Web_Rule_Language#Implementations

p5 l11f: No single standard for temporal concepts is agreed on. But are the existing proposals compatible (just no agreement on vocabulary etc) or do they represent different conceptualisations of time?

p5 l12 “an” → “and”

p5 l 24 “results to” → “results in”

p5 l26 “RDFs” → “RDFS”

p5 l26 [38] is a handbook chapter. Please cite the recommendation for RDFS!

p5 l35–37 seems to contradict itself concerning what can and cannot be achieved

p5 l39 “ternary relations”: aren’t triples ternary?

p7 l7: “allows for the encoding of” → “allow encoding”

p7 l39 “it is easy to re-implement … OWL … as SHACL” An OWL reasoner needs to include case distinctions over different expansions as well as checking of blocking conditions to ensure termination. It may be possible to do this with SHACL but it’s definitely not “easy” and surely not efficient either.

p8 l16 “We repeat again” is a pleonasm (I also found that this piece of information is repeated quite a lot)

p8 l19 “defined as a string”: no, the lexical form of an xsd:dateTime literal is defined as a string…

p8 l25 “Gregoriang” → “Gregorian”

p8 l30 “in Greenwich” – UTC is not the timezone of Greenwich. The UK, including Greenwich, observes GMT in winter but British Summer Time in summer.

p8 l44 “just a mere” → “a mere”

p8 l45 “compulsory requires” – the adverb for compulsory is compulsorily. But “requires” already expresses compulsion. Also on page 12.

p9 l9 “pairwise clustered” – I don’t understand

p 9 l18–30 6 copies of the same sentence. Can be shortened.

Figs. 2–4 look like pixelated screenshots. They should be vector graphics.

p9–11, 3 pages on Allen’s interval algebra seems too much. The algebra is well known and discussed in the literature. Is Table 1 needed? If someone wants to check your SHACL rules in detail, they could refer to a presentation of the composition operation in the literature.

p11 Table 1: If included, there should be a reference.

p12 l 29: “if the same instant … two or more xsd:dateTime … these values must be the same” – If they’re all the same, then they are not two or more values. “Each instant can only be associated with at most one xsd:dateTime…” Note that something like “If there are two inXSDDateTime-triples, then…” wouldn’t work either because if subject, predicate and object are the same then it is only one triple.

p19 l22 “deducted” → “deduced”

p19 l36 “the neither objects of” → “neither of the objects of”

p20 l25 “its more abstract classes” → “… superclasses”

p21 l6 “will be already discussed”: already is strange for a future discussion.

p21 l23 “rules in (10) infers” → “… infer”

p40 l35 “reported” → “reproduced”?

p40 l48 “the the” → “the”

References

[15] ends with “year = 2013“

Review #3
By Jose Emilio Labra Gayo submitted on 09/Aug/2024
Suggestion:
Major Revision
Review Comment:

The paper describes an approach to apply reasoning about time based on the Time Ontology by leveraging SHACL rules.

Originality: As far as I know, the work presented in the paper about applying SHACL rules to the time ontology is original. Nevertheless, the idea on which it is based, which is doing inference using SHACL rules, is not so original and has already been proposed in several posts in the SHACL community like this one from 2017: https://spinrdf.org/shacl-and-owl.html At the same time, reasoning about temporal intervals like the ones defined by Allen and the time ontology is also not a new idea and there are already several works that have attempted it.

Significance of the results: The authors indicate two main contributions in the conclusions:

1. A new SHACL-based version of the Time Ontology: The authors indicate that the OWL representation of the Time Ontology is not satisfactory because it lacks operators for temporal data. However, it is not clear if this is something that could be added to OWL or if it is something that OWL cannot handle because there are limitations on the underlying description logic. I think it could be added, which would enable OWL reasoners to handle it, but maybe it cannot. If it can be added, then the argument of the authors should probably be rewritten; if it can’t, maybe they should indicate why not.
A significant part of the paper is devoted to presenting a translation of Allen’s temporal relations to SHACL-SPARQL rules. The resulting rules are quite verbose and involve a lot of details from SPARQL and SHACL syntax. I think a more concise syntax could be defined for the rules, which could then be translated to the SHACL-SPARQL code. This would avoid the extra cognitive load of embedding SPARQL code in SHACL’s Turtle syntax, which makes the paper harder to understand than necessary. As an example, rule 18 is defined as:

[rdf:type sh:NodeShape;
sh:targetSubjectsOf time:inside;
sh:rule[rdf:type sh:SPARQLRule; sh:prefixes ... ;
sh:construct """CONSTRUCT{?b time:before ?i}
WHERE{ $this time:hasBeginning ?b. $this time:inside ?i}"""]].

Using a Datalog-like compact syntax, that rule could be rewritten as:

before(b,i) :- hasBeginning(x,b), inside(x,i), inside(x,_) .

Which would make it easier to reason about the rule…and the main question would be about what are the new semantics of those SHACL rules that are different from other logic programming systems…which could be the second contribution of the paper.
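To make the reading of that rule concrete, here is a minimal sketch of one application of it as a join over a set of triples. The function name rule_18 and the sample data are illustrative assumptions; the predicate names are the ones appearing in the rule quoted above. The extra inside(x,_) atom of the Datalog version, which mirrors sh:targetSubjectsOf, is subsumed by the join on inside(x,i).

```python
def rule_18(triples):
    """before(b, i) :- hasBeginning(x, b), inside(x, i)."""
    beginnings = {}  # interval -> instants that begin it
    insides = {}     # interval -> instants inside it
    for (s, p, o) in triples:
        if p == "time:hasBeginning":
            beginnings.setdefault(s, set()).add(o)
        elif p == "time:inside":
            insides.setdefault(s, set()).add(o)
    # Join the two predicates on the shared interval x.
    return {
        (b, "time:before", i)
        for x in beginnings.keys() & insides.keys()
        for b in beginnings[x]
        for i in insides[x]
    }

data = {
    (":interval1", "time:hasBeginning", ":b1"),
    (":interval1", "time:inside", ":i1"),
    (":interval1", "time:inside", ":i2"),
    (":interval2", "time:hasBeginning", ":b2"),  # nothing inside: no output
}

derived = rule_18(data)
print(sorted(derived))
```

Written this way, a single application is an ordinary join, so any difference from other rule languages would have to come from how and when the rules are iterated, not from the individual rules.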

2. Novel insights about the interplay between validation and inference in SHACL: The authors describe how the SHACL rules can be used for temporal inference and explain the behaviour of the system that they have implemented based on SHACL rules. From my point of view, although I consider that the work is interesting, I think there is a need for an extra step in defining a formal semantics of the SHACL rules and indicating what is the difference between that semantics and the semantics of other rule-based systems. That would be a more significant contribution of the paper and it would help to shed some light on it.

The authors already indicate that the SHACL rules proposal needs to be changed: instead of checking the shapes and then doing inference, as proposed in the SHACL rules working draft, they propose to do the SHACL-rules inference first and then apply the SHACL shapes to check the constraints. The authors acknowledge that this approach could generate infinite loops if rule authors don’t take enough care when writing the rules, but I think it may be relevant to understand the semantics of the rule-based system that is obtained from this combination of SHACL rules. In the end, it is not clear to me what the difference is between defining the rules using SHACL-SPARQL syntax and defining them with SWRL, RIF or any other rule-based system that could be applied. I think that if the paper wants to provide a significant contribution here, it must be rewritten to describe the semantics of the new system based on SHACL rules and the differences between that system and other rule-based systems that have already been proposed in the past. Is it just a syntactic difference or is there a semantic difference?

In that sense, the authors compare their approach with OWL and they indicate that “reasoning over large knowledge graphs with temporal information cannot be achieved using OWL reasoners” (page 5, line 36). I think that statement is too strong and I would like to know why the authors make it. Is it because of the complexity of the reasoners? Or is there some fundamental problem in description logics or in the Open World Assumption that makes them unsuitable for temporal reasoning? Or is it possible to add something to OWL to make it applicable to temporal reasoning? I wonder if the authors could provide more insights to answer those questions, especially as the second author has published the paper “Temporal representation and reasoning in OWL 2”. I think it would make sense for the authors to give a more in-depth comparison of the pros and cons of OWL vs SHACL rules, one which is not just about some practical problems of existing implementations and focuses more on the underlying semantics of both formalisms.

Quality of writing: The paper is well written and it has a lot of nice pictures and examples that help understand the different aspects of the Time Ontology. Nevertheless, the authors chose to present the SHACL rules using the Turtle syntax that embeds the SPARQL queries as strings, which makes those rules very verbose and takes a lot of space in the paper. Although I appreciate that the paper is based on a real implementation with its GitHub repository which includes those rules, I think the paper could be more readable if the authors used a more concise syntax for the rules and for SHACL following, for example, an abstract syntax like the one employed in the paper Semantics and Validation of recursive SHACL, https://doi.org/10.1007/978-3-030-00671-6_19, which could help understand the semantics of the SHACL fragment employed by the authors. In fact, I think those rules could just be presented more concisely with a logic programming syntax, which would make it possible to present a formal semantics of the rules and the implications of iteratively applying those rules, which can create infinite loops. I have the impression that the language obtained by the iterative application of those rules is not different from traditional rule-based languages like SWRL. The main comparison between the work presented in the paper and SWRL is just a line saying that “SWRL has been used for representing temporal reasoning rules but corresponding approaches were not efficient.” I think the authors should do a more in-depth comparison of why SWRL was not efficient and their approach is more efficient…is it because they define fewer rules in their system or is there some fundamental change in the SHACL-rules based approach from SWRL?

Availability of the resources: The paper points to an implementation available in a public github repository: https://github.com/liviorobaldo/TimeOntologyInSHACL. The repository contains both the rules, the data and the Java code that can be used to obtain the results. I was able to run the code and I appreciate the efforts of the authors to make this resource available.

Some more detailed comments:

Page 2. Line 45: “another has been ...” should be “another has been ” (W3C doesn’t publish standards, it publishes recommendations)

Page 4. Line 5, “Will then on describing the Time ontology…”

Page 4, line 43. “Many properties of are dynamic…”

Page 5, line 10, the authors seem to indicate that a problem of SOWL and others is that they are based on SWRL which is not a widely supported W3C recommendation for rule definition…but SHACL-rules is also not a W3C recommendation, SHACL-rules is a working group note which is also not a recommendation.

Page 5, line 13. “Common vocabularies an definitions…”

Page 5, line 21: “the temporal ...”

Page 5, line 26, “the semantic web ...”

Page 5, line 37, “cannot be achieved using OWL reasoners” that statement is too strong and I would like the authors to provide more details about what are the real problems of OWL to achieve it.

Page 5, line 49, “with specific ontology” -> “with a specific ontology or “with specific ontologies…”

Page 6, line 9, “SWRL has been used to representing temporal reasoning but corresponding approaches were not efficient” requires more details about why those approaches were not efficient and why SHACL rules are more efficient…in fact, I suggest that the authors provide a more in-depth comparison between their approach and SWRL or other rule-based approaches.

Page 8, line 16, “which associate instances…”

Page 8, line 17, “we repeat again that …xsd:dateTime the single…”, if the sentence is a repetition, could it be removed?

Page 10, line 13, refers to figure 4, but I think it should refer to figure 3, and line 21, the same. The captions of figure 3 and figure 4 are the same, I think the caption of figure 4 is wrong.

Page 11 line 50, “this is due to the limitation of OWL…does not include constructs to compare and work with temporal data”, I wonder if it is a real limitation of OWL as those constructs could be included, maybe in a new version of OWL or if there is some fundamental problem of OWL. I also wonder if those definitions could be encoded as intervals and do the reasoning at a lower level comparing the decimal values of the start and end time of those intervals.

Page 16, line 45, “also cover the property after

Page 19, line 24, “an instant and ends of itself…”

Page 22, line 48, “property inside in a chain of properties…”