Review Comment:
The article proposes an approach for context-aware reasoning over
Wikidata, where the context is given by qualifiers on Wikidata
statements. The approach groups qualifiers into several different
categories and considers a many-sorted first-order logic setting,
where each qualifier category corresponds to a sort and is accompanied
by a background theory, specified in CASL. Built on this, the article
presents several inference rules over qualified statements. Finally, a
possible implementation is outlined where rules are compiled to SPARQL
construct queries, with the background theory implemented as
JavaScript functions in the triple store.
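To make my later comments on the module functions concrete, here is the kind of background-theory helper I would expect such a JavaScript function to be — a minimal sketch of validity-interval intersection; the function name and the {start, end} interval representation are my own assumptions, not taken from the authors' code:

```javascript
// Hypothetical sketch of a background-theory helper: intersection of
// two validity intervals. A missing bound is treated as an open end.
function intersectIntervals(a, b) {
  const start = a.start === undefined ? b.start
              : b.start === undefined ? a.start
              : Math.max(a.start, b.start);
  const end = a.end === undefined ? b.end
            : b.end === undefined ? a.end
            : Math.min(a.end, b.end);
  // Disjoint intervals: no common validity context.
  if (start !== undefined && end !== undefined && start > end) return null;
  return { start, end };
}
```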
I am not particularly happy with the choice to base the approach on
the RDF dumps of Wikidata, rather than on the Wikibase data model
itself. This gives rise to some semantic inaccuracies, such as
calling, e.g., “prov:wasDerivedFrom”, “wikibase:rank”, or “rdf:type”
qualifiers (they are not qualifiers; in the RDF representation they
encode references (which can, indeed, have their own qualifiers!),
statement ranks, and a variety of other things). Furthermore, it excludes,
e.g., the special “no value” value (used, e.g., in conjunction with a
“follows” qualifier to signal the start of a sequence) from some of
the inference rules, as these don't correspond to nodes in the RDF
dump, but rather lead to “stmt a wdno:P…” triples. This RDF influence
is also reflected in Definition 1, where the first three sorts of the
single predicate symbol are all “resource”. However, resource
explicitly includes only Wikidata items (by virtue of mentioning
Q16222597) and “literal denotations” occurring as “subject, object, or
qualifier value in a statement”. In particular, this does not include
other Wikidata entities, such as properties (yet any statement in
Wikidata always has a property in the predicate position, and
properties can also appear as subjects or objects), but also excludes,
e.g., IRIs as objects, while still allowing for, e.g., literals as
subjects (which is neither allowed in Wikidata statements nor in RDF
triples).
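To make the “no value” issue concrete, here is a toy JavaScript sketch (the triple encoding is my own illustration, not the authors') of how a novalue snak surfaces in the dump:

```javascript
// In the Wikidata RDF dump, a “no value” snak yields no value node at
// all, only a class-membership triple “stmt a wdno:P…”.
const WDNO = "http://www.wikidata.org/prop/novalue/";

// A statement with “no value” for P155 (“follows”), signalling the
// start of a sequence: there is no object node carrying the value.
const triples = [
  ["s:Q76-abc", "rdf:type", WDNO + "P155"],
  ["s:Q76-abc", "pq:P580", "1961"],
];

// Rules that match on qualifier *values* cannot see the novalue snak;
// it is only detectable as a type triple on the statement node.
function hasNoValue(stmt, prop, ts) {
  return ts.some(([s, p, o]) =>
    s === stmt && p === "rdf:type" && o === WDNO + prop);
}
```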
Some of the proposed inference rules don't seem to be well-typed (I
might be misreading the CASL specification, but in that case some
explanation in the article would clearly be required): e.g., the
marriage and death rule passes the date (of sort resource) to the
interval function, which takes either an instantTime or a duration as
its second argument. Another example is given by the sequence rules,
which call startTime/endTime (expecting sort timeInterval) on values
returned from extractTime (returning sort time).
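A toy sort-checker makes the concern explicit; the signatures below reflect my reading of the article's CASL specification, and if that reading is wrong, the intended typing should be spelled out in the article:

```javascript
// Expected argument sorts per function symbol (my reading of the spec).
const signatures = {
  // interval : instantTime x (instantTime | duration) -> timeInterval
  interval: [["instantTime"], ["instantTime", "duration"]],
  // startTime/endTime expect a timeInterval argument
  startTime: [["timeInterval"]],
  endTime: [["timeInterval"]],
};

function wellTyped(fn, argSorts) {
  const expected = signatures[fn];
  return argSorts.length === expected.length &&
    argSorts.every((s, i) => expected[i].includes(s));
}

// The marriage and death rule passes a date of sort “resource”:
wellTyped("interval", ["instantTime", "resource"]); // false: ill-typed
// The sequence rules call endTime on an extractTime result (sort “time”):
wellTyped("endTime", ["time"]); // false: ill-typed
```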
Other inference rules don't seem to be particularly useful:
“equivalent property” and “equivalent class” are used on Wikidata
exclusively for alignment with external ontologies, i.e., they should
(barring modelling errors) never have Wikidata entities as
objects. Thus, e.g., the second “equivalent property” rule will only
ever match if the conclusion is already present in Wikidata, whereas
the first and third rules cannot infer a legal statement.
Yet other inference rules lack explanations for the design choices
made: E.g., why does the “subclass of” rule check that the validity
contexts intersect, but the “subproperty of” rule does not? More
generally, a lot of the material in the article is presented as-is,
without any discussion of the choices and possible trade-offs made. As
another example, the authors mention that the “country” qualifier is
only sometimes used to signal validity, which might suggest that
assigning each qualifier to a single sort is too restrictive (and it
is not immediately obvious to me why having a single sort for all
qualifiers could not work).
I am also wondering what exactly can (and cannot) be expressed in the
proposed formalism. For example, consider the sequence rules and the
Obama example: Suppose we have a statement saying George W. Bush is
replaced by Barack Obama (with series ordinal 43) and a statement
saying that Donald Trump replaced Barack Obama (with series ordinal
45). Could we have a rule inferring the statement from Figure 1?
Furthermore, would it be possible to _only_ infer this statement, and
not also two statements with just “replaces” and “replaced by”,
respectively (or even just not infer these two statements if the one
shown is already present)?
Some of the code, data files, and statistics are provided, but no
long-term link to these resources is given explicitly. The data seems
to be hosted at the authors' university, and it is not obvious
whether this is suitable for long-term storage. While, e.g., the
results of compiling some of the rules to SPARQL construct queries
(and the compiler itself) are included, I have not been able to find
the rules themselves, and the syntax used is not documented (judging
from the grammar rules in the compiler, it is definitely not the
FOL-based syntax used throughout the article). I would also have
expected to find the CASL specification among the data (although it is
also printed in the appendix, having it available as a file would
simplify its use), but have not managed to find it.
I have also not been able to figure out how the provided
ModuleFunctions correspond to the CASL specification: some of the
functions specified are not present (e.g., timeValidity), others (such
as endTime) don't seem to match the specified behaviour on undefined
input. Unfortunately, the article contains no details on this.
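For instance, for endTime I would have expected something along these lines — a sketch of my expectation of a module function matching a partial-function reading of the CASL specification, not the authors' actual code:

```javascript
// Sketch: endTime as a partial function, with undefined input (and
// intervals lacking an end bound) propagating to an undefined result.
function endTime(interval) {
  if (interval === undefined || interval === null) return undefined;
  return interval.end; // undefined when the interval is open-ended
}
```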
The writing is easy to follow, though I have found some issues (see
below for a – possibly still incomplete – list). Two things that
should be standardised, however, are (i) spaces/no spaces in front of
footnote markers (compare, e.g., footnotes 1 and 2) and (ii) the
format used for dates (e.g., 9 June 1732, 25/11/1991,
16-02-2022). Also kindly note that the correct capitalisation is
“Wikidata”, not “wikidata”.
While reasoning about qualifiers is a very interesting topic, I cannot
recommend the article in its current form for acceptance. I can
imagine several distinct directions that could eventually lead to a
nice article: A comprehensive exploration of what kind of inference
rules can and cannot be expressed in the multi-sorted formalism is
certainly one such direction. A rigorous, formal specification of the
Wikibase data model and a well-designed background theory might be
interesting in their own right. Alternatively, a working pipeline as
outlined in the “Implementation” section, where I could load a
Wikidata dump, write some inference rules, and compute the inferred
statements would be a welcome contribution.
p1, l41: drop the space before “)”
p1, l44: “Where” doesn't begin a new sentence here, so should not be capitalised.
p2, l32: “If it is relatively” ~> “While it is relatively”
p3, l12: “intersection of statement” ~> “intersection of statements”
p3, l38: “Georges” ~> “George”
p3, l47: Note that properties with a property usage example always occur in a qualifier position (as this is how usage examples are modelled), but that does not make them qualifiers (indeed, usage as qualifiers might be forbidden by a property scope constraint).
p4, l9: “statement ;” ~> “statement;”
p4, l31: “2008” ~> “2009”
p4, l50: “are :” ~> “are:”
p6, l6: “Divorce being the value of the end cause qualifier” is an incomplete sentence
p6, l32: “constraint(P2302)” ~> “constraint (P2302)”
p7, l1: “an knowledge” ~> “a knowledge”
p7, l9/l13: this should be the same predicate symbol in both places
p7, l26: “)[” ~> “) [”
p8, l37: “provenances is” ~> “provenances are”
p10, l1: “Prominance” ~> “Prominence”
p10, l38: “level( i.e.” ~> “level (i.e.,”
p10, l42: “value of sorts” ~> “values of sorts”
p10, l43: “rules takes” ~> “rules take”
p10, l44: “atoms ψ contains” ~> “atoms and ψ is”
p11, l10: “qualifiers categories” ~> “qualifier categories”
p11, l47: “P_2) .” ~> “P_2).”
p12, l5: “describe more” ~> “further describe”
p12, l23: “dismiss also” ~> “also dismiss”
p12, l31: “Inspired from” ~> “Inspired by”
p13, l2: “qualifier” ~> “qualifier value”
p13, l9: “For example,” is an incomplete sentence
p13, l24: “dateofdeath” ~> “date of death”
p14, l39: the triples need spaces between the components
p15, l26: “lattices structures” ~> “lattice structures”
p16, l36: “subproperty property” ~> “subproperty of”?
p17, l25: “Type constraint” ~> “Subject type constraint”
p17, l50: “contaisn” ~> “contains”