Review Comment:
# Overall evaluation
The paper introduces so-called RDF Surfaces, which are an RDF serialization Peirce's (Beta) existential graphs.
These graphs can be viewed as an alternative syntax for First-Order-Logic (FOL), using only conjunction, negation and existential quantification.
The proposal focuses on FOL with binary predicates (or maybe up to binary?)
I find the idea interesting.
The serialization format uses blank nodes for (first-order) variables, and nested graphs to indicate the scope of negations.
The format has been designed in such a way that variable declarations are distinguished (using N3S lists) from their occurrences.
Besides, free occurrences of blank nodes are treated as existentially quantified.
An interesting benefit of this latter choice (and strength of the proposal overall) is a natural integration with the interpretation of plain RDF, where blank nodes are already existentially quantified (however, I am not an expert of the different semantics of RDF, some I may be missing some subtlety here).
Overall, RDF Surfaces look like a reasonable way to encode FOL formulas in RDF, with a very limited amount of extra syntax or special vocabulary (i.e. dedicated IRIs), as opposed for instance to the verbose serializations available for various fragments of OWL.
Overall, I am convinced that this format may be useful to share machine-readable FO formulas while complying to Semantic Web standards.
I am less convinced about readability (for a human being).
The reason is that the usual syntactic sugar (universal quantification, disjunction and implication) available for FOL is missing.
In other words, like Peirce's existential graphs, this serialization format heavily relies on nested negations, why are notoriously hard to parse (especially in combination with quantification).
This is exemplified in the paper itself, by a few mistakes made by the authors when interpreting their examples.
However, I am also aware that Peirce's graphs have their proponents, and may be more readable that FOL for simple formulas (with no or limited quantifier alternation).
Besides, a mostly machine-readable serialization format is probably sufficient for many applications.
So I would be curious to see this proposal published and further discussed within the Semantic Web community.
My main concerns do not pertain to the proposal itself, but to the presentation:
- The syntax and (most importantly) the semantics of the proposed language are nowhere defined.
Instead, some long and half-informal paraphrases are provided, which are often redundant.
I suspect that the paper would significantly gain in clarity (and succinctness) by using instead standard model-theoretic semantics for FOL (over possibly infinite models).
Alternatively, the authors could use a proof-theoretic semantics, e.g. based on Peirces rules.
But this would require providing at least the full system for Beta graphs (a.k.a. FOL).
Currently, only the system for Alpha graphs (a.k.a. Boolean logic) is included in the paper.
- Simple things are often obfuscated (i.e. made artificially complex), although I do not think that this is on purpose.
RDF surfaces are relatively straightforward (which is by no means a criticism).
But they are introduced in mostly literary/philosophical terms.
This generates noise and ambiguities (inherent to natural language), which could be easily avoided with a formal definition.
Some illustrations are also unnecessarily complex, sometimes misleading (see below).
Another example of obfuscation is Pierce's calculus, which arguably serves no purpose in this paper (see below), but is repeatedly used to prove relatively straightforward logical implications.
- Part of the content feels out of place in a scientific paper (but would fit nicely in a tutorial or introduction to logic).
For instance, the article contains several long (and naive) illustrations of elementary boolean equivalences (such as De Morgan's laws), or of the fact that a logical formula can be paraphrased in several ways.
These are well-known, and often come down to common sense.
- The authors do not seem familiar with computational logic and automated reasoning, as evidenced by a relatively large number of approximations, incorrect claims, or unnecessarily convoluted examples or definitions.
I would suggest the following revision:
- Focus on the core of proposal, i.e. the definition of RDF Surfaces, and make it precise (syntax + semantics).
I suspect that this may by itself be an valuable contribution, sufficient for a publication.
I would also keep the two running examples as motivations.
- Drop all content related to reasoning (decidability, complexity) or inferences, and leave these considerations for future work.
Unless I missed them, there is currently no meaningful contribution in this domain, and too many related errors.
- Drop the content related to the (in-progress) implementation.
Currently (i.e. without a quantitative evaluation), this is a proof-of-concept implementation.
However, there are already mature systems available for the computational task at hand (satisfiability of a FO formula).
I would rather work on a dedicated article where the performances of the different systems are compared (together with a baseline).
In particular, it would be interesting to evaluate the system called Latar (briefly mentioned in the paper), and provide details about its implementation.
- Drop all approximative, visionary, emphatic and/or poetic claims about classical negation, full FOL, undecidability of human knowledge, etc.
If anything, these have a detrimental effect on the credibility of this work.
In particular, I do not think it is necessary to argue in favor of (or against, for what matters) full negation or FOL (a running example should be sufficient).
- Ask a colleague with more experience in computational logics to proof-check the paper (given the affiliation of some of the authors, this should not be difficult). I listed below some related issues, but cannot claim that this list is exhaustive.
# Suggestion, remarks, questions
## General
- I am confused by the angle taken in Section 1-3 to introduce RDF surfaces.
These 3 sections (mostly) focus on the need to express classical negation.
This is also reflected in the title of the paper.
This arguably misrepresents RDF surfaces, which can not only express negation, but also capture (a fragment of?) FOL.
- I would not spend so much space arguing in favor of an expressive representation language.
Especially with obscure arguments like "A true Semantic Web should encompass more than just information and logic that machines can process".
It is already well-known (especially in the SW community) that each knowledge representation language strikes a certain balance between expressivity and computational properties.
It is also widely accepted that several of these languages may coexist, each tailored to certain tasks.
- The purpose of Pierce's rules (R1 to R4) in this paper is unclear.
These rules are only used to demonstrate that some well-known (and rather intuitive) Boolean equivalences hold.
This only makes Sections 4 and 5 artificially harder to read.
Instead, it would be interesting to introduce the full system (for Beta graphs) in a dedicated paper, and discuss how it has been implements with the system called Latar (together with related optimization techniques).
## Introduction
- Page 2: "create statements about zero or more resources: universal quantification."
This is an arguably misleading way to explain universal quantification ("zero or more", as opposed to "one or more").
- Page 2: "Second, democratic and social reasons can be provided to desire classical negation on the Semantic Web."
I would stay away from this kind of vague arguments.
- Page 3: "However, this is unsafe in an open world".
I do not understand what is meant here.
- Page 4: "undecidable for finding all valid inferences from a knowledge base" and "finding all correct inferences".
The set of consequences of a knowledge base is in general infinite, so decidability does not make sense here.
I guess the problem that the authors refer to is not "finding all inferences", but deciding whether a formula is implied by a knowledge base.
- Page 4: "can be safely processed by machines in isolation".
Note sure what "in isolation" means here.
My blind guess is that this has to do with parallelization.
But I was not aware that this had an incidence on the design of OWL 2 DL.
Some clarification would help.
- Page 3: "A safe execution makes computational reasoning tasks tractable".
A task is either tractable or intractable.
At first sight, this has nothing to do with a specific execution ("safe" or not, whatever this means).
- Page 3: "OWL2 DL and its sublanguages need to abandon the semantics of RDF when expressing Web logic [...] Direct Semantics disregards the RDF nature of the OWL2 DL formulas in its semantics."
This is extremely vague (what exactly is "Web logic"?), and introduces more confusion that it helps.
I would either make these considerations about decidability and tractability more precise, or drop them completely (probably the latter).
- Page 4: "Billions of websites and Web applications exchange HTML, CSS, and JavaScript code in combination, fully Turing complete, thus providing a daily undecidable halting problem on the Web stack. Despite this, the Web has continued to evolve and thrive."
Unclear exactly what the argument is here.
Most programming languages (including JavaScript) are Turing complete.
This has nothing to do with the Web.
I would rather make this more precise, or drop this argument (probably the latter).
- Page 4: "We do not assume that a single machine will solve the undecidable problem".
Unclear why "single" machine, and what "the undecidable problem" refers to.
- Page 4: "one of these features can be dropped to turn an undecidable problem into a decidable one where machines can assist humans in decision-making."
I would drop this as well (does not make any sense).
- Page 4: "a Semantic Web that is closer to enabling anyone to express any statement about any topic requires [...] providing the full expressivity of FOL".
Unclear to me why FOL only, and not modal logics, higher-order logics, etc.
- Page 4: "the full expressivity of FOL (including universal quantification)"
No need to specify "including universal quantification" (this is already part of FOL).
## Section 3
- Table 4: unclear what "logic" means in "RDF data & logic"
## Section 4
- Page 10: ("true" in the relative sense, not an absolute truth)
I do not understand what is meant here.
- Page 12.
Equation 2 (and the explanation that precedes) seems unnecessary.
- Page 13.
I doubt it is necessary to explain that disjunction and implication can be encoded with a combination of conjunction and negation.
In particular, Figure 5 seems unnecessary.
It is folklore in computer science that the set {NOT, AND} is functionally complete (i.e. can simulate all boolean operators).
See for instance this Wikipedia page:
https://en.wikipedia.org/wiki/Functional_completeness
- Page 13:
"denotes that some identity exists in the Beta system", "These heavy dots can be extended into a 'line of identity'" and "denote the same identity".
I am confused by the term "identity" here.
Is this maybe some philosophical notion?
In the first and third occurrences, it seems like "identity" is used in place of "entity".
In the second occurrence, it looks like "line of identity" simply stands for "dotted line".
If this is correct, then I would simply use "dotted line" or "line".
- Page 13: "indexed heavy dots".
The term "indexed" is most commonly used to refer to indexing with natural number (or more generally with elements from a countable set).
I suspect that "labelled" would be more standard here.
- Page 13: "(possibly indexed)" heavy dots.
Unclear why "possibly": in the examples, all dots have labels (even when the formula has a single variable).
- Page 13:
Unclear why a dot is called a "heavy dot" (there is apparently no other type of dot).
- Page 14:
Again, showing that universal quantification can be encoded with negation and existential quantification seems unnecessary (this is well-known).
- Page 14: "A heavy dot on a surface with parity 1 should be interpreted as a negated existentially quantified variable".
There is no such thing as a negated variable in FOL: what is negated here is a statement (of the form "\exists x \phi(x)").
I would simply drop this sentence (it is already clear from what precedes that nested surfaces encode negation).
- Page 14: "For the Beta system, diagram rules can be provided".
The most interesting aspect of these rules is omitted: they form a complete calculus for FOL.
More importantly, these rules are not provided in the paper, whereas the ones for boolean logics are.
This is an arguably questionable choice, given that the paper is about FOL, not Boolean logic.
I would either:
- provide the whole system, and use it, or
- completely omit this diagram rules.
Given the current content of the paper, I feel like the second solution is more appropriate.
- Page 14: "The diagram notation A...F should be interpreted as: $\exists x A(x) \land F(x)$".
Unclear why: there is no explicit variable in this diagram, whereas there is one in Figure 6.(a).
- Figure 8:
Writing down the FOL formulas would really help here.
My blind guess is that there is an additional implicit variable not present in the diagram (for some reason).
If this is correct (not sure), then the formula for Figure 8.a would be
$\exists z: (
A(z) \land F(z) \land \neg \exists x:
F(x) \land \neg \exists y
K(y,x)
)$
But this would be an arguably strange formula, where "z" is unrelated to the two other variables.
## Section 5
- Page 16: "Using the diagram rules from Section 4.4, we can deduce...".
The diagram rules are arguably overkill here.
The formula that immediately precedes is of the form $A \land (A \to B)$, so a standard modus ponens (which is arguably more intuitive) would do.
More generally, I wonder if Pierces calculus needs to be introduced in this paper, given that the contribution pertains to knowledge representation, is not reasoning (let alone algorithms).
As rightfully observed (later in the paper) by the authors, any theorem prover could be used here.
- Definition 5.1.
I would clarify (immediately) what a "surface type" is (e.g. it can be either "positive" or "negative").
Currently, the only indication is that a surface type "translates to a surface interpretation" (whatever "translates" means here), but it unclear what a surface type is to start with.
More generally, the whole section would benefit for a more formal notation (the intended meaning is sometimes hard to reconstruct).
- Definition 5.1.
It looks like a "graffiti node" is just a variable (as witnessed by the terminology used in the paper e.g. "free" VS "bound").
The fact that these are serialized as blank nodes is probably not relevant at this stage of the paper.
If this is correct, then I would use the term "variable" instead "graffiti node", to avoid unnecessary confusions.
- Page 16:
"These graffiti nodes are scoped in H in such a way that if a deeper nested Hayes graph exists that contains graffiti nodes using the same label as a parent Hayes graph, this deeper nested Hayes graph creates a new logical scope for the graffiti node with that label."
If I read this correctly (not sure), this is a direct consequence of Definition 5.2.
If so, then I would drop this (unnecessarily convoluted) sentence.
- Page 16:
"If a blank node in a RDF triple does not have such a coreference to a graffiti node, it is said to be free."
This is already part of Definition 5.3.
I would drop this sentence (redundant).
- Page 17:
"i.e. its content has the value 'true'" and "i.e. its content has the value 'false'".
This formulation is very approximative.
I would drop these explanations, and provide a formal semantics instead.
- Figure 10: "Every A that indexed B has some C that accredits B"
This paraphrase is arguably misleading:
- B is not quantified in this sentence, and
- the relation "has some" (between A and C) dose not appear in the diagram.
A more accurate paraphrase could be
"Every indexed B is accredited".
Or if we really want to mention all variables:
"If some A indexed some B, then B is accredited by some C".
- Figure 10:
The triple `:WOS :indexed :JournalA` seems unnecessary in this example (I would keep examples minimal to avoid noise).
- Page 17: "All the nested surfaces are negative, i.e. its [sic] content has the value 'false'.
This is arguably confusing.
According to this claim, in the boolean formula $\neg\neg A$, the subformula $A$ "has the value 'false'", even though the whole formula is equivalent to $A$.
Similarly, in Figure 10, the innermost surface (with parity 2), would "have the value 'false'".
Again, I feel like the issue comes from using natural language (which is inherently ambiguous) to describe the semantics of RDF surfaces.
In this example, "has the value 'false'" is simply too vague.
- Page 17: "i.e. either in the Hayes graph of the positive surface or in that of a nested Hayes graph".
This is unnecessary, and makes the sentence hard to parse.
I would drop it.
- Page 17:
"To achieve this, we create an implicit existential closure by capturing these seemingly ’free’ blank nodes within RDF in the set of graffiti of the top-level, positive Hayes graph, ’pinning down’ their existential nature."
This is an unnecessarily complex/literary way to say that free variables are treated as existentially quantified.
In particular, in this sentence, "within RDF", "pinning down" and "existential nature" are unclear.
Again, a simple formal semantics would help define things in a succinct and non-ambiguous way.
- Page 17:
"as existential quantified variables that can be transformed into universal quantified variables".
This is approximative, and in my opinion unnecessarily confusing.
It is always possible in an FOL formula to "transform" and existentially quantified into a universally quantified one, and conversely, because
existential and universal quantification can be defined in terms of each other (by introducing negations).
- Page 17, Equation 7.
A more idiomatic translation for the second conjunct would be:
$\forall a,b: -> \exists c: $
(pushing the existential quantifier down the algebraic tree).
This is also a more straightforward translation of the surface of Figure 10.
- Page 17: "Applying the diagram rules of Section 4.4., ..."
Again, I would stay away from reasoning procedures in this paper.
This is a complex topic, and arguably out of scope.
There are already many well-studied complete calculi for FOL (and many implementations, as observed by the authors), so showing an application of Pierce's rules does not bring much (if anything, this would be material for an introductory course to computational logic).
Especially given the fact that the set of rules provided in the paper for Beta graphs is incomplete.
In this example, is is sufficient to say that the formula $\exists $ is a consequence of the above formula/surface, and maybe give a serialization of this formula as a surface (although I am not sure this is necessary).
- Page 17: "When multiple facts are available, the diagram rules from Section 4.4 can be repeatedly applied to make derivations, ..."
For the same reasons as above, I would skip this.
- Page 18: "We aimed to stay close to the Semantic Web ideal that everything can be expressed as triples".
This is probably be due to my limited knowledge of concrete RDF syntaxes, but I do not understand how the examples (in so-called "N3S" syntax) that are provided in the paper qualify as sets of triples.
More precisely, the syntax uses so-called N3 "formulae" (visually, curly braces), which stand for unnamed graphs.
- Page 18: "To entail the RDFS version, an extra formula can be introduced that explains ..."
If the scope of the proposal is to capture FOL (a.k.a. Pierce's Beta graphs), then I would stay away from this type of encoding.
The issue here is that the formula quantifies over (unary) predicates (the variables x, y and z in the formula).
Allowing this type of quantification may lead to capturing fragments of second-order logic.
- Page 18: "The usage of N3 graph terms is restricted so that ..."
A formal grammar would be more precise (and arguably easier to read) than this kind of complex sentences, which provide an incomplete description, and are inherently ambiguous.
- Page 18 and 19:
It looks like the word "on" is highlighted on purpose in two different ways (either quoted, or emphasized), with slightly different meanings (the former for triples, and the latter for graffiti nodes).
However, this conflicts with usual conventions in scientific papers, where a term is emphasized only once, when it it introduced (typically together with its definition).
If these two "on" have different meanings here, then I would different terms, for clarity.
For instance, the emphasized "on" (for the scope of a graffiti node) may be replaced with "local to".
- Page 20: "which graffiti nodes on lines 6 and 10 should bind?"
Unclear what it means for a graffiti node to "bind" (without object).
I guess the question here should be
"which graffiti nodes do the blank nodes on lines 8 and 11 refer to?"
It also looks like that the answer is fairly obvious.
Scoping here works like in FOL (among others).
So I wonder if this whole example (beginning of Section 5.2, up to Line 30) is useful.
Again, a formal semantics (currently missing) should be enough.
- Page 20: "in a much bigger world".
I do not understand what this means.
- Page 21: "These contradictions are explicitly available by the semantics of RDF Surfaces and are not hidden as in the current RDF model".
I do not understand what these "hidden" contradictions refer to.
More importantly, there is currently no semantics of RDF surfaces (only a half-informal description), so the meaning of "explicitly available" is unclear.
- Page 21:
"One could be tempted to create a negation to simulate NAF as in Listing 4. However, Listing 4 does not entail `:Surface :is :JournalLess`."
I do not understand what this example has to do with negation as failure.
The fact that `:Surface :is :JournalLesd` is not entailed can be explained by the implicit disjunction (or in the paper's terminology, by two surfaces with parity 2 in the same surface with parity 1).
Precisely, in FO, the formula can be simplified as:
```
BookABC a :Book \land
BookDEF a :Book \land
(
(\forall A: A a :Journal)
\lor
(:Surface :is :JournalLess)
)
```
Then plain boolean logic tells us that
$A \and (B \lor C)$
does not imply
$C$
I also find this example unnecessarily confusing, because:
1. The two triples in the outermost surface apparently serve no purpose.
2. the IRIs `:Journal` and `:JournalLess` suggest some implicit semantic relation between these terms, whereas they could be any class and constant respectively, unrelated to each other.
3. The IRI `:Surface` suggests a reserved term in the proposed language (like `log:inNegativeSurface`), whereas it could be any constant.
This is an illustration of what I meant by "obfuscation" earlier in this review.
I can see the value of such examples in other contexts (e.g. a course on semiotics).
But in a CS paper, I feel like it only increases the odds of a misunderstanding.
- Section 5.4.
I would drop this whole section.
It is already well-known that universal quantification, disjunction and implication can be encoded with existential quantification, conjunction and negation.
If anything, I feel like this section does a disservice to the proposal.
The reason is that it takes a sentence that can be naturally expressed in FOL, and shows how to serialize it in a less readable way.
- Page 22.
The initial FOL formula is artificially hard to parse.
A more idiomatic way to write FOL consists in pushing quantifiers down the algebraic tree (when possible).
This often yields formulas that are closer to natural language.
I this case:
$\forall x ⟨x a :JournalArticle⟩ \lor (⟨x a :Preprint⟩ \land \exists y ⟨y :reviewed x⟩)$
- Listing 5: "Any article is a journal article or a preprint, and some S reviewed it, or both".
This paraphrase is not correct (whereas the three other provided in this section seems correct).
- Page 20: "Listing 5 can be read in many ways..."
No need to provide these 3 paraphrases, this is elementary logic.
Again, this would be relevant in an introductory course, but is arguably out of place in a publication.
- Page 20: "that was also reviewed by someone or both".
The "or both" feels redundant here.
As far as I know, it is conventional (in CS or maths) to treat "or" as a non-exclusive or.
- Page 20: "RDF Surfaces makes the equivalence of all these readings explicit."
I do not understand what is meant here.
If anything, the RDF surface in this example makes the initial formula more difficult to read.
- Page 23: "Applying the diagram rules of Section 4.4 the following deduction is possible: ...".
Again, for the reasons mentioned above, I would drop all the content related to automatic reasoning.
- Page 23: "the deductive closure of Listing 5 contains the RDF triples".
Unclear what "deductive closure" precisely means here (in the absence of formal semantics).
- Listing 6: The encoding has four levels of nested negations, which is really hard to parse.
Two levels are sufficient here:
$\neg \exists x ( ⟨x a :Publication⟩ \land \neg ⟨x a :Journal⟩ \land \neg ⟨x a :Book⟩ )$
- Figure 12:
I did not check in detail this example, because:
- the full deductive system for beta graphs is not provided, and
- for the reasons mentioned above, I would recommend dropping this whole section from the paper.
## Section 6
- page 25: "Using four relatively simple rules, authors of RDF Surfaces can make simple derivations and check the consequences of surface logic by hand".
I think that this only holds for boolean logic, i.e. additional rules would be needed to "check the consequences of surface logic by hand".
- Page 25: "complex implications can be composed using a copy-and-paste approach with simpler constructs".
If I read it correctly (not sure), this sentence says that the language (or its semantics?) is compositional.
If this reading is correct, then this can safely be skipped (most formal languages have this property).
- Page 25: "We use Peirce’s diagram rules in this paper to convince the reader of the application of FOL using RDF Surfaces."
Unclear what "applying FOL" means (this is vague).
My understanding is that Peirce's rules are one (out of many) complete proof systems for FOL.
If RDF surfaces capture a fragment of FOL, then any of these proof systems could be used to show that an RDF surface is a consequence of another.
This is arguably a strength of RDF surfaces, as a potential W3C standard: well-understood techniques (with mature implementations) can be used out-of-the-box to reason with surfaces.
Instead, the paper (partially) introduces a relatively obscure calculus (with apparently no full implementation), which may turn off potential adopters of RDF surfaces.
I am not sure it is necessary to use Peirce's rules to "convince the reader" that one can reason with RDF surfaces.
I would stay away from reasoning and simply:
1. Provide a formal semantics for RDF surfaces,
2. Show that this formalism can capture FO formulas with conjunction, negation and existential quantification (and precise which fragment exactly: with/without equality, arity of predicates, etc.)
- Page 25: "Using four relatively simple rules, authors of RDF Surfaces can make simple derivations and check the consequences of surface logic by hand".
Again, according to what is said earlier in the paper, this is only correct for boolean logic (the complete set of rules for FOL/Beta graphs is apparently more involved, and not provided in the paper).
- Page 25: "four implementations specially targeted to RDF Surfaces."
Aside from Latar, unclear what "specially targeted" means here, i.e. whether algorithms are tailored to efficiently process FO formulas with certain shapes.
- Page 25:
I wonder if it is useful to introduce all four systems, given that no (quantitative) evaluation is provided, and that only EVE was used in the experiments.
In particular:
- There is no apparent difference between EYE and Retina, other than the underlying Prolog engine. Maybe these could instead can be presented as a unique approach (serialization into Prolog), with 3 Prolog engines currently supported (SWI, Trealla and Scryer).
- Without further details, it is unclear what Tension.js brings to the picture. If anything, this looks like a questionable engineering decision (performance-wise): implementing resolution in an imperative language (especially an interpreted one!), when dedicated (and mature) engines are already available.
- Page 25.
I am surprised overall by the choice made to develop (several!) end-to-end reasoning systems for RDF surfaces.
This is a daunting task (probably out-of-scope for this paper), and arguably premature, given that the formalism itself is not defined yet (RDF surfaces do not have a formal syntax or semantics).
More, the only reasoning task that is described in the paper (called "Proof by contradiction" and "Proof by negation", for some reason) does not seem tailored to RDF surfaces: it apparently comes down to (un)satisfiability of a FO formula.
For a simple proof-of-concept implementation, a modular and (much) simpler approach would be to:
- develop a (backend-independent) parser for RDF surfaces and,
- serialize the parsed surfaces into FOL formulas that can be fed to a prover.
Besides, such a parser could become a valuable resource by itself, allowing the development of alternative serializations.
- Page 25: "Full support of the RDF Surfaces syntax as presented in this paper".
Then I guess there is a grammar for RDF surfaces.
If so, then I would suggest incorporating it to the paper, either in the main body or in appendix.
- Page 25: "Disjunctions in the data, antecedent and consequent of an implementation."
I guess "implementation" is a typo (for "implication").
More importantly, I do not understand what this precisely means.
I have two hypotheses:
1. the system supports syntactic sugar (in this case disjunction) as input. But this would somehow contradict the design of RDF surfaces.
2. The implementation of the reasoning engine is only sound for a fragment of RDF surfaces, which includes this kind of "disjunction" (simulated via conjunction and negation). If this is the correct, then I would drop this whole list of "features", because it raises too many questions. E.g. what exactly does "in the data" mean, how robust is the system to nested disjunctions, etc.
- Page 25: "every implementation is expected to have computational limits due to the undecidable nature of the underlying logic. It is possible to create formulas that never halt. Creating formulas that will result in an incomplete answer is possible."
An algorithm may not may not terminate, but a "formula" by itself cannot "halt" (or "never halt").
More importantly, unclear why undecidability would allow "Creating formulas that will result in an incomplete answer".
According to the two querying "methods" presented Page 26, it is unclear what an "answer" is (everything is apparently reduced to satisfiability), let alone an "incomplete" one.
Besides (in theory at least), if the implementation is sound, then the system should output a correct answer (true or false) whenever it terminates.
So I would either drop these considerations, or make them more precise.
- Page 26: "there is no distinction between assertion and theory boxes"
This is presented as a drawback (e.g. of Description Logics), and conversely as an strength of RDF Surfaces.
I would definitely drop this observation.
I suspect a misunderstanding: the distinction between plain facts and other formulas is a feature, not an arbitrary constraint.
It allows efficient algorithms tailored for a variety of specific tasks (a good example would query answering in the presence of a background theory).
If RDF surfaces gain traction, a similar distinction (i.e. isolating "blank-node free RDF triples on the positive surface", using the terminology of the paper) is likely to become a feature as well.
- Page 26: "Proof by contradiction checks if a graph G is available in the knowledge base".
The meaning of "is available" and "knowledge base" is unclear here.
My guess is that "is available" means "is implied" (or "is a consequence of"), and that what is called a "Proof by contradiction" is the usual way to reduce entailment to satisfiability.
- Page 26: "Proof by negation checks if a graph $\neg G$ is available in the knowledge base by adding negated $\neg\negG \equiv G$ to the knowledge base and tests if this leads to a contradiction."
I am not aware of a proof technique called "proof by negation".
More importantly, there seems to be bug in the description of this method, but I am unable to reconstruct the intended meaning.
The formula $\neg\neg G \equiv G$ is a tautology, so adding it to a "knowledge base" $K$ (whatever it is) cannot affect the truth value of $K$.
Or maybe "negated $\neg\neg G \equiv G$" in this sentence means the negation of this tautology (not sure), i.e. $\neg\neg G \not\equiv G$.
But this formula is a contradiction, so adding it to $K$ will systematically result in a contradiction.
- Page 26: "A contradiction in EYE is implemented by an 'inference fuse'. The EYE reasoner will stop running when such an 'inference fuse' is detected..."
Unclear what it means to "implement" a "contradiction".
I also Googled "inference fuse", without success.
So I genuinely do not know what is meant here.
- Page 26: "an experimental new surface was added to EYE: the query surface log:onQuerySurface".
If I understand correctly (not sure), the purpose of this new "surface" is to allow the user to feed to the system a query as input, together with the data.
In other words, the two variables in this example should be considered free.
However, this raises too many questions.
In particular, if such a surface appears within a surface with parity 1, the query becomes domain-dependant (a.k.a. "unsafe", in formal database's terminology).
I would instead recommend keeping the query language and the knowledge representation language distinct.
For instance, in SPARQL, this query becomes:
```
SELECT ?s, ?o WHERE { ? :accredit ?o }
```
Alternatively, one may use a SPARQL construct query here.
- Page 26: "Web agents require an approach to address this challenge ...".
No concrete proposal is made here to "address these challenges", even though a variety of techniques have been developed to deal with contradictions (e.g. inconsistent query answering).
I would drop this whole paragraph (too vague).
## Section 7
- Page 26: "With the help of the results of Section 5 and Section 4":
There is no result in these sections.
- Page 26: "it is a preference for a particular researcher."
The researcher does not appear in the formula.
This may be confusing, especially given the fact that this researcher is called "Researcher X" earlier.
A simple solution here would be to replace the triple:
`x a :ResearcherPreference`
with
`:ResearcherX prefers x`
Same observation for "Department Y" (Page 29).
- Page 27:
"We made here use of the fact that $(A \to C) \land (B \to C) \equiv (A \lor B) \to C$ which can be easily derived using Peirce’s Alpha system."
No need to explain this in a research paper (this is elementary boolean logic).
More importantly, no need to apply Peirce’s Alpha system (let alone a full derivation!) to convince the reader that this equivalence holds.
- Page 29: "distributed using many ontologies"
I would avoid the term "ontology" (used informally here), because it raises questions about what is and what is not considered an ontology.
- Page 30 and Figure 13 + Page 32 and Figure 14.
No need to spend so much space explaining that if T and \phi are consistent but T \cup { \neg \phi } is not, then T implies \phi.
Again, this is material for an introduction to logic.
- Figures 13 and 14 (screenshots):
These examples illustrate a valid use case (what the authors call a "proof by contradiction").
The fact that the system outputs an error message in these cases is a questionable engineering decision (an error is meant to indicates a malfunction of a program).
I would simply drop these screenshots.
- Page 31: "In other words, the Peircian diagram that corresponds with these formulas has equal readings".
Again, no need for Peircian diagrams here, this is standard boolean logic.
- Page 32: "after copy and pasting the RDF Surfaces knowledge base".
Why "copy and pasting"?
The interface dos not support multiple files as input?
In any case, I would omit such details.
- Page 32: "The EYE query surface ..."
Again, I would simply use a SPARQL query here.
## Section 8
- Page 33: "provided an abstract and a concrete syntax".
The concrete syntax is only informally sketched.
As for the abstract one, I wonder what is meant here.
Is this the FOL formulas (with triples in place of atoms) used throughout the paper?
Or instead the diagrams?
- Page 34: "The expressivity and portability of FOL provide the advantage that we can state what we want to say instead of only stating what machines can process."
There is an arguably better candidate for this than FOL, namely natural language.
Besides, RDF surfaces are meant to be automatically processed (at least parsed).
I would simply drop this sentence.
- Page 34: "consistently providing complete (C) and exhaustive response".
I do not understand what is meant here by "exhaustive" (precisely, how it differs from "complete").
- Page 34: "Demanding these (P)+(C)+(H) attributes simultaneously is infeasible due to the undecidability of FOL and forms an iron triangle: at most, only two of these features can be selected for any implementation."
(C) + (H) should be sufficient for undecidability.
Unclear why (P) comes into play here.
- Page 34: "this will lead to a necessary fragmentation and decentralization of the Web where many Web agents must compete to find inconsistencies or find all derivations that can be made from a decentralized knowledge graph".
I would drop these kinds of vague visionary claims (this is borderline sci-fi).
- Footnote 35.
No need to explain what semi-decidable means (again, this is material for an introductory course on logics).
- Page 35: "When a derivation or inconsistency is found, verifying that the given solution can be derived from the knowledge graph is possible."
There is nothing to verify in this case: the inconsistency itself tells us that the entailment holds (provided that the implementation is sound).
As I wrote above above if $T$ and $\phi$ are consistent but $T \cup { \neg \phi }$ is not, then $T$ implies $\phi$.
- Page 35: "as Trakhtenbrot demonstrated".
An important clarification about the semantics of RDF surfaces is missing.
I assumed so far that the semantics was the one of FOL interpreted over possibly infinite structures (like the usual interpretation of Description Logics).
However, Trakhtenbrot's theorem is about FOL over finite structures (used for instance in database theory).
In this latter case, satisfiable formulas are recursively enumerable (but contradictions are not).
Again, prior to discussing computational properties of reasoning (let alone implementing something), I feel like one should clarify what the semantics of RDF surfaces precisely is.
- Page 35: "We regard RDF Surfaces as a low-level language [..] Higher level vocabularies are still required to create a compact serialization".
I do not understand what is meant here by "Higher-level vocabularies".
Are these defined using a different language?
Or are these still FO predicates, but more abstract?
Or maybe high-order predicates (like `rdfs:subClassOf`)?
An example would help.
- Page 35:
"Patel-Schneider’s argument".
I would add "Peter" and a reference.
- Page 35:
"We think we can 'wiggle out' self-referential statements ..."
Unclear what "wiggle out" means here.
More importantly, why "We think"?
My understanding was that the problem is already solved.
Again, a formal syntax and semantics for RDF surfaces would unequivocally clarify this point.
- Page 35: "But this does not absolve us from paradoxes because these are very hard to avoid in highly expressive languages".
This is too vague to be understood.
I would either drop this whole discussion about "paradoxes", or be more precise.
## Section 9
- Page 37: "Our contribution provides [...] a semantic with FOL expressivity ..."
This immediately contradicts Section 8.3, which states that there is no semantics yet for RDF surfaces.
- Page 38: "Database queries use the SQL language, which has FOL expressivity and can be solved in polynomial time."
A problem cannot be both undecidable and solvable (let alone in polynomial time).
Again these vague claims introduce more confusion than they help.
If anything, some distinctions need to be made here: first, SQL is interpreted over finite structures.
Second, "polynomial" here refers (I guess) to so-called "data complexity" of the query answering problem, i.e. assuming that the size of the input query is fixed.
# Typos
- Multiple occurrences.
"party" -> "parity"
- Definition 5.4.
"and every Hayes triple" -> "and such that every Hayes triple"
- Page 17:
"only one single" -> "a single" (redundant)
|