Using logical constraints to validate information in collaborative knowledge graphs: a study of COVID-19 on Wikidata

Tracking #: 2677-3891

Houcemeddine Turki
Dariusz Jemielniak
Mohamed Ali Hadj Taieb
Jose Emilio Labra Gayo
Mohamed Ben Aouicha
Mus'ab Banat
Thomas Shafee
Eric Prud'hommeaux
Tiago Lubiana
Diptanshu Das
Daniel Mietchen

Responsible editor: 
Aidan Hogan

Submission type: 
Full Paper

Abstract:
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides a forum for exchanging structured data. In this research paper, we catalog the rules describing relational and statistical COVID-19 epidemiological data and implement them in SPARQL, a query language for semantic databases. We demonstrate the efficiency of our methods to evaluate structured information, particularly COVID-19 knowledge in Wikidata, and consequently in collaborative ontologies and knowledge graphs, and we show the advantages and drawbacks of our proposed approach by comparing it to other methods for validation of linked web data.
Solicited Reviews:
Review #1
By Julien Corman submitted on 25/Feb/2021
Minor Revision
Review Comment:

The paper reports experiments run over the COVID19-related content of Wikidata, in order to identify data quality issues (i.e. incorrect or missing triples).

The main contribution is the experiments reported in Sections 5 and 6.
Section 5 identifies data quality issues on a semantic basis, leveraging the so-called "statements" of Wikidata.
A variety of heuristics are applied (leveraging for instance frequent classes of subjects/objects of a given property, or inverse property statements) in order to identify either triples that are likely to be incorrect, or possibly missing information.
In contrast, Section 6 identifies numerical assertions about the pandemic (raw statistics, rates, etc) that are likely to be incorrect, based on external epidemiological knowledge.

In both cases, the missing and/or possibly incorrect triples are identified via SPARQL queries designed for the occasion, executed over the Wikidata endpoint.

I am not an expert, but I suspect these experiments may be interesting to the SW community, since large collaborative knowledge graphs like Wikidata are notorious for their inconsistencies.
I found Section 6 particularly interesting, in the way additional knowledge (about the pandemic) is exploited to identify data quality issues.

On the other hand, I found the introduction of these experiments quite confusing.
A large portion of what precedes (i.e. up to page 11) is only loosely related.
In other words, a substantial amount of contextual information (sometimes with exhaustive tables) is provided that is not used in the paper.
In particular:
- about COVID-19,
- about Wikidata: list of prefixes, moderation system, list of buttons of the SPARQL GUI interface, etc.
- about SPARQL: list of clauses and aggregate functions, syntax of filtering conditions, etc.
- about ShEx: syntax, format of identifiers, etc. (unless I missed it, no actual use of ShEx was made in the reported experiments)
- about validation techniques that could in theory be implemented, but fall out of the scope of the paper.

At times, it seems like the article was designed as a generic introduction to a series of semantic web standards/resources.
I doubt this is needed, especially in the SWJ.
For instance, there are already good introductions to SPARQL and ShEx available elsewhere.

This has the effect of diluting the contribution, and requires the reader to manually filter out what is unnecessary.
I would suggest retaining only what is useful (for instance, in Table 1, list only the Wikidata constraints that have been exploited), and moving the rest to the appendix.

I also feel like some simple definitions are missing in order to understand the experiments reported in Section 5 (Section 6, in comparison, is very clear).
One needs to go through the (useful) examples provided later on in order to reconstruct what "use case", "supported statements", "scheme" or "logical constraints" mean for instance.
And even then, a lot is left implicit.
As a consequence, it is sometimes difficult to interpret the results.
For instance, from the examples that are provided, I suspect that Task T2 identifies missing statements of the form:
p_1 "inverse property" p_2
on a statistical basis, whereas task T3 identifies missing statements of the form:
o p_1 s,
when both:
s p_2 o,
p_1 "inverse property" p_2
are present in the knowledge graph.
But this is only a blind guess, because this is not made explicit in the paper.
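If this reading is correct, Task T3 would amount to a query of roughly the following shape (my own sketch, not taken from the paper; I assume Wikidata's wikibase:directClaim and the "inverse property" relation P1696):

```sparql
# Sketch of my reading of Task T3 (not the authors' query):
# report pairs (o, s) where s p_2 o holds and p_1 is declared
# as the inverse of p_2, but o p_1 s is absent.
SELECT ?s ?o ?p1 ?p2 WHERE {
  ?s ?p2 ?o .
  ?prop2 wikibase:directClaim ?p2 ;
         wdt:P1696 ?prop1 .            # P1696 = "inverse property"
  ?prop1 wikibase:directClaim ?p1 .
  FILTER NOT EXISTS { ?o ?p1 ?s }
}
```

Spelling the tasks out at this level of detail in the paper would remove the guesswork.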

Overall, the ambiguity could be significantly reduced by adopting a slightly more formal notation, and adding a short section with preliminary definitions.

Similarly, since the approach is based on so-called "logical" constraints, it would be useful to clarify (even informally) which semantics the authors adopted.

Among others:
- the article refers to "statements" of the form (C_S, P, C_O), where C_S, and C_O are classes, and P is a property.
But it is unclear what these statements are: they do not seem to appear in Wikidata, neither in the Semantic Web standards (RDF/RDFS/OWL) that apparently inspired the Wikidata "statements" (unless P is a meta property like "subClassOf").
It is also unclear how the authors interpret such statements (e.g. it could be a combination of domain + range, or instead OWL-like qualified range restrictions, etc.).
- "inverse property" statements are treated as constraints, which departs from their traditional interpretation in Semantic Web standards (also surprisingly, "equivalent property" statements did not get the same treatment).
- the assumption is implicitly made (though only sometimes) that an item has at most one class (or maybe several via the transitive closure of "subClassOf"?).
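To illustrate the first point: a "statement" (C_S, P, C_O) admits at least two non-equivalent readings, which could be written in OWL as follows (my Turtle sketch, with hypothetical IRIs):

```turtle
# Reading 1: global domain and range for the property.
ex:drugUsedForTreatment rdfs:domain ex:Disease ;
                        rdfs:range  ex:Medication .

# Reading 2: a qualified restriction, scoped to the subject class only.
ex:Disease rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty    ex:drugUsedForTreatment ;
    owl:allValuesFrom ex:Medication
] .
```

The two readings flag different triples as violations, so the paper should say which one is intended.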

All these choices are interesting.
I have no doubt that they are well motivated, and they may be a meaningful contribution on their own.
But because some of them are unconventional (at least according to SW standards), it would be useful to make them explicit.

### Suggestions

- Page 1:
The first paragraph mostly consists of unnecessary information.
For instance "zoonotic", "characterized by the onset of acute pneumonia and respiratory distress", "frequently compared to the 1918 Spanish Flu", "distribution and storage challenges".
I guess this can be safely dropped.

- Page 3, second paragraph ("knowledge graph evaluation is therefore necessary"):
Most of this paragraph can be moved to the state of the art section.

- Page 3, last paragraph ("The data structure employed by Wikidata"):
This is very useful.
But I would suggest moving it to Section 3 (next to Figure 2), in order to have everything about the Wikidata data structure in one place.

- Page 4, second column, second ("As a collaborative venture") and third ("Much as Wikipedia ...") paragraphs:
The information provided in these two paragraphs is arguably not needed in this paper.
I would recommend dropping them.

- Section 3:
The structure of this section is confusing:
. The 3 first paragraphs belong to the state of the art.
. Then comes the description of the Wikidata data structure.
. The end of the section introduces ShEx.

I think readability could be improved by reorganizing this content.
For instance:
. Have a clear state of the art section (combines content from Sections 2, 3 and 7), possibly split in subsections.
. Have a section/subsection that exclusively describes the Wikidata data structure (combines content from Sections 2 and 3). This is important in order to understand the verification techniques proposed in Sections 5 and 6.
. Drop the introduction to ShEx (not used anywhere in the paper).

- Page 5, Figure 1:
I do not see a reason for including this figure in the paper (this is the workflow of a completely different approach to data quality), nor for the link to the source code.

- Page 5, "non-relational statements cannot have a Wikidata item as an object" and "objects of relational statements are not allowed to have data types like a value or a URL":
I think I roughly understand what this means ("non-relational" means statement whose object should be an RDF literal).
But for clarity, it would be useful to define explicitly what "relational" and "non-relational" means (again, in a short section with preliminary definitions).
In particular, it is unclear to me whether a "relational" statement requires an IRI as object, or more specifically a Wikidata item.

- Page 6, second column, second paragraph ("These statements can be interesting"):
The description is vague, and it is unclear whether the techniques that are mentioned in this paragraph have been implemented, either by the authors or by someone else.
If this is part of the state of the art, then make it explicit.
If this is instead an announcement of the results presented in Section 5 (I suspect this is the case for the last sentences, about inverse statements), then make it explicit.
If this is none of the two, then this paragraph can safely be dropped.

- Table 1:
No need for an exhaustive list here, only the constraints used in Sections 5 and 6, if any (the rest can be moved to the appendix).

- Page 8:
I do not see the point of providing the syntax and semantics of ShEx, since it is not used anywhere in the paper.
I do not see the value of Figure 4 in this paper either.

- Section 4:
Similarly to the presentation of ShEx, the purpose of this section is unclear to me.
A list of SPARQL operators is provided, together with their (informal) semantics.
Syntactic details are even added, for instance:
"the variables in SPARQL are preceded by an interrogation mark and are not separated by a comma".

But this information is not exploited in the paper (aside from the SPARQL queries provided in appendix).

As far as I understand, it would be sufficient to:
- say that SPARQL is a SQL-like query language designed for RDF graphs,
- mention that it also allows retrieving higher-order entities (such as classes and properties),
- provide a link to the SPARQL specification (and possibly to a good introduction).

Similarly, details about the GUI of the Wikidata SPARQL endpoint and the Wikidata prefixes would be relevant if the paper was a tutorial about querying Wikidata with SPARQL.
But this is clearly not the case.
So I think these details can safely be dropped (together with Figures 5 and 6).

- Page 11:
"a similar protocol fully based on logical constraints fully implementable using SPARQL queries to infer constraints for the assessment of the usage of relation types (P) on Wikidata based on the most frequently used corresponding inverse statements (C_O , P^-1 , C_S ).":
This is very difficult to parse.
Also the repetition of "constraints" does not help (it is unclear whether both uses refer to the same thing).
Again, this could be easily fixed by adopting a slightly more formal notation.

- Page 13, "logically accurate":
Maybe "semantically accurate"?
Or just "correct"?

- Section 6:
A small table with the meaning of the different codes (c, l, r, m, mn, mx, R_0) would improve readability.

- Page 20, second column, and page 21, first column:
Unless I missed something, these two columns essentially say that statements of types R_0, mn and mx cannot be reliably (un)validated.
If so, I wonder if it is worth mentioning them in the paper at all.

- Section 7:
The state of the art goes in many directions.
I suggest dropping the generic considerations about machine learning, IoT, XML, etc.
It feels like an enumeration of loosely connected topics, which is detrimental to the argument in my opinion.
Instead, the section could focus on a comparison with alternative approaches to data quality assessment in knowledge graphs (some of which have been introduced in Sections 2 and 3).

- Page 23, "the function of logical conditions should be expanded to refine the list of pairs (lexical information, semantic relation) to more accurately identify deficient and missing semantic relations":
This is quite abstract, it could either be made more precise or dropped.

- Page 23, "Big data is the set of real-time statistical and textual ...":
Not sure that generic considerations about big data are needed here (arguably out of scope).

- Page 23, word embeddings, latent dirichlet analysis, Hadoop and MapReduce:
Again, not much to do with the content of the paper, can safely be dropped.

### Remarks

- Page 3:
The phrasing of the contributions can be misleading.
. "We introduce the value of Wikidata ... ": does "introduce the value" mean "argue in favor of using"?
. "we cover the use of SPARQL to query this knowledge graph (Section 4)": "cover the use" suggests some state of the art.
Instead, Section 4 consists of a brief description of the SPARQL language, and some information about the Wikidata endpoint (and prefixes).
. "we demonstrate how logical constraints can be captured in structural schemas": "logical constraints" can be misleading (there is no logic here) as well as "demonstrate" (no theoretical result). Maybe something like "we empirically illustrate how semantic data quality issues can be identified via SPARQL queries"?
. "used to [..] encourage the consistent usage of ...": I guess this refers to some form of automated suggestion, but I did not see anything directly related in Section 5.

- Page 6, "to its corresponding Wikidata item":
This is too vague.
Let "p" be a property.
If I understand correctly, in a statement of the form:
p "subject item" c
the object "c" is (one of) the expected class(es) of "s" in statements of the form:
s p o
Again, a slightly more formal notation would really help.
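Under this reading, the check could be sketched in SPARQL as follows (my hypothetical reconstruction using Wikidata's subject type constraint machinery, P2302/Q21503250/P2308, which may or may not be what the authors use):

```sparql
# Sketch: subjects s of property p that match none of the expected
# classes c declared for p (hypothetical reconstruction).
SELECT DISTINCT ?s ?p WHERE {
  ?prop p:P2302 [ ps:P2302 wd:Q21503250 ] ;   # subject type constraint
        wikibase:directClaim ?p .
  ?s ?p ?o .
  FILTER NOT EXISTS {
    ?prop p:P2302 [ ps:P2302 wd:Q21503250 ;
                    pq:P2308 ?c ] .           # P2308 = expected class
    ?s wdt:P31/wdt:P279* ?c .
  }
}
```

Whatever the intended formalization is, stating it explicitly would make the experiments reproducible.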

- Page 6, "missing Wikidata statements (C_1 , P, C_2), which are implied by the presence of inverse statements (C_2, P^-1, C_1) in other Wikidata resources.":
This is arguably confusing.
In Figure 2, there is no statement of the form (C_1 , P, C_2) where C_1 and C_2 are classes.
Maybe what is meant here is statements of the form:
s p o
o q s
where q is declared as the inverse of p.

If this is the case, then it should also be made explicit that "inverse property" statements are understood as constraints, i.e. if two statements:
s p o
p inverseOf q
are present, then
o q s
should also be present.

This is fine, but needs to be discussed a little bit, because it significantly departs from the usual interpretation of "owl:inverseOf" statements, and more generally from the design of RDFS and OWL.
Traditionally, the third statement would not be considered as missing, but omitted on purpose, because it can be inferred from the two others, which makes it redundant.
In other words, by design, RDFS/OWL statements (which are very similar to the Wikidata "statements") are meant to derive additional information, not to identify missing data.
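The contrast can be made explicit in SPARQL: under the RDFS/OWL reading the third statement is derived, whereas under the constraint reading it is reported as missing (my sketch, using owl:inverseOf generically):

```sparql
# RDFS/OWL reading: materialize the inverse triple by inference.
CONSTRUCT { ?o ?q ?s }
WHERE { ?s ?p ?o . ?p owl:inverseOf ?q . }

# Constraint reading (as the paper seems to use it):
# report the absent inverse triple as missing data.
SELECT ?s ?p ?o ?q
WHERE { ?s ?p ?o . ?p owl:inverseOf ?q .
        FILTER NOT EXISTS { ?o ?q ?s } }
```

A short paragraph stating which of the two the authors adopt (and why) would resolve the ambiguity.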

It is also unclear to me why "equivalent property" statements were not used in Section 5 similarly to "inverse property" statements to identify missing triples.

- Page 6, "and several description logics for the usage of the property":
There is no description logic here.

- Page 11, "inverse statements (C_O , P^-1 , C_S)":
Same remark as above, these do not seem to be Wikidata statements.
This is particularly confusing here, because this suggests that classes are associated to a property (or its inverse), whereas Figure 7 suggests that classes are associated to the subject and object of a statement.
This needs to be clarified.

- Page 11, "of P(S,O)":
First time that this prefix notation is used in the paper.
Again, a section with preliminaries would help.

- Table 3, "P:(C_S ,C_O) pairs":
Yet another notation.
Unclear how it relates to the "(C_S , P , C_O) statements" mentioned earlier.

- Table 3, "corresponding to each common use case":
This is unclear.
Which "use cases" does this refer to?
Also the footnote does not help ("A set of conditions" is too vague).

- Table 3, "corresponding to the most common (C_S , P^-1 , C_O )":
"Corresponding" is unclear here (define it formally).

- Page 12 "the effectiveness of the use of logical constraints to generate conditions for the verification and validation of the use of relation types":
Difficult to parse.
It is also unclear what "conditions" means (again, a more formal description would help).

- Page 12, "we used logical constraints" and "the use of logical constraints":
It is unclear what "logical constraints" refers to.
Are these constraints expressed in some logic?

- Page 13:
Again, define "use cases" (are these Wikidata triples, or something else?)

- Page 16:
Clarify what "False positive" and "True positive" mean in this context.

- Section 6:
It seems like what is called "statement" in this section is called "relation" in the previous sections.

- Page 21:
the text refers to Tasks M1, M2 and M3 as if they were previously introduced, but I could not see where (the only other mentions seem to be in the appendix).

- Page 22, "These tasks successfully address most of the competency questions, particularly conceptual orientation (clarity), coherence (consistency), strength (precision) and full coverage (completeness)":
It is unclear what "the competency questions" refer to here.
Also this is arguably a bold claim.
The verification techniques that have been implemented are interesting, but I doubt they cover most COVID-19 related quality issues in Wikidata.

- Page 22:
"rule-based" can be misleading here.
The term usually refers to some form of automated reasoning (typically deductive).
But there is no reasoning involved in what is described in Sections 5 and 6 (only query evaluation).

- Page 22:
"software tools" is unclear.
I guess what is meant here is "reasoners" (whose execution can indeed be costly).

- Page 22, "depends on the requirements and capacities of the host computer".
So does SPARQL query evaluation (the triple store is hosted somewhere).

### Questions

- Page 3, "allowing embedding":
What does "embedding" mean here?

- Page 3, "fast-updating dynamic data reuse":
I do not understand what this means. Is there maybe a typo?

- Page 5, "for the reformulation of a query":
Why "reformulation" and not "generation" for instance?

- Page 11, "These constraints can be later used to define COVID-19-related Wikidata statements":
"Define" is unclear.
Does this mean "write"?
Or is this an automated procedure?

- Page 11, "disease is the subject class (C_S) and medication is the object class (C_O)":
This implies that subject and object both have a class, and that each has only one class.
Is this really the case?

- Page 13, "72 percent or more of the supported statements":
What does "supported" mean here?

- Page 13, "the medical logic being entered":
Which logic, and entered by whom?

- Page 14, "successfully sorted":
Sorted by what (number of occurrences)?

- Page 14, "three relations had clear inverse properties":
If I understand correctly, this means that the "inverse property" statement for these was not present in Wikidata, but could be inferred to hold statistically?
Or does this mean something else?

- Page 23, "should not only be restricted to rule-based evaluation but also to lexical-based evaluation":
This is confusing.
Does this mean "restricted further", or instead "expanded to"?

### Typos

- Page 4:
"their nature" -> "the nature"

- Page 4:
"encyclopedia, Wikipedia" -> "encyclopedia Wikipedia"

- Page 4,"The system, therefore, aims":
remove "therefore" (does not follow from the previous sentence).

- Page 4:
"property suggesting system" -> "property suggestion system"

- Page 5:
"(Red in Fig. 6)" -> "(Red in Fig. 2)"

- Page 5, "Another option of validating biomedical statements ...":
split the sentence in two (convoluted).

- Page 6, second column:
I guess "C_S" and "C_O" stand for "C_1" and "C_2" (or conversely).

- Page 6, "equivalents in other IRIs" -> "equivalent IRIs".
Or better, "equivalent properties" (since "inverse properties" is used in the same sentence).

- Page 6:
"some of erroneous use" -> "erroneous uses"

- Page 7, "As shown in Fig. 2, a property constraint is defined as a relation where the property type is featured as an object":
It seems like in Fig. 2, what is called a "type" constraint is a constraint on the subject, not the object.

- Page 12, "in order to omit statements that are not widely used in Wikidata":
Should "statement" be "property" here?

- Table 9, caption:
The beginning is copy-pasted from the caption of Table 8, it should not be there.

- Page 23:
"its natural language information of a knowledge graph item" -> "its natural language information"

Review #2
Anonymous submitted on 02/Mar/2021
Review Comment:

This paper constitutes an anecdotal study of the use of logical constraints for data validation in a collaborative knowledge graph, the use case being Covid-19 information in Wikidata. The paper focuses on two tasks: constraint inference of biomedical properties, and heuristic-based validation of epidemiological data. The paper resorts to Web standards and tools to achieve this, namely the SPARQL query language.

The first task is achieved by carrying out multiple data-oriented sub-activities such as schema induction (domain and range inference, identification of inverse predicates), incompleteness detection, and knowledge graph completion (specifically proposing statements that serve as pointers to extract further information). The second task is more scenario-specific and shows how to validate numerical epidemiological data.

- Presentation, Writing & Structure

The article is overall comprehensible, but its structure is rather chaotic and makes it **very** difficult for the reader to distinguish the contributions from the background knowledge. I believe that the two contributions that I described above should be explicitly introduced in the first section.

Section 2 introduces Wikidata, and while I agree that a detailed description of the knowledge base is important for this paper, I would encourage the authors to change the focus: after the general information about the project, the paper could illustrate the process of adding a piece of information (in both cases when the source is a human or a bot). That will set the ground to understand how and when validation comes at play.

Section 4 is excruciatingly long. I would recommend that the authors use a more classical approach to describing SPARQL (just take any non-theoretical paper on SPARQL processing ever submitted to this journal to get the idea), use examples, and perhaps highlight the language keywords that are interesting for this use case. One example is the SERVICE clause, which appears everywhere but is never explained. Moreover, I do not see the need for the parallels to SQL.

Section 5 should separate the analysis for each task in well-defined subsections or paragraphs.

Section 7 hints at some possible research avenues that should be studied to improve the paper. I would also encourage the authors to leverage the discussion to argue the potential impact of adopting the proposed solution, and its viability for other similar use cases, e.g., future epidemics, or other events that generate data on a daily basis.

Visually speaking, the paper looks fairly unprofessional: the tables are unappealing (cluttered), figures 7 and 9 could be smaller, and some pages are half-empty.

The paper really needs a dedicated related work section to have a clear view of its positioning. The discussion cannot be the place to introduce the related work as it should rather provide further perspectives.

- Scientific content

This article shows the application of existing Web technologies in a real-world and timely use case: the management of Covid-19-related data in a collaborative environment. That said, this work is under no circumstance a match for SWJ because it does not offer any methodological and research insight whatsoever. The applied techniques are straightforward, and while this is not a problem in itself, the paper offers neither a proper review of the state of the art nor any sort of baseline or gold standard for comparison. This makes it impossible to evaluate to what extent what the paper proposes could not be solved with existing state-of-the-art techniques for schema induction [1, 6] or inconsistency detection in RDF data [2, 3, 4, 5, 7]. Being able to solve everything using SPARQL has a value, but I am afraid that it is scientifically not interesting. I would suggest that the authors submit the paper (once revamped to be legible and scientifically apt) to the application track of any Semantic Web conference. The discussion provides some research paths that could be explored to improve the paper's contribution.

In regards to the evaluation, the paper shows a lot of absolute numbers (Tables 4-9), which are useless when given out of context. The paper does sometimes show percentages, but they are hidden in the text. I strongly suggest that the authors redefine the structure of the paper. Right now it looks like a draft.


Review #3
Anonymous submitted on 06/Mar/2021
Major Revision
Review Comment:

Review of SWJ submission "Using logical constraints to validate information in collaborative knowledge graph: a study of COVID-19 on Wikidata"

# Overview of the paper

The work proposes an alternative approach to KG validation, using SPARQL-based constraints, with COVID-19 as the case study. More specifically, SPARQL-based relational and statistical data validation approaches are developed and tested over Wikidata. The work is also qualitatively compared to other KG validation methods.

# Section by section review

> Introduction
- In case of paper acceptance, some COVID-19-related information can be updated to reflect the latest state.
- In this section, the term logical constraint is discussed at a somewhat too high level. Perhaps consider adding a small example of a logical constraint along with the constrained data, as well as motivation for why such a logical constraint can be helpful in KG development. This may then guide the reader on what s/he may expect from further reading the paper.

> Wikidata as a collaborative KG
- I believe that this section and the next section (i.e., KG validation of WD) can be shortened and merged. The introduction of Wikidata in this section is not very focused on the research problem at hand and can be made more concise.

> KG validation of WD
- There's some referencing issue, for example: "... and aliases in multiple languages (Red in Fig. 6) ..." -> Fig. 6 or some other figure?
- I suppose the readability of this section can be further improved if there are clear distinctions (or categorizations) of KG validation techniques (as subsections or any other writing constructs to support the distinction). Example categories could be NLP-based techniques, ML-based techniques, logic-based techniques, hybrid ones, and so on.
- In fact, there might be validation dimensions in terms of "what is validated" and "what technique (or class of techniques) is used for validation." Such structuring could help the reader better grasp the paper directions.
- "The inverse property relations can identify missing Wikidata statements (C_1, P, C_2), which are implied by the presence of inverse statements (C_2, P^{-1}, C_1)" -> I think this statement is debatable, in regard to materialization vs. query rewriting approach for inferences over data. An extreme case for example would be that: Would Wikidata have to materialize all owl:sameAs inference results (which would then lead to stored data explosion) or not?
- The constraints in Table 1 seem to be comprehensive enough for validating WD. The paper should be clearer and more convincing on the values the paper might add on top of all the constraints in Table 1.
- Fig. 3 is a bit inconsistent: "... but Flash blindness currently isn't" -> Yet, the value of the symptom property is 'temporary blindness'?
- "However, ShEx was chosen to represent EntitySchemas in Wikidata, as it has a compact syntax which makes it more human-friendly, ..." -> I think this can be both true and untrue at the same time. I (and perhaps other KG developers too) find that both ShEx and SHACL are quite human-friendly with their own pros and cons.
- "In the example (Fig. 4), the shape will be used, and its declaration contains a list of properties, possible values, and cardinalities." -> The purpose/context of the example could be given first to improve the readability. At the moment, it's a bit unclear what's the goal of the ShEx shape in Fig. 4, to support what kind of application, and to deal with what data validation requirements (in natural language) regarding the application?
- The final paragraph of the section brings a mixed message. Does it mean that the job's done already to provide ShEx-based constraints over Wikidata, as in [12], or is there still anything missing? I think it's the latter. In this case, what's missing from the existing implementations of WD constraints? What could be further improved? Some hints on it could improve the paper.

- "SPARQL is a human-friendly language ..." -> This depends on the vocabulary used. The Wikidata vocabulary, which relies on numeric identifiers (Qid, Pid, etc.), can be not that human-friendly unless SPARQL comments about what items/entities these identifiers refer to (i.e., labels) are provided. Also, SPARQL seems to be more developer-friendly than human-friendly.
- The purpose/context of this section in relation to the research problem/solution should be made clearer in text. This is also to motivate and guide the reader when reading the SPARQL section.
- There seems to be too much empty space on Page 10 (and also, some other parts of the paper).
- On Fig. 6 and its accompanying explanation: I identify four issues:
(1) The current data model has evolved from the one in Fig. 6. For example, the prefixes for the current data model in WD are now different. Moreover, the identifiers for statement nodes as well as reference nodes are not obtained by simply taking the numeric part of the item ID (even if the figure was meant to serve as an example, the example could be potentially misleading). Please refer to some examples of the currently used data model in WD, e.g.,
(2) The accompanying explanation does not seem to be descriptive and clear enough (e.g., the difference between wdt: and p:, the qualifier and rank), particularly for readers who are not that familiar with Wikidata.
(3) To improve readability, I think it would be better if the figure could be based on real data on Wikidata (particularly, the one which will be put some constraints in a later section). I suggest that there is a (snippet of) Wikidata page screenshot as well as its corresponding RDF modeling.
(4) Actually, would the figure and its explanation be more fitting in the section where Wikidata is introduced?

> Constraint-driven inference of biomedical property constraints
- The section title seems repetitive wrt. the term constraint.
- This section seems too long without subsectioning. It's a bit difficult for the reader to manually add (mental) structuring to the long text of this section.
- "In this work, we propose a similar protocol fully based on logical constraints fully implementable using SPARQL queries to infer constraints for the assessment of the usage of relation types (P) on Wikidata based on the most frequently used corresponding inverse statements (C_O, P^{-1}, C_S)." -> There are several issues to be elaborated further:
1) Why is there the need for SPARQL-based constraints, if a similar protocol already exists?
2) What can be interesting and useful about inverse statements for deriving constraints?
3) A hint/intuition and a short example on this proposal can be given even earlier in the paper (e.g., in the section introduction). This is to give some direction to the reader as to what are the unique contributions of this paper.
- I am not quite sure, but in Table 3 T2 and T3: Shouldn't (C_S, P^{-1}, C_O) be (C_O, P^{-1}, C_S)?
- Moreover, the tasks in Table 3 could be better motivated. What is the logic behind all the tasks?
- Regarding T2 SPARQL queries in Appendix A: It seems to me that ?P1 captures any property with ?O in the subject position and ?S in the object position. It is not like the standard definition of inverse properties where the inverseness should be declared explicitly, AFAIK. Or did I miss something here in that the inverse property is defined to be more lenient/relaxed?
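To make this concern concrete, here is a minimal sketch of the pattern I understand the T2 queries to rely on (a hedged reconstruction, not a quote from Appendix A; variable names are illustrative):

```sparql
# Sketch: treat ?P1 as an "inverse" of ?P whenever it merely connects
# the same pair of items in the opposite direction, without requiring
# an explicit owl:inverseOf declaration.
# (Assumes the usual Wikidata wdt: prefix bindings.)
SELECT ?P ?P1 (COUNT(*) AS ?n) WHERE {
  ?S ?P ?O .    # original statement
  ?O ?P1 ?S .   # any property linking the pair back
}
GROUP BY ?P ?P1
```

Under this reading, any co-occurring back-link counts as an inverse, which is more lenient than declared inverseness; a clarification in the paper on which reading is intended would help.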
- "Additionally, we only considered use cases applied to more than a defined usage threshold (here set as 100 but can change according to context)" -> The threshold looks crucial in checking whether use cases make sense or not regarding the properties analyzed. Is there any elaboration on this issue?
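For instance, the threshold presumably enters the queries roughly as follows (again a sketch with illustrative variable names, not the authors' actual query):

```sparql
# Keep only (subject class, property, object class) use cases occurring
# more than 100 times; 100 is the paper's stated default cut-off.
SELECT ?Cs ?P ?Co (COUNT(*) AS ?n) WHERE {
  ?S ?P ?O .
  ?S wdt:P31 ?Cs .   # subject's class (instance of)
  ?O wdt:P31 ?Co .   # object's class (instance of)
}
GROUP BY ?Cs ?P ?Co
HAVING (COUNT(*) > 100)
```

A sensitivity analysis (e.g., re-running with thresholds of 10, 50, and 500) would show how robust the extracted use cases are to this choice.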
- "... we applied our method to the main six Wikidata properties that can be used to represent COVID-19-related knowledge" -> The choice of the six Wikidata properties is by the domain expert (manual) or can be automated?
- In Table 5: There is a subclass relation between "Infectious Disease" and "Disease." Is there any consideration regarding this subclass relation for characterizing the class use cases wrt. the property?
- "All the retrieved use cases were proven to be logically accurate .." -> Logically accurate with respect to some logical formalism, or cognitively (by manual checking by humans)?
- "This data deficiency may be due to human limitations (inexperience with wikidata or the medical logic being entered ..." -> What is medical logic in this context?
- ".. but also to identify Wikidata relation types where inverse properties do not exist or are not used as intended. In such a situation, the user should manually search for any inverse property to verify whether it exists or propose to the Wikidata community to create it as a new property if it does not exist [11]." -> This (and also Task 3, which involves the inverse property) looks like a heavy limitation of the proposed work?
- Table 6 seems to lack discussion regarding the use cases wrt. subject and object classes.
- "An example of such accurate relations is (alcohol withdrawal syndrome [Q2914873], Drug used for treatment [P2176], (RS)-baclofen [Q413717]) where alcohol withdrawal syndrome is not an instance of disease [Q12136]" -> alcohol withdrawal syndrome is an indirect subclass of disease, as I have checked.
- "Accordingly, the results sorted by Task T4 should be manually verified and validated by experts" -> While this ensures high verification accuracy, it does not seem to be highly scalable. Is there any way to get some scalability?
- Fig. 9 looks to be a bit too large and dominating.
- Also, to me Fig. 9 deserves a somewhat more thorough discussion, as it holds key insights into the usefulness of the proposed approach.
- Table 9 analyzes the usage frequency of references for the WD properties. This is important to improve confidence in the reliability of WD statements. However, I was wondering about the differences in reference property usage across the six properties. For example, why does P636 only use P854? Any insights on this from the validation results?
- This section might be concluded by a paragraph or two connecting the analysis over the five tasks.

> Constraint-driven heuristics-based validation of epidemiological data
- The first paragraph looks a bit rushed, and somewhat too direct. Perhaps better relate it to the paper's context, for example, what value the section adds to the proposed solution.
- This section also has a similar structuring issue as in the previous section.
- The tasks in Table 10: are they valid only for COVID-19, or can they be generalized to other diseases?
- I think it would be interesting to show what are the limitations for ShEx and SHACL-based techniques for representing Table 10 (and also Table 3 as in the previous section).
- Heuristics in general are not perfect. In this case, what can be the exception cases when the heuristics give an incorrect analysis?
- "Concerning the variables issued from the integration of basic epidemiological counts (m, R0, mn and mx statements), they give a summary overview of the statistical behavior ..." -> The (long) paragraph starting with this sentence is somewhat distracting, and it is not really clear how it relates to the proposed solution. How can SPARQL constraints and validation improve the situation?
- In the case of case fatality rate, there is the interesting discussion regarding how some math formulas could be incorporated in SPARQL-based constraints. However, I wonder whether it's possible to even generalize the analysis: Could a math formula be an input parameter for a function to automatically generate the corresponding validation SPARQL queries? Also, what could be the expressiveness of the math formulas able to be captured by SPARQL?
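To illustrate the kind of generalization I have in mind: given a formula such as CFR = deaths / cases, the corresponding validation query follows a fixed template. The sketch below is mine, not the authors'; the property IDs (P1603 for number of cases, P1120 for number of deaths, P3457 for case fatality rate) and the tolerance of 1 percentage point are illustrative assumptions:

```sparql
# Flag outbreak items whose stated case fatality rate deviates from
# 100 * deaths / cases by more than 1 percentage point.
# (Assumes the usual Wikidata wdt: prefix bindings.)
SELECT ?outbreak ?cfr ?computed WHERE {
  ?outbreak wdt:P1603 ?cases ;    # number of cases
            wdt:P1120 ?deaths ;   # number of deaths
            wdt:P3457 ?cfr .      # stated case fatality rate (%)
  BIND (100 * ?deaths / ?cases AS ?computed)
  FILTER (?cases > 0 && ABS(?cfr - ?computed) > 1)
}
```

Such a template suggests that any formula expressible with SPARQL's arithmetic operators and aggregates could be compiled into a validation query automatically; formulas requiring, e.g., exponentiation or iteration would fall outside this fragment, which would delimit the expressiveness question I raised above.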

> Discussion
- The discussion clears up some confusion from the previous sections (I deliberately did not change my review of the previous sections, so the spots where confusion occurs can be noticed). It explains how the previous sections relate to the research problem and the proposed solution. I was wondering whether some parts of the discussion could be briefly mentioned in the previous sections as well.
- Still, the discussion could be improved by some explicit structuring.

# Additional comments

- In general, a KG tends to be more open than a relational database (which follows the closed-world assumption), so more relaxed/lightweight constraints are deemed more suitable for KGs, in keeping with the open-world assumption. This trade-off between constraints and the open-world assumption could be added as an enriching discussion in the paper.

- Is there a reason why SHACL-SPARQL, which seems to be as expressive (plus it's already a W3C standard), is not used?

- Despite the title, the paper does not seem to include precise, logical formalization (e.g., with first-order logic) of the constraints. Is there any specific reason for that?

- The use of SPARQL queries as constraints could give the impression of an ad-hoc approach, since: (1) SPARQL queries are expressive; (2) there should be a more abstract formalization on top of the queries to provide some constraint expressivity boundaries (to better regulate which constraints are in line with the proposal, and which are not); and (3) in comparison to SHACL and ShEx, the proposed SPARQL-based approach is somewhat unrestrictive and less principled about which constraints can be imposed.

- The work may refer to more recent work regarding knowledge graphs and COVID-19 (particularly given the fast, dynamic nature of such topics) as relevant literature.

- It is also important to see how the work can be generalized to other diseases. The work might start by introducing the general approach to SPARQL-based constraints and validation, and then take COVID-19 as a case study.

- The constraint/validation evaluation does not seem to be extensive enough. Also, how long did the validation take for each task?

# Conclusions of the review

Given the above reviews, I tend to suggest a major revision to the paper.