Review Comment:
This paper describes the collaborative effort within the Wikidata community to build a general-purpose knowledge graph related to COVID-19. The covered topics are comprehensive, illustrative, and, most importantly, very timely. The motivation behind this work is clear, and the selected examples generally justify the merit of knowledge graphs in multidisciplinary research such as research on the COVID-19 pandemic. Below are my comments:
Major concerns:
1. In the Introduction, the authors discuss the benefits and drawbacks of the ‘community developed ontology and typology’ (second paragraph). Regarding the drawback, the paper claims that “it makes methodical planning of the whole structure and its granularity very difficult”. However, in the main text I do not clearly see how this issue is addressed in this project.
2. In the Data Model section:
a). The authors claim ‘… an ontological database representing all aspects of the outbreak’. Is this really the case? For example, does it cover economic aspects such as the unemployment rate and supply-chain disruption during the outbreak? I think this statement is too ambitious.
b). What exact lessons were learned from the Zika epidemic?
c). The authors mention ‘… could all be represented in Wikidata if matters related to the coverage and conflicts of information from multiple sources are solved’. It would be great if the authors could discuss how the model resolves conflicting statements in this project. For COVID-19 this is particularly important, as many reported ‘facts’ conflict with or contradict one another; one concrete mechanism the authors might discuss is sketched below. In addition, what does ‘coverage’ mean here? Spatial coverage? Temporal coverage? Property coverage? This is a bit confusing.
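For example, Wikidata's statement ranks (preferred/normal/deprecated), together with per-statement references, are the platform's usual way of keeping competing values from different sources side by side. The snippet below is only an illustration of that mechanism, assuming P1603 ("number of cases") as the property of interest; it is not a query taken from the paper.

```python
# Illustrative sketch: use Wikidata statement ranks to keep only the values the
# community has explicitly marked as preferred, filtering out conflicting or
# superseded case counts. P1603 ("number of cases") is an assumed example property.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
SELECT ?outbreak ?cases WHERE {
  ?outbreak p:P1603 ?statement .                       # full statement node, all ranks
  ?statement ps:P1603 ?cases ;
             wikibase:rank wikibase:PreferredRank .    # keep only preferred values
}
LIMIT 5
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["outbreak"]["value"], row["cases"]["value"])
```

This is of course only one possible approach; the question to the authors is which mechanism (ranks, references, qualifiers, or something else) the data model actually relies on.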
3. In the Language Representation section:
a). Figure 4E is confusing: is the x-axis the rank of languages by usage? What does the y-axis represent, then? The sentence “The degree of translation of that information is increasingly high with an important representation of the concepts in more than 50 languages (Figure 4E)” does not help in understanding the figure.
b). More importantly, there are multiple correlation analyses in this section, yet no statistical testing is applied at all. The conclusions are drawn by informally inspecting the tables. For example, the statement “Despite several differences like the higher visibility of Asian language… the query results largely match the literature-derived data …” has to be justified in a more scientific way, e.g., by statistical testing; a sketch of one possible test is given below.
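For instance, a rank correlation test such as Spearman's would quantify how well the Wikidata-derived language distribution matches the literature-derived one. The sketch below is purely illustrative, assuming placeholder per-language counts rather than the paper's actual data.

```python
# Illustrative only: placeholder counts, not the paper's data.
from scipy.stats import spearmanr

# Hypothetical per-language counts, in the same language order for both sources.
wikidata_labels   = [5400, 3100, 2900, 1200, 800, 450, 300, 120]  # e.g. labels per language in Wikidata
literature_counts = [6100, 2800, 3300, 1000, 950, 400, 250, 180]  # e.g. publications per language

rho, p_value = spearmanr(wikidata_labels, literature_counts)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
# A high rho with a small p-value would support the claim that the two
# distributions "largely match"; otherwise the claim should be weakened.
```

Reporting the test statistic and p-value (or a confidence interval) would make statements such as "largely match" verifiable.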
4. In the Database Alignment section:
This section lists multiple alignment tables for different domains. However, how were these alignments accomplished? Were automated algorithms used, or were they produced entirely by manual effort? Have these alignments been evaluated? Even a lightweight evaluation along the lines sketched below would strengthen the section.
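For example, the authors could manually judge a random sample of mappings from each alignment table and report the estimated precision. The sketch below only illustrates the idea; the alignment pairs and the judgement step are hypothetical placeholders.

```python
# Hypothetical sketch of a sampling-based precision check for one alignment table.
import random

# Placeholder alignment: Wikidata QIDs mapped to identifiers in an external resource
# (the pairs below are illustrative examples, not the paper's mappings).
alignment = {
    "Q84263196": "MeSH:D000086382",      # COVID-19
    "Q82069695": "NCBITaxon:2697049",    # SARS-CoV-2
    # ... remaining mappings ...
}

sample = random.sample(list(alignment.items()), k=min(50, len(alignment)))

def human_judgement(wikidata_id: str, external_id: str) -> bool:
    """Stand-in for an annotator's check that both identifiers denote the same concept."""
    return True  # a real evaluation would record the annotator's verdict here

correct = sum(human_judgement(q, ext) for q, ext in sample)
print(f"Estimated precision on a sample of {len(sample)}: {correct / len(sample):.2%}")
```

Reporting such a figure, together with a short description of how the mappings were produced (automatic matching, manual curation, or both), would address this concern.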
5. In the ‘Visualizing facets of COVID-19 via SPARQL’ and Conclusion sections:
It is great to see the authors provide a relatively comprehensive and well-organized list of SPARQL queries and demonstrate several promising visualizations in the paper. However, I wonder how accessible and easy it is for a non-SPARQL expert to explore the graph (or even to understand the queries); a small illustration of this barrier is sketched below. Do the authors have any empirical examples/cases showing how useful the graph has been to domain experts or the general public? Table S2 appears to be a list of fulfilled tasks, but I do not find further context related to it. Using one of its rows as a worked example would help readers understand the value of the proposed graph.
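To make the accessibility point concrete: even a modest question requires the user to know SPARQL syntax, the relevant item and property identifiers, and the endpoint conventions. The example below is mine, not one of the paper's queries; it assumes Q84263196 is the COVID-19 item and P780 the "symptoms and signs" property.

```python
# Hypothetical example: what a programmatic (non-expert) user must write to ask
# "what are the symptoms of COVID-19?" against the public Wikidata Query Service.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

query = """
SELECT ?symptom ?symptomLabel WHERE {
  wd:Q84263196 wdt:P780 ?symptom .                      # COVID-19 -> symptoms and signs
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(ENDPOINT, params={"query": query, "format": "json"})
for row in response.json()["results"]["bindings"]:
    print(row["symptomLabel"]["value"])
```

A short discussion (or user-facing query templates) showing how domain experts without this background are expected to reach the same results would strengthen the accessibility claim.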
6. Last but not least, the authors need to proofread the paper thoroughly. There are many overly long sentences, inconsistent uses of terms, typos, duplications, and awkward sentences. In general, the paper is not easy to follow. For example, in the first paragraph of Section 5.2 alone:
a). “whereas others common visualization” --> “other”
b). “from scratch from granularity” --> one “from” should be deleted
c). “its change over time over time” --> duplicated phrase
d). “Wikidata’s granularity and collaborating …” --> what does “Wikidata’s granularity” mean here?
Minor issues (this is by no means a complete list; as my sixth major point indicates, the authors need to proofread the paper carefully and make it more readable):
1. page 2:
a). basing --> based
b). entities named items --> entities, named items
2. page 3:
>17,000 (what is this number? Cases? Deaths?)
3. page 5:
Table S1 --> Table 1
4. page 13:
table S2 --> Table S2
5. page 14:
a). allowed --> allows
b). WIkidata --> Wikidata