Review Comment:
This submission implements and evaluates two anti-patterns against the April 2021 Wikidata dump. The analysis reveals millions of cases where the anti-patterns occur, and shows that they can be captured automatically with relatively small queries. Further analysis shows that while Wikidata has a mechanism for distinguishing class levels, it is not used in practice. The links from Wikidata items to OpenCyc offer a way to avoid such anti-patterns in the future by automated means.
Strengths:
1. Work on anti-patterns like the ones proposed here provides interesting insights that can help improve Wikidata's knowledge quality.
2. The two anti-patterns are meaningful and their analysis shows that they are prominent in Wikidata.
3. The possibility to use OpenCyc to flesh out class distinctions automatically seems promising to help with AP1.
4. The automatic support tool is provided as open source software.
Weaknesses:
1. The paper analyses two anti-patterns, while the introduction claims that prior work analyzed only one. This does not seem like a large step in novelty, especially given that a portion of the analysis and the automatic support are provided for only one of the two anti-patterns. This calls the originality of the work into question. Also, given that the mentioned prior paper was a workshop paper by the same authors, I wonder whether an enhanced version of it would be more suitable for a conference than for a journal.
2. Understanding the originality of this work is further hindered by the lack of connection to prior work. This is an obvious omission: the paper does not include a Literature review/Related work section, and discussion of prior work is generally absent elsewhere in the paper. I think that at least four aspects should be discussed in such a section: a) the concrete relation to the previous paper on anti-patterns by the authors; b) the relation to prior work on Wikidata quality, especially the work by Shenoy et al. (2021) and Piscopo et al. (2018, 2019), both of which analyze the confusion between instances and classes in practice at scale; c) the relation between the proposed automatic support and the growing list of tools developed/supported by Wikidata for editors; and d) an in-depth overview of theoretical work on distinguishing tokens and types.
3. Related to 2, the paper needs to position its content much better, especially in the introduction. I especially expected more discussion there of the relation between tokens and types, from the perspective of linguistics and philosophy. Moreover, the introduction presents anti-patterns as problems and meta-class formalisms as solutions, but the examples given for the two are similar, and it is unclear how to distinguish them in practice (the paper, in fact, shows that meta-classes are not used in practice). Finally, the introduction says that prior work has only evaluated a single anti-pattern, but I cannot find information about how many anti-patterns there are, how they were derived, and how complete they may be as an error analysis principle.
4. On a related note, the paper does not specify its scope, which makes the analysis performed seem somewhat ad-hoc. It would help tremendously if the authors could list and motivate their research questions prior to presenting the results.
5. It is disappointing to see that the computational method used in this work did not prove to be reliable. Of the two queries in 3.2, the second could not be run with Stardog. And for the analysis in 3.4 and the support in section 5, AP2 is no longer covered because of performance issues.
6. Section 3 leaves the reader with many open questions, only some of which are addressed in section 4. For instance, the entity counts may or may not be meaningful, given that many of the classes that have anti-patterns in tables 1 and 2 are generally well-populated classes, like disease and gene. It would have been more informative to distinguish the entities based on their position in the anti-pattern, or to compare their AP occurrences with their overall occurrences in Wikidata. Moreover, in the analysis about genes on page 6, the authors mention that the data comes from multiple sources; it would be helpful to know whether some of these sources contribute disproportionately to the anti-patterns and, if possible, to understand the underlying reason.
7. The automatic support is nice and simple; however, it would be good to see how it is intended to be used. Editors already have many tools within Wikidata. Is the idea that the WAPA tool will be integrated with the other Wikidata tools?