Review Comment:
This paper presents a novel platform for matching similar artworks across open digital image libraries, through a combination of Semantic Web technologies and Computer Vision methods. As such, the proposed application contributes to the broader research area of Cultural Heritage, which is becoming increasingly popular within the Semantic Web community. Moreover, the qualitative comparison of off-the-shelf CV APIs contributed in Section 3 is likely to benefit many experts in the field, who stand to gain from a comparison of the image-matching features offered by existing platforms.
Overall, the motivation of this work is clear and substantiated. However, further changes are needed to improve the clarity and organisation of the paper, as suggested below.
One of the main arguments underlying this work is that textual descriptions of images are biased, calling for autonomous (CV-based) methodologies that compare images objectively by visual similarity (page 2, Section 2). However, this claim overlooks the fact that CV methods are also inherently biased; it should therefore be toned down and rephrased accordingly. By contrast, the requirement of resolving inconsistencies in the image metadata is stronger and more compelling, yet it is not discussed until the concluding section.
In general, more technical language should be favoured when describing the Computer Vision elements of this work. For example, Section 3 refers to “a more “fuzzy” similarity matching”, but further technical details should be provided on the type of similarity metrics under comparison. Similarly, the expressions “allowing images to dialogue with one another” (Section 2) and “Google search image type lookup” (Section 7) are overly informal and ambiguous. One option would be to define all the relevant terminology in the Background section. This addition would also help to contextualise the role of the similarity ontology of Section 5.
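To give a sense of the level of detail that would help: if the “fuzzy” matching is computed over feature embeddings, the metric and the matching threshold could be stated explicitly. A minimal sketch, assuming cosine similarity over embedding vectors (both the metric and the threshold value are my assumptions, not taken from the paper):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two image feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_fuzzy_match(a: np.ndarray, b: np.ndarray, threshold: float = 0.85) -> bool:
    """A 'fuzzy' match made precise: a threshold on embedding
    similarity. The 0.85 value is illustrative, not from the paper."""
    return cosine_similarity(a, b) >= threshold
```

Stating the comparison at this level of precision would let readers judge whether the reported differences between APIs stem from the features or from the matching criterion.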
The last paragraph of Section 2 very usefully summarises the motivation and intended contribution of this work, so I suggest moving it earlier in the text, to the Introduction.
The results in Table 1 and the conclusions drawn in Section 3 about the most suitable API seem to contradict one another: from Table 1, one would gain the impression that methods such as Inception V3 are preferable because they cover the highest number of evaluated features. The text, on the contrary, explains why Inception V3 was the least useful API among those tested. Table 1 should be revised to (i) convey how the different features/attributes are weighted/prioritised in the evaluation, and (ii) include the metrics of robustness to angle, colour, and crop variations, which are currently only described in the text. A sketch of one way to address (i) follows.
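One concrete way to address point (i) would be to report an explicit weighted score per API alongside the feature checklist. A hypothetical illustration, in which the feature names and weights are assumptions rather than values from the paper:

```python
# Hypothetical weighted aggregation over Table 1 features; the names
# and weights below are illustrative, not taken from the paper.
WEIGHTS = {
    "image_matching": 0.40,
    "label_quality": 0.20,
    "robustness_angle": 0.15,
    "robustness_colour": 0.15,
    "robustness_crop": 0.10,
}

def api_score(features: dict[str, float]) -> float:
    """Weighted sum of per-feature scores normalised to [0, 1].
    An API covering many low-weight features can still rank below
    one that excels on the few features that matter most."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
```

Making the weighting explicit in this way would reconcile the table with the textual conclusion that raw feature coverage is not decisive.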
A precise description of the data sample used to manually inspect the matching results should be provided as well.
A more detailed explanation of the building blocks of Figure 2 should be added, to make the paper more self-contained and more accessible to readers who are unfamiliar with the Similarity Ontology.
The author should also clarify in which contexts bidirectional and non-bidirectional similarity metrics are used. In Section 5, it is stated that, in the data model, “similarity is always bidirectional and a search for one image in a pair should generally yield the same score as searching for the other”. However, Section 8 discusses specific cases where an asymmetric indicator of similarity is preferred, for instance to differentiate between a copy and the original.
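To make the distinction concrete, the revised text could state when each modelling choice applies. A minimal sketch of the two cases (the class and property names are hypothetical, not drawn from the Similarity Ontology):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SymmetricSimilarity:
    """Undirected similarity: the score is attached to the unordered
    pair, so looking up either image yields the same value."""
    image_a: str
    image_b: str
    score: float

    def involves(self, image: str) -> bool:
        return image in (self.image_a, self.image_b)

@dataclass(frozen=True)
class DerivationLink:
    """Directed relation, e.g. a copy derived from an original;
    the inverse direction does not hold."""
    copy: str
    original: str

# The same score is retrieved regardless of which image is queried:
pair = SymmetricSimilarity("img:42", "img:99", 0.91)
assert pair.involves("img:42") and pair.involves("img:99")
```

Spelling out which relations in the data model are symmetric and which are directed would remove the apparent tension between Sections 5 and 8.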
In sum, this paper presents an interesting and timely project, which has the potential to impact many use-case scenarios. The author should therefore also discuss any future evaluation plans for assessing the utility of the proposed platform to expert and non-expert users, i.e., across the many described use cases.