Review Comment:
The paper addresses the problem of multi-level classification (taxonomy-based) of entities by comparing different workflows, based on automatic prediction and crowdsourcing. The paper features an interesting set of experiments.
(1) originality
Although the work is not disruptive in terms of innovation, it could provide some additional insight with respect to existing studies on the organization of crowd work.
(2) significance of the results
The results are significant in the context of the specific problem addressed by the paper, although the approach is probably not very generalizable.
(3) quality of writing
The current version of the manuscript has improved in readability, coherence, and organization.
My main criticism of the current version is that the authors insist on dedicating a significant section of the related work to the GWAP field, which is basically irrelevant in the current context, as they themselves acknowledge in their rebuttal letter ("Whilst we agree with the Reviewer that improvements can be made such as involving gamification techniques or query execution plan optimization, those techniques often are not natively supported by crowdflower. We reduced our approaches to that not needed external applications ...").
Conversely, the analysis is missing all (or most) of the works that study different crowdsourcing workflow planning strategies (e.g., those considering crowd quality, performance, adaptation and reactivity, multi-step processes, spam detection, and so on). These deserve a dedicated section in the related work, beyond the part on applying AI techniques.
Some minor aspects need to be addressed:
- "including glitches in the extraction process" --> there is no substantial evidence of this in the paper. The authors should either motivate the claim and show some examples, or drop the comment.
- in the formula Cost(annotation) = Cost(prediction) + Cost(detection) + Cost(correction), the term "cost" should be defined more clearly. Is it monetary (in which case you should clarify that the cost of prediction may come from paid API invocations) or the number of human tasks only (as the reported results seem to suggest)?
- in section 5.4, you should clarify whether you actually applied spam prevention or not.