Review Comment:
Paper 2810-4024 ("Using the W3C Generating RDF from Tabular Data on the Web Recommendation to manage small Wikidata datasets"), a revision of 2659-3873, significantly expands on the earlier version: the main body grows from 9 pages (14 pages total) with 19 references and 14 footnotes to 15 pages (25 pages total) with 35 references and 29 footnotes. Most of this additional material addresses shortcomings identified in the first round of reviews, notably the lack of a substantial section on related work. The authors have also added a diagram showing how the described components fit into a workflow, and screenshots of a GUI tool for constructing the JSON metadata that drives the CSV2RDF transforms.
The authors justify and explain these additions and changes in a lengthy and comprehensive cover letter, and I am satisfied that they have addressed the concerns raised in the first round. This has resulted in a stronger paper.
The paper makes it clear that the described approach is best suited to monitoring and updating small subsets of Wikidata "items of interest" (roughly, up to the number of items in a small art gallery) by users and small organizations with limited technical expertise. Users can limit not only the set of items, but also the set of properties, references, and qualifiers used to describe those items. I buy the argument that such users find it easier to spot and fill gaps in the data by scanning rows and columns in a spreadsheet, and by copy-pasting, than by trying to work with JSON-LD or Turtle.
The paper argues convincingly that existing tools (QuickStatements and OpenRefine for manipulating and uploading tabular data to Wikidata; Quit Store and OSTRICH for archiving RDF graphs) are less well suited to technically non-expert users who need only to track a small number of items.
Some of the limitations of their approach are actually limitations of the Tabular Data on the Web specification (CSV2RDF): lack of support for language tags, for generating more than one triple from one column, and for multiple values.
The most serious technical objection in the first round of reviews, in my reading, was the lack of consideration of statement ranks, as ranks affect the materialization of truthy statements and thus the resulting graphs that are compared via federated queries. In response, the authors argue that ranks are rarely used by novice users. Ranks could in principle be accommodated by adding columns to the CSV, though doing so may not be worth the extra complexity. They note that QuickStatements likewise does not support ranks.
Previous reviews pointed out the lack of evaluation or user feedback. The authors point to workshops, blog posts, videos, and "anecdotal feedback". ("In two workshops, non-programmers were able to learn about, set up, and use the system to write data to Wikidata in less than an hour.") I personally do not think that the absence of a formal user-experience study should be grounds for rejecting the paper.
One significant (but fixable) flaw: Appendix A uses the 'rdf-tabular' tool to emit RDF/Turtle, but when I execute the given command with version 3.1.15 of the rdf-tabular Ruby gem, it raises the error "No writer found for ttl", presumably because the Turtle writer is supplied by a separate gem that is not loaded. Specifying one of the supported formats, such as N-Triples, does work. If the paper is accepted for publication, this should be addressed.
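For reference, the workaround I used, written here as a minimal Ruby sketch rather than the exact command from Appendix A (the file names are placeholders, and the rdf-tabular gem is assumed to be installed):

    # Minimal sketch of the workaround; 'csv-metadata.json' and 'output.nt'
    # are placeholder names, not the files from Appendix A.
    require 'rdf/tabular'   # CSV2RDF reader (gem: rdf-tabular)
    require 'rdf/ntriples'  # N-Triples writer, bundled with the core rdf gem

    graph = RDF::Graph.new
    # Open the CSVW metadata file that drives the transform; the reader
    # processes the CSV tables it references.
    RDF::Tabular::Reader.open('csv-metadata.json') do |reader|
      graph << reader
    end

    # Writing N-Triples succeeds; requesting :ttl instead raises
    # "No writer found for ttl" unless a Turtle writer gem is also required.
    File.write('output.nt', graph.dump(:ntriples))

Installing and requiring the rdf-turtle (or linkeddata) gem would presumably also make the Turtle output shown in Appendix A work as written.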
A reference or footnote should be added for Darwin Core Archives.