Review Comment:
While there is at least some related work now, it does not mention any reused best practices or methodological approaches for developing an LD data set (are there none?). The discussion section states as second contribution “a set of practices and vocabulary choices for reuse in similar efforts”. Technically one can extract such a set from the various paragraphs, and they are practices and choices, but they are not best practices: they are not justified with respect to the related literature, nor does the paper present a clear sequence of steps. The onus ought not to be on the reader to extract the practices and choices from the different paragraphs when the paper claims to offer them; they should be presented on a plate (which they are not).
I also see some improvements to the schema, but the main problem remains: a mishmash [that word in the previous review was a polite rendering of ‘unmotivated messy cherrypicked cocktail of vocabulary terms with a range of issues’] of classes and properties from a plethora of sources. While I disapprove of such an approach, and I don’t think it was ever the intention to cherry-pick URIs and copy-and-paste one’s schema together, perhaps this could work if the sources were aligned properly (which is what the paper claims to have done) and motivated properly (which is asserted in section 7, “vocabulary choices”). On the latter, the paper offers only a ‘we started with swrc and added stuff to that along the way’ in section 4.1, which does not count as a description of trade-offs among vocabularies. More problematic is that, despite some corrections made following my pointers in the previous review, there are still problems. To illustrate, here are a few selected from Tables 4 and 5:
- Paper: swrc:inProceedings is really about papers in proceedings, which essentially have an associated (scientific) event of which the volume is the proceedings, whereas schema:Article is only the informal notion of an article, and bibo:Article’s description lies somewhere in between, so these are not equivalent.
- Journal Issue in the concept column with ofType bibo:Journal: no, in bibo, Journal hasPart only Issue, so a Journal Issue cannot be ofType Journal.
- Some mappings are missing: npg:Citation and schema:Citation seem to be the same, and npg:hasCitation could be related to isReferencedBy in bibo (though that is actually from purl.org/dc/terms, not really bibo).
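To make the second point concrete, here is a minimal sketch of the kind of consistency check I would expect the mapping tables to pass. The PART_OF table, the row format, and the function name are my own illustrative assumptions; they are a deliberately simplified stand-in for the part-whole relation in bibo as described above, not the vocabulary definition itself.

```python
# Hypothetical, simplified fact for illustration only: an issue is a
# part of a journal (taken from the point above, not loaded from the
# bibo vocabulary itself).
PART_OF = {"bibo:Issue": "bibo:Journal"}


def typing_problems(mappings):
    """Return concepts whose asserted type is the whole they are a part of.

    `mappings` is a list of (concept, natural_class, asserted_type) tuples,
    where natural_class is the class that the concept label suggests.
    """
    problems = []
    for concept, natural_class, asserted_type in mappings:
        # A part must not be typed as the whole it belongs to.
        if PART_OF.get(natural_class) == asserted_type:
            problems.append(concept)
    return problems


# Hypothetical rows in the style of the mapping tables.
rows = [
    ("Journal Issue", "bibo:Issue", "bibo:Journal"),  # part typed as its whole
    ("Journal", "bibo:Journal", "bibo:Journal"),      # consistent
]
flagged = typing_problems(rows)
```

A check of this kind only covers one class of error; it would not detect the over-strong equivalences or the missing mappings noted in the other points.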
No doubt there are more such shortcomings in the schema, but I consider it the responsibility of the authors to find and fix them, not mine, in particular because the integrated graph is promoted in the paper as a useful contribution. At present, it still doesn’t instil confidence, but instead gives the impression of a poorly executed copy-and-paste job of vocabulary items, which, while in the strict sense a “set of practices”, is not one that is advisable. Given that the schema is still unstable, I wonder about its knock-on effects on the data and on analyses done with that data, especially regarding querying the data given the limitations the schema currently still has, and its proneness to change until a stable version has been developed.
While it is good to see that the LD data set is being used and endorsed by various institutions, neither best practices nor an immediately reusable good schema is available. This being the case, the authors should either tone down their claims of contribution or improve the work to meet the claims made in the paper.
section 6
16 scientific publication -> 16 scientific publications
section 7
short-comings -> shortcomings
draw-backs -> drawbacks