Review Comment:
The paper describes an artificially constructed dataset with a very narrow scope: the conservation of manuscript bindings. The basic information is provided (name, URL, version date and number, license), along with a description of its intended purpose (testing software that should consume such narrowly scoped data). It covers the source of the data (artificially constructed) and its connectivity (only internal, as it is artificial). The structure described is interesting, with both instance data and a referenced poly-hierarchical thesaurus. It describes the data format (XML and CSV) and references the version of the CIDOC Conceptual Reference Model (CRM) that it uses. Section 5 goes into considerable detail about the possible constructs that are generated. It clearly passes the initial threshold of a clear description.
However, I feel that, in its current form, it does not meet the standard for acceptance under the three criteria for evaluating such papers:
1 - Quality and Stability: No evidence is provided of the dataset's quality relative to other cultural heritage linked data. Was it reviewed by subject experts to validate that the records actually resemble real-world data? Are there any real-world datasets of which this artificial construct could be a reasonable facsimile, and if not, is there any likelihood of such datasets being created in the coming decade? My experience suggests that this is, unfortunately, unlikely at a scale of 28,000 instances.
2 - Usefulness: Given its artificial nature, its limited scope, and the lack of any determination that it resembles real-world data, it is hard to imagine the dataset being useful for anything other than its stated purpose of testing software designed to consume this particular dataset, which is of limited value. Second, it is difficult to assess its usefulness as Linked Open Data, because it is not available as Linked Open Data. The description discusses the XML and CSV files and claims that they can be transformed into LOD using software in the repository. However, that software is distributed as a RAR archive, a format that is uncommon these days. After I installed an unarchiver, the code would not compile, and the JAR would not run with Java 15.0.1 on macOS. The included CRM ontology is many years out of date compared with the version used in the paper (7.1, from 2021). As such, I have grave doubts that anyone would find the dataset useful in its current state.
3 - Description: As noted in the initial summary, the description of what is provided is very clear. However, the dataset uses custom extensions to the CRM, is not provided as LOD, is not used by anyone apart from the author and collaborators, and does not link to any resources outside itself.
For the paper to be acceptable, I feel that the following revisions, relatively minor but very important, must first be made:
* Provide LOD in at least one of the standard RDF serializations: Turtle, N-Quads, JSON-LD, or RDF/XML. This is the Semantic Web Journal, after all.
* Provide evidence that the dataset is useful to anyone outside the originating context, as required by criterion 2: usefulness "shown by corresponding *third-party* uses - evidence must be provided."
* To meet the quality criterion, provide evidence that appropriately qualified domain specialists have reviewed the dataset and confirmed that the artificially generated content reflects real-world data.