Review Comment:
The paper "Converting neXtProt into Linked Data and nano publications" focuses on serializing the annotations of neXtProt, a protein knowledge platform, and on adopting the nanopublication approach to capture provenance information. The authors present a use case in which post-translational modification data are modeled as nanopublications, to illustrate how different levels of provenance and data quality thresholds can be captured in this model.
The converted dataset uses a large number of established vocabularies to model the data, in line with the principle of re-using existing vocabularies. The data is also available for download as an RDF dump. Converting this data with the nanopublication model seems a reasonable and useful effort, given the large amount of valuable information it contains.
However, key elements are missing from both the paper and the dataset:
- access to the data via a SPARQL endpoint
- more use cases
- interlinks to external datasets, such as the RDF version of UniProtKB or the Bio2RDF datasets (http://beta.bio2rdf.org/), and use of these links to obtain further information
- a VoID description of the dataset, including versioning and licensing information
- an update mechanism, as well as policies to ensure sustainability and stability
Regarding the conversion process: why was it necessary to transform the XML data into a relational data model? Was a direct conversion from XML to RDF (e.g., via XSLT) not possible? What errors were encountered during the conversion?
Additionally, the single use case presented in the paper is not entirely clear. A concrete example of an 'assertion' would help readers understand how the model is used. If possible, an actual analysis of the scientific contribution of an author (or a group of authors) would explain and illustrate the use case better. Also, why do the authors discuss minting URLs when Linked Data relies on URIs? Finally, what are the known shortcomings of the dataset?
The paper is easy to read; however, it contains a few errors and needs clarification in certain places:
- Abstract: "…to illustrate the how the different…" - "…to illustrate how the different..."
- Introduction: "accessibly" - "accessible"
- Add references for the Open PHACTS project, the UniProtKB Linked Data model, and the BioPAX ontology
- Expand the abbreviation PTM in the abstract, and introduce the abbreviation at the first occurrence of the term in the text
- What does "standards based ontologies" mean?
- In Figure 1, what does RDM stand for? Relational data model? Please add the abbreviation in the text.
- Section 3: cite reference [9] in the first paragraph rather than in the third
- Section 4: "possibly" - "possibility"
- The last figure is incorrectly numbered as Figure 2
- Section 5: The first paragraph of the Conclusion is better suited to the Introduction. The Conclusion should instead discuss the main contributions, limitations (if any), and future work
- Section 5: "Nanopublications encoded in RDF can be more easily mined, queried and retrieved through the Internet…" I would refer specifically to a SPARQL endpoint rather than to the Internet in general.
- Reference 4 is not used anywhere in the paper
- Check formatting of reference 21
Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-s...