An RDF guide for the Darwin Core standard

Tracking #: 636-1846

Authors: 
Steve Baskauf
John Wieczorek
John Deck
Campbell O. Webb

Responsible editor: 
Guest Editors Semantics for Biodiversity

Submission type: 
Ontology Description
Abstract: 
The Darwin Core vocabulary is widely used to transmit biodiversity data in the form of simple text files. In order to support expression of biodiversity data in the Resource Description Framework (RDF), a guide was created as a non-normative addition to the Darwin Core standard. The guide resolves a number of issues that arise from adapting terms designed to have literal values for use with URI references. Although there are some problems that are beyond the scope of the guide, the guide is an important step towards enabling the biodiversity informatics community to participate in broader Linked Data and Semantic Web efforts.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 06/Oct/2014
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'Ontology Description' and should be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.

While the general notions regarding validity and relevance of the ontological approach of this paper match those of the other Baskauf/Webb submission, I found this manuscript to be less well presented than the "DSW" paper. The exposition is too stenographic and not sufficiently embedded for the general reader to follow easily and understand the significance. A more thorough rewrite might help here.

The main motivation for this submission, as far as I can tell, is this: The authors have co-developed an extensive new resource here:

https://code.google.com/p/tdwg-rdf/wiki/DwcRdf

and also:

https://code.google.com/p/tdwg-rdf/wiki/DwcRdfGuideProposal

Overall this is wonderful in my mind, and presents great leaps ahead for representing DwC in RDF (and we absolutely need to see how far we can push on that front). However, the manuscript itself does at best an average job of connecting the dots and lines for me. For one, it actually takes a little effort to understand that the SWJ manuscript in itself is *not* the Guide. It does not even directly follow the structure of the Guide (which is meticulously structured and also fairly lengthy), nor does it present all key aspects. It is instead almost like a "Musings on..." paper. This is quite fine, but the relationship between (1) the on-line resources serving in effect as the Guide and (2) the current paper needs to be clarified upfront. An overview figure or two showing what the actual Guide contains and does would help as well. Then the authors could state that, based on the new existence of this on-line resource, a number of issues are worth highlighting and reflecting on (achievements, limitations), etc.

In other words, the manuscript needs improved initial stage setting in my view, and more clarifications throughout that connect the "musings" directly with the necessarily more comprehensive (but also less judgmental) on-line material. That way the interaction between the manuscript and the Guide is more obvious.

Other comments: The figures tend to come too early. One or two initial overview figures might help, before jumping right into concrete examples. Having 30+ footnotes and 1 reference is not an ideal way to credit authors or promote citations of their achievements.

None of the above is meant to diminish the effort and value of what went into creating the Guide. It is mostly about salespersonship. In general the TDWG-RDF developments are of the highest importance to the community in my assessment. Additional comments are provided as sticky notes directly in a PDF shared with the Editors.

Review #2
Anonymous submitted on 16/Oct/2014
Suggestion:
Major Revision
Review Comment:

The paper describes extremely important work that is of great value to
the biodiversity community. Providing an RDF version of DwC and a
guide on how to use it is direly needed. The guide contains valuable
information and is a very solid piece of work.

Unfortunately, this is not really reflected in the paper submitted to
the special issue. I am not quite sure about its value to the
community in the current form. I am sure, however, that the
experiences and lessons learned from the work on the RDF guide offer
plenty of material for a great paper - and I would really love to see
that paper in the special issue.

Ideally, such a paper would be of interest on the one hand to people
interested in using the RDF DwC version and on the other hand to
people interested in "translating" non-semantic vocabularies to RDF or
OWL.
This would be the case if the paper explained the rationale behind
some of the decisions made in the RDF guide (as it already does to a
certain degree) and also provided "lessons learned"-style discussions
of pitfalls and difficulties. Ideally, these should be generalisable
to other endeavors.

Some more detailed comments that might or might not still be relevant
after restructuring the paper:

The title is somewhat misleading. One might think that this actually
is the RDF guide. Perhaps something like "Experiences in creating...",
"Lessons learned from...", or "An introduction to..." would fit better.

The same impression can be gained from the abstract, so it should be
adapted as well.

Introduction:
"over 428 million Darwin Core records": For SWJ readers not familiar
with DwC: What is a record? Maybe explain using database terms.

"a single table of rows and columns": Are there other kinds of tables?

Section 2:
Figure 1:
Are the objects of the dwcuri:recordedBy triples necessarily referring
to the same real-world entities as the entries in the dwc:recordedBy
triples, or could both be mixed? To use the example: Can I be sure that
there were two people recording this, or could it have been four? The
same question applies to the other constructs where dwc and dwcuri are
used in parallel.
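
The ambiguity behind this question can be made concrete with a small
sketch. It is only an illustration: the occurrence URI, the names, the
person URIs, and the URI used for the dwcuri: namespace are all
hypothetical placeholders, not values from Figure 1. It also assumes
that two different literals (or two different URIs) denote two different
people, and that nothing in the data links a literal to a URI.

```python
# Illustrative sketch only: the occurrence URI, names, person URIs, and the
# DWCURI namespace below are hypothetical and are not taken from Figure 1.
DWC = "http://rs.tdwg.org/dwc/terms/"    # Darwin Core terms namespace
DWCURI = "http://example.org/dwcuri/"    # placeholder for the dwcuri: namespace

OCC = "http://example.org/occurrence/1"  # hypothetical occurrence record

# Each triple is (subject, predicate, (kind, value)); kind is "literal" or "uri".
TRIPLES = [
    (OCC, DWC + "recordedBy", ("literal", "A. Collector")),
    (OCC, DWC + "recordedBy", ("literal", "B. Collector")),
    (OCC, DWCURI + "recordedBy", ("uri", "http://example.org/person/1")),
    (OCC, DWCURI + "recordedBy", ("uri", "http://example.org/person/2")),
]


def recorder_bounds(triples, subject):
    """Return (fewest, most) distinct recorders consistent with the triples.

    Assumes distinct literals denote distinct people and distinct URIs
    denote distinct people, with no statement linking a literal to a URI.
    """
    literals = {
        obj[1]
        for subj, pred, obj in triples
        if subj == subject and pred == DWC + "recordedBy" and obj[0] == "literal"
    }
    uris = {
        obj[1]
        for subj, pred, obj in triples
        if subj == subject and pred == DWCURI + "recordedBy" and obj[0] == "uri"
    }
    # Fewest: the smaller group is entirely contained in the larger one.
    fewest = max(len(literals), len(uris))
    # Most: no literal and URI refer to the same person.
    most = len(literals) + len(uris)
    return fewest, most


print(recorder_bounds(TRIPLES, OCC))  # prints (2, 4)
```

Under these assumptions the triples alone are consistent with anywhere
from two to four recorders, which is the uncertainty the question points
at.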

2.2.1 You state ".. ID terms should not be used as predicates in RDF
triples". How should the triples look instead?

Section 3:
You write in Sec 3.1 "are too complex to suggest ..": Why? What makes
them complex?

Fig 7: If possible, highlight the parts where the serialisations differ.

Figure 4 is badly placed.