Lessons Learned from Adapting the Darwin Core Vocabulary Standard for Use in RDF

Tracking #: 1093-2305

Authors: 
Steve Baskauf
John Wieczorek
John Deck
Campbell O. Webb

Responsible editor: 
Pascal Hitzler

Submission type: 
Ontology Description
Abstract: 
The Darwin Core vocabulary is widely used to transmit biodiversity data in the form of simple text files. In order to support expression of biodiversity data in the Resource Description Framework (RDF), a guide was created as a non-normative addition to the Darwin Core standard. This paper describes the major issues that were addressed in the creation of the guide, particularly problems related to adapting terms designed to have literal values for use with IRI references. By making it possible to express millions of existing records as RDF, the guide is an important step towards enabling the biodiversity informatics community to participate in broader Linked Data and Semantic Web efforts.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 08/Jul/2015
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'Ontology Description' and should be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.

Darwin Core has aggregated some 533 million organismal occurrence records, and is now being translated into RDF. The new DwC-RDF specifications are detailed in a separate paper - http://www.semantic-web-journal.net/content/darwin-sw-darwin-core-based-... This manuscript, written by an expanded author team, provides relevant information on the adaptation and interpretation process of the RDF-translated DwC. The manuscript is authoritatively and accurately written and well referenced. While it remains technical and audience-focused (i.e., it is very "standard compliant" in style), the importance of this issue is considerable. Jointly, the two papers in SWJ should be accepted for publication in order to disseminate and legitimize these efforts to bring occurrence-based biodiversity data into the Semantic Web realm, to which it previously remained external.

Review #2
By Ramona Walls submitted on 19/Oct/2015
Suggestion:
Minor Revision
Review Comment:

The instructions state “This manuscript was submitted as 'Ontology Description' and should be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.” However, it is not actually an ontology description, but rather a description of an RDF user guide. Therefore, I have reviewed it based on altered criteria:

(1) Quality and relevance of the described user guide (convincing evidence must be provided).

The paper and corresponding guide are very well thought out, thorough, and well written. In fact, the user guide is one of the most thorough pieces of documentation I have ever read. If only everyone would document their work this well! For example, the User Guide includes an explanation of RDF (with links for further reading), an implementation guide with examples, and a reference for the terms contained in the RDF vocabulary. The guide provides information in a form that is actually understandable and useful to developers wanting to use DwC as RDF.

This manuscript is of very high importance to the biodiversity standards community. The effort to convert the Darwin Core vocabulary to RDF, and the challenges involved, are also of interest to a broader audience, because there are many existing vocabularies that may need to be adapted for use with RDF and are likely to face similar challenges. The fact that the RDF implementation of DwC has the backing of a large and active standards community adds extra credence to this report.

(2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described guide.

The manuscript does a very good job of explaining the different aspects of the user guide, as well as the challenges that were faced in providing DwC as RDF. Some of the sections of the manuscript are a bit challenging to follow, I think in large part because the subject matter is complex and requires a lot of context to understand. I have added some specific suggestions below that I hope will allow the authors to make things clearer.

Specific suggestions:

1. Maybe move Box 1 to the bottom of the page, so it is closer to the mention of it.

2. P. 2 mentions “the results of experimentation reported on the TDWG mailing list between 2009 and 2011”. It would be better to have links to some specific messages relevant to the topic, rather than just pointing to the mailing list.

3. Section 1 refers to QNames. Could also mention CURIE syntax: http://www.w3.org/TR/curie/

4. Section 2.1 states “Relatively few members of the organization are familiar with RDF.” Relative to what? Maybe better to say "a small proportion" or similar.

5. The wording of the second half of the first paragraph of section 2.1 may not be all that accessible to novice RDF users. Maybe that is not necessary for SWJ, but perhaps it could be reworded slightly for a more general audience.

6. First sentence in the second paragraph of section 2.1 should read, “…was identified as an important point.”

7. Fig. 1: Are there really DwC records that contain ids as URNs? This may be lost on some readers. Maybe better to use a non-RDF id, which is what most records contain (I think - if I am wrong on this, then leave it as is).

8. Fig. 2: Shouldn't eventDate have an Event as the subject, rather than an Occurrence? Actually, this is discussed, but much later, in section 2.4.2. Maybe mention in section 2.2 that it will be discussed later.

9. Fig. 2 legend: Suggest changing to "Attempt to represent graphically as RDF". This isn't an actual RDF representation.

10. Section 2.2.1, regarding the dual term approach: I understand why this was done, but I can't say I like having two copies of every term. Is there an automated mechanism for maintaining the two lists, so that when a definition of a property changes in one, it is automatically changed in the other? Also, is there a formal relation between two equivalent terms in the two namespaces (here or in DC), or is their similarity just assumed?

11. Section 2.2.2 brings in OWL reasoning, without any introduction or even mention of what it is. It might be helpful to provide a little more context or background here.

12. Section 2.2.2: between “…"États-Unis", etc.” and “Following this approach…” does not need to be a new paragraph.

13. Section 2.2.2: I think you should include a table (as a supplement, if space is a premium) listing all the convenience terms.

14. Section 2.4 mentions object properties. Does RDF distinguish between different types of properties? I thought they were all just "rdf:Property" and that the distinction between object, data, and annotation properties was part of OWL. Even though OWL properties can be serialized as RDF, I don’t think they are part of the RDF standard. If I am wrong here, please provide a citation at the first mention of object properties.

15. Section 2.4.2: I think it is worth mentioning that Figs. 1 & 2 in the BCO paper by Deck et al. (2015, SIGS) also show how Darwin Core data can be modeled with BCO, using general properties from the OBO Foundry's Relation Ontology. I realize that the Deck et al. paper may have come out after this manuscript was submitted.

16. Section 3: The Gazetteer Ontology (http://purl.obolibrary.org/obo/gaz.owl - beware if you click on this link that the file is huge) can also be used for place names.
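
As an illustration of suggestions 3 and 10 above, the following is a minimal sketch (not taken from the manuscript or the guide) of how a CURIE or QName expands to a full IRI, and of how the same local name can resolve to two different terms under the dual-namespace approach. The prefix map and the function name expand_curie are hypothetical choices for this example; the two namespace IRIs are assumed to be the published Darwin Core ones for literal-valued (dwc:) and IRI-valued (dwciri:) terms.

```python
# Hypothetical prefix map for illustration. The namespace IRIs are assumed
# to be the published Darwin Core namespaces for literal-valued terms (dwc:)
# and their IRI-valued counterparts (dwciri:).
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "dwciri": "http://rs.tdwg.org/dwc/iri/",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' into a full IRI by simple concatenation."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


# The same local name yields two distinct terms, one per namespace.
print(expand_curie("dwc:recordedBy"))
print(expand_curie("dwciri:recordedBy"))
```

Nothing in the expansion itself relates the two resulting IRIs to each other, which is why the question in suggestion 10 about a formal relation between the paired terms matters.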