The ACORN-SAT Linked Climate Dataset

Tracking #: 457-1634

Authors: 
Laurent Lefort
Armin Haller
Kerry Taylor
Andrew Woolf

Responsible editor: 
Oscar Corcho

Submission type: 
Dataset Description
Abstract: 
The Australian Bureau of Meteorology has recently published a homogenised daily temperature dataset, ACORN-SAT, for the monitoring of climate variability and change in Australia. The dataset employs the latest analysis techniques and takes advantage of newly digitised observational data to provide a daily temperature record over the last 100 years. In this article we present how ACORN-SAT can be published as linked data with the help of the Semantic Sensor Network ontology and the RDF Data Cube vocabulary. We describe how the proposed service can make such datasets more accessible and linkable with other resources and how to handle issues which are specific to such long term climate data time series. The resulting Linked Sensor Data Cube is accessible online via a pilot government linked data service built on the Linked Data API at lab.environment.data.gov.au.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Marta Sabou submitted on 30/Apr/2013
Suggestion:
Minor Revision
Review Comment:

This paper describes the ACORN-SAT dataset, which provides recordings of daily temperatures for the last 100 years in a variety of locations in Australia. I will comment on this paper along the main criteria of the call:

Quality of the dataset: High
This is a high-quality dataset. To start with, the source of this linked open dataset is a large-scale dataset of the Australian Bureau of Meteorology containing over 100 years' worth of daily temperature readings from hundreds of locations in Australia. The publishing process described ensures that the resulting LOD is also of high quality. In particular: (1) the authors have reused several existing vocabularies, paying special attention to the statistical nature of the dataset in question and therefore adopting the RDF Data Cube Vocabulary amongst others; (2) the URIs of the published entities follow clear patterns; (3) the data is accessible through several machine- and human-friendly interfaces, including geographic mashups created by the authors.

Usefulness of the dataset: High
This dataset has high potential to support climate research. For example, it could enable the development of novel and intuitive visual interfaces over the existing data, or it could facilitate the integration of this data with other relevant data sources (for example, those provided by the World Bank).

Clarity and completeness of the descriptions: Good.
Overall, the paper is well structured and clearly written, addressing all the aspects indicated in the call. An exception is paragraph 2 of section 2.2, which is overly long and complex, and should therefore be revised. Some aspects of the work that should be discussed in more detail or clarified are:
* The usefulness of linking to the UK interval object (Section 2.4) was not clear. Could the authors better clarify the potential use of this mapping?
* It would be interesting to know more about the process of linking with GeoNames. Did the authors encounter any specific issues? Could they link all locations?
* How is the dataset updated? Is there any versioning mechanism in place?
* Did the authors consider linking the locations to the corresponding DBpedia entities, to allow getting more information about these locations?

Review #2
By Danh Le Phuoc submitted on 17/Jun/2013
Suggestion:
Accept
Review Comment:

The paper presents the Linked Climate dataset of Australia, covering 100 years, with a lot of challenging properties that need to be captured. The paper is well written, and all the aspects of the dataset are nicely covered. It would be nicer if the paper provided more statistics about the internal and external links of the dataset.

Review #3
By Raphael Troncy submitted on 19/Aug/2013
Suggestion:
Minor Revision
Review Comment:

This paper presents the ACORN-SAT dataset, published as Linked Data. This dataset records all observations made by weather stations in Australia over the last 100 years. It aims at monitoring climate variability and change in Australia. This dataset makes use of two important ontologies, namely the Semantic Sensor Network ontology and the Data Cube vocabulary, and some other custom ontologies. All resources are available online from http://lab.environment.data.gov.au/. The ontologies developed can therefore be re-used and the data can be browsed online (the technologies for serving the data include Virtuoso as the triple store and ELDA as the implementation of the Linked Data API). I confirm that the published data is 5-star on Tim Berners-Lee's scale.

The paper is very well written and well within the scope of this journal special issue on dataset descriptions. My first concern is about the similarity of this paper to another paper published by the authors in the SSN 2012 workshop (reference [11]), which is sometimes even clearer than this journal submission. I suggest that the authors state upfront that this paper is an extended version of [11] and that they state clearly what is new.

My second concern is about the various ontologies developed, their associated prefixes, which are sometimes awkward, and the URL design pattern finally chosen for representing the data. As a side note, I observe that neither the ontologies developed nor the prefixes used are known by linked data services such as prefix.cc or LOV, see [1] or [2]. I suggest that the authors register those prefixes and vocabularies to foster their re-use.

When modeling the acorn-deploy ontology, you used DUL for representing the temporal relationships between the deployment phases and the interval from data.gov.uk (which makes use of OWL Time). Why did you not simply use OWL Time and the temporal calculus enabled by OWL Time (basically, Allen83)?
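
As a minimal sketch of the interval calculus referred to above, the Python below classifies the relation between two deployment phases using Allen's thirteen relations. The `Phase` type, the function name, and the dates are hypothetical illustrations and are not taken from the acorn-deploy ontology. OWL Time expresses the same relations declaratively through properties such as `time:intervalBefore` and `time:intervalMeets`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Phase:
    """Hypothetical deployment phase; assumes start < end."""
    start: date
    end: date


def allen_relation(a: Phase, b: Phase) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    # Disjoint or touching intervals.
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on the two intervals share more than one instant.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"


# Two made-up phases where the second begins exactly when the first ends.
first = Phase(date(1910, 1, 1), date(1950, 6, 1))
second = Phase(date(1950, 6, 1), date(2012, 1, 1))
relation = allen_relation(first, second)
```

With these made-up dates the first phase "meets" the second, which is the kind of statement that could be asserted directly between two OWL Time intervals.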

When modeling the acorn-sat ontology, why did you create a new 'Observation' class? Is it just a placeholder to state an equivalence between qb:Observation and ssn:Observation? If yes, this should be clearly stated. More intriguing, why did you create the 'TimeSeries' class? It seems to me to be exactly a qb:Slice and I don't understand why you say it is also an ssn:Observation. Can you please explain why?

My third concern is about how this dataset has been interlinked with other datasets. It seems that it has only been interlinked with GeoNames, using the GeoNames API, i.e. algorithmically as opposed to declaratively, using an instance matching tool such as Silk and an appropriate configuration file. Is this correct? I believe this should be further clarified. I really don't understand why you can't interlink this dataset with the World Bank one or the AWAP dataset. You're implying that the (spatial?) granularity used in those two other datasets prevents interlinking, because ACORN-SAT would be too fine-grained? This seems wrong. If you have more precise observations, you could certainly aggregate them into bigger spatial regions, using your declarative knowledge (e.g. geonames:parentADM1, 2, 3 or 4). Therefore, what prevents you from performing at least a partial alignment?
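
The aggregation suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the station and region identifiers, the sample values, and the `aggregate_by_region` helper are hypothetical, the mapping stands in for what geonames:parentADM1 links would provide, and an unweighted mean is a simplification of how regional temperatures would actually be derived.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical station -> first-level administrative region mapping,
# standing in for the links that geonames:parentADM1 would provide.
STATION_REGION = {
    "station-A": "region-X",
    "station-B": "region-X",
    "station-C": "region-Y",
}


def aggregate_by_region(observations, station_region):
    """Average station values per (region, date).

    observations: iterable of (station_id, iso_date, temperature) tuples.
    Stations with no known region are skipped, giving a partial alignment.
    """
    buckets = defaultdict(list)
    for station, day, value in observations:
        region = station_region.get(station)
        if region is None:
            continue
        buckets[(region, day)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Made-up daily values, not taken from ACORN-SAT.
sample = [
    ("station-A", "2010-06-01", 14.0),
    ("station-B", "2010-06-01", 16.0),
    ("station-C", "2010-06-01", 21.5),
    ("station-D", "2010-06-01", 30.0),  # no region known, so it is ignored
]
regional = aggregate_by_region(sample, STATION_REGION)
```

Regional values produced this way could then be compared against a coarser-grained dataset keyed by the same regions, even if some stations remain unaligned.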

In Figure 1, why do you use the edge label 'instance' instead of 'rdf:type' since you're depicting an RDF graph? Similarly, why use 'subclass of' instead of 'rdfs:subClassOf' if this is what you meant?

Table 2 is really confusing. Why didn't you simply model two ontologies, one for acorn including the time series, observations, deployment, etc. and one for the stations? I would indeed simplify your modeling with just two prefixes, two ontologies and two URI patterns. Can you explain why you split those into so many components?

I believe that the keywords 'year' and 'month' are overkill in your URI design pattern in Table 3. Hence, why is not simply ? It is also inconsistent with a daily observation identified by (use of the keyword 'date'). If the purpose is to manage collections, then you could simply play with the URI pattern so that (note the trailing slash):
- returns all observations for this station
- returns all observations for this station in 2010
- returns all observations for this station in June 2010
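
The trailing-slash collections suggested above amount to prefix matching over hierarchical paths. The Python below is a minimal sketch of that idea. The `<station>/<yyyy>/<mm>/<dd>` layout, the `collection` helper, and the sample paths are hypothetical and are not the URI patterns used by the service or listed in Table 3.

```python
# Hypothetical observation paths of the form <station>/<yyyy>/<mm>/<dd>.
# Station 099999 is a made-up identifier used only for contrast.
OBSERVATIONS = [
    "023090/2010/06/01",
    "023090/2010/06/02",
    "023090/2010/07/01",
    "023090/2011/01/01",
    "099999/2010/06/01",
]


def collection(prefix, paths=OBSERVATIONS):
    """Return the observation paths under a collection path ending in '/'."""
    if not prefix.endswith("/"):
        raise ValueError("collection paths end with a trailing slash")
    return [p for p in paths if p.startswith(prefix)]


station_all = collection("023090/")             # every observation for the station
station_2010 = collection("023090/2010/")       # the station's observations in 2010
station_june_2010 = collection("023090/2010/06/")  # the station's observations in June 2010
```

Requiring the trailing slash keeps a collection path distinct from the path of a single daily observation, so no extra keywords are needed to tell the two apart.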

Table 6, which includes key statistics, should be further explained. What does 'ALL' refer to? It is not the sum of all the other counts.

In the various mashups you propose, I believe you could do much better in presenting the information to the end user. For instance, looking at http://lab.environment.data.gov.au/data/acorn/climate/slice, why don't you display the rdfs:label of the station rather than its id? For example, 'All observations for series 023090' should be replaced by 'All observations made in Adelaide'.

Who has already re-used the ACORN-SAT dataset? Who would you expect to re-use this dataset?

Minor comments:
* In Section 3.1, the '0900-0900' or '0000-0000' notations should be explained. I assume you mean from 09:00am to 09:00am or from 23:59 to 23:59.
* Could you please provide a link to the Australian Government Linked Data Working Group (AGLDWG)?
* The reference [5] should be updated, Data Cube being a Candidate Recommendation as of June 2013.

[1] http://prefix.cc/acorn
[2] http://lov.okfn.org/dataset/lov/search/#s=acorn

