Review Comment:
This paper presents the ACORN-SAT dataset, published as Linked Data. This dataset records all observations made by weather stations in Australia over the last 100 years. It aims at monitoring climate variability and change in Australia. This dataset makes use of two important ontologies, namely the Semantic Sensor Network ontology and the DataCube vocabulary, and some other custom ontologies. All resources are available online from http://lab.environment.data.gov.au/. The ontologies developed can therefore be re-used and the data can be browsed online (the technologies for serving the data include Virtuoso as the triple store and ELDA as the implementation of the Linked Data API). I confirm that the published data is 5-star on Tim Berners-Lee's scale.
The paper is very well written and well within the scope of this journal special issue on dataset descriptions. My first concern is about the similarity of this paper to another paper published by the authors in the SSN 2012 workshop (reference [11]), which is sometimes even clearer than this journal submission. I suggest that the authors state upfront that this paper is an extended version of [11] and that they state clearly what is new.
My second concern is about the various ontologies developed, their associated prefixes, which are sometimes awkward, and the URL design pattern finally chosen for representing the data. As a side note, I observe that neither the ontologies developed nor the prefixes used are known by linked data services such as prefix.cc or LOV, see [1] or [2]. I suggest that the authors register those prefixes and vocabularies to foster their re-use.
When modeling the acorn-deploy ontology, you used DUL for representing the temporal relationships between the deployment phases, and the interval vocabulary from data.gov.uk (which makes use of OWL Time). Why did you not simply use OWL Time and the temporal calculus enabled by OWL Time (basically, Allen83)?
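To illustrate the kind of temporal calculus meant here, the following is a minimal sketch that derives the Allen relation between two deployment phases and names it with the corresponding OWL Time interval property. The function name and the phase dates are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from datetime import date


def allen_relation(a_start, a_end, b_start, b_end):
    """Return the OWL Time property naming the Allen relation from A to B.

    Both intervals are assumed to be proper, i.e. start < end.
    """
    if a_end < b_start:
        return "time:intervalBefore"
    if a_end == b_start:
        return "time:intervalMeets"
    if b_end < a_start:
        return "time:intervalAfter"
    if b_end == a_start:
        return "time:intervalMetBy"
    if a_start == b_start and a_end == b_end:
        return "time:intervalEquals"
    if a_start == b_start:
        return "time:intervalStarts" if a_end < b_end else "time:intervalStartedBy"
    if a_end == b_end:
        return "time:intervalFinishes" if a_start > b_start else "time:intervalFinishedBy"
    if b_start < a_start and a_end < b_end:
        return "time:intervalDuring"
    if a_start < b_start and b_end < a_end:
        return "time:intervalContains"
    # Only the two partial-overlap cases remain at this point.
    return "time:intervalOverlaps" if a_start < b_start else "time:intervalOverlappedBy"


# Hypothetical deployment phases of one station (dates are made up).
phase_1 = (date(1910, 1, 1), date(1950, 6, 1))
phase_2 = (date(1950, 6, 1), date(2000, 1, 1))

relation = allen_relation(*phase_1, *phase_2)
print(relation)
```

With the relations stated directly in OWL Time, consecutive deployment phases could be linked without going through DUL.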
When modeling the acorn-sat ontology, why did you create a new 'Observation' class? Is it just a placeholder to state an equivalence between qb:Observation and ssn:Observation? If yes, this should be clearly stated. More intriguing, why did you create the 'TimeSeries' class? It seems to me to be exactly a qb:Slice and I don't understand why you say it is also an ssn:Observation. Can you please explain why?
My third concern is about how this dataset has been interlinked with other datasets. It seems that it has just been interlinked with Geonames, using the geonames API, i.e. algorithmically as opposed to declaratively, using an instance matching tool such as Silk and an appropriate configuration file. Is this correct? I believe this should be further clarified. I really don't understand why you can't interlink this dataset with the World Bank one or the AWAP dataset. You're implying that the (spatial?) granularity used in those two other datasets prevents interlinking, because ACORN-SAT would be too fine-grained? This seems wrong. If you have more precise observations, you could certainly aggregate them into bigger spatial regions, using your declarative knowledge (e.g. geonames:parentADM1, 2, 3 or 4). Therefore, what prevents you from performing at least a partial alignment?
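As a rough illustration of the aggregation suggested above, the sketch below groups station-level values by a parent administrative region and averages them. The station identifiers, region names and temperature values are all hypothetical; the station-to-region mapping stands in for what geonames:parentADM1 links would provide. A plain arithmetic mean is used only for simplicity and is not an area-weighted climate average.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from station to parent region, standing in for
# the links one would obtain via geonames:parentADM1.
station_region = {
    "station-A": "region-X",
    "station-B": "region-X",
    "station-C": "region-Y",
}

# Hypothetical (station, temperature in degrees Celsius) pairs.
observations = [
    ("station-A", 22.0),
    ("station-B", 18.0),
    ("station-C", 21.5),
]


def aggregate_by_region(observations, station_region):
    """Average station values per parent region, skipping unmapped stations."""
    grouped = defaultdict(list)
    for station, value in observations:
        region = station_region.get(station)
        if region is not None:
            grouped[region].append(value)
    return {region: mean(values) for region, values in grouped.items()}


regional_means = aggregate_by_region(observations, station_region)
print(regional_means)
```

Region-level values obtained this way could then be compared or linked with datasets published at a coarser granularity.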
In Figure 1, why do you use the edge label 'instance' instead of 'rdf:type' since you're depicting an RDF graph? Similarly, why use 'subclass of' instead of 'rdfs:subClassOf' if this is what you meant?
Table 2 is really confusing. Why didn't you simply model two ontologies, one for acorn including the time series, observations, deployment, etc. and one for the stations? I would indeed simplify your modeling with just two prefixes, two ontologies and two URI patterns. Can you explain why you split those into so many components?
I believe that the keywords 'year' and 'month' are overkill in your URI design pattern in Table 3. Hence, why is not simply ? It is also inconsistent with a daily observation identified by (use of the keyword 'date'). If the purpose is to manage collections, then you could simply play with the URI pattern so that (note the trailing slash):
- returns all observations for this station
- returns all observations for this station in 2010
- returns all observations for this station in June 2010
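The collection behaviour described in this list could be sketched as follows. The concrete URIs are not reproduced here, so the path layout `/station/{id}/[{yyyy}/[{mm}/]]` and all names in the code are purely hypothetical assumptions used to show the idea of hierarchical, trailing-slash collection URIs.

```python
import re
from typing import NamedTuple, Optional


class CollectionRequest(NamedTuple):
    station: str
    year: Optional[int]
    month: Optional[int]


# Hypothetical layout: /station/{id}/, /station/{id}/{yyyy}/, /station/{id}/{yyyy}/{mm}/
_PATTERN = re.compile(
    r"^/station/(?P<station>\w+)/(?:(?P<year>\d{4})/(?:(?P<month>\d{2})/)?)?$"
)


def parse_collection_path(path):
    """Map a collection path to (station, year, month), or None if it does not match."""
    m = _PATTERN.match(path)
    if m is None:
        return None
    year = int(m["year"]) if m["year"] else None
    month = int(m["month"]) if m["month"] else None
    if month is not None and not 1 <= month <= 12:
        return None
    return CollectionRequest(m["station"], year, month)


all_obs = parse_collection_path("/station/023090/")
year_obs = parse_collection_path("/station/023090/2010/")
month_obs = parse_collection_path("/station/023090/2010/06/")
print(all_obs, year_obs, month_obs)
```

Each additional path segment narrows the collection, so no 'year' or 'month' keyword is needed in the URI.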
Table 6, which includes key statistics, should be further explained. What does 'ALL' refer to? It is not the sum of all the other counts.
In the various mashups you propose, I believe you could do much better in presenting the information to the end-user. For instance, looking at http://lab.environment.data.gov.au/data/acorn/climate/slice, why don't you display the rdfs:label of the station rather than its id? For example, 'All observations for series 023090' should be replaced by 'All observations made in Adelaide'.
Who has already re-used the ACORN-SAT dataset? Who are the ones you would expect to re-use this dataset?
Minor comments:
* In Section 3.1, the '0900-0900' or '0000-0000' notations should be explained. I assume you mean from 09:00am to 09:00am or from 23:59 to 23:59.
* Could you please provide a link to the Australian Government Linked Data Working Group (AGLDWG)?
* The reference [5] should be updated, DataCube being a Candidate Recommendation as of June 2013.
[1] http://prefix.cc/acorn
[2] http://lov.okfn.org/dataset/lov/search/#s=acorn
Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-s...