Transforming Meteorological Data into Linked Data

Paper Title: 
Transforming Meteorological Data into Linked Data
Authors: 
Ghislain Atemezing, Oscar Corcho, Daniel Garijo, José Mora, María Poveda-Villalón, Pablo Rozas, Daniel Vila-Suero, Boris Villazón-Terrazas
Abstract: 
We describe the AEMET meteorological dataset, which makes available some data sources from the Agencia Estatal de Meteorología (AEMET, Spanish Meteorological Office) as Linked Data. The data selected for publication are generated every ten minutes by 250 automatic weather stations deployed across Spain and made available as CSV files on the AEMET FTP server. These files are retrieved from the server, processed with Python scripts, transformed to RDF according to an ontology network (which reuses the W3C SSN Ontology), published in a triple store and visualized using Map4RDF.
Submission type: 
Dataset Description
Responsible editor: 
Pascal Hitzler
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Resubmission after "accept with minor revision", now accepted for publication. Previous reviews are beneath the second round reviews.

Solicited review by Michael Lutz:

The authors have satisfactorily addressed most of the reviewer comments in this revision. However, I still have 2 minor comments that should be addressed in the final version of the paper:

1) The authors also mention the possibility of spatial filtering using "proximity to a location" or links to the "GeoLinkedDataset exploiting its geospatial information", but they do not say in detail how this could be achieved. They should state whether they envisage using explicit relationships (such as the spatial predicates defined in GeoSPARQL) or whether they plan to do GIS-type coordinate comparisons.

2) I see one further important option for linking to other data sets, namely for the "observed properties" of the observations. There are several existing code lists / vocabularies that could be used for this purpose (admittedly, the SWEET phenomenon ontology I suggested earlier seems not to be suitable), enabling better interoperability between systems or when doing queries. One such example is the code list available at http://rda.ucar.edu/docs/formats/grib2/grib2doc/code4.2.html, which is being recommended by INSPIRE as an interoperability code list for describing observed properties in the domain of meteorology.
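
As an illustration of the second alternative raised in comment 1 (GIS-type coordinate comparison), the following is a minimal sketch of a proximity filter based on the haversine great-circle distance. It is not taken from the paper: the function names, the station tuples and the coordinates are hypothetical, and a spherical Earth with a mean radius of 6371 km is assumed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two WGS84 points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding errors near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def stations_within(stations, lat, lon, radius_km):
    """Return the names of stations no farther than radius_km from (lat, lon).

    `stations` is an iterable of (name, latitude, longitude) tuples; this
    layout is hypothetical and only serves the example.
    """
    return [name for name, s_lat, s_lon in stations
            if haversine_km(lat, lon, s_lat, s_lon) <= radius_km]


# Hypothetical stations, filtered around a point in central Madrid.
example_stations = [("station_a", 40.45, -3.72), ("station_b", 41.39, 2.17)]
nearby = stations_within(example_stations, 40.4168, -3.7038, 50.0)
```

The first alternative, explicit spatial relationships, would instead move this comparison into the query itself, for example through GeoSPARQL predicates evaluated by the triple store.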

Finally, a minor comment: The query in Section 6.1 still contains a mistake in the temporal filter. According to http://www.w3.org/TR/owl-time/ inXSDDateTime has the following definition:

:inXSDDateTime
    a owl:DatatypeProperty ;
    rdfs:domain :Instant ;
    rdfs:range xsd:dateTime .

i.e. it has Instant (not DateTimeDescription) as a domain. Thus, the query could probably be simplified to:
?inter w3ctime:hasBeginning ?instant .
?instant w3ctime:inXSDDateTime
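
A minimal sketch of how the simplified pattern could be embedded in a complete temporal filter is given here as a small Python helper that builds the query string. Several points are assumptions rather than details from the paper: the prefix w3ctime is taken to be bound to the OWL-Time namespace, the object variable ?time is hypothetical (the fragment above stops after the predicate), and the half-open time window is chosen only for illustration.

```python
from datetime import datetime

# Assumption: "w3ctime" is bound to the OWL-Time namespace.
PREFIXES = (
    "PREFIX w3ctime: <http://www.w3.org/2006/time#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def temporal_filter_query(start: datetime, end: datetime) -> str:
    """Build a SPARQL query selecting intervals that begin in [start, end)."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return (
        PREFIXES
        + "SELECT ?inter ?time WHERE {\n"
        # inXSDDateTime is attached directly to the Instant, as its domain
        # requires; ?time is a hypothetical variable name.
        + "  ?inter w3ctime:hasBeginning ?instant .\n"
        + "  ?instant w3ctime:inXSDDateTime ?time .\n"
        + f'  FILTER (?time >= "{start.isoformat()}"^^xsd:dateTime &&\n'
        + f'          ?time < "{end.isoformat()}"^^xsd:dateTime)\n'
        + "}"
    )


query = temporal_filter_query(datetime(2012, 1, 1), datetime(2012, 1, 2))
```

The resulting string can be sent to any SPARQL endpoint; no intermediate DateTimeDescription node is needed in the pattern.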

Solicited review by Willem van Hage:

I am happy to see that my comments have been addressed properly. I am in favor of accepting this publication as it is now.

Solicited review by Tomi Kauppinen:

I read the answers to the reviews, and in general I am happy about the result, and would accept the paper.

One very minor point remains: the authors say that outgoing links are now listed at http://aemet.linkeddata.es/links/

However, that link does not work, so it is of little help to the reader.

First-round reviews:

Solicited review by Michael Lutz:

The paper describes the transformation and publication of meteorological data from the Spanish Weather Service (AEMET) as linked open data.

The description of the data set and the method to derive it from table data already published by AEMET are clear and the paper is well written. In particular, the authors focus on the used modelling patterns and the re-use of existing vocabularies for describing the data. Since the data is based on official quality-controlled data provided by a public agency, the quality of the data can be assumed to be high.

Of the characteristics of the data set requested to be described in the call, the authors cover:
- Name, URL, version date and number, licensing, availability, etc.
- Topic coverage, source for the data, purpose and method of creation and maintenance
- use of established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language expressivity, growth
- Examples and critical discussion of typical knowledge modeling patterns used

However, they do not mention reported usage (other than their own visualisation and "Another potential use of the dataset shows off when combined with other datasets in LD. For example, it can be used with the geolinkeddata dataset exploiting its geospatial information (ranging from airports or rivers to villages and provinces)."), nor metrics and statistics on external and internal connectivity. Thus, it is difficult to judge the overall usefulness (or potential usefulness) of the dataset. Also, even though the authors point to some areas for future work, they do not explicitly address known shortcomings of the dataset. In the final version of this paper, these topics should be addressed more explicitly.

Furthermore, the following detailed comments should be addressed:
- Table 1:
  - The pattern proposed for intervals is still not very clear. Why have you chosen composite strings (e.g. .../tenMinues_since_130644...)? How do you encode a "from-to" interval?
  - What is the difference between DateTime and Instant?
- Section 4: Explain the reasons for choosing wgs84_pos over other "geo" vocabularies, in particular the more expressive GeoSPARQL (http://www.opengeospatial.org/standards/geosparql).
- Section 6, query:
  - I would suggest not mixing English and Spanish property names.
  - The final part (last 4 lines) of the query seems overly complex. Explain better why this is necessary.
  - Why do you use a URI for the location rather than coordinates? In this context, the draft GeoSPARQL standard (http://www.opengeospatial.org/standards/geosparql), which defines a vocabulary for representing geospatial data in RDF and an extension to the SPARQL query language for processing geospatial data, may be relevant.
- Section 6.1: Explain why for , you don't use existing vocabularies, e.g. the SWEET phenomena ontology.
- Section 6.2: Explain the benefit of using the presented (LD/RDF) approach to create maps over other service-based methods for providing geographic data, e.g. OGC Sensor Observation Service (http://www.opengeospatial.org/standards/sos).

Solicited review by Willem van Hage:

This paper presents a linked open dataset and the corresponding ontology. It also offers an extensive methodological description on ontology development for linked open data.

Currently, the added value of _Linked_ Data in this paper is limited. The AEMET dataset is linked to only two other data sources. The paper does not make clear which concrete opportunities would be provided by linking the AEMET dataset with other datasets available on the web. For this special issue I think this paper is fine, but it would become a much better paper if it were extended to cover a use case that clearly cannot be solved without the links to other datasets.

Please include references to
- the SPARQL endpoint of the AEMET dataset (section 6)
- the Map4RDF tool (section 6.2)

The Python files for data processing are not available. Although ad hoc and specific to this case, these files could be useful for other users, as they show the approach the authors took in collecting and converting the data.

Solicited review by Tomi Kauppinen:

The authors present work on publishing meteorological data as Linked Data. The work is a very interesting contribution and shows an elaborate example of publishing observations. The paper is nicely written, and I have only some minor issues:

- Section 4: In the sentence "Sensor ontology models the network of sensors and weather stations was based …" something is missing.
- Section 5: The authors mention there are 153 links to geolinkeddata and 24 to DBpedia. Please give some examples of these links. Perhaps the DBpedia links could even all be listed, since there are so few of them.
- Section 6: Please consider always using prefixed versions of URIs to improve readability.
- Section 7: The authors mention that "less time would be required for the development of applications". Please argue why Linked Data allows applications to be created in less time (and, if possible, estimate how much less time is needed).
- Some references lack details. For example, where was "Linked Data on The Web" held?
