TourMISLOD: a Tourism Linked Data Set

Paper Title: 
TourMISLOD: a Tourism Linked Data Set
Authors: 
Marta Sabou, Irem Arsal, Adrian M.P. Brasoveanu
Abstract: 
The TourMISLOD dataset exposes as linked data a significant portion of the content of TourMIS, a key source of European tourism statistics data. TourMISLOD contains information about the Arrivals, Bednights and Capacity tourism indicators, recorded from 1985 onwards, about over 150 European cities and in connection to 19 major markets. Due to licensing issues, the usage of this dataset is currently limited to the TourMIS consortium, however, a prototype application has already revealed its usefulness for decision support.
Full PDF Version: 
Submission type: 
Dataset Description
Responsible editor: 
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Revised manuscript after a "reject and resubmit", now accepted for publication. Reviews of the first round are beneath the second round reviews.

Solicited review by Michael Lutz:

The authors have satisfactorily addressed all comments raised by the reviewers.

Only one minor comment: using "LOD" rather than "LD" is misleading, even if the authors state that this is "in anticipation of the data being openly accessible".

Solicited review by Amit Joshi:

The revised version has sufficiently addressed the shortcomings I pointed out earlier. It is well written and includes a detailed description of the methodologies used for ontology creation, triplification and external link generation.

Solicited review by Jesse Weaver:

Altogether, I am satisfied with how the authors have addressed my previous concerns. The new issues discussed in this review are very minor.

Regarding the fact that the dataset is not entirely available to the public, I am satisfied with the authors' rebuttal on this matter, that being "open" is not requisite for being Linked Data. This issue is also amply addressed in section 2.3.

Regarding the dereferencing behavior of the URIs, I am satisfied with the implemented solution. Perusal of the data suggests that slash URIs are used, and that the URIs 303 redirect appropriately. Section 3.3 is a complete description of this except that I think it would be appropriate, but not necessary, to state that the expected behavior is specifically 303 redirection (not just non-specific redirection, which could be 301, 302, 303, or 307), although such could possibly be inferred by default given the current resolution of httpRange-14. One other comment concerning the first sentence of section 3.3: "We use different namespaces for ontology elements and for resources, ...." Technically, ontology elements are resources (everything is a resource). A rephrasing like "for ontology elements and for assertion elements" (or something similar) might be more appropriate.
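
To illustrate the distinction drawn above between a 303 response and the other redirect codes, here is a minimal Python sketch. The function names are hypothetical and nothing here is taken from the paper or from the TourMISLOD server. The second function issues a real HTTP request and is therefore only defined here, not called.

```python
import http.client
from urllib.parse import urlsplit

# The redirect codes listed in the review as possible "non-specific" redirects.
REDIRECT_CODES = {301, 302, 303, 307}


def classify_status(status):
    """Label an HTTP status code from the point of view of slash-URI dereferencing."""
    if status == 303:
        return "see-other"       # the specific behaviour the review asks to be stated
    if status in REDIRECT_CODES:
        return "other-redirect"  # a redirect, but not specifically 303
    return "no-redirect"


def first_response_status(uri):
    """Return the status of the first response for `uri` (hypothetical helper).

    http.client does not follow redirects, so the code returned is the one
    the server sends for the original URI.
    """
    parts = urlsplit(uri)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers={"Accept": "application/rdf+xml"})
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling `classify_status(first_response_status(uri))` on a resource URI would then show whether the server answers with 303 specifically or with some other redirect.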

The problem with objects being exclusively literals (even when they clearly should be URIs) appears to have been resolved.

Following are issues not given in the previous review. They are all very minor.

There are very few and very minor grammar errors. Most common is the misuse of "however" as a conjunction (abstract, last sentence of section 2.2, in second paragraph of section 5). In these cases, "however" should be preceded with a semi-colon, not a comma. Additionally, the last paragraph of section 2.1 delimits complete sentences with only commas. A more appropriate way would be to precede the list of sentences with a colon, and then delimit each sentence with a semi-colon.

Figure 1 has boxes labeled "xsd:Int", which, upon perusal of the dataset, seems to be short for xsd:integer. This is a bit confusing, though, since there is also an xsd:int distinct from xsd:integer. This could be cleared up simply by changing the figure or even just adding a footnote to say that xsd:Int is an abbreviation for xsd:integer. Again, these are very small matters.
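
The difference between the two datatypes is one of value range: in XML Schema, xsd:int is a 32-bit signed integer, whereas xsd:integer is unbounded. A minimal Python sketch of that distinction follows; the function names are hypothetical and the sample values are not taken from the dataset.

```python
# Value range of xsd:int per XML Schema Part 2: a 32-bit signed integer.
XSD_INT_MIN = -2**31
XSD_INT_MAX = 2**31 - 1


def fits_xsd_int(value):
    """Return True if `value` lies within the xsd:int value space."""
    return XSD_INT_MIN <= value <= XSD_INT_MAX


def int_or_integer(value):
    """Pick between the two datatypes discussed above for an integer value."""
    return "xsd:int" if fits_xsd_int(value) else "xsd:integer"
```

Every xsd:int value is also a valid xsd:integer, but not the other way round, which is why the abbreviation in the figure can mislead.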

Also, it is common practice to include a "last accessed" date with URL references in footnotes.

First round reviews:

Solicited review by Michael Lutz:

The paper describes a dataset on tourism indicators computed for different locations and markets since 1985.

The description of the data set is clear and the paper is reasonably well written. In particular, the authors describe in detail the development of the used ontology and the creation of links to other data sets.

Of the characteristics of the data set requested to be described in the call, the authors cover:
- Name, URL, version date and number, licensing, availability, etc.
- Topic coverage, source for the data, purpose and method of creation and maintenance
- use of established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language expressivity, growth
- Examples and critical discussion of typical knowledge modeling patterns used

However, since the data set is only available to the members of the TourMIS consortium, the usage and usefulness of the data set are difficult to judge. While the authors mention this as one of the shortcomings of the data set, they do not argue convincingly that this situation will change in the future ("we hope that this restriction will be lifted soon"). Furthermore, they do not describe metrics and statistics on external and internal connectivity. In the final version of this paper, both topics should be addressed more explicitly.

Furthermore, the following detailed comments should be addressed:
- Section 2: It may make the paper easier to read if sections 2.1 and 2.2 were swapped (see also next point), so that the reader gets some idea what the data set is about.
- Section 2.1:
  - "Tourism indicators" should be explained earlier in the text (or some examples should be given) to give a reader unfamiliar with the domain some idea what the data set is about.
  - "Indeed, if the decision-maker only makes use of one, isolated, data source his analysis is limited to the data available in that source and ignores other indicators that would allow discovering complex phenomena and designing more accurate forecasting models." -- This sentence seems to be a tautology.
  - "However, tourism data sets primarily exist in isolation and they are often difficult to combine and compare automatically" -- The authors should explain why this is difficult / what the exact problems are.
- Section 2.2: It may be interesting to see the coverage of the TourMIS data on a map (which countries/cities are covered?). Does it only cover Europe? Which parts?
- Fig. 1: Why is dbo:PopulatedPlace not used directly in the ontology (rather than Country)? In general it would be good to show links to concepts in other ontologies directly in this figure.

Solicited review by Amit Joshi:

This paper presents TourMISLOD, a linked data version of TourMIS which contains tourism statistics about arrivals, bednights and capacity at European destinations. This information is undoubtedly important for tourism decision makers as well as for organizations that work closely with tourism related facilities like hotels and airports. As such, the linked data representation would prove useful to the semantic web community.

One of the contributions is that the authors have created an ontology to represent the above-mentioned tourism indicators. The authors have established links between DBpedia resources and the corresponding destinations in TourMIS. In addition, they have created a schema-level mapping from one of the ontology's concepts to a DBpedia ontology concept. These connections are important from a linked data perspective. One suggestion would be to extend the links to include GeoNames resources as well.

The paper is well written and easy to understand. The authors should provide a few triples for illustration in section 3.3.

Solicited review by Jesse Weaver:

This paper motivates the importance of tourism data and describes the translation of an existing tourism dataset into the RDF data model utilizing a custom OWL ontology. While the content of the data (information about tourism) is interesting and its desirability well-motivated, there are major issues with the RDF version in regard to its qualification as Linked Data (in the widely promoted, Tim Berners-Lee sense as described in http://www.w3.org/DesignIssues/LinkedData ).

Firstly, aside from a sample, the RDF dataset is not publicly available. Although the authors make clear that there is significant interest and motivation in making (at least parts of) the dataset publicly available, there is no assurance that such will ever be achieved. Therefore, the dataset is Linked Data only in a hypothetical sense, nullifying the quality and usefulness of the current state of the dataset. This is unfortunate since the use case for such tourism Linked Data is well-motivated.

Secondly, should publication of the sample of the dataset with its associated OWL ontology be considered sufficient publication of the dataset, there are still significant issues with the dataset in regard to its qualification as Linked Data. The authors only describe the content of the data and its triplification into RDF, but triplification alone does not constitute Linked Data. There is no discussion about the design of RDF URIs in the dataset and how they should be dereferenced in order to find more, relevant information -- the foundational characteristic of Linked Data. Even so, upon perusal of the sample dataset, it appears that instance terms as well as ontology terms utilize the same namespace URI: http://www.modul.ac.at/ontology/TourismOntology.owl# . Attempting to dereference this URI results in HTTP code 404. Therefore, neither the sample dataset nor the ontology appears to meet the basic requirements of Linked Data.
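
As background to the point about the shared namespace: with hash URIs, an HTTP client removes the fragment before making a request, so every term in the namespace dereferences to the same document. A minimal Python sketch follows; the local names "ExampleClass" and "exampleInstance" are hypothetical placeholders, not terms from the dataset.

```python
from urllib.parse import urldefrag

# Namespace URI quoted in the review.
NAMESPACE = "http://www.modul.ac.at/ontology/TourismOntology.owl#"


def document_uri(term_uri):
    """Return the URI that is actually requested when a hash URI is dereferenced."""
    url, _fragment = urldefrag(term_uri)
    return url


# Hypothetical local names, used only to show that both map to one document.
ontology_term = NAMESPACE + "ExampleClass"
instance_term = NAMESPACE + "exampleInstance"
same_document = document_uri(ontology_term) == document_uri(instance_term)
```

Under this scheme a single failing document URI makes every ontology term and every instance term in the namespace non-dereferenceable at once.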

Thirdly, the sample dataset seems to frequently (almost exclusively) place RDF literals in the object position of the triples, even when they clearly should be URIs. This prevents linking of resources even within the dataset.
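
The effect described above can be sketched with a toy triple representation in Python. Everything in this example is an assumption made for illustration: the `URI` and `Literal` classes, the `http://example.org/` namespace, and the sample triples are hypothetical and do not come from the TourMISLOD sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


EX = "http://example.org/"  # placeholder namespace, not the dataset's


def outgoing_links(triples, subject):
    """Return the objects of `subject` that are URIs and can therefore be followed."""
    return [o for s, _p, o in triples if s == subject and isinstance(o, URI)]


obs = URI(EX + "observation1")
city = URI(EX + "city")

# Same statement modelled twice: once with a literal object, once with a URI object.
literal_style = [(obs, city, Literal("Vienna"))]
uri_style = [(obs, city, URI(EX + "Vienna"))]
```

With the literal-style triple, `outgoing_links` finds nothing to follow from the observation; with the URI-style triple, the city becomes a resource that other triples, inside or outside the dataset, can refer to.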

Given these issues, the work seems to lack maturity for publication as a Linked Dataset Description at this time.
