Review Comment:
The paper presents the method and process for creating a tourism ontology, including code and tools for generating parts of the ontology and the related knowledge graph. The paper is well written and detailed. It is also quite long, and in some sections feels a bit repetitive. Overall, however, it is a good paper that could be quite valuable to many practical ontology- and KG-engineering projects.
Regarding novelty and originality, the paper is not extremely novel, in that it describes quite a typical task and a typical outcome, i.e. yet another tourism ontology and associated KG. However, it is the level of detail, the amount of reusable resources, and the discussion of the detailed issues, solutions, and lessons learned that make it valuable to other researchers and practitioners. Also, the fact that some parts of the ontology are generated automatically by a script is somewhat novel in itself. Regarding significance, the scientific value of the paper is also not extremely high, but in my opinion that is outweighed by its practical value. It really shows, from start to finish, how a large-scale ontology and KG generation project can be set up, what the issues are, and how the results can be evaluated. The quality of writing is also quite good: the paper is mostly clear, contains few language issues, and makes few unclear or unsubstantiated claims.
Regarding the associated data and code, it seems to be available and is documented with README files that explain how to run the code. I have not tested it, but it looks like it could be replicated without too much effort.
Based on these overall remarks, I suggest that the paper be accepted after some minor revisions. Below, please find some more detailed comments and questions, which should be addressed in such a revision:
The only comment I have that could warrant a larger revision is that some paper sections feel almost overly detailed, and a bit repetitive. This may sound inconsistent, since the level of detail is, on the other hand, one of the merits of the paper. However, I think this is more a matter of improving the reading experience than of actually leaving something out. Perhaps some parts could be moved to an appendix, if that is allowed by the editor? Or collected in a larger table? It is in particular the subsections of section 3.4, which often consist mostly of bullet lists, and the triple structures explained in the subsections of 3.5, that I feel become a bit hard to read. In this part of the paper I would rather have preferred a “running example” throughout the text, with the details and exhaustive listings of all the options and categories in an appendix or table. However, I realise this might be hard to achieve. It would also be beneficial to separate the “general” methodology more clearly from the example of the TAO: which parts are generic and can be used for other domains, and which are TAO-specific?
Further, I have an issue with the paper title. The term “modelization” is used in the title, but does not appear elsewhere in the paper. And although it seems to be a correct term, it is perhaps not the most common one when talking about ontology modelling. In fact, when first reading the paper I had a hard time understanding the connection to the title, and had to look up the term. Perhaps it would be better to use a more common term in the title, and at least to be consistent with the rest of the paper.
The related work section starts out well, but from line 39 onwards there is no longer any proper comparison of the described approaches to the one presented in the paper. The last few approaches described there also need to be compared to the current work.
Is the methodology that is nicely illustrated in Figure 1 the authors’ own invention, or is it based on some existing methodology? Citing the methodological basis of the described methodology would be appropriate here.
It is not entirely clear how the use cases in section 3.1 are connected to the requirements (e.g. CQs) in section 3.3.1. For instance, use case 1, about topics of interest in reviews, does not seem to have any corresponding CQ in section 3.3.1. Overall, topics of interest appear several times in the use cases, but not at all in the requirements. Some CQs are also formulated as “examples”, i.e. with what I interpret as concrete instances mentioned, such as “Wi-Fi” and a concrete distance of 2 km, while others are formulated in a much more generic way. Is there a reason for this difference, or could they be formulated in a more uniform manner?
It is a bit unclear if and how the Hontology is used. It is mentioned together with other reused ontologies, and described in detail on page 11, but then does not seem to be reused in the same way.
On page 10, the STI Accommodation ontology is mentioned on line 25, and then the Accommodation ontology is mentioned on line 32 - do you refer to the same ontology here? On the same page, the authors state “We reused…” on line 46 - what does reuse actually imply here? Import? Mentioning the URIs? Something else? The same applies on page 11, line 22, where the authors state that “We reused… by importing and extending a few classes…” - how did you do that? Import generally brings in the whole ontology, not just a few classes, so how did you reuse only a few classes?
On page 11, in the paragraph about DBpedia, both the terms subsumption hierarchy and taxonomy are used - is there a difference, or are they synonyms? If they are synonyms, why use two different terms?
At the beginning of section 3.3.3 the first aim states that the ontology should be “compatible with all the requirements”. What does it mean to be compatible with the requirements? Shouldn’t all the requirements be fulfilled?
Overall, in section 3 it is not so clear what the actual scope of the ontology is. It was not until I got to the list in 3.3.3 that I understood that the aim is only to model lodging, not any other kinds of facilities or events. This could be made more clear from the start.
Figure 2 is illustrative and gives a nice overview, although it is a bit too small to be readable in all its parts, at least when printing the paper. Also, it is not entirely clear how to interpret the arrows - do they refer to domain and range of properties, or something else?
Lines 31-32 on page 13 have two problems: first, they exemplify a more general problem of paragraph divisions in the whole paper. Usually a paragraph should consist of more than one sentence, but the paper contains many paragraphs of just one sentence, which should be fixed. The other issue is with the term “entity linking” - I don’t understand how entity linking could be used here. Or do you mean entity recognition?
I understand that the authors want to be comprehensive, but I am wondering if it is wise to include such detailed taxonomies in the ontology, as described on page 14, lines 36-43. In my experience, this is usually where ontologies go wrong: over-specifying bits that could easily be extended through later reuse, which makes the ontology much less reusable, since there will always be some little bit in the taxonomy to disagree with.
Page 15, line 41, do you mean disjointness?
Page 16, lines 4-6 - this is VERY specific. I am sure that lots of users of the ontology would rather set these limits themselves. This seems like overspecification.
Section 3.3.4 discusses the way the ontology is generated from code. Here an example would be suitable, since this is a novel way to manage the process. Additionally, I would expect some discussion of and comparison to similar approaches. For instance, I wonder why OTTR templates were not considered for this task, since they seem to serve a similar function? This should be discussed.
It is not entirely clear what the numbers in Figure 4 are supposed to mean. Do they give an order of the steps?
Sections 3.4 and 3.5 would benefit from a running example that can be followed throughout the sections.
Section 3.4.6: Why are the types of entities only restricted in the filtering step, and not from the beginning, when spotting and selecting candidates etc.?
How scalable is the data transformation process, in terms of time, space, etc.?
It is a bit unclear in section 3.6.1, whether this is just a description of the process, or if this is data that is actually generated and stored, i.e. does Figure 7 only describe a workflow, or does it describe the data collected and stored about the workflow?
Regarding the evaluation section, it is not very clear from the beginning whether you aim for mere validation of the solution, e.g. that it fulfils its requirements, or an evaluation, i.e. studying “how good is it” with respect to some quality aspects.
A more detailed discussion of what test data was used should be included in section 4. Was the test data different from the data used to develop the ontology? Otherwise, the ontology could be biased towards that specific data set.
Top of page 33: I am not sure I agree that you can draw all those conclusions simply based on a few measures. For instance, cognitive ergonomics seems to be a complex aspect, yet only three simple metrics are used to assess it. Here I would have expected an additional user study or similar, to be able to really state that it has good cognitive ergonomics. This should not be interpreted as meaning that I think the evaluation is bad; in fact, it is much better than what I usually see in most papers, but the authors sometimes draw somewhat too strong conclusions from it. The conclusions should be expressed a bit more modestly, and limitations should be discussed.
The future work on page 34 could be extended by describing more concretely how the ontology could be extended, as well as reused. Currently, the section is a bit vague.