Review Comment:
This is a revised version of an earlier submission, presenting a methodology and some resources related to tourism KGs, in particular for the assessment of tourism-related entities such as places and accommodations. The paper has been considerably improved since the last version; it is now much clearer, and the contribution is clear from what is described in the paper. However, the paper still needs some work before it is publishable; see the suggestions and questions below:
There needs to be a section or paragraph on scope and delimitations early on. Currently, it is only when we get to the KG building itself, and the sources, that we learn that including restaurants, museums, etc. is part of future work.
The related work section still needs some improvement in terms of analysing the related work, not only listing it. I do agree that this work has novelty, but that novelty is not clearly defined in relation to the related work. In particular, I find section 2.1 quite confusing, and it does not specify a clear gap that the proposed work fills. First the authors say that the categories of methods are “collaborative”, “non-collaborative” and “custom”. Then the authors describe the “custom” category in terms that make it sound as if it is all about collaboration, e.g. “involvement of communities of practice” and “most of the time collaborative”. So what is the collaborative category then? However, it is the last paragraph of that section that most needs extension, since it is not clear whether the authors actually use an existing methodology or develop their own. In the latter case, why, and how is it based on existing work? Section 2.2 also needs rewriting, since it reads as a list of inaccessible or unmaintained resources, without analysing them any further and without a discussion of how this work builds on them. It is clear that the exact same resource does not exist, but the resource is not the main contribution anyway, as stated in the introduction, so the authors should instead compare methods, scope, etc., to show how these resources were still used, or why the authors could not simply resurrect or reimplement one of the older resources.
Section 3 also needs rewriting. The heading is “methodology for ontology design”, but the section is not really about that; it has a much broader scope. It describes the whole knowledge engineering methodology, i.e. one of the contributions of the paper. Only one small part of this methodology, step 3, is actually about the ontology.
I think the ontology sections are the ones that have improved the most since the last version; it is now much clearer how the ontology is structured, what it actually contains, and how things were defined, used and reused. However, some small unclear points remain. For instance, how do you know that the ontology can be easily extended, as you claim in point 8 on page 11? There are also still some unclear points in Figure 2: I am pretty sure you don’t mean that owl:equivalentClass has domain tao:LocationAmenity and range acco:AccommodationFeature? Some arrows are also used several times with the same label, which seems to indicate that the domain is not a single named class but a class expression? tao:partOf also seems to be a very generic name for something that only holds between accommodations and lodging facilities.
Figure 3 is also not entirely clear. What is the TAD base ontology? And how does it relate to the TA ontology, where I assume the latter is TAO? But then why is TAO the label of the man at the desk in the bottom left? That seems to be a user rather than an ontology.
Further, I have a bit of an issue with the formulation of the competency questions, which seem somewhat random in their level of abstraction. I know the authors mention that they sometimes use very specific things, like wi-fi, to get questions that are directly translatable to SPARQL. However, I don’t really agree with this motivation. A more general question would also be directly translatable to SPARQL, since I am sure wi-fi has superclasses or a type. I would suggest keeping the CQs abstract and “instance-free”, and then, if it is useful for testing, saying that you derive one or more detailed questions from them, each of which yields a SPARQL query that should be usable without additional inferences over the ontology (e.g. without materialising all the types of an instance based on the is-a hierarchy).
Finally, I would suggest including a discussion section, e.g. as a separate section before the conclusions, with some outlook on the possible implications of the work, how it can be used, what its limitations are, what lessons were learned, etc. Given the list of contributions stated in the introduction, really discussing all of them there would be very valuable to the reader. Similarly, the conclusions section should also refer back to the introduction and the list of contributions (not all of which are currently even mentioned in section 6).
Overall, I think all of these things can be fixed, and none of them risks invalidating the conclusions or results of the paper. I therefore do not think that another round of reviews is needed, merely a quick check of the final version before publication.
Detailed questions/comments:
The introduction section nicely lists the contributions, but would benefit from also stating some clear research questions for the work, which can then be revisited in the conclusions section.
What is the difference between contributions 2 and 4 in the list on page 2? Are these different pieces of software? Methods?
What does “formal” mean in the first paragraph of section 2.1?
I find UC3 quite unclear. The other use cases are specific tasks, such as identifying topics in text (UC1 and UC2), performing sentiment analysis to identify sentiments towards something (UC4), and classifying destinations (UC5), but UC3 just says “support the recognition and linking of tourism entities in the KG for different applications revolving in the domain of social media …”. What are the applications of the KG there, then?
In footnote 13: are you sure that a meeting room should count as accommodation according to your definition? If the definition is that broad, then cars could also be said to accommodate people (a person can be inside a car).
Is step 4 on page 14 entirely automatic? That seems very challenging. Or is a user involved in validating the mappings?
There is a difference between the PROV data model as such and PROV-O (the ontology). It is not clear which is meant in section 4.3.1. For instance, at the bottom of page 20 the authors say “In PROV we have three main classes:”, which should probably be “In PROV-O we have three main classes:”. Also in this section, it is not clear why all this provenance data is collected. Which use cases need it? What are the requirements? And how is this data used in the end?
In section 5.2.2 the authors suddenly mention a SHACL file with further constraints. If this is not actually part of the KG, then why is it created and tested here? If it is part of the KG definition and development, then it should be mentioned and described much earlier.
I am not really happy with the references in section 5.3. Ref [8] is cited as a source of metric definitions, while it seems rather to apply existing metrics. [25], in contrast, is an appropriate reference there. OntoMetrics, on the other hand, is only mentioned with links provided, but never cited; as far as I am aware, there is at least a workshop paper about it that could be cited. Still, some parts of section 5.3 are really good, namely those that actually analyse what the resulting numbers mean rather than just listing them. Nice!
Layout, language and structural issues of the paper:
- In the abstract: “semi-automatic generating” -> “semi-automatically generating”
- Introduction, first sentence: shifting -> shift
- Page 2, first contribution: automatically -> automatic, and graph -> graphs or a graph
- Figure captions are sometimes placed above the figure and sometimes below. The normal placement would be to always put them below the figure (while the opposite is often used for tables).
- The division of the text into paragraphs needs to be revised throughout the paper. Many paragraphs are just 1-2 sentences long and actually belong with other paragraphs. Normally, a paragraph should be at least 3-4 sentences long.
- The structure of subsections is not good. In several cases a section has only one subsection, which should not happen: either you have several subsections or you skip them altogether. Examples of this are 3.2, which only has a 3.2.1, and 4.2, which only has a 4.2.1, and so on.
- I may be getting old, with bad vision, but Figure 4 is really too small for me to read comfortably when I print the paper.
- Section 4.1, first paragraph: direct -> directed?
- Once you get further into the paper, you start to get confused by the different numbering of different things. There are too many bullet lists and figures with numbering. For instance, how are the numbers in Figure 1 related to the subsections of sections 3 and 4, and what about the numbered list at the top of page 17: are these sub-steps of one of the phases in Figure 1? I would suggest adopting a consistent numbering, e.g. naming the main phases in Figure 1 1-6 and then numbering all subtasks/steps 1.1, 1.2 and so on. Otherwise it is hard to follow where in Figure 1 we are at any given time.
- Sometimes you write TAO and sometimes TA Ontology (e.g. in Fig 5), probably this is the same thing, but the reader starts to doubt it after a while.
- Section 4.3, bullet 1 of the first bullet list: “at the and of”?
- Convert the query and the table on page 23 into a proper (numbered) listing and table, and refer to their numbers in the text. Also, for the text right beneath the table: use a footnote for the URL instead of having it in the text directly.
- On pages 26-27 Accommodation Ontology is suddenly called Acco.
- Appendices should be placed after the bibliography.
- The reference list needs more work; many of the references are incorrect. Just reading the first 10 references, I find at least 5 that list the publisher or the book series as if it were the book title and lack the book title entirely (see 1, 3, 4, 7, and 9), and one that instead lacks the publisher (10). I did not go through the whole list, but this should be corrected throughout.