Review Comment:
*Originality:*
This paper is an extension of the work presented at the 5th workshop on Semantics for Smart Cities. Moreover, this is a revised version of the article following a first round of reviews.
The paper describes an interesting contribution and it is definitely relevant for the special issue.
The authors introduce an approach, called Semantic Abstraction, that makes use of Linked Data features for a generalised classification of social media text (tweets).
Linked Data based features such as DBpedia categories and entity types, together with other more traditional features extracted from text, are evaluated on Twitter data collected from ten different large cities. The proposed scenario focuses on classifying incidents from tweets, and the proposed approach aims at: (i) improving the classification of datasets derived from only one city; (ii) enabling training and testing to be performed on datasets from different cities.
---
*Significance of the Results:*
According to the results of the experiments, the approach is promising and relevant, also considering its comparison with the state of the art. The evaluation is clear, sound and comprehensive. The datasets of the experiments conducted are provided by the authors for reproducibility.
The authors have taken most of the reviewers' comments into consideration and improved the paper considerably. I recommend accepting this paper for publication.
My main suggestion would be to include a short discussion on how to use different Linked Data sources (not only DBpedia but also domain specific ones) and additional Linked Data based features (not only categories and types), as suggested by one of the reviewers. The authors mention this only as future work but this is an interesting challenge and the authors' opinion on this would add value to the paper.
In this work, tweets are represented as a set of words, using unigrams and bigrams. What about trigrams and higher-order n-grams? Please add a short justification for this choice.
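To make the question concrete, here is a minimal sketch of how the number of distinct n-gram features grows with the maximum n-gram order. It is my own illustration under simple assumptions (lowercasing, whitespace tokenisation, two invented toy tweets) and is not the authors' feature extraction pipeline; the function names `ngrams` and `vocabulary_size` are hypothetical.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vocabulary_size(tweets: List[str], max_n: int) -> int:
    """Count distinct n-gram features of orders 1..max_n.

    Assumption: tokens are obtained by lowercasing and splitting on
    whitespace, which is far simpler than real tweet preprocessing.
    """
    vocab = set()
    for tweet in tweets:
        tokens = tweet.lower().split()
        for n in range(1, max_n + 1):
            vocab.update(ngrams(tokens, n))
    return len(vocab)


# Toy tweets invented for illustration; not taken from the paper's datasets.
toy_tweets = [
    "car collision on I-90 near downtown",
    "accident on I-90 blocking two lanes",
]

if __name__ == "__main__":
    for max_n in (1, 2, 3):
        print(f"max n-gram order {max_n}: "
              f"{vocabulary_size(toy_tweets, max_n)} distinct features")
```

Even on two short tweets, each additional order adds features that are rarely shared between texts, so a brief statement on whether trigrams were tried, and whether sparsity was the reason for omitting them, would suffice.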
---
*Quality of the writing:*
The paper is well written, clear and properly structured. It has been improved according to the reviewers' feedback.
A few minor remarks:
* Section 1:
In: "Semantically low ("accident" and "car collision") or more abstract level ("I-90" ....)."
it seems to be the opposite.
"The first involve training and testing our models on data" -> "The first involves training and testing our models on data"
* Section 2:
"Listing 1: Extracted DBpedia..." (not DBPedia)
** Section 2.2:
The words "proper" and "common" could be emphasized using italic font.
* Section 3:
** Section 3.2:
"explanation marks" -> "exclamation marks"
* Tables 4 & 5: add "F-Measure" to the captions of the tables or to the tables themselves.
** Section 4.3:
"we investigated two crucial questions regarding the properties the different datasets." -> "we investigated two crucial questions regarding the properties of the different datasets."
* Table 8: Align the caption as in Table 9.
* References: refs. [25] and [26] are duplicates.