Review Comment:
The paper presents a new approach to fine-grained news classification, underpinned by an ontology of news topics.
I agree with the authors that "topic modelling" is too non-deterministic in its outputs, and that those outputs are often difficult to interpret.
Instead, a predefined set of topics can be fixed, to which news articles are then associated. Here, the choice of topics is significant, as it constrains the subsequent annotation of the documents, and it is particularly difficult to come up with an annotation schema that adequately covers broad use cases. In this respect, my first consideration was how to justify any other classification scheme as better than the IPTC NewsCodes. I am also not sure I agree that a vocabulary of 1,300 concepts is "coarse grained". However, the authors go on to discuss concept detection within news content, which is indeed a finer form of annotation than classification. Referring to the work as "news classification" may therefore be misleading, as that term is IMHO used for the allocation of data items to classes, and in this regard IPTC already offers a broad classification schema. Since the authors move on in Chapter 3 to discuss annotation of news articles with "entities, events, situations..", it seems to me their goal is rather a structured, semantic description of news content and not just a classification (the aspect closest to classification being the 'categorical topics'). This should be made clear from the beginning of the paper, i.e. that classification per se (e.g. IPTC) is not expressive enough.
Regarding events, connections between events may be even more significant than collections of events: for example, the spatio-temporal relations between a number of subevents make up a larger event. This seems similar to your "dependent events", except that you restrict the relations to causality? Since your notion of situations also appears to depend on causality (the event's occurrence has created these situations), I feel the distinction between the two needs to be made clearer.
Having defined the various aspects of a news classification framework, the paper turns to their formal modelling. This appears to be a regular triple-based knowledge model. I would have preferred to understand the ontology first (the set of classes and relations fundamental to the framework), as in Chapter 4 a relation like "hasBusinessConnection" simply appears in an example without clarifying whether it is part of the framework's vocabulary. A graph illustration of the core classes and relations of your ontology, given at the beginning, would help greatly (another example of this is in 4.5.1, where the relation "isEffectiveAgainst" is used but does not seem to be a core part of the class 'Claim').
Most of the presented modelling is not new or innovative; indeed, existing ontologies are (rightfully) reused where they already model the desired aspect. As such, I am not sure that so much space is needed for Chapter 4. Chapter 5 then largely focuses on the representation of this framework in OWL, and the evaluation focuses on logical and structural consistency.
In my view, the value of developing this framework (and its representation in RDF/OWL) needs to be shown in its usage: the creation of semantic descriptions of news items, how this compares to the state of the art, and how those descriptions (conforming to the ontology) evaluate as "better" than existing classifications (e.g. more expressive). In Chapter 6, how many annotators manually classified the 224 news articles? Was inter-annotator agreement tested? I would shorten the previous chapters and provide a much more comprehensive insight into the evaluation results. The results of this manual annotation are presented, but I miss a comparative reference: how can I know this is better or worse than using some other framework? For example, Chapter 7 begins by presenting the work as addressing a "comprehensive model of what types of concepts provide the main subject matter for news items", so why not compare this framework with other works that capture the meaning of news articles, in terms of coverage and completeness? Of course, I appreciate we may be lacking a "ground truth" here of what would constitute the most desirable model.
This would, however, offer a more holistic view of the work being presented (the "classification framework") and how it compares. Chapter 7 still considers the contributions separately, by aspect. A table at the end would help highlight in which aspects your work is innovative, and which other parts clearly reuse the state of the art.
Section 7.4 truthfully states "our model follows standard practice...", and this is for me the major problem with this as a journal paper: highlighting the actual contribution of the work that goes beyond the state of the art. The "limitations of the current solutions" (Chapter 8) are not adequately highlighted, and even less so how your work addresses them. This may be an issue of structuring the paper better: spending less time on the formalization/ontology modelling and more on evaluating and discussing results (a news article described using this framework: how does it compare to state-of-the-art models, and how is it "better"?). You mention "the needs of media scholars and practitioners", but have these needs been formalized in any way, e.g. through a focus group or expert interviews? That could act as a baseline for evaluation.
In conclusion, you also write that this is "the first step", and I feel it is indeed a good step forward, but possibly not far enough for a journal publication. More empirical evaluation is needed to justify the ontological decisions made.