Review Comment:
Overall evaluation
Select your choice from the options below and write its number below.
== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject
0
Reviewer's confidence
Select your choice from the options below and write its number below.
== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)
5
Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
4
Novelty
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
2
Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
3
Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present
2
Clarity and presentation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
3
Review
This paper presents ongoing research on a system for event extraction from news articles.
The vision and results are at an intermediate stage within the related EU project NewsReader. Useful components appear to be under development, and examples of how news stakeholders can use them are provided.
My impression, despite the generally sound approach taken, is that the work is still incomplete, both in its foundations and in its evaluation. For this reason I deem it barely appropriate for a conference like EKAW, and definitely not yet ready for SWJ.
For the first point (foundations), I have two issues:
a) the authors do not significantly address the state of the art, missing basic work on event extraction techniques in NLP as well as in the SW (see below for a couple of references from the DERIVE workshop series), and especially missing specific work on EE from news articles (see below for some examples).
Some important references in EE from news:
http://piskorski.waw.pl/papers/p749.pdf (from JRC)
http://www.rn.inf.tu-dresden.de/uploads/Studentische_Arbeiten/Diplomarbe... (from Dresden)
Some relevant DERIVE work:
http://ceur-ws.org/Vol-779/derive2011_submission_1.pdf
http://ceur-ws.org/Vol-1123/paper3.pdf
b) the theoretical approach is not deep enough in terms of linking the knowledge extracted to the knowledge represented. In particular:
- the GAF and NAF annotation frameworks are essentially vocabularies for linking text and data: why not use existing ones, e.g. Lemon/OntoLex, NIF, EARMARK, etc.?
- the pipeline is reasonable enough, and I very much like that the modules are available on GitHub; however, the paper gives no details about the nature of the modules: how were they chosen? do they perform better than existing ones? how are they integrated, beyond the generic Storm implementation? I am also puzzled by the time some of the components take to process one document on average. How big is an average document? If it is a typical news article, the time taken by the NER (12.9s) and the SRL (39.8s) is huge. How scalable can such a system be for large-scale, real-time news processing?
- finally, the demo at http://ixa2.si.ehu.es/nrdemo/demo.php is a nice showcase of the modules' capabilities, but with respect to the state of the art, I have the impression that the system does little more than annotate text spans with NLP results. For example, in existing approaches such as LODifier and FRED (implemented and available at http://wit.istc.cnr.it/stlab-tools/fred), the annotations are organized in a connected RDF graph where each useful element extracted by NLP is given a semantics following the practices of LOD and the SW. In the case of NewsReader this seems completely missing. Not a big problem per se, but a paper for EKAW or a SW journal should probably be more aware of the issue.
For the second point (evaluation), the authors propose two studies, on an automotive corpus and on WikiNews. They are nice, but in what sense are they an evaluation? Certainly a lot of triples have been produced, but how can one assess that those triples constitute a good resource? The paper contains a couple of examples showing links between persons or the localization of facts reported in the news, but these are anecdotes rather than assessments. In practice, one would want to create a sample of the expected output of extraction and linking, and evaluate how the components perform with respect to it.