Review Comment:
The article presents a survey of Information Extraction systems in the context of the Semantic Web. In particular, it discusses the extraction and linking of entities, concepts, and relations from unstructured and semi-structured resources.
Although it describes most of the recent techniques in the field -- with details of features, background resources, etc. -- there are still important points/discussions missing. In particular, I would love to see a deeper analysis of the design differences between these systems (or some prominent systems) in all main Sections. For example, a suggestion for Section 2 could be to point out that some systems (e.g., AIDA, J-NERD) focus on output quality while others (e.g., AIDA-light, TagMe) focus on speed, or that some systems (e.g., the AIDA variants) only work on named entities while others (e.g., DBpedia Spotlight and TagMe) also cover Wikipedia concepts. There are some discussions here and there in the article, but this should be presented explicitly.
What are the pros and cons of using a particular system? What are the pros and cons of a particular architecture (e.g., in Section 4, page 43, EEL followed by open IE vs. open IE followed by EEL)? Are there ways to tune the trade-off between precision and recall? Such discussion would be useful for readers who want to choose an off-the-shelf tool for further work.
Additionally, as the article touches on many tasks, it would be great if the authors could discuss which tasks are relatively well studied and which are still promising for new research. Are there open problems that still need to be addressed? This would be useful for researchers or PhD students who want to work in this field.
All in all, the manuscript provides a comprehensive survey of an important field (i.e., Information Extraction) that is highly relevant to the Semantic Web community. In general, it is well written and easy to follow; however, it can still be improved.
-----
Minor comments:
1. Descriptions of systems should be checked more carefully. For example: 1.1) The main difference between AIDA and KORE is the semantic relatedness computation between entities; even though this point is discussed on page 19, I see it neither in the main descriptions of the two systems nor in Table 1. 1.2) AIDA-light uses Stanford NER to spot mentions, and only takes sliding windows over the text to extract the context of a mention (page 13); etc.
2. The title of Section 3 (i.e., Concept Extraction & Linking) may be misleading; e.g., at first I thought this section was only about Word Sense Disambiguation.
3. Inconsistent writing styles (e.g., the marry relation on page 2 vs. the meet relation on page 45).
4. Typo on page 10: "...consider Wikipedia as a KB ... as a reference KB".
5. It would be helpful to report some state-of-the-art results on prominent corpora.
6. Some citations are needed (e.g., on page 40: "Some systems rely on traditional RE processes, where extracted relations are linked to a KB after extraction...").
etc.