Review Comment:
This manuscript provides a survey on the topics of claims, fact-checking, and argumentation, proposes a conceptual model on these topics, and then reviews existing information extraction and knowledge engineering tasks from the point of view of the introduced model. The paper is well written, easy to read, and generally well structured.
The topic is very interesting and relevant to the journal. In particular, the fact that it brings together the approaches and viewpoints of natural language processing and formal modeling is very valuable.
The literature survey is well done and seems to have good coverage.
The model as introduced makes sense and seems to succeed in bringing the variety of existing works together. I have a number of more minor comments on the model below, but the overall structure is convincing.
The final discussion and review of existing information extraction and knowledge engineering tasks is also interesting and valuable, but could in my view be better structured, in particular to emphasize the different connections to the conceptual model. These connections are indicated, but it is difficult to get a general overview of how the full set of existing tasks maps to the concepts of the model. It would, for example, be useful to have a version of Figure 5 that shows where in this conceptual space the different tasks are located. I expect that they would cover the diagram to a large extent, which would be very nice to actually see.
Apart from this, I would like to raise the following important points:
- This seems to be an extended version of a previous paper by the authors [25]. In my opinion, the extensions and the differences relative to this earlier paper need to be described more explicitly and in more detail.
- I am missing a URL to a machine-readable version of the Open Claims model.
Below I provide a list of more minor issues. Overall, however, I judge the quality and value of this manuscript as very high and expect that a version that is revised accordingly should be accepted for publication.
Minor comments:
- Sometimes acronyms are introduced but never used afterwards, e.g. "pay-level domains (PLDs)".
- Generally, I would not introduce acronyms for short phrases like "argumentation mining (AM)" but use the full phrase throughout. Papers are often not read linearly, and then such acronyms can be confusing.
- "research focused on natural language claims": I think this research branch should be better labeled/described.
- lines 39-41 of page 1 column 1, and surroundings: the difference between the terms "utterance", "statement", and "sentence" is not clear here.
- "fact-checking sites [23, 24]": unclear how this relates to the main claim of the sentence about strongly diverging models.
- "This work is meant to facilitate an unambiguous representation of claims across various communities": I feel that "unambiguous" is too strong a word in this context, as the representation of the claims is only unambiguous at a relatively shallow, syntactic level.
- "[...] comes to show that these works do not fully contribute to closing the terminological and conceptual gap that exists in and across fields": I find this only partially convincing. Good quality and coverage of a survey/model doesn't necessarily imply good uptake. From this argument, moreover, it's not clear what should make us confident that the presented survey won't follow the same fate. A more convincing argument, in my view, would be to say that all these existing surveys looked at claims/facts in a more narrow sense or more narrow domain than the overarching model/survey presented here.
- the quoted introductory definition of evidence as "text, e.g. web-pages and documents [...]" doesn't seem very helpful. In fact, "evidence is a kind of text" seems conceptually wrong. Similarly later on with "Stances are usually defined as text fragments [...]" (though it is clarified somewhat later in the same paragraph).
- "Closely related to this is the notion of a rumour.": There is a bit of a sharp transition here, as the previous sentences talk about scientific claims and evidence, and the term "rumour" doesn't seem to be closely related in this domain.
- Maybe the two short subsections 3.2.5 and 3.2.6 could be merged.
- I find it a bit confusing that in section 3.3 entitled "Summary" new references are introduced (e.g. "[121]"). I think "Summary" is not a good title here.
- "In the case of a fact extracted from a knowledge base, the speaker equals the knowledge base reporting the fact": I feel a bit uneasy about equating the knowledge base with the speaker role. What about a knowledge base that stores provenance information about the stored facts/claims, including who said it? Who would in this case be the "speaker"?
- "In contrast to a claim, it is not necessarily embedded in a discourse": unclear what "it" refers to here.
- "A representation can have the form of freetext, e.g., a sentence that best describes the proposition": Wouldn't in that case the representation also be an utterance?
- "one or more reviews, and iii) one or more attitudes": shouldn't that be "*zero* or more ..."?
- "attitude is an opinion on a given topic (e.g., a viewpoint)": Does that imply that "viewpoint" would be a subclass of "attitude"? And what about the notion of a "stance" as introduced earlier? Would that also be a kind of attitude?
- I find "Annotation" to be a confusing class name, as many of these things can be seen as annotations. I think something like "Linguistic Feature" would be more appropriate.
- "what was said (linguistic representation of claim utterance)": shouldn't the "what" also include the content of what was said, so the claim proposition?
- Maybe "Time" or "Date/Time" would be a more appropriate class name instead of "Date".
- "Author" seems to indicate that something was written down. Does the model also cover spoken utterances? I think there is no reason not to, in which case the name "Author" seems confusing.
- The relations described in section 4.2.4 don't seem to be depicted in Figure 5, but I think that would be helpful.
- Can the formal representation of a claim proposition also point to a set of RDF statements, for example by the use of a named graph?
- Section 4.3: You don't mention OWL here. Did you not use OWL for basic ontological restrictions, e.g. domain/range for relations? I think that would be useful.
- I was wondering whether the three relation types described in section 5.3.1 are part of the model definition. Are they part of the model introduced earlier? If not, wouldn't it make sense to include them?
- Typos: "Ressource", "et al.."