Review Comment:
The paper is a survey on entity linking (EL) works that have Wikidata as their target knowledge graph (KG). Entity linking is defined as the task of finding the KG entities that correspond to a given set of extracted entity mentions (aka surface forms) in a given natural language utterance. In that sense, Entity Recognition (ER), i.e., the process of extracting entity mentions from utterances, is assumed to have already been performed, although some EL works perform ER first and then EL. The authors clearly state the coverage/scope of this work (i.e., the research questions that are investigated) and how they decided to include/exclude specific papers (e.g., all EL papers not targeting Wikidata explicitly are excluded). After providing the scope, the authors define the overall problem, then provide some background information on Wikidata (e.g., what is considered a statement, a property, or a reference in Wikidata) and what makes it special compared to other KGs, and then discuss existing EL approaches and EL benchmark datasets. The paper ends with a high-level discussion of the pros and cons of existing approaches and datasets, as well as some specific areas of improvement for future works to consider.
Overall, I found the paper very interesting, useful, and informative for someone who wants to quickly gain a broad overview of the area and learn about existing works, approaches, and datasets. I have spotted some weak points, though, which I believe will make the paper stronger if properly addressed. You can find them at the end of my review.
(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
The level of detail (not too detailed, but also not too abstract), the coverage, and the high-level discussion in this paper are great for a survey. I would not change much there, but I would include generic EL works (i.e., those working with Wikidata, but not targeting Wikidata explicitly).
(2) How comprehensive and how balanced is the presentation and coverage.
As in my previous comment, I found the paper very well balanced. If space allows, I would add a few more words about each of the datasets covered (see detailed comments).
(3) Readability and clarity of the presentation.
Overall, this was a pleasure to read, the structure was good, and there were very few issues, which can be easily resolved (see detailed comments and typos).
(4) Importance of the covered material to the broader Semantic Web community.
Even though this survey focuses only on EL works that target Wikidata, due to the central role that Wikidata has played in recent years, this work is of great importance for the whole Semantic Web community.
Detailed comments:
Before going to the weaker aspects of this work, let me first commend the authors for their writing, the inclusion of high-level comments and remarks on existing works (Sections 5.2, 6.2), and the suggestion of future improvements (Section 8).
I found two major issues (MIs) and some smaller issues (SIs) in the paper.
Major issues:
MI-1: Table 8 is not only uninformative, but also confusing, and should be removed. Even if the seven (!) footnotes shed some light on this issue, in the end we have a table that is supposed to compare the works, but in which F1 scores are compared to accuracy scores and recall scores (whichever is available per paper), on different datasets. I do not see a good reason for keeping this table and the accompanying discussion. If the authors wanted to experimentally compare the works (which is of course not mandatory), they should have run (new) experiments on a fair basis (using the same evaluation measures and the same parts of the datasets). Table 9 may remain, as long as the numbers reported are for the same measure (which could also be stated in the table caption for clarity); missing values are not ideal, but acceptable.
MI-2: I can understand excluding works that target a different KG than Wikidata. What I do not understand, and am not convinced by the existing justification, is why the authors have excluded works that are generic enough to cover not only Wikidata but also other KGs. I do not understand why the authors consider generalization beyond Wikidata to be a bad thing. I am not saying that targeting only Wikidata is bad, but the opposite is not bad either. What if Wikidata, similarly to DBpedia and Freebase in the past, stops existing or is overtaken by a newer, more popular KG in the (near) future? In that case, all tools working only with Wikidata would be lost, while works going beyond Wikidata would still be valid.
With that in mind, and also considering MI-1, I would ask the authors to include in their overview the works that go beyond Wikidata (without needing to report experimental results for them).
Smaller Issues:
SI-1: Section 2.1: What if the paper title contained the words "Linking (...) Entities" (instead of "Entity Linking"), either consecutively or with other words in between? Are those works excluded? For example, one such paper (not targeting Wikidata, though, and older than 2017) is "A declarative framework for linking entities" by Burdick, Fagin, Kolaitis, Popa and Tan (TODS 2016). The same question applies to "Disambiguating (...) Entities", and to "benchmark" or "data" for the dataset search.
SI-2: Formalisms in Section 3:
SI-2a: You may want to use a different subscript than n in m_n, to avoid confusion with the number of words in an utterance, which may be different.
SI-2b: Why do the rank functions have the real numbers as their range? Doesn't a rank function always return a natural number?
SI-2c: Related to SI-2b perhaps, why do you want to maximize (argmax) rather than minimize (argmin) the ranks in the objective functions? Isn't rank 1 the preferred rank? (See the sketch right after this item for what I have in mind.)
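To make SI-2b and SI-2c concrete, here is a rough sketch in my own notation (the symbols below are mine and may not match the paper's Section 3 exactly): if the rank function assigns each candidate entity a position starting at 1, its range should be the natural numbers and the objective should minimize it, i.e.

  rank : M \times E \to \mathbb{N}, \qquad e^*_i = \operatorname*{argmin}_{e \in E} \, rank(m_i, e).

If the authors instead want to keep the argmax, then the function is better called a score, e.g. s : M \times E \to \mathbb{R} with e^*_i = \operatorname*{argmax}_{e \in E} \, s(m_i, e).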
SI-3: Datasets like LC-QuAD are referred to in Section 5.2 before they are introduced in Section 6. While reading Section 5.2, it took me some time going back and forth to even figure out whether LC-QuAD, for example, is a method or a dataset.
SI-4: The references should be more carefully examined. For example, the last reference [137] seems to be noise, and references [65] and [66] are duplicates.
Typos/syntax/grammar issues and minor comments (in order of appearance):
- abstract: "which is Wikidata lacking" -> "which Wikidata is lacking"
- page 1, col 2: "DBpedia Live [21] exists, which is consistently updated with Wikipedia information. But (...)" -> "DBpedia Live [21] is consistently updated with Wikipedia information, but (...)"
- Table 2: "Datasets must include Wikidata identifiers from the start": this is quite understandable, but please elaborate to avoid misinterpretations.
- page 7: "specify how long" -> "specify for how long"
- page 7: "qv_i \in V. (s,r,o)" -> "qv_i \in V. The triple (s,r,o)"
- page 7: "(Ennio Morricone, nominated for, Academy Award for Best Original Score, (for work, The Hateful Eight), (statement is subject of, 88th Academy Awards))" -> "(Ennio Morricone, nominated for, Academy Award for Best Original Score, {(for work, The Hateful Eight), (statement is subject of, 88th Academy Awards)})." (add curly brackets outside the pairs and end the sentence with a period.)
- page 9: "are item labels/aliases" -> "item labels/aliases are"
- page 10: "Both, Q76 vs Q61909968" -> "Both Q76 and Q61909968 (which is a disambiguation page)"
- page 10: "However, as Wikidata is closely related to Wikipedia, an inclusion is easily doable." please elaborate briefly on the close relation and on the inclusion
- page 10 (and in multiple occurrences): "the amount of" [times/methods] -> "the number of"
- page 11: "This link probability a statistic on (...)" -> "This link probability is a statistic on (...)"
- Table 7: remove the comma after "Wikipedia" (for Boros et al)
- page 14: "As it only based on rules" -> "As it is only based on rules, "
- page 15: "an existing Information Extraction tool" -> which one?
- page 15: "Open triples are non-linked triples" -> please elaborate a bit more
- page 15: "E2E" -> "end-to-end"? please introduce the acronym before its first usage
- page 15: "Therefore, 200 candidates are found" -> If my understanding is correct, this should be "up to 200 candidates", since overlapping candidates are possibly generated by the 4 methods. If not, please clarify.
- page 17: "micro F1" I think is more commonly referred to and more easily understandable as "micro-averaged F1"
- page 17: "tp are here the amount of true positives, fp the amount of false positives and fn the amount of false negatives." -> "Here, tp is the number of (...) over a ground truth" (I guess there is a ground truth of correct links).
- page 22: "Eleven datasets were found for which Wikidata [...]" place the references after "datasets" or at the end of the sentence.
- page 26: "a limited amount of datasets were created" -> "a limited number of datasets were created"