Robust Named Entity Disambiguation with Random Walks

Tracking #: 1438-2650

Zhaochen Guo
Denilson Barbosa

Responsible editor: 
Harith Alani

Submission type: 
Full Paper
Named Entity Disambiguation is the task of assigning entities from a Knowledge Base to mentions of such entities in a textual document. This article presents two novel approaches guided by a natural notion of semantic similarity for the collective disambiguation of all entities mentioned in a document at the same time. We adopt a unified semantic representation for entities and documents—the probability distribution obtained from a random walk on a subgraph of the knowledge base—as well as lexical and statistical features commonly used for this task. The first approach is an iterative and greedy approximation, while the second is based on learning-to-rank. Our experimental evaluation uses well-known benchmarks as well as new ones introduced here. We justify the need for these new benchmarks, and show that both our methods outperform the previous state-of-the-art by a wide margin.
Full PDF Version: 

Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
Anonymous submitted on 08/Sep/2016
Minor Revision
Review Comment:

The paper presents a new approach based on Random Walk on the problem of Named Entity Disambiguation (NED).
The paper is well written and the methods represented are clear.
However, I have the following notes:
1) The authors mentioned that the Random Walk approach is time consuming. Can you derive the time complexity of your approach? It would be also nice to compare the disambiguation time to the other approaches.
2) The paper mainly discusses a (Robust) approach for NED. Robustness is proved by using different datasets and by introducing a new datasets in which mention in the documents that vary in their difficulty level. However, still all the used benchmarks represent documents from almost the same domain which is the news articles domain. How about robustness across different domains and with a different style of writing like Tweets? There are many datasets available in series of challenges introduced by workshops like (Making Sense of Microposts (#Microposts)), and (Noisy User-generated Text (W-NUT)). It would be a great addition to the paper to show the robustness of the proposed approach on such datasets.

Review #2
By Ziqi Zhang submitted on 09/Sep/2016
Major Revision
Review Comment:

The paper introduces a named entity disambiguation/linking method based on random walks. NED/NEL is a very relevant task to the Semantic Web community and certainly a very challenging one. Despite a plethora of studies carried out in the recent years, there remains some issues to be addressed. For example, as the authors pointed out, previous benchmark datasets could be biased to 'easy' tasks, hence it is extremely valuable that the authors developed a new dataset in a controlled way and used it to evaluate a wide range of state-of-the-art methods.

Clearly this is an extension of the author's previous work in [10]. However, to publish as a journal article it is normally expected that a substantial part of the article must be new. Unfortunately at its current state, I cannot see this requirement being satisfied. Specifically, I'd like to point out that, the introduction, the related work, and the conclusion section of this article has re-used those from the original work [6] with very few changes. It is easy to see the same structure of content being used, and in many places, almost verbatim repetition. Hence I am seriously concerned that this could cause copyright issues if accepted as-is, and I strongly urge the authors to double check with the publisher of [10] and SWJ.

In any case, it is totally understandable that an extended work may want to re-use many of the content previously published. However, I would expect reasonable effort to be made to re-structure, and re-word the content. For example, you should consider using different examples for illustration (introduce); you should consider moving the discussion of limitations of existing work at such a detailed level in section 1.1 into related work, instead of just changing section title as you have done so far; you should do more detailed related work discussion, using e.g., tables/figures to compare and contrast. It is crucial that you must reduce the level of verbatim (or nearly) copying. This is the first issue.

Second, it is not clear to me what do you mean 'revised and simplified our optimization goal'. It seems to me that this is merely changing equations 3 and 4 from a linear model (sum) to a non-linear (multiplication). But there is no discussion on why and how this benefits your method. If this is one of the major changes in the newer work, you should give it a larger weight in the paper. Also, as a general rule, whenever you introduce changes to previous method, you should always always explain clearly the changes in the paper.

Third and in relation to the second, I think you should rebalance the weight given to different parts of the methodology section, or enrich it. In almost 5 pages of methodology nearly 4 pages are the same as [10]. You should focus more on the new changes (i.e., the two points you mentioned in the summary of changes)

Fourth, many parts of the paper need further clarification. See below:
- I notice difference of notations being used in the equations from [10] and this paper. However, in almost all cases, the equations are essentially identical. If so, what is the motivation to change them? Isn't it clearer and easier for readers to follow if you use consistently the same notations?
- The graph construction process is not as clear as it was described in [10]
- non candidate entities are pruned if they have a degree below 200: if this means the entity must be linked with at least 200 other entities then this sounds surprisingly high. Can you give readers some context, what is the average, max, min of the degree of these non candidate entities?
- In table 1, which of the WNED, L2R-CoNLL, L2R-SELF is the same as the system proposed in [10]? I suppose the answer is none, because you changed the equations 3 and 4, and hence the slightly different results from [10]. However this is not explicitly described at all.
- When you say you use a more recent and cleaner Wikipedia you should say exactly what version and provide a link
- it is not clear enough how the new benchmark datasets are created. How do you sample from ClueWeb and Wikipedia? What is the FACCI annotated ClueWeb dataset, is there a link?

Finally, it would make a lot of sense if the authors make a remark on how your method can be generalised to other KBs. The method currently is very tailored to Wikipedia. Will it work for, e.g., DBpedia and Linked Data in general, and if so how?

Review #3
By Ricardo Usbeck submitted on 14/Sep/2016
Minor Revision
Review Comment:

The paper extends a NED algorithm presented at CIKM 2014 by three aspects, namely a simpler optimization goal, a tuning for a learning to rank algorithm and a deeper experimental value. However, the paper is a bit outdated (see citations) and does not make use of the data infrastructure for repeatable and understandable science which has been created since 2014. I recommend updating the paper as it has a significant contribution and is well-written. Furthermore, I would like to see source code or a demo or a web-service as it is a nice contribution but will only be used by the community if it is somewhere out there for the sake of science. A last point, the paper is only slightly related to the topics of this journal, especially with respect to Linked Data and Semantic Web.

Positive points
Figure 1 is an excellent example to introduce the problem and give a first idea of the solution.
Section 4 is well-written and easy to understand.
Datasets used are available at (Licencing?) However, I doubt that hosting such datasets in dropbox is of value for the community. Please consider hosting it at datahub or the like.
Another really interesting point is the discussion of the used datasets and the performance analysis in table 2.
Section 5.5 is well-written and gives a good overview of why most NED approaches fail.

Major issues
No evaluation done via GERBIL. GERBIL is an community-accepted online platform for archiving and comparing experiments using semantic annotation tools, see Your experiments should be easily repeatable and extended to a wide range of tools and datasets. For example, compare yourself against
Your related work seems to be considerably outdated. Especially, you have only 1 work of 2014 and only 2 of 2013. Within the last two years and especially since the uprise of online benchmark frameworks like BAT, NERD, and GERBIL, there have been a dozen new approaches not covered here.
How did you account for overfitting your L2R algorithm using only 3 datasets without splits? Especially, how large is the fraction in L2R-Self if for example datasets have only 20 documents?
How exactly do you tackle outdated URIs and Wikipedia IDs in your datasets? This has a great influence on how systems perform. Especially, if the competing systems were trained on other version of the underlying KB.
Section 1.1.: Following your definition of NED, any approach is unable to identify if a mention does not (yet) exist in an knowledge or the mention is an entity at all. Please clarify w.r.t. emerging entities and rejection of entity mentions. Section 2.1 You introduce NIL, but is it used for non-existing entities or out-of-KB entities?
Why do you use Boosted Regression Trees? How do other models perform?
Why do you introduce a Learning to Rank Algorithm? Please motivate in the beginning of Section 4.2

Minor issues
Please list exactly the classes of entities your tool is able to link/disambiguate.
Please define what a KB is exactly.
Please define what a mention is exactly.
P2, left column, missing citation of local vs. global disambiguation methods (Cucerzan et al. 2007)
P3, Contribution 3, what does ‘meaningful’ mean?
P4, an NER process -> a NER process
P4, what should |M| stand for?
P4, can you please give an intuition what “normalized” means?
P4, please clarify G w.r.t. neighbour nodes. How does the depth of the surrounding influence your algorithm. See further ‘AGDISTIS - Graph-Based Disambiguation of Named Entities Using Linked Data’ by Ricardo Usbeck, Axel-Cyrille Ngonga Ngomo, Michael Röder, Daniel Gerber, Sandro Athaide Coelho, Sören Auer, and Andreas Both in The Semantic Web -- ISWC 2014
P5, In iteration i, we assign to mention m_i => shouldn’t that be mention m_j because m_i would mean that you can only have number of operation many mentions?
P6, ...disambiguation pages.To keep => ...disambiguation pages._To keep
P6, please clarify that you mean node degree in wikilinks if you speak about degree
P6, xperimental => experimental
P8, the threshold for NIL assignments should be mentioned earlier since this is an important plus point for your algorithm
P9, GLOW [28]: It seems to me that there is missing a word in the description of the approach

Post scriptum:
Using only accuracy to report performance gains does not capture the full reality since only f-measure also captures how often your algorithm overperforms, see
Please do not hesitate to contact me directly to discuss.