Robust Named Entity Disambiguation with Random Walks

Tracking #: 1511-2723

Zhaochen Guo
Denilson Barbosa

Responsible editor: Harith Alani

Submission type: Full Paper
Abstract. Named Entity Disambiguation is the task of assigning entities from a Knowledge Base to mentions of such entities in a textual document. This article presents two novel approaches guided by a natural notion of semantic similarity for the collective disambiguation of all entities mentioned in a document at the same time. We adopt a unified semantic representation for entities and documents—the probability distribution obtained from a random walk on a subgraph of the knowledge base—as well as lexical and statistical features commonly used for this task. The first approach is an iterative and greedy approximation, while the second is based on learning-to-rank. Our experimental evaluation demonstrates that our approach is robust and competitive on well-known existing benchmarks. We justify the need for new benchmarks, and show that both our methods outperform the previous state-of-the-art by a wide margin.
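The semantic representation described in the abstract is the visit distribution of a random walk with restart over a knowledge-base subgraph. As a rough illustration of that kind of computation (the graph, restart probability, and convergence threshold below are illustrative assumptions, not the paper's actual settings):

```python
def random_walk_with_restart(adj, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """Stationary visit distribution of a walker that, at each step,
    restarts at `seed` with probability `alpha` or otherwise moves to a
    uniformly chosen neighbour. `adj` is a 0/1 adjacency matrix."""
    n = len(adj)
    p = [1.0 / n] * n                                   # initial distribution
    restart = [1.0 if i == seed else 0.0 for i in range(n)]
    out_deg = [sum(row) for row in adj]
    for _ in range(max_iter):
        nxt = [alpha * restart[i] for i in range(n)]    # restart mass
        for u in range(n):
            if out_deg[u] == 0:                         # dangling node: restart
                for i in range(n):
                    nxt[i] += (1 - alpha) * p[u] * restart[i]
            else:
                for v in range(n):
                    if adj[u][v]:
                        nxt[v] += (1 - alpha) * p[u] / out_deg[u]
        if sum(abs(nxt[i] - p[i]) for i in range(n)) < tol:
            return nxt
        p = nxt
    return p

# Toy knowledge-base subgraph: a triangle of three fully connected entities.
graph = [[0, 1, 1],
         [1, 0, 1],
         [1, 1, 0]]
sig = random_walk_with_restart(graph, seed=0)
```

The resulting distribution `sig` concentrates probability mass around the seed node, which is what makes such signatures usable as a similarity feature between entities and documents.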
Solicited Reviews:
Review #1
By Ricardo Usbeck submitted on 12/Dec/2016
Review Comment:

The authors successfully addressed the comments of all reviewers and extended the article significantly. Although I found some issues (described below), I suggest that this piece of work be published, as it advances the state-of-the-art in NED.
The chosen examples are good and very self-explanatory!

* I would rephrase "outperform the previous SOTA..." as "is able to outperform ..."
* Please check the use of the present and past tense in the introduction
* "while preserving the semantics of the NER problem" => Isn't it a NED problem?
* Section 2.2. I am missing an explanation for the random walks with restart that you do, i.e., the math behind it. Please add a paragraph about it.
* Is graph G the disambiguation graph or the full graph? Overall, it is not clear how the disambiguation graph is constructed (BFS with depth 1?)
* Please add to the computational cost the big-O notation of your problem. I assume it is somewhere in O(n^3)
* P5 "commonly available" please define that more precisely
* Please add a description of how you constructed your dictionaries for the candidate generation and which indexing technique you used
* How does a ten-fold cross-validation with 1/5 training size work without overlaps? In my opinion, a ten-fold cross-validation needs to be done without overlaps in the training sets.
* Where are your code, models, and webservice? Why isn't it open source? Basically, this leads to non-comparable evaluations in the future (e.g., if GERBIL gets updated)
* Please describe why you didn't use DoSER (Zwicklbauer et al.) or the IITB dataset.

Minor Issues:
* P1 Missing citation for the claim that manual annotation is difficult (sake of completeness)
* P2 "in the second stage: mention disambiguation.\nThe second stage, mention disambiguation" weird formulation
* P3 Figure 1 is mentioned on page 5 for the first time; please mention it earlier, in Section 1 or 2
* P4 "This is accomplished by appropriately..." There is a word missing
* P4 Footnote 1 has a space too much
* P5 Algorithm 1 and 2 should be exchanged for the sake of the reading flow
* P6 In Algorithm 2 “ZKL” and “phi x psi” is not defined
* P7 cf. the naming of L2R.WNED and L2R.CoNLL is not concise
* Do not use shortened URLs but rather full URLs, in case the proprietary service shuts down
* P11 Watch out for overflowing margins in column 1
* P15 "We described a method" Didn't you describe two methods?
* How exactly do you define "unpopular entities"?
* Table style in the appendix is different from the other tables

Review #2
By Mena B. Habib submitted on 14/Dec/2016
Review Comment:

I am satisfied with the improvements made in this version.
I just suggest moving the appendix, which shows the evaluations on GERBIL, to the experimental results section of the paper.

Review #3
By Ziqi Zhang submitted on 22/Dec/2016
Review Comment:

The authors have made a good effort to improve the paper, which I think now is more compact, focused, and overall reads better. I think it is ready to be accepted. Though I still have a few questions that I hope the authors will clarify.

1. section 2.1: when viewing the KB as a graph, do you use a constrained set of relationships to form edges? Using DBpedia as example, did you use all object-object relations? For reproducibility this should be clarified.

2. page 7, left column, the 'training data' paragraph: so effectively your training data contains a set of pairs where e is the desired entity for the mention and e' is any entity such that e \neq e'. How many training pairs are used for each mention? Is e paired with every negative candidate e'? Or is it more selective? Please clarify and explain.

3. Table 2: highlight the highest figures

4. w.r.t. your first claimed contribution, you added a paragraph on page 4 that says '... we simplified the objective function by changing the .... to remove one of the two parameters...' I still do not understand which parameters are being talked about. Please clarify, cross-referencing specific sections/equations in your previous paper if necessary. Also, I think you should discuss the empirical benefits (this may become clear once you clarify the simplification due to the parameters): compared to your CIKM paper the results are comparable, and there is no noticeable gain from this change in terms of effectiveness. So does the change improve efficiency? Can you discuss it?