Editorial Board

Editors-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Sanaz Saki Norouzi

Syndicate

Robust Named Entity Disambiguation with Random Walks

Submitted by Zhaochen Guo on 11/29/2016 - 01:42

Tracking #: 1511-2723

Authors:

Zhaochen Guo

Denilson Barbosa

Responsible editor:

Harith Alani

Submission type:

Full Paper

Abstract:

Abstract. Named Entity Disambiguation is the task of assigning entities from a Knowledge Base to mentions of such entities in a textual document. This article presents two novel approaches guided by a natural notion of semantic similarity for the collective disambiguation of all entities mentioned in a document at the same time. We adopt a unified semantic representation for entities and documents—the probability distribution obtained from a random walk on a subgraph of the knowledge base—as well as lexical and statistical features commonly used for this task. The first approach is an iterative and greedy approximation, while the second is based on learning-to-rank. Our experimental evaluation demonstrates that our approach is robust and competitive on well-known existing benchmarks. We justify the need for new benchmarks, and show that both our methods outperform the previous state-of-the-art by a wide margin.

Full PDF Version:

swj1511.pdf

Previous Version:

Robust Named Entity Disambiguation with Random Walks

Tags:

Reviewed

Decision/Status:

Solicited Reviews:

Click to Expand/Collapse

Review #1

By Ricardo Usbeck submitted on 12/Dec/2016

Suggestion:
Accept

Review Comment:

Review Comment:
The authors addressed successfully the comments of all reviewers and extended the article significantly. Although I found some issues (as described below) I suggest to get that piece of work published as it advances the state-of-the-art in NED.
The chosen examples are good and very self-explanatory!

Issues:
* I would rephrase "outperform the previous SOTA..." by "is able to outperform ..."
* Please check the use of the present and past tense in the introduction
* "while preserving the semantics of the NER problem" => Isn't it a NED problem?
* Section 2.2. I am missing an explanation for the random walks with restart that you do, i.e., the math behind it. Please add a paragraph about it.
* Is Graph G the disambiguation graph or the full graph. Overall, it is not clear how the disambiguation graph is constructed (BFS with depth 1?)
* Please add to the computational cost the big-O notation of your problem. I assume it is somewhere in O(n^3)
* P5 "commonly available" please define that more precisely
* Please add a description of how you constructed your dictionaries for the candidate generation and which indexing technique you used
* How does a ten-fold cross-validation with 1/5 training size work without overlaps. In my opinion, a ten-fold cross-validation needs to be done without overlaps in the train sets.
* Where is your code, models, webservice. Why isn't it open source? Basically, this leads to not comparable evaluations in the future (e.g. if GERBIL gets updated)
* Please describe why you didn't use DoSER (Zwicklbauer et al.) or the IITB dataset.

Minor Issues:
* P1 Missing citation for the claim that manual annotation is difficult (sake of completeness)
* P2 "in the second stage: mention disambiguation.\nThe second stage, mention disambiguation" weird formulation
* P3 Figure 1 is mentioned on page 5 for the first time, please mention it earlier in Section 1 or2
* P4 "This is accomplished by appropriately..." There is a word missing
* P4 Footnote 1 has a space too much
* P5 Algorithm 1 and 2 should be exchanged for the sake of the reading flow
* P6 In Algorithm 2 “ZKL” and “phi x psi” is not defined
* P7 cff. the naming of L2R.WNED and L2R.CoNLL is not concise
* Do not use bit.ly but rather full URLs in case proprietary bit.ly shuts down
* P11 Watch out for overflowing margins in column 1
* P15 "We described a method" Didn't you describe two methods?
* How exactly do you define "unpopular entities"?
* Table style in the appendix is different from the other tables

Review #2

By Mena B. Habib submitted on 14/Dec/2016

Suggestion:
Accept

Review Comment:

I am satisfied with the improvements made in this version.
I just suggest to move the appendix which shows evaluations on GERBIL to the experimental results section in the paper.

Review #3

By Ziqi Zhang submitted on 22/Dec/2016

Suggestion:
Accept

Review Comment:

The authors have made a good effort to improve the paper, which I think now is more compact, focused, and overall reads better. I think it is ready to be accepted. Though I still have a few questions that I hope the authors will clarify.

1. section 2.1: when viewing the KB as a graph, do you use a constrained set of relationships to form edges? Using DBpedia as example, did you use all object-object relations? For reproducibility this should be clarified.

2. page 7, left column, the 'training data' paragraph: so effectively your training data contains a set of pairs where e is the desired entity for the mention and e' is anything that e \neq e'. How many training pairs are used for each mention? Is it e paired with every negative candidate e'? Or is it more selective? Please clarify and explain.

3. Table 2: highlight the highest figures

4. w.r.t. your first claimed contribution, you added a paragraph on page 4 that says '... we simplified the objective function by changing the .... to remove one of the two parameters...' I still do not understand which parameters are we talking about? Please clarify, cross-reference to specific sections/equations in your previous paper if necessary. Also, I think you should talk about empirical benefits (but this may become clear if you clarify the simplification due to parameters) - compared to your cikm paper the results are comparable and there is no noticeable gain from this change in terms of effectiveness. So does the change improve efficiency? Can you discuss it?

Log in or register to post comments
10096 reads

Main menu

Editorial Board

Syndicate

Robust Named Entity Disambiguation with Random Walks

Tracking #: 1511-2723

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles

Search form

Main menu

Login

Editorial Board

Syndicate

Robust Named Entity Disambiguation with Random Walks

Tracking #: 1511-2723

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles