Review Comment:
In this manuscript, the authors propose a framework for ranking semantic associations (SAs, i.e. facts about entities) for the entities appearing in a textual snippet. The premise is that, owing to users' differing interests, ranking functions need to take user interests into account so that the resulting SA ranks match those interests.
The approach uses heuristics to capture user interests and additionally trains a separate learning-to-rank (L2R) model for each user. Training instances are collected in an active learning setting with an early stopping criterion, so that the number of labeled instances is kept minimal.
Providing contextual information while reading news articles or other textual snippets is an important topic. Finding the right facts or triples that help the user better understand the topic at hand is crucial. The challenges often relate to the coverage of facts in existing KBs, to identifying facts that take into account the entities co-occurring in a textual snippet, etc.
Hence, I think the problem in itself is interesting, and it is worth pursuing techniques that aid users in understanding content in which they are not proficient.
There are several issues with this manuscript, which I list and explain in detail below.
1) First of all, the novelty of this manuscript is very limited. Even though there are changes with respect to the previously published article (DaCENA), that article is a demo and its scientific contribution is very limited. The heuristics that are supposed to capture user interests include no user features, nor features that could exploit similarities between users (as is done in recommender systems) for training L2R models that are sensitive to users' interests.
2) Training L2R models for individual users is highly ineffective, and in addition you forgo a lot of information that could be leveraged from other users who may share similar interests. I would strongly advise having a look at the recommender systems literature. For instance, [1,2] (to name a few) propose personalized recommender systems for the use case of online news. Although the task there is the recommendation of online news, it can easily be adapted to recommending SAs.
3) The SAs proposed to a user are obtained through the DaCENA approach, which ranks the SAs by a serendipity measure. How can this be a useful measure for a user who has no idea about the topic of the news article she is reading? Furthermore, it is hard for me to understand how a TF-IDF measure can quantify the relevance between a fact (triple) and a news article. This is probably the wrong measure: if the IDF is the same for every term, one is effectively comparing plain TF word vectors, and the similarities will ultimately be determined by the distribution of stop words or other frequent words that may have nothing to do with the topic.
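To make this concern concrete, here is a minimal Python sketch (my own illustration, not taken from the manuscript; I assume a standard TF-IDF/cosine setup as in scikit-learn, fit on just the pair of texts being compared). Under this setup every term shared by the two texts receives the identical smoothed IDF, and terms occurring in only one text contribute nothing to the dot product, so the score degenerates to a TF comparison dominated by function words:

    # Illustrative only: with a two-document fit, all shared terms get the same
    # IDF, so cosine similarity is decided by raw term frequencies, i.e. largely
    # by stop words such as "the".
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    article = "the president said the meeting would be held in the capital"
    triple = "the capital is the city where the meeting of the council is held"

    vec = TfidfVectorizer()  # defaults: smoothed IDF, no stop-word removal
    X = vec.fit_transform([article, triple])
    print(cosine_similarity(X[0], X[1]))  # sizeable score, driven by function words

    vec_sw = TfidfVectorizer(stop_words="english")
    X_sw = vec_sw.fit_transform([article, triple])
    print(cosine_similarity(X_sw[0], X_sw[1]))  # drops once stop words are removed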
4) How can you justify that a user wants to read 50 to 100 SAs for an article? This seems like an arbitrary number, and I find it hard to believe that any user would be interested in reading an additional 100 facts for a single news article. What happens when the knowledge base has insufficient coverage of facts for an article? What is the fallback mechanism in such scenarios?
5) Another claim I find unjustified is that the SAs are provided for some subject entity, which is supposed to be the salient entity of the news article. Based on your previously published approach, you state that this is simply the most frequent entity. Previous research has shown that predicting the salient entity of a news article is not trivial [3,4], especially in the case of the SAMU dataset, where only a single paragraph is available and most entities will likely appear at most once.
6) Another unjustified evaluation setting is that the SAMU and LAFU datasets use different Likert scales. What is the rationale for a 6-point scale in the first and a 3-point scale in the second? These decisions are neither justified nor explained in the manuscript.
7) Another issue I have regards the expectation that, for each news article or text snippet, the users will have completely different preferences and thus no common ranking of SAs. How can you justify this? It is clear that a news article admits only a limited, predefined set of information facets; it is highly unlikely (if not impossible) that there are endless facets. This leads to my point that, for any news article on a given topic, the SAs will probably be strongly centered on the salient entity. Thus, a very low, nearly zero inter-rater agreement is not convincing to me at all: it would mean that the users produced essentially random rankings of the SAs. Otherwise, such a score does not make sense.
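To substantiate why near-zero agreement suggests random behaviour, a small simulation helps (my own sketch; I assume a rank-correlation-style agreement measure such as Kendall's tau, which may differ from the statistic used in the manuscript). Independent random rankings average out to a tau of about 0, whereas raters who agree on nothing more than which five SAs belong at the top already produce a clearly positive mean correlation:

    # Illustrative simulation: mean Kendall's tau between pairs of rankings of
    # 50 SAs. "Random" raters rank independently; "loosely agreeing" raters
    # share only the set of top-5 SAs, with all internal orderings random.
    import numpy as np
    from scipy.stats import kendalltau

    rng = np.random.default_rng(0)
    n_sas, n_trials = 50, 2000

    random_taus = [kendalltau(rng.permutation(n_sas), rng.permutation(n_sas))[0]
                   for _ in range(n_trials)]
    print(f"independent random rankings: mean tau = {np.mean(random_taus):+.3f}")  # ~0.000

    def loosely_agreeing():
        # same five SAs ranked on top (in random order), the rest random
        return np.concatenate([rng.permutation(5),
                               rng.permutation(np.arange(5, n_sas))])

    agree_taus = [kendalltau(loosely_agreeing(), loosely_agreeing())[0]
                  for _ in range(n_trials)]
    print(f"agreement on top-5 only: mean tau = {np.mean(agree_taus):+.3f}")  # ~+0.18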
8) Connected to the previous point, it would be interesting to see what the actual overlap of SAs (ignoring the ranking) is between two different users for the same news article; a simple set-overlap measure would suffice, as sketched below.
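A hypothetical sketch of this analysis (the SA strings are invented; any rank-free set-overlap measure would do):

    # Rank-free overlap of the SA sets shown to two users for the same article.
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    user_a = {"(X, bornIn, Y)", "(X, memberOf, Z)", "(X, spouse, W)"}
    user_b = {"(X, bornIn, Y)", "(X, award, V)", "(X, spouse, W)"}
    print(jaccard(user_a, user_b))  # 2 shared out of 4 distinct -> 0.5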
9) It is also not clear how the users were recruited. Furthermore, the nearly random rankings across users might also be explained by the amount of time the users took to complete the task (up to 12 minutes).
Finally, the manuscript has several language issues (syntax errors and typos, e.g. "texts small" on p. 9), which call for careful proofreading. In one case an entire sentence is missing, which makes the LAFU setting quite difficult to understand.
[1] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, Jiawei Han: Personalized Entity Recommendation: A Heterogeneous Information Network Approach. WSDM 2014: 283-292. doi: 10.1145/2556195.2556259
[2] F. Garcin, K. Zhou, B. Faltings, V. Schickel: Personalized News Recommendation Based on Collaborative Filtering. WI-IAT 2012: 437-441. doi: 10.1109/WI-IAT.2012.95
[3] Besnik Fetahu, Katja Markert, Avishek Anand: Automated News Suggestions for Populating Wikipedia Entity Pages. CIKM 2015: 323-332
[4] Jesse Dunietz, Daniel Gillick: A New Entity Salience Task with Millions of Training Examples. EACL 2014: 205-209