Review Comment:
The revision has addressed the major issue, i.e., the lack of novelty, by adding a substantial number of carefully designed experiments that yield some interesting insights into the proposed measures of NEL datasets. I very much appreciate the dedication of the authors, and I think the quality and value of the paper have improved considerably. However, there remain many issues that a minor revision can perhaps address.
Among these, the most prominent remains the math notation. Unfortunately, the improved math still contains too many errors, which makes it difficult to read. *** I urge the authors *** to invite a mathematically trained person to proofread your next revision before submission, to make sure your math makes sense. To help you, I have rewritten large parts of your math and included them at the very end of my review. PLEASE BE WARNED that my recommendations are no guarantee of correctness: for many of them I had to rely on my own understanding of your intention, which was unclear in many places.
Next, please address the following minor issues:
 page 13, just under the section 4.2 title: 'to gain more insights on the interplay of annotators'. I think you should use 'systems' instead of 'annotators' from this point onwards, as the latter is by convention understood to mean human annotators. Your previous occurrences of the word 'annotators' also meant humans, so it is confusing here.
 table 4: add a note saying that the table is best viewed in colour. I only realised later that the table is not just black and white, and this caused confusion while reading.
 page 16, figure 19: five systems show a dip in performance in partition 7. This is certainly not just because of the confusion measure; can you give some insight into why this happened?
 page 17 bottom: 'there is a weak correlation among hits and confusion of entities. This could be interpreted as with increasing partition number there are less entities with lower popularity, which might cause better results'. Why do we need to know the correlation between hits and confusion of entities? I thought you only need to talk about the correlation between hits and performance?
 page 20, 2nd paragraph on the left: the results on the unfair dataset are poor. Could this be because the amount of data is very small? Are any of the NEL systems supervised? If so, this may cause overfitting and thus bad results, which would not necessarily indicate bias in your datasets. Please comment on this.
 page 20, last paragraph on the left: you should say a few words drawing conclusions from section 4.2, to explain why the 'easy' dataset is easy and why the 'difficult' one is difficult.
Finally, run a spell checker to correct misspellings. I will not list individual cases.
>>>>>>>MATH>>>>>>>>
Your math still has a lot of errors. I suggest that you: 1) make corrections based on the recommendations below (I have written these based on *my own* understanding, which is no guarantee that they are all correct or in the best form); 2) have someone familiar with both math and your work proofread it before the next submission (this is a MUST); 3) add a table as an appendix listing all the notations you use, each with an explanation. The last point is optional but will greatly improve readability, considering how many different notations the paper uses.
ON PAGE 3:
 above equation 1: A dataset D is a set of documents d \in D (use d instead of t, which is confusing). A document is a tuple (d_t, d_a) of text and annotations, where d_t is the text of d and d_a is the set of annotations in d.
 Equation 1: change t to d
 Equation 2: delete this; the number of annotations in a document can now be written as |d_a|. Change the text underneath this equation accordingly.
 Under equation 2: 'let E^{D} (instead of E_{D}) denote all entities within a dataset D and S^{D} (instead of S_{D}) denote all surface forms used within a dataset D'. QUESTION: what is the relation between ‘annotations’, ‘entities’ and ‘surface forms’? You should clarify this.
 Above equation 3: in general, the number of annotations within a document, |d_a|, is a measure … the average number of annotations PER DOCUMENT IN THE CORPUS (if this is not what you meant, change it), na(D), divides the total number of annotations in the corpus by the total number of documents: na(D) = \sum_{d \in D}{|d_a|} / |D|. Your original equation does not make sense because the numerator is the total number of annotations in the corpus, while the denominator, according to your current definition, is ‘the number of annotations WITHIN A (SPECIFIC) DOCUMENT’. What are the semantics of dividing these two numbers, and which document would it be?
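To make the intended semantics of na(D) concrete, here is a minimal Python sketch under my own reading of the definitions above. The Document type, its field names and the toy data are hypothetical and only illustrate the tuple (d_t, d_a); they are not taken from the paper.

```python
from typing import List, NamedTuple, Tuple


class Document(NamedTuple):
    """Hypothetical stand-in for the tuple (d_t, d_a)."""
    text: str                            # d_t: the text of the document
    annotations: List[Tuple[str, str]]   # d_a: (surface form, entity) pairs (assumed shape)


def na(dataset: List[Document]) -> float:
    """Average number of annotations per document in the corpus."""
    if not dataset:
        raise ValueError("na(D) is undefined for an empty dataset")
    total_annotations = sum(len(d.annotations) for d in dataset)
    return total_annotations / len(dataset)


# Made-up toy corpus: 2 annotations spread over 2 documents.
toy = [
    Document("Paris is in France", [("Paris", "dbr:Paris"), ("France", "dbr:France")]),
    Document("No entities here", []),
]
print(na(toy))
```

Note that the denominator is the number of documents in D, not the number of annotations of any particular document, which is the point of the correction.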
ON PAGE 4:
 Equation 4: change to nad(D) = |{d | |d_a| = 0}| / |D|. Your original equation is wrong because the numerator does not make sense: a sum must add up numbers, but inside your sum ‘a(t)=0’ returns a Boolean.
 Above equation 5: however, we propose…. As the relation between the number of annotations in the ground truth and the overall document length len(d) determined by the number of words, with ma …
 Equation 5: ma(D) = \sum_{d \in D}{|d_a|} / \sum_{d \in D}{len(d)}
 Equation 6: By convention, capital letters denote sets and small letters denote individual elements. So: ‘we use p(e) to denote the PageRank computed for e, and the category interval is denoted by a, b \in [0, 1]: E^{D}_{a,b} = {e \in E^{D} | a <= p(e) <= b}’. Your original equation p(D,P) is misleading: it returns a SET of ENTITIES, so use variations of the capital letter E instead.
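As a sanity check of my reading of equations 4 to 6, the following Python sketch computes the three quantities on a toy corpus. The data layout, the function names, the whitespace-based len(d) and the PageRank scores are all assumptions made for illustration; the paper may define them differently.

```python
from typing import Dict, List, Set, Tuple

# Each document is modelled as (d_t, d_a): its text plus a list of annotations.
# This shape is an assumption made for illustration only.
Doc = Tuple[str, List[str]]


def nad(dataset: List[Doc]) -> float:
    """Fraction of documents without any annotation (assumes a non-empty dataset)."""
    empty_docs = [d for d in dataset if len(d[1]) == 0]
    return len(empty_docs) / len(dataset)


def ma(dataset: List[Doc]) -> float:
    """Total number of annotations divided by total length in words.

    len(d) is approximated by whitespace tokenisation, which is an assumption.
    """
    total_annotations = sum(len(d_a) for _, d_a in dataset)
    total_words = sum(len(d_t.split()) for d_t, _ in dataset)
    return total_annotations / total_words


def entities_in_interval(entities: Set[str], p: Dict[str, float],
                         a: float, b: float) -> Set[str]:
    """E^D_{a,b}: the entities of the dataset whose PageRank lies in [a, b]."""
    return {e for e in entities if a <= p[e] <= b}


toy = [("Paris is in France", ["Paris", "France"]), ("No entities here", [])]
pagerank = {"Paris": 0.8, "France": 0.3}  # made-up scores
print(nad(toy), ma(toy), entities_in_interval({"Paris", "France"}, pagerank, 0.5, 1.0))
```

The first two functions return numbers while the third returns a set of entities, which is why a capital E is the more natural symbol for equation 6.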
ON PAGE 5:
 Left, first paragraph: The overall set of all possible entities for a surface form is E^{s}, which is also referred to… The dictionary known to the annotator, E’^{s}, is a subset of E^{s} … Again, you are referring to sets of entities, so consistently use capital E instead.
 Following the above part: the surface form of a dataset S_{D} can also be interpreted as a subset of V_{sf} =>> this does not make sense because S_D are surface forms, V_{sf} (now E^{s}) are entities. How could the first be a subset of the second??
 Following the above: ‘the likelihood of confusion for the surface form …. determined by the cardinality of the union of the known entities IN THE DATASET and the entities known to the annotators: |E^{s}_{D} \cup E’^{s}|’. Again, your original expression D \cup W_sf does not make sense: D is DOCUMENTS, W_sf is ENTITIES, so their union is not meaningful.
 The second paragraph on the left column: … the overall set of all possible surface forms for an entity is S^{e} (outer lower box), which is also …. The annotators know only a subset S’^{e} of S^{e} … the dataset … only contains S^{e}_{D}, which is also a subset of S^{e}…. The likelihood of confusion …. by the cardinality of the union of the known surface forms, |S^{e}_{D} \cup S’^{e}|. Again, for the same reason stated above, your original D \cup W_e does not make sense because it is a union between D (documents) and W_e (surface forms).
*** You should also correct *** your figures 2 and 3 accordingly once you have changed your math notation.
 On the right column: this part is very confusing. First, you said earlier that the confusion measure will use ‘E^{s}_{D} \cup E’^{s}’ and ‘S^{e}_{D} \cup S’^{e}’, but these do not appear in your equations 7 and 8. Also, what is the ‘annotator system dictionary W’, and how is it related to ‘entities known in the dataset’ and ‘entities known to the annotator’? My understanding is that it is neither the first nor the second, but that it includes the first. I do not know how to correct this part, but it seems your equations 7 and 8 should be: c_{S}(D) = |{E^{s}_{D} \cup E’^{s} \forall s \in S^D}| / |S^{D}|, and c_{E}(D) = |{S^{e}_{D} \cup S’^{e} \forall e \in E^D}| / |E^D|. In words, the first is the ratio between (all entities known to the dataset plus all entities known to the annotator) and (the number of surface forms in the dataset); the second is the ratio between (all surface forms known in the dataset plus all surface forms known to the annotator) and (the number of entities in the dataset). Please rephrase these properly.
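Since I am unsure about the intended definition, here is a Python sketch of one possible reading of c_S(D): the per-surface-form union sizes are summed and divided by the number of surface forms in the dataset. Both this averaging and the dictionary-of-sets layout are my assumptions, not something the paper confirms, and the toy data is made up.

```python
from typing import Dict, Set


def confusion_per_surface_form(
    dataset_entities: Dict[str, Set[str]],     # s -> E^{s}_{D}
    dictionary_entities: Dict[str, Set[str]],  # s -> E'^{s}
) -> float:
    """Average size of (E^{s}_{D} union E'^{s}) over all surface forms s in S^D.

    Averaging the per-surface-form union sizes is an assumption; it is one
    possible reading of the ratio, not a confirmed definition.
    """
    if not dataset_entities:
        raise ValueError("undefined for a dataset without surface forms")
    total = sum(
        len(entities | dictionary_entities.get(s, set()))
        for s, entities in dataset_entities.items()
    )
    return total / len(dataset_entities)


in_dataset = {"Paris": {"Paris_France"}, "Jordan": {"Michael_Jordan"}}
in_dictionary = {
    "Paris": {"Paris_France", "Paris_Texas", "Paris_Hilton"},
    "Jordan": {"Michael_Jordan", "Jordan_country"},
}
print(confusion_per_surface_form(in_dataset, in_dictionary))
```

Under the same reading, c_E(D) is obtained by passing maps from entities to surface forms instead. If you intend a different numerator (for example, the number of distinct entities over all surface forms), the equation should say so explicitly.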
ON PAGE 6:
 Equations 9 and 10: these appear to be generally OK, but you need to use |X| to return the size of a set X; change the notation accordingly. You also need to make it clear how W (which should now be replaced by something to do with E) relates to ‘entities known in the dataset’ and ‘entities known to the annotator’. So:
 Equation 9: dom_S(W,D) = ( \sum_{s \in S^D} { |E^D_{s}| / |E^W_{s}| } ) / |S^D|
 Equation 10: dom_E(W,D) = ( \sum_{e \in E^D} { |S^D_{e}| / |S^W_{e}| } ) / |E^D|
 Equation 11: this is still wrong. s \in W_sf does not make sense: W_sf returns a set of entities, so how can a surface form s belong to a set of entities? According to your verbal definition, it should be something like: max_recall(W, D) = |{ S^{e}_W \forall e \in E^D }| / |S^D|, where S^{e}_W returns the set of surface forms for entity e in dictionary W. Again, I cannot emphasize enough that you ***must*** clarify the relation of W to ‘entities known in the dataset’ and ‘entities known to the annotator’.
 Equation 12 is, again, wrong. You say t is in the range (0,1), but your equation 12 certainly does not return a fraction; it returns integers greater than 1. Rewrite as: let T(e) return the type of an entity e; the set of entities of a specific type t in the dataset D is {e | e \in E^D and T(e) = t}.
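To illustrate how I read equations 9, 10 and 12, here is a small Python sketch. The dictionary-of-sets layout, the function names and the toy data are hypothetical. The sketch raises an error when the dictionary W has no entry for a key, because the formula leaves that case undefined.

```python
from typing import Dict, Set


def dominance(dataset_map: Dict[str, Set[str]],
              dictionary_map: Dict[str, Set[str]]) -> float:
    """Mean of |X^D_k| / |X^W_k| over all keys k occurring in the dataset.

    With surface forms as keys and entity sets as values this corresponds to
    dom_S(W, D); with the roles swapped it corresponds to dom_E(W, D).
    """
    if not dataset_map:
        raise ValueError("undefined for an empty dataset")
    total = 0.0
    for key, in_dataset in dataset_map.items():
        in_dictionary = dictionary_map.get(key, set())
        if not in_dictionary:
            # The formula does not say what happens here, so fail loudly.
            raise ValueError(f"dictionary has no entry for {key!r}")
        total += len(in_dataset) / len(in_dictionary)
    return total / len(dataset_map)


def entities_of_type(entities: Set[str], T: Dict[str, str], t: str) -> Set[str]:
    """The set {e | e in E^D and T(e) = t}."""
    return {e for e in entities if T[e] == t}


in_dataset = {"Paris": {"Paris_France"}, "Jordan": {"Michael_Jordan"}}
in_dictionary = {
    "Paris": {"Paris_France", "Paris_Texas"},
    "Jordan": {"Michael_Jordan", "Jordan_country", "Jordan_River", "Air_Jordan"},
}
types = {"Paris_France": "Place", "Michael_Jordan": "Person"}  # made-up types
print(dominance(in_dataset, in_dictionary))
print(entities_of_type({"Paris_France", "Michael_Jordan"}, types, "Place"))
```

I deliberately left out max_recall, since its definition depends on the unresolved relation between W and the two kinds of known entities.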
