Review Comment:
Originality and Significance:
- Using Word Associative Norms is an innovative approach to constructing word embeddings.
- The proposed method for constructing word embeddings outperforms widely used methods such as Word2Vec, GloVe, and fastText.
- The proposed method is computationally efficient and easier to train.
Needed Corrections:
- Page 2, right column, end of 2nd paragraph, says: “In Section 4, we present the evaluation of the generated vectors using standard data sets for word similarity in Spanish.”
However, there is no connection to Spanish: the dataset used is the EAT, which is in English, and the evaluations are also conducted in English, so the mention of Spanish is confusing.
- Page 2, right column, last paragraph, says: “the Small World of Words deals with nine different languages.” However, the footnote link https://smallworldofwords.org/en/project lists 14 languages. A possible rewording: “the Small World of Words contained datasets in 14 languages at the time of writing.”
- Page 3, left column, footnote 3: http://www.eat.rl.ac.uk/ is a broken link. You may want to remove it or use a link from the Internet Archive: https://web.archive.org/web/20161030032628/http://www.eat.rl.ac.uk/
- Page 3, right column, top paragraph says: “Their age ranged from 17 to 22.64% of the participants were males and 36% females”. This is ambiguous because the white space between “22.” and “64%” is missing. A rephrasing such as the following would fix it: “The participants were aged between 17 and 22; 64% of them were male and 36% female.”
- Page 3, right column, bottom section: the edge weight is defined as φ : E → ℝ, i.e. φ : (v_i, v_j) ↦ ℝ. However, the notation used to define Frequency and Association Strength does not follow the same convention, which makes the definitions hard for readers to follow.
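  For instance, a consistent set of definitions could be sketched as follows (the symbols f and s and the row-normalized definition of association strength are only a suggestion, not the authors' actual definitions):

  ```latex
  \varphi : E \to \mathbb{R}, \qquad \varphi(v_i, v_j) = w_{ij}
  % Frequency, in the same functional notation:
  f : E \to \mathbb{R}, \qquad f(v_i, v_j) = \text{co-occurrence count of } (v_i, v_j)
  % Association strength as normalized frequency (hypothetical definition):
  s : E \to [0, 1], \qquad s(v_i, v_j) = \frac{f(v_i, v_j)}{\sum_{k} f(v_i, v_k)}
  ```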
- Page 4, left column, section 3.1: The authors refer to 4 parameters of node2vec: p, q, d, and l. They do a thorough analysis of d and l (later in the results section); however, it is not clear what values were chosen for p and q. Please describe the values for p and q.
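  For context, p and q control node2vec's second-order random walk (Grover & Leskovec, 2016): p is the return parameter and q the in-out parameter. The sketch below illustrates their role; the graph and parameter values are hypothetical, not taken from the paper.

  ```python
  # Sketch of node2vec's second-order walk bias, showing how the return
  # parameter p and in-out parameter q shape the walk. Toy graph only.

  def search_bias(prev, curr, nxt, neighbors, p=1.0, q=1.0):
      """Unnormalized transition bias for a walk standing at `curr` that
      arrived from `prev` and considers stepping to `nxt`."""
      if nxt == prev:              # distance 0 from prev: return step
          return 1.0 / p
      if nxt in neighbors[prev]:   # distance 1 from prev: stay local
          return 1.0
      return 1.0 / q               # distance 2 from prev: explore outward

  # Hypothetical toy graph as adjacency sets.
  neighbors = {
      "a": {"b", "c"},
      "b": {"a", "c", "d"},
      "c": {"a", "b"},
      "d": {"b"},
  }

  # Walk arrived at "b" from "a"; bias for each candidate next node:
  biases = {v: search_bias("a", "b", v, neighbors, p=2.0, q=0.5)
            for v in sorted(neighbors["b"])}
  print(biases)  # {'a': 0.5, 'c': 1.0, 'd': 2.0}
  ```

  With p = q = 1 the walk reduces to an unbiased second-order walk, which is a common default; the point of the comment above is that the paper should state which setting was used.
  
  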
- Page 5, Figure 1: The headers read “weight=Association” and “weight=Frequency”. However, as stated in the text (section 4, paragraph 2), these weights are “inverse association strength” and “inverse frequency”. Please make the figure and the text consistent.
- Page 5, left column: The authors mention the overlap between the Wan2Vec vocabulary and the test datasets. However, it is not clear what is done with words not in the vocabulary (out-of-vocabulary, aka OOV, words). Are they excluded from the test scores or assigned a score of 0?
- Table 1 and Table 2: Note that both tables share the same caption, “The WAN graph was built using the inverted frequency as weighting function.” From the text, I gather that Table 1 = IF and Table 2 = IAS. (It would also help readers if the authors matched the order of IAS and IF between figures and tables; currently, IAS precedes IF in the figures, and the opposite holds in the tables.)
- Table 3: Where is the footnote text for markers 8, 9, 10, and 11?
- Table 4: What does the n(overlap) column mean here? Is it still the overlap with the Wan2Vec vocabulary (the numbers look identical to those in Tables 1 and 2)? And what is n(overlap) for GloVe, Word2Vec, and fastText?