Review Comment:
ORIGINALITY
The paper concerns the general problem of developing robust methods for name-based entity classification, an NLP problem which the authors label as foundational, insofar as it touches many downstream tasks. One of these is the particular subject of the paper: automated classification of public sector organisations (PSOs) across EU Member States using only their official names. This use case emerged from the authors' own involvement in an EU-level study on procurement in the health sector, for which the classification of public sector organisations is crucial.
Although there are good reasons for the restricted nature of this particular case, it gives rise to a shortcoming that the authors do not dwell upon: the extent to which the conclusions reached are transferable to other use cases whose parameters differ. Examples would be the classification of named entities where the classification structure is richer and/or more universally accepted than in the case studied, for example in the medical field (names of diseases, anatomical nomenclature). The classification structure of the case chosen is an artefact of socio-political and linguistic choices, in contrast to, e.g., the names of drugs or chemicals, which bear a more objective relationship to the underlying reality.
Given the importance of the classification problem tackled, a clear diagram of the target class structures on p5, showing both nested and linear classes, would be better than the nested bullet points that appear now. The authors also need to explicitly state that the target class structure (as opposed to the entity instances) was obtained from Wikidata, and how this was done.
Nevertheless, even though the authors recognise the formidable challenges of the chosen use case, as listed in section 1.1, neither the problem itself nor the approaches adopted are particularly original. The main novelty of the paper lies in the pipeline developed for systematically investigating solutions which harness "the leverage of structured knowledge bases".
QUALITY OF WRITING
The English is generally of high quality with only a few typos as listed below. However, there are some issues with the structure and organisation of the paper.
Thus the introduction clearly indicates (p3) five challenges pertaining to the case at hand which "the leverage of structured knowledge bases" is proposed to address, leading to the formulation of the two research questions investigated in this study: RQ1 - How effectively can NLP techniques paired with KG data distinguish between medical, government, and educational organizations? RQ2 - How do the naming conventions of PSOs exhibit semantic variation across different EU Member States, and how effectively can these variations be captured by KG resources for entity disambiguation? These are reasonable RQs but:
(i) What do the authors mean by "NLP techniques paired with KG data" and "variation captured by KG resources"? Two distinct interpretations are exploited in the paper: the use of Wikidata (a) to generate an annotated dataset suitable for entity classification (p6 line 35), and (b) to supply "class prototypes" used as a basis for comparison in embedding-based methods, as described in section 3.2.3 (p11 line 39). These two uses (and there may be others) need to be more clearly distinguished. Each is related to a different RQ, but that relation is quite subtle and not clearly stated. The authors should therefore more clearly distinguish these two usages of KG resources and expand on how the two research questions are derived from them.
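To make the distinction concrete, usage (b) could be illustrated along the following lines (a minimal sketch with invented toy embeddings and class labels, not the paper's actual pipeline):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical class prototypes: each class represented by the (toy, 3-d)
# mean embedding of a few Wikidata-derived exemplar organisation names.
prototypes = {
    "medical":     [0.9, 0.1, 0.0],
    "government":  [0.1, 0.9, 0.1],
    "educational": [0.0, 0.1, 0.9],
}

def classify(name_embedding):
    # Assign an organisation name to its nearest class prototype.
    return max(prototypes, key=lambda c: cosine(name_embedding, prototypes[c]))

print(classify([0.8, 0.2, 0.1]))  # → medical
```

Spelling out usage (b) in some such form would make its connection to the relevant RQ much easier for the reader to follow.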
(ii) The two RQs are clearly stated at the outset of the paper. However, they are not mentioned subsequently. The discussion and conclusion are indeed loosely connected to issues relevant to the RQs, but the answers to the RQs themselves are lost in the discussion. I would therefore recommend that the material in the discussion and conclusion sections be restructured so that the RQs become the main organising principle for the results of the investigations.
Section 3 provides a comprehensive description of the methods used. These are summarised in Fig. 1, whose structure (the column labelled "Parameter Optimisation" as well as the two rightmost columns) is confusing and needs further explanation. Moreover, a reference to Fig. 1 is missing from the text.
The authors should note that several other tables and figures appearing in the paper are not referred to in the text; they are urged to check that all tables and figures are referenced. Also, references to tables appearing in the annexe should be explicitly labelled as such.
There is some confusion in the overall organisation of sections 3, 4 and 5, with insufficient separation between methodology, review of relevant literature, results of experiments, discussion of those results, and future work. Thus, the description of experiments is very long and includes some results, whilst the results and conclusions sections occupy less than a page. Some of the material under conclusions would be better placed under future work. Some restructuring here would therefore be desirable to increase the clarity and impact of the paper.
Many references in the bibliography are to arXiv preprints even when full peer-reviewed references exist, e.g.:
L. Wang, N. Yang, X. Huang, L. Yang, R. Majumder and F. Wei, Improving Text Embeddings with Large Language Models, arXiv preprint arXiv:2401.00368 (2024)
https://arxiv.org/abs/2401.00368
=>
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11897–11916, August 11-16, 2024, Association for Computational Linguistics.
The authors need to check the entire bibliography for updated peer-reviewed references.
SIGNIFICANCE OF RESULTS
The results are interesting and reflect the considerable amount of work carried out. However, the main conclusions concerning the three classes of solution methods investigated are somewhat limited. Although the authors have designed a useful experimental pipeline, the paper does not provide much insight into how to tackle the harder aspects of the use case (e.g. the overall coverage of rule systems, preprocessing morphologically rich language data, threshold tuning for NLI approaches). More detail on these problematic areas would boost the significance of what has been achieved.
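On the threshold-tuning point in particular, even a brief sketch of the procedure would help, along the following lines (the dev examples and entailment scores below are invented for illustration; a real setup would obtain them from an NLI model scoring class hypotheses against each organisation name):

```python
def tune_threshold(dev, thresholds):
    """Pick the entailment threshold that maximises accuracy when
    predictions below the threshold fall back to an 'unknown' label."""
    def accuracy(t):
        correct = 0
        for gold, pred, score in dev:
            final = pred if score >= t else "unknown"
            correct += (final == gold)
        return correct / len(dev)
    return max(thresholds, key=accuracy)

# Invented dev examples: (gold label, top NLI prediction, entailment score).
dev = [
    ("medical", "medical", 0.92),
    ("government", "government", 0.71),
    ("educational", "medical", 0.55),   # wrong prediction, low confidence
    ("unknown", "government", 0.40),    # out-of-scope name
]

best = tune_threshold(dev, [0.3, 0.5, 0.6, 0.8])
print(best)
```

Reporting how sensitive the NLI results are to such a threshold, and how it was selected, would substantially strengthen the analysis.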
TYPOS
p2 lin 31 named developed => developed
p2 lin 42 co-oficial => co-official
p4 NLI acronym should be defined earlier
p4 lin 41 languages, we focus => languages focuses
p9 metods => methods
p11 lin 40 its => it is
p11 lin 42 chosen, organisation => chosen organisation
p12 conceptual dispersion the capacity => conceptual dispersion and the capacity
p12 we also supports => we also supported
p12 due to their absence => due to its absence
p13 lin 51 the Github => Github
p14 lin 22 variability. In comparison => variability, in comparison
p15 wikidata => Wikidata (check all occurrences)