Review Comment:
Generally, the idea of a separate lexical and syntactic phase before applying ontology matchers is appealing. Usually, this step is hardcoded in ontology matching systems. It could be performed either in a generic way or tailored to the given ontologies. Furthermore, the lexical and syntactic steps could also directly take the other ontology to be matched into account; at that point, it already constitutes a kind of ontology matching. All of these options could support ontology matching tools in their matching activity, where more than just the lexical aspect is involved.
The paper proposes a lexical analyzer that takes the ontologies and the matching activity into account. My concern is that the paper is mainly about the lexical analyzer approach (Sections 1, 2, and 3), yet it is evaluated only within the ALIN matching system. This is inconsistent: the paper should either be about the lexical analyzer, with a general evaluation across several matching systems, or be rewritten to be about the ALIN matching system from the beginning. As it stands, the experimental setting is not appropriate; it does not evaluate the idea of a lexical analyzer but rather a single matching system that applies lexical preprocessing.
If the paper is to be about the ALIN matching system:
* The new ALIN metric should be explained in the main text, which would require substantial rewriting.
* The whole ALIN pipeline should be explained, including its use of string-matching techniques, etc.
If the paper is to be generally about the lexical analyzer approach:
* It is applied to the human and mouse ontology matching pair, but a discussion of how to apply it generally to any ontology matching pair is needed.
* Throughout the paper, it is argued that the lexical analyzer approach would save domain experts' time because they are less involved in the ontology matching validation step. However, the currently proposed iterative lexical analyzer approach is itself very time-consuming and requires the involvement of NLP experts; the burden therefore shifts from domain experts to NLP experts. It should be discussed how demanding and feasible this is, and ideally it should also be tested with a couple of NLP experts in the experiment to determine whether it is doable.
Further remarks:
* The paper contains definitions of lexical aspects. However, I miss definitions of entities and entity names; i.e., what an ontology is and what it contains needs to be clarified.
* The related work section is overcrowded with references. Instead, it would help to add a summary table with the systems in rows and the lexical processing techniques (such as stemming, word separation, etc.) in columns, where each cell indicates whether the system applies the given technique.
* Generally, the description of the whole process should be improved; e.g., the text is mainly about the lexical analyzer, but there is also a lexical analyzer generator. It would help to depict the whole process in pseudocode; a rough sketch of what I have in mind is given after this list.
* The workflow for NLP experts is described in Section 3.2, but it frequently needs clarification. It is also quite demanding and time-consuming for NLP experts, e.g., "copy the lines and paste it into the lexical analyzer," "assess how to implement the standardization technique," "check to see if there is an existing program," and "adjust the lexical analyzer." Moreover, it seems that steps 2 and 3 should be swapped. The section needs substantial rewriting.
* As explained above, I consider the experimental setup improper: either the evaluation or the first part of the paper should be changed. The experiment is about evaluating ALIN, as stated in Section 4.3 ("To evaluate ALIN"). However, based on the paper's title and the first three sections, it should evaluate the lexical analyzer with respect to more matching systems and include an experiment involving NLP experts.
* The results of the comparison between executions 4 and 5 are not very convincing: F-measure 0.941 vs. 0.953 and number of interactions 357 vs. 405, while the time-consuming work of the NLP experts should also be taken into account. Although it is claimed that ALIN with the current lexical analyzers achieves better results than in OAEI 2023, the F-measure is the same (0.952), only with fewer total requests (405 vs. 514).
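To illustrate the kind of pseudocode I have in mind for the overall process, a rough sketch follows; all names, rules, and the interface to the matcher are purely illustrative and are not taken from the paper:

```python
import re

def make_lexical_analyzer(rules):
    """'Lexical analyzer generator' (illustrative): composes the chosen
    standardization rules into a function that normalizes one entity name."""
    def analyze(name):
        for rule in rules:
            name = rule(name)
        return name
    return analyze

# Illustrative standardization rules (hypothetical, not the paper's actual ones).
def split_camel_case(s):
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", s)

def replace_underscores(s):
    return s.replace("_", " ")

def lowercase(s):
    return s.lower()

def preprocess(entity_names, rules):
    """Apply the generated lexical analyzer to the entity names of one ontology
    before they are handed to the matching system (e.g., ALIN)."""
    analyzer = make_lexical_analyzer(rules)
    return {name: analyzer(name) for name in entity_names}

# Toy usage: both names normalize to "urinary bladder" and can then be matched lexically.
print(preprocess(["UrinaryBladder", "urinary_bladder"],
                 [split_camel_case, replace_underscores, lowercase]))
```

Something at roughly this level of detail, covering the generator, the analyzer, and where the result enters the matching pipeline, would make the overall process much easier to follow.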
The paper is original, and the authors provided the code in an OSF repository equipped with a README file. The provided resources appear to enable reproducibility of the paper's results.