Review Comment:
The paper presents a bibliometric analysis of the field of Ontology matching. It applies a 'semantic analysis', trying to extract topics from papers, and a 'structural' analysis studying only the bibliographic characteristics of the literature (authorship, citation, etc.).
Since the Semantic web journal is not a journal about bibliometrics, this paper is a somewhat unusual fit for it. To be clear, it only uses classical techniques and does not apply semantic web technologies to bibliometrics. However, the object of this study, Ontology matching, is a part of the semantic web. That could be of interest to the journal readership, especially if remarkable features of the field were unveiled.
The work seems to have been done seriously, as far as I can judge, and most of what is expressed is clear.
Unfortunately, after reading it, I do not find it worth publishing.
The main problem is the lack of an objective: what is this work trying to assess? For most of the presented study, no hypothesis is tested and no interesting finding is reported (except at one point, but without seriously seeking to explain it; see below). It reads as if the reported figures were a matter of indifference and the paper would have been the same with different figures.
Another issue is the lack of a baseline for the presented data. Indeed, it is impossible to know whether the observed properties are specific to the Ontology matching field or whether they apply equally to other fields. At the very least, it would have been good to have a comparison with the broader context, i.e. with Semantic web and Computer science. It is possible to observe features of the Ontology matching field, but the reader has no way to tell whether these are remarkable or not.
Finally, in these times of Open science, it is regrettable that no mention is made of the availability of the data.
These are the major issues. Below I discuss various problems, some of them related to the issues above and some of them concerning particular points. They may help the authors improve their paper.
* Organisation:
- The introduction does not state any goal for the paper, nor claim any findings. It merely describes the treatments that were applied.
- Section 2 provides a methodology. However, in the absence of a statement about the goal of the analysis, it is not possible to judge the relevance of the methodology.
- Section 2.1 details at length the preprocessing of WoS, before turning more succinctly to Scopus, which was actually used. This seems a strange way to present things.
* Data interpretation:
- The topic analysis is not particularly insightful. In particular, it provides little information on ontology matching itself, and more on its uses. It seems to gather the terms in an unprincipled way (heterogeneous and automatic appear in most of them, biomedical is in the learn cloud while anatomy is in the query one, etc.). It may have been interesting to see all the generated 'topics' that the authors did not retain. In the end, it is unclear which conclusions may be drawn from the topic analysis.
- p10: 'Ontology alignment outputs form 6.1% of the top 10% most cited articles worldwide in year 2013' (I simplified). How can this be? From Fig. 3, there seem to be no more than 300 papers in Scopus on 'ontology alignment' for 2013. If they are all among the 10% most cited papers and these are 6.1% of them, this means that there were only 10*300/6.1% ~ 50,000 papers indexed by Scopus (if not all 300 are among the most cited, this is even fewer; the arithmetic is spelled out after this item). This does not seem to be right: I counted 2.8M documents in Scopus for 2013.
It seems to me that what was meant is that 6.1% of the 'ontology alignment' papers are among the 10% most cited papers. Again, with no comparison to the same figure for Semantic web or Computer science, it is difficult to tell whether this figure is specific to Ontology alignment (there are fields with more citations and fields with fewer citations, e.g. Mathematics, and putting them all together means that some are above average and some others are below).
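To make the arithmetic explicit (a quick sanity check; the ~300 papers is only my reading of Fig. 3, hence an upper bound):
\[
  N_{\text{top }10\%} = \frac{N_{\text{OA}\,\cap\,\text{top }10\%}}{0.061} \le \frac{300}{0.061} \approx 4\,918,
  \qquad
  N_{\text{Scopus, 2013}} = 10 \times N_{\text{top }10\%} \le 49\,180 \ll 2.8\,\text{M}.
\]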
- Section 4.3 is about disciplines relevant to Ontology matching. Given the broad categories used here (the level 2 categories of Fig 7), it is unclear that this characterisation is useful for anything.
- Sections 5.1-5.2 about collaboration are those that could be thought of as providing some findings. Figure 8 is striking in that it shows two clearly identified clusters. The authors do not provide much explanation for this phenomenon; they suggest that maybe the researchers from one cluster are not curious about the others. However, these graphs being computed on collaboration, a symmetric measure, it seems that this explanation should, at the very least, be applied symmetrically. It is difficult from this data alone to provide an explanation, but many could be put forth. In particular, the fact that one of these clusters is mononational and the other international suggests that the explanation comes from some national elements (but see discussion below). These may be linguistic factors, the collaboration approach, the work approach (many coauthors, many authors of only one paper, e.g. undergraduate students: this can be studied bibliometrically, as sketched below), or publication policies (a strong incentive to publish many papers and in Scopus-indexed journals, hence less in the Ontology matching workshop). It is possible that many of these factors play some role together... Finally, again in the absence of a comparison with other fields, it is difficult to assess whether this is specific to the Ontology matching field.
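As an illustration of what "this can be studied bibliometrically" could mean in practice (a minimal sketch; the author-paper table, the column names and the cluster assignment are hypothetical, not taken from the paper):

import pandas as pd

# Hypothetical author-paper table: one row per (paper, author) pair,
# plus the collaboration cluster of the author as read off Fig. 8.
rows = pd.DataFrame({
    "paper":   ["p1", "p1", "p1", "p2", "p2", "p3"],
    "author":  ["a1", "a2", "a3", "a1", "a4", "a5"],
    "cluster": ["A",  "A",  "A",  "A",  "B",  "B"],
})

# Mean team size (distinct authors per paper) in each cluster.
paper_stats = rows.groupby("paper").agg(
    team_size=("author", "nunique"),
    cluster=("cluster", lambda s: s.mode().iat[0]),  # majority cluster of the paper
)
mean_team_size = paper_stats.groupby("cluster")["team_size"].mean()

# Fraction of one-shot authors (a single paper in the corpus) in each cluster.
papers_per_author = rows.groupby(["cluster", "author"])["paper"].nunique()
one_shot_fraction = (papers_per_author == 1).groupby(level="cluster").mean()

print(mean_team_size)
print(one_shot_fraction)

If these statistics differ markedly between the two clusters, the "work approach" explanation becomes testable rather than speculative.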
- The analysis made on collaborations is also made on citations (though to a far lesser extent). That could have helped shed light on this matter, because citation is not symmetric. Unfortunately, in 6.2, citations are only reported as numbers assigned to papers and countries, so they are not helpful here. This is too bad, because if a community gets fewer citations per paper than another, it is difficult to explain this by discrimination if both communities have the same citation pattern (they both cite the same community less). At the very least, it would have been worth ruling out this possibility (a concrete check is sketched below).
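Concretely, the check I have in mind is a cluster-to-cluster citation matrix (a sketch; the citation pairs and the per-paper cluster labels are hypothetical inputs that the authors would have to derive from their corpus):

import pandas as pd

# Hypothetical inputs: which paper cites which, and the collaboration
# cluster (from Fig. 8) to which each paper's authors belong.
citations = pd.DataFrame({
    "citing": ["p1", "p1", "p2", "p3", "p4"],
    "cited":  ["p2", "p3", "p3", "p1", "p1"],
})
paper_cluster = pd.Series({"p1": "A", "p2": "A", "p3": "B", "p4": "B"}, name="cluster")

# Rows: cluster of the citing paper; columns: cluster of the cited paper.
cites = citations.assign(
    citing_cluster=citations["citing"].map(paper_cluster),
    cited_cluster=citations["cited"].map(paper_cluster),
)
matrix = pd.crosstab(cites["citing_cluster"], cites["cited_cluster"], normalize="index")
print(matrix)

If both rows under-cite the same column (i.e. both communities cite the same cluster less), discrimination becomes a poor explanation for the citation gap.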
- As I understand from the text, six communities were extracted and only two are shown in Figure 9. If the number six was not given to the algorithm and is significant, then all six should be shown.
- 5.1 Author collaboration: the conclusions drawn on page 14 are very general and not specific to Ontology matching.
- p16 "the research outputs with at least one Chinese author have not gained enough attention": it is unclear on what ground this statement is based. Same thing for "they do not get enough attention, possibly the attention they deserve".
- Again, 5.3-5.4 would deserve to be compared with the broader Semantic web/Computer science fields.
- The authors "encourage the organisers of OAEI" to have benchmarks on the identified topics. Unfortunately, these topics are not application domains, like biomedicine, but application techniques, like "Semantic Web Services, agent-based modelling, knowledge-graphs, and business processes (cited directly from the paper)". This means that there are not many ontologies to match there... and some of them have been considered, e.g. Process matching.
* Data presentation:
- Figure 2 displays data as tag clouds. The precise interpretation of tag clouds is quite unclear to me, so if there is one, it should be provided. In general, tag clouds do not seem to be a proper scientific visualisation instrument (no units, no scale, aesthetic arrangement).
- Figure 8 is interesting, but it would also be interesting to understand the space, i.e. what the principles of entity placement are. The same applies to Figure 17.
- The assignment of authors to countries is not specified. One of the most collaborative "Chinese scientists" is "S. Wang". I assume that this is Shenghui Wang. Shenghui published her work while at VU Amsterdam. It is unclear that she should count as Chinese (in such a case, Pavel Shvaiko is from Belarus, Ernesto Jimenez-Ruiz from Spain, Cassia Trojahn from Brazil, etc.). In this sense, she is atypical (less and less atypical as time passes), and this seems to indicate that the two clusters are based on involvement, or not, in an international collaboration network rather than on nationality. This, in turn, may have other causes (see above).
- Figure 9 is unreadable in black and white.
* Form:
- The title of the paper is quite strange: ontology alignment is not really revisited and there is no real "narrative" provided here. Moreover, it is not really the purpose of scientific journals to publish "narratives", but findings.
- The introduction uses flowery language that is also a bit remote from the facts. For instance, "the heterogenity problem was quite epidemic" is not particularly clear.
* Details:
- p4: '"ontology alignment", which is interchangeably referred to as "ontology matching" or "ontology mapping"': it is not clear by whom.
- It may have been interesting to look for outliers in this data set. In particular, books and review papers traditionally get a lot of citations: do these figures look the same if they are removed from the corpus? I do not know whether this is accepted practice in bibliometrics, and it is less important than comparing with external fields.
- p22: there seems to be a missing reference.
- In some instances, such as reference 2 or Table 2, there are problems with characters.