Review Comment:
This is a very interesting paper on an extremely important issue for the future of biodiversity research, and my recommendation is that it be published. Tables 3 and 4 in particular provide some really interesting insights into how differently a single species complex consisting of only 8 species-level names can be reinterpreted by different taxonomists, and clearly demonstrate the value of taxonomic concepts as opposed to simply using species names in biological analyses. The authors show a way forward on this front, combining expert information with computational results to derive visual and logical representations of the relationships between the taxonomic concepts that are part of this taxonomic group. I particularly liked your scheme of using '==/' to represent relationships between concepts while simultaneously using '=/!=' to represent relationships between names, and the idea of "reliable" and "unreliable" names. The authors present several different ways to represent taxonomic information, including an easy-to-read tabular view, a harder-to-read-but-more-detailed concept view, and graphical representations of relationships between concepts within a single publication and between different publications.
I would recommend major revisions before acceptance for three main reasons:
1. I found this paper hard to read: this is unsurprising, seeing as understanding it requires a good grounding in both taxonomy and logic, both difficult subjects, but I think the authors could have done a better job of:
(a) Trimming unnecessary material: were all the taxonomic visualizations in Figures 4, 5 and 6 necessary? Could some be moved to supplementary materials? Also, you discuss type-bearing names at the start of section 2, but I'm not sure that is necessary for understanding name/concept relationships.
(b) Using simple language (the second sentence of the manuscript, "The challenge arises due to limitations inherent in using taxonomic names as identifiers of increasingly granular semantic differences being expressed in original and revised classifications", could potentially be simplified to "Because of the way in which they are defined, taxonomic names are inadequate for expressing fine semantic differences between original and revised classifications", for instance).
(c) Using a more traditional paper opening, in which you start with a research question and hypotheses before delving into methods and results. In particular, it wasn't clear to me what you were planning to do until I got to the methods, results and discussion; I think making that clearer right at the start of your introduction would be helpful to readers.
2. I think you should emphasize your problem statement and key findings more, either in your introduction or in a conclusion. How is Euler/X useful for taxonomists, biodiversity researchers or logicians? I can see real value in being able to identify taxonomic names that are "reliable identifiers of taxonomic identity" and "merge concept regions for which there are no unique identities and names" -- are there others I missed? I think you should emphasize how your findings here differ from previous research and what's new here.
3. I think you should talk about some of your findings a bit more in the discussion, including:
- What is the taxonomic significance of "merge concept regions for which there are no unique identities and names present in the respective input taxonomies"? Should they be named by taxonomists, or are they just a necessary crutch when relating taxonomies to each other? I found it particularly interesting that Euler/X came up with 127, while Figure 2 only shows 100 -- are there concepts here that taxonomists would have found hard to discover without this tool? Is Euler/X "oversplitting" in a way that no taxonomist would?
- What are the challenges to expanding Euler/X so that it supports all 55 possible pairwise comparisons? Is this something that will become possible in the next few years because of Moore's Law, or is it infeasible until more sophisticated logical analysers are developed over the next decade?
- Is your research of immediate benefit to biodiversity researchers? For example, could a scientist analyzing records from GBIF use your results to determine that _Andropogon hirsutior_, say, has only ever had a single meaning, so records referencing it do not require further reconciliation, while _Andropogon virginicus_ has had a great many meanings, so records referencing it should not be used without being absolutely sure of the circumscription intended by the person who created the records?
Apart from that, the only other suggestion I have is that references like "The logic foundations for this particular approach were developed in [15,42,83]" can be hard to read -- I would suggest replacing this with "were developed in previous work by these authors [15, 83] and others [42]".
As described above, I was very impressed with this paper and would definitely like to see it published! However, I would strongly counsel major revisions to improve its readability before it is published -- this will ensure that this excellent paper will be of maximum benefit to the largest number of people!