Review Comment:
Thank you for revising and resubmitting your manuscript. The revised version addresses most of my comments, but several aspects still need to be addressed.
My main comments regarding the revised version relate to aspects of presentation.
1. Introduction: Please shorten the introduction by clearly stating what this paper presents. All remarks and projects that belong to the category of related work should be moved into the Related Work section (especially page 1, right column, lines 27 ff. until page 2, left column, line 9).
4. Corpus description: like many other parts of the paper, Sections 4.1 and 4.2 are very verbose and detailed. In places the level of detail is far too fine-grained and can be trimmed down accordingly (also see my last remark down below).
Figure 1 and Figure 2 seem to be screenshots. Please recreate these in LaTeX to improve the quality and use both columns to present the resulting figures (the current versions are simply too small).
Section 4.2: This subsection is too long and too detailed (see below), and its presentation also needs to be improved. As this subsection essentially presents the different vocabularies used in the annotation, my suggestion would be to use a description list, with one vocabulary per \item and one paragraph per vocabulary, in order to (a) present this information to the reader in a more structured way and (b) keep the amount of detail at a reasonable level.
Figure 3: please include this graphic not as a bitmap but as a vectorised PDF so that the quality is improved.
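A minimal sketch of the suggested structure, assuming the standard LaTeX description environment. The prefixes ex1: and ex2: and the item texts are placeholders only, not the vocabularies actually used in the manuscript:

```latex
% Placeholder entries: replace ex1: and ex2: with the vocabularies
% used in the annotation, one \item per vocabulary.
\begin{description}
  \item[ex1:] One short paragraph stating what the vocabulary models
    and for which annotation layer it is used.
  \item[ex2:] One short paragraph with the same structure as above.
\end{description}
```

Keeping each entry to a single paragraph makes it easier to hold the level of detail constant across vocabularies.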
Appendix A: In its current form, this appendix is not helpful. If you want to keep it, then the appendix needs one or two paragraphs of explanation. You should also consider including comments in the actual code.
Furthermore, please note that a "Data Description" paper at SWJ should be "a concise description of a Linked Dataset". While most of the aspects required of a "Data Description" paper (title, repository, use cases etc.) are properly addressed in the manuscript, most of the paper is simply too verbose and too detailed. Many of the very detailed aspects can simply be removed. As rough guidance, at least two to three pages can safely be removed from the revised version without compromising the key informational aspects of your work, i.e., your core research results. I am suggesting this especially because the paper is a rather straightforward data set/corpus paper, i.e., nearly all readers of this article will be familiar with the main approaches followed in it. The suggestion is to remove all unnecessary detail that does not directly relate to your core research results. For example, the short history of NER at the beginning can be trimmed, as can the explanation of what GeoNames is, the information on which columns were added to which files to accomplish certain annotation layers, and many other aspects that are too detailed, too operational or too technical.
There appears to be an overuse of quotation marks in the paper. Whenever a term is already marked in itself (for example, when it carries a prefix separated from the rest by a colon), quotation marks are not needed.
There are various overfull boxes that need to be fixed.
Finally, the English needs to be checked thoroughly, ideally by a native speaker.