LegalNERo: A linked corpus for named entity recognition in the Romanian legal domain

Tracking #: 3267-4481

Vasile Pais
Maria Mitrofan
Carol Luca Gasan
Alexandru Ianov
Corvin Ghiță
Vlad Silviu Coneschi
Andrei Onuț

Responsible editor: 
Harald Sack

Submission type: 
Dataset Description
LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. It provides gold annotations for organizations, locations, persons, time expressions and legal resources mentioned in legal documents. Furthermore, GeoNames identifiers are provided for location entities when linking was possible. The resource is available in multiple formats, including span-based, token-based and RDF. The Linked Open Data version, in RDF-Turtle format, is available both for download and for querying through a SPARQL endpoint.

Minor Revision

Solicited Reviews:
Review #1
By Enrico Francesconi submitted on 27/Sep/2022
Minor Revision
Review Comment:

I don't have other comments with respect to my previous reviews concerning technical issues.
Given that the submission type is "Dataset Description", an additional discussion of the quality and stability of the dataset would be desirable. On the other hand, the modalities for accessing the dataset are now clearly described.

Review #2
Anonymous submitted on 10/Oct/2022
Review Comment:

In this version of the paper, the authors have addressed most of my previous concerns. However, there is a minor concern about the encoding information provided in the appendix. It would be nice to have a detailed step-wise description of the encoding example for better understanding.
The quality of the writing and the structure of the paper have also improved a lot from the first version of the article. It is now better organised and easy to follow. The description of the concepts and the relations in Table 2 is useful for better understanding. Also, they provided a good description of the content of the files in the Zenodo link. Therefore, I would like to accept the paper.

Review #3
Anonymous submitted on 21/Oct/2022
Minor Revision
Review Comment:

Thank you for revising and resubmitting your manuscript. The revised version addresses some but not all of my comments:

As mentioned in the previous review, most of the paper is still too verbose and too detailed. The previous version had 14 pages, and the new version still has 14 pages (in fact, the main part of the paper is even longer in the new version). In my previous review I had suggested removing 2-3 pages, which can indeed be safely removed without compromising the key informational aspects of your work, i.e., your core research results. In that regard, I had suggested concentrating especially (but not only) on Sections 4.1 and 4.2, which are too fine-grained for what the paper presents.

I'm still suggesting this since the paper is a straightforward data set/corpus paper, i.e., nearly all of the readers of this article will be familiar with the main approaches followed in this paper. In my previous review I suggested, and I still suggest, removing all unnecessary detail that does not directly relate to your core research results (see the examples I provided in the previous review and the suggestions included in the PDF file that I will send to the editors).

Figure 1 and Figure 2 still seem to be screenshots. I suggested recreating these in LaTeX to improve the quality of the figures (this would also enable copying and pasting the content of these two figures from the paper). This comment also applies to some of the other figures.

Figure 3: I suggested including this graphic not as a bitmap but as a vectorised PDF or SVG file so that the quality is improved. This still needs to be done.

There are still various overfull boxes that need to be fixed.

As mentioned, an annotated PDF version of the paper is attached, which includes additional remarks and especially many suggestions for content to remove from the paper.