LegalNERo: A linked corpus for named entity recognition in the Romanian legal domain

Submitted by Vasile Pais on 04/08/2022 - 05:42

Tracking #: 3108-4322

A new version of this paper is available

Authors:

Vasile Pais

Maria Mitrofan

Carol Luca Gasan

Alexandru Ianov

Corvin Ghiță

Vlad Silviu Coneschi

Andrei Onuț

Responsible editor:

Harald Sack

Submission type:

Dataset Description

Abstract:

LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. It provides gold annotations for organizations, locations, persons, time and legal resources mentioned in legal documents. Furthermore, GeoNames identifiers are provided for location entities where linking was possible. The resource is available in multiple formats, including span-based, token-based and RDF. The Linked Open Data version, in RDF Turtle format, is available both for download and for querying through a SPARQL endpoint.

Full PDF Version:

swj3108.pdf

Revised Version:

LegalNERo: A linked corpus for named entity recognition in the Romanian legal domain

Previous Version:

LegalNERo: A linked corpus for named entity recognition in the Romanian legal domain

Tags:

Reviewed

Long-term Stable Link to Resources:

https://doi.org/10.5281/zenodo.4772094

Decision/Status:

Minor Revision

Solicited Reviews:

Review #1

Anonymous submitted on 15/Apr/2022

Suggestion:
Minor Revision

Review Comment:

This new version of the paper addresses my main remarks about the ontology description and the RDF mapping of the extracted entities. In order to meet the requirements for a "Data Description" paper, I suggest that the authors group together, and give more prominence to, the information about the quality and stability of the dataset, as well as the ways to access and reuse it.

Review #2

Anonymous submitted on 22/Apr/2022

Suggestion:
Major Revision

Review Comment:

Thank you for revising and resubmitting your manuscript. The revised version addresses most of my comments, but various aspects still need to be taken into account.

My main comments regarding the revised version relate to aspects of presentation.

1. Introduction: Please shorten the introduction by clearly stating what this paper presents. All remarks and projects that belong to the category of related work should be moved into the Related Work section (especially page 1, right column, lines 27 ff. until page 2, left column, line 9).

4. Corpus description: like many other parts of the paper, Sections 4.1 and 4.2 are very verbose. The level of detail is in places far too fine-grained and can be trimmed accordingly (see also my last remark below).

Figure 1 and Figure 2 seem to be screenshots. Please recreate these in LaTeX to improve the quality and use both columns to present the resulting figures (the current versions are simply too small).

Section 4.2: This subsection is too long and too detailed (see below), and its presentation also needs to be improved. As it essentially presents the different vocabularies used in the annotation, my suggestion would be to use a description list with one vocabulary per \item and one paragraph per vocabulary, in order to (a) present this information to the reader in a more structured way and (b) keep the amount of detail at a reasonable level.

Figure 3: please include this graphic as a vectorised PDF rather than as a bitmap, so that the quality is improved.

Appendix A: In its current form, this appendix is not helpful. If you want to keep it, then the appendix needs one or two paragraphs of explanation. You should also consider including comments in the actual code.

Furthermore, please note that a "Data Description" paper at SWJ should be "a concise description of a Linked Dataset". While most of the aspects required of a "Data Description" paper are properly addressed in the manuscript (title, repository, use cases etc.), most of the paper is simply too verbose and too detailed. Many of the very detailed aspects can simply be removed. As rough guidance, at least two to three pages can safely be removed from the revised version without compromising the key informational aspects of your work, i.e., your core research results. I am suggesting this especially because the paper is a rather straightforward dataset/corpus paper, i.e., nearly all readers of this article will be familiar with the main approaches followed in it. The suggestion is to remove all unnecessary detail that does not directly relate to your core research results. For example, the short history of NER at the beginning can be trimmed, as can the explanation of what GeoNames is, the information about which columns were added to which files to accomplish certain annotation layers, and many other aspects that are too detailed, too operational or too technical.

There appears to be an over-use of quotation marks in the paper. Whenever a term is already marked in itself (for example, when a prefix is used that is separated from the rest with a colon), then quotation marks do not really need to be used.

There are various overfull boxes that need to be fixed.

Finally, the English needs to be thoroughly checked, ideally by a native speaker.

Review #3

Anonymous submitted on 14/Jun/2022

Suggestion:
Accept

Review Comment:

The authors have addressed the concerns that were raised in the last review in the updated version of their paper. The addition of the federated query and the example has improved the comprehensibility of the paper. The SPARQL endpoint also worked properly this time.

However, the readability of the paper could be further improved by adopting the following minor suggestions:
1. Explicitly list the contributions of the paper as bullet points at the end of the introduction.
2. Even though a short description of the key concepts and relationships shown in Figure 3 is provided in Section 4.2, an explanation of the meaning of the concepts and relations is missing. It would be nice to have a brief description of each concept and relation separately, in the form of a table.
3. It would be nice to fit the text within the column boundaries, unlike lines L28 and R10 on page 5.
