Review Comment:
This manuscript was submitted as 'Data Description' and should be reviewed along the following dimensions: Linked Dataset Descriptions - short papers (typically up to 10 pages) containing a concise description of a Linked Dataset. The paper shall describe in concise and clear terms key characteristics of the dataset as a guide to its usage for various (possibly unforeseen) purposes. In particular, such a paper shall typically give information, amongst others, on the following aspects of the dataset: name, URL, version date and number, licensing, availability, etc.; topic coverage, source for the data, purpose and method of creation and maintenance, reported usage etc.; metrics and statistics on external and internal connectivity, use of established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language expressivity, growth; examples and critical discussion of typical knowledge modeling patterns used; known shortcomings of the dataset. Papers will be evaluated along the following dimensions: (1) Quality and stability of the dataset - evidence must be provided. (2) Usefulness of the dataset, which should be shown by corresponding third-party uses - evidence must be provided. (3) Clarity and completeness of the descriptions. Papers should usually be written by people involved in the generation or maintenance of the dataset, or with the consent of these people. We strongly encourage authors of dataset description papers to provide details about the used vocabularies; ideally using the 5 star rating provided here.
=============================
The major comments from the previous review have been addressed. However, a few minor issues remain:
- the paper is slightly over 10 pages long (excluding the references)
- there is a stray '/' character at the beginning of each page
Section 3:
- paragraph 1: remove the comma after "3.2),"
- before going into the details of the classes and properties used to model JRCNames as linked data, the authors should refer to Figure 1 earlier, so that readers can follow the example.
Walking through an example while describing the model makes this representation much easier to understand. I recommend moving the last paragraph of Section 3 to the beginning of Section 3, or at least to the beginning of Subsection 3.3.
- Figure 1:
- the figure is barely legible in the printed version. Online, it can be followed thanks to zooming.
- typo: jrc-names:Jean_Cluade_Juncker__it => jrc-names:Jean_Claude_Juncker__it
- the base variant concept is not illustrated in the figure. Would it be something like jrc-names:Jean_Claude_Juncker, with no associated language?
- Table 1:
- inconsistent naming: the text uses "MEP", whereas the table uses "Talk of Europe"
- Section 5 also uses "Talk of Europe". Perhaps it makes sense not to use the MEP acronym at all?
Comments
Extra notes
The task of identifying names within a text is known to be very difficult, and a good dataset such as JRC Names is vital to a wide range of text processing tasks that rely on named entity recognition. Linking this dataset to other resources allows these systems to be easily extended to entity linking, a common task in Semantic Web systems, which further improves the dataset's usability and the likelihood of third-party adoption. In addition, there is already work using this dataset for entity recognition in social media (as part of http://languagemachines.github.io/mbt/) and in other applications.