Editorial Board

Editor-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Michael Cochez
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Pascal Hitzler
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Cogan Shimizu
Blerina Spahiu
GQ Zhang
Rui Zhu

Former/Founding Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Michael McCain

Dbnary: Wiktionary as a Lemon Based RDF Multilingual Lexical Resource

Submitted by Gilles Sérasset on 06/25/2013 - 05:47

Tracking #: 504-1702

A new version of this paper is available

Authors:

Gilles Sérasset

Responsible editor:

Guest editors Multilingual Linked Open Data 2012

Submission type:

Dataset Description

Abstract:

Contributive resources, such as Wikipedia, have proved to be valuable for Natural Language Processing or multilingual Information Retrieval applications. This work focusses on Wiktionary, the dictionary part of the resources sponsored by the Wikimedia Foundation. In this article, we present our effort to extract multilingual lexical data from Wiktionary and to provide it to the community as Multilingual Lexical Linked Open Data (MLLOD). This lexical resource is structured using the lemon model. The dataset, called dbnary, is registered at http://thedatahub.org/dataset/dbnary.

Full PDF Version:

swj504.pdf

Revised Version:

DBnary: Wiktionary as a Lemon-Based Multilingual Lexical Resource in RDF

Previous Version:

Dbnary: Wiktionary as a Lemon Based RDF Multilingual Lexical Ressource

Tags:

Reviewed

Decision/Status:

Minor Revision

Solicited Reviews:

Review #1

By Judith Eckle-Kohler submitted on 31/Jul/2013

Suggestion:
Minor Revision

Review Comment:

The author addressed my comments. I appreciate in particular that he has pointed out the problem of potentially changing URIs of lexical senses in different Wiktionary dumps along with a possible strategy to further investigate this issue (by way of diachronic studies across Wiktionary dumps of a longer time span).
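The diachronic strategy mentioned above could be prototyped roughly as follows. This is a minimal sketch, not the author's method: it assumes each dump has already been reduced to a hypothetical mapping from lexical-sense URI to definition text, and it uses the similarity ratio from Python's difflib, with an arbitrary 0.8 threshold, as a stand-in for deciding whether an edited definition signals a changed sense.

```python
from difflib import SequenceMatcher

# Hypothetical input format: each dump is a dict mapping a lexical-sense URI
# to its definition text. Extracting such maps from real dumps is out of scope.
SenseMap = dict[str, str]


def compare_dumps(old: SenseMap, new: SenseMap, threshold: float = 0.8) -> dict[str, list[str]]:
    """Classify sense URIs by how they differ between two dumps.

    The threshold is an assumption: a small edit to a definition (typo fix,
    punctuation) is treated as "edited", not as a change of sense.
    """
    report: dict[str, list[str]] = {
        "removed": [], "added": [], "stable": [], "edited": [], "changed": [],
    }
    report["removed"] = sorted(old.keys() - new.keys())
    report["added"] = sorted(new.keys() - old.keys())
    for uri in sorted(old.keys() & new.keys()):
        before, after = old[uri], new[uri]
        if before == after:
            report["stable"].append(uri)
        elif SequenceMatcher(None, before, after).ratio() >= threshold:
            report["edited"].append(uri)   # wording changed, sense probably kept
        else:
            report["changed"].append(uri)  # definition differs substantially
    return report
```

With `old = {"ex:sense1": "A small domesticated carnivorous mammal.", "ex:sense2": "A person."}` and a newer dump that drops the final period of `ex:sense1`, rewrites `ex:sense2` as "A jazz enthusiast." and adds `ex:sense3`, the report lists `ex:sense1` as edited, `ex:sense2` as changed and `ex:sense3` as added. The URIs are placeholders; a real study would also need to handle senses that are renumbered rather than edited.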

However, there is a minor issue left:
the proper reference for UBY is not Zesch et al. (2008), which is the proper reference for JWKTL, the Java API to Wiktionary developed at UKP Darmstadt, but the following:

@INPROCEEDINGS{TUD-CS-2012-0023,
author = {Iryna Gurevych and Judith Eckle-Kohler and Silvana Hartmann and Michael
Matuschek and Christian M. Meyer and Christian Wirth},
title = {Uby - A Large-Scale Unified Lexical-Semantic Resource Based on LMF},
booktitle = {Proceedings of the 13th Conference of the European Chapter of the
Association for Computational Linguistics (EACL 2012)},
year = {2012},
pages = {580--590},
month = {Apr},
location = {Avignon, France},
pdf = {fileadmin/user_upload/Group_UKP/publikationen/2012/uby_eacl2012_cameraready.pdf},
pubkey = {TUD-CS-2012-0023},
research_area = {Ubiquitous Knowledge Processing},
research_sub_area = {UKP_p_QAEL, UKP_p_EduWeb, UKP_a_ENLP, UKP_p_UBY, UKP_p_InCoRe},
website = {www.ukp.tu-darmstadt.de/uby}
}

Review #2

By Jorge Gracia submitted on 12/Aug/2013

Suggestion:
Minor Revision

Review Comment:

As I wrote in my first review, this short paper fits the topics of the special issue very well as a "dataset description" paper. In this version, many typos have been corrected, details added, and more recent data provided.

I still miss, though, a more critical analysis of the chosen representation scheme and of how the authors expect it to evolve in the future. In fact, I have the feeling that they are underutilising lemon and creating some odd constructs, such as the union of LexicalSense and LexicalEntry (see my previous review). I understand, though, that some pragmatic temporary solutions were adopted at the beginning of this work, which is perfectly OK. But I would like more details about how the model will evolve (towards lemon or not).

Review #3

By Sebastian Hellmann submitted on 12/Dec/2013

Suggestion:
Minor Revision

Review Comment:

Overall, the quality of the paper has improved a lot and the issues raised have been addressed. I also checked all the technical issues: the database now uses IRIs and responds well.

Section "2.2 Scope..." now describes the usefulness of the system. The link to http://blexisma.ligforge.imag.fr seems to be missing, but would be useful.

Furthermore, the quality evaluation has been addressed in an optimal way. In my judgement, the measures "comparison to the MediaWiki API" and "evaluation of time slices" are two very good ideas that help to track data quality sustainably.

The description is now also very understandable, clear and complete.

Below are some minor comments, which can be fixed quite fast:

I am still unsure about the class "LexicalEntity". Are there any advantages to the current definition, e.g. when querying the model?
If it is only needed for the rdfs:domain and rdfs:range of properties, one can simply define an owl:unionOf there without introducing a new class.
However, this is more a question out of interest and not a request to change the ontology.
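To make the alternative suggested here concrete, the sketch below emits Turtle for a property whose rdfs:domain is an anonymous owl:unionOf of lemon:LexicalEntry and lemon:LexicalSense, instead of a named dbnary:LexicalEntity class. It is only an illustration: the lemon and DBnary namespace IRIs are assumptions, and the property name used in the test, `dbnary:exampleProperty`, is a hypothetical placeholder rather than a term of the DBnary ontology.

```python
# Namespace IRIs: the W3C ones are standard; lemon and dbnary are assumed here.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "lemon": "http://lemon-model.net/lemon#",
    "dbnary": "http://kaiko.getalp.org/dbnary#",
}


def union_domain_turtle(prop: str, members: list[str]) -> str:
    """Return Turtle declaring `prop` with an anonymous owl:unionOf class as domain.

    No named class is introduced: the union is a blank node used directly
    as the object of rdfs:domain.
    """
    header = "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())
    union_list = " ".join(members)
    body = (
        f"{prop} a owl:ObjectProperty ;\n"
        f"    rdfs:domain [ a owl:Class ;\n"
        f"                  owl:unionOf ( {union_list} ) ] ."
    )
    return header + "\n\n" + body + "\n"
```

One practical difference bearing on the querying question: with a named class, a SPARQL query can match `?x a dbnary:LexicalEntity` only if resources are typed explicitly or a reasoner infers the type, whereas with the anonymous union a query has to enumerate the member classes, for instance with `VALUES ?t { lemon:LexicalEntry lemon:LexicalSense }`.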

One more thing: I would like to see the title changed to:
Wiktionary as a Lemon-Based Multilingual Lexical Resource in RDF.

"Lemon Based" -> "Lemon-Based" http://www.grammar-monster.com/lessons/hyphens_in_compound_adjectives.htm
"RDF Multilingual Lexical" is too much -> "Resource in RDF"

"However, such studies are not trivial to implements as a change in a definition does not necessarily implies that the lexical sense has changed."
-> "However, such studies are not trivial to implement as a change in a definition does not necessarily imply that the lexical sense has changed."

"legacy lexical data is underspecified" -> "are underspecified"

"a unusually ambiguous" -> "an unusually ambiguous"

"# to transl." -> "# of transl."

"lexicon-semantic" -> "lexical-semantic"


Comments

Some errors in the article

Submitted by Gilles Sérasset on 07/02/2013 - 13:23.

I realized that, in my haste to submit the revised version of the paper, I introduced (at least) two errors:

1. In Figure 2, dbnary:Equivalent was not changed to dbnary:Translation, as it was everywhere else in the paper.
2. In Table 3, the caption is not clear enough; it should read "Extracted translations vs interwiki links RATIO, on a random sample of 1000 entries". Moreover, the ratios are presented as percentages (99.1% instead of .991), which is not a good way to present a ratio, especially when such a ratio can exceed 1.
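The difference between a ratio and a percentage can be made concrete with a small worked computation. This is a hypothetical sketch: the per-entry counts below are invented for illustration and are not the figures behind Table 3, and pooling the counts over the whole sample is an assumption about how such a ratio might be computed.

```python
def translation_ratio(samples: list[tuple[int, int]]) -> float:
    """Ratio of extracted translations to interwiki links over a sample.

    Each sample is a hypothetical (n_extracted_translations, n_interwiki_links)
    pair for one entry. Counts are pooled before dividing, so the result is a
    plain ratio that may exceed 1 when more translations than links are found.
    """
    translations = sum(t for t, _ in samples)
    links = sum(n for _, n in samples)
    if links == 0:
        raise ValueError("no interwiki links in sample")
    return translations / links
```

For instance, `translation_ratio([(3, 3), (2, 1)])` pools 5 translations against 4 links and returns 1.25, which reads naturally as a ratio but oddly as "125%".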

Should the paper be accepted, these mistakes will be corrected (I could not find a way to update the submission PDF).

Gilles,
