PAROLE/SIMPLE 'Lemon' ontology and lexicons

Tracking #: 496-1694

Authors: 
Marta Villegas
Núria Bel

Responsible editor: 
Guest editors Multilingual Linked Open Data 2012

Submission type: 
Dataset Description
Abstract: 
The PAROLE/SIMPLE 'Lemon' Ontology and Lexicon are the OWL/RDF version of the PAROLE/SIMPLE lexicons (defined during the PAROLE (LE2-4017) and SIMPLE (LE4-8346) IV FP EU projects) once mapped onto Lemon model and LexInfo ontology. Original PAROLE/SIMPLE lexicons contain morphological, syntactic and semantic information, organized according to a common model and to common linguistic specifications for 12 European languages. The data set we describe includes the PAROLE/SIMPLE model mapped to Lemon and LexInfo ontology and the Spanish & Catalan lexicons. All data are published in the Data Hub and are distributed under CC Attribution 3.0 Unported license. The Spanish lexicon contains 199466 triples and 7572 lexical entries fully annotated with syntactic and semantic information. The Catalan lexicon contains 343714 triples and 20545 lexical entries annotated with syntactic information half of which are also annotated with semantic information. In this paper we describe the resulting data, the mapping process and the benefits obtained. We demonstrate that the Linked Open Data principles prove essential for datasets such as original PAROLE/SIMPLE lexicons where harmonization and interoperability was crucial. The resulting data is lighter and better suited for exploitation. In addition, it easies further extensions and links to external resources such as WordNet, lemonUby, DBpedia etc.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Judith Eckle-Kohler submitted on 31/Jul/2013
Suggestion:
Minor Revision
Review Comment:

The authors addressed all my major comments.

The revised version needs to be checked again for a few grammar issues (mainly missing articles).

Something went wrong with the references, e.g. in sec 1.2, the references given for Lemon and LexInfo point to the wrong entries in the references list.

Review #2
By Philipp Cimiano submitted on 08/Aug/2013
Suggestion:
Minor Revision
Review Comment:

The paper describes the conversion of the PAROLE/SIMPLE lexicon to RDF lemon format. The authors have addressed most issues raised in my previous review. Yet, there are a number of minor issues that should be addressed, see the list below:
Lemon should always be lower case, i.e. "lemon"

Abstract:

Original PAROLE/SIMPLE lexicons contain morphological, -> "The original..."

it easies further extensions and links to external resources such as WordNet, lemonUby, DBpedia etc.

should be "eases", but better would be:

"it facilitates further extensions and linking to"

Introduction

project producing corpora and lexicons in so many lan-guages -> languages (no hyphen)

and built according to the same design prin-ciples -> principles (no hyphen)

is an ISO standard (ISO-24613:2008) for Natural Language Processing lexicons -> for computational lexicons???

for modeling lexicon based on LMF and expressed in RDF -> technically not correct: lemon is inspired by LMF but is in some sense a different model.

I would propose: for modelling lexica in RDF

Lemon is highly compliant with LMF. -> This is true, but technically they are not really compatible; they can, however, be transformed into each other by syntactic transformations.

LexInfo builds on the Lemon model and it is also highly compliant with LMF. -> Well, LexInfo is in principle logically independent of the lemon model as it can be used with other lexicon models. So it does not build on it technically speaking.

Note that many PAROLE/SIMPLE elements are not mapped and simply disappear in the target model. This is partially due to the fact that RDF allows a better modeling and they are no longer needed. -> Can you please give some examples of such elements that are no longer needed?

Figure 3: it is quite weird to have a lemon relation between XML elements. I would propose changing the figure so that you show the XML structure with the implicit IDREF link, highlighting this graphically, and then show the transformation into RDF with the explicit link via the lemon property.

for the association linguistic information -> association * of * linguistic information

In the following lines we describe the clues of the mapping process and highlight some of the benefits obtained. -> In the following sections we describe the mapping process and highlight

Elements from the DTD were mapped onto Classes. -> classes (should be lowercase)

Though the mapping process implied a considerable effort we think the task was worth it. -> effort? In which sense? Can you quantify it? Was it conceptually difficult or difficult to migrate/transform all the data / instances?

Section 3: Lemon model simplifies -> The lemon model simplifies

by means of agent and patien properties. -> patient * t *

BTW, it is nice to see that lemon simplifies the syn-sem mapping so much, but I wonder whether some expressivity is lost in the mapping and whether the more complex machinery is needed to account for some examples. Your thoughts on this would be much appreciated.

LexInfo defines a subcategorization ontology based on the Lemon model. -> Not true; LexInfo instantiates some of the classes introduced in lemon. This is the correct phrasing, I think.

First, we defined a style sheet converter that reads our PAROLE XML lexicon and for each Description element it generates a new Frame. -> remove "it"

for instance we can easily get all "control" verbs; verbs with a sentential complement; verbs with an indirect object, etc. -> Could you provide these queries as a footnote?

7. Summary and conclusions -> Conclusions should be upper case

The dataset described here is the result of mapping PAROLE/SIMPLE Spanish and Catalan lexicons onto Lemon model following the LexInfo ontology. -> onto * the * lemon model

and all shared descriptive elements are integrated with LexInfo ontology. -> * the * LexInfo ontology

One final piece of information is needed: how many new subclasses (e.g. subclasses of lemon:Frame) did you have to introduce that were not in LexInfo? How many grammatical functions did you have to introduce?

Thanks for this great work!

You might want to cite the following paper in which some of the benefits you mentioned are discussed:

Chiarcos, C., McCrae, J., Cimiano, P., & Fellbaum, C. (2013). Towards open data for linguistics: Lexical Linked Data. In A. Oltramari, P. Vossen, L. Qin, & E. Hovy (Eds.), New Trends of Research in Ontologies and Lexical Resources (pp. 7–25). Springer.