Converting the Parole Simple Clips Lexicon into RDF using the Lemon model

Riccardo del Gratta
Francesca Frontini
Fahad Khan
Monica Monachini

Guest editors Multilingual Linked Open Data 2012

Dataset Description
This paper reports on the publication of parts of the Italian lexicon Parole Simple Clips (PSC) as linked data using the Lemon linked data model. The main problem encountered during the conversion was the mismatch between the Lemon view of lexical sense objects, as a reified pairing of a lexical item and the concept in an ontology that provides a meaning for it, and the corresponding notion within PSC of semantic units, which can take part in a broader class of semantic relations. The solution outlined in this paper was to instantiate the semantic units of the PSC semantic layer as units in an OWL ontology called Simple OWL, along with the relations holding between them in the PSC semantic layer, and also to duplicate them as Lemon lexical sense objects. The details of the solution and the organisation of the data are given in the paper, as well as a discussion of possible improvements and future work.
Solicited Reviews:
Review #1
Anonymous submitted on 16/Feb/2013
This paper describes the work done in order to publish some parts of the Italian lexicon Parole Simple Clips (PSC) as linked data according to the lemon ontology-lexicon model. Specifically, the authors performed three actions:

1. The conversion of the semantic layer of the PSC lexicon into OWL, resulting in an extension of the Simple OWL ontology
2. The conversion of the PSC nouns according to the lemon model
3. The linking of the lemon model with the Simple OWL ontology
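
The three steps above can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the prefixes `psc:` and `simple:`, the identifier `USem_sinagoga`, and the semantic type `Building` are hypothetical and not taken from the published dataset; only the lemon terms `LexicalEntry`, `LexicalSense`, `sense` and `reference` come from the lemon vocabulary. Triples are kept as plain tuples rather than built with an RDF library.

```python
# Illustrative sketch only: the psc:/simple: names and the semantic type
# passed in below are hypothetical, not identifiers from the real dataset.

def convert_noun(lemma, usem_id, semantic_type):
    """Return (subject, predicate, object) triples for one noun sense."""
    usem = f"simple:{usem_id}"        # step 1: semantic unit as an ontology individual
    entry = f"psc:entry_{lemma}"      # step 2: lemon lexical entry for the noun
    sense = f"psc:sense_{usem_id}"    # step 2: lemon sense duplicating the semantic unit
    return [
        (usem, "rdf:type", f"simple:{semantic_type}"),
        (entry, "rdf:type", "lemon:LexicalEntry"),
        (entry, "lemon:sense", sense),
        (sense, "rdf:type", "lemon:LexicalSense"),
        (sense, "lemon:reference", usem),  # step 3: link the sense to the ontology
    ]


triples = convert_noun("sinagoga", "USem_sinagoga", "Building")
for subject, predicate, obj in triples:
    print(subject, predicate, obj)
```

In this picture, semantic relations between units would be stated on the `simple:` individuals, while the lemon sense only points to its unit through `lemon:reference`.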

In the introduction, the authors refer to Language Resources and Technology as an area of knowledge in its own right, and cite some papers. Given the perspective from which they approach this matter, I would say that they should have referred to several initiatives in this domain that have been going on for many years now, such as FlareNet, Clarin, etc., or conferences such as LREC.

In section 2 the authors provide a general description of the lemon model and give some details about the special relation or linking holding between lemon lexical units and ontology concepts.

Section 3 is devoted to the description of the PSC lexicon and Simple OWL ontology. They explain that the semantics of the PSC lexicon is based on the Generative Lexicon theory, but the examples they provide do not really illustrate how the different dimensions of the meaning of a lexical entry are represented in the PSC. I believe such examples would help the reader understand the difficulties in the transformation into the lemon model, which is described in section 4. Moreover, it would be interesting to know where the lexical relations of synonymy, polysemy, etc. are represented in the PSC.

Section 4 is the main section of the paper, since it explains how the conversion of the PSC lexicon into the lemon model took place. However, it is the most unclear section of the paper. Many questions remain open, and the authors do not really manage to explain or justify the decisions taken. They state that, and I quote, “the lexical sense objects in Lemon don’t take part in semantic relations”. They should explain that this is so because these relations are captured in the ontology. Instead, they conclude that, therefore, “it is not always possible to identify PSC Usems with lexical sense objects”. Why is this a problem?

Similarly, they also refer to the fact that properties such as synonymy and antonymy only occur at the level of senses in lemon as being a problem. I believe this should not be seen as such, since lemon has other mechanisms in order to represent that. However, they propose to represent those lexical relations at the conceptual level in the Simple OWL ontology. As I see it, this is completely contradictory to the lemon philosophy.

Finally, in section 5 links are provided to the libraries that host the resources described.

Some format and language issues:
• I strongly advise the authors to revise the English of the paper.
• They should also check the use of acronyms (LRT, PSC). They should spell out each acronym the first time it is used, and then use it consistently in the rest of the document. They have also written lemon both capitalized and in lower case, interchangeably.
• At several stages, the same word appears twice. E.g., 1st paragraph, Unique Resource Identifiers Unique Resource Identifiers

Review #2
Anonymous submitted on 21/Feb/2013
This paper describes a conversion of the noun parts of the Italian PAROLE-SIMPLE-CLIPS lexica to RDF. The authors had to duplicate each entry to satisfy the distinction between lexical and ontological items required by the Lemon model. This also entailed splitting the original relations and properties accordingly among these two types of items. The authors drew on previously published work (the Simple OWL project) to map legacy semantic types to OWL classes.

I wonder why the authors focused only on nouns. Certainly, converting verb frames to RDF is non-trivial. However, instead of an all-or-nothing solution, the authors could have chosen to translate just parts of the information available for other parts-of-speech, e.g. just the semantic units.

I am not quite satisfied with the example that lexical senses for "synagogue" and "mosque" could both have ReligiousBuilding as their ontological mapping as a motivation for the lexical sense/ontological mapping distinction. The fact that the ontology lacks the more fine-grained classes Synagogue and Mosque is an orthogonal issue that could be addressed using different types of mapping properties (similar to skos:broaderMatch). What you need to show is that even if the ontology had classes for Synagogue and Mosque, it would still make sense to have separate lexical sense entries for those terms, perhaps by providing evidence for some sort of fundamental ontological difference between ontological classes and lexical senses.

Overall, it is good to see this data available with an open-source license.

Small mistakes and issues:
- There is another submission to the same journal special issue that uses capitalized forms "PAROLE", "SIMPLE" instead of "Parole", "Simple". If both are accepted, it would be good to harmonize the notation.
- The figures need a little bit of work. The caption of Fig. 1 needs to explain what the purpose of each column is. Fig. 3 and Fig. 6 are very hard to read and definitely should be indented properly to be more readable.
- "IdentifiersUnique"...
- OWL does not stand for "Ontology Web Language"
- "sense objects was"
- Table 2 needs to use commas for large numbers and capitalize headings consistently.
- "consider.The"
- inconsistency: "Usems" vs. "USems"

Review #3
Anonymous submitted on 24/Apr/2013
(Editor note: Review was submitted after decision letter was written).
The authors describe the conversion of a subset of the lexical items in a large-scale, multi-layered Italian language lexicon Parole Simple Clips (PSC) based on the Lemon linked data model.
After a good introduction in section 1, the authors present lexical ontologies with Lemon in section 2. It is unclear why they decided to use Lemon. A comparison with other models would help the reader understand this decision.

Several of the authors' own projects related to previous work are presented in section 3, as well as "Parole Simple Clips" and "Simple OWL".

The authors present in section 4 the conversion of the semantic units in PSC into corresponding objects and in the Simple-OWL lexicon. It remains unclear how many semantic units were converted and what the overall coverage is.

An overview of the distribution and structure of the dataset is given and illustrated with several examples.
It would be helpful if these examples were explained in more detail. How did you decide to convert the data? What is the purpose?

Relevant earlier work has not been referenced, e.g. the work by Gangemi et al. or De Luca et al. on RDF/OWL and the use of WordNet/EuroWordNet with ontologies.

The paper concludes with a short discussion of possible improvements and future work.