The LEGO Unified Concepticon

Tracking #: 400-1508

Authors: 
Jeff Good
Shakthi Poornima
Timothy Usher

Responsible editor: 
Guest editors Multilingual Linked Open Data 2012

Submission type: 
Dataset Description
Abstract: 
The most widely available kind of linguistic data from a cross-linguistic perspective is the wordlist, where forms from a language are paired with generalized semantic concepts to indicate the best counterpart for those concepts in that language. Wordlists can be readily understood as a pre-digital form of linked data insofar as standardized concept lists have frequently been employed in their construction to facilitate cross-linguistic comparison, in particular to aid in efforts to ascertain patterns of relatedness among large sets of languages. This paper describes the LEGO Unified Concepticon, a resource which expresses which concepts in a number of widely used concept lists can be understood as the same in order to allow wordlists collected using different concept lists to be more readily compared. While the resource itself contains relatively limited information on the concepts it describes, it can nevertheless serve as a means to link together forms across wordlists and has already been employed to these ends in the creation of linked wordlists across more than a thousand languages. Moreover, it has the potential to serve as the foundation for further applications in cross-linguistic semantic comparison using linked data.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 28/Jan/2013
Suggestion:
Reject
Review Comment:

This paper presents the LEGO Unified Concepticon, namely a lexicalized concept repository obtained from the combination of pre-existing machine-readable lexical knowledge resources on the basis of a data model which allows linguistic data linking and integration.

All in all, the work described in this paper crucially needs to be better contextualized and compared with other similar ongoing efforts, in order for the overall contribution to become clear and ripe for publication. As the authors explicitly mention, the key point of the proposed contribution is to define a data model that makes it possible to link, combine and allow interoperability between a variety of pre-existing structured lexical resources. But while this is a very active area of research, the paper delivers few novel insights in this direction. Overall, the paper is on the wordy side and it is not clear, in fact, what the advantages of the proposed data model are in comparison with, for instance, other existing frameworks for linguistic linked data like Lemon (which the authors correctly acknowledge, cf. ref. [8]) or the UBY-LMF model presented by Gurevych et al.

Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer, and Christian Wirth: UBY – A Large-Scale Unified Lexical-Semantic Resource Based on LMF, in: Proc. of EACL-12, p. 580-590.

Besides, the lexical knowledge repositories mentioned in the paper tend to be extremely small (i.e. only a few hundred or a few thousand entries): what is the relation to ongoing efforts to create massive repositories of multilingually lexicalized knowledge like the aforementioned UBY or BabelNet?

Navigli, R. and Ponzetto, S. P. (2012): BabelNet: The Automatic Construction, Evaluation and Application of a Wide Coverage Multilingual Semantic Network. Artificial Intelligence, 193, 217-250

Could the LEGO model be applied to these resources as well? If yes, what would the advantages be, e.g. in comparison with other inter-operable models?

Review #2
Anonymous submitted on 22/Mar/2013
Suggestion:
Reject
Review Comment:

I had difficulty reading the paper, but I think I got the main ideas. Unfortunately, the web pages http://lego-wordlists.googlecode.com/files/LegoUnified.rdf and http://code.google.com/p/lego-wordlists/downloads/, mentioned in the submission, are not reachable.

If I understand the approach correctly: there is a meta concept list that establishes similarities between the concepts used in distinct wordlists (each of which points to concepts), and so helps establish cross-lingual links between those wordlists.
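
To make this reading concrete, the following is a minimal sketch of the linking idea, not the authors' implementation. All identifiers, list names and data (`UNIFIED`, `WORDLISTS`, `link_forms`, the `A-17`-style ids) are hypothetical and are not taken from the LEGO resource.

```python
from collections import defaultdict

# Hypothetical mapping: (concept list, local concept id) -> unified concept id.
UNIFIED = {
    ("listA", "A-17"): "U-DOG",
    ("listB", "B-03"): "U-DOG",
    ("listA", "A-18"): "U-FIRE",
}

# Hypothetical wordlists, each collected with a different concept list.
WORDLISTS = [
    {"language": "fra", "concept_list": "listA",
     "entries": {"A-17": "chien", "A-18": "feu"}},
    {"language": "eng", "concept_list": "listB",
     "entries": {"B-03": "dog"}},
]


def link_forms(wordlists, unified):
    """Group forms from all wordlists under their unified concept id."""
    linked = defaultdict(dict)
    for wordlist in wordlists:
        for local_id, form in wordlist["entries"].items():
            unified_id = unified.get((wordlist["concept_list"], local_id))
            if unified_id is not None:  # skip concepts with no unified match
                linked[unified_id][wordlist["language"]] = form
    return dict(linked)


linked = link_forms(WORDLISTS, UNIFIED)
```

Under these assumptions, "chien" and "dog" end up under the same unified concept even though they were collected with different concept lists.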

A first point: the authors compare their approach with DBpedia, but I think that a comparison with the various WordNet families (also available in RDF and published in the LOD cloud) would be more appropriate.

A main question is then how and why the LEGO Concepticon differs from a SUMO/WordNet-like organisation of lexical resources.

Another topic: I am not sure whether the keys used in wordlists are necessarily meanings and the values only word forms, but what is intended is clear: how does one express the English word "man" in French? As "homme".
To keep the vocabulary used in the submission: if we reverse the relation, going from French to English, do we map from a word form to a concept? Or do we need another (French) conceptualization, so that the French concept "Homme" is linked to both "man" and "human being"?

In any case, I do not think that the creators of wordlists consider those aspects. Wordlists look more like easy-to-handle, very basic translation aids for readers of a text.

I am not sure whether the presence or absence of grammatical information is important for the distinction between wordlist and dictionary entries. But the fact that "chien" is associated with the concept "DOG" is semantic information, I would say, although the authors deny this.

"As will be seen, our linked data representation of wordlists deviates from the traditional model in not directly containing concept labels but, rather, references to concepts described via labels in an external concepticon." Is this not how RDF-encoded taxonomies/ontologies work? Why introduce the "concepticon" topic here? One can take any wordlist and link its elements to concepts in an existing or yet-to-be-created taxonomy/ontology. Or not?

"The structure of the concepticon is schematized in Figure 2. The concepticon is a container (associated with metadata not depicted in the figure), which consists of unified concepts which are themselves containers for concepts from the concepticons". This is hard to understand. It looks circular!

"As can be seen, the information encoded in these wordlists is quite sparse; for instance, it only includes concept identifiers, not concept labels. Therefore, in order to reconstruct the information associated with traditional wordlists (as in (1)), the unified concepticon must be merged with the wordlist." Why were the concept labels lost? I do not understand. Why not include them from the very beginning? And what are concept labels, in fact?
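
As an illustration of the merge step the quoted passage refers to, here is a minimal sketch under stated assumptions: the concepticon is modelled as a plain id-to-label table and a sparse wordlist as (concept id, form) pairs. The names `CONCEPT_LABELS`, `SPARSE_WORDLIST` and `attach_labels`, and all ids and data, are hypothetical; the actual LEGO encoding may differ.

```python
# Hypothetical concepticon: unified concept id -> human-readable label.
CONCEPT_LABELS = {"U-DOG": "DOG", "U-FIRE": "FIRE"}

# Hypothetical sparse wordlist: only concept identifiers and forms, no labels.
SPARSE_WORDLIST = [("U-DOG", "chien"), ("U-FIRE", "feu"), ("U-XYZ", "eau")]


def attach_labels(entries, labels):
    """Rebuild label/form pairs by looking each concept id up in the concepticon."""
    merged = []
    for concept_id, form in entries:
        merged.append({
            "concept_id": concept_id,
            "label": labels.get(concept_id),  # None when the id is unknown
            "form": form,
        })
    return merged


traditional = attach_labels(SPARSE_WORDLIST, CONCEPT_LABELS)
```

In this sketch the labels are stored once, in the concepticon, and an identifier that is missing from it is kept with an empty label rather than dropped.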

So again: it is not clear how LEGO fulfils goals other than those of the Universal WordNet effort.

I have no idea how concepts are encoded.

Can LEGO not be entirely expressed in SKOS-XL?

The work here seems to be very idiosyncratic.
There is also no indication of the level of automation, the number of items/lexical entries, etc. (but I could not reach the quoted web pages).

Most of the negative comments are due to the fact that I was looking for a contribution to Multilingual Linked Open Data, and the submission does not yet seem to offer a concrete contribution or a relevant data set. My comments are not about the intrinsic value of the work described.

Review #3
By Pablo Mendes submitted on 02/Apr/2013
Suggestion:
Major Revision
Review Comment:

The article presents an interesting resource and in its current state is already a contribution. I feel like it offers a chance to reach out and bridge the gap between two communities.

The article could provide a bit more background in linguistics, if the target audience is the Semantic Web community. I made some suggestions below on some points where I think more details could be added.

The article could also expand a bit more on what is gained by using Linked Data, for the linguistics community too. Particularly, I'd like to see highlighted the "potential to serve as the foundation for further applications in cross-linguistic semantic comparison using linked data." Can such an application scenario be described somewhere in the article?

Section 1

"The current version was most recently modified in August 2010." -> this is 3 years ago. Are there plans to modify it again?

Typo:
make relevant other remarks -> make other relevant remarks?

Section 2

Since this is a Semantic Web Journal, the general public could also benefit from a short description of the differences between:
- lexicon
- dictionary
- wordlist

The reference in the text, together with the caption for Figure 1, seems to imply that lexicon and dictionary are taken as the same thing here?
Is WordNet a concepticon?
I can see how it is academically interesting to think about concepts and what are the related words in other languages, but I also think it would be worth mentioning in the article a few practical applications of your concepticon. Can it be used as "gazetteers" for NER, for example? As input for machine translation?

Do concepticons focus on general concepts such as "dog", "fire" and "water" or do they also include more specialized "entities" or "events" such as movies and wars, for example?

Section 4

What is the difference between linguistic sign and lexical entry?

"add links from concepts in the unified concepticon to ... DBpedia" -> I would also be interested in hearing what the relationship is between the proposed concepticon and the DBpedia Lexicalization dataset [1]. Is the link from DBpedia concepts to their "linguistic signs" observed in Wikipedia a natural extension of the work described here? Or are they fundamentally different?

[1] Pablo N. Mendes, Max Jakob and Christian Bizer. DBpedia for NLP: A Multilingual Cross-domain Knowledge Base. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2012, 21–27 May 2012, Istanbul, Turkey.

Section 5

The relation to DBpedia is key to understanding what your project is for the Semantic Web community, as DBpedia is one of the most well understood examples of Semantic Web technology. I wonder whether something like the discussion in Section 5 could be briefly introduced earlier, so that the reader has a clearer understanding of the kinds of linguistic resources that you are talking about there.