Lexvo.org: Language-Related Information for the Linguistic Linked Data Cloud

Tracking #: 521-1722

Gerard de Melo

Responsible editor: 
Guest editors Multilingual Linked Open Data 2012

Submission type: 
Dataset Description
Lexvo.org brings information about languages, words, and other linguistic entities to the Web of Linked Data. It defines URIs for terms, languages, scripts, and characters, which are not only highly interconnected but also linked to a variety of resources on the Web. Additionally, new datasets are being published to contribute to the emerging Linked Data Cloud of language-related information.

Solicited Reviews:
Review #1
By Jose Emilio Labra Gayo submitted on 03/Oct/2013
Review Comment:

I have read the paper again and I think it has corrected all the issues that the reviewers had suggested.

One minor typo on page 5, paragraph 3: "Lexvo.org backed by..."

Review #2
By Menzo Windhouwer submitted on 29/Nov/2013
Review Comment:

This paper is a well-written description of lexvo.org, which provides an interesting hub for language-related information in the linked data cloud.

(0) Information on the data set

General information such as name, URL, and version date is given, although the specific version is not available for download on the website (http://www.lexvo.org/linkeddata/resources.html).

(1) Quality of the data set

Lexvo.org integrates important but scattered data sets and code tables into the linked data cloud. The quality of the source data can vary depending on its origin; the quality of Lexvo.org lies in the mapping from one to the other, which is sound. Some of the data sets and code tables are updated regularly, and the paper sketches how the update process of lexvo.org works. More details would be interesting, e.g., how deprecated codes, especially splits, are dealt with, but that may belong in a future paper.

(2) Usefulness

The current integrated set of data sources and code tables is very powerful and already functions as a hub between other linguistic data sources. The paper mentions several of these users. Some metrics and statistics on this connectivity would strengthen these claims.

Furthermore, some suggestions for additional entry points/data sets:

* Many older resources still use SIL Ethnologue 14 (or older) language codes. Using the code tables available at http://www.ethnologue.com/, it is possible to create mappings from version 14 to 15 and thus to ISO 639-3; making version 14 codes available would help link in older data sets.

* I think an entry point for full @xml:lang tags (see http://tools.ietf.org/html/bcp47) would be valuable. For example, "sr-Latn-RS" represents Serbian ('sr') written in Latin script ('Latn') as used in Serbia ('RS'). Such an entry point would make it possible to follow the information available on the various parts without needing to understand BCP 47, i.e., it would enable easier linkage to any dataset using @xml:lang.
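To illustrate the decomposition such an entry point would perform, here is a minimal sketch that splits a simple BCP 47 tag into its language, script, and region subtags. The function name `split_lang_tag` is hypothetical, and the sketch deliberately handles only the language-script-region subset; full BCP 47 also allows extlang, variant, extension, and private-use subtags, which are not covered here.

```python
import re

# Simplified pattern (an assumption, not full BCP 47): a 2-3 letter language
# subtag, an optional 4-letter script subtag, and an optional region subtag
# that is either 2 letters or 3 digits.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_lang_tag(tag):
    """Hypothetical helper: split a simple BCP 47 tag into its subtags."""
    m = _TAG.match(tag)
    if m is None:
        raise ValueError(f"unsupported or malformed tag: {tag!r}")
    parts = m.groupdict()
    # Apply the conventional casing: lowercase language,
    # title-case script, uppercase region.
    parts["language"] = parts["language"].lower()
    if parts["script"]:
        parts["script"] = parts["script"].title()
    if parts["region"]:
        parts["region"] = parts["region"].upper()
    # Drop subtags that were not present in the input tag.
    return {k: v for k, v in parts.items() if v is not None}


print(split_lang_tag("sr-Latn-RS"))
```

Each returned subtag could then be linked to the corresponding language, script, or region resource, which is the kind of lookup the suggested entry point would spare data consumers from implementing themselves.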

(3) Clarity and completeness of the descriptions

In general, the descriptions are clear. I think section 3.1.1, where the steps to construct a term URI are given, would be improved if a small example were added to show what is going on, e.g., just a resulting URL such as http://lexvo.org/id/term/cmn/%E6%9C%8B%E5%8F%8B
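The example URL above can be reproduced with a short sketch: the path segment after the ISO 639-3 code "cmn" is the UTF-8 percent-encoding of the Chinese term 朋友 ("friend"). The function name `term_uri` is hypothetical, and the NFC normalization step is an assumption; the exact steps specified in section 3.1.1 of the paper may differ.

```python
import unicodedata
from urllib.parse import quote

# Base taken from the example URL in the review above.
BASE = "http://lexvo.org/id/term/"


def term_uri(lang_code, term):
    """Hypothetical helper: build a term URI from a language code and a term."""
    # Assumption: the term is normalized to Unicode NFC before encoding.
    normalized = unicodedata.normalize("NFC", term)
    # quote() encodes the string as UTF-8 and percent-escapes every byte
    # outside the unreserved set; safe="" also escapes "/".
    return BASE + lang_code + "/" + quote(normalized, safe="")


print(term_uri("cmn", "\u670b\u53cb"))  # the term 朋友
```

Here 朋 (U+670B) becomes the UTF-8 bytes E6 9C 8B and 友 (U+53CB) becomes E5 8F 8B, which yields exactly the path segment of the example URL.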

Review #3
By John McCrae submitted on 02/Dec/2013
Review Comment:

This paper describes the lexvo.org resource, which is a high-quality resource offering a depth and breadth of linguistic data that make it highly useful for a wide variety of users. The paper describes this data well, and the authors have addressed all the criticisms raised by reviewers in the first round. As such, I strongly recommend that this paper be accepted.

I have a few minor issues that should be clarified in the final version of the paper:

s2.3: Do the authors define language families based on their own research, or is this data drawn from ISO 639-5?
s3.1: What exactly does the Java API do over the linked data interface? Are there expected to be implementations for other languages?
s5: "fairly up-to-date"... could the authors provide the exact update frequency of the resource?

Finally, the references section of the paper is sadly a mess. Here are some of the issues:

Use full names for conferences: 1,3,6,8,14,23
Include pagination: 1,3,6,10,14,17,23
Capitalize titles correctly: 2,21,22,24
Include first names: 13, 16, 19
5 has no conference
18 has no publisher
21 is "in collection" but should be proceedings
20 - Please cite lemon as:
McCrae, John, Guadalupe Aguado-de-Cea, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Asunción Gómez-Pérez, Jorge Gracia et al. "Interchanging lexical resources on the semantic web." Language Resources and Evaluation 46, no. 4 (2012): 701-719.