Countering language attrition with PanLex and the Web of Data

Tracking #: 560-1766

Patrick Westphal
Claus Stadler
Jonathan Pool

Responsible editor: Guest editors Multilingual LOD 2012 JS

Submission type: Dataset Description
At present, there are approximately 7,000 living languages in the world. However, some experts claim that the process of globalization may eventually lead to the world losing this linguistic diversity. The vision of the PanLex project is to help save these languages, especially low-density ones, by allowing them to be intertranslatable and thus to be a part of the Information Age. Semantic Web technologies can support achieving this goal, for reasons such as their capabilities of flexibly representing, interlinking and reasoning with data, in our case particularly linguistic resources and annotations. Conversely, an RDF version of PanLex makes a significant contribution towards improving the coverage of the Linguistic Web of Data, as to the best of our knowledge there exists no large-scale Linked Data dataset for panlingual translation of non-mainstream languages. In this dataset description paper, we detail how we transformed the data of the PanLex project to RDF, established conformance with the lemon and GOLD data models, interlinked it with Lexvo and DBpedia, and published it as Linked Data and via SPARQL.


Solicited Reviews:
Review #1
By Steven Moran submitted on 29/Nov/2013
Review Comment:

The authors have addressed my concerns from my previous review:


This statement (first paragraph) has been clearly rejected, and repeating it, even as a citation, probably does the field more harm than good:

Nonetheless, some experts claim that processes such as nation-state consolidation and globalization are producing language attrition so rapidly that up to 90% of all languages alive today will be extinct within a century [7].


And a post about these findings at the Long Now Foundation's Rosetta Stone:


which has led to emergence of the Web of Data ->
which has led to the emergence of the Web of Data

- 2x:

more fine grained ->
more fine-grained

in an English to German dictionary ->
in an English-to-German dictionary

- maybe break this line so it doesn't run out of the margin:

"abbreviated with plx."

- unclear what is meant here (maybe one-to-some? the relation?):

"Since these models differ from the PanLex one to some extent"

- this footnote is smaller than the others:

15 http://wifo5- 03.informatik.uni-

- missing comma between footnote numbers:

SPARQL endpoints16 17


- the link returns a 502 Bad Gateway


Review #2
Anonymous submitted on 15/Dec/2013
Review Comment:

This is a new revision of a previously reviewed paper. My suggestions from the previous rounds have mostly been addressed.
It is quite concerning that much of the data comes from sources without any clear licensing information, but addressing this is an issue that is beyond the scope of this paper.
Overall, I suggest accepting this paper for publication now.

Minor comment: Various related work is only referenced with a URL. Since this is a scientific work, please add proper citations for DBpedia Spotlight, Lexvo, DBpedia, etc.

Review #3
Anonymous submitted on 16/Dec/2013
Review Comment:

I reviewed two earlier versions of this paper and raised a number of questions during those reviews. In my second review, I also questioned the extent to which the second revision properly addressed issues raised during the first review by various reviewers, including myself.

The current revision appears to me to be much more thorough than the second revision and, while the authors have not directly addressed all the concerns previously indicated at the level of detail I might have liked to have seen, it seems to me that the paper is now essentially publishable as the comments of previous reviews have been adequately addressed. Moreover, between the first time I reviewed the paper and now, the online accessibility of the described resource has improved considerably, which makes it all the more important to have a paper like this published.

I have only a few minor comments at this point, which I don't think will take much time to address. This is why my recommendation is now accept. Of these comments, the only one I think is essential to address is the first one, since, if it is not addressed, one of the figures will be essentially unreadable.

Figure 1: The text in this figure is very hard to read, especially the edge labels, which I could only read by zooming in considerably. This should be fixed before publication.

Figure 2: Perhaps the caption here (or the text) could indicate that this schematization specifically includes lexical information for the word "between"?

Table 5: Is there any measure available for the accuracy of these links?