Multilingual Linked Data Patterns

Jose Emilio Labra Gayo
Dimitris Kontokostas
Sören Auer

Survey Article
The increasing publication of linked data makes the vision of the semantic web a probable reality. Although it may seem that the web of data is inherently multilingual, data usually contain labels, comments, descriptions, etc. that depend on the natural language used. When linked data appears in a multilingual setting, it is a challenge to publish and consume it. This paper presents a survey of patterns to publish Multilingual Linked Data and identifies some issues that should be taken into account. As a use case, the paper describes the patterns employed in the DBpedia Internationalization project.
Review #1
By Felix Sasaki submitted on 01/Aug/2013
Minor Revision
Review Comment:

This paper is well written, by experts in the field. I have only a few editorial comments for a very minor revision.

1) Here "It is convenient to distinguish between internationalization,
localization and translation." I would add a reference to

2) Here "It is convenient to distinguish between the
code point and the glyph of a character." I would refer to

3) Here "glyph is mainly the particular image representing
a character or set of characters." I would replace "set" with "repertoire".

4) I would add a reference to the "Best Practices for Multilingual Linked Open Data" Community Group

5) The authors have not touched upon the topic of testing or other kinds of tool support: with a given data set, how can an author of multilingual linked open data ensure that he / she has done "the right thing"? Maybe they can add a sentence about whether they think that topic is useful and what would be involved in terms of tooling (e.g. from simple SPARQL queries to more complex means of testing).

Review #2
By Jorge Gracia submitted on 12/Aug/2013
Review Comment:

This paper presents an excellent review of patterns for Multilingual LD in a very clear and instructive manner. The comments of the reviewers have been addressed well and the paper has gained in clarity. I think that it is ready for publication after a further review for minor style issues and typos. Here are a few:
- Section 1: "In this paper, we collected, justified and explained a comprehensive set of those patterns." -> I would say "... we collect, justify and explain a comprehensive set of..."
- Section 2: "to identify Armenian, one should use hy (Hayeren) instead of am which identifies the country Armenia but the language Amharic spoken in Ethiopia" -> I would say "...which does not identify the country Armenia but..."
- Section 3, "...we may be interested to declare that Juan is Professor but..." -> "...we may not be interested in declaring that Juan is a Professor but..."
- Section 3. "50 is represented as ?? in ancient Armenian". I cannot read ?? in the document.
- In Section 3, after the definition of "multilingual data", there is a series of very short paragraphs that could be merged into a few longer ones to improve readability.
- Section 4. When enumerating the patterns at the beginning, the first sentence after "Longer descriptions" (starting "We consider...") is redundant with the rest and can be omitted.