Disclose High Quality Structured Data with Airpedia

Tracking #: 1011-2222

Alessio Palmero Aprosio
Marco Fossati
Claudio Giuliano

Responsible editor: 
Philipp Cimiano

Submission type: 
Tool/System Report
The advent of Wikipedia as the best digital representation of cross-domain knowledge is now a reality. The DBpedia project aims at converting Wikipedia content into structured intelligence through the Linked Open Data paradigm and currently holds a vital role in the growth of the Web of Data as a multilingual interlinking kernel. However, its main classification system relies on a time-consuming manual procedure that aims at mapping Wikipedia infobox data to a common ontology. We present Airpedia, a tool that automatically generates class and property mappings for any DBpedia language chapter. We support our findings with the deployment of the Swedish, Ukrainian, and Esperanto DBpedias. Evaluations demonstrate that Airpedia is not only comparable to humans in terms of precision, but also provides a recall leap, with special regard to entities lacking infoboxes.

Solicited Reviews:
Review #1
By Heiko Paulheim submitted on 20/Mar/2015
Minor Revision
Review Comment:

The paper introduces an approach which, given existing language editions of DBpedia, automatically creates DBpedia chapters for new languages by automatically computing infobox mappings, and applying statistical methods to complete the typing information in those chapters.

On the positive side, the paper shows very mature work with good empirical results. Evaluations for all of the steps are carried out with care, and the resulting datasets are published, hence making a contribution for the Linked Open Data community.

The related work is only partially complete; I miss, e.g., a reference to [1], which seems to be rather close. Furthermore, the related work section is merely an enumeration of approaches, and I miss some sentences defining the difference and novelty of the work presented in this paper w.r.t. the state of the art presented.

My main concern with the paper is a lack of self-containedness. I understand that this paper is a distilled report of quite a few previous research efforts, which in principle is fine. However, in certain places, it is impossible to understand the paper without looking into the referred papers. In those places, more details should be added, so that the paper can be consumed as a self-contained piece of information. Furthermore, the paper lacks clarity in different places. I will detail them below.

The main sections describing the approach (3-5) lack an introduction briefly describing the "big picture" of how the different pieces fit together. A suggestion for fixing this is moving section 6 before section 3, and adding a high-level picture.

Section 3 should add a more detailed discussion of the performance differences on the different language editions. The authors talk about "heterogeneous structure of infoboxes", but some more details would be appreciated. In particular, the algorithms are described rather briefly and vaguely.

In the evaluation in section 3, it is not clear whether the micro or macro average of precision and recall is used. In the example given in Fig. 3, recall and precision would both be 2/3, given that owl:Thing is not included. If another infobox which relates to an entity on the same level as Agent is mapped correctly, the corresponding recall and precision would be 1. Is the final score based on the macro average (i.e., (2/3+1)/2=5/6) or the micro average (i.e., (2+1)/(3+1)=3/4)? I feel like the former would be more appropriate to level out effects caused by a heterogeneous depth of the DBpedia ontology in different branches, but in any case, it should be clearly stated whether the evaluation is based on micro or macro averages.
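
To make the difference between the two averages concrete, here is a minimal sketch that reproduces the arithmetic above. The function name `micro_macro` and the `(correct, total)` pair representation are illustrative assumptions, not the authors' evaluation code; the two pairs correspond to the Fig. 3 example (2 of 3 classes correct) and the hypothetical second infobox (1 of 1 correct).

```python
from fractions import Fraction

def micro_macro(counts):
    """Return (micro, macro) averages for a list of (correct, total) pairs.

    Hypothetical helper: each pair stands for one infobox mapping, with
    `correct` matching classes out of `total` classes on the path.
    """
    # Macro: score each mapping separately, then average the scores.
    per_item = [Fraction(c, t) for c, t in counts]
    macro = sum(per_item) / len(per_item)
    # Micro: pool all counts first, then compute a single ratio.
    micro = Fraction(sum(c for c, _ in counts), sum(t for _, t in counts))
    return micro, macro

# Fig. 3 example (2/3) plus a second mapping that is fully correct (1/1).
counts = [(2, 3), (1, 1)]
micro, macro = micro_macro(counts)
print(micro, macro)  # 3/4 5/6
```

The macro average weights every mapping equally regardless of how deep its class path is, whereas the micro average lets mappings with longer paths contribute more counts.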

The evaluation section 4.2 left me a bit puzzled. It is stated that a human-created gold standard is used for the evaluation. Then, Fig. 4 also reports the quality of human annotations. If it is the human-created gold standard evaluated against itself, the F1-measure should be perfect. If not, it is not clear what "Human" refers to.

Section 5.2 contains an evaluation diagram depicting different variants (bottom-up vs. hybrid), but they are not appropriately introduced in the text. Here, a bit more detail would also be appreciated.

Finally, for the evaluation in section 7, I would have appreciated some statements about the runtime as well to discuss the scalability of the approach.

While these are quite a few points of critique, I believe that it should be fairly easy for the authors to address each of those points. Thus, in my opinion, a minor revision of the paper is sufficient.

* Not sure whether this is an issue with my printer driver, but a few special characters did not appear correctly in my print
* p.1: stating that Wikipedia is the "best digital approximation of encyclopedic human knowledge" is a bold statement. Depending on the purpose and information need, others could be better. It would not harm to tone down this statement.
* p.2: To readers unfamiliar with DBpedia, it might not be clear what a "chapter" is in this context; a definition should be added.
* p.3: The term "pivot language" should be defined.

[1] Volha Bryl, Christian Bizer. Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion. In Proceedings of the 4th Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality) @ WWW 2014.

Review #2
By Marc Spaniol submitted on 03/Apr/2015
Review Comment:

This paper introduces Airpedia, a tool that discloses high quality structured data from Wikipedia. In general, the paper is relevant to this edition of the journal. Nevertheless, the paper lacks clarity in many respects. Firstly, the overview of related research is quite comprehensive, but when it comes to describing the authors' own workflow, it remains unclear why the many already existing approaches can't be used to achieve the very same effect. Secondly, this paper appears in many places to be more of a "hack" than a principled approach. For instance, in section 3 the authors explain that a threshold of L 0.5 is the solution in order to avoid spurious mappings. Well, given their example this is the case, but a systematic analysis is something completely different. Similarly, the subsequent evaluation is questionable. The way the authors describe their experiment, the correct mapping of a very specific sub-class, e.g. "Tennis players at the 1984 Summer Olympics" (which, btw, might be relatively easy to map), might generate positive counts all along the path up the class hierarchy. Hence, scores are somewhat dubious here. Thirdly, the mapping itself is something of a black box. The mapping extraction is based on the highest similarity pairs S(A_I,R), but how they are computed remains the secret of the authors. Lastly, the size of the experiments seems to be EXTREMELY small, and they are, therefore, considered to be shallow. In section 4.2 the authors state that a total of 15 infoboxes have been annotated for a total of 100 different attributes. Even more, those infoboxes were randomly drawn from the 100 most frequently used infoboxes, which in turn means that these infoboxes are supposed to be the most "coherent" and best "maintained" at the same time. Similarly, experiments in section 5.2 have been conducted on 500 entities, where 50 more entities have been manually annotated. As such, and given the overall size of Wikipedia, these experiments are really SMALL scale.
Even more, one wonders why those experiments have not been conducted on "virtually blinded articles" (meaning articles where the real categories, which could serve as ground truth, have been removed for the experiments).

Review #3
By Laura Hollink submitted on 04/Jun/2015
Review Comment:

This paper discusses a set of tools for generating class and property mappings between Wikipedia infoboxes and the DBpedia ontology, and for classification of untyped DBpedia resources. It is particularly useful for the generation of new DBpedia language chapters.

## Contribution
The three tools described in the paper are also discussed in three separate older papers by the authors. The authors clearly signal this at the end of each respective section. The current paper aims to present the three tools in one unified framework. While I appreciate this, in my opinion the paper does not have enough added value over the three previous papers to justify publication in the journal. The added value of the current paper over the previous ones seems to be:
1. extra evaluation of two of the three tools. However, these evaluations do not compare the tools to other approaches. It is therefore not clear if the presented tools are improvements over the state of the art. (See also my comments under related work)
2. three new DBpedia chapters. This is a very valuable contribution. However, the paper only contains four sentences about this part of the work, and does not provide the readers with any insights.

## Related Work
The related work section is missing one very relevant item: Wikidata. One of the major benefits of the Wikidata approach over DBpedia is the fact that it is one unified representation that can be used for several languages, rather than consisting of different chapters for different languages.

Other than that, the related work section is extensive, but lacks an indication of how the cited work is different from the current paper. For example: how are the class-extraction approaches of [34], [19] and [11] different from the tools described in this paper? The authors mention that only the English Wikipedia was considered in these papers. However, that does not make it obvious that they cannot be easily adapted to other languages.

## Specific comments
- Tables 2-4 are not explained in the text.
- Ditto for Figure 1. Either explain or leave out.
- In the introduction, very little information is given about what is actually done. I would add a description of methods and results to the current description of the contributions.