Abstract:
Given the explosive growth in both data size and schema complexity, data sources are becoming increasingly difficult to use and comprehend. Summarization aspires to produce an abridged version of the original data source highlighting its most representative concepts. In this paper, we present an advanced version of the RDF Digest, a novel platform that automatically produces and visualizes high quality summaries of RDF/S Knowledge Bases (KBs). A summary is a valid RDFS graph that includes the most representative concepts of the schema, adapted to the corresponding instances. To construct this graph we designed and implemented two algorithms that exploit both the structure of the corresponding graph and the semantics of the KB. Initially we identify the most important nodes using the notion of relevance. Then we explore how to select the edges connecting these nodes by maximizing either locally or globally the importance of the selected edges. The extensive evaluation performed compares our system with two other systems and shows the benefits of our approach and the considerable advantages gained.