Abstract:
The rapid growth in the size and complexity of knowledge graphs (KGs) available on the web has created a pressing need for efficient and adaptive methods to help users understand and explore them. Semantic summaries have recently emerged as a way to quickly comprehend and explore large KGs. However, most existing approaches are static: they do not adapt to user needs, and they often struggle to scale. In this paper, we introduce iSummary, a workload-based and scalable approach for constructing selective summaries tailored to specific user requests. Unlike prior methods that process the entire KG, iSummary leverages query logs, exploiting the collective knowledge embedded in past user queries to identify relevant resources and relationships. The running time of the summarization process is linear in the number of queries, which enables incremental and scalable summary generation even for large workloads. We formally define the (λ, κ)-Selective Summary problem, provide an efficient approximate algorithm with theoretical guarantees, and evaluate it on two real-world datasets. Experimental results show that iSummary consistently outperforms existing techniques in both coverage and efficiency, producing high-coverage summaries up to 40× faster than state-of-the-art approaches.