Abstract:
Knowledge Graphs (KGs) are key technologies that enable enhanced understanding, knowledge representation, reasoning, and interpretation of complex data. The use of RDF KGs relies heavily on SPARQL queries for knowledge retrieval and manipulation. Analyzing SPARQL query logs can provide valuable insights into KG usage, revealing patterns in user behavior and interactions with the data. Prior studies have analyzed these logs in terms of their syntax and structure, but little is known about which parts of a KG users actually query. This paper introduces content-based methods for analyzing KG usage, specifically examining the extent to which SPARQL queries cover the KG schema over defined time periods. We examine organic and robotic queries of Wikidata and Bio2RDF. Robotic queries have high schema coverage (60-97%), whereas organic queries exhibit lower schema coverage (14-21%). Both datasets exhibit a sharp decline in usage frequency across schema elements, indicating that a large set of schema elements is queried only infrequently. We perform statistical assessments to identify trends and shifts in schema element usage across different KG versions, query types, and log intervals. Our work sheds light on KG usage, highlights both frequently used and underused schema elements, and provides guidance for improvements in documentation, schema design, and performance optimization.