LSQ 2.0: A Linked Dataset of SPARQL Query Logs

Tracking #: 2866-4080

This paper is currently under review
Claus Stadler
Muhammad Saleem
Qaiser Mehmood
Carlos Buil-Aranda
Michel Dumontier
Aidan Hogan
Axel-Cyrille Ngonga Ngomo

Responsible editor: Philippe Cudre-Mauroux

Submission type: Dataset Description
We present the Linked SPARQL Queries (LSQ) dataset, which currently describes 45 million executions of 11.55 million unique SPARQL queries extracted from the logs of 27 different endpoints. The LSQ dataset provides an RDF description of each such query; these descriptions are indexed in a public LSQ endpoint, allowing interested parties to find queries with the characteristics they require. We begin by describing the use cases envisaged for the LSQ dataset, which include research on common features of queries, building custom benchmarks, and designing user interfaces. We then discuss how LSQ has been used in practice since the release of four initial SPARQL logs in 2015, and present the model and vocabulary that we use to represent these queries in RDF. Next, we briefly describe the 27 endpoints from which queries were extracted, in terms of the domains they pertain to and the data they contain. We provide statistics on the queries included from each log, including the number of query executions and unique queries, as well as the distributions of queries over a variety of selected characteristics. Finally, we discuss how the LSQ dataset is hosted and how interested parties can access and leverage it for their own use cases.
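The abstract distinguishes query executions from unique queries and reports per-log distributions over query characteristics. The Python sketch below illustrates one way such counts could be computed from raw log entries. It is a hypothetical simplification, not the LSQ pipeline: the `LogEntry` record, the whitespace-based notion of "unique", and the keyword-based feature test are all assumptions made for illustration (LSQ itself parses queries and describes them in RDF).

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical record for one query execution found in an endpoint log."""
    endpoint: str  # name of the endpoint whose log the entry came from
    query: str     # raw SPARQL query string as it appears in the log


def normalize(query: str) -> str:
    # Assumption: collapse runs of whitespace so trivially reformatted copies
    # of a query count as the same unique query. LSQ may define this differently.
    return " ".join(query.split())


def log_statistics(entries: Iterable[LogEntry]) -> Dict[str, Tuple[int, int]]:
    """Return {endpoint: (number_of_executions, number_of_unique_queries)}."""
    executions: Counter = Counter()
    unique = defaultdict(set)
    for entry in entries:
        executions[entry.endpoint] += 1                    # every log line is one execution
        unique[entry.endpoint].add(normalize(entry.query))  # duplicates collapse in the set
    return {ep: (executions[ep], len(unique[ep])) for ep in executions}


def keyword_distribution(entries: Iterable[LogEntry], keywords: List[str]) -> Dict[str, int]:
    """Count unique queries (merged across all endpoints) containing each keyword.

    Naive text matching stands in for real query parsing here, so a keyword
    inside a string literal or IRI would also be counted.
    """
    uniques = {normalize(e.query) for e in entries}
    return {
        kw: sum(1 for q in uniques if re.search(rf"\b{re.escape(kw)}\b", q, re.IGNORECASE))
        for kw in keywords
    }
```

For instance, two whitespace variants of the same `SELECT ... LIMIT 10` query logged at one endpoint count as two executions but only one unique query. Counting executions and unique queries separately is what allows a dataset like LSQ to report both figures for each log.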