Publishing planned, live and historical public transport data on the Web with the Linked Connections framework

Tracking #: 2629-3843

This paper is currently under review
Julian Rojas
Harm Delva
Pieter Colpaert
Ruben Verborgh

Responsible editor: 
Axel Polleres

Submission type: 
Full Paper
Exposing transport data on the Web for consumption by others poses several challenges for data publishers. In addition to planned schedules, access to live schedule updates (e.g. delays or cancellations) and historical data is fundamental to enable reliable applications and to support machine learning use cases. However, publishing such dynamic data further increases the computational burden on data publishers, so historical data and live schedule updates are often unavailable for most public transport networks. In this paper we apply and extend the current Linked Connections approach for static data to also support cost-efficient live and historical public transport data publishing on the Web. Our contributions include (i) a reference specification and system architecture to support cost-efficient publishing of dynamic public transport schedules and historical queries; (ii) an empirical evaluation of the impact that API design aspects, such as data fragmentation size, have on query evaluation performance for the route planning use case; and (iii) an analysis of potential correlations of query performance with particular public transport network characteristics such as size, average degree, density, clustering coefficient and average connection duration. Results confirm that fragmentation size indeed influences route planning query performance and converge on an optimal fragment size per network, as a function of its size, density and connection duration. Our approach proves to be feasible for publishing live and historical public transport data and supporting efficient route planning use cases. Yet, for larger networks, further optimizations are needed for the approach to be useful in practice. Careful design of data fragmentation strategies constitutes an important factor for cost-efficient, scalable and usable publishing on the Web. Additional dataset fragmentation strategies (e.g. geospatial) may be studied for designing more scalable and performant Web APIs that adapt to particular use cases, not limited to the public transport domain.