qEndpoint: A Novel Triple Store Architecture for Large RDF Graphs

Submitted by Antoine Willerval on 01/17/2024 - 07:38

Tracking #: 3616-4830

Authors:

Antoine Willerval

Dennis Diefenbach

Angela Bonifati

Responsible editor:

Aidan Hogan

Submission type:

Full Paper

Abstract:

In the relational database realm, there has been a shift towards novel hybrid database architectures combining the properties of transaction processing (OLTP) and analytical processing (OLAP). OLTP workloads are made up of read and write operations on a small number of rows and are typically addressed by indexes such as B+trees. OLAP workloads, on the other hand, consist of big read operations that scan larger parts of the dataset. To address both workloads, some databases introduced an architecture using a buffer or delta partition. More precisely, changes are accumulated in a write-optimized delta partition while the rest of the data is compressed in the read-optimized main partition. Periodically, the delta storage is merged into the main partition. In this paper we investigate for the first time how this architecture can be implemented for RDF graphs and how it behaves. We describe in detail the indexing structures one can use for each partition, the merge process, and the transactional management. We study the performance of our triple store, which we call qEndpoint, over two popular benchmarks, the Berlin SPARQL Benchmark (BSBM) and the recent Wikidata Benchmark (WDBench). We also study how it compares against other public Wikidata endpoints. This allows us to study the behavior of the triple store for different workloads, as well as its scalability over large RDF graphs. The results show that, compared to the baselines, our triple store allows for improved indexing times, better response times for some queries, higher insert and delete rates, and low disk and memory footprints, making it ideal to store and serve large Knowledge Graphs.
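
As a rough illustration of the main/delta idea described in the abstract, the following Python sketch keeps a sorted, immutable main partition and a small write-optimized delta that is folded into it by a merge step. It is a toy model under stated assumptions, not the paper's implementation: the class name `MainDeltaStore`, the `merge_threshold` parameter, the in-memory sets, and the size-based merge trigger are all hypothetical choices made for this example.

```python
from bisect import bisect_left
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class MainDeltaStore:
    """Toy main/delta triple store (illustrative only, not qEndpoint's code)."""

    def __init__(self, merge_threshold: int = 4) -> None:
        # Main partition: sorted and immutable between merges (read-optimized).
        self.main: Tuple[Triple, ...] = ()
        # Delta partition: pending inserts and deletion marks (write-optimized).
        self.added: Set[Triple] = set()
        self.deleted: Set[Triple] = set()
        # Assumption: merge when the delta reaches this many entries.
        self.merge_threshold = merge_threshold

    def _in_main(self, t: Triple) -> bool:
        # Binary search works because main is kept sorted.
        i = bisect_left(self.main, t)
        return i < len(self.main) and self.main[i] == t

    def insert(self, t: Triple) -> None:
        if t in self.deleted:
            self.deleted.discard(t)   # undo a pending delete of a main triple
        elif not self._in_main(t):
            self.added.add(t)         # buffer the new triple in the delta
        self._maybe_merge()

    def delete(self, t: Triple) -> None:
        if t in self.added:
            self.added.discard(t)     # triple never reached main; just drop it
        elif self._in_main(t):
            self.deleted.add(t)       # mark as deleted; main stays untouched
        self._maybe_merge()

    def match(self, pattern: Pattern) -> Iterator[Triple]:
        """Yield triples matching a pattern; None acts as a wildcard."""
        def fits(t: Triple) -> bool:
            return all(p is None or p == v for p, v in zip(pattern, t))

        # Reads combine main (minus deletion marks) with the delta's inserts.
        for t in self.main:
            if t not in self.deleted and fits(t):
                yield t
        for t in sorted(self.added):
            if fits(t):
                yield t

    def merge(self) -> None:
        """Fold the delta into a freshly built main partition."""
        live = (set(self.main) - self.deleted) | self.added
        self.main = tuple(sorted(live))
        self.added.clear()
        self.deleted.clear()

    def _maybe_merge(self) -> None:
        if len(self.added) + len(self.deleted) >= self.merge_threshold:
            self.merge()
```

In this sketch, writes only touch the small delta sets, reads consult both partitions, and the comparatively expensive rebuild of the main partition happens only in `merge`. A real system would additionally have to keep serving reads and writes while the merge runs, which the sketch does not attempt.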

Full PDF Version:

swj3616.pdf

Previous Version:

qEndpoint: A Novel Triple Store Architecture for Large RDF Graphs

Tags:

Reviewed

Decision/Status:

Solicited Reviews:

Review #1

By Miel Vander Sande submitted on 23/Feb/2024

Suggestion:
Accept

Review Comment:

This paper introduces qEndpoint: a triple store architecture where a main data partition optimised for reads is accompanied by a data partition optimised for writes. The write partition contains the delta and is merged into the main partition whenever it becomes too large and inefficient. The self-indexed, compressed, read-optimised HDT format by Fernandez J. et al. is used for the main partition, and the RDF4J native store is used for the write-optimised partition.

The revision fixes many of the issues I had with the paper:
- the authors added many clarifications and much background to the paper. This significantly improves the readability for those who are not that familiar with the subject matter. However, I feel the authors could have gone a little further. Also, some of the new (and old) text is sloppy and contains typos or spelling mistakes. I recommend carefully revising the language for the final version.
- there are now references to OSTRICH and X-RDF-X, and the introduction to SAP HANA has improved. Regarding OSTRICH and X-RDF-X, though, I think what is being discussed is beside the point (e.g., supporting only triple patterns or being unmaintained). Details on the architectural similarities would be more interesting.
- there is an additional experiment on merge time and a note on benchmarking, which motivates the choice of the Berlin benchmark a lot better. That said, I still believe the choice of benchmarks is rather limited. WatDiv would have been interesting for seeing how triple pattern performance evolves with the size of the data/merge partition. With respect to merges: I wonder why the authors did not consider https://aic.ai.wu.ac.at/qadlod/bear.html ?

To summarize: I'm a bit disappointed to see that the authors focused on changing as little as possible. That being said, I think the paper was sufficiently improved to be published.

Review #2

Anonymous submitted on 20/Mar/2024

Suggestion:
Accept

Review Comment:

It looks like my previous concerns/questions have been clarified, and the missing discussion has been added.
