Similarity Joins and Clustering for SPARQL

Submitted by Sebastián Ferrada on 09/08/2023 - 18:25

Tracking #: 3540-4754

Authors:

Sebastián Ferrada

Benjamin Bustos

Aidan Hogan

Responsible editor:

Marta Sabou

Submission type:

Full Paper

Abstract:

The SPARQL standard provides operators to retrieve exact matches on data, such as graph patterns, filters and grouping. This work proposes and evaluates two new algebraic operators for SPARQL 1.1 that return similarity-based results instead of exact results. First, a similarity join operator is presented, which brings together similar mappings from two sets of solution mappings. Second, a clustering solution modifier is introduced, which, instead of grouping solution mappings according to exact values, brings them together using similarity criteria. In both cases, a variety of algorithms are proposed and analysed, and use-case queries that showcase the relevance and usefulness of the novel operators are presented. For similarity joins, experimental results are provided by comparing different physical operators over a set of real-world queries, and by comparing our implementation to the closest work found in the literature, DBSimJoin, a PostgreSQL extension that supports similarity joins. For clustering, synthetic queries are designed to measure the performance of the different algorithms implemented.
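
As a rough illustration of what a similarity join over two sets of solution mappings might look like, the sketch below pairs mappings whose numeric attributes lie within a distance threshold. It is not the authors' implementation: the function name `range_similarity_join`, the Euclidean metric, the nested-loop strategy, the output variable `d` and the sample data are all assumptions made for this example, and the two inputs are assumed to share no variables.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def range_similarity_join(left, right, left_vars, right_vars, threshold, dist_var="d"):
    """Hypothetical nested-loop range similarity join.

    `left` and `right` are lists of solution mappings (dicts from variable
    name to value). A pair is kept when the Euclidean distance between the
    values of `left_vars` and `right_vars` is at most `threshold`; the
    distance is bound to `dist_var` in the merged mapping.
    Assumption: the two sides share no variable names.
    """
    results = []
    for l in left:
        l_point = [l[v] for v in left_vars]
        for r in right:
            r_point = [r[v] for v in right_vars]
            d = dist(l_point, r_point)
            if d <= threshold:
                results.append({**l, **r, dist_var: d})
    return results


# Illustrative data only: two tiny sets of solution mappings.
left = [
    {"c1": "A", "x1": 0.0, "y1": 0.0},
    {"c1": "B", "x1": 5.0, "y1": 5.0},
]
right = [
    {"c2": "P", "x2": 0.0, "y2": 1.0},
    {"c2": "Q", "x2": 9.0, "y2": 9.0},
]

pairs = range_similarity_join(left, right, ["x1", "y1"], ["x2", "y2"], threshold=2.0)
```

With this data only the pair (A, P) falls within the threshold. An exact-match join would instead require equal values, which is the contrast the abstract draws.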

Full PDF Version:

swj3540.pdf

Tags:

Reviewed

Long-term Stable Link to Resources:

https://github.com/scferrada/jena

Decision/Status:

Solicited Reviews:

Review #1

Anonymous submitted on 11/Sep/2023

Suggestion:
Accept

Review Comment:

The revised introduction sufficiently addresses my previous comments.

Review #2

By Agnieszka Lawrynowicz submitted on 12/Jan/2024

Suggestion:
Accept

Review Comment:

I thank the Authors for their work done towards the revised version of the paper.
I am satisfied with the revisions and the answers.

What the Authors write in their response concerning Section 4.2 (Semantics) might make an excellent addition to the discussion towards the end of Section 4.2: "Perhaps a key finding here is that properties that hold for (equi) joins in SPARQL do not necessarily hold for similarity joins, and our results show when this is the case, and why this is the case. This means that common optimisations applied in SPARQL engines (such as join reordering) cannot be applied “as is” for similarity joins, and thus that there are interesting open challenges on how to optimise such queries."

I am inclined to accept the paper.
