Review Comment:
SPARQL federated query debugging tool
Marek Moos, and Jakub Galgonek
The paper presents a federated SPARQL query debugging tool. The proposed tool operates as a web-based debugging proxy, intercepting and monitoring query execution in real time across all service endpoints without requiring their modification. By capturing detailed execution traces—such as query requests, responses, durations, and endpoint interactions—it enables precise identification of bottlenecks and error sources even in deeply nested federated queries. A case study demonstrates the tool’s effectiveness in diagnosing query failures due to silent endpoint-induced query modifications.
Overall the paper is well written and easy to follow.
Many papers on SPARQL federation engines have already used a proxy to monitor data transfer, the number of HTTP calls, etc., but only in the context of benchmarking. I think it is a good idea to have a well-written, maintained proxy that can do this.
The tool is available on GitHub, and an online demo (https://sparql-debugger.elixir-czech.cz/) is working. The user interface comes with examples and is easy to use. I particularly appreciate the “bulk execution node”, which lets you see nested loop joins. The UI is very responsive, which makes it possible to follow the execution in “real time”.
Concerning the UI, I regret that subqueries are not formatted, i.e., the query is displayed as a raw URL-encoded string such as “...endpoint/chebi?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E+”. This makes it hard to see the exact subquery sent to the endpoints when debugging.
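For illustration, the percent-encoded string quoted above can be turned back into readable SPARQL with standard URL decoding; a step of this kind (followed by pretty-printing) would already help. The minimal Python sketch below assumes the subquery is sent as a GET request with a `query` parameter, as in the SPARQL 1.1 Protocol; the host `example.org` and the function name `decode_subquery` are hypothetical placeholders, not part of the tool.

```python
from urllib.parse import parse_qs, urlsplit


def decode_subquery(url: str) -> str:
    """Return the SPARQL text carried in the `query` parameter of a GET request URL.

    parse_qs percent-decodes the value and turns '+' back into spaces,
    which is how form-encoded SPARQL GET requests commonly encode whitespace.
    """
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return params["query"][0]


# Hypothetical endpoint host; only the query string is taken from the review.
url = ("https://example.org/endpoint/chebi?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2F"
       "www.w3.org%2F2000%2F01%2Frdf-schema%23%3E+")
print(decode_subquery(url))
```

Here the output is the readable prefix declaration `PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>`.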
The tool is very UI-oriented. Sometimes we just need a proxy with logs for post-processing. I do not see, either in the paper or in the GitHub documentation (https://github.com/iocbbioinf/sparql_debugger), how to deploy the proxy in such a setting.
It is also not obvious to me how I can use such a proxy with federation engines such as FedX, Comunica, FedUP, etc. Is it possible to set up the proxy for federation engines, or does it work only with SERVICE queries typed in the browser?
Reading the paper as a resource paper, I also have a few questions/remarks:
* On impact: I think the tool identifies, and almost fills, a gap. Federated queries are hard to analyze and debug, especially when executed through federation engines. The approach is not original, but it can be useful.
* On reusability: there is no evidence of usage by a wide community of researchers/engineers. Perhaps the authors are aware of such usage, but it is not reported in the paper, which presents only one (interesting) use case.
* On design and technical quality: the UI part is well written and seems to work. However, I wonder whether the tool can be used in a federated query benchmark with hundreds of endpoints and a very large number of subqueries. The documentation seems very poor to me. Where are the execution logs stored? How can the logs be post-processed? I do not think another developer would be able to reuse this code as it stands. It is also strange that I cannot download and install the proxy alone, without the UI. Finally, it is odd to see the .vscode or .idea directories committed to GitHub.
* On availability: the source code is available on GitHub. A persistent URI (PURL, DOI) should probably be provided. It is not known whether there are any plans to maintain the tool over the next few years.
To summarize, I think the tool fills a gap in federated query processing, but several important questions remain:
* Q1: Is it possible to use the tool purely as a web proxy, without the UI? If so, how? Can you provide step-by-step documentation for doing this?
* Q2: How can post-processing be done with your tool? Can you write documentation explaining how everything is stored in the proxy and how it can be retrieved programmatically?
* Q3: Can you explain how the proxy is built, so that another developer could maintain it?
* Q4: Can you tell us who the current users of the tool are and whether you have a sustainability plan?
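To make Q2 concrete, this is the kind of post-processing I have in mind: aggregating call counts, durations and errors per endpoint from an exported trace. The sketch assumes, purely hypothetically, that the proxy could export one JSON record per subquery with `endpoint`, `duration_ms` and `status` fields; this format, the function `summarize` and the example.org URLs are invented for illustration and are not documented by the tool.

```python
import json
from collections import defaultdict


def summarize(log_lines):
    """Aggregate per-endpoint statistics from JSON-lines trace records.

    Hypothetical record format (not the tool's documented one):
    {"endpoint": "<service URL>", "duration_ms": <int>, "status": <HTTP status>}
    """
    stats = defaultdict(lambda: {"calls": 0, "total_ms": 0, "errors": 0})
    for line in log_lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        entry = stats[record["endpoint"]]
        entry["calls"] += 1
        entry["total_ms"] += record["duration_ms"]
        if record.get("status", 200) >= 400:  # count HTTP error responses
            entry["errors"] += 1
    return dict(stats)


sample = [
    '{"endpoint": "https://example.org/sparql/a", "duration_ms": 120, "status": 200}',
    '{"endpoint": "https://example.org/sparql/a", "duration_ms": 80, "status": 500}',
    '{"endpoint": "https://example.org/sparql/b", "duration_ms": 40, "status": 200}',
]
print(summarize(sample))
```

Documenting whatever the real storage format is would let users write such scripts themselves.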