Review Comment:
The article studies systems able to process queries over a unified schema using multiple (possibly heterogeneous) data sources.
Besides the review of a large number of publications and systems, the paper introduces a framework that allows for the systematic study of systems according to different dimensions that could be of interest for different types of potential users.
The scope of this survey is wider than existing surveys as it covers multiple data models (relational, RDF, others), analyzes the characteristics of a considerable number of systems (both academic and industrial) and considers a fair number of dimensions.
The comments and doubts that I expressed in my previous review were considered in this new version. The authors made considerable changes to the paper to include more details for some of the dimensions, to describe the methodology they followed more precisely, and to make the text easier to follow for readers with limited background in the area.
I have only a few doubts regarding the current version of the manuscript:
It is unclear what "tuned for recall over precision" means under "The selection of industrial systems" (page 9). How could this "tuning" be understood (or reproduced) by others? How much higher is the weight of recall?
Systems such as CostFed, Odyssey, SemaGrow, and SPLENDID have been evaluated using LargeRDFBench (e.g., in [41], mentioned on page 29). This benchmark includes queries with UNIONs and OPTIONALs, and the systems are able to process these queries without using "another, fully-fledged, SPARQL federation engine". Therefore, what is stated in the third observation about "Query language" (page 17) does not seem accurate.
Under "Source selection and query partition", the sentence "Other systems, like HiBISCuS, propose a refinement of the query-based strategy where the probing query has a complex structure based on the hypergraph underlying the input SPARQL query." seems inaccurate. In reference [66], Algorithm 1, line 17, only one subject, one predicate, and one object seem to be used to perform "the probing query", whereas the sentence suggests that [66] uses probing queries with a complex structure.
Why is reference [138] included in Table 9? This reference presents a data federation system and does not seem to focus on evaluation of systems more than the other systems included in Table 10.
Possible typos:
"consists in" (second paragraph, page 2)
"this sub-dimensions" (point 1, page 25)