Review Comment:
Review "Similarity-based Graph Queries"
Summary:
This paper proposes the combination of similarity-based information retrieval queries with graph queries to improve LOD-enabled recommendation retrieval. To this end, the proposed approach combines SKOS annotations with SPARQL queries in a content-based recommender system. The SPARQL queries are utilized to explore graph structures for the retrieval process rather than simply searching through RDF metadata. A comprehensive user study using a crowdsourcing platform is conducted in three domains as well as in a cross-domain study.
The novelty of the proposed approach comes from the combination of LOD-based graph pattern matching and SKOS-based similarity metrics for annotation matching in a recommender system. However, it is based on an existing LOD-based RS approach that considers similarity metrics, i.e., "Recommendations using Linked Data" [37]. In contrast to this related work, the proposed approach bases its similarity measure exclusively on SKOS. The structure of the paper, and even more the internal structure of the individual sections, makes it very difficult to read and follow the main points the authors are trying to make. It requires a lot of effort to understand the exact workflow as well as the interaction between components of the proposed system. As an overall comment, I would say the approach offers little innovation, as it combines well-tested RS methods; on the positive side, however, it might offer some interesting suggestions and insights to the LOD community based on its comprehensive user study.
Section by Section comments:
Introduction:
While important arguments regarding major contributions are provided, they are quite unstructured and difficult to extract. I had to read the introduction several times to really find the main points and am still not entirely sure about some of the arguments. I suggest streamlining and restructuring so that you have: one clearly structured paragraph on why graph pattern matching, rather than metadata alone, is required, using the Indie rock example; one clearly structured paragraph on similarity-based queries and what exactly you mean by that term; and a third paragraph on why the combination makes sense. Right now it is one very long paragraph with too many and partially unclear arguments. For instance, the fact that LOD metadata query results have been frequently used in offline computations in RS approaches is irrelevant to the argument that metadata analysis alone is insufficient and graph structures need to be considered. In other words, no true connection is established in this paragraph between the two co-occurring arguments of prior LOD queries and flat data structures. It is not until one page later that the connection to why online processing is important is established. The main argument seems to be that this combination of graph-structure search to filter query results with similarity-based queries is novel.
Related work:
The problem of structuring arguments continues in the related work. For instance, the statement that the introduced approaches (REQUEST etc.) are not designed for LOD queries is followed by "However" and a discussion of LOD-compliant systems. The second argument does not contradict the first, so the use of "however" is strange. For this section, too, I strongly suggest presenting the identified research gaps in a structured format.
What do you mean by "restrictive requests" in your critique of the REQUEST method? Please clearly establish in which way the requests are restrictive, because otherwise the argument is not clear and the paper is not self-contained.
SKOS Recommender Section:
Instead of describing the individual components, which the subsections should do, it would be nice to have one short and coherent description of how the system moves from input to recommendation, considering the different additions to the main workflow such as pre- and post-filtering. I also do not understand the assignment of "importance ratings" (e.g. "almost as equally important") to the individual elements of your proposed system. One would presume that naturally all of them are important, since otherwise you would not have included them in the first place, or would have omitted them prior to publication. The visual representation of the architecture leaves connections and interactions between individual components, other than high-level groupings, open, so it can in fact not "be seen" "From Figure 1" "that the engine can interact with...".
The SPARQL query in Listing 1 contains underlined elements that are supposed to represent "SPARQL syntax elements"; however, there are considerably more SPARQL syntax elements in that listing than are underlined, such as SELECT, which is mentioned two lines later and introduced as one of the "query keywords". In the query syntax, it says "that the variable (Var) occurs in the WHERE condition" - what is this variable?
The similarity-based approach is central to this paper, but only described first on page 7. Before that the notion of similarity is not even related to SKOS, so until p. 7 the reader is left wondering what kind of similarity the paper tackles.
Please explain how you understand on-the-fly recommendations. Even though some acronyms are highly frequent, they still should not be used without introducing their full form, such as DCMI or IC.
The description of the contribution of this paper on p. 13 is the clearest yet. The passage "While regular SPARQL queries perform... relevant items" and what follows provide the best motivation for the whole approach.
Evaluation:
Two users are very few to test the user interface. Even though this is not the main contribution of the paper, the user interface can strongly influence the experiments. On the other hand, the consent form really is not crucial to this research. Either explain why it is important enough to not only describe in detail but also show in a screenshot, or omit such a lengthy description. Most of the screenshots represent a whole web page in a very small format, which is not legible in a print version and barely legible on screen.
Test case 1: were users informed in advance what you mean by a music act?
"It was done to gather data on the usefulness of suggestsions resulting from a baseline method"... which baseline method? I fail to understand this sentence. Item-level assessments were "partly" carried out .... what is the other part? Needs to be started here.
Evaluation: how did you test the sincerity/quality of the clickworkers? In other words, what set of test questions were used to ensure that users did not just randomly click or simply select the same/first answer for each question?
The increase in result set size is sold as "a remarkable outcome", while the users at best perceived the results as equal to those of a simple SPARQL query in cross-domain test case number 4. A quantitative increase can hardly be considered a remarkable result. This jeopardizes one of the main claimed contributions of this paper: namely, the ability to avoid zero result sets. Please either explain in detail why you consider this quantitative increase remarkable, or change the argumentation.
Minor Comments (in order of appearance):
"item feautres, for whom" => which
"are widely enough used" => "are used widely enough"
"as equally important as" => "as important as" or "equally important"
"based on the engine's ability generate" => to generate
"RDF dataset. (Definition1)." => one additional full stop
"A SKOSRec requests" => request
The encoding of Definition 4 and equation (3) differs from the rest of the section - this happens several times
The caption of Listing 1 is almost invisible - please offset it with a margin
Afterward => Afterwards
gendered pronouns are inconsistent: at first it is "he/she", then it is just "he" -> streamline
"in the DBpedia" => in DBpedia
RedGroupGraphPatter (p.11) => extends beyond column boundary
from p. 13 onward, suddenly a different font size is used
p. 19: the LaTeX encoding of the quotation marks in 4.2 renders German-style marks (opening quotes at the top); they should be English-style
"Despite the positive user" => Despite is the wrong linker here; for this one of the two evaluations would have to be negative
"each domain mean relevance scores (mrs) were" => "...score (mrs) was" because of each