Review Comment:
The authors report on the use of ABSTAT profiles to facilitate SPARQL query formulation. The ABSTAT approach (and system) helps users understand a KG's structure. In their experimental setup, the authors compared the performance of users who used ABSTAT with that of a group who used Web Protégé (their baseline). Somewhat unsurprisingly, the authors' data indicate that their approach does help users formulate queries.
In terms of originality and relevance, the topic of this article is timely and relevant. Furthermore, the article reports on a rather extensive experiment with a system that was previously presented in other venues.
The results presented by the authors are important, but the article could be improved on two points:
First, it is unfortunate that the authors did not set up an experiment comparing ABSTAT, a baseline, and Loupe. The authors stated that Loupe was the only comparable system that was available; precisely because of this, comparing both would likely have yielded even more interesting data. The authors did not explain why they did not compare their system to Loupe in the experiment, and this limitation is not mentioned in the text. I understand that it is challenging to redo an experiment, and I would not ask the authors to do so. The limitation needs to be addressed, however.
Second, the authors should have been more transparent about the experiment. For instance, the authors should have shared the questionnaires and instructions to facilitate the reproduction of the experiment. Ironically, the authors state that sharing these is one of the contributions of the article, yet they are not provided. I tried looking up the survey via the mailing list, but the survey is closed.
Sharing these materials would even allow others to conduct the experiment with different systems (which would, in part, address the first point). More details about the experiment would also have allowed me, as a reader, to better understand some of the figures.
For instance,
Fig. 8 most likely contains rounding errors, or the question was not mandatory. I am also not convinced that there is a difference between a Google search for a DBpedia resource and the consultation of the Web page describing that resource. I presume people could have typed "dbpedia mountain" to look for that page. Does that count as a Google search?
The article implies that users were able to submit a query multiple times. Was query submission part of the survey, or was another form or tool used to assess the queries? Did users get feedback when submitting a query (on semantics, syntax, correctness)? Section 4.2 states that users needed to report how many times they attempted a query.
Did the instructions explicitly state that users were allowed to use other tools? Why not ask participants to avoid using other tools? The authors risked gathering skewed results, e.g., one participant using DBpedia's interface vs. another using YASGUI with autocompletion.
I am convinced the authors could improve the article by providing more detail on the experiment and by clarifying some of their analyses. The latter is addressed in the comments below.
Another issue with the article is the bias that the authors introduced. The authors decided not to provide a video for users who wanted to use Web Protégé, assuming that knowledge of ontologies and SPARQL was sufficient to use it. One could say the same for ABSTAT. I would hope that the survey also enquired about prior knowledge of (Web) Protégé and ABSTAT, but the questionnaire is not available.
Overall, the article is well written. There are, however, quite a few recurring errors, e.g.:
"In this section we..." instead of "In this section, we...".
Some sentences need to be rephrased.
The authors did not correctly conjugate some verbs.
Overuse of "moreover."
...
The text is dense. I would encourage the use of enumerations where possible; they would, for instance, allow the reader to go back and forth between points and figures more easily.
The authors should have mentioned the limitations earlier; they are currently discussed after the related work. They would be better placed after Section 4.3, either as a separate section or as Section 4.4. There is also a difference between the lessons learned from this study, which can be part of the conclusions, and the lessons learned from the limitations, which can be part of future work.
Questions:
How does ABSTAT deal with the following corner case: minimal type patterns where the most specific class can be one of several because an entity's types belong to different branches of the class hierarchy? A paragraph on this could make the notion of "most specific" more concrete and the paper more self-contained.
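To make the corner case concrete, here is a minimal sketch in Turtle (assuming the usual DBpedia-style hierarchy in which dbo:Writer and dbo:Politician sit in different branches under dbo:Person; dbr:Some_Person is a hypothetical resource):

    dbr:Some_Person  rdf:type        dbo:Writer , dbo:Politician .

    dbo:Writer       rdfs:subClassOf dbo:Artist .
    dbo:Artist       rdfs:subClassOf dbo:Person .
    dbo:Politician   rdfs:subClassOf dbo:Person .

    # Neither dbo:Writer nor dbo:Politician subsumes the other, so both are
    # "most specific" types for dbr:Some_Person. Which one (or both?) does
    # ABSTAT select as the minimal type for patterns involving this resource?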
How do the authors define "good abstraction" (page 4 line 27)? Where have the authors reported the evidence?
It seems that the text describing Fig. 3 and the figure itself do not correspond. I presume the list covers dbo:Film, and the numbers mentioned on page 5 are not to be found in Fig. 3.
In Section 4.1, the authors analyzed the responses to the participants' self-assessment for Q3. The authors mention SPARQL, data modeling, DBpedia datasets, and ontologies, but Fig. 6 only provides data w.r.t. SPARQL. Why were the others left out? It also seems that the authors did not provide an answer to Q3.
Q1 and Q2 are yes/no questions. However, it would also be interesting to focus on the how and why.
At the end of Section 4.2, the authors state that using ABSTAT is easy. However, the authors did not report on a quantitative usability analysis (SUS, PSSUQ, ...). There is also a difference between the comprehension of the profiles (which participants said could be made easier to understand) and the tool's ease of use. I do not believe that the authors adequately assessed ease of use and suggest rephrasing this statement. At best, the data indicate that participants perceived ABSTAT as easy to use (or easier to use than Web Protégé for the given tasks).
Why did the authors let participants choose the tool they wanted to use instead of randomly assigning them to either group? The latter is a best practice.
More detailed comments:
Rephrase the sentence starting on line 41 (2nd column).
The authors consider RDFS to be the "simplest" ontology language. This is rather subjective, as others deem schema.org's semantics more intuitive (I am referring to the semantics of multiple domain declarations, for instance; see the short example at the end of these comments).
Some footnotes refer to the same URL (e.g., 12 and 13).
Inconsistent use of "she/he", "s/he", and "the user."
The scale of Fig. 1 seems off.
The authors use informal language, such as "very", in multiple places, e.g., "very big", "very few", and "very positive."
I am not sure that the use of "coherent" (page 9) is correct in this context.
While minor, it could be beneficial to add a totals column to Table 5 and relate it to Table 2.
Rounding errors in Fig. 8?
Provide concrete figures instead of "a lot of users" or "very few".
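Regarding the comment on RDFS above, a small sketch of the aside on multiple domain declarations (the property and classes are hypothetical, prefixes omitted):

    # RDFS: multiple rdfs:domain declarations are conjunctive. Any subject of
    # ex:plays is entailed to be BOTH an ex:Musician AND an ex:Athlete.
    ex:plays  rdfs:domain  ex:Musician .
    ex:plays  rdfs:domain  ex:Athlete .

    # schema.org: schema:domainIncludes is (informally) disjunctive. A subject
    # of ex:plays is expected to be an ex:Musician OR an ex:Athlete.
    ex:plays  schema:domainIncludes  ex:Musician , ex:Athlete .

This is why some consider the schema.org reading of domains more intuitive than the RDFS one.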