Review Comment:
# Paper
The paper presents a between-subjects grounded theory study in which subjects (n=19) were tasked with mapping data into RDF using either (a) YARRRML or (b) SPARQL Anything. The authors aim to provide a "usability analysis" and recommendations for users and for the future development of each of the two approaches. To this end, they reflect qualitatively on the experiments conducted with users and derive some recommendations.
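(For readers less familiar with the two paradigms, the contrast can be sketched as follows - the file name and vocabulary below are illustrative and are not taken from the paper's actual tasks. YARRRML declares mapping rules as YAML over a source iterator:

```yaml
# Minimal illustrative YARRRML sketch (not from the paper's tasks):
# iterate over a JSON array and emit one ex:Person per record.
prefixes:
  ex: http://example.org/
mappings:
  person:
    sources:
      - ['people.json~jsonpath', '$.people[*]']
    s: http://example.org/person/$(id)
    po:
      - [a, ex:Person]
      - [ex:name, $(name)]
```

SPARQL Anything instead triplifies the source behind a SERVICE clause and reshapes it with a standard SPARQL CONSTRUCT:

```sparql
# Minimal illustrative SPARQL Anything sketch over the same hypothetical file:
# the SERVICE IRI exposes people.json as Facade-X triples,
# which the CONSTRUCT template reshapes into the target vocabulary.
PREFIX ex:  <http://example.org/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
CONSTRUCT { ?person a ex:Person ; ex:name ?name }
WHERE {
  SERVICE <x-sparql-anything:location=people.json> {
    ?person xyz:name ?name
  }
}
```

The usability question the paper raises is essentially which of these two ways of thinking about a mapping is easier for users to get right.)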
(1) originality
The goal to compare path-based and triplification approaches for mapping data into RDF from a usability perspective addresses an important gap. The community has developed a large number of RDF mapping approaches, but it has dedicated limited attention to the usability of these approaches in practice. Ref. [24] cited in the paper already provides a more quantitatively oriented comparison, but focuses on ShExML. The presented study appears to be the first that compares path-based and triplification approaches with a qualitative focus.
An experimental usability evaluation of RDF mapping paradigms could yield important insights into the cognitive processes involved and inform appropriate design decisions. This is highly relevant to avoiding a widening gap between research and practice, and important if the goal is for these approaches to be adopted in practice rather than merely being developed as research artifacts.
(2) significance of the results
Whereas I find the aim of the presented study original and relevant, I'm less convinced about the research design and the limited insights that result from it. Sections 7-8 present a narrative account of commonly observed errors for each approach, and Section 9 starts off with recommendations and guidelines for users that are informative but do not yield insights into the usability of the respective paradigms. Section 9.2 is the most significant in terms of results and summarizes recommendations for the future development of YARRRML and SPARQL Anything.
Whereas analyzing common mistakes is worthwhile as part of a usability comparison, I find it somewhat disappointing that we do not learn anything about the relative differences between the two paradigms in terms of usability (neither quantitatively nor qualitatively) and that no hypotheses or theories are developed that could explain why/which aspects make the approaches more or less "usable". Finally, the definition of "usability" is fairly narrow: the results only tell us which errors were commonly observed for each of the two approaches, but not how the "usability" of the approaches was perceived by users and how usable (intuitive, easy to learn, efficient, effective, etc.) they were more generally.
(3) quality of writing
The paper is very clearly written and easy to follow.
# Materials
The authors provide accompanying materials in a data file under a stable URL in an institutional repository.
(A) The materials are well organized. There is no README file in the download (it would be good to add one), but the information on the landing page is helpful and the data is easily accessible.
(B) The provided resources include the tutorials and questions (task descriptions) the authors used in their experiments. The "study" files describe in detail the tasks that the participants had to perform. However, I could not find the source files to be mapped or the partial mappings provided as a template to participants (experiments.zip, which according to the study file was attached to the email to participants, does not seem to be available). Although these files are included in the paper as (somewhat verbose) listings (partially included as screenshots), they are not in the repository - this makes it at least very tedious to replicate the experiments (sorry if I missed where those are in the repository).
(C) The chosen repository (the Open University's institutional repository) appears appropriate for long-term discoverability - a DOI and citation for the materials are available.
(D) Completeness of data artifacts: the materials only contain the tutorial and question documents used; the files provided to participants are missing, and no empirical data or data documenting the analysis is provided (e.g., the authors mention that the sessions were analyzed using the NVivo qualitative analysis tool and that they performed a "grounded classification of observed errors", but it is not clear how - if possible, it would help to make the artifacts generated in this process available).
# Recommendations:
- Methodology: the methodology of the study is not described in great detail - Section 3 provides an "overview of the study", but the methodology (grounded theory) is only mentioned in passing in a footnote to the last sentence of the section. Exactly how the "grounded classification of the observed errors, informed by the participants' comments" was performed remains unclear. The authors only note that the recorded sessions were analyzed using the NVivo qualitative analysis tool, but this analytic process is not described and remains opaque. I would also recommend mentioning the methodology in the Abstract, as I expected a more traditional quantitative empirical research design for a usability analysis until p. 4.
- Sections 7 and 8 report on common problems subjects experienced. The authors present a lot of (excessively) detailed anecdotal/narrative accounts of which mistakes were made by some/many/more than half/almost all/particular participant(s), but without any synthesis or theorizing. In my understanding, the main goal of grounded theory as a qualitative method is the generation (rather than testing) of hypotheses and the construction of a theoretical model. Rather than a list of commonly observed mistakes, I would have expected a more in-depth discussion and theorizing about how the paradigms affect the cognitive mapping processes that led to these common errors.
- I was expecting an in-depth comparison of the two paradigms in terms of usability, but could not really find this claimed contribution (cf. the conclusions, which state that "Our study compared two very different approaches") in the paper. The authors do not characterize and contrast the relative merits or evaluate the design choices of the two approaches from a usability perspective. What the paper presents is two largely separate discussions of common mistakes users make and recommendations on how to avoid them for YARRRML and SPARQL Anything. I would recommend reframing the discussion to put less emphasis on the narrative account of observed problems and instead focus more on an in-depth discussion of similarities and differences between the approaches from a usability perspective (synthesized from the study), including hypotheses and possibly a theoretical model explaining them.
- The authors seem to define "usability" as (absence of) conceptual difficulties experienced by the subjects - which they aim to infer from the errors the subjects make. They exclude other usability aspects such as ease of learning, efficiency of use (time and effort required to create the mappings), syntactic aspects, etc. I understand that mapping/query languages are not self-contained software products and cannot be evaluated in quite the same way (e.g., they can be used in different editors, environments, etc., which could also contribute to/impair usability), and a focus on conceptual difficulties makes sense. I would recommend, however, to clearly define what you mean by "usability" and to define the contributions, scope, and goal of the study more clearly early in the paper. In this context, I would also usually expect to find research questions, hypotheses, etc. - I understand that grounded theory as the chosen method aims for a more open-ended, exploratory research process than more traditional methodologies and aims to generate rather than test hypotheses, but IMO it would still be important to know what the initial/high-level questions and the scope of the study are from the beginning.
- "Usability" is not an absolute concept, but also depends on the context - particularly, e.g., the complexity of the mapping task and the experience level of the user. A powerful tool that has a steep learning curve may not be very "usable" to the uninitiated user, but very efficient and more capable than a more "usable" tool aimed at novice users. This is not considered. According to Section 3, the authors did ask participipants about their previous experience with relevant technologies. The paper reports this as demographic information, but I did not find any reflections on experience level in the discussion of results. Given that this information was collected, it would be interesting to know if the issues discussed in Sections 7 and 8 were "beginner mistakes" that are easily overcome once subjects learn the concepts or grounded in more fundamental issues in the design of the approaches.
- The missing elements in the solutions were chosen to "test participants' conceptual understanding of the mapping process" - I would have expected this to be picked up in the discussion of results somewhere, so that we learn something about participants' conceptual understanding (and the differences across approaches).
- I would find an overview figure illustrating the experimental design (questions, conditions, tasks, etc.) useful.
- The headings of Sections 7 and 8 are misleading. They do not actually report on "the user experience", as subjects were apparently neither asked about their experience, nor were many of the aspects that contribute to the user experience of a tool considered. Apparently, the users' experience was inferred from the mistakes subjects made. I would find it interesting to actually include a discussion of the "user experience" and to report on that if the respective data was collected; otherwise, however, the sections should be named more appropriately ("common mistakes"?).
- It is unclear to me whether the large number of listings provides added value to the reader. They do make the paper self-contained, but two thirds of the paper are largely dedicated to describing the tasks that subjects had to perform before we get to the core of the paper - which seems excessive, given that the analytic process is not described at all. I would recommend reconsidering this structure and possibly moving some of the verbose scripts to an appendix or a separate document describing the research design in full detail. This should free up space to focus on a description of the research process, a more in-depth discussion, and the theorizing and synthesis of general insights in the main part of the paper.
- I would have expected a discussion of limitations. It seems that several aspects of the study design mentioned in Section 3 limit the insights that can be drawn and should be clearly stated as limitations (e.g., the representativeness of subjects drawn only from the Open University and from W3C groups; no randomization - i.e., subjects were free to choose the condition; limitations in experience that may explain differences in perceived usability and may hinder comparisons; the "great deal of interaction" of study designers with subjects during the experiment, which may limit the insights that can be drawn from the study; the small sample size; etc.). I understand that internal validity is not as much of a concern in grounded theory studies, but it is IMO still important to note these limitations.
- The code listings are pasted as images, partially in low resolution (e.g., Fig. 5). This makes them difficult to read (particularly in print). I'd recommend adding them as proper listings (the LaTeX listings package may come in handy; see the sketch below). This would also solve the problem that text size, syntax highlighting, etc. are currently inconsistent across the listings.
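For instance, a single style definition in the preamble would make size and framing uniform across all listings. A minimal sketch, assuming the paper is typeset in LaTeX (style values, caption, and label are illustrative, not prescriptive):

```latex
% Preamble: one shared style applied to all listings (values illustrative).
\usepackage{listings}
\lstset{
  basicstyle=\ttfamily\footnotesize, % uniform text size across listings
  breaklines=true,                   % wrap long mapping rules
  frame=single,
  captionpos=b
}

% In the body: replaces a pasted screenshot such as Fig. 5.
\begin{lstlisting}[caption={Partial YARRRML mapping given to participants},
                   label={lst:yarrrml-template}]
mappings:
  person:
    po:
      - [ex:name, $(name)]
\end{lstlisting}
```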
## Minor comments:
- some misplaced/missing commas (e.g., Introduction: "convert this to RDF for the creation of linked data, has stimulated")