Review Comment:
The authors present two new tools for visualizing ShEx constraints. The two tools take slightly different approaches to tackle some of the issues observed in prior work. The tools were evaluated in a user study whose participants were students following a semantic web technology module. The user study is based on a set of tasks and questionnaires, the latter of which were developed for this study.
I believe the topic of this paper is important and, given its niche, original. Moreover, there is potential for a more significant impact by allowing knowledge engineers, in the longer run, to edit ShEx files via such visualizations.
The contribution of the two tools is significant. The results obtained via the study are less so, mainly due to how the authors set up the experiment (types of participants, number of participants, the surveys, etc.). I understand that recruiting participants for a user study is challenging, so I appreciate the evaluation nonetheless.
I understand that the authors have made the artifacts available on GitHub at this stage of the reviewing process, but I hope the authors will make those files public on other platforms (e.g., Zenodo). The authors have not indicated in the article that they will do so. It was also annoying that the running example referred to by the authors did not seem to work in either tool. I had to use the model from the experiment to get the tools to work, and the error messages did not help me identify the issues. URLs to the GitHub repositories of both tools would have been appreciated.
Problems that need to be addressed in this paper are:
- Clarity and structure: the authors use a lot of jargon that has not been described and defined in the paper. This made the article very difficult to read.
- The SOTA is inadequate. A few (obvious) references are mentioned, but there is no overview. Either the SOTA of "visualization on the Semantic Web" needs to be much more substantial, or the authors need to provide a list of criteria for including/excluding related work in their survey. The authors also did not mention literature relevant to this paper, such as Huang, Weidong, Peter Eades, and Seok-Hee Hong. "Measuring effectiveness of graph visualizations: A cognitive load perspective." Information Visualization 8.3 (2009): 139-152. There are even instruments to measure mental workload, which may inform us about cognitive load.
- Evaluation: the authors did not avail of existing instruments for their surveys (e.g., SUS and PSSUQ). The authors need to motivate the development of their survey. I also found that their surveys had many limitations (many aspects were open to interpretation). However, the most significant issue with the evaluation was the precision metric that depends on speed.
- Language: the article needs to be thoroughly proofread. I have added some comments, but this is a problem that is easily addressed.
While re-running the experiment is likely very difficult, the authors can significantly improve the paper by addressing the clarity, structure, and SOTA. A reanalysis of the existing data (considering time and accuracy as distinct (but possibly correlated) variables) is also necessary.
More specific questions and comments:
The authors state that only one SOTA tool, RDFShape, exists. As this is a bold claim, it is always prudent to add that this is "to the best of their knowledge." The authors could also briefly describe how they searched for ShEx visualization tools.
Throughout the paper, especially at the beginning, the authors introduce and use terms that have not been defined or described. This makes the paper difficult to comprehend on its own. I would advise the authors to define those ("semantic transparency" and "complexity management" are examples).
The authors' overview of the SOTA on visualization is not compelling; the aim was to provide a SOTA on "visualization on the Semantic Web," but the authors only referred to a few works. A lot of relevant work has been missed. Examples include:
- a graph-based RMLEditor by Heyvaert et al. 2016;
- Ontodia, a graph-based approach combined with faceted browsing by Mouromtsev et al., 2015. Ontodia places data properties inside a rectangle and represents object properties as arcs between entities. It is, in that sense, also "UML-like."
- RelFinder by Heim et al., 2009 to explore Linked Data datasets.
- ...
The SOTA presented by the authors lacks a goal and a scope; with both in place, it could be more exhaustive, and an analysis of it could better identify the problem(s).
The authors also missed some related work on the evaluation of visualization tools. Longo and Crotti Junior researched mental workload and cognitive load and have applied their research to mapping languages.
I have tried using both tools, and I sometimes got results. However, the running example in the footnotes (genewiki.shex) leads to errors. The error messages state that a base directive is missing, even when I include the directive and the missing namespaces. If the error is due to me, then the tools lack documentation. The file from the experiment, available in a separate GitHub repo, does work. This leads me to question the robustness of the tools.
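For reference, this is the kind of header I prepended when trying to satisfy the error message (the IRIs below are placeholders of my own, not the actual namespaces used in genewiki.shex):
BASE <http://example.org/>
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
Even with such a header in place, the tools kept reporting a missing base directive.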
When using valid examples from the spec, some of them:
- lead to obscure errors in a JavaScript window;
- show only part of the numerical facets (MinInclusive is shown, but MaxInclusive is not; the tool seems to accept the input but not display the MaxInclusive constraint);
- ...
Given the following example:
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ex:c xsd:integer MinInclusive 10 MaxInclusive 20
ex:Foo {
ex:a @ex:c ;
ex:b xsd:integer MinInclusive 1 MaxInclusive 5 ;
ex:d IRI {1,2}
}
This input yields a diagram with two "UML classes" with an arrow from ex:Foo to ex:C. The problem, however, is that all the interesting information about the use of the predicate ex:a is lost (i.e., the permissible values being integers between 10 and 20). The example and the grammar outlined in Table 1 lead me to believe that only a subset of ShEx is supported. I have not found a motivation for supporting only a "subset" of ShEx.
3DShex was entertaining, but I found the interface unintuitive. I sometimes managed to make all arrows around a shape light up, but I found it difficult to replicate. Again, some documentation on how to use these tools would be welcome. One quickly gets "lost," especially when dealing with large graphs. It would have been nice to have a feature that allows one to store a particular position or state.
As for the evaluation:
- Why did the authors combine time and precision? Is this based on prior studies? One can easily create a situation where a very swift person with poor results "outperforms" a slower yet more precise participant (see the worked example after this list).
- In the discussion, it seems that the authors do not recognize that precision depends on success rate: "only one member of the Shumlex group achieved a perfect score both in success rate and precision." One can only have a "perfect" score for P if they not only have the perfect score for S but are also the fastest.
- The authors allude to a potential conflict of interest on line 43 of page 12. It may be that students did not want to critique the work conducted by a research group. The authors should elaborate on this in Section 6.1.4.
- The number of participants is minimal and, frankly, too small to compare the three tools. For usability studies, 5 participants per tool may suffice, though a more extensive pool is needed if a tool is only assessed once. Other types of analysis require more participants to ensure that the observations are not merely due to chance.
- Minor: The authors deemed the threshold for significance to be < 0.05. It might be worthwhile to indicate that in the article. Is that a choice the authors made or is this based on similar studies?
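To make the issue with the speed-dependent precision metric concrete, consider a purely hypothetical formula P = S / T, with S the success rate and T the time in minutes (the paper's actual definition may differ):
- Participant A: S = 0.6 (6 of 10 tasks correct), T = 5, so P = 0.6 / 5 = 0.12;
- Participant B: S = 1.0 (10 of 10 tasks correct), T = 12, so P = 1.0 / 12 ≈ 0.08.
Under any metric of this kind, A "outperforms" B despite being considerably less accurate, which is why time and accuracy should be analysed as separate (possibly correlated) variables.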
I appreciate their analysis, but I had a problem with the user evaluation. The authors created their own survey and tied one question to each dimension. Well-established survey instruments are available that the authors could have used, e.g., SUS for usability and PSSUQ for satisfaction and error handling. Why did the authors not consider these instruments? Or, to put it differently, what was the motivation for building this particular survey, and what measures were undertaken to ensure the questions adequately assessed the dimensions?
Many of the questions in the survey are subjective and vague. "The experience with the tool was satisfactory," for instance, is not targeted enough. Is it the experience of using the tool in a browser, the experience of solving a specific task, the experience w.r.t. usability, ...? As it is open to interpretation, the input from users is difficult to compare. Existing instruments ask users several questions to better hone in on each dimension.
Questions such as "The meaning of the symbols can be inferred from their appearance.", while useful, could have been assessed indirectly in the survey, e.g., "What is the meaning of this symbol in diagram X?"
Abstract
- Did the authors mean "knowledge engineering" instead of "knowledge"?
Section 1
- What do the authors mean by "RDF brings together users from various branches of human knowledge"? This statement can be interpreted in different ways and does not provide any added value. The authors could have simply stated that RDF is applied in many application domains, requiring the skills and competencies of domain experts, knowledge engineers, and users, amongst others.
- Line 1.39: the implication is unclear as arguments or elaboration are missing. I.e., it is unclear why the authors suddenly mention "textual programming languages."
- "sheer amounts of data" contains two plurals. Using "sheer" in this context implies a tremendous amount. I wonder whether that is intentional.
- The authors introduce some terminology that needs to be defined. Examples include symbol overload, semantic transparency, ...
- The authors refer to a "Semantic Web ecosystem." What is this ecosystem?
- The authors refer to an "aforementioned scalability issue" but never describe that issue in detail. It is vaguely implied in the preceding paragraph and needs to be "fleshed out"; moreover, the authors did not use the words "scalable" or "scalability" before. This problem is also known as "complexity management," but the authors have not provided a reference for it. The statement that complexity management is rarely addressed in general, as the sentence on line 2.3 suggests, also lacks a reference.
- Omit "Thus" from Line 2.6 as this sentence does not flow from the other paragraphs. The whole sentence/paragraph needs to be rephrased.
Section 2
- What does DOT stand for?
- What are the common scalability issues we can observe in Fig. 1? And is that based on literature? The authors mention more than one issue, but only one is described in the following sentence.
- The authors "jump to a conclusion" in the last sentence of the second paragraph. How is DOT a suitable testing ground for testing complexity management mechanisms?
- The authors have again introduced terminology that has not been defined or described: "complexity management mechanisms" and "cognitive efficiency."
- The authors should consider converting the SVG into PDF for the paper.
Section 3
- Something is wrong with the sentence starting on line 3.7. Are words missing between two sentences? Also, what is "high element interactivity"?
- Section 3.1 does not provide a SOTA but some background knowledge.
- I would suggest that the authors also briefly describe the principles outlined by PoN (the Physics of Notations) in this section, as they are used in the paper. Note that Section 4.1 does not systematically explain or describe the dimensions. Semiotic clarity, for instance, is not described.
- What's the reference for NV3D?
Section 4
- In Section 4.1.1, do the authors have a reference for UML being "widely recognized"? I would also argue that it is recognized mainly by people with a computer science or software engineering background.
- How was the threshold mentioned in Section 4.1.2 set? Is that set by the authors or a best practice mentioned in literature?
- In Section 4.1.9, the technical details of what? RDF and ShEx, or UML? Probably the former, but this must be made explicit to avoid confusion.
- Wrong use of "on the contrary."
- Why did the authors not choose to adopt existing UML 2 notation such as {XOR} for the OneOf constraint? Wouldn't people familiar with UML prefer to see as much reuse as possible?
Section 5
- "JavaScript" instead of "Javascript"
- "won't" is informal; use "will not" instead.
- The quality of figures 2 and 3 is unacceptable. Consider using vector-based images instead.
- The relations in Fig. 4 cannot be read.
Section 6
- "based on" instead of "based in" (several times)
- "hasn't" -> "has not" (informal speech)
- I appreciate that the authors dedicated a section to the limitations of their experiment. Some aspects need to be elaborated on, however. Participants were drawn from a course; does that not entail a conflict of interest, and how was that mitigated? Participants were students, but how would people from industry who are familiar with semantic technologies or ShEx react to the tools?
There are mistakes in some references. E.g., the name of Ben de Meester in [16] and [18] is incomplete, HTML entities are used instead of LaTeX ('amp;' in [25]), ...