An approach to increase the usability of Shape Expressions editors

Tracking #: 3178-4392

Authors: 
Pablo Menendez
Jose Emilio Labra-Gayo

Responsible editor: 
Guest Editors Interactive SW 2022

Submission type: 
Full Paper
Abstract: 
There is a need to increase the number of tools that support the use of the Shape Expressions language (ShEx). In this paper we present YASHE, a ShEx text editor that incorporates new features with respect to existing ones. It takes the SPARQL text editor YASQE as a starting point and adapts and extends it to the needs of the language. We also present ShExAuthor, a graphical assistant for ShEx schema creation inspired by the Wikidata Query Service (WDQS). We have carried out a usability experiment with 16 non-expert users to compare four ShEx editing tools. The results showed no statistically significant differences in time or completeness percentage (CP) between the tested tools. However, our tools obtained better results in CP, and YASHE obtained the highest score in terms of precision (time-to-CP ratio).
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Ben De Meester submitted on 19/Aug/2022
Suggestion:
Major Revision
Review Comment:

### High-level impression

I have a general feeling that the writing of this paper did not receive enough attention.
Although I can surely sympathise with the goal and the work, the lack of argumentation and attention to detail
puts me in no position to accept this paper.
I am currently not at all convinced that this paper can reach an acceptable state,
but with enough argumentation, a broader scope, and some generalizable results (see the sections hereafter),
there might be a possibility.
The long-term URL is very well structured and feels very complete.

#### originality

The research is not original: an existing library was repurposed for a different language.
I hoped to read a more detailed analysis of what kind of features are relevant to a (visual/textual) editor in general
and to ShEx editing in particular (which I would consider valuable research), but no more than a comparison with existing editors was given.
Also, a better classification of the type of visual editor could be interesting: ShExAuthor feels like a 'form-like' editor,
whilst more graphical 'drag-and-drop' editors also exist, and an argument for why this form-like style was pursued would be interesting to me.
Sadly, given the lack of argumentation and references to similar work in the introduction, and the very narrow scope of the related work section, I am not convinced
that the currently presented work is original.
To be seen as research, I would expect more generalizable results.

#### significance of the results

As the evaluation shows, the results are not statistically significant.
The only significant results stem from the fact that a textual and a visual editor are compared,
and have nothing really to do with the presented work.
However, the impact of YASHE is clear and also clearly represented in the paper.
Given that, I would actually expect a resource paper around YASHE to be more relevant than the results of this paper.
The latter would require a different kind of evaluation.

#### quality of writing

The writing feels very hasty: many typos and ill-phrased sentences (details below).

### Detailed review

#### Abstract

- Need is not convincing. Why?
- There is no conclusion.
- What do you mean by 'better results'?

#### Introduction

- Context and Need are neither clear nor well argued, giving the impression that this work is not that valuable.
- Rephrase: "[One editor] was the one incorporated by YASGUI[2]. Later extracted as an independent module": The sentence would be clearer if you state "[One editor] is YASQE, and independent module extracted from YASGUI".
- Personally, I find it annoying that you don't state what YASHE stands for.
- It's not clear why exactly the listed features were incorporated, and not others.
- Typo: "On the other hand": there's not 'On the one hand' anywhere. This also points to a lack of structure in the Introduction section. I would expect and introductory paragraph overviewing two types of tools (textual and graphical), detailing their differences, and only then diving in the details.
- Rephrase: "One of the problems faced by domain experts [...] is the fact that they don't need to be accustomed to the use of computer languages": I would assume that 'the fact' is not the problem but rather "One of the problems faced by domain experts [...] is that they are not accustomed to the use of computer languages"
- There is very little argumentation and there are few references to back up the claims in this introductory paragraph ('they may find a GUI more comfortable' -> do they?)
- "a shapes graphical assistant": I don't understand what this means, what is a shapes assistant, do you mean a user interface?
- Rephrase: "This tool integrates YASHE into its system to visualize the shapes created from the wizard": I'm confused, I thought YASHE was a textual editor, this text makes it seem as if it's a visualizer? Is it both? that's not clear. Maybe it points to ShexAuthor?
- You have 'In this paper, we present' twice. That points to a lack of structure: if you present two things, introduce them both together.

#### Related work

- It's not clear what the selection criteria were for these ShEx tools. Is it ad hoc? How can you know that this is a relevant list then? Why are editors for other shape-validation-like languages not taken into account, such as JSON Schema editors, SHACL editors, etc.?
- https://www.semantic-web-journal.net/system/files/swj2834.pdf happens to have evaluated a graphical language for shape editing (not tied to SHACL, but the tooling currently only supports SHACL import/export).
- Why is RDFShape not part of this SOTA? Yes, it incorporated YASHE, but it should at least be mentioned IMO.
- Aha, the listed features in the introduction were found through the SOTA review. Maybe it makes more sense not to specify those features in the introduction, as readers don't know where they came from.
- It's weird that the functional comparison between YASHE and the related SOTA is done within the Related work section. For me it's fine to add that in the same table, but the actual discussion should not be in the related work section IMO (people should be able to skip the SOTA section and still understand your paper).
- Typos in the table: Ghrapic and Pritner.
- You state "ShEx2- Simple Online Validator 1 (to abbreviate ShEx2 from now on)", but later in the text you keep using "ShEx2-Simple Online Validator". Either abbreviate or not, please.
- "the possibility to edit and create EntitySchemas. Wikidata offers a plain text editor to perform this task. This editor only offers 2 of the 17 features defined in Table1". Multiple things wrong with this statement:
- In the table, "Save entity schema" is state, that doesn't sound the same as 'edit and create'.
- "2 of the 17", while I see 3 features offered in the table
- "defined in Table1": Table1 doesn't define anything, it provides some labels but the alignment between the discussed features and the labels is not always clear.
- Typo: "Table1": add a space.
- "This could facilitate the appearance of grammatical errors in the EntitySchemas" I don't understand this statement. How does 'beting able to edit and create' facilitate the 'appearance of grammatical errors'? Or is specific validation in place?
- "Grammatical error detection": Is this "Error Checking" in the Table? This is not clear

#### Description

- Apart from being a very vague title, the ToC stated that 'only' the architecture would be described. This section contains much more.
- "server-side": don't you mean client-side?
- "It takes care through the defined prefixes": what prefixes are you talking about? Prefix.cc prefixes? Q and P?
- Why is the 'tooltips' feature not part of the list?
- Also, drop the list and make use of paragraphs, much more readable :)
- Rephrase: "over which we pass our mouse" Isn't 'hover' a more conventional way to describe this?
- Considering the limitations: how important/impactful are those limitations? Are these marginal cases or very impactful limitations?

#### Methodology

- Where does the questionnaire come from? How are you sure you are asking the right questions?
- The percentages are not clear: what do 16%, 13%, and 71% mean? Something like '16% of all items to be generated were prefixes'? (See my attempted reconstruction after this list.)
- The discussion about the ratio is overly long; the same formula is just repeated 3 times. This could be solved more elegantly.
- CPun is not used in the formula at the end of section 4. Is this a typo in the formula, or is CPu 'the fastest' user? Also, shouldn't the formula use 'precision{x} (precision of user X)', where the top part of the division is Tux? Also, you mention both precision and accuracy: which is it?
- It seems no qualitative responses were gathered from the test audience. This is a quite big limitation of your experiment IMO.
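
For the authors' reference, here is my best guess at what section 4 intends; the symbols and the use of the three percentages as weights are purely my assumptions, so please confirm or correct:

$$CP_{u_x} = 0.16 \cdot CP^{\text{prefixes}}_{u_x} + 0.13 \cdot CP^{\text{shapes}}_{u_x} + 0.71 \cdot CP^{\text{triple constraints}}_{u_x}$$

$$\text{precision}_{u_x} = \frac{T_{u_x}}{CP_{u_x}}$$

where $T_{u_x}$ is the time consumed by user $x$ and $CP_{u_x}$ is that user's completeness percentage. If this is the intended definition, please state it once in this form and stick to a single term for it throughout.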

#### Results

- Statistical analysis: you are comparing textual with visual editors. It feels very logical to me that a visual editor uses fewer keystrokes and more mouse clicks on average. What value does your statistical analysis give in that regard, except for the naive conclusion of my previous sentence? Is this relevant to ShEx editing? Why didn't you compare textual editors with textual editors and visual editors with visual editors? --> It turns out you specify this in the discussion section, but it should be clear in the results that you also compare textual editors solely with textual editors.
- It is not clear why the Tukey and Scheffé tests are both used. Please argue why (because now it is hard to compare, and it feels as if they are chosen opportunistically).
- "Despite this, YASHE and ShExAuthor obtain lower values": I assume you mean 'for Time consumed', but it is not stated as such.
- How is it possible that A11 and A12 don't have any results for ShExAuthor, Wikidata, and (for A11) ShEx2? Please explain.

#### Conclusion and Future Work

- I am missing generalizable conclusions.
- Future work is severely lacking. It feels as if the tool is perfect and only the experiment should be repeated at a larger scale. What features were deemed most relevant? What features were lacking?

Review #2
By John Samuel (CPE) submitted on 20/Sep/2022
Suggestion:
Major Revision
Review Comment:

The article “An approach to increase the usability of Shape Expressions editors” proposes a text editor named YASHE and a graphical assistant for ShEx schema creation called ShEx-Author. Both these applications are available online. They have also presented a user-evaluation test for measuring the usability of their tools and two other existing tools. In addition to this, the authors have also submitted a complete repository with the code and data for repeatability/reproducibility purposes.

The article, with 8 sections plus the References section, is well-structured and easy to follow. The authors introduce the challenges related to the creation of shape expressions (ShEx) in Section 1 (Introduction), followed by a comparison of ShEx-related tools as well as a discussion of desired features for ShEx tools in Section 2 (Related Works). Section 3 presents a detailed discussion of their proposed tools. This is followed by a user evaluation test in Section 4, where they propose the methodology used to evaluate 4 tools, including the default one supported by Wikidata. Section 5 sums up the results, followed by a discussion of these results in Section 6. YASHE has been used in other tools, including ShEx-Author (discussed in this article), and a brief discussion of other tools is presented in Section 7. Section 8 concludes the article and presents the future course of action.

The user-evaluation section is very interesting and could be used to evaluate other tools in the future. I like the fact that the authors have talked about the limitations of their proposed tools in responding to RQ2 (scope for future improvements/discussions).

I have some suggestions to improve the article further. The article has some limitations, which I believe can be addressed in a modified version. The article assumes that the readers are well aware of shape expressions (ShEx) and does not present a running example. I would suggest the authors present a brief introduction to ShEx, a running example, and some major features of ShEx. We see a discussion of their proposed tools in the limitations section later in the article, such as the difficulty of handling nested shapes.
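
To illustrate the kind of running example I have in mind (purely illustrative, not taken from the article), even a minimal shape would already help readers unfamiliar with the language:

```
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ex:PersonShape {
  ex:name xsd:string ;        # exactly one name
  ex:birthDate xsd:date? ;    # an optional birth date
  ex:knows @ex:PersonShape*   # zero or more links to other persons
}
```

Such an example could be introduced early and then reused when the editors' features (autocompletion, error checking, handling of nested shapes) are demonstrated.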

Secondly, the authors have chosen four tools in their evaluation. It is not very clear why the two other tools were chosen. A number of tools are listed at https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas#Tools, including the recently introduced GUI editor called EntitySchema Generator. A short discussion of the limitations, or even a general overview, of these existing tools would help the reader to better comprehend and compare the proposed solutions.

Page 6 presents some visualizations. However, because a running example is missing, it may be difficult to understand some of the proposed features of the applications.

Page 8 discusses the completeness percentage. However, how the values 16%, 13% and 71% have been obtained is not very clear.

Two tasks have been proposed for the evaluation. Though one can find the code, data, and the proposed tasks in the code repository, I would suggest the authors add a brief description of the two tasks in the article.

In the conclusion section, they talk about improvement in ‘relationship time’. This term is used for the first time in the article in the conclusion section.

Some other improvements:
1. Page 1: These data belong to very different fields of knowledge -> These data belong to *various* different fields of knowledge?
2. Page 2: for ShEx editing while others have another specific purpose. -> for ShEx editing while others have *other specific purposes*?
3. I would suggest the authors use gender-neutral terms in the article. Page 2 (best suits him, he can), Page 6 (he/she).

Review #3
Anonymous submitted on 11/Nov/2022
Suggestion:
Reject
Review Comment:

The article presents and evaluates two tools for working with ShEx (Shape Expressions), aimed at non-expert users.

Unfortunately I cannot support the publication of this submission, due to what I consider insurmountable flaws in the article's narrative, writing quality, and most importantly, the underlying study design.

The first sentence of the abstract is an unsupported claim motivating the work: "There is a need to increase the number of tools that support the use of Shape Expressions language". I disagree entirely. We do not need _more_ tools for RDF shape validation. We need fewer (but _better_) tools (and probably fewer languages, though that is a more contentious discussion) for such use cases.

The authors set up two research questions, neither of which is answered in the article. The former cannot be answered using only the metrics collected (usability is a lot broader than framed here), and even if it were possible, no such effect is found by the authors. The latter is not even addressed in the submission.

The comparison with the state of the art is based on a feature comparison, where the value of several of the compared features is dubious, e.g., "Web tool" or "Dark mode" -- arguably the former isn't a functionality so much as a delimitation, and the value of the latter is, in contrast to things like syntax highlighting/validation, likely very limited.

The authors select four of the presented tools for evaluation in their study, but provide no motivation for this selection.

The Likert-statement survey contains several dubious statements w.r.t. the intended usability evaluation, e.g., "The tool favors the use of ShEx" and "The functionalities offered by the tool to work with Wikidata seem useful to me". The authors are encouraged to study standard usability evaluation methods, e.g., the System Usability Scale. One should note that such evaluations are notoriously difficult to generalize and that translating the results into quantitative data as in Figure 3 is not recommended. In this case, given the low number of study participants and given that no participant used more than one tool, comparison between the four tools is really not possible. Also, note that Likert-scale responses are NOT qualitative, contrary to what the authors claim.

The task completeness percentage is ambiguous in its description; in particular, given the dependency between CPShapes and CPTripleConstraints, clarification would be needed.

The authors redefine the well-established notion of "Precision" as a metric relating to elapsed time. While it is their prerogative to define concepts in their article, it breaks with standard nomenclature and makes for difficult reading. They also refer to the same metric, in the same paragraph, as "accuracy".

Due to the other flaws discussed, I have not carried out a review of the statistical evaluations; but the way they are presented does sound suspiciously like p-hacking/data dredging (doing many statistical evaluations and only reporting the ones that come back with statistically significant results).

Language/correctness issues (a subset):

* Third paragraph: "On the other hand" -- from what? What's the first hand here?
* The user is consistently gendered as male; neutral language is preferable.
* Table 1 contains spelling errors and inconsistent capitalization.
* The authors refer to their tool/study as "ourselves". More distanced language would be preferable.
* "Grammatical error detection" on in section 2.2 is ambiguous. Does this refer to ShEx syntax validation?
* Section 3.1.1: the tool cannot both be completely server-side and depend on a client-side JQuery library.
* Descriptions of the user interfaces frequently refer to interface component colors. This article may not necessarily be printed in color; it would be better if such references were avoided.
* In describing the study participants, the authors write that they consciously placed the study toward the end of a university course, so that participants should have a minimum of knowledge of the topic. This statement makes no sense; if that's the intended goal, the study should have been run at the start of the course in question.