Review Comment:
This paper presents the tool InteractOA, which visualizes information about small regulatory RNAs, their interactions with genes, and the papers that serve as references for those interactions. This information is represented using Wikidata resources. The source code is available on GitHub and Zenodo. The idea behind this work could be a good step towards improving the life sciences data available on the web, but I have some major concerns that need to be addressed before this work is published.
First of all, regarding the writing, motivation and organization of the paper, there is plenty of room for improvement. The paper would improve considerably with a different organization than plain Introduction, Results and Discussion. The Introduction is too long, and the motivation gets lost in this immense section. There are a lot of useful references, descriptions and examples that would fit better in a “Related Work” section, which the paper is missing. It would also help to discuss similar approaches: are there no comparable prior works worth mentioning?
Regarding the Results section, are these indeed results? There is no evaluation whatsoever, and what is described is the tool. In addition, the description is messy, and by the time the section ends it is not clear how the data has been transformed or what exactly the input and output are. Section 2.1 tries to describe the data modelling, but mixes it with the data transformation, which are two completely different things. The input data is not described at all (what exactly are GFF files and how are they structured? Only domain experts know this kind of format), nor is its provenance: the last line of page 4 only mentions “numerous research articles, RegulonDB [38], and other resources”. Which other resources? How are these very different resources integrated? This section needs a reorganization, and could use a general workflow diagram to help the reader understand each step.
There are also plenty of statements without evidence to support them, for instance “and shortens the time needed to consult previous research” or “Having demonstrated the general usefulness of this approach”. On what evidence are these statements based? Again, no evaluation, testing or validation is presented, so how can the authors be sure this is true? There is also no mention of the impact of the tool according to the criteria for “Reports on tools and systems” papers: “demonstrable uptake of your work by the research community, industry, governments, or the general public”.
A minor final remark about the writing: the references to the figures are too far from the actual position of each figure, and Figure 2 is not referenced at all. The links to the tools and GitHub repositories should appear as footnotes at the point where they are first mentioned. In one part of the text, two different tools are mentioned without even being distinguished by name, and their corresponding links appear only at the end of the section.
Regarding the tool, the graph visualization and the table with the references work and look nice. The graph visualization currently contains few interactions, so the information is easy to see, but how do the authors plan to manage this when more data is added? There are millions of RNA-gene interactions. Another thing that concerns me is that, despite the paper's emphasis on representing the information according to Wikidata, I see nothing related to it in this visualization; I am referring specifically to the Wikidata identifiers. What is the point of using Wikidata if the output data provided in the end does not reflect it? In what sense does the tool take advantage of Wikidata, and how does it improve the original data?
When clicking on the blue concepts, no information appears, and there are relationships like “instance of 2”. What is 2? The graph visualization does not link to the provenance of the information, that is, to the sources it was extracted from, and therefore not to the information stored in the table shown when clicking on “View citations”. In this table, again, I see little reference to the Wikidata-based modelling that the authors have performed. And how has this information been extracted, modelled and transformed? Moreover, the remaining visualizations (bar plots, etc.) add no information; what they show is either unclear or not well represented.
Additionally, there are two things that I think would be useful for the webpage: (1) making the transformed data, modelled according to Wikidata, available for download (like the Wikidata dumps), and (2) a visualization of the transformed data presented as Wikidata pages, which would also be convenient for user exploration.
The original idea of the paper is fine, but in my opinion the presented work is not mature enough to be published. It needs major rework, both of the system itself and of the organization and writing of the paper.