Review Comment:
# Summary and general comments
The authors present an approach to extend MVC-based web applications and expose their internals as LOD. Specifically, they provide two library implementations - EasyData/Rails and EasyData/Django - that can be used to (i) expose the application data model and (ii) expose the controller methods via a LOD API (described with SAREST). By deeply integrating the Linked Data-related code into existing applications rather than relying on black-box scraping or wrapping techniques, the approach aims to expose the internal data model and functions directly as Linked Data.
The paper contextualizes the approach well with a good overview of existing approaches towards exposing Linked Data in existing web applications and discusses some of the pros and cons.
Overall, the approach is not groundbreaking as individual developers have long been extending their application's internal code to expose data and API descriptions as LD (using, among others, the very same standards and techniques as those used by "EasyData"). Nevertheless, the two implementations could make it easier for developers to expose applications' internals with less custom code and a paper on the topic can make a valuable contribution as a "tools and systems report".
An obvious disadvantage of the approach is that this tight coupling eliminates separation of concerns in a separate LD layer, which negatively impacts modularity, reusability, and maintainability. The authors acknowledge these limitations in their discussion. IMO, it will be difficult to convince general web developers to openly expose their application's internal data model and functional structure as LD. Nevertheless, the area on the architectural spectrum (i.e., between converting the source data to LD and scraping the views - none of which are particularly satisfying solutions) positions the paper in a key area where progress is necessary.
A weak point is that the paper does not highlight the benefits and consequences of the proposed "invasive" approach more thoroughly. The main argument put forth is that the "invasive" approach provides hooks to implement security and access control. While this argument is clearly important, maybe this could be embedded in a broader vision - i.e., by outlining what would become possible if web applications adopted this approach. Also, a good motivating illustrative example (focusing not just on example code, but also providing a motivation) would be useful.
# Quality, importance, and impact
The provided evidence of impact is somewhat limited. As is, the paper seems more like a proof-of-concept with a few code examples rather than a report on a mature tool that is in actual use (Redmine is used in the examples, but it was not clear to me if a complete LD extension of Redmine based on the approach exists and is in active use). In terms of validation, IMO an implementation on a larger scale (and ideally deployment of the approach in production use) would instill more confidence in the quality and impact than the (still useful) comparison of software quality metrics with SonarQube included in the paper. Also, the GitHub pages of the two implementations do not seem particularly active (last commits 3 and 5 years ago, respectively). Overall, importance and impact are a bit unclear.
# Clarity, illustration, and readability
Capabilities and limitations of the tool are discussed. My main concern, however, is clarity, illustration, and readability of the paper; this is partly due to general problems w.r.t. grammar and style, but also partly due to terminological issues throughout the paper. Some of the more important ones are:
*) "reverse engineering": I'm not sure if extending the source code of an application should be considered "reverse engineering"; IMO, "reverse engineering" typically refers to a situation where the source code is not available. Also, the authors seem to use the terms "reverse engineering" and "reengineering" interchangeably. I suppose "reengineering" is the intended meaning.
*) I find the term "legacy web applications" as used in the paper somewhat vague. The authors do not provide a definition, but the phrase "legacy web applications that do not expose their data using the common formats and protocols of the Web of data" suggests that they consider any web application that does not provide Linked Data "legacy", which is a view that is probably not shared in the wider Web development community, where applications built on archaic web technology stacks or standards might be considered "legacy".
Apart from such terminological issues, the general wording should also be more precise and concise. I suggest removing unnecessary filler words that do not contribute to the meaning and thoroughly revising and rephrasing the text where appropriate (see the detailed comments below).
# Overall Assessment
Overall, I recommend a major revision of the paper that should strive to more clearly highlight the vision and benefits of the proposed approach, provide evidence of the proposed tool's impact (or at least of its applicability beyond simple examples), and significantly improve clarity and readability.
# Detailed comments
## Title, Abstract, Header
* Title: "expose data and logic for the web of data" → "on the web of data"?
* "Undefined 0 (0) 1" in the header should be replaced (at least journal name)
* Word "Abstract" formatted according to journal template?
## Introduction
* "The Web of data is largely concerned with procuring web applications that publicly display their information by means of metadata and explicit semantics, such that entities from heterogeneous web information systems can be interlinked [17]."
→ I cannot find this statement anywhere in the cited paper. The wording is a bit odd (the web of data "is concerned with" "procuring"?), so it's difficult to discern the intended meaning, but I wouldn't say that the web of data is about "procuring web applications".
* "Best practices of Linked Open Data (LOD) software engineering ... [18]"
→ the cited paper is not about software engineering, but about Linked Data publishing.
* I also don't think that it's fair to say that the Linked Data principles "provide a guide to reengineer legacy web apps" since they are not about reengineering, but about publishing data on the web.
* "providing an existing web application with LOD capabilities" - wording: do you mean "extending"?
par 2:
* "web scrappers" → "web scrapers"
* "diverse middleware" → remove "diverse"?
* "Lots of LD techniques" → "A lot of"
par 3:
* "not insignificant" → "significant"
* "diverse software quality features" → remove "diverse"?
* "particularly concerning with" → remove "with"
p.2, par 2:
* "The research methodology followed for this aim" → "to achieve this aim"?
* "articulated" → do you mean "described"?
* "based on the discernible software architecture of most web applications" → not sure what you mean by "discernible" here.
* "followed by a consolidated discussion" → remove "consolidated"
## Section 2
p. 2, par 1:
* "The architecture of LD applications are discussed" → "is discussed"
* "Alternative patterns have a very low query execution.." → something seems to be missing, "low" what, performance? Also, it would be useful to explicitly name these "alternative patterns".
* "is made up of" (twice in a sentence)→ "consists of"
## 2.1. Reengineering strategies
* "probably because the application" → "typically"
* "on one hand" → "on the one hand"
* A figure (e.g., table) that relates application characteristics (availability of source code, supply of a built-in information exposure facility, disclosure of DB contents) to applicable LDAA patterns would be useful.
p.4:
par 1:
* "The MVC architectural pattern is the most frequent" → "most frequently used"
* "web scrapping" → "web scraping"
item 3:
* "normally designed with" → "typically designed with"
## 3 The EasyData LOD extension strategy
p.4:
* "EasyData is the name of a new approach to LOD extension in order to reengineer legacy MVC-based web apps." - redundant? Introduce EasyData in the previous sentence and remove?
p.5:
* "Since access control to the generated LOD resources should not be granted for everyone"
→ replace "Since" with "If"?
→ do you mean "access ..should not be granted" rather than "access control .. should not be granted"?
* Figure 2 caption: remove "Applicable"?
* remove "therewith the"
* "The EasyData procedure is practicable as long as the application source code is available." → "The EasyData strategy is applicable if the application source code is available."
* "This is granted in open source" → "This is the case with open source software"
* "scrapping" → "scraping"
## 4 Evaluating..
par 1:
* "Two different prototypes" - remove "different"?
* "Each prototype serves the EasyData procedure to be applied" - rephrase wording?
* "such as Ruby-on-Rails and Django" → remove "such as" if you have implemented the approach for exactly these two frameworks.
par 2:
* "How the revealing,... is explained next." → "Next, we explain .."
* Figure 3 caption: not sure if "revealing" the application data model is the correct verb to use here, maybe "mapping", "annotating".. would be more appropriate?
par 3:
* "After revealing and mapping": Again, what is the difference between "revealing" and "mapping"?
p.7 Step 4
* "access control grant can be configured" → remove "grant"
* "generated the previous steps" → "generated in the previous steps"
## 4.1 Comparison..
* "The tools have been selected as long as they fulfill.." → "based on a number of conditions"
## Discussion
p. 10:
* "Therefore external browsers might have not the appropriate authorization privileges.." → "Therefore, external browsers may not have the required access privileges"
* "an unified security access control layer" → "a unified security access control layer"
Comments
Informal feedback from a person who declined to review:
I have read it and I think the content is generally ok and the research solid, but it requires a significant re-organization. A different title would be a good start. The abstract and introduction should also focus more on the authors' work and what it is.
e.g.
EasyData: re-engineering existing apps for the semantic web
In the abstract I am missing statements such as "EasyData does ... is compared to ...".
The comparison to other tools with the SonarQube static analysis fails because of the language differences between the tools. These metrics are not comparable. I think that the explanation about security issues is incorrect here as well.
All in all, as a reader I would expect more explanations about the EasyData approach and how it actually works. Which kinds of MVC apps work? Which languages/toolings? Just Django and Rails?