Review Comment:
There is much to recommend in this paper. The concept behind the work is novel and interesting. The authors have produced an RDF dataset on plants and gardening information, accessible via a SPARQL endpoint. They have also produced an app that acts as an interface to serve and use information from this dataset. The app is evaluated using a number of methods and the authors report how they can learn from the feedback as well as the successful parts of the evaluation. I found it useful to read how the LOD was generated (section 2.2.1), though I have questions over the seeming ease with which plain text was mined to generate triples.
However, there are issues in this paper arising from decisions made in the research. I am not sure that revisions can fix these issues, as they form part of the basis for the work.
There are assumptions in the app design which seem flawed. For example, factors such as watering and fertilizing 'are assumed to be sufficient'. Later on, the assumption is made that the app would only be used on a clear day, not a rainy day. Certainly this app would not be so useful in the UK, where I am writing from! Presumably this assumption would limit the usefulness of the app in Japan as well.
Rain would have implications for (naturally provided) watering, so might the app recommend a plant which shouldn't be watered regularly for a location which attracts a lot of rain? Section 2.3.2 mentions that a 'bioscience researcher whom we consulted confirmed that the conditions listed in the previous section are largely sufficient to serve as the basis for the plant recommendation'. This is not sufficient for this reader: can this bioscience researcher be named and quoted, and can they recommend peer-reviewed sources to back this up? (NB: this comment in Section 2.3.2 should appear inside Section 2.3.1, alongside where the factors are presented, to justify which factors are chosen and which are ignored.)
Many of the factors, e.g. sunlight and temperature, change over time. I cannot see any description of how such changes are taken into account. What if the app is used on a day that is unusually cold, or uncharacteristically cloudy?
I'd also like to understand more about how the data generated for this app follows the principles of Linked Data (beyond the generation stage). What other datasets are linked to, apart from the correlation to DBpedia and the use of DBpedia terms? To what extent is existing LOD collected during generation, or linked to from the GTC dataset? I suspect there are no significant links out to other datasets, which raises the question of why the generated data isn't supplemented by such links.
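To make this point concrete: even a small amount of interlinking would improve the dataset's Linked Data credentials. The sketch below is mine, not the authors'; the GTC namespace URI is a hypothetical placeholder, and only the DBpedia and OWL URIs are real. It simply emits an outgoing owl:sameAs link in N-Triples syntax, the kind of statement I would expect the GTC dataset to publish.

```python
# Hypothetical sketch: emitting an outgoing owl:sameAs link from a GTC
# plant resource to the corresponding DBpedia resource, in N-Triples
# syntax. The GTC namespace below is an illustrative guess.

GTC = "http://example.org/gtc/resource/"   # hypothetical GTC namespace
DBPEDIA = "http://dbpedia.org/resource/"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def same_as_triple(local_name, dbpedia_name):
    """Emit one owl:sameAs link in N-Triples syntax."""
    return "<%s%s> <%s> <%s%s> ." % (GTC, local_name, SAME_AS, DBPEDIA, dbpedia_name)

print(same_as_triple("Sunflower", "Helianthus_annuus"))
```

The same pattern would serve for links to AGROVOC concepts or other agricultural datasets; the absence of such statements is what makes the dataset feel isolated.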
In Section 2.2.1, the authors discuss how plain text is mined to generate SPO triples. Is NLP used to identify subjects, verbs and objects? If not, how are triples identified from (e.g.) 'is' or 'of' constructions? Also, what if a sentence describes many different properties, or makes statements about multiple subjects/objects? How are these retrieved, if the sentence is only parsed twice?
The application seems very Japan-centric: e.g. data generation (beyond the crawl of DBpedia) is focused on 100 plants found in Japan, and Japan Meteorological Agency data is used. While this is not a criticism, the paper should be clearer on this point.
As a small point, it seems strange when papers place the related work section at the end, instead of using it as an introduction to the choices made in the research. On a rather larger related point, I do not see why AGROVOC was not extended as the base vocabulary for this work. It seems highly relevant.
It's a good idea to use some measure of reliability for the sourced data during LOD generation, but PageRank is not an ideal measure for lower-impact or newer pages.
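To make this concern concrete, here is a toy power-iteration PageRank over a hypothetical four-page link graph of my own construction (nothing here comes from the paper). A new page receives few inlinks and therefore scores low regardless of the quality of its content, which is exactly the failure mode I worry about when PageRank is used as a reliability proxy.

```python
# Toy PageRank illustrating the reviewer's concern: the link graph and
# scores below are hypothetical, not from the paper under review.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs) if outs else 0.0
            for q in outs:
                new_rank[q] += damping * share
        rank = new_rank
    return rank

# Three well-established, mutually linking pages, plus one new page
# that only "c" links to.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "new"],
    "new": ["a"],
}
scores = pagerank(graph)
print(scores)  # "new" scores well below the established pages
```

A reliable but recently published gardening page would be penalised in just this way, so the authors should justify PageRank over (or alongside) content-based reliability signals.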
In conclusion - this paper contains an interesting idea and some practical findings of note, but there are limitations introduced by design decisions that are not sufficiently acknowledged by the authors.
Perhaps this paper should be aimed at a different journal, as the Semantic Web content doesn't seem strong enough.