PowerAqua: supporting users in querying and exploring the Semantic Web content

Paper Title: 
PowerAqua: supporting users in querying and exploring the Semantic Web content
Authors: 
Vanessa Lopez, Miriam Fernández, Enrico Motta, Nico Stieler
Abstract: 
With the continued growth of online semantic information, the processes of searching and managing this massive scale and heterogeneous content have become increasingly challenging. In this work, we present PowerAqua, an ontology-based Question Answering system that is able to answer queries by locating and integrating information, which can be massively distributed across heterogeneous semantic resources. We provide a complete overview of the system including: the research challenges that it addresses, its architecture, the evaluations that have been conducted to test it and a deep discussion showing how PowerAqua effectively supports users in querying and exploring the Semantic Web content.
Submission type: 
Tool/System Report
Decision/Status: 
Accept
Reviews: 

This is the final version, which has been accepted for publication. Reviews on previous versions are listed below.

Below are the reviews for the first revision.

Solicited review by Danica Damljanovic:

Major comments (related to the misalignment between the main contribution and the evaluation)
which have been raised during my first review are addressed. The
link given in the paper points to the demo which now works and the authors added
additional sections into evaluation which demonstrate the suitability of PowerAqua
to work in the open scenario on the Web through Watson, and the user study.
Most of the other comments
are also addressed.

My main comment in this review is about the presentation of the evaluation section. It should be more explicit and concrete (see some details below). It would
be worth adding (e.g. as the first sentence of each of the subsections)
the goal of the evaluation and the measures used (with definitions if necessary). These currently
exist in the text most of the time (apart from definition), but it would be easier for the reader to follow
if it is clear from the beginning: what is evaluated and using which measures
(and define the measures unless they are obvious which usually is not the case, and
especially in cases when there is a mixed opinion in the literature or if the measures are adapted from some other field such as precision and recall).

The paper still needs to be proofread for many small typos
and the English thoroughly checked. Also, some (especially newly added) sections
could be slightly rephrased
at places for smoother readability, and the authors should be as explicit as
possible especially in the evaluation section. Some pointers are given below in the Detailed review.

---------------
Detailed review
---------------

Section 2:
'However, ontology-based QA systems are able to handle a much more expressive
and structured search space and have proven to be ontology-independant, where, as opposed
to a DB, the information is highly interconnected by the use of explicit relations.'
consider rephrasing: the first part of the sentence seems to be disconnected from the second.
Also, it is implicit in this statement that NLIDBs are not domain-independent, whereas those
NLIDBs developed in the 80s claimed to be easily portable, so maybe leave that out ("proven to be ontology-independant")
in this comparison and highlight the interoperability and reasoning which are the main advantages.

At a second level>>At the second level

footnote 11: use the same formatting for URLs

Section 3:

let's>> let us (avoid abbreviations, there are mentions of "don't" and similar in the paper, these should be replaced with extended versions)

the user query >> the user's query

etc>>etc. (also throughout the paper)

Figure 2 caption: 'every item on the Onto-Triple and the answers are an entry point...'>>rephrase

Figure 1 is mentioned after Figure 2 in text, this should be swapped. Also, it
would be better for a reader if figures would be closer to the text where they are mentioned,
not at the end of the paper as is the case at the moment.

just before Section 4.1:
Second,>>Secondly,

Section 4.1.
First sentence in the second paragraph requires rephrasing (splitting into two?).

footnotes: all should use the same formatting for URLs

The paper should be thoroughly proofread to correct minor typos, e.g. the missing '.' at the
end of the sentence in footnote 21.

"Additionally to time performance optimisation">>"In addition to time performance optimisation"

as a fifth aspect>>as the fifth aspect

footnote 22: 'a subset of DBPedia which in the 2008 version it was just 1GB'> rephrase e.g.
"A subset of DBPedia which in the 2008 version was only 1GB large."

SUS score was classified as good:
include the exact score on the scale from 0 to 100, as 'good' does not give enough detail for SUS.
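
For reference, the standard SUS scoring procedure maps ten responses on a 1-5 scale onto the 0 to 100 range, as in the minimal sketch below. The function name and the sample responses are illustrative only; they are not data from the paper's user study.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) for one participant.

    `responses` holds the ten questionnaire answers in order, each an
    integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs exactly ten responses in the range 1..5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribute answer - 1.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribute 5 - answer.
            total += 5 - answer
    # The summed contributions (0..40) are scaled to 0..100.
    return total * 2.5


# Hypothetical participant, for illustration only.
example = [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]
print(sus_score(example))
```

Reporting the mean of these per-participant scores, rather than only an adjective such as 'good', lets the reader compare against other systems.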

New Section 5.5.2. Using the Watson SW gateway

In the second paragraph the authors mention that the recall cannot be measured. While
this sounds reasonable, the reason for this should be mentioned (size?).
This is followed by precision, precision@1 and recall@1. I have a slight problem here:
if recall cannot be measured, how can you measure recall@1? The explanations in the
brackets are a bit problematic:
"precision: which results are valid from all set of results">> suggested rephrase: the number of correct
results divided by the number of retrieved results.
"precision@1:">>similar rephrase; 'correct' is more explicit than 'valid' in the definition, and also 'all the results ranked at the first place'>
remove 'all'
recall@1> I do not understand this definition, consider rephrasing;
Recall should be the number of correctly retrieved results (1)
divided by the number of all correct results that should have been retrieved (2).
To my understanding, the problem with calculating recall is that you cannot measure (2), but if you want to calculate recall@1, wouldn't that again require (2) in your equation?
The definitions of the evaluation measures should
be stated at the beginning. See [1] and [2]
for the discussion on adapting precision and recall (from IR community) and definitions.
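
To make the suggested definitions concrete, here is a minimal Python sketch. The function names and the toy answer sets are hypothetical and are not taken from the paper; precision@1 is assumed here to mean the fraction of queries whose top-ranked result is correct, which may differ from the authors' intended definition.

```python
def precision(retrieved, correct):
    """Number of correct retrieved results divided by the number of retrieved results."""
    if not retrieved:
        return 0.0
    hits = sum(1 for result in retrieved if result in correct)
    return hits / len(retrieved)


def recall(retrieved, gold):
    """Number of correct retrieved results divided by all results that should be retrieved."""
    if not gold:
        # Without the complete set of correct answers, recall is undefined.
        raise ValueError("recall needs the full set of correct answers")
    return len(set(retrieved) & set(gold)) / len(gold)


def precision_at_1(ranked_by_query, correct_by_query):
    """Fraction of queries whose first-ranked result is correct (assumed definition)."""
    if not ranked_by_query:
        return 0.0
    hits = 0
    for query, ranked in ranked_by_query.items():
        if ranked and ranked[0] in correct_by_query.get(query, set()):
            hits += 1
    return hits / len(ranked_by_query)


# Toy data, for illustration only.
retrieved = ["a", "b", "c", "d"]
gold = {"a", "c", "x"}
print(precision(retrieved, gold))  # 2 correct out of 4 retrieved
print(recall(retrieved, gold))     # 2 correct out of 3 expected

ranked = {"q1": ["a", "b"], "q2": ["z", "c"]}
correct = {"q1": {"a"}, "q2": {"c"}}
print(precision_at_1(ranked, correct))  # only q1 has a correct top result
```

Note that `recall` needs the complete set of correct answers, i.e. quantity (2) above, which is exactly what is unavailable in an open Web scenario; precision and precision@1 only require judging the results that were actually returned.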

I might have already said this, but I will repeat it as I think it is very important: the paper uses precision, recall, accuracy, and other measures.
If possible, the authors should use the same measures throughout. For example, why not use precision instead of accuracy in Section 5.2, where you
talk about 'successfully answered questions'? Does it mean that 'successfully answered questions' are also correctly answered questions? If yes,
then why not use precision here? The language throughout the paper should be as explicit as possible. A reader could easily interpret
'successfully answered' as 'successfully parsed',
which is not the same as a 'correctly answered' question.

the very last sentence of the second paragraph:
with Watson than in previous experiments>>Watson in comparison to the previous experiments

3rd paragraph, the part starting with 'This functionality allows PowerAqua...' until the end of
this paragraph should be rephrased. Especially the part in brackets. I could not understand it.

Section 5.6:
"with higher SUS score.">>> higher than what? Also, other evaluated systems should
be mentioned and their SUS score.

The authors mention that the precision and recall do not capture everything due to the complexity
of queries used in the evaluation. How many questions were asked?
You could add the percentage of queries that could have been
answered by PowerAqua. Something like: xx% of questions that users asked were supported by PA query language
while the remaining XX% were not (disjunction, comparative, etc.).

Were the questions predefined
for each user or were they given a task explanation where they had to
formulate the questions themselves? In the latter case you are also testing the
difficulty of the supported language, in which case precision and recall of around 0.5-0.7 does not sound bad.

The 'evaluation context' and 'evaluation setup' subtitles seem to be missing from Sections 5.5 and 5.6.

Conclusion:
'confidence was the best'>>confidence algorithm was the best..?

references:
[1] L. Tang and Raymond J. Mooney. Using Multiple Clause Constructors in
Inductive Logic Programming for Semantic Parsing. In Proceedings
of the 12th European Conference on Machine Learning, pages 466–477,
Freiburg, Germany, 2001.

[2] Philipp Cimiano, Peter Haase, and Jörg Heizmann. Porting Natural Language
Interfaces Between Domains: an Experimental User Study with
the ORAKEL System. In IUI '07: Proceedings of the 12th international
conference on Intelligent user interfaces, pages 180–189, New York, NY,
USA, 2007. ACM. ISBN 1-59593-481-2. doi: http://doi.acm.org/10.1145/1216295.1216330.

Solicited review by Jorge Gracia:

This paper describes consolidated work, summarizing the results of a consistent and well-founded research line, highly relevant for the community. In my opinion the authors have addressed the review comments well. The article is clearer and the motivation and experimental sections are stronger now. The paper illustrates well the research problems and proposed solutions, and conveys to the reader both the capabilities and the limitations of the system. Thus, I recommend its publication.

This is a revised submission. The reviews below are for the original submission.

Solicited review by Haofen Wang:

This paper describes an ontology-based Question Answering system which can process natural language queries and integrate massive heterogeneous information sources. The idea of this paper is well motivated. The paper is well presented, and experiments confirm the claims.

Solicited review by Danica Damljanovic:

The paper discusses PowerAqua - an ontology-based question answering system
able to answer Natural Language questions by locating and integrating information "which can be massively distributed across heterogeneous semantic resources".

While I really like the problem addressed in the paper and also appreciate the work on the algorithms developed in order to solve the problem (especially as the authors have spent many years doing the research on this topic), I find that the Evaluation section is not aligned with the main contribution quoted above. The paper should be revised to do the evaluation by querying the LOD on the fly and with users, in order to support the current contribution statement. For example, claims that 'PowerAqua effectively supports users in querying and exploring the Semantic Web content' should be accompanied by a user-centric evaluation. The evaluation does make links to the challenges listed in Section 3,
but the evaluation of the main contribution seems to be missing, and here I
refer to the quoted part of the contribution mentioned above. More details are given in the detailed review below.

An alternative solution would be to change the contribution statement (and perhaps the title) and
align it with what is demonstrated by the evaluation. Although this seems an easier solution, I
mark the paper as 'reject and resubmit after major revision' as it seems that
even without repeating the evaluation experiments, the paper would need to be changed significantly (more in the detailed review below).

The paper points to the link: http://technologies.kmi.open.ac.uk/poweraqua/evaluation.html
from where it is possible to access the demo(s). However, all demo links seem broken. Therefore, although this publication is listed under Systems/Tools my review is based solely on
what is described in the paper (and my overall score would be strongly influenced by the possibility to try out the demo due to the category of this paper which lists that as a requirement).

With regards to the clarity of presentation, the paper would be much smoother
for a reader if the sentences would be shorter. Currently, many
sentences are a paragraph long. An example is the first sentence in Section 1. More examples
are listed in the detailed review below.

While I understand that this review sounds pretty harsh, I believe it contains useful information for the authors in order to improve the paper.

Detailed review:
(note that sometimes PA is used instead of PowerAqua throughout this review)

Section 1:

Paragraph 2:
"Although this data growth opens new opportuni-
ties for Semantic Web (SW) applications, diminish-
ing the well-known problems of knowledge sparse-
ness and incompleteness, the diversity and massive
volume currently reached by the publicly available
semantic information introduces a new research
question: how can we support end users in querying
and exploring this novel, massive and heterogeneous,
structured information space?"

- while I like the research question, I have to disagree that 'this data growth diminishes the sparseness and incompleteness'...one of the main
challenges brought by this data growth is indeed incompleteness and noisiness...and 'incompleteness' is mentioned under Challenges
addressed by PA in Section 3 (challenge 6).

Paragraph 3/4:
There is a mention of current approaches not balancing usability and
expressivity at the level of query and thus systems like Swoogle, Watson and
Sindice are mentioned. Authors also point out (correctly) that these systems
accept keyword-based queries, but are not able to handle more complex ones. However,
PowerAqua and semantic search engines have a completely different goal and therefore
not being able to handle complex queries is not a disadvantage of theirs, nor would I agree that they are not balancing
usability and expressiveness - search engines such as Google can probably be considered both
usable and expressive (as usability is measured in the context of whether the
users can finish the tasks they need to successfully and quickly).
While PowerAqua has a goal to locate *the answer* to a query,
semantic search engines are finding *documents* (in this case these are Semantic Web Documents/ontologies/RDF graphs)
which might contain an answer to a query. These two approaches have completely different purposes,
and thus the usability and expressivity should not be compared.

Paragraph 5:
PowerSet, Wolfram Alpha and TrueKnowledge are mentioned under 'approaches which narrow
their search scope to a fraction of the current SW information space' and then under a
subgroup: 'approaches that build and query their own comprehensive factual Knowledge Bases (KBs)';
from these two we can derive that 'these owned factual KBs are part of the current SW information space'..which does not sound correct. Can anybody
apart from the mentioned systems access these comprehensive KBs? Probably not.
My suggestion is that these three systems should be mentioned as a separate group,
not a subgroup of the systems which narrow their scope to a fraction of SW space...
Now reading further, this is nicely distinguished in the Related work section so my suggestion would be to remove this paragraph from the Introduction.

Also, the systems which narrow their scope to a fraction of SW space...what would be
the difference between a system that has a set of predefined URIs as input (or a repository URL where all ontologies are loaded) and
PowerAqua?

"However,
these approaches do only perform a shallow exploita-
tion of the semantic information, which derives on
the need of higher levels of user interaction during
the search process [6] or the retrieval of low accurate
answers that the users have to filter later on."
- what does this mean? Consider rephrasing. It seems to me that some of the mentioned systems do indeed query the Web of Data on the
fly (so similar to the approach proposed by PA), maybe the supported language is then the difference?

Paragraph 7:
In contrast to other similar approaches PowerAqua balances usability and expressivity...
see the comment above related to Paragraph 3/4

Section 2:

This section repeats content from the Introduction. Maybe remove paragraphs 3-6 from Intro
as it seems that Related work gives more insight into similar systems.

Paragraph 3 mentions close domain>>closed domain approaches whose scope is limited
to one or a set of apriori selected domains at a time. Further on it mentions
systems such as ORAKEL, QuestIO, PANTO claiming that 'these systems are not suitable
to open domain scenarios where a massive, heterogenous set of semantic information
should be covered'. This statement is a bit problematic: none of the cited papers
mention this. It is correct that these systems have been evaluated in narrow domains
but this does not imply that none of the systems are suitable to be used in the
open-domain scenarios. For example, FREyA [1] which is a successor of QuestIO has been used
for the experiments with LOD.

"they to not generally provide knowledge fusion and ranking mechanisms to improve the
accuracy ">> consider rephrase

Paragraph 8, which is also a contribution of the paper:
"Aiming to go one step beyond the state of the art,
PowerAqua answers queries by locating and integrat-
ing information, which can be massively distributed
across heterogeneous semantic resources. Unlike the
previously presented close domain applications and
approaches that rely on their own semantic resources,
PowerAqua does not impose any pre-selection of pre-
construction of semantic knowledge, but rather ex-
plores the increasing number of multiple, heteroge-
neous sources currently available on the Web."
- the evaluation mentions using Virtuoso and Sesame for this purpose as well as
Watson. While using Watson seems like 'querying on the fly', using Sesame
and Virtuoso as repositories makes PA very similar to other approaches (as
these repositories need to be deployed somewhere and then the ontologies need
to be loaded into them). The difference however is the size of the ontologies
used for the evaluation so maybe that is something to point out. Also, maybe
it is feasible to do the evaluation with Watson only in order to prove that
this statement is indeed correct.

Section 3:
A list of 6 challenges is given. Challenge 5 is Heterogeneity. It is
not clear how heterogeneity arises 'from the use of different ontologies, but also within the same ontology'.
What kind of heterogeneity is this? An example might be useful here.

Section 4:
typo:
>>
Step 3: while it is mentioned that linguistic triples are generated from a query, it
might be worth pointing out here that these are *NOT* RDF triples.

Point 4 mentions that 'the Semantic Validation Component builds on techniques
developed in the Word Sense Disambiguation community'...which techniques? it would
be worth pointing this out here or later in the section dedicated to description of
this component.

Point 5:
"Several filtering heuristics [5] are integrated
in this component to limit set of candidates for
computational expensive queries."
- it would be nice to list at least some generic heuristic principles here instead of
just pointing to the other paper. Now reading further, it seems that some of these
are listed in the dedicated section. So, how about mentioning here 'more details in Section xx' and
then in Section xx referencing the paper.

The beginning of the Section 4 lists 6 steps. The first two steps are of a
different nature from the last 4 which are concerned with the query processing. Authors
might want to consider removing them from this list. Also, it is stated that:
"Each of the aforementioned components can be
considered a research contribution on its own.", why is Step 2 (query interface) a research contribution?

Section 4.1:
This section should be considered for major revision.
To my understanding, PA uses Watson to access ontologies published on the Web.
In addition, it is able to connect to the repositories (Virtuoso or Sesame) which have been
deployed somewhere, and then datasets from LOD are loaded. This means that PA then queries the pre-specified set
of datasets when used with Virtuoso or Sesame, which is not in line with the main contribution statement.

Section 4.2:
- there is discussion on usability and expressivity, and also 'simplicity' of the interface;
who are the intended users of PowerAqua? Simplicity can only be discussed in the
context of who is going to use the system. My suggestion would be to avoid statements
like 'simplicity of the interface' unless this is tested with the users.

Figure 2: all scores seem to be 0. Maybe choose another example where ranking
shows some scores (or add an explanation why these are all zeros). This figure is
not easy to read on the printed paper.

"Once the query has been thrown to the system" >> consider revising

Section 4.3: similar as above, highlight that these are not RDF triples

an specific role >> a specific role

4.3:
with vary in length >> which vary in length?

4.4:
"entities that compose the semantic resources" >> semantic resources?
- how do you decide how many terms/synonyms to take from Wordnet?

music gene>>music genre?

Section 4.4. describes how rock is disambiguated to refer to a 'music genre' and not to 'stone'.
This description should be revised to be more clear: from the description it seems that no context
is used for this disambiguation? also, what happens if one term refers to several URIs
all with the same local names but different namespaces?

these tables are known Entity ... >>these tables are known as Entity ...

4.5:
"Triple Mapping Component also called Triple Similarity Service": it might be
easier for a reader to follow if only one of these terms is used

moviedatabase ontology: is this really the name? or is this IMDB (movie database ontology)?
Add reference.

Paragraph 2
" the QT 1 is mapped by the TSS with the
OTs and
found in the DBPedia and the moviedatabase ontol-
ogy respectively ..."
is a bit hard to follow due to the same formulation used in QTs and OTs.
To my understanding OTs contain actual namespaces such as dbpedia:actor
while QTs would be just 'actor' as referring to the word which is not yet
linked to any ontology resource.

'Answers obtained from clearer mappings are ranked first'>> what is the 'clearer mapping'?

Section 5:
"In this section we present the four main evalua-
tions conducted to test PowerAqua. For each evalua-
tion we report: a) the context in which the evaluation
was conducted, b) the evaluation set up and, c) the
lessons learned. In addition we present in this paper
the latest PowerAqua's evaluation, focus on assess-
ing the performance of its algorithms using different
semantic storage platforms."
- does this mean then that you are presenting 5 evaluations? Also, it would
be nice to list here the evaluation measures. What is performance? Precision, recall?

- 5.1: evaluation of PowerAqua as a query expansion component to an IR system. This evaluation might be very
nice in the context of some other work (paper?) but not in the context of the paper
which addresses the problems such as finding the answer to a NL query in the
set of ontologies linked on the Web. Also, while there is a mention of 20% improvement of the precision in
comparison to the baseline, were there any trade-offs, such as the execution time?

Section 5.2 presents the evaluation with questions generated by users familiar with SW. However,
the 'semantic information space' was not the whole semantic web space but a subset which is preloaded
into a repository and then queried. While I can understand the reason for this (to make results reproducible)
what I would really like to see is the questions run against the whole semantic web space (as that feature seems to
be the main contribution of the paper which I really appreciate).
Further on, it is mentioned that it was hard to assess failures of PowerAqua in the evaluation from Section 5.1.
But which failures, if all of the 20% of queries that were covered by ontologies
had improved results using PA as a query expansion tool?

- it is stated that questions for evaluation were generated by users who ensured that
the answer is covered by at least one of the ontologies in the information space. How did they
check this? Using SPARQL queries? It would be interesting to know how long it took them to generate
these questions as it will probably motivate the work even more.

"The average query answering time was 15.39 seconds, with some queries being
answered within 0.5 seconds."
- how about the maximum query time? That would be interesting to know and also
if it is long, why it was longer than for the others?

" Secondly, while the distinctive feature of
PowerAqua is its openness to unlimited do-
mains, its potential is overshadowed by the
sparseness of the knowledge on the SW. To
counter this sparseness, the PowerAqua algo-
rithms maximize recall, which leads to a de-
crease in accuracy and an increase in execution
time. " >> consider revising, I do not understand this;
how does PA maximise recall while the accuracy is decreased? Also, would
'sparseness' exist as a problem in the real scenario for PowerAqua where it queries the
whole Semantic Web space?

In Section 5.2 the results are reported as accuracy while in 5.3 you used precision and recall.
It would be nicer if you would start with one measure, preferably precision and recall.

Section 5.3:
'such as SWETO or the DBPedia'>>such as SWETO and the DBPedia
"As queries we collected a total of 40 questions selected from the PowerAqua website"
- how did you collect these questions? Who asked them? How are they related to the PowerAqua website?
The majority seem to be related to geography so I might have missed something.

Section 5.3 presents the evaluation of the merging and ranking algorithm. It is stated in the paper that this algorithm was not available when doing the evaluation in Section 5.2. In order to really
see the effect of this algorithm, why not repeat the same experiment from Section 5.2, only with this component
enabled? The current evaluation setup adds to the set of ontologies being queried but also to the set of questions. Therefore it
is hard to assess the real effect of the merging component.
" An
interesting, observed, side effect was that, answers to
some questions that were distributed across ontolo-
gies could only be obtained if the partial results were
merged. "
- Nice, but how about showing the table with and without the merging component? It will be easier to see the difference.

5.4: the evaluation reports the repeated experiment from 5.3 with the only difference being
that the DBPedia dataset is added to the semantic search space. However, in 5.3 it is
stated that:
"To represent the information
space, additional metadata was collected with respect
to the previous experiment [22], up to 4GB, includ-
ing very large ontologies, such as SWETO or the
DBPedia. "
- this now confuses me as it seems that DBPedia was used in the evaluation 5.3. as well, what was then different in the evaluation
presented in 5.4?

'fusion algorithm'>> it would be nice to introduce this as a term somewhere earlier in the paper - maybe at the parts where
you talk about " knowledge fusion and ranking mechanisms"; is this the algorithm implemented in the 'merging and ranking component'
evaluated in Section 5.3?

due two>>due to two

"The average number of valid an-
swers obtained after applying the fusion algorithm,
which has a precision of 94% [4], increased from 64
to 370 when the DBpedia dataset was used."
- I am a bit confused by this statement. Does this mean that recall is increased when using DBPedia while the precision was not affected?
I would strongly recommend that at the beginning of the evaluation section you give definitions of precision and recall and then use only
them to report the results.

It is mentioned that the execution time is increased due to adding DBPedia, and also
that the authors then considered using Virtuoso. But which repository was used in this experiment?
Sesame is mentioned in Section 5.4. but this should be mentioned earlier.

the address>> to address

5.5:
introduction or Virtuoso>>of Virtuoso

6:
While the main contribution of the paper is the "deep exploi-
tation of the massively distributed and heterogeneous
SW resources to drive, interpret and answer the us-
ers' requests."
which I really like and appreciate, the evaluation is done in the closed experiment
where all datasets are loaded into one repository (Virtuoso and Sesame). This seems
not to make any difference from other approaches and what I would really like to see
(even with a very slow performance) is the evaluation with the semantic web on the fly
as this is what seems to be proposed in the paper. If this evaluation is still far
from achievable then the main contribution of the paper should be revised to be
in line with the results reported in the evaluation (maybe focusing on the size
of the dataset as this seems to be a positive side of the approach in comparison
to others?).

All figures are placed at the end of the paper while it would be nicer to have them at
the current placeholders.

References
[1] D. Damljanovic, M. Agatonovic, and H. Cunningham. Natural
language interface to ontologies: Combining syntactic analysis
and ontology-based lookup through the user interaction. In
Proc. ESWC-2010, Part I, LNCS 6088, pp. 106–120. Springer,
2010. http://gate.ac.uk/sale/eswc10/freya-main.pdf

Solicited review by Jorge Gracia:

Summary of the paper:
This system paper offers a comprehensive view of PowerAqua, an ontology-based question answering system. The system receives queries posed in natural language which are analysed linguistically and translated into semantic triples. These triples are mapped to equivalent or related statements discovered in a pool of ontologies. Underlying data satisfying the query are retrieved. After a merging and ranking process, the final answer is given back to the user. The paper motivates the interest of such a system in a "massive and heterogeneous information space", describes its architecture, and presents various evaluations.

Comments:
This paper describes consolidated work, summarizing the results of a consistent and well-founded research line. In my opinion, this approach is highly relevant for the community and has many ingredients for being a killer application in the near future. The paper is clear, illustrates well the research problems and proposed solutions, and conveys to the reader both the capabilities and the limitations of the system. Here are some comments that could enhance the paper.

- The system deals with some LOD semantic datasets, although it needs to store them locally in Virtuoso or Sesame repositories. This is an important issue and, in appearance, a practical limitation. Some questions arise, such as why not keep these datasets online and access them by using SPARQL endpoints (when available) or by other means (like www.sindice.com). I guess this is for achieving real-time performance, although I would appreciate more details in the paper justifying this design decision.
- There is some overlap between sections 1 (introduction) and 2 (related work). I recommend moving (and merging) the three paragraphs starting "As an example..." from section 1 to section 2 in order to improve readability.
- Some citations (or footnotes) are missing in the paper for TREC QA, WordNet, TAP ontology, and the WSD techniques mentioned in 4.4
- Section 3. The research challenges 3 and 5 seem to be highly related and can be merged.
- Research challenges 4, 5, and 6 are introduced as a consequence of the emergence of LOD. They are however not specific of LOD in particular but affect the Semantic Web in general (though LOD exacerbates them).
- Section 4. The numeration of the component is not aligned with the numbers in Figure 1 (e.g., 1 is "semantic storage platforms" in the text and "linguistic component" in the figure). I recommend to follow the same numeration in both places for the sake of clarity.
- Section 4. Item 4 of the architecture should be more concise, following the style of the other items (e.g., some details provided later in section 4.4 could be omitted).
- Section 5.2. The five items included in "lessons learned" are conclusions that could be moved to the end of Section 5, owing to the fact that they are general enough and seem to be relevant not only to the experiments in 5.2 but to the whole section.
- The authors mention the suboptimal performance of current ontology repositories as an issue that limits PowerAqua capabilities. More details (or a reference to them) would be appreciated. A better insight on the behaviour of such repositories is an interesting result by itself and might be useful for many researchers.
- In Section 5.3 the ranking algorithm is mentioned in the plural, stating also that "the best ranking algorithm was able to...". Was more than one algorithm tested (different from the method described in 4.6)?
- The headers of table 1 are not significant enough. In particular "after dbpedia" and "after virtuoso" (what they really mean is accessing dbpedia in sesame or in virtuoso).
- The impact of the system has to be emphasized in the paper in order to fully accomplish the requirements of a system paper. For instance, some details could be added about other research lines influenced by PowerAqua or reusing (or planning to reuse) some of its components. Also some metrics about the visits/usage of the PowerAqua web application (if available) could be added.
- The paper is clear and well written. Nevertheless, I discovered a few typos. E.g.:
First paragraph in 4.1: "...including: the Watson SW gateway, b) Virtuoso..." -> "...including: a) the Watson SW gateway, b) Virtuoso...".
Second paragraph in 5.3: "Precision and recall, where selected as evaluation metrics" -> "Precision and recall were selected as evaluation metrics".
Third paragraph in 5.3: "...the ranking algorithms was able..." -> "...the ranking algorithms were able...".
Last paragraph before section 5.5 "The address the suboptimal..." -> "To address the suboptimal...".
