Aemoo: Linked Data exploration based on Knowledge Patterns

Tracking #: 958-2169

Authors: 
Andrea Giovanni Nuzzolese
Valentina Presutti
Aldo Gangemi
Silvio Peroni
Paolo Ciancarini

Responsible editor: 
Guest editors, Linked Data visualization

Submission type: 
Full Paper

Abstract: 
This paper presents a novel approach to Linked Data exploration that uses Encyclopedic Knowledge Patterns (EKPs) as relevance criteria for selecting, organising, and visualising knowledge. EKPs are discovered by mining the linking structure of Wikipedia and evaluated by means of a user-based study, which shows that they are cognitively sound as models for building entity summarisations. A tool named Aemoo is implemented that supports EKP-driven knowledge exploration and integrates data coming from heterogeneous resources, namely static and dynamic knowledge as well as text and Linked Data. Aemoo is evaluated by means of controlled, task-driven user experiments in order to assess its usability, and its ability to provide relevant and serendipitous information as compared to two existing tools: Google and RelFinder.

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Jan Polowinski submitted on 10/Feb/2015
Suggestion:
Minor Revision
Review Comment:

Originality:

Although part of the work has been published before (ISWC 2011), the paper offers a good overview of the application of EKPs in a Linked Data browsing context, using the Aemoo tool, which is available for online testing, as an example.
The parts published before include the work on Encyclopedic Knowledge Patterns (based on Wikipedia page links) and its evaluation in a first user study. The approach of using EKPs for browsing, as well as the description and evaluation of the Aemoo tool in a second user study, are new contributions and justify the publication.

Significance of the results:

The work clearly contributes to browsing UIs for Linked Data. Existing interfaces often fail to simplify and reduce the overwhelming complexity of Linked Data sources for end users. Aemoo shifts the focus from completeness to a clean, if not comprehensive, UI. The necessary criterion for deciding what is shown and what is not (here referred to as the "knowledge boundary") is offered by means of the path popularity defined on top of the EKPs, which describe paths at the class level using most and least specific types instead of evaluating typed links.
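To make the ranking criterion concrete, here is a minimal sketch (Python, illustrative names only) of pathPopularity as I understand it from the authors' earlier EKP work: for a given subject type, a path is a (property, object type) pair at the class level, and its popularity is the share of resources of that type participating in at least one occurrence of the path. The sketch deliberately ignores the most/least specific type selection mentioned above.

    # Hedged sketch of pathPopularity: the percentage of resources of a
    # subject type that participate in a class-level path (property,
    # object type). Names are illustrative, not the authors' actual API.
    from collections import defaultdict

    def path_popularity(triples, type_of, subject_type):
        """triples: iterable of (s, p, o); type_of: dict resource -> type."""
        subjects = {r for r, t in type_of.items() if t == subject_type}
        seen = defaultdict(set)  # (property, object type) -> subjects observed
        for s, p, o in triples:
            if s in subjects and o in type_of:
                seen[(p, type_of[o])].add(s)
        return {path: 100.0 * len(subs) / len(subjects)
                for path, subs in seen.items()}

Paths whose popularity exceeds a chosen threshold would form the EKP (the "knowledge boundary" above); inverting the criterion and keeping the lowest-ranked paths would yield the "peculiar" facts discussed later in the reviews.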
The evaluation was carefully done (from my point of view, with a rather basic statistics background) using a dedicated tool (AemooEval) and methods such as the System Usability Scale and Grounded Theory; the order of tools was switched for half of the participants to avoid a bias.

Quality of writing:

The overall quality of writing is good, but some issues may need clarification; furthermore, a few minor issues such as typos require changes (see below). The structure of the paper is very clear, and sections are well motivated and consistent with the hypotheses and contributions listed in the introduction.

Other issues:

*Disclaimer: some of the issues noted here arose before I read the existing work on EKPs, but since I think the paper should be self-contained, they should still be relevant.*
- Intro > Contribution: method for extracting EKPs "from Wikipedia" vs. "mining the structure of Linked Data" -> be more consistent here? I understood you used DBpedia *AND* Wikipedia, but maybe rephrase this.
- 3rd page, 2nd and 3rd paragraph: "aka KPs" vs. "aka knowledge pattern" (explained twice; the more detailed explanation comes last)
- page 4, end of first column: Is it always possible to find *the* most specific/general type? What about a resource typed with two classes, where one does not subsume the other? Do you handle this as multiple paths in such cases? (See the sketch after this list.)
- Definition 2: threshold value explained twice?
- Fig 1b: ka:has*Property* points to a class (dbpo:Place) instead of a property, which I think is intended. But should it maybe be called something other than "hasProperty"?
- Fig 2, UML diagram: the attribute representation is redundant with the representation via links (skip one in favor of a larger font size?)
- same figure: what is OWL*2*-specific about this? Or could this also read "OWL"?
- Is it really necessary to generate prefixes, or do you just do this for documentation purposes (in this paper)? Why is generating URIs not enough?
- Sect. 3.1, peculiar facts: Maybe you need to explain a bit more why you assume that both paths with pathPopularity above a certain threshold and the peculiar facts are valuable (and only the paths in between are boring). Side note: when I tried to experiment with the curiosity feature, no relations showed up.
- Figures in the second half of the paper often seem to be referenced incorrectly, e.g., there is no Fig. 5.2. This makes it hard to follow the explanations, especially when it comes to the statistical evaluation.
- Related work: related work is listed, but not always compared to Aemoo
- Related work: "multi-faceted" should be the same as "faceted"?
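
On the most-specific-type question raised above, a minimal sketch in Python with hypothetical names shows why a set, rather than a single type, is the natural result: any asserted type that strictly subsumes another asserted type is dropped, and two incomparable types both survive.

    # Sketch of selecting the most specific type(s) of a resource.
    # "subsumes(a, b)" is assumed to mean "a is a (transitive) superclass of b".
    def most_specific_types(asserted_types, subsumes):
        return {t for t in asserted_types
                if not any(subsumes(t, other) and t != other
                           for other in asserted_types)}

With, say, dbpo:Place and dbpo:Organisation both asserted and neither subsuming the other, the result contains two elements, so the resource would contribute to paths for both types.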

Minor issues (typos, etc.):

- introduction, second column, top: shared intensional
- Check usage of "aimed at". The sentence structure is sometimes complicated. For example, the 5th paragraph on the 2nd page could be simplified to: "… a system that helps humans …"
- next paragraph: "two working hypotheses"
- upper/lower case (e.g., in headlines)
- consistent (smaller) font for URLs in footnotes?
- simply say (Fig. 1b) instead of (Fig. 1(b))?
- "a" vs. "an"
- check blank space around footnotes and at the end of sentences
- page 6, second item: missing comma in formula?
- page 6, 2nd column: "cd. Definition 1"
- footnote 17 is incomplete?
- page 12, 2nd column: "On one hand … on the other …" does not make sense to me here. Just leave it out? (But I'm not a native speaker either.)
- same page, last line: row -> raw
- Fig 5, label: "RESTful"
- 5.2: "used ad first" -> "used as first …, as second …"
- page 21, the last sentence is hard to parse -> reorder: "subjects judged the faceted (filter-based) browsing interface of RelFinder useful and awkward at the same time …"
- a few more typos exist, I can send a copy of my printout annotations if desired

Review #2
By Mariano Rico submitted on 06/Mar/2015
Suggestion:
Reject
Review Comment:

(1) originality
This tool (and its parts) has been described in several papers, most of them cited by the authors. The significant contribution of this paper is the user evaluation (less than 1/3 of the paper's length). In my opinion, this evaluation is more appropriate for a User Interaction journal.

(2) significance of the results
Considering only the user evaluation, the experiments are interesting and the methodology seems correct. Maybe SUS is not a fine-grained enough usability test to warrant sophisticated statistical methods (Kendall's W, Cronbach's alpha, Spearman correlation, and so on).
On page 18 it is mentioned that "We did not observed any relevant difference of SUS values between Aemoo and Google", and a discussion about the p-value follows. The method used to compute the p-value is missing (a sketch of one common option follows below).
I also miss the data needed to reproduce the experiments.
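
For reference, SUS scoring itself is standardised (odd items contribute score - 1, even items 5 - score, and the sum is scaled by 2.5 to a 0-100 range), and a typical way to obtain a p-value for a difference between two tools' scores is a nonparametric test. The sketch below (Python with scipy and invented sample data) uses the Mann-Whitney U test purely as an illustration; it is not necessarily the method the authors applied.

    # Standard SUS scoring plus one plausible significance test. The
    # Mann-Whitney U test is an illustrative choice only; the paper does
    # not say which method produced its p-value.
    from scipy.stats import mannwhitneyu

    def sus_score(answers):
        """answers: ten Likert responses (1-5), item 1 first."""
        odd = sum(a - 1 for a in answers[0::2])   # items 1, 3, 5, 7, 9
        even = sum(5 - a for a in answers[1::2])  # items 2, 4, 6, 8, 10
        return 2.5 * (odd + even)                 # 0-100 scale

    # Hypothetical questionnaire data: one list of ten answers per participant.
    aemoo = [sus_score(a) for a in [[4, 2, 5, 1, 4, 2, 4, 2, 5, 1],
                                    [3, 2, 4, 2, 4, 1, 4, 2, 4, 2]]]
    google = [sus_score(a) for a in [[4, 1, 5, 2, 4, 2, 4, 1, 5, 2],
                                     [4, 2, 4, 2, 3, 2, 4, 2, 4, 1]]]
    stat, p_value = mannwhitneyu(aemoo, google, alternative="two-sided")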

(3) quality of writing
Good readability. Some typos but, in general, well written.
Typos:
- page 2. tow --> two
- page 3. Knoeldge --> Knowledge

Review #3
By Aba-Sah Dadzie submitted on 18/Apr/2015
Suggestion:
Major Revision
Review Comment:

The paper describes the use of “Encyclopedic Knowledge Patterns” to guide, within defined boundaries, the exploration and summarisation of heterogeneous, static and dynamic knowledge, including that derived from Linked Data. (Here “encyclopedic” refers to knowledge mined from Wikipedia.) The aim is to manage the challenges due to data heterogeneity, especially across multiple linked datasets and other related data.

The paper, which continues the authors' extended work on EKPs, presents a fairly detailed description of the approach taken, with the aim of verifying two hypotheses: that EKPs provide a cognitively sound base from which to generate summaries about a selected entity, and that they may be used to aid exploratory search by also providing peculiar or serendipitous information about the entity.
The authors describe a tool, Aemoo, implemented to illustrate the utility of EKPs, and carry out a user evaluation to measure its usability compared to Google and RelFinder.

The authors do a good job of presenting the work done, show where this builds on their previous work on EKPs, and end with pointers to future work.
There is one important point, though, with respect to the call: the visualisation aspect of the paper appears to be coincidental or, at best, secondary; the focus is on describing EKPs and how they are built, and on illustrating their implementation in Aemoo. This is not a bad thing, but to be relevant to the issue, a good degree of emphasis should be on the contribution of the visualisation/visual analysis to solving the problem posed. That visualisation for discovery and/or summarisation, or even for presentation of Linked Data, is not addressed at all in the related work bears this out.

In the description of Aemoo, the authors state that it uses a concept map - why was this chosen? What other options, if any, were considered? How does this relate to other similar tools, such as RelFinder? Of the two tools Aemoo was compared to, RelFinder utilises a related visualisation technique, and the authors also stress that the comparison with RelFinder is their focus.
Importantly, how does the visualisation contribute to/influence the discovery and summarisation tasks, in and of itself and compared to RelFinder? The evaluation doesn’t look at the contribution of visualisation, in either Aemoo or RelFinder, to solving the tasks. The comparison again is on the utility of EKPs only - note that this IS important. The comment is simply to highlight the fact that, for this target, the evaluation must also look at the specific contribution of visualisation/visual analysis.

In this vein, I’m also not completely convinced that Google is an appropriate tool to compare with the other two - it presents results as text lists, while the other two have graph layouts supplemented by text. This may have been a confounding factor, especially since time to completion was one of the measures of usability.

Another suggestion: a lot of the work presented in [27] on generating EKPs is repeated in the first half of the paper; this could be summarised at a higher level, which would reduce the length of the paper somewhat.

****

What would be considered a “peculiar fact”? This is not explained until p. 10 - I would suggest at least a forward reference. Secondly, how do you distinguish between expected and peculiar facts in the visualisation?
Further, I’m not convinced about interpreting the presentation of “peculiar/curious” information as serendipitous.

“We assume that EKPs are cognitively sound because they emerge from the largest existing multi-domain knowledge source, collaboratively built by humans with an encyclopedic task in mind.” - true, but it is also easily argued that Wikipedia contains unverified and/or biased articles, as it doesn’t undergo the same type of verification as an encyclopedia such as Britannica.
This is finally addressed, but from a different viewpoint - on the assumption that the participants were sufficiently well informed on the topics and/or could find the answers in Wikipedia (which isn’t really verification of its content). At the least, the authors should point forward to this section. And even with that, I think the point still needs to be addressed further. The fix may simply be to clearly state the assumption of Wikipedia’s correctness.

Evaluation
- why the emphasis on “subjects of different culture and language”? Was this coincidence, or is there a reason for choosing users with differences in culture and language? If so, whatever specific insight was expected as a result should be reported in the findings.
Note also that the term “participants” is now recommended rather than “subjects”.
Also, there is no description of the participants beyond this, which is needed to correctly interpret the evaluation results.

- too many unrelated examples are given. I would suggest giving only examples related to the one used for each task, to illustrate how it could be answered.

- “The user-study was supervised by an administrator, who was in charge ” - administrator is strange here - is this the observer/evaluator, i.e., the person observing the users and, say, taking notes about what was going on?

- “with the so-called axial coding” - “so-called” implies it isn’t really axial coding. Either delete “so-called“ or replace with what was actually done.

Is there any evidence for the statement “RelFinder is one of the most widespread tool in the Semantic Web community supporting exploratory search.”? Otherwise, using it as a baseline for the usability of Aemoo is questionable.
Also, if it is so widely used, how come the users were unfamiliar with it (p. 25)?

“Namely, we wanted to prevent that the answers provided by a certain subject during the first iteration affected the answer provided by the same subject during her second iteration.” - actually, you can’t prevent that, but rather try to normalise for it by doing exactly what was described - splitting the order so that the bias evens out.

What exactly was the challenge in using the faceted browsing in RelFinder? The conclusion that “This suggests that some filtering mechanism should be provided in a transparent way to the users.” may not necessarily resolve this - there is a limit to automated filtering.

“The SUS is a well-known metrics used for the perception of the usability” - why “perception”? This implies it may not really measure usability.

The conclusion that the results show Aemoo to present “the best ratio of relevance and unexpectedness” is debatable. The authors describe these peculiar/curious links as information that is not normally seen as relevant, and inverted the “relevance criterion provided by an EKP” to obtain them. Unless RelFinder and Google both provide this option, or the users were provided with instructions on how to search in this way, the comparison is not fair. The claim should be made on its own, perhaps as a feature adding value on top of typical search.

*** Other points

some auto-links are not properly formatted - they break when clicked on

Some figure labels are incorrect - e.g., Figure 4.2 (should this be 6?)

“five free-text answering questions” - can be referred to simply as “open” questions

AemooEval is referred to as if it were a person or an animate object, e.g., “AemooEval put the system to use …” - not really possible; it was used to put the system to use…

**** a number of grammatical errors and typos - an auto-check and proofread should catch these, e.g.,

instensional - should this be intensional?

“without loosing the overview of an entity” -> “without LOSING the overview of an entity”

“an user” -> “a user”

(i.e., “Used ad first tool”, “Used ad first tool” and “Average”).

“incoming and outcoming” -> OUTGOING


Comments

The correct list of references in the cover letter is:

1. Nuzzolese, A. G., Presutti, V., Gangemi, A., Musetti, A., Ciancarini, P., 2013. Aemoo: Exploring knowledge on the web. In: Proceedings of the 5th Annual ACM Web Science Conference. ACM, pp. 272–275.
2. Nuzzolese, A. G., Gangemi, A., Presutti, V., Ciancarini, P., 2011. Encyclopedic Knowledge Patterns from Wikipedia Links. In: Aroyo, L., Noy, N., Welty, C. (Eds.), Proceedings of the 10th International Semantic Web Conference (ISWC 2011). Springer, pp. 520–536.