Semantic Turkey: A Browser-Integrated Environment for Knowledge Acquisition and Management

Paper Title: 
Semantic Turkey: A Browser-Integrated Environment for Knowledge Acquisition and Management
Authors: 
Maria Teresa Pazienza, Noemi Scarpato, Armando Stellato and Andrea Turbati
Abstract: 
Born four years ago as a Semantic Web extension for the web browser Firefox, Semantic Turkey pushed forward the traditional concept of links&folders-based bookmarking to a new dimension, allowing users to keep track of relevant information from visited web sites and to organize the collected content according to standard or personally defined ontologies. Today, the tool has broken the boundaries of its original intents and can be considered, under every aspect, an extensible platform for knowledge management and acquisition. The semantic bookmarking and annotation facilities of Semantic Turkey are now supporting just a part of a whole methodology where different actors, from domain experts to knowledge engineers, can cooperate in developing, building and populating ontologies while navigating the Web.
Submission type: 
Tool/System Report
Responsible editor: 
Decision/Status: 
Accept
Reviews: 

The manuscript available from this page is the final to-be-published version, following a (second) resubmission after a "conditional accept pending minor revisions" decision.

The first reviews below are for the original version, resulting in a "reject and resubmit" decision.

Then follow the reviews for the first resubmission, resulting in a "conditional accept pending minor revisions."

Review 1 by Roberto García for original submission:

The paper addresses a very interesting application of Semantic Web technologies to web page annotation, and to ontology editing also from web sources.

The main caveat is that the paper is hard to read, as it tries to present the application from too many points of view and to show how many things the tool is capable of. It would be really appreciated if the authors concentrated on the key issues and features, especially taking into account the intended audience of the journal.

In this respect, and also considering that the paper is almost 5 pages over-length, I recommend a major revision that makes it shorter and more focused. For instance, I would recommend removing section 7.2. It is an interesting section for an ST developer, but the natural place to look for it is the developers section of the ST website.

Another global recommendation is to try to simplify the wording and to write shorter and more concrete sentences (some examples of this below).

Now, going into more concrete comments:

Abstract: "...a whole methodology where different figures,..." --> actors instead of figures makes the sentence clearer.

1. Introduction, 1st par, 1st sentence: "...aver and ever..." --> fix the typo, although the phrase could simply be removed to make the sentence clearer.
Introduction, 1st par, last sent: "on the Web, Open Data" --> not clear, "Web of Open Data"?

1. Intro, 2nd par, 1st sent: the whole paragraph is just one sentence, which is very hard to follow. At least break it at "second", e.g. "technology-savvy users). Second,...".

2. Related Works --> "Related Work"?

2. Related Works --> This section enumerates many tools and initiatives that relate to the proposed work along different dimensions. However, they are mostly just enumerated and described, so a more detailed comparison highlighting what ST adds with respect to them would be helpful. Moreover, this information should be presented in a more structured way.
E.g. use subsections for the different kinds of tools analysed (ontology editors, visualisation, annotation,...). If it is feasible, a summary table showing tools and their features would be nice. Finally, section 2 ends with an overall justification of ST in relation to the presented tools, based on ST providing all these features combined in one tool. It would also be useful to have a more general justification of why users would want to have all of them combined.
In other words, annotation is a clear end-user task and has already proved its usefulness. However, is it practical, and especially scalable, to build ontologies while annotating web documents? Are there user studies supporting this? Or are these ontology editing features just a way to tweak existing ontologies, or those generated automatically by analysing a text corpus (beyond web documents)?
In fact, this scenario (annotation combined with ontology editing) could be presented and motivated in the intro and used as the driving force throughout the paper, e.g. when comparing to other alternatives or when discussing user evaluation.

3. Motivation, par 1: "partitive semantics"? --> reference please (or elaborate this concept)

3. Motivation, par 2: the justification for not keeping a reference to the precise position of the annotation in the document is not solid. In fact, it would be possible to cache the document and thus offer added value to users, with annotations that include timestamps and the version of the document at that point in time.
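The caching alternative suggested in this comment can be made concrete with a small data structure. The following is a minimal Python sketch, not Semantic Turkey's knowledge model: the class `VersionedAnnotation`, the functions `annotate` and `is_stale`, and the use of a SHA-256 digest of the cached page as its version identifier are hypothetical choices made only for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class VersionedAnnotation:
    """Hypothetical annotation record tied to a cached snapshot of a web page."""
    page_url: str
    annotated_text: str
    retrieved_at: datetime  # when the snapshot was cached (UTC)
    snapshot_sha256: str    # fingerprint identifying the cached version
    start: int              # character offset of the text in the snapshot
    end: int                # offset just past the annotated text


def _digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def annotate(page_url: str, page_content: str, text: str) -> VersionedAnnotation:
    """Locate the first occurrence of `text` in the cached content and record it."""
    start = page_content.find(text)
    if start < 0:
        raise ValueError("annotated text not found in the cached page")
    return VersionedAnnotation(
        page_url=page_url,
        annotated_text=text,
        retrieved_at=datetime.now(timezone.utc),
        snapshot_sha256=_digest(page_content),
        start=start,
        end=start + len(text),
    )


def is_stale(ann: VersionedAnnotation, current_content: str) -> bool:
    """True if the live page no longer matches the snapshot the annotation refers to."""
    return _digest(current_content) != ann.snapshot_sha256
```

Keeping the character offsets together with a fingerprint of the cached snapshot preserves the precise position of the annotation while still letting a tool detect that the live page has changed since the annotation was made.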

5. User Interaction, Fig. 2: when adding a new value for a property of an individual, if the property is a DatatypeProperty, why isn't the "ask for language" step included in that branch of the activity diagram?

6. Knowledge Model: a figure showing an annotation example would be helpful. Moreover, the name of the concept corresponding to the annotation of a piece of text, i.e. TextualAnnotation, is quite confusing. It suggests that the annotation is based on a text string, rather than that what is annotated is text. The other case, ImageAnnotation, is much clearer. An alternative might be TextAnnotation, but AnnotatedText might be even clearer...

8. Conclusions, par 1: Complex wording, too many "as", "as well as", "and", "or as",... and then ":" and the phrase continues...

8.1, par 1: Some details about UIMAST and STIA, in the context of exemplifying the extensibility and applicability of ST, would be appreciated.

8.2 User Evaluation: user interaction is the key issue for a tool intended for end users. Consequently, much more effort should be put into this section, and it should become a main section rather than a subsection of the conclusions. A questionnaire is clearly not enough; supervised tests with users should be conducted. It is fine to recruit students (e.g. computer science students) as users: they are clearly a user profile and can provide a lot of feedback about the system. Other user profiles (domain experts, etc.) should also be analysed.

8.2, par 6: "wrt" --> too informal. Moreover, the whole paragraph is just one sentence, which is hard to read. E.g. make a first break after "...positive feedback)" and continue with "However, some of the users...". Make a second break at "...other than instances". Then, the fragment "outside of the questionnaire" is redundant; you can keep "This feature has been requested explicitly...". Finally, "in need of providing training" --> "who need to provide training".

8.3, par 1, 1st sentence --> "Probably"? Future steps should be clear. Moreover, the whole sentence could be more concise and clear, e.g.: "Future work concentrates on making ST an ontology development tool, in addition to its annotation features". This point should be made clearer throughout the whole paper, as it seems to be the added value. In any case, as mentioned before, it should be demonstrated that ST can become an ontology development tool capable of scaling, and that web page annotation is a practical scenario for users to develop a full ontology, not just to instantiate or tweak it.

8.3, last par --> The relation to the Semantic Desktop is not clear; it seems a little forced...

Review 2 by Leo Sauermann for original submission:

I accept this paper because it's a tool and it seems to work (I tried it).

I only accept it because the journal explicitly said that neither an evaluation nor "beyond the state of the art" is needed. I have seen DBIN, OpenIris, and PiggyBank go into oblivion, and I sense that Semantic Turkey will follow; disprove me.

Personally, I question the overall approach because of many weak points, both technical and in terms of usability.

BUT you still seem to have a growing user base for the tool, and Turkey WORKS (I tried it). So although I don't like it personally, I wish you much success with Turkey. Now read on for a very critical review, which may help you to simplify Turkey, or which may make you angry at me; as this is a non-anonymous review, I am happy to receive comments.

* There is no evaluation. Although the tool may be used, there is no scientific record of that use. This, together with the Protege-like complexity, and my own experiences using the tool, makes me wish for an end-user evaluation of the tool for a specific purpose and user group.
* Your code would be obsolete if you had just integrated Protégé into Firefox. The user interface first completely replicates Protégé, then adds one or two features (namely, drag-and-drop).
* That also means the tool is as usable as Protégé, which is not always considered user-friendly. Given that it is not as mature as Protégé, it is even less usable (for example, Protégé allows setting the label property per class).
In the conclusions you mention extension mechanisms; these are comparable to PiggyBank's "scrapers". PiggyBank is pretty much dead now, so why do you think Turkey will survive any longer?

related approaches
* Revyu.com as semantic bookmarking service may have been interesting as another 3rd party tool.
* NEPOMUK.semanticdesktop.org, especially dev.nepomuk.semanticdesktop.org. This is an extensible, OSGi/RCP-based Semantic Desktop browser (a prototype).

I also question the need for your ontology:
* The annotation ontology is a replication of the revyu.com ontology, the tagging ontology, and many other ontologies.
* The semantic desktop ontologies may be interesting to you: www.semanticdesktop.org/ontologies. You refer to them at the end, but did not use them.
As semantic web people, you must reuse ontologies. Adding a figure explaining how your ontology sits on top of other ontologies and the data may also help.

The implementation details are excessive:
* cut section "JavaScript API for accessing Ontology Services"

Looking at Figure 4 (Architecture), and having analyzed comparable systems before (CALO OpenIris, PiggyBank, DBin, NEPOMUK Java, Gnowsis 0.9), I see an apparent joy of overengineering. You may soon experience a loss in productivity and developer motivation, as all the other listed systems did in the past.
The "flexibility" you bought is the Achilles heel: it's too much flexibility for too little apparent use of the tool. But alas, it's up to you to go further and disprove this assumption of mine.

My personal comment is: simplify the architecture into half that complexity to attract more non-academic open source developers.
Using OSGi is a leap of faith here; I guess that it makes JUnit testing and ANT building even harder than in usual Java projects. You gain only slight flexibility, which you could also achieve using Spring. I bet that this will come back to bite you soon through a drop in productivity and developer motivation (see DBIN, for example).

As an alternative to OWL ART, you can also check out RDF2Go:
http://semanticweb.org/wiki/RDF2Go
We use RDF2Go as a Jena/Sesame abstraction layer in our very popular Aperture toolkit: aperture.sf.net

In "future work" you hint that you may want to add SKOS and full OWL. I rather propose to remove half the existing features and make this work better for a precise task.

This is wishful thinking: "The presented architecture ... could naturally aspire to a collaborative framework allowing knowledge engineers and domain experts to exchange information". The example of DBin already proved that a complex architecture alone does not get you anywhere. In my very nitwitty opinion, it would be more important to create a scenario where the system is usable for a certain user group, and then to evaluate the system with usability or performance metrics.

DETAILS

Usability:
* the registration process asks for a base URI.
It should give a default URI.

* ontology search must include instances

Your classification of Haystack is arbitrary:
"Haystack is more an ... Information Management System ... then a general Knowledge Management and Acquisition system".
This is obviously wrong, as you will not be able to come up with a proper distinction between "Information" and "Knowledge" in this regard.
Cut your classification and just say what Haystack does, which you already do quite well.
"Extension for Eclipse" - I doubt that. Earlier versions were not much related to Eclipse, later they used the Rich Client Platform (RCP) and not the IDE.
The argumentation that users wish to stay within their web browser and not switch to haystack is correct.

Google notebook is deprecated by now.

This sentence is wrong and must be changed:
"Standing on top of mature results from research on Semantic Web technologies, such as Sesame [27] and OWLim [28] as well as on a robust platform such as the Firefox web browser, Semantic Turkey differentiated from other existing approaches ... [...14]"

Reference 14 (PiggyBank) is far more mature than Turkey when it comes to usability and clear benefit for end users. It supports collecting RDF data with fewer clicks and fewer options compared to Turkey. Also, PiggyBank uses exactly the same technologies underneath (Sesame, Firefox), and thus the comparison is also wrong.
PiggyBank is deprecated, but still, as a prototype, it is more user-friendly than Turkey for non-Semantic Web experts, and thus more mature in that aspect of user-friendliness.
I used it myself for some time and followed the papers by the authors.

TYPOS:
...which has been developed inside the homonymous IP project co-funded...
-> IP stands for "integrated project"
-> I suggest writing "homonymous integrated project co-funded"....

Review 3 by Knud Möller for original submission:

SUMMARY:

The paper presents Semantic Turkey (ST), which is a hybrid tool for semantic bookmarking of Web pages as well as a basic ontology modelling tool. ST is implemented as a Firefox extension, and can itself be extended at a wide variety of plugin points. Users are presented with additional panes in Firefox which show the usual tree-based ontology hierarchy. Text snippets can be dragged onto classes to create new instances, or onto existing instances to create new relations. Ontologies can be loaded, extended, manipulated, queried, etc., all from within Firefox.
The tool is freely available, easy to install and can (from someone with experience in SW-based knowledge modelling) be used intuitively without reading a lot of documentation first. However, it's still rather buggy and could benefit from a much better UI (see below).
The paper is quite detailed (in fact, it's over-length, see below). It covers a good bit of related work (but not all of the relevant work, see below). The motivation is kept quite short and mainly focussed on differentiating ST, the semantic bookmarking tool, from semantic annotation, which focuses on text rather than knowledge (according to the authors). I find this too weak and unconvincing: who is the target audience of this tool? Simple bookmarking or tagging works for casual users because it is simple; the semantic bookmarking proposed here is quite complex and way over the heads of most users (I fear). So, are you targeting experienced knowledge workers? In fact, I'm not even sure if turning a Web browser into an ontology editing environment is a good idea. There is a lot of room for discussion here.
The paper then progresses to describe the user interaction model, which is claimed to hide a lot of the complexity of the knowledge modelling process from the user through interaction patterns such as drag&drop. I partially agree with this, though I still feel the application is much too heavy for ordinary end users.
The knowledge model of the tool is presented briefly, followed by a long section on the architecture and extension mechanism. For me, this section was pretty much superfluous and should probably be cut down to a minimum. I don't see the innovation of ST in its architecture, and I don't really care about each individual component. This should not be the focus of the paper at all. Similarly, the extension mechanism is nice, but I don't need to know in detail how it works and which methods are called. Instead, I would like to know how the extension mechanism has been used beneficially. What is an interesting use case showing what it is good for? This is hinted at in the discussion of the paper, but should be extended.
The paper has very little to no evaluation. In fact, the authors say that they are "currently conducting [an] evaluation", which is very disappointing. Much more than architecture and implementation details, I would like to know if ST is actually considered useful by a large community of users (the authors claim that the tool has hundreds of users in the "note for reviewers" at the end of the paper).

All in all, I consider ST an interesting and useful tool (albeit not really mature, but rather at an alpha stage), which is adequately presented in the paper. However, I'm missing a clear message in the paper (what is the main contribution?), and the missing evaluation is bad. In detail:

- the paper needs to decide what it wants to focus on. My advice: cut the architecture details. They may be interesting for software developers, but they add nothing to the discussion of Semantic Web tools in particular. You can save some pages here.
- the paper needs to motivate better who this tool is actually meant for. I don't see ST as an end user tool; it is much too complicated for that. So, who will use this, how and why?
- the related work section is incomplete in some respects, and too detailed in others.
- the tool itself works well enough, but needs a lot of massaging and fine tuning.
- the evaluation is poor, which is a major drawback of the paper. In order to show that ST is indeed a mature tool, a better evaluation is indispensable.
- the language is ok, but riddled with spelling and grammar mistakes (see below). If accepted, it needs a very thorough revision.
- the paper needs to be shortened; ST doesn't warrant 15 pages. I think there is a lot of potential to remove irrelevant sections.

FORMALITIES:

The paper is over-length, but this has been discussed with the Journal editor beforehand. However, I don't think it needs to be this long at all!

TRYING OUT SEMANTIC TURKEY:

- ST still seems to work ok as an alpha version. It could actually be a pretty neat tool if the UI was improved a lot. As it stands, it's still quite buggy and inconvenient, and not really mature, as asked in the CfP. I tried it out on Firefox 3.6.6 on Mac OS X 10.6.4.

- Installation worked perfectly. Big plus here!

- I found it harder than necessary to import vocabularies from the Web: either ST doesn't follow redirects, or it doesn't properly request RDF from the server. E.g., I tried to import FOAF and SWC through their namespace URIs. This didn't work. ST seems to request the HTML documentation documents and consequently barfs. It _should_ request the RDF representation instead! In the end I had to point ST to the URI of the actual RDF files, which should not be necessary at all.

- To annotate an existing person instance with its employer, I dragged the name of the employer on the person. This prompted me with a panel from which I could select a property. I did so and clicked "annotate instance". Nothing happened. Then I did what every user does and clicked the "annotate instance" button repeatedly (the computer usage equivalent to talking VERY LOUD to someone who doesn't speak your language). Nothing happened. Eventually I clicked "cancel". And _then_ I got a new panel to select a class to assign to the new employer instance. Once I finished this panel it immediately showed up again, as often as I had clicked "annotate instance" before. Obviously a bug.

- classes in the ontology browser can show up several times (possibly because a class was defined once as an owl:Class and once as an rdfs:Class)

- the resize handlers never seem to work right

- if new instances are based on links, their names are bizarrely chosen as the escaped URI of the link. Why not the text of the link?
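The namespace-URI import problem reported in this list is a matter of HTTP content negotiation: a client that wants RDF should say so in its `Accept` header and follow the redirects that vocabulary servers commonly issue. The sketch below uses only Python's standard library and is not Semantic Turkey's code; the helper names `build_rdf_request` and `fetch_rdf` and the particular preference order of media types are assumptions made for illustration.

```python
import urllib.request

# Assumed preference order: RDF/XML first, then Turtle, then N-Triples.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, application/n-triples;q=0.8"


def build_rdf_request(namespace_uri: str) -> urllib.request.Request:
    """Build a GET request asking the server for an RDF serialization, not HTML."""
    return urllib.request.Request(namespace_uri, headers={"Accept": RDF_ACCEPT})


def fetch_rdf(namespace_uri: str, timeout: float = 10.0) -> tuple[str, str, bytes]:
    """Dereference a vocabulary URI using content negotiation.

    urlopen follows HTTP redirects (e.g. 303 See Other) by default, so the
    final URL may differ from the namespace URI that was requested.
    Returns (final_url, content_type, body).
    """
    req = build_rdf_request(namespace_uri)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.url, resp.headers.get("Content-Type", ""), resp.read()
```

For example, `fetch_rdf("http://xmlns.com/foaf/0.1/")` would ask the FOAF namespace URI for an RDF serialization. The returned content type should still be checked, since a server is free to ignore the `Accept` header and send HTML anyway.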

REFERENCES TO RELATED WORK:

- when you discuss browsing of SW data, you absolutely have to mention linked data browsers such as the Tabulator, Marbles, etc. In general, your references are a bit old; there has been newer work in the area!
- regarding semantic annotation, you should not forget to mention semantic wikis. Here is one (of many possible) references which I happen to have at hand:

Oren, E., Delbru, R., Möller, K., Völkel, M., and Handschuh, S. (2006). Annotation and Navigation in Semantic Wikis. In SemWiki2006 — From Wiki to Semantics, at ESWC2006, Budva, Montenegro.

- you claim that you have coined the term "semantic bookmarking". I'm not sure you did, there is a relevant publication from 2004 about this topic:

Mukherjee et al. "Semantic bookmarking for non-visual web access" (http://portal.acm.org/citation.cfm?id=1028663)

- there is also more recent work in the area which you completely ignore, e.g.:

Braun et al. "Social Semantic Bookmarking" (http://www.springerlink.com/content/162680743u60775m/)

- Rather than talking at length about "old", only distantly related work, you should discuss and compare to work that is right up your alley!

SPELLING, GRAMMAR, LAYOUT, ETC:

- in general, you need to be more consistent with your capitalisation. Only capitalise proper names, not disciplines or research areas such as "semantic annotation". Some examples of what not to capitalise: ontology engineering, ontology development, information visualisation, information management system, knowledge management and acquisition. Some examples of what to capitalise: NLP, Annotea, Dublin Core, the Web (if you mean _the_ WWW), Java, etc. I didn't mark all of these explicitly, so please revise carefully yourselves.

- for a gender-neutral way of writing e.g. about users, you can use "they" as a personal pronoun instead of "he" or "she". E.g., you can say "... the user is adding the musician Steve Morse in their ontology". This is a topic of hot debate among language purists, but I find it a very useful and adequate way of writing.

- a lot of hyperlinks are rendered within boxes - I'm sure that's not right?

- several section headers have messed-up formatting, such as 5.1 and 5.2

- please write out abbreviations such as "wrt"

- don't use bibliography references for simple links to web pages; just put them in a footnote.

p1: s/aver and ever/ever and ever
p1: s/a two-years work/a two-year period
p1: s/we describe the original application/we describe an original application
p1: s/Related Works/Related Work
p2: "off-the-shelf products such as [8]." - just name it: "off-the-shelf products such as Topbraid Composer^footnote."
p2: s/, was conceived as an/, it was conceived as an
p2: s/then a general Knowledge Management and Acquisition system/than a general knowledge management and acquisition system
p2: "RDF repositories sparse over different arbitrary locations" - what does that mean? "spread out over"?
p2: s/Eclipse flexible plug-in mechanism/Eclipse's flexible plug-in mechanism
p2: s/browsing the web/browsing the Web
p2: s/for his usual/for their usual
p2: "purely 'exposed' web content" - what do you mean?
p2: s/From (part of) the same authors of Haystack/From some of the same authors as those of Haystack
p3: s/automate the Semantic Annotation/automate semantic annotation
p3: s/the best of worlds/the best of all worlds
p3: s/hand work/manual work
p3: s/Semantic Turkey differentiates/Semantic Turkey differs
p3: s/were considered the good compromise/were considered a good compromise
p3: s/for the user which/for the user, which
p3: s/presence in the web/presence on the Web
p3: s/provenience/provenance
p4: s/his ontology/their ontology
p4: s/on top/on the shoulders
p4: s/differentiated/differs
p4: s/tailored respectively/tailored
p4: s/the web/the Web
p5: s/carry on/carry out
p5: the paragraph at the top of the second column starts with "at the cost of" - this sentence seems to be incomplete; I don't understand what it means
p5: s/following user's/following the user's
p6: s/but he can/but they can
p6: s/embedded in web browser/embedded in the Web browser
p6: s/Web Based/web-based
p8: s/which guides system's behaviour/which guides the system's behaviour
p8: s/an highlighter/a highlighter
p8: s/hig-hlighted/high-lighted (wrong hyphenation)
p12: "In this issue" - what do you mean by issue? paper?
p12: "in its last official release" - you probably mean "latest release"?
p12: s/steepy/steep
p13: s/steepy/steep
p14: reference 24: is it really "Gobbleing"? It should be "Gobbling"

Review 1 by Roberto García for resubmission:

The requested changes have been addressed within the space limitations, and the content has been adjusted appropriately to the 12-page limit.

Review 2 by Knud Möller for resubmission:

In my second review, I focus on how the authors have addressed the specific points raised in the first review. All in all, I can see that improvements have been made, and the paper is now more suitable for publication. I'm a bit disappointed, though, that some of the issues have not been addressed at all.

Here are the individual issues:

"- the paper needs to decide what it wants to focus on. My advice: cut the architecture details. They may be interesting for software developers, but they add nothing to the discussion of Semantic Web tools in particular. You can save some pages here."

The authors have significantly cut down the architecture section, which helps to avoid the diluted focus of the previous version.

"- the paper needs to motivate better who this tool is actually meant for. I don't see ST as an end user tool; it is much too complicated for that. So, who will use this, how and why?"

I didn't really see an improvement on this point. In the introduction, the authors say that ST is of interest to "ontology developers and domain experts" (as in the previous version), but not really much else is said.

"- the related work section is incomplete in some respects, and too detailed in others."

Some of the suggestions in the review have been addressed. Tabulator is now mentioned (but none of the other linked data browsers), and the paper by Braun et al. is referenced (though only in a footnote).

"- the tool itself works well enough, but needs a lot of massaging and fine tuning."

It seems that there has not been a new release of ST to address any of the points I raised in the previous review. Why not?

"- the evaluation is poor, which is a major drawback of the paper. In order to show that ST is indeed a mature tool, a better evaluation is indispensable."

Still no complete evaluation.

"- the language is ok, but riddled with spelling and grammar mistakes (see below). If accepted, it needs a very thorough revision."

Most of the corrections I suggested in the first review have been addressed. I found a few more, see below.

"- the paper needs to be shortened; ST doesn't warrant 15 pages. I think there is a lot of potential to remove irrelevant sections."

This has been done sufficiently. The paper now has 12 pages. Much better like this.

SPELLING, GRAMMAR, etc.:

As mentioned above, the paper has been improved in this respect. I found a handful of new errors, which should be corrected:

p1: "we describe the original application" - this should still be "an original application". Otherwise it sounds as though there was only one original application for knowledge acquisition and management, and it's Semantic Turkey. Surely you don't mean that.
p1: s/Following from definition/Following from the definition
p8: s/which accepting further messages/which accepts further messages
p9: "based on a proper combination" - what do you mean by "proper" here? Just change to "based on a combination"
p10: s/without need to/without the need to
p10: s/through a real robust/through a robust
p10: "over working ontology" - what do you mean? rephrase, please
p10: s/basilar RDF/basic RDF - I really don't think "basilar" is a word that can be used in this context
p11: s/expecially/especially
p12: "Gobbleing" - that's still wrong (but maybe it's the original title of the paper? If so, you could add a [sic])
