Editorial Board

Editors-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Sanaz Saki Norouzi


A Classification of Semantic Annotation Systems

Submitted by Pascal Hitzler on 11/15/2010 - 09:22

Paper Title:

Authors:

Pierre Andrews, Ilya Zaihrayeu, Juan Pane

Abstract:

The Subject-Predicate-Object triple annotation system is now well adopted in the research community; however, it does not always speak to end-users. In fact, explaining all the complexity of semantic annotation systems to laymen can sometimes be difficult. We believe that this communication can be simplified by providing a meaningful abstraction of the state of the art in semantic annotation models and thus, in this article, we describe the issue of semantic annotation and review a number of research and end-user tools in the field. In doing so, we provide a clear classification scheme of the features of annotation systems. We then show how this scheme can be used to clarify requirements of end-user use cases and thus simplify the communication between semantic annotation experts and the actual users of this technology.

Full PDF Version:

swj123.pdf

Submission type:

Survey Article

Responsible editor:

Krzysztof Janowicz

Decision/Status:

Reviews:

Resubmission accepted by editor.

This is a revised manuscript following an "accept with minor revisions." Reviews of previous rounds can be found below.

Review 1 by Jérôme Euzenat

This is fine except that the correction seems to have been made in quite a hasty way:

(1) "Schema.org, started in 2011, is a demonstration of how this approach is reaching a mature stage, with big industrial players collaborating towards a standardization of semantic annotation within HTML as part of a W3C work group"

Where does this come from? Industrial players (including these ones) have been involved in W3C for more than 10 years, whereas Schema.org was launched entirely independently of W3C (which working group is meant here?). So, I am quite shocked when I see things written this way!

(2) Same thing with taxonomy. It is good to go back to the Greeks, except that there is no reference to Greek works using this term. In fact, this term was created at the end of the 18th or beginning of the 19th century for the classification of animals. So, it is not a term known from antiquity. It is a term that was forged for a particular purpose.

Contrary to what is written, "taxon" does not seem to be a Greek term. The Greek root is "taxis". It even seems that the term taxon itself was created later, at the beginning of the 20th century, but I have no firm evidence.

Using the term taxonomy for a part-whole structure is simply following a slippery slope on which anything that vaguely resembles a tree can be considered a taxonomy. I know that there are papers in "Information sciences" about all sorts of things, but I cannot consider them as serious, just as citing your paper as an authority on that topic would not be serious. When terms have a precise meaning, it is always bad, at least in scholarly work, to use them loosely.

"Natural scientists use the Greek term "taxon" (instead of "term"),": Although natural scientists care a lot about terms (they have very standardized rules for naming things, but this is off topic here), they do not use the term taxon for anything related to a term. Natural scientists use the term taxon to denote sets of individuals, and a taxonomy is a hierarchical arrangement of these taxa such that a taxon is superior to another if its population contains that of the other. This is quite a precise definition but, beware, it would take some study to verify that it indeed applies to some of the taxonomies mentioned below. This holds for all taxonomies, although there has been a lot of change since Linnaeus: the arrangement of extant or fossil individuals by similarity (phenotypic classification) has been replaced by a classification by common genetic ancestors (genotypic, then cladistic; although the latter uses the term clade, it is still a taxonomy as described above).

So the problem is that the more lines are added to this paragraph, the more confusion they add. Writing that "From a very precise meaning in systematics, the term taxonomy has spread in various domains including Information sciences with a somewhat broader meaning" would be more exact and precise. However, calling an organisation of terms a terminology or a thesaurus would seem a much better way to do it. It would look less "scientific", but this is what it is.

The reviews below refer to previous versions of the manuscript.

Review 1 by Jérôme Euzenat

This paper is now mostly an analysis of the state of the art in online annotation systems through a particular classification. This classification is based on three dimensions: annotation structure, vocabulary type, collaboration. Then, a methodology for applying this classification to the design of annotation systems is provided and its application to a use-case is presented.

Most of the comments of my previous review have been implemented and I think that this has improved the paper in terms of completeness and readability.

Hence, I think that the paper can be accepted as it is, pending minor comments below.

There are still three points on which I do not agree with the standpoint of the paper but they are not related to the key claims of the paper.

The first one is the loose use of the term "taxonomy", even when backed up by bibliographic references: ignoring that this term was forged with a particular setting in mind, frozen in the term "taxon", does not seem appropriate to me. I am sorry, but I do not agree that the cited papers (41 is not a proper bibliographic reference, by the way) use the term properly, especially since other terms are available.

The second one is the use of the term "controlled", mostly on page 17. At least, the paper should explain what is meant by "controlled" in this context, because personally I do not see anything as controlled, so I think that a better adjective should be used.

Finally, I have not been proven wrong that the pros and cons are presented in a biased way in the logic vs. tag debate: that people find description logics difficult to interpret does not show that tags are better. Tag clouds do not express the same thing and, as previously mentioned, I would need to see some experimental evidence that dealing with 20 million tags is easier. So I would be more cautious about the conclusions drawn, and I do not think that they are backed up (if the claim is that people feel more comfortable or less intimidated, then I would accept it more readily).

Minor comments:
- p1 (page numbers still start at 0), Section 1: the structure outline still mentions three use cases
- p1 2.1: free-from -> free-form
- p2 Jan 08 -> January 2008 (harmonize)
- p4: Connotea[ -> Connotea [
- p5: "a term coined at the begginin" ??
- p7: require some level -> requires some level
- p8: Ontology's -> ontology's
- p8: In the microformat section, it now seems necessary to mention schema.org
- p14, Section 4: the two approaches that are supposed to be distinguishable are not disjoint: one user may design the annotations while a community of users designs the vocabulary, in which case the system would satisfy both conditions. This should be improved.
- p14: "seen Table 2, the in the" ??
- p18: this field( -> this field (
- p16: Cudré-Mauroux still suffers from bad encoding

The reviews below refer to previous versions of the manuscript.

Review 1 by Jérôme Euzenat

In my former review, I had insisted on three problems with the paper:
- scholarship
- use cases and methodology
- disagreements

The new version of the paper has indeed been improved. Many small
problems that I had before with the paper do not exist anymore; the
language has improved and the paper is far clearer. Hence, the paper
is easier to read.

As a consequence, the "disagreement" part of my previous review has
been mostly addressed. The added clarity of the paper reveals a couple
of new issues (to be mentioned below), but I am sure that they can be
solved satisfactorily in the same way.

However, I do not think that the first two issues that were raised have been
addressed satisfactorily. So, let us consider them again.

The authors made an effort to cite more papers, but this does not
change the paper: I would expect a discussion of what is in papers
about semantic annotation systems, or maybe non-semantic annotation
systems. This may boil down to saying that no one has ever tried to
classify such systems. I am fine with it, but the authors should write
it, i.e., situate their work within the "academic" state of the art in
the domain. If this is not done, there is no point in publishing a paper.

The second point, also raised by another reviewer, is the status of
the use cases and methodology in this paper. The revision of the paper
brought changes to this part but did not contribute to clarifying the
issue. Both reviewers offered different solutions for solving the
problem by reorganising the paper (or its focus). Neither of them had any
impact on the restructuring of the paper. With respect to the
methodology, in the new version, it seems that this classification has
been used for shaping the use case analysis, but there is no attempt
at showing that this is an evaluation of the classification. This is
only presented as a fact.
The argument in the answers, and briefly in the paper, is that the
classification allows one to present the alternatives without going into
technical details. But this seems a bit weak: I do not see why it
would be necessary to go into technical details if it is not
necessary. The answers also mention that the use cases are only an
illustration of the use. The problem is that as an example it is too
heavy: illustrating with one use case, maybe, but why 3? Moreover,
these are not three use cases, but one: the classification is used in
the same way all three times, so there is no point in detailing them
all.

So, in the end, I feel that the paper has been improved after the
first round of reviews, but these improvements have been quite
superficial, not taking deeply into account the main issues raised and
not really discussing them. So, my opinion on the paper remains the
same.

One of the points is that, for some very important remarks where I
asked for a discussion (a paper is the place for it), the authors only
offer a comment. For me this is not sufficient. The issue of the
independence of the dimensions is the occasion for an interesting discussion.

Among the additional problematic points in the new version are:
For each category, pros and cons are given. However, sometimes they
are given to a category but also apply to others. This is a problem
because it gives a biased view of these pros and cons:
2.4: "Proposing a large number of entities... is an issue." It is an
issue for ontologies as well as (if not even more) for tags. "billions
of individual entity descriptions" raises a problem of scalability: yes,
but how does it compare to "20 million unique tags"?
4.1: "Users have to remember which terms they normally use" is a con
for any annotation system, and it is arguably more a con for the
other category than for the "Single-user (private use)" category.
4.3: "Social annotation approaches could suffer from biases that
express personnal likes or dislikes of annotators" applies to any
human-based annotation system.

I am a bit unhappy with the use of "taxonomy". This term indeed comes
from a very precise field with quite some distinctive features of what
a taxonomy is (a taxon is a class representing a set of individuals
and the taxonomic relation complies with set inclusion; even cladistics
did not change that). It is not a "vocabulary", and broader and narrower
terms are relations in thesauri, not in taxonomies strictly
speaking. Even the "geographical taxonomy" of geonames is not a real
taxonomy. It would be good to be somewhat precise in the vocabulary
used.

A new term, "expert", is given prominence in this new version. It is
not very clear. I can see two kinds of experts: domain experts and
"classification experts". These are two totally different things (to the
point that a lot of work on knowledge acquisition in expert systems
postulated that a domain expert was totally unable to provide a
correct classification. This postulate does indeed hold to some
extent). It would be good to clarify this in the various places where
this word occurs: there are even some places in which this expert is
both a domain and a classification expert (and such people exist:
taxonomists in biology are indeed that). This discussion is needed in
the paper, especially for Section 4.2, in which the expert is an "expert
in the field" in the first sentence of the "Pros" and an "expert in the
annotation task" in the second sentence.

The long discussion about the "recommendations" in the conclusion
should have appeared earlier, as a recommendation section. It seems
here that these recommendations are a consequence of the content of
the paper. This is not true. This is new material brought
in. Indeed, these recommendations do not follow immediately from the
content of the paper. If the authors think they do, they should clearly
show how. Moreover, it is quite subjective and arguable. So, it would
fit better in a "discussion" section.

Details (apply -1 to page numbers, this paper starts on page 0):
- p1: In addition the problem -> to the problem
- p2: "this generalisation is extremely low level" -> be precise
please
- "three dimensions: - the level structural...": "levels" are supposed
to be ordered, "dimensions" to be somewhat independent, please use the
adequate term and only this one.
- "the beginning of the spectrum" -> "one end of the spectrum" (maybe
not)
- p3: require nearly -> requires nearly (in fact rather "entails" than
requires)
- web semantic technologies -> semantic web technologies
- p5: allows its users -> their users
- p6: early 2000s -> the early 2000s (I would rather use "the beginning
of this century")
- p9: is FOAF an application? Not really.
- PhotoStuff does allow the creation of instances, PicSter allows the
creation of classes.
- p10: at the annotation time -> at annotation time
- p11: what is described at the beginning of 3.1 is typically the
situation in information retrieval.
- p14: "vertebrate" vs. "invertebrate" split. There is none. To the
best of my knowledge, "vertebrate" is a taxon name, not "invertebrate".
- p15: in the table, "Single users" has two subcolumns with the
same values. This suggests that these subcolumns are not useful.
- p16: "there is little point on studying how interaction between
users affect the vocabulary" on the contrary, I see many points to
study this, especially as the vocabulary is free!
- this Section -> this section.
- In this model -> In this model,
- the quality and coverage ... cannot be guaranteed: any evidence of
this? How could one say so?
- p18: "biased" is used, but I would rather have used "personal" than
"biased". For instance, "to read" does not denote any bias; even "best
paper ever" denotes an opinion rather than (or before) a bias.
- p19: "research oriented": what does this mean?
- "as it is quite abstract and distant from what the normal user may
want to achieve": again, any evidence of this? This would require some
serious studies.
- p21: "awareness of ... of " -> "awareness of ... for"?
- p23: will provided -> while provided
- The use cases are not clearly presented, in the sense that it is
difficult to understand what these systems are for, what their context
is and what they will be used for: everything is described
from the "annotation platform" standpoint and we do not know if it is
in an aircraft design workshop or in a fishery. An "online virtual
world" can be anything and I have trouble understanding why someone
would go there and annotate anything (I have never annotated a virtual
landscape in my life).
- p25: a classification scheme -> a classification
- While the users -> While users
- p26: "controlled vocabulary is evolved dynamically by the users":
as stated, this is exactly what a controlled vocabulary is not!
- The bibliography is not homogeneous with all authors' first names/
- [35] Philippe C. Mauroux -> Philippe Cudré-Mauroux
- [53] "to Springer Verlag"?
- [57] To be updated to http://iospress.metapress.com/content/4164891n48p5v826/
- [60] MCIlraith -> McIlraith

Review 2 by Sebastian Tramp

The comments and corrections from the first round regarding my points are accepted on my side.

Line spacing:
There is a problem with inconsistent line spacing in this paper, so that the text looks scruffy. This includes page 21 (extreme difference between columns) but also line spacing in tables (Table 2 vs. Table 4).

Table 3 would perhaps be better rendered in landscape orientation.

This is a revised resubmission after a "Reject, revision encouraged". The reviews below are for the original submission.

Review 1 by Sebastian Tramp

In general, the goal of building a classification of tagging and semantic annotation systems (the title is a little bit misleading and should be clarified) is desirable and worth publishing as an article in the Semantic Web journal.
The chosen classification facets are reasonable; however, I disagree on some details or have additions to make (see below).
The choice of the presented use cases is the main critical point of this submission. The focus of these use cases (which derive from the authors' INSEMTIVES project) is, in my opinion, very different from the focus of the studied projects (delicious, flickr, faviki, youtube, ...). For example, corporate tagging applications have other requirements than public tagging applications for personal use, so the conclusion is biased in this direction. I suggest either restructuring the paper to give it more focus on these use cases (and dropping the general approach) or enhancing the evaluation section (the use cases) and classifying more than these three applications (especially applications in which you are not involved) to give some kind of evaluation of your classification scheme. In my opinion, the latter would be a greater benefit for the community.

In particular, I have some specific annotations on different topics (content, style, typos, literature suggestions) and different granularity, which I will write down per page:

p0:
- The triple annotation system you mention is most often called SPO, subject - predicate - object, not OSP (p0) or SOP (p1,2)
- I wonder why you do not arrange Fig. 1 horizontally. I know it makes sense in the context of all the other images of the same kind, but it looks wrong and confuses the reader

p1:
- SOP, see p0

p2:
- The term "Web 2.0 user" is vague here
- SOP, see p0
- What is Youtube2009?
- The delicious tagging model is not that simple: they have system tags (e.g. system:filetype) and tag bundles (allowing users to organize their tags into groups). Their collaboration model is also not correctly described (see p15)

p3:
- Fig 3 and 4 should be combined to save space and have a more coherent text

p4:
- the differences between attributes and relations are blurry on the technical level (e.g. with system tags)
- attributes can be clearly seen as (S)PO triples thus allowing ontology creation (in theory)
- please clarify the differences between the attribute, relation and ontology model

p5:
- amassive -> a massive, aresource -> a resource
- instead of [8] I would refer to T.R. Gruber 1993 here

p6:
- Fig 8 and 9 should be combined to save space and have a more coherent text

p7:
- is key -> is the key
- form an ontology -> form an ontology
- LOD for -> LOD cloud for (but LOD is not introduced at this point, so I would skip this)

p8:
- The referred project is called Semantic MediaWiki (SMW) (an extension of MediaWiki, the software that runs Wikipedia). SMW is not used on Wikipedia and is not an extension of Wikipedia as you write
- also relevant here is WebProtege
- the three apps here are described in different ways, so that they are not comparable to each other (e.g. queries can be executed by all three apps but this is mentioned only once)

p9:
- Fig. 12 is a four-year-old screenshot, please add a current one

p11:
- I suggest tidying up the table following http://www.ctan.org/tex-archive/macros/latex/contrib/booktabs/booktabs.pdf

p12:
- Faviki uses dbpedia resources (and does not extract the content itself)

p13:
- the line height in the last paragraph is higher than normal

p14:
- o publicly -> on publicly

p15:
- I suggest splitting the collective annotation category into one subcategory where annotations are modified together and one where personal annotations are merged but users can only modify their own annotations (as in delicious)

p17:
- no reference for the INSEMTIVES project

p18:
- semantic wikipedia -> semantic media wiki

p19:
- Fig18 too large (for the given information content)

References:
- also relevant: Milorad Tošić and Valentina Milićević: The Semantics of Collaborative Tagging System

p20:
- maybe relevant: Maria Maleshkova et al.: Semantic Annotation of Web APIs with SWEET

p21:
- line height
- footnote "athttp" -> "at http"

Review 2 by Jérôme Euzenat

This paper comes in two parts: an analysis of the state of the art in
online annotation systems through a particular classification, based
on three dimensions (annotation structure, vocabulary type,
collaboration); and a methodology for applying this classification to
the design of annotation systems, which is applied to three use cases.

The main contribution of the paper is, of course, the particular
proposed classification. I think that this classification makes a lot
of sense in spite of a few fixable disagreements that I have with it
(see DISAGREEMENTS). The system analysis properly illustrates its
relevance. This is a piece of work that deserves publication.

However, for a journal paper, it feels that this classification should
rely a bit more on published annotation models which are
available. This can be for referring to them in the proposed
classification or to criticise them (see SCHOLARSHIP).

Finally, the use of the classification in the methodology is
interesting because it shows one particular usage of such a
classification, but the status of the methodological section should be
clarified (see USE CASES AND METHODOLOGY).

In addition, the dimensions of the classification are presented as
independent. However, there are some places in which this may be
questioned, so a discussion about the dependencies between the
dimensions or their actual independence would be welcome.

SCHOLARSHIP

The paper is based on reviewing systems instead of scholarly
papers. This is fine as long as the scholarly literature is reviewed as
well. Indeed, when providing a classification, it is obviously
important that all previously published material is taken into
account, even if only to criticise it.

I do not know the literature on classical annotation systems well
(it is not cited in this paper either), but concerning semantic
annotation systems, the topic in which this journal may be
interested, the state of the art is pretty short.

Indeed, there has been high interest in annotations since the beginning
of the semantic web. Several workshops have been held [1] and even a
book has been published on the topic [2]. It would be interesting for
this paper to refer to them. I know that half of these papers are
concerned with automatic annotation, but there remains the other
half.

Concerning something that I know better, I was surprised not to see
mentioned systems that I knew, for which I had been able to find
papers, and which would help illustrate some aspects of the
classification. It happens that I have worked in that area, from the
standpoint of p2p sharing of annotations, and when considering existing
systems, I found that PhotoStuff [3] was a good starting point for
semantic annotation, to the point that we extended it in our PicSter
system [4].

The difference between the two systems is that:
- PhotoStuff has a fixed ontology, PicSter offers to import
ontologies;
- PhotoStuff does not allow for modifying ontologies, PicSter does;
- finally, PicSter aims at working collaboratively.
I will come back to the relevance of this later.

So, in summary, it would be useful if the findings of this
classification took into account the existing literature. These are
only examples; I am not particularly knowledgeable in annotation
systems.

USE CASES AND METHODOLOGY

My second main concern is about the status of the methodological
part. What would be necessary is an explanation of this at the
beginning of Section 5. By status, I mean "what is this section
supposed to prove/show/illustrate".

This is a very difficult issue and I think that it should be better
discussed in the paper: are the use cases only yet another
illustration of the classification and an illustration of the
methodology (in which case it would be better to use only one)?
Or are the use cases here to validate the classification and
methodology, as far as such a methodology can be validated? In that
case, a validation criterion should be defined. That the methodology
applies is a relatively weak criterion (because, usually, when you
make a methodology apply, it applies, and because many other
methodologies may apply as well). That the project is a success would
be, in my view, stronger (because it shows that the methodology does
not harm the development of the project).

In the introduction of Section 5, it reads as if the methodology was
ready and was applied to the use cases. Then in step (2) of the
methodology, it is written that Sections 2, 3 and 4 are the result of
this step. So it seems that the classification was not preexisting. A
question that then arises and which is not answered in the paper is:
were the three use cases run together? Was the classification built
with one use case used with another one? Has it changed? So isn't the
proposed classification too dependent on these use cases?

So, the question that the reader has is: did the use cases lead to the
methodology and classification, in which case they justify it but do
not "validate" it, or were the methodology and classification
pre-existing? And if the classification was pre-existing, did it
frame the development of these use cases? (If so, there is little
surprise that it fits.)

The methodology itself is odd because it defines a common model (step
4) and an extension to this model (step 5). This means that it is a
methodology which has to be applied to several use cases together. But
apart from the particular European project concerned, what would such a
methodology be used for?

I am not arguing that more work may be done with these projects and
the methodology. I am asking that the authors make clear from the
beginning what is expected from the presentation of the use cases, and
how this is achieved.

Independently, it may be good to show in which respects the methodology
proves useful in the particular use cases (the point is made that the
classification allows system designers and customers to reach a
common understanding).

DISAGREEMENT

As I said, I think that the proposed classification makes a lot of
sense. So the disagreements below are only minor, but it
would be at least useful to discuss them in the paper.

I am not sure where to put parts of the issues I raise below in the
classification. The only thing I know is that they matter. But this is
also a sign that it would be worthwhile to provide at the end of the
introduction a sharper description of what the dimensions of the
classification are.

* Users... the term user is used freely within the paper. There are
obviously two types of users: annotation producers and annotation
consumers, who use the annotations for retrieving information
(mostly). Producers are most often consumers, but not necessarily
the other way around. Web 2.0 technologies are promoting the model of
the consumer-producer, but in fact most Wikipedia users are not
Wikipedia producers (the same holds for other sites). It would be
better to make this distinction because some statements are valid for
some kinds of users and not for others.

In Section 5.1.1, comes yet another kind of user, the web service
provider who annotates its web service.

* One particular distinction which is not made in the document is that
between fixed ontology and open ontology that the user may
evolve. It happens that most current tagging systems are open (new
tags may be added by users) and controlled vocabularies are considered
in the classification, but this is not necessarily so.

It seems that the controlled/open nature of the vocabulary is
independent from its formalisation. It happens that most simple tagging
systems are open and that controlled vocabularies are usually
taxonomies, but it is perfectly possible for a controlled vocabulary
to be only a list of terms without structure and for a tagging system
to be closed. And indeed the problem occurs with ontologies, as the
comparison between PhotoStuff and PicSter above shows.

This is also visible on p9, column 1, in which it is argued that
"controlled vocabularies" and "ontologies" are ultimately similar
because they have a similar structure (in spite of differences with
respect to the "controlled" aspect). In fact, they are different
because one is necessarily controlled and not the other, and because
the other is necessarily structured and not the first. So, it may happen
that they coincide, but not as a general rule, so using them
interchangeably is not rigorous for a work on classification.

So, in summary, it feels that the controlled/uncontrolled aspect
of the tag vocabulary is an independent dimension and not something
related to its type.

* It is strange to have an "ontology" item in the tag dimension, since
it is rather relevant to the vocabulary dimension. This may be a
matter of dependency between dimensions? I think that "linked data"
would be a better name than "ontology" for this section.

DETAILS

The Araucaria project cited on p4 can be considered as not having
passed the test of time if all that it has produced since 2001 is a
research report. This is not a judgement on the quality of the work,
only a surprise.

Try to cite published papers instead of technical reports.

I am surprised that FOAF is so little cited in this paper.

The work on RST is cited very prominently while it is not that
influential in the domain. It is rather used in a niche project.

p7. LOD is mentioned without having been introduced (and incorrectly,
as "in the LOD").

p7. Co-occurrence of annotations is mentioned here in the "ontology"
part while it in fact applies to all aspects (co-occurrence is
certainly better suited to unstructured tags). The paper does not say
much about statistical methods. I am fine with this, but then they
should not be arbitrarily attached to one category.

In the pros and cons descriptions, the paper could be a bit more
cautious. For instance, when the query is "Java", it is difficult to
disambiguate, but if it is "java late-binding 1.5" or "java resorts",
it is quite easy and many systems do it properly. Similarly, the paper
states that users have to remember the terms they use, but tools can
do this well: the Shoebox figure right above the statement shows that
this is possible (at least if the ontologies considered were more
rigorous).

There are several statements implying that WordNet is a controlled
vocabulary, which it is not (like any serious linguistic work). WordNet
is supposed to reflect a particular state of a language. As such, there
is no control over the terms to use (this is especially visible in the
choice of synsets rather than a preferred term with synonyms). The curator
function of the WordNet creators is the same as that of any dictionary
creator: determining whether a term belongs to the language and whether
two senses are distinct. But the referee here is the language, not the
creator. This differs from controlled vocabularies, in which the
referees are the vocabulary creators.
In fact, it is the use of WordNet as a controlled vocabulary that
raises all the mentioned problems.

p16, col 2: "building the controlled vocabulary in a bottom-up fashion
[...] increasing steadily the precision and recall [...]": evidence
must be provided for this (does "steadily" come from actual
figures?).

p16, col 2: the lack of formal semantics is attributed to
"collaborative annotation" (the section belongs to the
collaboration dimension). This is not correct: collective formal
annotation is perfectly possible. So either it should be argued
why this is not the case, or this statement should be a "con" of
non-formal vocabularies rather than of collective annotation. This
seems to be a dependency between dimensions.

"syntags (as opposed to synsets...": you mean "similar to".

REFERENCES

[1] A few workshops:
http://semannot2001.aifb.uni-karlsruhe.de/
http://km.aifb.kit.edu/ws/semannot2003
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-289/
http://saakm2008.semanticauthoring.org/
Note that in information retrieval there have also been three such
workshops:
http://www.sics.se/jussi/esair2010/esair.html

[2] S. Handschuh and S. Staab (eds.), Annotation for the Semantic Web, IOS Press, Amsterdam (NL), 2003

[3] Halaschek-Wiener, C., Golbeck, J., Schain, A., Grove, M., Parsia, B., & Hendler, J. (2005). Photostuff - an image annotation tool for the semantic web. In Proceedings of the 4th International Semantic Web Conference poster session.

[4] Jérôme Euzenat, Onyeari Mbanefo, Arun Sharma, Sharing resources through ontology alignment in a semantic peer-to-peer system, in: Yannis Kalfoglou (ed.), Cases on semantic interoperability for information systems integration: practice and applications, IGI Global, Hershey (PA, US), 2009, pp. 107-126

TYPOS

p1, col1: Despite the problem -> In addition to the problem
"the level structural complexity" ???
col2: classification scheme -> classification
be provide -> be provided
a full-fledged ontologies -> ontology

p2, col2: "the system is queried": please specify where the user is here
(and see the distinction between different users)

p4, col1: SpotLight -> Spotlight (this is more generally implemented
in Unix as xattr, i.e., extended attributes, and used by Spotlight).
Facebook is reported in both 2.2 and 2.3
very big -> large

p5, col2: I am not sure that the "definition" attributed to [8] does
not come from elsewhere.

p7, col2: LOD not introduced and incorrectly used

p9, col1: the discussion starting with "Controlled vocabularies
... bottom-up fashion)." is strange. It says that there are
differences but that they do not matter. It does not even say that
there are commonalities.
In the end, this page tells us that no distinction is drawn between
"folksonomy" and "controlled vocabularies", which are used interchangeably
with "ontology". Usually, work on classification aims at
clarifying things, not making them more obscure.

p11, col1: the three problems seem to be four.
It is strange to see the trade-off between annotation time/query time
in this little corner: this is a very general problem that should have
been reported before.

p14, col2: o publicly -> or publicly
"the single annotator might not be an expert in the annotation process
and might miss relevant..." the status of the "and" is unclear. I read
it as "and therefore" rather than "and in addition". The question then
is why this point is made here: is the community necessarily an
expert? If not, why is this item here?
Tools may help in finding the desired resources; again, see the
Shoebox.

p15, col1: "Currently, most web browsers such as". There is no need to
name them: can you cite one that does not have bookmarks? This should
rather read "Since the early years".
col2: "Another issue... intentions" here we also have three types of
"users"

p16, col1: "In addition...annotation schemes" this is not specific to
collective annotation: as soon as there is enough data, statistical
methods may be used, so this should be put somewhere else.

p17, col1: Flickr has already been introduced, no need to do it again.

p20, col1: this features -> these features
col2: but not exclusively: put at the end of the sentence

p23, col1: or richer knowledge structures -> or a richer knowledge
structure
col2: requirements for new annotation models -> requirements for annotation models

Tags:

Reviewed


Comments

References Missing

Submitted by Sebastian Hellmann on 01/23/2012 - 14:03.

Please recompile the BibTeX and replace the PDF.
There are no references in http://www.semantic-web-journal.net/sites/default/files/swj123_4.pdf
and there are also many [?] markers.
I hope this is the right field to post a comment...

pdf recompiled

Submitted by Pascal Hitzler on 02/02/2012 - 06:27.

Thanks for the notice. I've updated the PDF.

Pascal.
