Review Comment:
This paper describes the Ontologies of Linguistic Annotation (OLiA), which provide an OWL ontology describing terms used in the annotation of language resources for linguistic research. Overall, the presentation of the ontology is good, and the ontology itself provides a useful resource in the context of a number of projects that are building RDF representations of linguistic annotations. The paper falls down a little in describing the motivation for developing this OWL version of the existing GOLD and ISOCat vocabularies and in discussing how this ontology might be more sustainable than the earlier standards.
The following are some rough notes on the paper:
The introduction doesn't sufficiently motivate the problem; it is not clear what problem OLiA is solving.
There are already a few 'standard' vocabularies in this domain which are widely used; GOLD and ISOCat are cited. We're told that neither of these is suitable, but not really why. OLiA is introduced as a third vocabulary that somehow engineers compatibility for applications with both of these other vocabularies and possibly more. But it's not clear how OLiA improves on the situation: it adds another standard that needs to be maintained, with no indication of how its maintenance will be better than the 'community based' processes referenced for the other two. Clearly the author is in a good position to make changes for his own work, but how does this help a third party who now needs to decide whether to petition GOLD, ISOCat or OLiA to make a change or include a new concept?
In section 4 there's an example based on a query system. The author doesn't really explain the difference between the cases which are just querying for strings ("NX", "NP") and the OLiA case which is querying for a namespaced RDF term. In general there's not enough discussion of why RDF is being used or that the vocabularies being referenced are not RDF based.
In S4:
"employed in NLP pipeline systems and other NLP pipeline systems for tagset-independent, interoperable information processing"
The second "NLP pipeline systems" is redundant. Also, this example is not explained at all: how is OLiA used in this context, and are there alternatives?
Fig 1. is slightly confusing. The concept PDAT (stts:PDAT from the text) is shown as instance_of stts:AttributiveDemonstrativePronoun, which is_a olia:DemonstrativeDeterminer, but in the text both of these relationships are described as superconcept/subconcept, which suggests is_a to me. Also, PDAT is drawn as an ellipse whereas the higher classes are rectangles, suggesting abstract classes. Are these higher classes conceptually different from PDAT? Could they have a hasTag relation, for example?
Also here there is a suggestion of a process for disambiguating a tag used in annotation via the string representation "PDAT" matching a literal in the ontology. Clearly it could be the case that two distinct concepts from different vocabularies have the same string tag (this is the point of using namespaced vocabularies, after all). So, is there another part to this process (presumably another input is the vocabulary that has been used for annotation)? Is the annotation actually stored as a string tag rather than as a reference to stts:PDAT?
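To make the concern concrete, here is a minimal sketch of what I would expect the lookup to need. Everything in it is an assumption for illustration: the `other` tagset, its concept name and the table layout are hypothetical and not taken from the paper.

```python
# Hypothetical index from (tagset, tag string) to a concept identifier.
# The "other" tagset and its concept are invented to show a string clash.
TAG_INDEX = {
    ("stts", "PDAT"): "stts:AttributiveDemonstrativePronoun",
    ("other", "PDAT"): "other:SomeUnrelatedConcept",
}


def resolve(tagset, tag):
    """Resolve a string tag using the annotation's tagset as a second input."""
    return TAG_INDEX[(tagset, tag)]


def resolve_by_string_only(tag):
    """Return every concept whose tag string matches, ignoring the tagset."""
    return sorted(c for (_, t), c in TAG_INDEX.items() if t == tag)
```

With the string alone, `resolve_by_string_only("PDAT")` returns two candidates; only the pair of tagset and string identifies a single concept. It would help if the paper said which of these two lookups is actually performed.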
End of p3: "One application are ensemble combination architectures," - it's also not clear that this is a good way to explore this kind of algorithm. POS taggers are generally trained on tagged data and generate the same tags on new data. One could re-train any tagger on uniformly tagged data to get that kind of output and then combine the different taggers in parallel as described. Doing it the way the paper suggests means you don't need to re-train the taggers; apart from avoiding this extra work, is there a real benefit here?
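As I read it, the ensemble idea amounts to something like the following sketch, where each tagger keeps its native tagset and the outputs are mapped to shared concepts before voting. The tagger names, tags and mappings are hypothetical, and simple majority voting is my assumption rather than the paper's stated method.

```python
from collections import Counter

# Hypothetical mappings from each tagger's native tags to shared concepts.
TO_SHARED = {
    "tagger_a": {"NN": "Noun", "VB": "Verb"},
    "tagger_b": {"N": "Noun", "V": "Verb"},
    "tagger_c": {"noun": "Noun", "verb": "Verb"},
}


def vote(predictions):
    """Majority vote for one token.

    predictions maps a tagger name to the native tag it produced.
    """
    shared = [TO_SHARED[name][tag] for name, tag in predictions.items()]
    return Counter(shared).most_common(1)[0][0]
```

For example, `vote({"tagger_a": "NN", "tagger_b": "N", "tagger_c": "verb"})` selects `"Noun"`. Re-training all three taggers on one tagset would give the same voting step without the mapping tables, which is why I ask what the added benefit is.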
P5: "Unlike a direct mapping approach, OLiA allows to recover information" -> "allows one to recover"
In the discussion of combining concepts in section 5 the author says:
"Many tagsets for part-of-speech annotation, for example, introduce hybrid categories to represent either conceptual overlap/fusion or ambiguity using OWL/DL constructs..."
which suggests that there are tag sets in use that are defined by OWL vocabularies. I'm not sure that this is the case, and the example that is given (of Penn Treebank) certainly doesn't use OWL/DL. Is what the author is trying to say here that these vocabularies can be modelled in this way, or that the OWL/DL way provides a more formal way to model this kind of variation? To clarify, the author here is describing the case where a tag in one vocabulary is not the same as that in another vocabulary but is the disjunction of two tags, or the complement, etc. It might be good to see an example of the other approach that the author alludes to with Penn Treebank.
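A rough sketch of what I understand by "disjunction" and "complement" here, using plain sets over a small closed list of concepts. This is only an illustration: the concept names and the hybrid tag are hypothetical, and a closed set model is a simplification that does not reproduce OWL's open-world semantics.

```python
# Hypothetical closed universe of concepts, for illustration only.
UNIVERSE = frozenset(
    {"Preposition", "SubordinatingConjunction", "Noun", "Verb"}
)


def union_of(*concepts):
    """A hybrid tag read as 'any one of these concepts' (disjunction)."""
    return frozenset(concepts)


def complement_of(concepts):
    """Everything in the universe that the given tag does not cover."""
    return UNIVERSE - concepts


def compatible(tag_concepts, query):
    """True if a token carrying this tag could be an instance of query."""
    return query in tag_concepts


# Hypothetical hybrid tag covering two concepts at once.
HYBRID = union_of("Preposition", "SubordinatingConjunction")
```

A query for `"Preposition"` is compatible with `HYBRID`, a query for `"Noun"` is not, and `complement_of(HYBRID)` leaves the nouns and verbs. If this is the intended reading, the paper should say that OLiA models such tags this way, not that the tagsets themselves do.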
Ill-defined data categories: this is a criticism of ISOCat, but what process or property of OLiA would prevent the creation of a tag that was ambiguous from the point of view of some researcher or theory? The comment seems to be that this confusion was only found by formally modelling the vocabulary as an OWL ontology. I guess the question is whether this approach is amenable to the kind of person who is developing tag sets: could a linguist make use of these tools to discover issues like this, or do they need to find a tame SW engineer to do it for them?