Review Comment:
Overall evaluation
Select your choice from the options below and write its number below.
== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject
0
Reviewer's confidence
Select your choice from the options below and write its number below.
== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)
3
Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
5
Novelty
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
4
Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
4
Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present
3
Clarity and presentation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
2
Review
Please provide your textual review here.
This paper asks whether it is possible to predict "concept drift" in classification taxonomies such as those represented in RDFS/OWL, SKOS, etc. Concept drift generally refers to a change in the meaning or usage of a concept over time (e.g., between versions). The authors motivate an analysis of concept drift as being important in the maintenance/versioning of a taxonomy. Three aspects of concept drift are considered: the labels of the concept change; the intensional definition of the concept changes, i.e., the properties used to define the concept (e.g., skos:broader, rdfs:subClassOf, etc.) change; and the extension of the concept changes, i.e., the number of instances changes.
Drift occurs in dynamic knowledge-bases. The authors consider two types of dynamic knowledge-base: closed knowledge-bases, where changes have occurred in the past but the latest version is now static, and open knowledge-bases, where changes are on-going. For both types of knowledge-base, the authors consider the refinement of concepts by looking at the "coherence" of intermediate versions with respect to previous versions. For open knowledge-bases, the authors consider the problem of predicting which concepts will drift.
For the tasks of refinement and prediction, the authors propose a machine-learning framework where older versions of a knowledge-base are used to train a classifier that is tested on the next version; a version newer than the test dataset is used to verify predictions. The authors use the training, test and evaluation versions as input for a WEKA machine-learning phase. Features such as the descendants and ancestors of the concepts at various levels, the number of instances, the number of instances including sub-concepts of a certain depth, etc., are considered. Threshold-based feature selection is then applied and models are trained. For evaluation, the authors consider two datasets: the first is the DBpedia ontology with 8 versions spanning from 2009 to 2013 (an open dataset); the second is CEDAR, a historical Dutch taxonomy of occupations for censuses, with 8 versions spanning from 1849 to 1930 (a closed dataset).
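For concreteness, the following is a minimal sketch of my reading of that per-version setup (the version loader, the concept API and the feature names are hypothetical, and scikit-learn's GaussianNB and VarianceThreshold merely stand in for the paper's WEKA classifiers and threshold-based selection):
    # Sketch only: train on features from version v_i, test on v_{i+1}, and
    # verify predictions against the drift observed in v_{i+2}.
    from sklearn.naive_bayes import GaussianNB
    from sklearn.feature_selection import VarianceThreshold

    versions = load_kb_versions()  # hypothetical loader, chronologically ordered
    i = 0

    def extract_features(kb_version):
        # Per-concept features: ancestors/descendants at various depths and
        # instance counts with/without sub-concepts (API names are made up).
        rows, labels = [], []
        for concept in kb_version.concepts():
            rows.append([
                concept.num_ancestors(depth=1),
                concept.num_descendants(depth=1),
                concept.num_instances(),
                concept.num_instances(include_subconcepts=True, depth=2),
            ])
            labels.append(concept.drifted_in_next_version())  # 0/1 drift target
        return rows, labels

    X_train, y_train = extract_features(versions[i])      # older (training) version
    X_test, y_test = extract_features(versions[i + 1])    # next (test) version

    selector = VarianceThreshold(threshold=0.01)          # threshold-based selection
    clf = GaussianNB().fit(selector.fit_transform(X_train), y_train)
    print(clf.score(selector.transform(X_test), y_test))  # verified against v_{i+2}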
The paper tackles an interesting area. Being able to predict which concepts are the most likely to change has consequences for the versioning and maintenance of the knowledge-base. I like the general questions that the authors raise and the direction of the work. The paper is on-topic for EKAW and generally well-written. Likewise, the evaluation seems appropriate for the questions raised.
My main concerns with the paper -- holding me back from making a more positive recommendation -- are (i) the coarse nature of the core notion of concept drift, and (ii) a lack of some crucial details.
With respect to the coarseness of concept drift, as defined, it seems that although the intensional, extensional and label similarity measures map to a real value in [0,1], when instantiated they pretty much become binary values. Namely, my understanding is that a concept drifts intensionally if some part of the definition changes, it drifts label-wise if a label changes, and, most problematically, it drifts extensionally if the number of instances changes. Focusing on the extensional drift, all of the instances of a class could change, but as long as the cardinality remains the same, the concept is not considered as drifting extensionally; one could validly argue that in practice this is probably not such a major concern, since if the instances change, it seems likely the cardinality will too. A larger concern is that even a change of one instance more/less in a class with a potentially very large legacy extension will be considered a drift, and thus treated the same as a class with an arbitrary-fold increase or decrease in instances. Likewise, with such a coarse notion of drift, I'm left wondering which concepts would not drift: if a change of one instance more or less is considered a drift, I could only imagine that some pretty obscure concepts would not be drifting in a dynamic knowledge-base.
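To make the coarseness concrete, here is a toy illustration of my reading of the instantiated extensional measure (not the authors' exact formula):
    def extensional_drift_binary(count_old, count_new):
        # Any change in cardinality counts as drift; the magnitude is ignored.
        return 1.0 if count_old != count_new else 0.0

    print(extensional_drift_binary(10000, 10001))  # 1.0 -- one extra instance
    print(extensional_drift_binary(10000, 50000))  # 1.0 -- five-fold increase, same signal
    print(extensional_drift_binary(10000, 10000))  # 0.0 -- even if every instance was replaced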
This brings me to my second concern, which is that the paper is lacking in some crucial details. For example, I still have little idea what the goal of "refinement" is. Perhaps I've missed something, but all I've found in the paper is a statement to the effect: "The purpose of refinement is to check the coherence of an intermediate version with respect to previous versions." But I cannot find where coherence is defined (and I'm guessing, e.g., that it's not the same coherence as defined in the ontology-debugging world). Likewise, the goal of refinement is not intuitively obvious in the context of concept drift; I'm struggling to even hazard a good guess at it. Half of the evaluation presents results on "refinement": again, maybe I have missed something, but at the moment I can't say what it is. It's a similar problem with "prediction": prediction deals with concept drift, but is there a particular type of concept drift that is predicted? Or is the prediction merely that there is a drift of some kind? If so, I could imagine that lots of concepts would quite naturally drift by at least one instance. Hence I would have been interested to see some raw numbers on the actual number of concepts that drift (per the different types of drift) across the versions. Likewise, I think lumping the different types of drift together is rather coarse since different types of drift have different consequences and may be more or less predictable than others.
A third but more minor concern is what predicting label and intensional drift means in practice. The party most likely to be concerned about concept drift is the maintainer of the taxonomy ... which is the same party in control of changing the labels and the concept definitions. Hence it is as if they are being told in advance what they are going to do. I suppose someone from the community could use prediction to see which concepts are likely to change in the next version, but emailing the maintainers would seem more prudent. I'm a bit confused by this.
In general, it's an interesting topic and the authors have some interesting data to play with and evaluate. I'm quite torn on the verdict. On the one hand, the problems with the paper seem to be ones of clarification that could be easily resolved; on the other hand, the details I'm missing feel quite major in that I do not have the whole picture to make a judgement: things like not knowing what "refinement" actually means, how many concepts actually "drifted" across the versions, and which notions of drifting were considered. As such, I'll leave my recommendation at borderline.
Again, I quite like the core idea of investigating this notion of concept drift in a Linked Data context. In terms of improving the paper, I think the authors should clarify the above ambiguities. Likewise, I think the authors should consider the different types of concept drift separately from each other in the classification. I'd also like to see some raw statistics on how many of the different types of concept drift occurred between versions. I would also favour a notion of extensional concept drift that is more fine-grained than "number of instances changed/didn't change" (a rough sketch of one possibility follows below). Also, since the authors mention considering other datasets in the future work, they might be interested in looking into the DyLDO snapshots: http://swse.deri.org/dyldo/. There are two years' worth of weekly Linked Data snapshots to play with.
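As a rough sketch of the kind of finer-grained extensional measure I have in mind (my own suggestion, not something proposed in the paper), one could compare the instance sets of the two versions rather than just their cardinalities, e.g. via a Jaccard-style distance:
    def extensional_drift_graded(instances_old, instances_new):
        # 0.0 = identical extensions, 1.0 = completely disjoint extensions.
        union = instances_old | instances_new
        if not union:
            return 0.0
        return 1.0 - len(instances_old & instances_new) / len(union)

    print(extensional_drift_graded({"a", "b", "c"}, {"a", "b", "c", "d"}))  # 0.25
    print(extensional_drift_graded({"a", "b", "c"}, {"x", "y", "z"}))       # 1.0
This would distinguish a single added instance from wholesale replacement of the extension, and would also register drift when the instances change but the cardinality does not.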
MINOR COMMENTS:
Section 1:
* "As concepts drift their meaning" -> "As the meaning of concepts drift(s)"
* "In 6" -> "In Section 6"
Section 3:
* Definition 2: Are two concepts really identical if they have the same rigid intensional definition? For example, it would seem that many sibling classes would thus be identical, no? In any case, it seems the notion of rigidity is not used later. Why is it introduced?
* "cusotimizable" Run a spell-check!
* "dataset which versions" -> "dataset whose versions"
* "dataset which updated versions" -> "dataset whose updated versions"
* Footnote 5: it seems owl:equivalentClass would be more appropriate than owl:sameAs
* "splitted" -> "split"
* "in (OBO/OWL) ontologies" -> this is a very sweeping statement, especially since an OWL ontology could be considered roughly as "expressive"/generic as Linked Data.
Section 5:
* "relations is a count of relationships connecting these concepts/classes": this could mean many things. Please clarify.
* "10-12" -> "10--12"
* "and temporally closed (1795--1971)". This contradicts the earlier statement that the "last 8 versions" of CEDAR are used since the latest version listed in Table 1 is from 1930.
* "under the ROC curve ."
* I don't understand the absolute figures in Table 2. What does 21 mean?
Section 6:
* "Figures 3 and 4 show ..." The discussion refers to Figures 3 and 4 and talks about different ML methods like NaiveBayes and MultiLayerPerceptron, but none of the results data have such information or distinguish the ML method used. That discussion is confusing.
* I like that you present some concrete examples, but the discussion on CollegeCoach is more confusing than helpful. It doesn't really help me see why "it is easy to see why membership and structural features are highly ranked for CEDAR and DBpedia, respectively."
* "leave" -> "leaf"
* "respectivelly" -> "respectively" (Also the word is, in any case, redundant)