Review Comment:
As I said in my first review, I like the idea, and I think it is a novel and interesting approach to labelling numerical columns of web tables.
I still have two major concerns (detailed below); nevertheless, I recommend a “minor revision”, since this version shows many improvements (and since I know that the two-strike rule would otherwise rule the paper out).
1) Type detection heuristics & evaluation:
I am still not convinced by the detection of some of the types, nor by its evaluation. Your algorithm for nominal hierarchical numbers is: “the numbers have the same number of digits, and they fail the sequential test; hence, they will be considered hierarchical.” Under this rule, the “hierarchical” class could contain any list of numbers of the same length (e.g. years, which you exclude manually in your experiments).
However, the example of a hierarchical type that you give in the paper is quite complex, and I wonder whether such a complex hierarchical type even occurs in real datasets, especially since you did not report any hierarchical or categorical columns in the T2Dv2 dataset.
Given the types missing from the dataset, the evaluation in Table 8 and Table 9 is neither broad nor balanced: the sequential type is based on a single column and the ordinal type on 5 columns. So do you essentially only consider and test the “count” and “other” types?
Also, you do not discuss the precision and recall results for the “other” sub-type. What kinds of columns fall into this “other” category? Do they belong to one of the other types, or do these results indicate that further (sub-)types are needed for numeric columns?
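To illustrate the concern, here is a minimal Python sketch of the rule as I read it from the quoted description. The sequential test is my assumption (sorted values form a run of consecutive integers), and the function names are hypothetical, not taken from your implementation. Under this reading, a column of years is labelled hierarchical:

```python
def is_sequential(values):
    """Assumed sequential test: sorted values form consecutive integers (step 1)."""
    ordered = sorted(values)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


def same_digit_count(values):
    """True if every value has the same number of digits (sign ignored)."""
    return len({len(str(abs(v))) for v in values}) == 1


def is_hierarchical(values):
    """Hierarchical rule as paraphrased from the paper: equal digit count and not sequential."""
    return same_digit_count(values) and not is_sequential(values)


years = [1998, 2003, 2007, 2012]
print(is_hierarchical(years))  # True
```

The same happens for any column of fixed-width numbers, such as three-digit prices, so the rule alone does not seem specific enough to identify genuinely hierarchical codes.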
2) Quality of writing:
The writing has clearly improved in this version; however, there are still some misformulations, and the organisation could still be improved. For instance:
- 6.4: The description of the detection algorithm is not very clear and should be reformulated and better organised.
  Examples:
  - “because it *is* the most restrictive”
  - “For the second one, it should be one of the sub-types that checks for equal digits” -> which second one?
  - “For the fourth, we check if it is hierarchical.”
- In the conclusions: “In this paper, we introduce a typology of numeric data taking into account the task of semantic labeling. We show that taking into account the typology of numeric data and using such information to perform semantic labeling results in better performance.”
- Evaluations: Although other parts of the paper are split into many sections and subsections (e.g. the very short Section 4), the evaluations of type detection and of labelling currently share a single subsection, which makes the discussion of the results somewhat confusing. I suggest restructuring them into separate subsections.