Review Comment:
This paper proposes to facilitate the exploration of linked data graphs by using knowledge anchors to create exploration paths along the subsumption relationships in the graph. The authors first provide a solid theoretical foundation for their approach from a human cognitive perspective, using knowledge utility and basic level object theories. They then review related work, pointing out potential issues and raising their research questions. They develop metrics, algorithms, and methods for identifying knowledge anchors, human basic level objects, and exploration paths. The experimental results show that their method significantly improves users' exploration of data graphs.
Originality: The idea of using knowledge anchors and human basic level objects is original compared with similar work on linked data graph exploration. However, the methods and algorithms implemented in the paper are primarily derived from existing approaches.
Significance of results: The experimental results show that the approach yields a significant improvement in helping users explore data graphs.
Presentation: The paper is well structured and easy to read, but there are still many grammar and spelling issues. I suggest the authors thoroughly proofread the paper and fix them.
The upside of this paper:
1. The idea is original. The authors conduct a comprehensive literature review of related work, including visualization, text-based semantic browsers, and identifying key entities in graphs. Their work seems to complement existing work from a more cognition-oriented perspective.
2. They provide detailed definitions of terms and algorithms for their methods, as well as the context in which they apply basic level object theory and subsumption theory for meaningful learning in this study.
3. The experimental evaluation is well designed, including the experimental conditions, the selection of participants and the explanations given to them, and the methodology.
The downside of this paper:
1. Although the idea is novel, the methods used are not original. Existing methods are fitted to the study, sometimes in a way that feels like merely giving new fancy names to existing methods. I suggest the authors develop more rigorous methods that combine the proposed theory and the existing algorithms more naturally.
2. Many heuristics are used, e.g., in the hybridization of algorithms in Section 6.3.2. The algorithms employed are rather simple and rely on many high-level rule-based tricks. I wonder how scalable and extensible the approach would be, since hierarchies and depths differ considerably across data graphs. The authors themselves point out these potential issues in Section 9.1.1.
3. In Section 6.3.2, the authors mention that the three homogeneity metrics have the same value. I wonder whether this means that the measures used are insufficient to capture homogeneity; perhaps more sophisticated metrics could be used. In the same section, precision and recall are reported in Table 3. Would the authors consider also reporting the harmonic mean of precision and recall, i.e., the F-score? What about other ranking metrics such as mean reciprocal rank (MRR) or normalized discounted cumulative gain (nDCG)? (Standard definitions of these metrics are sketched after this list for reference.)
4. In Section 7.2, the authors mention that they use only one semantic similarity metric. While the authors argue that they need a hierarchy-based semantic similarity metric, there are plenty of other methods besides the one used in the paper; for example, another edge-based approach is proposed by Leacock & Chodorow [1]. In addition, information-content-based approaches also apply to hierarchical data, for example the models proposed by Lin [2] and Jiang & Conrath [3], where the information content can be computed using the methods of Sánchez et al. [4] or Seco et al. [5]. (Formulas for these measures are sketched after this list.) I suggest the authors explore more options when calculating semantic similarity; it would be interesting and important to see whether the results produced by different semantic similarity metrics agree with one another.
5. Regarding the measurement of knowledge utility in Section 8.2: in Q2, the authors use questions that specifically ask about categorical information. However, the exploration paths are explicitly designed to help users explore category information, whereas free exploration in the control group does not necessarily cover it, so I wonder whether this comparison is fair. Is there a better way to design the questions and quantify the difference?
Typos and minor issues:
1. section 2.1.1, "their primarily focus on helping layman users...", should be "they primarily focus..."
2. section 2.2, "logs of keywords and Web pages previously entered visited...", should be "...entered and visited..."
3. section 5.2.1, "After tha, for an entity v, all members...", should be "After that, ..."
4. section 8, "the task template in Table was designed...", missing table number
5. section 8.3.2, "For instance, on participant indicated his...", should be "...one participant..."
6. section 9.1.2, "The derived human BLO set is depends on what...", should be "...set depends on..."
There are many other typos and minor issues; I suggest the authors thoroughly go through the text again and fix them.
Reference:
[1] Claudia Leacock and Martin Chodorow. 1998. Combining local context and WordNet similarity for word sense identification. WordNet: An electronic lexical database 49, 2 (1998), 265–283.
[2] Dekang Lin et al. 1998. An information-theoretic definition of similarity. In ICML, Vol. 98. 296–304.
[3] Jay J Jiang and David W Conrath. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008 (1997).
[4] David Sánchez, Montserrat Batet, and David Isern. 2011. Ontology-based information content computation. Knowledge-Based Systems 24, 2 (2011), 297–303.
[5] Nuno Seco, Tony Veale, and Jer Hayes. 2004. An intrinsic information content metric for semantic similarity in WordNet. In Proceedings of the 16th European Conference on Artificial Intelligence. IOS Press, 1089–1090.