Review Comment:
The authors propose a method to detect synonym predicates within and
across KBs and show how it can be used to improve query answer
completeness. The method uses class-based synonym descriptions to
determine synonym predicates. While the general approach is good, the
writing lacks clarity, and the text is often difficult to follow. The
acronym SAP-KG is never introduced.
Following is my detailed review:
1. How does this work differ from the previous work on synonym predicate
discovery mentioned in Section 3.3? Beyond using a new technique, does
it also overcome any particular challenge of previous approaches?
2. Section 2.1 has many inconsistencies and unclear definitions.
i) In G = (V,E,L), is V a set of KB entity/class "nodes" and L a set of
predicate "nodes" or labels? This should be clarified, since the KG
embedding is defined as a mapping from nodes to vectors.
ii) In the RDF Knowledge Graph definition, E is a set of triples
(s,p,o), whereas in Rule Mining Models, E is denoted as
predicate(entity1, entity2). Is this overloading of notation necessary?
iii) The paragraph on Rule Mining Models is mostly about rule-mining
terminology; only the discussion of PCA towards the end goes beyond that.
iv) "The PCA assumes that if there is an object .. subject and
predicate" is clear, but the preceding sentence is difficult to
follow. The term heuristic-based negative edges hE-(r) is never
introduced, although it is important for understanding PCA.
v) "The low value of PCA confidence score leads to retrieve the
incomplete answers." - please check the grammar.
3. In the motivating example in Section 2.2, how does the presence of
an incorrect relation help motivate this particular case? Can the
current work identify incorrect relations, or is it robust to incorrect
statements?
4. The authors have separate sections on the approach (Section 4) and
the architecture (Section 5). While the approach suggests that the
computations are performed on the graph, Section 5 shows that they are
query-specific. Finding synonym candidates requires a pair of
predicates as input. How is this pair determined for a query? Are all
possible pairs checked? Are the pairs restricted to predicates belonging
to the same class? This is unclear.
5. Section 4: "KG = (E,V,L), .. predicates p_i and p_j in E": is E now
a set of predicates rather than a set of edges?
6. Section 4: The difference between synonym predicate candidates and
synonym predicates is not clear. A candidate set should be larger than
the final set, right? Currently, the candidate set and the set of
synonym predicates seem to be disjoint: candidates are predicate pairs
with the same subject and object, whereas synonym predicates are
predicate pairs with either different subjects and the same object, or
the same subject and different objects. What purpose do the candidates
serve? What is the difference between equivalent and complement synonym
predicates?
i) ".. definitions of the synonym predicates is as follows:" Are there
two definitions, one for each condition? How is the complement
requirement satisfied? If a KG were complete, i.e., all subjects had
both a parent relation and a father relation, would SAP-KG then not
consider parent and father as synonym predicates?
ii) "equivalent to RDF-MTs" - what is RDF-MT?
iii) "- SDP - is a set of tuples (p,SD) ... MSD is a set of class-level
metrics." Does SDP keep track of all synonym predicates and their
corresponding scores? What does the acronym MSD stand for?
7. Section 5: "SAP-KG comprises three components: a) Incompleteness,
..": the text that follows, however, describes four components.
8. Section 5.1:
i) "By calculating the value of PCA ..": w.r.t. which rule?
ii) ".. whether these predicates may cause the query to return
incomplete answers.": how do you determine this?
9. Definition 5.1: What exactly is measured by u(.)? How is POS
different from POS-D and POS-R?
10. Definition 5.2: Pred and Predicate are used interchangeably. Here,
u() for triples has double parentheses, unlike in Def. 5.1.
11. Section 5.3: What is the exact process? Please state it precisely,
e.g., something along the lines of: p_i and p_j are selected if
MSD_a > theta.
12. Section 6.1:
i) Similarity Values of Synonym Predicates: On which predicate pairs is
the threshold decided?
ii) Metrics: What about POS? Please say a few words on the significance
of using the connectivity metrics; at this point, the argument is
circular. Please also revisit the precision and recall definitions:
currently, they define sets of numbers ({#}) rather than ratios.
13. Section 6.2: "The x-axis represents each predicate, and the y-axis
represents the counts of synonyms of predicates" From what I understand,
Fig 9 shows the number of predicates with x synonym predicates, but the
text says the opposite.
14. Table 3: What's the significance of the greyscale background?
15. Section 6.6: Answer to RQ1: How is connectivity related to
identifying synonym predicates? The connection is not clear.
16. The synonym predicates discovered are a significant finding.
Discussing some of the pairs in Table 2 would be more insightful. The
POS metrics of the child predicate in Wikidata and DBpedia are quite
low; is there a specific reason? Similarly, insights into the queries
in Table 3 would help: some queries, such as 4, 5, and 6, seem easy for
all detectors, while others are especially hard. What is the nature of
these synonyms?
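To make comment 2.iv concrete, here is a minimal sketch of PCA confidence as used in AMIE-style rule mining, restricted to one-atom rules body(x, y) => head(x, y). The function name, predicate names, and toy triples are hypothetical and not taken from the paper; I am also assuming that hE-(r) plays the role of the PCA negative set, which is exactly what the paper should state explicitly.

```python
from typing import Iterable, Tuple

# A triple (x, p, y); here (x, "parent", y) is read as "y is a parent of x".
Triple = Tuple[str, str, str]


def pca_confidence(triples: Iterable[Triple], body: str, head: str) -> float:
    """PCA confidence of the one-atom rule body(x, y) => head(x, y).

    support:     pairs (x, y) with both body(x, y) and head(x, y) in the KG.
    denominator: pairs (x, y) with body(x, y) whose subject x has at least
                 one head(x, y') fact. A pair whose subject has no head fact
                 at all is treated as unknown, not as a negative example.
    """
    kg = set(triples)
    body_pairs = {(s, o) for s, p, o in kg if p == body}
    head_pairs = {(s, o) for s, p, o in kg if p == head}
    subjects_with_head = {s for s, _ in head_pairs}

    support = len(body_pairs & head_pairs)
    pca_denominator = sum(1 for s, _ in body_pairs if s in subjects_with_head)
    return support / pca_denominator if pca_denominator else 0.0


# Hypothetical toy KG, not taken from the paper.
toy_kg = [
    ("alice", "parent", "bob"),
    ("alice", "father", "bob"),
    ("carol", "parent", "dave"),
    ("carol", "parent", "erin"),   # erin is not carol's father: a PCA negative
    ("carol", "father", "dave"),
    ("frank", "parent", "gina"),   # frank has no father fact: ignored by PCA
]
print(round(pca_confidence(toy_kg, "parent", "father"), 3))  # 0.667
```

Under the closed-world assumption, the same rule would have a standard confidence of 2/4 = 0.5, because frank's missing father fact would count as a counterexample; the PCA simply leaves it out of the denominator.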
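Regarding comment 12.ii, precision and recall are ratios between set sizes, not sets themselves. A minimal sketch, assuming that both the detector output and the gold standard are sets of unordered predicate pairs (the paper's actual gold-standard format may differ); all predicate names below are hypothetical placeholders.

```python
from typing import FrozenSet, Set, Tuple

Pair = FrozenSet[str]  # unordered pair of predicates


def pair(p: str, q: str) -> Pair:
    """Build an unordered predicate pair, so (p, q) and (q, p) compare equal."""
    return frozenset((p, q))


def precision_recall(detected: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """precision = |detected ∩ gold| / |detected|, recall = |detected ∩ gold| / |gold|.

    Empty denominators are mapped to 0.0 by convention.
    """
    true_positives = len(detected & gold)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical placeholder predicates, not results from the paper.
gold = {pair("ex:child", "ex:hasChild"), pair("ex:spouse", "ex:partner"),
        pair("ex:birthPlace", "ex:placeOfBirth")}
detected = {pair("ex:child", "ex:hasChild"), pair("ex:partner", "ex:spouse"),
            pair("ex:child", "ex:relative")}
print(precision_recall(detected, gold))
```

Here two of the three detected pairs are correct and two of the three gold pairs are found, so both precision and recall are 2/3.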