SAP-KG: Analysis of Synonym Predicates using Wikidata

Tracking #: 3384-4598

Emetis Niazmand
Maria-Esther Vidal

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper

Wikidata, as a community-maintained knowledge graph (KG), contains millions of facts; it may integrate different entities and relations with the same meaning. Contributors of community-maintained knowledge graphs can use new predicates which are similar in meaning to other predicates in the KG (a.k.a. synonym predicates). Detecting these synonym predicates plays a crucial role in interoperability and query answer completeness against community-maintained knowledge graphs. We tackle the problem of uncovering synonym predicates and propose SAP-KG, a knowledge graph-agnostic approach, to uncover predicates with similar meanings that relate complementary entities. SAP-KG comprises a set of metrics to describe and analyze synonym predicates; it resorts to Class-based Synonym Descriptions (CSDs) to capture the most important characteristics of the predicates of a knowledge graph. As a proof of concept, we evaluate SAP-KG over Wikidata and show the benefits of exploiting statements annotated with qualifiers, references, and ranks. Additionally, we present a query processing technique that puts into perspective the role of synonym predicates in query answer completeness. We have empirically studied the distribution and percentage of overlapping synonym predicates in six domains in Wikidata. The highest percentage of synonyms has been detected in the Person domain at 86.66%, while Drug has the lowest percentage, i.e., 42.39%. These results provide evidence that community-maintained knowledge graphs enclose predicates that define the same real-world relationships.


Solicited Reviews:
Review #1
By Shrestha Ghosh submitted on 16/Mar/2023
Major Revision
Review Comment:

The authors propose a method to detect synonym predicates within and
across KBs and show how it can be used to improve query answer
completeness. The method uses class-based synonym description to
determine synonym predicates. While the general approach is good, the
writing lacks clarity and it is often difficult to follow the text.
Nowhere is the SAP-KG acronym introduced.

Following is my detailed review:

1. How does the work differ from previous works in synonym predicate
discovery mentioned in Section 3.3? While the current work uses a new
technique, does it also overcome any particular challenge of previous work?

2. Section 2.1 has many inconsistencies and unclear definitions.

i) In G = (V,E,L), is V a set of KB entity/class "nodes" and L a set of
predicate "nodes", or labels? This matters since KG embedding defines the
mapping from nodes to vectors.

ii) E is defined as a set of triples (s,p,o) in the RDF knowledge graph
definition, while in the rule mining models E is denoted as
predicate(entity1, entity2). Is this notation overloading necessary?

iii) The paragraph on Rule Mining Models is more about terminologies
used in rule mining, except for PCA towards the end.

iv) "The PCA assumes that if there is an object .. subject and
predicate" is perfectly understandable, but the preceding sentence is
difficult to comprehend. The term heuristic-based negative edges hE-(r)
is never introduced, but it is important for understanding PCA.

v) "The low value of PCA confidence score leads to retrieve the
incomplete answers." - please check the grammar.
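On point 2(ii), the two notations can be reconciled mechanically; the following Python sketch (with invented toy facts) shows that a triple set and predicate-indexed atoms carry the same information, which is why the overloading feels unnecessary:

```python
# An RDF triple (s, p, o) corresponds to the rule-mining atom p(s, o);
# both notations describe the same edge set.
triples = {("alice", "father", "bob"), ("alice", "parent", "bob")}

def as_atoms(triples):
    """Group the edge set by predicate: p -> {(subject, object), ...}."""
    atoms = {}
    for s, p, o in triples:
        atoms.setdefault(p, set()).add((s, o))
    return atoms

atoms = as_atoms(triples)
print(atoms["father"])  # {('alice', 'bob')}
```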

3. In the motivating example in Section 2.2, how does the presence of an
incorrect relation help motivate this particular case? Can the current
work identify incorrect relations? Or is it robust to incorrect statements?

4. The authors have separate sections on the approach (Section 4) and
the architecture (Section 5). While in the approach it seems that the
computations are on the graph, Section 5 shows that it is query
specific. For finding synonym candidates, a pair of predicates is required
as input. How is the pair determined in a query? Are all possible pairs
checked? Are the pairs restricted to predicates belonging to the same
class? This is unclear.

5. Section 4: "KG = (E,V,L), .. predicates p_i and p_j in E" - here E is
now a set of predicates and not edges?

6. Section 4: The difference between synonym predicate candidates and
synonym predicates is not clear. A candidate set should be larger than
the final set, right? Currently, it seems that the candidate set and the
synonym predicates are disjoint. While the candidate predicates consider
predicate pairs if they have the same subject and object, synonym
predicates are predicate pairs with either a different subject and the
same object, or the same subject but different objects. What purpose do
the candidates serve? What is the difference between synonym predicates
that are equivalent/complementary?

i) ".. definitions of the synonym predicates is as follows:" there are
two definitions for two conditions? How is the complement requirement
satisfied? In a case where a KG was complete: all subjects have a parent
relation as well as a mother relation, then SAP-KG wouldn't consider
parent and father as synonym predicates?

ii) "equivalent to RDF-MTs" - what is RDF-MT?

iii) "- SDP - is a set of tuples (p,SD) ... MSD is a set of class-level
metrics." does SDP keep track of all synonym predicates and their
corresponding scores? Why the acronym MSD?
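To illustrate the candidate/synonym distinction raised in point 6 as the definitions read to me (a toy Python sketch with invented facts): candidates share both subject and object, while the synonym conditions require exactly one side to differ, which makes the two sets disjoint:

```python
# Candidates: (s, o) pairs asserted under both predicates.
def candidate_pairs(p_i, p_j):
    return p_i & p_j

# "Synonyms" as defined: pairs of edges agreeing on exactly one side
# (same subject, different object, or different subject, same object).
def complementary_pairs(p_i, p_j):
    return {(a, b) for a in p_i for b in p_j
            if (a[0] == b[0]) != (a[1] == b[1])}

father = {("alice", "bob")}
parent = {("alice", "bob"), ("alice", "carol")}
print(candidate_pairs(father, parent))      # {('alice', 'bob')}
print(complementary_pairs(father, parent))  # the one complementary pair
```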

7. Section 5: "SAP-KG comprises three components: a) Incompleteness,
..": The text that follows has 4 components.

8. Section 5.1:

i) "By calculating the value of PCA ..": w.r.t. which rule?

ii) ".. whether these predicates may cause the query to return
incomplete answers.": how do you determine this?

9. Definition 5.1: What exactly is measured by u(.)? How is POS
different from POS-D and POS-R?

10. Definition 5.2: Pred and Predicate are used interchangeably. Here, u()
for triples has double parentheses, different from Def 5.1.

11. Section 5.3: What is the exact process? Something along the lines of:
p_i, p_j are selected if MSD_a > theta?

12. Section 6.1:

i) Similarity Values of Synonym Predicates: On which predicate pairs is
the threshold decided?

ii) Metrics: What about POS? Please spend some words on the significance
of using the connectivity metrics, at this point it is recursive. Please
revisit the precision, recall definitions. Currently, the definitions
are sets of numbers?? {#}
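For reference, the standard set-based precision/recall that Section 6.1 presumably intends can be stated as a short Python sketch (the predicate-pair IDs below are illustrative, not taken from the paper):

```python
# Precision/recall of predicted synonym pairs against a gold standard.
def precision_recall(predicted, gold):
    tp = len(predicted & gold)          # true positives: correct pairs found
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

predicted = {("P22", "P8810"), ("P22", "P25")}  # illustrative pair IDs
gold = {("P22", "P8810")}
print(precision_recall(predicted, gold))  # (0.5, 1.0)
```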

13. Section 6.2: "The x-axis represents each predicate, and the y-axis
represents the counts of synonyms of predicates" From what I understand,
Fig 9 shows the number of predicates with x synonym predicates, but the
text says the opposite.

14. Table 3: What's the significance of the greyscale background?

15. Section 6.6: Answer to RQ1: how is connectivity related to
identifying synonym predicates? The connection is not clear.

16. The findings of the synonym predicates are significant. In Table 2, discussing some of the pairs would be more insightful. The POS metrics of the child predicate in Wikidata and DBpedia are quite low. Any specific reason? Similarly, insights on the queries in Table 3 would help. Some queries, like 4, 5, and 6, seem easy for all detectors, but some are especially hard. What is the nature of these synonyms?

Review #2
Anonymous submitted on 06/Apr/2023
Review Comment:

Wikidata, as a community-built knowledge base, with contributors from various levels of expertise and domains, faces the problem of having predicates with similar and even synonymous meaning.
SAP-KG is about investigating the prevalence of these synonym predicates at a large scale. The authors propose a new method based on RDF2Vec and Word2Vec for synonym detection. They show that they can use the method to perform query expansion on 10 different manually designed queries in several Wikidata domains. The proposed method outperforms the existing methods.

Overall Evaluation:
The authors present an interesting and important problem, incompleteness in Wikidata, and present a novel approach that can mitigate this problem. While this novel method might indeed be helpful for overcoming incomplete query results, I think the definition of synonyms is highly debatable and could lead to incorrect query results. This problem is not discussed anywhere in this work.
I also find it very difficult to follow Sections 4 and 5 and would encourage the authors to restructure and improve the writing; e.g., a consistent running example would already help a lot.
The evaluation only consists of 10 manually created queries. This seems problematic and makes the generalizability of the approach questionable.

Section 1:
- Page 2, line 2-3: More details on how this leads to incompleteness would be nice for a motivation here.
- Page 2, Line 7: I disagree with this statement. Approach [10] is motivated by synonyms that have no overlap and can deal with it.
- Overall, this section is a bit repetitive, e.g., mentioning the existing approaches twice, mentioning the own approach several times.
Section 2:
- Page 3, line 32: Word2Vec is no graph embedding method.
- The definition of the rule mining approach and the respective metrics is hard to follow. The definition of Horn rules is incorrect: you are referring to closed Horn rules here. Also, the wording on page 3, line 37, “negative predicates”, is a bit confusing, since Horn rules are usually considered to have only positive predicates.
- Page 4, Line 13: The direction of these rules seems to be unintuitive? I would assume that the rules father(a,b)->parent(a,b) should have significantly higher PCA confidence.
- Page 4, Line 24: It is unclear whether you are considering parent = father as synonyms, or father = mother. When I look at the later examples, I would assume that you mean father = parent and mother = parent. I find these premises highly debatable. Aren’t these hyponyms? Also, if you consider them to be synonyms, I would assume the synonym relation to be transitive, which would imply that father = mother. I think this part needs to be much more extensive, and the validity of these premises should be discussed in much more detail.
- Probably this is only subjective, but I would have expected the motivating example in a different place in the paper.
- Page 5: Fig 1: not sure if the caption is needed again, since this is what is already written in the motivating example.

Section 3:
- Since your work is also looking into finding synonyms in different KGs, I would assume that ontology matching is an important related work that is missing here.
- 3.2: Why mention association rule mining again, when you already defined it in Section 2?
- 3.3: I think there are some important related works that I would add:
o Zhengbao Jiang, Jun Araki, Donghan Yu, Ruohong Zhang, Wei Xu, Yiming Yang, and Graham Neubig. “Learning Relation Entailment with Structured and Textual Information”. In: Automated Knowledge Base Construction (AKBC). 2020.
o Weize Chen, Hao Zhu, Xu Han, Zhiyuan Liu, and Maosong Sun. “Quantifying Similarity between Relations with Fact Distribution”. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). 2019, pp. 2882–2894.

Section 4:
- It would be helpful, if you could continue with the motivating example instead of introducing a completely new one.
- Page 6, Line 50: What is a statement? A Wikidata statement is a simple RDF triple. But what is your definition?
- Is a synonym candidate using the statements or not? Why are there 2 different definitions?
- Not sure why there are definitions for both synonym candidates and synonym properties. It is weird that the definition of synonym properties is a superset of synonym candidates. Intuitively, I would expect it to be the other way around.
- Page 7, Line 30: Why is it necessary that the synonyms complement each other?
- The CSD definition is very hard to follow.
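For what it is worth, my best guess at the gist of a CSD as a data structure (the paper's definition is the authority; all field names below are invented for illustration): per class, each predicate maps to its set of synonym predicates and its class-level metric values:

```python
from dataclasses import dataclass, field

@dataclass
class CSD:
    """Guessed shape of a Class-based Synonym Description."""
    cls: str                                      # e.g. "Person"
    synonyms: dict = field(default_factory=dict)  # predicate -> set of synonyms
    metrics: dict = field(default_factory=dict)   # predicate -> {metric: value}

csd = CSD("Person")
csd.synonyms["father"] = {"parent"}
csd.metrics["father"] = {"POS": 0.7}  # illustrative metric value
print(csd.synonyms["father"])  # {'parent'}
```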

Section 5:
- Page 9, Line 12: What is the PCA value of predicates? You have only defined PCA confidence for rules. How does this translate to what is happening here?
- Page 9, Line 23: I think there is a typo: “relates the same number of entities”. Reading the next sentences, I would expect it to be about relating the same entities and not the same number.

Section 6:
- The explanation of the baselines is missing. I can imagine how you employ RDF2Vec, but it is unclear how you have used Word2Vec.
- The comparison to the rule mining approach should be discussed, since both your and their paper have different definitions of synonyms. While they explicitly exclude hyponyms, your definition includes this. Therefore, the results are hard to compare.
- DBpedia is not mentioned in 6.1 but is then used later on.
- The query benchmark is based on only 10, manually created queries. Is the approach generalizable? This could be improved by using queries from Wikidata query logs.
- The results of your approach are incredibly good in comparison to the other approaches. It would make the evaluation more credible if you did not only evaluate on manually created queries. Does this always work this well? This should also be discussed in more detail.
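On the missing baseline explanation: one plausible reading, sketched in Python with invented vectors (real runs would use pre-trained Word2Vec or RDF2Vec embeddings of the predicates), is to score each predicate pair by cosine similarity and apply a threshold:

```python
import math

# Invented 3-d embeddings; real ones come from Word2Vec/RDF2Vec training.
vectors = {
    "father": [0.9, 0.1, 0.2],
    "parent": [0.85, 0.15, 0.25],
    "spouse": [0.1, 0.9, 0.3],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def is_synonym_candidate(p, q, threshold=0.9):
    return cosine(vectors[p], vectors[q]) >= threshold

print(is_synonym_candidate("father", "parent"))  # True
print(is_synonym_candidate("father", "spouse"))  # False
```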

Minor Comments:
- I would encourage the authors to make this work reproducible by making the code and datasets available online.
- What does SAP stand for?
- It would be helpful to use labels, instead of the Wikidata Identifiers in many of the examples and the result tables.
- Definition 5.1 and 5.2:
o These definitions are unnecessarily lengthy.
- Figure 4 has rather low quality. It would be nice if it were higher resolution or a vector graphic. Also, some of the edge labels could be placed more neatly.
- Figure 5: This is also a nice figure, but it would be nice to increase the resolution as well.
- I find it confusing that CSD and MSC are such important definitions in Section 4, but are never used in Section 5.

Review #3
Anonymous submitted on 25/May/2023
Review Comment:

In this paper, the authors tackle the problem of improving the completeness of a query by detecting synonyms and then rewriting the query. The detection is done through several metrics and thresholds.

I found this paper very hard to follow and poorly written. In addition, the formalism introduced is dubious. Besides, the comparison with the state-of-the-art is feeble, and the experimental setup is not convincing (only ten queries for all the evaluations). Finally, the contribution is insufficient for a journal paper, and the code is not provided.

In more detail:
Promising a precision of 1 in the introduction is always a sign that something is wrong somewhere. The authors should have been more critical of their results.
Page 3, Line 25: E needs to be introduced. Top is rarely used to talk about the vector set. Please use R^{d}.
2.2: The motivating example is very unconvincing: parent and father are not equivalent, although we have father => parent. In 3.2, the authors notice that the PCA for parent => father is 0.5, concluding that many people lack a parent. This is false: in half of the cases, the parent is a mother. The authors should read previous works about predicate mapping and synonyms (for example, Gashteovski, K., Gemulla, R., Kotnis, B., Hertling, S., Meilicke, C.: On aligning OpenIE extractions with knowledge bases: A case study).
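The 0.5 figure follows directly from the PCA definition; a minimal Python sketch with invented facts, where every child has one father fact but two parent facts (father and mother):

```python
# Toy KG: two children, each with a father and a mother, both as parents.
parent = {("alice", "bob"), ("alice", "carol"),
          ("dave", "erin"), ("dave", "frank")}
father = {("alice", "bob"), ("dave", "frank")}

def pca_confidence(body, head):
    """PCA confidence of body(x,y) => head(x,y): counterexamples are
    counted only for subjects with at least one known head fact."""
    subjects_with_head = {s for s, _ in head}
    support = len(body & head)
    pca_body = sum(1 for s, o in body if s in subjects_with_head)
    return support / pca_body

print(pca_confidence(parent, father))  # 0.5: each mother edge counts against
```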
The authors could have used the relation "equivalent properties" in Wikidata.
The previous works need to include ontology alignment.
Page 6: "Since both triples have the same subject and object, they are considered equivalent." I have doubts about this implication.
In the formulas in Section 4, put parentheses: (s_i = s_j) and (o_i = o_j) => ...
Replace the Wikidata identifiers with labels. This is hard to understand in the current state.
In "Synonym Predicate Candidates," what is a statement? A statement in Wikidata is a triple, but it seems to be different here. Besides, why do we have two definitions of predicate candidates, and, more importantly, why are these definitions never used? In Section 5.2, the authors say only three methods are used to find candidates: "Word2Vec, RDF2Vec, and rule-based".
According to the formulas, a synonym predicate candidate can not be a synonym predicate. Besides, the formulas in "Synonym Predicates" cannot be true. I think the authors wanted to introduce a notion of complementarity here.
In the definition of CSD, the authors say a class is defined by the relation rdf:type. However, in Wikidata, this relationship is called "instance of". Besides, the authors ignored that subclasses are also classes, and the relation is then rdfs:subClassOf, or "subclass of" in Wikidata.
In the definition of CSD, the domain should contain two classes, not only C.
Is it mandatory to have "p' is synonym of p" in a CSD?
I do not understand why the CSD is so central. It is just a basic data structure.
What is an Ideal Knowledge Graph? Is it the complete version of the KG?
In 4.1, the proposed solution is not a solution. It is just a data structure. The authors should clearly redefine their problem and the real contribution. See my attempt in the summary above.
Linked to the previous point, it is unclear what the input and output of the system are. It took me time to understand that the input is a query whose completeness we want to improve.
I did not understand Definitions 5.1 and 5.2. They seem overcomplicated. What is \mu (the text did not help me understand it)? How is it defined? What is "x100"? Do the authors mean "*100"?
In 5.2, although I do not understand the formula, I think the input is Predicate and not Pred.
In the original problem, the authors want to find synonyms in a single KG, but it changes later (inter-synonyms). They should focus on one problem.
I do not understand why complementarity is so important for finding synonyms. I understand it helps to improve the completeness, but it seems to go against finding good synonyms.
If the task is improving the completeness, why limit yourself to one rewriting in 5.4? Besides, the procedure for rewriting is unclear. How do you choose what to rewrite? The text says, "adding more triple patterns to the WHERE," but that is not what happens in the example.
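One natural reading of the rewriting step, sketched in Python (the synonym table and the UNION-based procedure are my guess, not necessarily the authors' method): expand each triple pattern into a UNION over the predicate and its synonyms:

```python
# Illustrative synonym table: father (wdt:P22) -> parent (wdt:P8810).
synonyms = {"wdt:P22": ["wdt:P8810"]}

def rewrite(subject, predicate, obj):
    """Expand one triple pattern into a UNION over synonym predicates."""
    patterns = [f"{{ {subject} {p} {obj} . }}"
                for p in [predicate] + synonyms.get(predicate, [])]
    return "SELECT * WHERE { " + " UNION ".join(patterns) + " }"

print(rewrite("?child", "wdt:P22", "?x"))
# SELECT * WHERE { { ?child wdt:P22 ?x . } UNION { ?child wdt:P8810 ?x . } }
```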
In the evaluation, ten queries are clearly not enough to have relevant results.
What is the running time? For me, this approach does not scale and cannot be used to find all synonyms in a KG.
In 6.2, the experimental setup is not clear. Besides, I do not believe that 86.66% of predicates in the Person domain have synonyms. The system says this, but how does it compare with the truth?
Table 2 compares ten examples. What should we conclude from so few examples and newly introduced metrics?
What are the colors in Table 3? Please put the best results in bold, because the baselines often have the best results (tied with SAP-KG). Besides, what are the baselines here? I thought SAP-KG was tested with these three distances (see above).
The results contain nothing from previous works. I am sure that finding synonyms is not limited to the Word2Vec distance.