Terminology Semantic Sememe Tree Knowledge Graph

Tracking #: 4061-5275

Authors: 
Peiyan Wang
Feiyan Jiang
Yuyang Wang
Sijia Shen

Responsible editor: 
Philipp Cimiano

Submission type: 
Dataset Description
Abstract: 
Knowledge graphs are foundational language resources and research tools derived from corpus studies. Existing graphs mostly focus on factual or event-based descriptions. Semantic category graphs for domain-specific terminology are scarce. Existing terminology repositories mostly remain at the level where signifiers and signifieds are not fully separated, lacking independent modeling and semantic linking. The Terminology Semantic Sememe Tree Knowledge Graph proposes a representation method for terminological concepts, which separately encodes signifiers (terms) and signifieds (concepts), based on a sememe tree structure. Our knowledge graph consists of three parts: the sememe system repository, the term record repository, and the relation repository. It covers core elements including terms, concepts, concept relations, sememes, and dynamic roles. Unlike other terminological semantic knowledge graphs that integrate terms and definitions into unified descriptions, this knowledge graph embeds such relationships within the sememe tree structures, which facilitates the implementation of semantic computation.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By John McCrae submitted on 23/Apr/2026
Suggestion:
Reject
Review Comment:

This paper presents the Terminology Semantic Sememe Tree Knowledge Graph, a resource that models domain-specific terms (primarily in aviation) by decomposing their meanings into hierarchical so-called "sememe trees". It aims to bridge the gap between terminology and semiotics by separating linguistic signifiers from semantic representations in HowNet.

The paper’s literature review is narrow, focusing heavily on the internal logic of the HowNet framework and recent large language model developments while failing to engage with established Semantic Web standards for lexical-conceptual modeling. Most critically, it overlooks the OntoLex-lemon ecosystem and the vast body of research on the Linguistic Linked Open Data (LLOD) cloud, which already describes a separation between the lexical and semantic layers and interfaces well with WordNet-like resources such as HowNet.

As a "Data Description" paper, there are several critical areas where the manuscript falls short of the required standards:

- I first note that this paper does not use linked data standards like RDF, OWL, and SKOS. This paper uses JSON, KDML, and Mermaid formats, but does not provide an RDF-based representation. Even if this could be rectified, it is not clear that there is much linking in this resource beyond the links to HowNet.
- There is no evidence of third-party uses. While the authors suggest potential applications these have not been documented as being used by external researchers or in external applications.
- The evaluation is limited to a human-consistency check on only 200 records out of over 30,000. For a data description paper, this sample size (less than 1%) is insufficient to guarantee the quality and stability of the entire dataset, especially the parts generated by LLMs. Further, it is not clear if the evaluation really captures the structure of the resource in a way that validates the novel structure of this model, rather than the quality of the terminological taxonomy.
- The paper mentions a license and versioning in passing but does not explicitly define the licensing terms or provide a comprehensive versioning history within the manuscript.
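
To illustrate the first point above, the following is a minimal sketch of how a single term record could be serialised as RDF (Turtle) using the OntoLex-lemon vocabulary, keeping the lexical entry (signifier) and the lexical concept (signified) as separate resources. The `ex:` namespace, the `ex:hasSememe` property, and all identifiers in the usage example are hypothetical and are not taken from the submitted dataset; only the `ontolex:` terms belong to the published vocabulary.

```python
# Sketch only: emits Turtle text with plain string formatting (no RDF library).
# The ex: namespace and ex:hasSememe are assumptions made for illustration.
PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix ex: <http://example.org/tskg/> .\n"
)


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def term_to_turtle(term_id, written_rep, lang, concept_id, sememe_ids):
    """Return Turtle for one term, its form, and the concept it evokes."""
    # Signifier: the lexical entry and its written form.
    entry = (
        f"ex:{term_id} a ontolex:LexicalEntry ;\n"
        f"    ontolex:canonicalForm ex:{term_id}_form ;\n"
        f"    ontolex:evokes ex:{concept_id} .\n"
        f"ex:{term_id}_form a ontolex:Form ;\n"
        f'    ontolex:writtenRep "{escape(written_rep)}"@{lang} .\n'
    )
    # Signified: the concept, linked to sememes via a hypothetical property.
    concept = f"ex:{concept_id} a ontolex:LexicalConcept"
    for sememe_id in sememe_ids:
        concept += f" ;\n    ex:hasSememe ex:{sememe_id}"
    concept += " .\n"
    return PREFIXES + "\n" + entry + concept


if __name__ == "__main__":
    # Hypothetical record; identifiers are invented for the example.
    print(term_to_turtle("term_aileron", "aileron", "en",
                         "concept_aileron", ["sememe_part", "sememe_aircraft"]))
```

A real conversion would more likely use an RDF library and dereferenceable identifiers; string formatting is used here only to keep the sketch dependency-free.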

Review #2
Anonymous submitted on 12/May/2026
Suggestion:
Major Revision
Review Comment:

The paper introduces a dataset called the Terminology Semantic Sememe Tree Knowledge Graph (TS-KG), which is designed to represent terminology in a structured way separating terms (signifiers) from concepts (signifieds) and represents the meaning of each concept using sememe trees based on the HowNet semantic system. The dataset is composed of three main parts: a sememe repository, a term repository (with definitions) and a relation repository.

Regarding the quality of the dataset, it was evaluated by four domain experts on 200 randomly sampled terms, achieving a score of 95.25%. It is also based on an established resource, HowNet. However, there are some issues: only a small part of the data was checked, there is little detail about possible errors, and it is not clear how the dataset will be maintained or updated over time.

Regarding usefulness, the dataset has some potential, as it is a structured resource that can support several bilingual NLP tasks. It also includes helpful features like embeddings, links between similar terms, and both Chinese and English labels. However, the paper does not report any external use by third parties (other applications), nor is the dataset referenced by previous studies.

In terms of clarity, the paper describes the dataset quite clearly, with well-organized parts such as the sememe taxonomy, term collection, and relations, and it explains how the dataset was created in detail. It also provides access to the data and the code through GitHub. However, the main flaw of this contribution is that it does not fully follow Semantic Web best practices, since it does not use standard vocabularies (like RDF), lacks an ontology, and does not follow Linked Data or FAIR principles.

The dataset is easy to access and structured well, so computers can read it, but it is not a true Semantic Web dataset. It does not use RDF or standard web identifiers, and it is not connected to other datasets. Because of this, it only reaches about 3 out of 5 stars in the Linked Data model.

Overall, while the paper presents an interesting dataset with clear potential for NLP applications, it does not meet the criteria for a Semantic Web Journal dataset description. In particular, the main gaps for improvement are the lack of compliance with Semantic Web standards (RDF representation, reuse of standard vocabularies and compliance with the LD principles), together with limited evaluation scope and no evidence of third-party adoption. Therefore, I recommend a major revision, encouraging the authors to revise the dataset towards full Semantic Web compliance and to provide stronger validation and usage evidence.