Review Comment:
The paper introduces a dataset called the Terminology Semantic Sememe Tree Knowledge Graph (TS-KG). It is designed to represent terminology in a structured way by separating terms (signifiers) from concepts (signifieds), and it represents the meaning of each concept with sememe trees based on the HowNet semantic system. The dataset consists of three main parts: a sememe repository, a term repository (with definitions), and a relation repository.
Regarding quality, the dataset was evaluated by four domain experts on 200 randomly sampled terms, with a reported score of 95.25%. It is also built on an established resource, HowNet. However, there are some issues: only a small sample of the data (200 terms) was checked, the paper gives little detail about the kinds of errors found, and it is not clear how the dataset will be maintained or updated over time.
Regarding usefulness, the dataset has some potential as a structured resource that can support several bilingual NLP tasks. It also includes helpful features such as embeddings, links between similar terms, and both Chinese and English labels. However, the paper does not report any external use by third parties (e.g., other applications), nor is the dataset cited in previous studies.
In terms of clarity, the paper describes the dataset clearly, with well-organized components such as the sememe taxonomy, the term collection, and the relations, and it explains in detail how the dataset was built. It also provides access to the data and the code through GitHub. However, the main flaw of this contribution is that it does not fully follow Semantic Web best practices: the data are not represented in RDF, standard vocabularies are not reused, no ontology is provided, and the Linked Data and FAIR principles are not followed.
The dataset is easy to access and well structured, so it is machine-readable, but it is not a true Semantic Web dataset: it does not use RDF or standard web identifiers (URIs), and it is not linked to other datasets. As a result, it reaches only about three out of five stars in the Linked Open Data star scheme.
Overall, while the paper presents an interesting dataset with clear potential for NLP applications, it does not meet the criteria for a Semantic Web Journal dataset description. The main areas for improvement are the lack of compliance with Semantic Web standards (RDF representation, reuse of standard vocabularies, and adherence to the Linked Data principles), the limited scope of the evaluation, and the absence of evidence of third-party adoption. I therefore recommend a major revision, and I encourage the authors to move the dataset towards full Semantic Web compliance and to provide stronger validation and usage evidence.