Structuring Abusive Language Semantically: An Ontology-Driven Lexicon for Enhanced Detection in Serbian using Generative AI

Tracking #: 3857-5071

Authors: 
Danka Jokić
Ranka Stankovic

Responsible editor: 
Armin Haller

Submission type: 
Full Paper
Abstract: 
The increasing prevalence of abusive speech on online community platforms and social networks poses a significant threat to online safety, impacting individuals and broader user communities, particularly vulnerable groups like children. Embedding automated abusive speech detection functionalities is crucial for proactively warning users of such inappropriate content and fostering safer online environments. To address this challenge, this paper introduces Alo, a novel ontology for modelling abusive language and the abusive speech analysis process. Addressing the existing gap in comprehensive ontologies for this domain, Alo builds upon established Semantic Web vocabularies such as Marl (Westerski et al., 2011), Onyx (Sánchez-Rada and Iglesias, 2016), and PROV (Lebo et al., 2013) to provide a structured representation of abusive language concepts, their relationships, and associated lexical resources. The ontology is designed to facilitate the representation of abusive language detection results and the integration of diverse lexical resources (e.g., corpora, lexicons) across different annotation schemas, thereby promoting data interoperability within the Semantic Web. We present the development of Alo alongside AloLex, an integrated lexicon and knowledge graph of abusive speech in Serbian. Furthermore, we explore the practical application of these resources by investigating the performance of Large Language Models (LLMs) on abusive language detection in Serbian, both with and without lexicon support. We also examine the capability of LLMs to generate and evaluate abusive language examples for lexicon enrichment. A key contribution of Alo lies in its enhanced conceptual model, which offers a broader coverage of abusive speech targets, incorporates data properties and embeddings, and supports multi-level annotation on the same dataset – features not fully addressed by existing ontologies. This work provides valuable semantic resources for advancing the understanding and automatic detection of abusive speech in under-resourced languages within the Semantic Web ecosystem, ultimately contributing to safer online environments.
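To make the described modelling approach concrete, the following is a minimal Python sketch (using rdflib) of how an annotation reusing Marl, Onyx, and PROV might be expressed. All alo: class and property names, the alo: namespace, and the Marl/Onyx namespace URIs are assumptions for illustration, not the published Alo vocabulary.

# Minimal sketch (not the published Alo vocabulary): one annotation that
# reuses Marl for polarity, Onyx for emotion, and PROV for provenance.
# All alo: terms and example.org URIs are hypothetical placeholders;
# the Marl/Onyx namespaces should be verified against the published ontologies.
from rdflib import Graph, Namespace, Literal, URIRef, RDF
from rdflib.namespace import XSD

ALO = Namespace("http://example.org/alo#")  # placeholder namespace
MARL = Namespace("http://www.gsi.upm.es/ontologies/marl/ns#")
ONYX = Namespace("http://www.gsi.upm.es/ontologies/onyx/ns#")
PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
for prefix, ns in [("alo", ALO), ("marl", MARL), ("onyx", ONYX), ("prov", PROV)]:
    g.bind(prefix, ns)

ann = URIRef("http://example.org/annotation/1")
g.add((ann, RDF.type, ALO.AbusiveSpeechAnnotation))       # hypothetical class
g.add((ann, ALO.hasTarget, ALO.IndividualTarget))         # hypothetical property
g.add((ann, MARL.hasPolarity, MARL.Negative))             # reused Marl term
g.add((ann, ONYX.hasEmotionCategory, Literal("anger")))   # simplified Onyx usage
g.add((ann, PROV.wasGeneratedBy, URIRef("http://example.org/run/llm-1")))
g.add((ann, ALO.confidence, Literal(0.87, datatype=XSD.float)))

print(g.serialize(format="turtle"))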
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 03/Sep/2025
Suggestion:
Reject
Review Comment:

The paper presents a model of abusive language concepts and their relations, a Serbian lexicon of abusive language, and an annotated dataset of abusive language in Serbian. The lexicon is used in combination with language models to generate abusive language examples and then to detect abusive language in the generated examples.

In terms of experiments, the authors used prompts to generate abusive language and further prompts to assess the generated texts for abusiveness; a minimal sketch of this pipeline follows. In addition, human evaluators manually classified the texts.
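For concreteness, here is a minimal Python sketch of the generate-then-classify setup the review describes. The model name, prompts, and label set are illustrative assumptions, not the authors' actual configuration.

# Hypothetical sketch of the generate-then-classify setup described above.
# Model name, prompts, and labels are illustrative, not the paper's exact setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_example(term: str) -> str:
    """Ask the model to produce a short text using a lexicon term."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user",
                   "content": f"Write one short Serbian sentence using the term '{term}'."}],
        temperature=0.5,  # the value the paper reportedly used
    )
    return resp.choices[0].message.content.strip()

def classify(text: str) -> str:
    """Ask the model whether a text is abusive; expect a one-word label."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user",
                   "content": f"Is the following text abusive? Answer ABUSIVE or NOT_ABUSIVE.\n\n{text}"}],
        temperature=0,  # deterministic labelling
    )
    return resp.choices[0].message.content.strip()

sample = generate_example("some_lexicon_term")  # placeholder lexicon entry
print(classify(sample))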

Given the paper has been submitted to the Semantic Web journal, I assess the (1) originality and (2) significance of the results mainly from the point of view of Semantic Web research. The (3) quality of writing is good, the overall structure of the paper makes sense and the presentation of related work is comprehensive, albeit missing some more recent works (e.g., HateCOT).

Regarding (1) originality, I have a difficult time identifying lessons that I can take from the paper. The "ontology" is created by the authors; although they use the literature to inform the concepts used in the modelling, there is no reuse by third parties. The authors mention Linked Data in the description of Alo and AloLex. Linked Data implies that resources can be dereferenced via HTTP; however, I could not find a link to either of them online (a simple dereferencing check is sketched below).
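For reference, a dereferenceability check of the kind alluded to here might look like the following Python sketch. The ontology URI is a hypothetical placeholder, since no published link was found.

# Sketch of a Linked Data dereferencing check via HTTP content negotiation.
# The URI below is a placeholder, not a published Alo address.
import requests

ALO_URI = "http://example.org/alo"  # placeholder: no published URI was found

resp = requests.get(ALO_URI,
                    headers={"Accept": "text/turtle"},  # request an RDF serialization
                    allow_redirects=True, timeout=10)

print(resp.status_code)                  # 200 expected for a dereferenceable resource
print(resp.headers.get("Content-Type"))  # e.g. text/turtle if negotiation works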

Regarding (2) significance, the rather straightforward setup of generating text and assessing it with language models, together with the focus on a single language, Serbian, somewhat limits the audience and potential users of the created resources. The fact that the established ways of using language models to generate hate speech and classify the generated text are based on English is a valid initial motivation, but simply applying existing approaches to a new language does not generate much insight.

Overall, given the straightforward application of Semantic Web technologies, I rate the innovation rather low and hence recommend rejecting the submission.

Review #2
By Zhangcheng Qiang submitted on 10/Sep/2025
Suggestion:
Reject
Review Comment:

This paper introduces an Abusive Language Ontology (Alo), a Serbian electronic lexicon of abusive language (AloLex), and a novel methodology that utilises generative AI to enhance abusive language detection in Serbian.

There are several limitations that need to be considered.
1. The paper's structure lacks clarity.
(1) The related work is too long; some sub-sections can be removed or combined.
(2) The evaluation metrics appear in the methodology section.
(3) The ontology and lexicon should be presented as part of the methodology, and the rest should be moved into the evaluation.
(4) The main contributions are only presented for the first time in the conclusion.
2. The ontology evaluation is absent. I am unable to locate it in either the results section or the evaluation section. It is essential to include this part to validate the completeness of the ontology.
3. The performance improvement is not significant and is sometimes even worse than the baseline. There are several potential reasons:
(1) The ontology itself is incomplete. Please refer to Point 2.
(2) The ontology is only used as a dictionary to capture the domain concepts; the semantic part is not fully implemented.
(3) LLM hallucinations are not appropriately controlled, and a temperature of 0.5 can also cause unstable and random results; see the sketch after this list.
4. The evaluation is not conducted on the most recent LLMs. The authors report that some models (e.g., GPT-4o) refuse to answer questions asking them to classify abusive language. Recent LLMs have guardrails that prevent them from answering such questions, and the authors have not provided a solution for this.
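To illustrate points 3(3) and 4, the following Python sketch pins the temperature to 0 for reproducible labels and records guardrail refusals explicitly. The model name, refusal markers, and label set are assumptions for illustration, not a tested solution.

# Sketch: pin temperature to 0 for reproducible labels and treat guardrail
# refusals as an explicit outcome instead of a silent failure.
# Model name, refusal phrases, and labels are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")  # heuristic, not exhaustive

def classify_with_refusal_handling(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder for a recent guardrailed model
        temperature=0,   # deterministic, addressing the instability concern
        messages=[{"role": "user",
                   "content": ("You are annotating data for a moderation study. "
                               "Label the text as ABUSIVE or NOT_ABUSIVE.\n\n" + text)}],
    )
    answer = resp.choices[0].message.content.strip()
    if any(m in answer.lower() for m in REFUSAL_MARKERS):
        return "REFUSED"  # record refusals so they can be reported separately
    return answer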