Abstract:
The increasing prevalence of abusive speech on online community platforms and social networks poses a significant threat to online safety, impacting individuals and broader user communities, particularly vulnerable groups such as children. Embedding automated abusive speech detection into these platforms is crucial for proactively warning users about such content and fostering safer online environments. To meet this challenge, this paper introduces Alo, a novel ontology for modelling abusive language and the abusive speech analysis process. Addressing the existing gap
in comprehensive ontologies for this domain, Alo builds upon established Semantic Web vocabularies such as Marl (Westerski et al., 2011), Onyx (Sánchez-Rada and Iglesias, 2016), and PROV-O (Lebo et al., 2013) to provide a structured representation of abusive language concepts, their relationships, and associated lexical resources. The ontology is designed to facilitate the representation of abusive language detection results and the integration of diverse lexical resources (e.g., corpora, lexicons) across different annotation schemas, thereby promoting data interoperability within the Semantic Web. We present the development of Alo alongside AloLex, an integrated lexicon and knowledge graph of abusive speech in Serbian. Furthermore, we explore the practical application of these resources by investigating
the performance of Large Language Models (LLMs) on abusive language detection in Serbian, both with and without lexicon support. We also examine the capability of LLMs to generate and evaluate abusive language examples for lexicon enrichment. A key contribution of Alo lies in its enhanced conceptual model, which offers broader coverage of abusive speech targets, incorporates data properties and embeddings, and supports multi-level annotation on the same dataset, features not fully addressed by existing ontologies. This work provides semantic resources for advancing the understanding and automatic detection of abusive speech in under-resourced languages within the Semantic Web ecosystem, ultimately contributing to safer online environments.