Incorporating Sarcasm-Infused Slang (‘Lughat al-Sarsagiyya’) into GPT Models: Challenges and Opportunities in Arabic AI Linguistics
Paper ID: 1060-ICEEM2025 (R1)
Authors
Wael Badawy *
Eru
Abstract
Despite the rapid advancement of Generative Pre-trained Transformers (GPT) in Arabic Natural Language Processing (NLP), these models have yet to fully accommodate dialectal and culturally nuanced language, particularly the sarcastic street slang known as 'Lughat al-Sarsagiyya'. This paper investigates the integration of sarcasm-infused slang into GPT models by analyzing its linguistic characteristics, curating annotated corpora, and fine-tuning a specialized Arabic GPT variant. We evaluate model performance on sarcasm detection and sentiment analysis tasks using standard metrics, showing improvements over state-of-the-art Arabic models such as AraBERT and MARBERT. Our findings demonstrate the necessity of capturing sociolinguistic nuance to enhance the contextual accuracy of NLP systems operating in informal digital Arabic settings. The paper also addresses ongoing challenges, including evolving slang, annotation complexity, and cultural specificity. Incorporating 'Lughat al-Sarsagiyya' opens new opportunities for building culturally intelligent AI systems capable of deeper engagement with regional dialects in Arab digital spaces.
Keywords
sarcasm detection, Lughat al-Sarsagiyya, Arabic NLP, GPT models, slang, dialectal Arabic, sentiment analysis, sociolinguistics, deep learning, transformer models
Status: Accepted