Available on 1 platform
RUEmoCorp is a large-scale emotion classification corpus for Roman Urdu, the informal transliterated writing style dominant in Pakistani digital communication. Created to address the underrepresentation of Roman Urdu in NLP research, the dataset comprises a formally annotated benchmark subset of approximately 28,000 samples and a larger raw corpus. It was authored by Muhammad Khubaib Ahmad and last updated in May 2026.
The license is not specified and should be verified before use.