Egyptian Arabic Hate Speech Dataset with 8,169 Manually Labeled Samples | DataSalon

Speech & Audio

Name: Egyptian Arabic Hate Speech Dataset with 8,169 Manually Labeled Samples
Creator: IbrahimAmin
Published: 2025-05-02T12:37:00
Keywords: Egyptian-Arabic, Text Classification, Text, Offensive Language, Hate Speech, Audio

Available on 1 platform

Description

8,169 Egyptian-Arabic text samples are manually annotated for offensive language and hate speech. The dataset was created by IbrahimAmin, Mostafa Abbas, Rany Hatem, Andrew Ihab, and Mohamed Waleed Fahkr. It was last updated on August 17, 2025.

Use Cases

Fine-tuning transformer models for hate speech detection on Egyptian-dialect text.
Training classifiers for offensive language identification from manually labeled samples.
Benchmarking NLP models on Egyptian Arabic dialect tasks.
Studying linguistic patterns of hate speech in a specific Arabic dialect.

Strengths

8,169 text samples provide a substantial corpus for model training.
Manual labeling suggests higher annotation quality than automated labeling would typically provide.
Focus on the Egyptian dialect addresses a specific linguistic niche.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is known, but other metadata such as file formats is unknown, and the license is unconfirmed.
Data may reflect geographic bias inherent to its single-dialect focus.
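
Because column-level documentation is absent, a first pass after download is usually to profile each field and guess which one holds the labels. The following is a minimal sketch of that step, not part of the dataset itself. It assumes the rows have already been loaded as a list of dictionaries with hashable values; the column names `text` and `label`, the helper `summarize_columns`, and the `max_label_values` threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def summarize_columns(rows, max_label_values=10):
    """Profile each column in a list of dict records.

    A column is flagged as a candidate label column when it has few
    distinct values and at least one value repeats. The threshold is an
    arbitrary assumption, not a property of this dataset.
    """
    summary = {}
    columns = sorted({key for row in rows for key in row})
    for col in columns:
        # Treat absent keys and None as missing.
        values = [row[col] for row in rows if row.get(col) is not None]
        counts = Counter(values)
        summary[col] = {
            "types": sorted({type(v).__name__ for v in values}),
            "missing": len(rows) - len(values),
            "distinct": len(counts),
            "candidate_label": 0 < len(counts) <= max_label_values
            and len(counts) < len(values),
            "top_values": counts.most_common(3),
        }
    return summary


# Hypothetical records: the real column names are undocumented.
sample_rows = [
    {"text": "example one", "label": "offensive"},
    {"text": "example two", "label": "neutral"},
    {"text": "example three", "label": "offensive"},
    {"text": "example four", "label": None},
]

for name, info in summarize_columns(sample_rows).items():
    print(name, info)
```

A profile like this only narrows down the candidates; the actual meaning of each label value still has to be confirmed against the source repository.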

Provenance

Source: Hugging Face
Collection Method: Manually labeled text samples.
Freshness: Last updated 2025-08-17 14:48:39
Geography: Egypt

Note: the license is listed as MIT in the source description but as 'unknown' in the structured metadata; verify it before use.

Quality Score

D40

Community

58 downloads

1 like

0 views

Dataset Info

Author: IbrahimAmin
Created: May 2, 2025
Updated: Aug 17, 2025
Last synced: May 31, 2026
