AraReview is a sentiment analysis dataset of 105,466 labeled Arabic hotel reviews. It is derived from HARD (Hotel Arabic Reviews Dataset) and is balanced, with 52,733 examples in each of the positive and negative classes. The dataset was created by dralsarrani and is hosted on Hugging Face.
Use Cases
- Fine-tuning Arabic language models for sentiment classification based on labeled hotel reviews.
- Benchmarking the performance of Arabic NLP models on a balanced, domain-specific text corpus.
- Training binary sentiment classifiers for the hospitality industry based on Arabic customer feedback.
Strengths
- Contains 105,466 labeled Arabic text examples, providing a substantial corpus for model training.
- Perfectly balanced with 52,733 examples per class, which can simplify training and evaluation.
- Intended for fine-tuning Arabic NLP models such as AraBERT, CAMeLBERT, and AraELECTRA.
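The balance figures above can be sanity-checked once the labels are loaded. The following is a minimal sketch, not the dataset author's tooling: the helper names `class_counts` and `is_balanced` are hypothetical, and the toy label values stand in for whatever label column the downloaded files actually contain.

```python
from collections import Counter


def class_counts(labels):
    """Return a Counter mapping each label value to its number of examples."""
    return Counter(labels)


def is_balanced(labels):
    """Return True when every class present has the same number of examples."""
    counts = class_counts(labels)
    return len(set(counts.values())) <= 1


# Figures stated in the description: two classes of 52,733 examples each.
stated_per_class = 52_733
stated_total = 105_466
totals_agree = (2 * stated_per_class == stated_total)

# Toy stand-in labels (assumption); a real check would pass the dataset's label column.
toy_labels = ["positive", "negative", "negative", "positive"]
toy_balanced = is_balanced(toy_labels)
```

Running the same check on the real label column would confirm or refute the stated 52,733 per class.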
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The stated row count (105,466) comes from the dataset description and should be confirmed against the downloaded files.
- Description metadata is limited; actual data quality requires manual inspection after download.
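Because the fields are undocumented, a first inspection pass usually means listing each field with the value types and a few sample values it holds. The sketch below assumes the rows have already been loaded as a list of dictionaries; the function `summarize_fields` and the field names `text` and `label` are hypothetical placeholders, not the dataset's confirmed schema.

```python
def summarize_fields(records, sample_size=3):
    """Map each field name to the value types seen and a few sample values."""
    summary = {}
    for record in records:
        for name, value in record.items():
            info = summary.setdefault(name, {"types": set(), "samples": []})
            info["types"].add(type(value).__name__)
            if len(info["samples"]) < sample_size:
                info["samples"].append(value)
    return summary


# Hypothetical rows (assumption); the real field names are not documented.
rows = [
    {"text": "فندق رائع", "label": 1},
    {"text": "خدمة سيئة", "label": 0},
]
fields = summarize_fields(rows)
```

Reading the sample values alongside the types is usually enough to tell which field holds the review text and which holds the sentiment label.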
Provenance
- Source
  - HARD (Hotel Arabic Reviews Dataset)
- Collection Method
  - Built from the HARD source, likely involving cleaning and balancing.
- Freshness
  - Last updated 2026-05-30 12:22:46; freshness should be verified.