HalluBench is a benchmark dataset for evaluating hallucination in vision-language models on geospatial imagery. It was created by AuwAuwAuw and last updated on 2026-04-05. The dataset covers two application domains: emergency disaster assessment and urban scene understanding.
Use Cases
- Benchmarking hallucination in vision-language models on geospatial imagery.
- Evaluating disaster assessment accuracy on before-and-after aerial and satellite imagery.
- Testing urban scene understanding on urban imagery.
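Because the dataset spans two domains, benchmark results are probably most useful when reported per domain. The sketch below shows one way to do that. It is a minimal illustration, not the dataset's own scoring method: the field names `domain` and `hallucinated` are hypothetical, and the boolean flag is assumed to come from whatever judging step (human or automated) the evaluator applies to each model response.

```python
from collections import defaultdict


def hallucination_rate_by_domain(rows):
    """Return the fraction of flagged responses per domain.

    Each row is a dict with two HYPOTHETICAL keys (not taken from the
    dataset schema, which is undocumented):
      - "domain": e.g. "emergency" or "urban"
      - "hallucinated": True if the response was judged to contain
        fabricated details
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for row in rows:
        totals[row["domain"]] += 1
        if row["hallucinated"]:
            flagged[row["domain"]] += 1
    # Every domain in totals has at least one row, so no division by zero.
    return {domain: flagged[domain] / totals[domain] for domain in totals}


# Made-up judgments, for illustration only.
judged = [
    {"domain": "emergency", "hallucinated": True},
    {"domain": "emergency", "hallucinated": False},
    {"domain": "urban", "hallucinated": False},
]
rates = hallucination_rate_by_domain(judged)
```

Reporting the two domains separately keeps a strong result on urban scenes from masking a weak one on disaster imagery, or the reverse.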
Strengths
- Focuses on two defined application domains: emergency and urban.
- Designed specifically to probe reasoning accuracy and the fabrication of details.
- Last updated on 2026-04-05.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not reported, which makes it hard to judge suitability before download.
- Descriptive metadata is limited; data quality must be checked by manual inspection after download.
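Since neither the column semantics nor the row count is documented, a first pass after download is usually to summarize what the records actually contain. The sketch below assumes the rows have already been loaded as a list of dictionaries (for example, by iterating over a split loaded with the Hugging Face `datasets` library). The sample records and their field names are invented for illustration and are not the dataset's real schema.

```python
from collections import Counter


def summarize_records(records):
    """Report the row count and, per field, observed value types and missing count."""
    records = list(records)
    columns = {}
    for row in records:
        for key, value in row.items():
            info = columns.setdefault(key, {"types": Counter(), "missing": 0})
            if value is not None:
                info["types"][type(value).__name__] += 1
    # A value counts as missing if the key is absent from a row or set to None.
    for info in columns.values():
        info["missing"] = len(records) - sum(info["types"].values())
    return {"num_rows": len(records), "columns": columns}


# HYPOTHETICAL sample rows; the real field names must be read from the data.
sample = [
    {"image_id": "a1", "domain": "emergency", "answer": "yes"},
    {"image_id": "a2", "domain": "urban", "answer": None},
]
summary = summarize_records(sample)
for name, info in summary["columns"].items():
    print(name, dict(info["types"]), "missing:", info["missing"])
```

A summary like this gives the row count directly and flags fields with mixed types or many missing values, which are the ones most worth inspecting by hand.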
Provenance
- Source: huggingface
- Freshness: 2026-04-05 23:39:52