ArenaOCR is a unit-test-driven benchmark for Optical Character Recognition and Document Understanding. It assesses Vision-Language Models on challenging real-world layouts using explicit pass/fail checks instead of traditional fuzzy similarity metrics. The benchmark was created by Surpem and was last updated on 2026-05-17.
Use Cases
- Benchmarking OCR system performance based on unit-test-driven evaluation criteria
- Evaluating Vision-Language Models on document understanding tasks based on challenging real-world layouts
- Comparing model outputs against a rigorous benchmark that avoids traditional fuzzy metrics
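To illustrate what unit-test-driven evaluation can look like in contrast to a fuzzy similarity score, here is a minimal sketch. The test types (`present`, `absent`, `order`), the field names, and the sample output are assumptions made for illustration; they are not taken from ArenaOCR's actual schema, which should be checked after download.

```python
import re


def normalize(s: str) -> str:
    """Collapse whitespace and lowercase so layout differences do not matter."""
    return re.sub(r"\s+", " ", s).strip().lower()


def check_present(output: str, text: str) -> bool:
    """Pass if `text` appears somewhere in the model output."""
    return normalize(text) in normalize(output)


def check_absent(output: str, text: str) -> bool:
    """Pass if `text` (e.g. a page footer) does not appear in the output."""
    return not check_present(output, text)


def check_order(output: str, before: str, after: str) -> bool:
    """Pass if `after` occurs after the first occurrence of `before`."""
    out, b, a = normalize(output), normalize(before), normalize(after)
    i = out.find(b)
    if i < 0:
        return False
    return out.find(a, i + len(b)) >= 0


def run_tests(output: str, tests: list[dict]) -> list[bool]:
    """Run each hypothetical test case and return one pass/fail flag per case."""
    results = []
    for t in tests:
        if t["type"] == "present":
            results.append(check_present(output, t["text"]))
        elif t["type"] == "absent":
            results.append(check_absent(output, t["text"]))
        elif t["type"] == "order":
            results.append(check_order(output, t["before"], t["after"]))
        else:
            raise ValueError(f"unknown test type: {t['type']}")
    return results


# Hypothetical model output and test cases, for illustration only.
output = "Invoice 2024-001\nTotal: 42.00 EUR\nThank you"
tests = [
    {"type": "present", "text": "Total: 42.00 EUR"},
    {"type": "absent", "text": "Page 1 of 3"},
    {"type": "order", "before": "Invoice", "after": "Total"},
    {"type": "order", "before": "Thank you", "after": "Invoice"},
]
results = run_tests(output, tests)
pass_rate = sum(results) / len(results)
```

Each case yields a binary outcome, so a model's score is simply the fraction of cases passed, and a failure points at a specific defect rather than a small drop in an aggregate similarity number.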
Strengths
- Unit-test-driven design with explicit pass/fail evaluation criteria
- Focuses on challenging real-world layouts
- Follows the design paradigm and schema structure of the established allenai/olmOCR-bench
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is not reported, which makes it harder to judge whether the benchmark is large enough for a given use
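Because row count and field semantics have to be worked out after download, a quick first pass is to summarize the records field by field. The sketch below assumes the downloaded data has already been read into a list of dictionaries; the sample rows and their field names are hypothetical placeholders, not ArenaOCR's documented columns.

```python
def summarize_fields(rows: list[dict]) -> dict:
    """Return, per field, the observed value types, non-null count, and one example."""
    summary = {}
    for row in rows:
        for key, value in row.items():
            info = summary.setdefault(
                key, {"types": set(), "non_null": 0, "example": None}
            )
            if value is not None:
                info["types"].add(type(value).__name__)
                info["non_null"] += 1
                if info["example"] is None:
                    info["example"] = value
    return summary


# Hypothetical rows standing in for the downloaded records.
rows = [
    {"pdf": "doc_a.pdf", "page": 1, "type": "present", "text": "Total", "max_diffs": None},
    {"pdf": "doc_b.pdf", "page": 3, "type": "order", "text": None, "max_diffs": 2},
]

summary = summarize_fields(rows)
print(f"rows: {len(rows)}")
for name, info in summary.items():
    print(f"{name}: types={sorted(info['types'])} "
          f"non_null={info['non_null']} example={info['example']!r}")
```

Fields that are mostly null or that mix several types are the ones most worth checking by hand against the olmOCR-bench conventions the benchmark says it follows.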
Provenance
- Source: Hugging Face
- Freshness: last updated 2026-05-17 16:22:01