This dataset holds results from multiple multilingual OCR models applied to the test split of the GlotOCR-bench dataset, 16,375 samples in total. It was created by cis-lmu and last updated on April 12, 2026. It includes outputs from models such as rednote-hilab/dots.ocr, zai-org/GLM-OCR, and deepseek-ai/DeepSeek-OCR-2.
Use Cases
- Benchmarking OCR model accuracy by comparing the outputs of the included models
- Analyzing error patterns in multilingual text recognition across the included model outputs
- Training or fine-tuning OCR models using a corpus of pre-processed image-text pairs
- Studying the comparative performance of different OCR architectures on a standardized test set
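For the benchmarking and comparison use cases above, one common metric is character error rate (CER): the edit distance between a reference string and a model output, divided by the reference length. The following is a minimal sketch, not the dataset authors' evaluation method. It assumes each row carries a ground-truth string alongside one output column per model; the column names `reference`, `model_a`, and `model_b` are hypothetical, since the actual schema is not documented here.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate after NFC normalization of both strings."""
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        # Convention chosen for this sketch: empty reference scores 0 or 1.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def mean_cer_by_model(rows, reference_key, model_keys):
    """Average CER per model over an iterable of dict-like rows."""
    rows = list(rows)
    return {
        key: sum(cer(row[reference_key], row[key]) for row in rows) / len(rows)
        for key in model_keys
    }


# Hypothetical rows; real column names must be checked after download.
sample_rows = [
    {"reference": "hello", "model_a": "hello", "model_b": "hallo"},
    {"reference": "world", "model_a": "w0rld", "model_b": "wor1d!"},
]
scores = mean_cer_by_model(sample_rows, "reference", ["model_a", "model_b"])
```

This sketch counts Unicode code points, not grapheme clusters, so scripts that rely heavily on combining marks may warrant a different unit of comparison.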
Strengths
- Contains 16,375 samples for model comparison
- Includes outputs from several named, contemporary OCR models
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Only the total sample count (16,375) is stated; no per-language or per-model breakdown is given here, which may limit suitability assessment
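Because column-level documentation is absent, a reasonable first step after download is to list each column with the value types observed and one sample value. The helper below is a sketch under the assumption that rows are available as dictionaries (as they are when iterating a split loaded with the Hugging Face `datasets` library); the function name `summarize_columns` and the example rows are hypothetical and do not reflect the real schema.

```python
from collections import defaultdict


def summarize_columns(rows, max_example_len=40):
    """Return, per column, the sorted type names seen and one example value."""
    types = defaultdict(set)
    examples = {}
    for row in rows:
        for name, value in row.items():
            types[name].add(type(value).__name__)
            # Keep the first non-null value as a truncated, printable example.
            if name not in examples and value is not None:
                examples[name] = repr(value)[:max_example_len]
    return {
        name: {"types": sorted(types[name]), "example": examples.get(name)}
        for name in types
    }


# Hypothetical rows standing in for the first few downloaded records.
preview_rows = [
    {"id": 1, "text": "abc"},
    {"id": 2, "text": None},
]
summary = summarize_columns(preview_rows)
for column, info in summary.items():
    print(column, info["types"], info["example"])
```

Running this on a small prefix of the data is usually enough to tell identifier, reference-text, and model-output columns apart before committing to a full analysis.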
Provenance
- Source: cis-lmu/GlotOCR-bench
- Collection Method: OCR results generated by applying multiple models to images from a source benchmark dataset
- Time Range: not specified
- Freshness: Last updated 2026-04-12 16:47:32
- Geography: not specified