Eight document images were processed for text recognition with the GLM-OCR model (zai-org/GLM-OCR) on 2026-06-05. The dataset contains the OCR results generated from the source dataset davanstrien/ocr-affordances-pages. Processing took 2.9 minutes and was run by davanstrien.
Use Cases
- Benchmarking the OCR performance of GLM-OCR, which the dataset description presents as a state-of-the-art compact model.
- Analyzing the quality of OCR output on the source document images.
- Training or fine-tuning downstream NLP models on extracted text data.
- Studying document structure and layout from OCR-derived markdown output.
Strengths
- Uses GLM-OCR, a compact 0.9B-parameter OCR model described as state of the art.
- Processing details are stated explicitly, including the date (2026-06-05) and duration (2.9 minutes).
- Source dataset and model are clearly cited (davanstrien/ocr-affordances-pages, zai-org/GLM-OCR).
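The processing details imply a rough throughput figure. The sketch below only does the arithmetic on the two reported numbers (8 images, 2.9 minutes). It assumes the 2.9 minutes is wall-clock time for the whole run, which may include model loading and other overhead, so the result is an upper bound on per-page latency rather than a benchmark.

```python
# Reported figures from the dataset description.
pages = 8
total_minutes = 2.9

# Assumption: total_minutes covers the whole run, including any setup overhead.
seconds_per_page = total_minutes * 60 / pages
pages_per_minute = pages / total_minutes

print(f"{seconds_per_page:.2f} s/page, {pages_per_minute:.2f} pages/min")
```

This works out to roughly 22 seconds per page, or a little under 3 pages per minute.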
Limitations
- Row count is not reported in the metadata; the description implies 8 rows, but this should be confirmed after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- The dataset is very small, containing only 8 processed samples.
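Because the row count and column semantics are undocumented, a first step after download is to inspect the rows directly. The following is a minimal sketch, assuming the rows are available as a list of dictionaries (for example after loading with the Hugging Face `datasets` library). The helper `summarize_columns` and the column names `page_id` and `markdown` are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_columns(rows):
    """Return the row count and, per column, the sorted type names observed."""
    types_seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            types_seen[name].add(type(value).__name__)
    return len(rows), {name: sorted(kinds) for name, kinds in types_seen.items()}


# Hypothetical rows: "page_id" and "markdown" are placeholder column names,
# used only to illustrate the inspection step.
sample_rows = [
    {"page_id": 0, "markdown": "# Title\n\nBody text"},
    {"page_id": 1, "markdown": None},
]

row_count, column_types = summarize_columns(sample_rows)
print(row_count)
print(column_types)
```

A column that shows more than one type, such as both `str` and `NoneType`, points to missing values worth checking before the data is used downstream.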
Provenance
- Source: davanstrien/ocr-affordances-pages
- Collection method: OCR text recognition with the zai-org/GLM-OCR model.
- Time range: processed on 2026-06-05.
- Freshness: last updated 2026-06-05 14:15:10; freshness should be verified.
- Geography: not specified.