OCR Affordances GLM: Optical Character Recognition Results from Document Images

Name: OCR Affordances GLM: Optical Character Recognition Results from Document Images
Creator: davanstrien
Published: 2026-06-05T14:15:08
Keywords: Optical Character Recognition, Computer Vision, Text Recognition, Document Processing, Multimodal

Description

Eight document images from the source dataset davanstrien/ocr-affordances-pages were processed for text recognition with the GLM-OCR model on 2026-06-05. The dataset contains the resulting OCR output. Processing was run by davanstrien and completed in 2.9 minutes.

Use Cases

Benchmarking other OCR models against the output of GLM-OCR, a compact model described as state-of-the-art.
Analyzing the quality of OCR output on the processed document images.
Training or fine-tuning downstream NLP models on the extracted text.
Studying document structure and layout from the OCR-derived markdown output.

Strengths

Uses GLM-OCR, a compact 0.9B-parameter OCR model described as state-of-the-art.
Processing details are stated explicitly, including the date (2026-06-05) and duration (2.9 minutes).
Source dataset and model are clearly cited (davanstrien/ocr-affordances-pages, zai-org/GLM-OCR).

Limitations

Row count is not reported in the dataset metadata, which may limit suitability assessment; the description suggests one row per processed image (8 in total).
Column-level documentation is absent; field semantics must be inferred after download.
The dataset is very small, containing only 8 processed samples.
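
Because column-level documentation is absent, a first step after download is to inspect which columns exist and what they hold. The following is a minimal sketch of that inspection, not the author's method. It assumes the rows have already been loaded as a list of dictionaries; the helper name `summarize_columns` and the column names `image`, `markdown`, and `inference_info` are hypothetical placeholders, not confirmed fields of this dataset.

```python
from collections import defaultdict
from typing import Any, Iterable


def summarize_columns(rows: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Report, per column, the value types seen and the number of non-null values."""
    summary: dict[str, dict[str, Any]] = defaultdict(
        lambda: {"types": set(), "non_null": 0}
    )
    for row in rows:
        for name, value in row.items():
            entry = summary[name]  # creates the entry on first sight, even for nulls
            if value is not None:
                entry["types"].add(type(value).__name__)
                entry["non_null"] += 1
    return dict(summary)


# Hypothetical rows standing in for the downloaded records (assumed column names).
sample_rows = [
    {"image": b"\x89PNG", "markdown": "# Title\n\nBody text", "inference_info": None},
    {"image": b"\x89PNG", "markdown": "", "inference_info": "{}"},
]

for column, info in summarize_columns(sample_rows).items():
    print(column, sorted(info["types"]), info["non_null"])
```

With only 8 samples, a full pass like this is cheap, and it also surfaces columns that are empty or only partially filled.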

Provenance

Source: davanstrien/ocr-affordances-pages
Collection Method: OCR processing using the zai-org/GLM-OCR model for text recognition.
Time Range: Processing date: 2026-06-05.
Freshness: Last updated 2026-06-05 14:15:10; freshness should be verified.
Geography: Not specified

Related Topics

Multimodal, Optical Character Recognition, Computer Vision, Text Recognition, Document Processing

Quality Score

Overall: D (37)
Description: 42
Source: 36
Reputation: 35
Access: 26

Community

Likes: 1
Views: 0

Dataset Info

Author: davanstrien
Created: Jun 5, 2026
Updated: Jun 5, 2026
Last synced: Jun 12, 2026
