This dataset contains 13,371 line-level transcriptions, verified by Qwen3-VL 235B, intended as gold-standard OCR training data. It includes line-crop PNG images from 100 newspaper pages across 73 unique titles, spanning the 1840s to the 2010s. The dataset was created by NealCaren and is split by page into train, validation, and test sets.
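Splitting by page means that every line crop from a given page lands in the same split, so no page contributes lines to both training and evaluation. The following is a minimal sketch of that idea, not the dataset author's actual procedure: the `page_id` field, the `split_by_page` helper, and the 80/10/10 fractions are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def split_by_page(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole pages to train/validation/test splits.

    `records` is a list of dicts, each with a hypothetical "page_id" key.
    The fractions are illustrative, not the dataset's real ratios.
    """
    # Collect the unique pages and shuffle them reproducibly.
    pages = sorted({r["page_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(pages)

    # Decide how many pages go to each split.
    n_train = int(len(pages) * fractions[0])
    n_val = int(len(pages) * fractions[1])

    # Map each page to exactly one split.
    split_of = {}
    for i, page in enumerate(pages):
        if i < n_train:
            split_of[page] = "train"
        elif i < n_train + n_val:
            split_of[page] = "validation"
        else:
            split_of[page] = "test"

    # Route every line record to the split of its page.
    splits = defaultdict(list)
    for r in records:
        splits[split_of[r["page_id"]]].append(r)
    return dict(splits)


# Toy data: 10 pages with 3 line records each.
records = [
    {"page_id": f"page_{p}", "line": n, "text": f"line {n} of page {p}"}
    for p in range(10)
    for n in range(3)
]
splits = split_by_page(records)
```

Shuffling pages rather than individual lines is the design choice that matters here: lines from one page share typeface, scan quality, and layout, so a line-level shuffle would let near-duplicates of training conditions leak into the test set.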
The license is unknown; verify the terms of use before using the dataset.