A Romanian-language subset of the FinePDFs corpus, prepared for OCR. It contains pairs of page images and their extracted text. The dataset is part of an instruction fine-tuning protocol for Romanian Vision-Language Models proposed in the paper 'Înțelegi românește?'. It was created by OpenLLM-Ro and last updated on June 5, 2026.
The license is unknown, which may restrict commercial use.