This dataset contains 0.5 million synthetic Chinese document images generated with the SynthDoG tool for training the Donut model. It is part of a multi-language collection created by naver-clova-ix and was last updated in January 2024.
The dataset is intended for use with the Donut model architecture. Users should review the associated GitHub repository and paper for generation details and licensing information.