Infinity-Doc2-5M is a training dataset of 5 million samples for document parsing. It covers diverse document types, including academic papers, research reports, and financial reports, in both Chinese and English. The dataset was created by infly and was last updated on the platform in May 2026.
The license is unknown, which may restrict commercial or research use.