Available on 1 platform
A benchmark dataset for optical character recognition (OCR) and document understanding, created by Timokerr and last updated on 2026-04-22. It pairs source PDFs with JSON ground-truth labels for documents in three domains: invoices, receipts, and logistics. The dataset is designed to benchmark the performance of 17 LLMs.
License is unknown; users must verify permissions before use.
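The listing does not describe how the files are organized, so the following is only a minimal sketch of how the PDFs might be matched to their JSON labels. It assumes a hypothetical layout in which each domain has its own folder and each PDF sits next to a JSON file with the same base name. The function name `pair_documents` and the folder layout are illustrative assumptions, not part of the dataset.

```python
from pathlib import Path


def pair_documents(root):
    """Pair each PDF under `root` with a JSON file sharing its base name.

    Assumed (hypothetical) layout: <root>/<domain>/<name>.pdf next to
    <root>/<domain>/<name>.json. Returns a list of
    (domain, pdf_path, json_path) tuples sorted by PDF path.
    PDFs without a matching JSON label are skipped.
    """
    pairs = []
    for pdf_path in sorted(Path(root).rglob("*.pdf")):
        json_path = pdf_path.with_suffix(".json")
        if json_path.is_file():
            # The parent folder name stands in for the domain label.
            pairs.append((pdf_path.parent.name, pdf_path, json_path))
    return pairs
```

If the actual dataset uses a different naming scheme or a separate manifest file, the matching rule would need to change accordingly.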