A dataset of real receipt images intended for optical character recognition (OCR) tasks, published on Kaggle. It likely contains receipt images with corresponding text annotations, though this is not confirmed. The available metadata does not state the number of samples, the collection method, or the time period.
Use Cases
The following uses are inferred from the domain; verify them after download.
- Train an OCR model to extract text from receipt images
- Benchmark text detection and recognition algorithms on real-world documents
- Fine-tune a pre-trained model for domain-specific receipt parsing
Strengths
- Published on Kaggle, a platform for sharing datasets.
- Focuses on real-world receipt images, which may provide practical training data.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- The number of samples is unknown, which makes it hard to judge before download whether the dataset is large enough for a given task.
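Since several points above depend on checking the files after download, a first inspection pass might look like the sketch below. It tallies the files under a local directory by extension, which gives a rough image count and shows whether any annotation-like files (for example `.txt`, `.json`, or `.csv`) are present. The function names, the `IMAGE_EXTS` set, and the assumption that the archive has been extracted to a local folder are illustrative and do not come from the dataset's metadata.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions; adjust after seeing the real files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def summarize_download(root):
    """Count files under `root` (recursively) by lowercased extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "<none>".
            counts[path.suffix.lower() or "<none>"] += 1
    return counts


def image_count(counts):
    """Sum the counts of extensions listed in IMAGE_EXTS."""
    return sum(n for ext, n in counts.items() if ext in IMAGE_EXTS)
```

Calling `summarize_download("path/to/extracted/dataset")` (a placeholder path) returns a `Counter` keyed by extension, and passing that result to `image_count` gives the number of files that look like images. Any non-image extensions in the tally are candidates for annotation files and need to be opened by hand to confirm their structure.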