MBART-VI-OCR-Adaptation: Multilingual Text Data for OCR Enhancement
Description
'mbart-vi-ocr-adaptation-254000' is a dataset hosted on Kaggle. Its name suggests text data for adapting the multilingual mBART model to Optical Character Recognition (OCR) tasks, most likely in Vietnamese ("vi" is the ISO 639-1 code for Vietnamese). The metadata does not describe the dataset's content, size, or origin, so the meaning of the numeric suffix "254000" is also unknown.
Use Cases
Fine-tuning mBART for post-OCR text correction in Vietnamese (inferred from domain, verify after download)
Training a language model to improve the accuracy of text recognized from scanned documents (inferred from domain, verify after download)
Benchmarking multilingual OCR adaptation techniques (inferred from domain, verify after download)
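Post-OCR correction and OCR benchmarking are commonly scored with character error rate (CER): the edit distance between a system's output and the ground-truth text, divided by the length of the ground truth. The sketch below is not part of the dataset. It assumes the data can be paired into (OCR output, reference) strings, which must be verified after download, and the function names `levenshtein` and `cer` are illustrative. Both strings are NFC-normalized first, because Vietnamese diacritics can be stored either as precomposed characters or as base letters plus combining marks, and mixing the two encodings would inflate the error count.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(hypothesis: str, reference: str) -> float:
    """Character error rate of `hypothesis` against the ground-truth `reference`."""
    # NFC merges base letters and combining diacritics into single code points,
    # so a letter such as "ế" counts as one character however it was encoded.
    hyp = unicodedata.normalize("NFC", hypothesis)
    ref = unicodedata.normalize("NFC", reference)
    if not ref:
        raise ValueError("reference text must be non-empty")
    return levenshtein(hyp, ref) / len(ref)


# Hypothetical example: the OCR step dropped the Vietnamese diacritics.
print(cer("Tieng Viet", "Tiếng Việt"))  # 0.2 (2 substitutions / 10 characters)
```

Computing CER on raw OCR output and again on mBART-corrected output gives a simple before/after comparison. A maintained library such as jiwer could replace this hand-rolled version; it is written out here only to make the metric explicit.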
Strengths
Published on Kaggle, a major platform for data science resources.
Limitations
Metadata is minimal; actual content requires verification after download.
Row count, file formats, and column definitions are unknown, which limits suitability assessment.
License and authorship information are unavailable, which leaves usage rights unclear and makes the data hard to cite or reproduce.
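Because row count, file formats, and columns are unknown, a sensible first step after downloading is to inventory the files. The helper below is a minimal, hypothetical sketch (`summarize_download` is not part of any Kaggle tool). It assumes the download has been extracted to a local directory and that any tabular files are UTF-8 CSV, TSV, or JSON Lines; other formats such as Parquet, plain text, or archives are only counted by extension.

```python
import csv
import json
from collections import Counter
from pathlib import Path


def summarize_download(root: str) -> dict:
    """Count files by extension and report rows/columns for CSV, TSV and JSONL files."""
    root_path = Path(root)
    extensions = Counter()
    tables = {}
    for path in sorted(root_path.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        extensions[suffix or "<none>"] += 1
        rel = str(path.relative_to(root_path))
        if suffix in (".csv", ".tsv"):
            # Assumption: the first row is a header and the file is UTF-8.
            delimiter = "\t" if suffix == ".tsv" else ","
            with path.open(newline="", encoding="utf-8") as f:
                reader = csv.reader(f, delimiter=delimiter)
                header = next(reader, [])
                rows = sum(1 for _ in reader)
            tables[rel] = {"columns": header, "rows": rows}
        elif suffix == ".jsonl":
            # One JSON value per non-blank line; keys of object records are
            # collected as the column set.
            columns = set()
            rows = 0
            with path.open(encoding="utf-8") as f:
                for line in f:
                    if line.strip():
                        record = json.loads(line)
                        if isinstance(record, dict):
                            columns.update(record)
                        rows += 1
            tables[rel] = {"columns": sorted(columns), "rows": rows}
    return {"extensions": dict(extensions), "tables": tables}
```

Usage, with a hypothetical local path: `print(summarize_download("mbart-vi-ocr-adaptation-254000/"))`. The reported columns are what would confirm or rule out the inferred use cases, for example whether the files hold paired noisy/clean text suitable for post-OCR correction.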
Provenance
Source
Kaggle
The license is unknown; users must verify the terms on the Kaggle dataset page before any use, especially commercial use.