Myanmar Word Glyphs (MWG) is a dataset containing 49.8 thousand synthetic grayscale word and phrase images. The dataset is intended for optical character recognition and text recognition tasks for the Burmese language. The author, organization, and last update date are unknown.
Use Cases
- Train optical character recognition models based on synthetic grayscale word images.
- Benchmark text recognition algorithms for the Burmese language.
- Augment training data for document analysis pipelines using synthetic phrases.
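For the benchmarking use case, a common metric is character error rate (CER): the edit distance between predicted and reference text, divided by the reference length. The sketch below is illustrative only and is not part of the dataset. The function names `edit_distance` and `character_error_rate` are hypothetical, and it assumes that labels and predictions are Unicode strings compared code point by code point, without any normalization of Burmese combining marks.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings, counted in Unicode code points."""
    # previous[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(pairs):
    """Corpus-level CER over (reference, hypothesis) string pairs."""
    total_edits = 0
    total_chars = 0
    for reference, hypothesis in pairs:
        total_edits += edit_distance(reference, hypothesis)
        total_chars += len(reference)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    return total_edits / total_chars
```

Because Burmese script relies heavily on combining marks, a code-point CER can differ from one computed over grapheme clusters; which unit to use is a choice to state alongside any reported benchmark.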
Strengths
- Contains 49.8 thousand synthetic images, providing a substantial volume for model training.
- Focuses on a specific language (Burmese), addressing a potential gap in OCR resources.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- The exact row count is not documented (only the approximate figure of 49.8 thousand images is given), which may limit suitability assessment.
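Since the file layout and field semantics are undocumented, a first inspection pass after download can at least count the images and list the column names of any label files. The following is a minimal sketch under stated assumptions: the function name `summarize_dataset`, the list of image extensions, and the presence of CSV label files are all hypothetical, because the dataset card does not describe the archive's contents.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumed image extensions; the card does not state the actual file format.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def summarize_dataset(root):
    """Count files by extension and read the header row of every CSV under root."""
    root = Path(root)
    suffix_counts = Counter()
    csv_headers = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        suffix_counts[suffix] += 1
        if suffix == ".csv":
            # utf-8-sig reads plain UTF-8 and also strips a leading BOM if present.
            with path.open(newline="", encoding="utf-8-sig") as handle:
                header = next(csv.reader(handle), [])
            csv_headers[str(path.relative_to(root))] = header
    image_count = sum(
        count for suffix, count in suffix_counts.items() if suffix in IMAGE_SUFFIXES
    )
    return {
        "image_count": image_count,
        "suffix_counts": dict(suffix_counts),
        "csv_headers": csv_headers,
    }

# Example call on an extracted copy (the path is a placeholder):
# print(summarize_dataset("path/to/extracted/dataset"))
```

Comparing the reported `image_count` with the stated 49.8 thousand gives a quick consistency check, and the CSV headers (if any exist) are a starting point for inferring field semantics.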
Provenance
- Source: Kaggle
- Collection Method: Synthetically generated
- Geography: Myanmar (Burmese language)