Myanmar Written Corpus for NLP Tasks

Name: Myanmar Written Corpus for NLP Tasks
Creator: freococo
Published: 2024-12-28T05:21:27
Keywords: Myanmar Language, Text, Audio, Large Scale, Natural Language Processing, Multilingual Nlp, Text Corpus


Description

A Myanmar-language text corpus designed to address the lack of large-scale, openly accessible resources for Myanmar natural language processing. The dataset is tailored to support tasks such as text-to-speech and automatic speech recognition. It was created by freococo and last updated on May 22, 2025.

Use Cases

Text-to-speech synthesis based on high-quality written Myanmar text.
Automatic speech recognition model training based on written text resources.
Machine translation model development for the Myanmar language.
Text generation tasks using the collected written corpus.

Strengths

Designed as a critical resource for a language lacking large-scale, openly accessible NLP resources.
Tailored to support speech tasks such as text-to-speech and automatic speech recognition, as noted in the description.
Last updated on May 22, 2025, indicating recent maintenance.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
The source description notes that the text is "not fully CLEAN", so it will likely need preprocessing before use.
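Because the text is not fully clean and its columns are undocumented, some light preprocessing is likely needed. The following is a minimal sketch, assuming the corpus has already been read into Python as plain Unicode text lines (loading is left out because the field names are unknown). The function names, the 0.5 Myanmar-character threshold, and the choice to split sentences on the Myanmar section mark "။" (U+104B) are illustrative assumptions, not part of the dataset.

```python
import re
import unicodedata

# Hypothetical helpers; not part of the dataset or any library.
# Zero-width space and BOM often survive web scraping.
ZERO_WIDTH = re.compile(r"[\u200B\uFEFF]")
# Myanmar section mark "။" (U+104B), used here as the sentence terminator.
SENTENCE_END = "\u104B"


def clean_line(text: str) -> str:
    """NFC-normalize, drop zero-width characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = ZERO_WIDTH.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


def myanmar_ratio(text: str) -> float:
    """Fraction of non-space characters in the Myanmar block (U+1000-U+109F)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if "\u1000" <= c <= "\u109F") / len(chars)


def split_sentences(text: str) -> list[str]:
    """Split on the section mark, keeping it attached to each sentence."""
    pieces = re.findall(r"[^\u104B]+\u104B?", text)
    # Drop pieces that are only spaces or a stray section mark.
    return [p.strip() for p in pieces if p.strip(" " + SENTENCE_END)]


def preprocess(lines: list[str], min_ratio: float = 0.5) -> list[str]:
    """Clean each line, skip mostly non-Myanmar lines, return sentences."""
    sentences: list[str] = []
    for raw in lines:
        line = clean_line(raw)
        # min_ratio is an assumed threshold; tune it after inspecting the data.
        if line and myanmar_ratio(line) >= min_ratio:
            sentences.extend(split_sentences(line))
    return sentences
```

Splitting on "။" yields sentence-sized units, which tend to suit text-to-speech prompts better than raw paragraphs; the script-ratio filter is a cheap way to drop lines that are mostly Latin text or markup.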

Provenance

Source: huggingface
Freshness: Last updated 2025-05-22 07:00:09
Geography: Myanmar (implied by language focus)


Quality Score

Overall: D38
Description: 42
Source: 36
Reputation: 43
Access: 26

Community

39 downloads, 3 likes, 0 views

Dataset Info

Author: freococo
Created: Dec 28, 2024
Updated: May 22, 2025
Last synced: Apr 27, 2026
