Bahnar Speech Translation Dataset with Vietnamese and English Text | DataSalon

Name: Bahnar Speech Translation Dataset with Vietnamese and English Text
Creator: cuong06
Published: 2026-05-23T07:33:20
Keywords: Multilingual, Audio, Speech Translation, Audio Text Alignment, Low Resource Language, Multimodal

Available on 1 platform

Description

This Bahnar speech translation dataset contains audio aligned with Bahnar, Vietnamese, and English text. It was created from internet data sources and automatically aligned using the Bahnar-Vietnamese-S2TT pipeline. The training split includes 113,830 utterances.

Use Cases

Train direct speech-to-text translation models from Bahnar to Vietnamese using the aligned audio and text.
Develop multilingual speech recognition systems using the aligned Bahnar, Vietnamese, and English text.
Research low-resource language processing techniques, with Bahnar as the target language.
Benchmark automatic alignment pipelines for speech data, using this dataset's automatically aligned output as a reference point.

Strengths

Training split contains 113,830 utterances, providing a substantial base for model training.
Provides aligned text in three languages (Bahnar, Vietnamese, English), enabling multi-task learning.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
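
Since field semantics have to be inferred after download, a quick per-field summary can help. The following is a minimal sketch, assuming the downloaded rows can be read as a list of Python dicts. The field names in `sample` (`audio_path`, `bahnar`, `vietnamese`, `english`, `duration_s`) and the placeholder values are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def summarize_fields(records, max_example_len=60):
    """Report value types, missing counts, and one example per field."""
    records = list(records)
    summary = {}
    for rec in records:
        for name, value in rec.items():
            info = summary.setdefault(
                name, {"types": Counter(), "filled": 0, "example": None}
            )
            # Treat None and empty/whitespace-only strings as missing.
            is_blank = value is None or (
                isinstance(value, str) and not value.strip()
            )
            if is_blank:
                continue
            info["filled"] += 1
            info["types"][type(value).__name__] += 1
            if info["example"] is None:
                info["example"] = repr(value)[:max_example_len]
    # A field absent from a record also counts as missing for that record.
    for info in summary.values():
        info["missing"] = len(records) - info["filled"]
    return summary


# Hypothetical rows: the real column names are undocumented.
sample = [
    {"audio_path": "clip_0001.wav", "bahnar": "<bahnar text>",
     "vietnamese": "<vietnamese text>", "english": "<english text>",
     "duration_s": 3.2},
    {"audio_path": "clip_0002.wav", "bahnar": "<bahnar text>",
     "vietnamese": "", "english": None, "duration_s": 4.0},
]

summary = summarize_fields(sample)
for name, info in summary.items():
    print(name, dict(info["types"]), "missing:", info["missing"], info["example"])
```

Running this on a few hundred rows of the real download should show which columns hold text in which language and how often a translation is empty, which is a starting point for the manual inspection noted above.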

Provenance

Source: cuong06
Collection Method: Created from internet data sources and automatically aligned using the Bahnar-Vietnamese-S2TT pipeline.
Freshness: Last updated 2026-05-28 08:22:32; freshness should be verified.
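
One rough way to check freshness is to compare the last-update timestamp with the date this listing was last synced. The sketch below uses the two dates shown on this page. The 30-day threshold (`MAX_LAG_DAYS`) is an arbitrary assumption, and both timestamps are treated as being in the same time zone, which the listing does not state.

```python
from datetime import datetime

# Hypothetical threshold; pick whatever suits your project.
MAX_LAG_DAYS = 30


def lag_in_days(updated, reference):
    """Return the time from `updated` to `reference` in fractional days."""
    return (reference - updated).total_seconds() / 86400


# Dates as shown in this listing (time zone unknown, assumed identical).
last_updated = datetime.strptime("2026-05-28 08:22:32", "%Y-%m-%d %H:%M:%S")
last_synced = datetime(2026, 6, 6)

lag_days = lag_in_days(last_updated, last_synced)
is_stale = lag_days > MAX_LAG_DAYS
print(f"lag: {lag_days:.1f} days, stale: {is_stale}")
```

This only measures how far the catalog sync trails the recorded update. Confirming that no newer revision exists still requires checking the hosting platform directly.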

Quality Score

D39

Community

72 downloads

1 like

0 views

Dataset Info

Author: cuong06
Created: May 23, 2026
Updated: May 28, 2026
Last synced: Jun 6, 2026
