DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

BanSpeech: Bangladeshi Bangla Broadcast Speech Benchmark for ASR Evaluation | DataSalon

Speech & Audio

Name: BanSpeech: Bangladeshi Bangla Broadcast Speech Benchmark for ASR Evaluation
Creator: SUST-CSE-Speech
Published: 2024-03-03T04:05:51
Keywords: Broadcast, Benchmark, Multi Domain, Audio, Bangla, Speech Recognition

by SUST-CSE-Speech · Updated 2y ago

Available on 1 platform

Description

A benchmark containing approximately 6.52 hours of human-annotated broadcast speech, totaling 8085 utterances, across 13 distinct domains. It is designed for automatic speech recognition performance evaluation in challenging conditions. The dataset was created by SUST-CSE-Speech and last updated on March 9, 2024.

Use Cases

Benchmarking ASR model performance on multi-domain broadcast speech.
Evaluating ASR robustness to spontaneous speech.
Testing ASR systems under domain shift across the 13 distinct domains.
Evaluating ASR performance on multi-talker speech.
Testing ASR systems on code-switched speech.
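
The benchmarking use cases above usually come down to computing word error rate (WER) per domain. The following is a minimal sketch, not the dataset authors' evaluation script. It assumes whitespace tokenization and a hypothetical record layout of (domain, reference transcript, model hypothesis); the actual field names must be checked after download.

```python
from collections import defaultdict


def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def per_domain_wer(records):
    """Aggregate WER per domain.

    records: iterable of (domain, reference, hypothesis) tuples.
    This layout is an assumption for illustration only.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for domain, ref, hyp in records:
        ref_words = ref.split()  # assumed whitespace tokenization
        errors[domain] += edit_distance(ref_words, hyp.split())
        words[domain] += len(ref_words)
    # Skip domains with no reference words to avoid division by zero.
    return {d: errors[d] / words[d] for d in words if words[d] > 0}
```

Summing errors and reference words per domain before dividing weights long utterances more heavily than averaging per-utterance WER would, which is the more common convention for corpus-level reporting.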

Strengths

Contains 8085 human-annotated utterances.
Includes approximately 6.52 hours of annotated audio for evaluation.
Covers 13 distinct domains, which likely provides variety in speech content.
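
The figures above imply a mean utterance length of roughly 2.9 seconds. The short sketch below shows the arithmetic, using only the numbers stated in this listing. The per-domain figure assumes an even split across domains, which the listing does not state.

```python
TOTAL_HOURS = 6.52   # approximate total duration from the listing
UTTERANCES = 8085    # utterance count from the listing
DOMAINS = 13         # number of domains from the listing

# Mean utterance length in seconds.
avg_seconds = TOTAL_HOURS * 3600 / UTTERANCES

# Mean utterances per domain, assuming an even split (an assumption,
# not a documented property of the dataset).
avg_per_domain = UTTERANCES / DOMAINS

print(f"mean utterance length: {avg_seconds:.2f} s")
print(f"mean utterances per domain (if evenly split): {avg_per_domain:.0f}")
```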

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported in the listing metadata, although the description cites 8085 utterances; verify the count against the downloaded files.
Last updated 2024-03-09 20:24:47; freshness should be verified.

Provenance

Source: SUST-CSE-Speech
Collection Method: Human-annotated broadcast speech.
Freshness: 2024-03-09
Geography: Bangladesh

Quality Score

D39

Community

Downloads: 290
Likes: 6
Views: 0

Dataset Info

Author: SUST-CSE-Speech
Created: Mar 3, 2024
Updated: Mar 9, 2024
Last synced: May 28, 2026
