Bahraini Arabic Speech Corpus with 90,421 Utterance Clips

Name: Bahraini Arabic Speech Corpus with 90,421 Utterance Clips
Creator: Hishambarakat
Published: 2026-01-22T06:32:40
Keywords: Dialect Modeling, Speech Corpus, Audio, Natural Language Processing, Bahraini Arabic, Automatic Speech Recognition

by Hishambarakat, updated 6mo ago

Description

Bahraini Speech Dataset is a Bahraini Arabic speech corpus built from publicly available podcast and video content. It contains 90,421 single-speaker utterance clips with aligned transcriptions, created by Hishambarakat and last updated on January 23, 2026.

Use Cases

Train Automatic Speech Recognition (ASR) models on the clips and their aligned transcriptions.
Model dialectal variation in Arabic using the Bahraini speech content.
Support phonetic and linguistic analysis of the processed utterance clips.
Experiment with low-resource speech and language workflows on the corpus.
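
For the ASR use case, a model's output is usually scored against the aligned transcription with word error rate (WER). The block below is a minimal sketch of that metric in plain Python; it is not part of the dataset or its tooling, the function name word_error_rate is chosen here for illustration, and the example strings are placeholders rather than corpus content. It splits on whitespace only, so any text normalization appropriate for Arabic would need to be applied beforehand.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription is empty")

    # prev[j] is the edit distance between the reference prefix handled
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real transcriptions from the corpus.
score = word_error_rate("a b c d", "a x c")
```

Here one substitution and one deletion against a four-word reference give a score of 0.5. For Arabic, character error rate is often reported alongside WER; the same routine works on characters if list(text) replaces text.split().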

Strengths

Contains 90,421 individual speech clips, providing a substantial number of data points.
Clips are processed into single-speaker utterances with aligned transcriptions, suggesting structured data for ASR.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
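Row count is not reported in the platform metadata, which may limit suitability assessment; the 90,421-clip figure comes from the title and description and should be confirmed after download.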
Description metadata is limited; actual data quality requires manual inspection after download.
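
Because column-level documentation is absent, a first pass after download is typically to look at which fields exist, what types they hold, and how often they are empty. The sketch below assumes the rows have already been loaded as a list of Python dictionaries; the field names audio_path, text, and duration are hypothetical and are not taken from the dataset, whose actual schema is undocumented.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Report, per field, the value types seen and how many rows lack a value."""
    rows = list(rows)
    all_fields = set()
    for row in rows:
        all_fields.update(row)

    types = defaultdict(set)
    missing = defaultdict(int)
    for row in rows:
        for field in all_fields:
            value = row.get(field)
            if value is None:
                missing[field] += 1  # absent key or explicit None
            else:
                types[field].add(type(value).__name__)

    return {
        field: {"types": sorted(types[field]), "missing": missing[field]}
        for field in sorted(all_fields)
    }


# Hypothetical rows for illustration only; the real field names are unknown.
sample = [
    {"audio_path": "clip_0001.wav", "text": "placeholder", "duration": 3.2},
    {"audio_path": "clip_0002.wav", "text": None, "duration": 4},
]
summary = summarize_fields(sample)
```

A field that shows mixed types or a high missing count in such a summary is a reasonable place to start the manual inspection.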

Provenance

Source: Hishambarakat on Hugging Face.
Collection Method: Built from publicly available podcast and video content, processed into clips.
Time Range: Not specified.
Freshness: Last updated 2026-01-23 06:10:51; freshness should be verified.
Geography: Bahrain (inferred from dataset title and description).
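
As a small illustration of the freshness check, the age of the last update can be computed relative to a reference date. This sketch assumes the update timestamp reads 2026-01-23 06:10:51 and uses the page's "Last synced" date of Jun 19, 2026 as the reference; the function name days_since_update is introduced here for illustration.

```python
from datetime import date, datetime


def days_since_update(updated: str, as_of: date) -> int:
    """Whole calendar days between the update timestamp and a reference date."""
    updated_date = datetime.strptime(updated, "%Y-%m-%d %H:%M:%S").date()
    return (as_of - updated_date).days


age = days_since_update("2026-01-23 06:10:51", date(2026, 6, 19))
```

With these inputs the gap is 147 days.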

Related Topics

Audio, Dialect Modeling, Speech Corpus, Natural Language Processing, Bahraini Arabic, Automatic Speech Recognition

Quality Score

Overall: D38
Description: 42
Source: 36
Reputation: 42
Access: 26

Community

Downloads: 79
Likes: 1
Views: 0

Dataset Info

Author: Hishambarakat
Created: Jan 22, 2026
Updated: Jan 23, 2026
Last synced: Jun 19, 2026
