Name: Darija Stt Mix
Creator: ayoubkirouane
Published: 2024-07-16T07:48:27
Keywords: Arabic Speech, Audio, Audio Transcription, Darija Dialect, Speech Recognition

Description

The Darija Speech To Text Dataset is a collection of 13,178 rows of transcribed speech audio totaling 8.23 GB, created by ayoubkirouane. It was last updated on 2024-07-18 and focuses on the Darija dialect, primarily from Algeria and Morocco, with slang from other Arabic-speaking countries.

Use Cases

Train automatic speech recognition (ASR) models on the paired audio and transcriptions.
Fine-tune pre-trained speech models to recognize Darija speech.
Benchmark ASR systems on colloquial Arabic speech, using the 13,178 transcribed samples as evaluation material.
Study dialectal variation and slang across Arabic-speaking regions, since the data mixes Algerian and Moroccan Darija with slang from other countries.
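
For the benchmarking use case, ASR output is commonly scored with word error rate (WER): the word-level edit distance between a reference transcript and a model hypothesis, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration. The function name is hypothetical, the example strings are placeholders rather than rows from this dataset, and no text normalization is applied (which may matter for Darija transcripts).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word missing from hypothesis
                curr[j - 1] + 1,                  # extra word in hypothesis
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings, not taken from the dataset.
score = word_error_rate("a b c d", "a x c")
```

Because WER is normalized by the reference length, it can exceed 1.0 when the hypothesis contains many extra words.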

Strengths

Contains 13,178 transcribed audio samples, providing a substantial base for model training.
Audio data totals 8.23 GB, indicating a significant volume of speech material.
Focuses on specific Darija dialects (Algerian, Moroccan) and includes slang, offering targeted linguistic variety.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; actual data quality requires manual inspection after download.
The dataset's specific collection methodology and potential biases are not detailed in the provided description.
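
Because the columns are undocumented, a first step after download is to look at one row and guess what each field holds. The sketch below is one possible heuristic, not a description of the dataset's actual schema. The function name, the sample row, and its column names are all hypothetical. The audio check assumes decoded audio appears as a dict with `array` and `sampling_rate` keys, as some versions of the Hugging Face `datasets` library return it; other versions or storage formats may differ.

```python
def guess_column_roles(row: dict) -> dict:
    """Label each field of a single row as audio-like, text-like, or other."""
    roles = {}
    for name, value in row.items():
        if isinstance(value, dict) and {"array", "sampling_rate"} <= value.keys():
            # Assumed layout of a decoded audio field.
            roles[name] = "audio-like"
        elif isinstance(value, (bytes, bytearray)):
            # Raw bytes are most likely an undecoded audio file.
            roles[name] = "audio-like"
        elif isinstance(value, str):
            # Could be a transcript, but also a file path; inspect manually.
            roles[name] = "text-like"
        else:
            roles[name] = "other"
    return roles


# Hypothetical row; real column names must be read from the downloaded data.
sample_row = {
    "audio": {"path": None, "array": [0.0, 0.1], "sampling_rate": 16000},
    "transcription": "salam",
    "id": 3,
}
roles = guess_column_roles(sample_row)
```

The labels are only a starting point; a field flagged as text-like still needs a manual check to confirm that it is a transcript rather than a path or an identifier.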

Provenance

Source: ayoubkirouane on Hugging Face
Collection Method: The description says only that the data was meticulously gathered from diverse resources; no sources are named.
Freshness: Last updated 2024-07-18 15:04:07; freshness should be verified.
Geography: Primarily Algeria and Morocco, with slang from other Arabic-speaking countries.

License is unknown; users should verify terms of use before downloading.
