Speech & Audio

Somali ASR Dataset: Audio Recordings and Transcriptions for Speech Recognition

Name: Somali ASR Dataset: Audio Recordings and Transcriptions for Speech Recognition
Creator: skydheere
Published: 2025-01-15T14:18:35
Keywords: Multimodal Data, Audio, Somali Language, Speech Recognition, Multimodal


Available on 1 platform

Description

Contains between 10,000 and 100,000 Somali audio samples with transcriptions, intended for automatic speech recognition (ASR) tasks. The dataset is hosted on Hugging Face by the author 'skydheere' and was last updated on 2025-05-09. It is distributed in Parquet format under a CC-BY 4.0 license.

Use Cases

Train automatic speech recognition models on the paired audio recordings and transcriptions.
Evaluate the performance of ASR systems on Somali speech using the provided audio-text pairs.
Fine-tune pre-trained multilingual speech models for the Somali language.
Develop speech technology applications for Somali speakers.
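
For the evaluation use case, ASR output is commonly scored with word error rate (WER): the word-level edit distance between the reference transcription and the model's hypothesis, divided by the number of reference words. The following is a minimal sketch in plain Python, not tooling shipped with the dataset. The function names and the lowercase-and-split-on-whitespace normalization are assumptions for illustration, and the sample strings are placeholders rather than rows from the corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def word_error_rate(pairs):
    """Corpus-level WER over (reference, hypothesis) string pairs.

    Assumed normalization: lowercase, split on whitespace. A real
    evaluation may also need punctuation and orthography handling.
    """
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref_tokens = reference.lower().split()
        hyp_tokens = hypothesis.lower().split()
        errors += edit_distance(ref_tokens, hyp_tokens)
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("no reference words to score against")
    return errors / ref_words


# Placeholder tokens: one substitution (b -> x) and one deletion (d)
# against four reference words.
print(word_error_rate([("a b c d", "a x c")]))  # 0.5
```

Summing errors and reference words across the whole set before dividing weights long utterances more heavily than averaging per-utterance WER; which of the two is appropriate depends on how results will be compared.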

Strengths

Contains between 10,000 and 100,000 samples, providing a substantial corpus for model training.
Includes both audio and text modalities, which is essential for supervised ASR tasks.
Released under the CC-BY 4.0 license, which permits reuse and redistribution provided attribution is given.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Exact row count is unknown (only the 10K - 100K range is reported), which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
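
Because the columns are undocumented, a first step after download is usually to look at one row and guess which field carries audio and which carries the transcription. The sketch below is a heuristic under stated assumptions, not a description of this dataset's actual schema: the column names `audio` and `sentence` and the sample values are hypothetical, and it assumes audio appears either as raw bytes or as a mapping with a `bytes` or `array` entry, as is common for audio exported to Parquet.

```python
def guess_asr_fields(row):
    """Guess which keys in one sample row hold audio and which hold text.

    Heuristic only: raw bytes, or a mapping with a "bytes" or "array"
    entry, is treated as audio; any non-empty string is treated as
    candidate text. The result should be confirmed by manual inspection.
    """
    audio_keys, text_keys = [], []
    for key, value in row.items():
        if isinstance(value, (bytes, bytearray)):
            audio_keys.append(key)
        elif isinstance(value, dict) and ("bytes" in value or "array" in value):
            audio_keys.append(key)
        elif isinstance(value, str) and value.strip():
            text_keys.append(key)
    return {"audio": audio_keys, "text": text_keys}


# Hypothetical row; the real column names must be read from the
# downloaded files.
sample_row = {
    "audio": {"bytes": b"RIFF", "path": "clip_0001.wav"},
    "sentence": "placeholder transcription",
    "duration": 3.2,
}
print(guess_asr_fields(sample_row))
```

If more than one text candidate turns up (for example a file path stored as a plain string next to the transcription), comparing several rows by hand is still needed to tell them apart.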

Provenance

Source: Hugging Face
Freshness: Last updated 2025-05-09 13:26:42; freshness should be verified.

Quality Score

D39

Community

119 downloads

6 likes

0 views

Dataset Info

Author: skydheere
Created: Jan 15, 2025
Updated: May 9, 2025
Last synced: May 8, 2026
