Description

This dataset aggregates 227 hours of Spanish speech recorded on mobile phones by native speakers from Spain, Mexico, and Venezuela. The recordings, made in quiet environments, cover fields such as economy, entertainment, and news, and all texts are manually transcribed to 95% sentence accuracy.

Use Cases

Train an automatic speech recognition (ASR) model on manually transcribed Spanish audio from Spain, Mexico, and Venezuela.
Analyze acoustic or phonetic variations in Spanish speech data recorded via mobile phone across different regional accents.
Build a language model for Spanish using the transcribed text covering fields like economy, entertainment, and news.
Benchmark ASR systems on mobile-recorded audio, using the manual transcriptions (reported 95% sentence accuracy) as references.
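
For the benchmarking use case, a common metric is word error rate (WER): the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch in plain Python; the function name and the example sentences are illustrative and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
print(word_error_rate("el mercado sube hoy", "el mercado baja"))  # 0.5
```

Real evaluations usually normalize case and punctuation before scoring; that step is omitted here.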

Strengths

227 hours of audio data provides substantial material for speech model training.
Manually transcribed text with a reported 95% sentence accuracy provides high-quality, though not error-free, labels.
Recordings from native speakers in Spain, Mexico, and Venezuela offer regional dialect diversity.
Recordings were made in quiet environments, which can improve audio clarity for model training.
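
The reported 95% sentence accuracy can be turned into a rough estimate of how many transcripts contain at least one error. The sketch below assumes that "sentence accuracy" means the fraction of sentences transcribed without any error, which the source does not define; the sentence count in the example is hypothetical, since the source does not give the number of sentences.

```python
def expected_noisy_sentences(n_sentences: int, sentence_accuracy: float = 0.95) -> float:
    """Expected number of sentences whose transcript contains an error.

    Assumes sentence_accuracy is the share of fully correct transcripts.
    """
    if not 0.0 <= sentence_accuracy <= 1.0:
        raise ValueError("sentence_accuracy must be between 0 and 1")
    return n_sentences * (1.0 - sentence_accuracy)


# Hypothetical corpus size, for illustration only.
print(round(expected_noisy_sentences(10_000)))  # 500
```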

Limitations

The dataset is described as a sample, suggesting it may not be the complete collection.
The specific audio file formats, sampling rates, and speaker demographics are unknown.
The 95% sentence accuracy for transcriptions indicates some level of label noise or error.
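
Because the sampling rates and the amount of audio actually included in the sample are not documented, it can help to inspect the files directly. The sketch below uses only the Python standard library and assumes the recordings are uncompressed WAV files, which the source does not confirm; the function name and directory layout are hypothetical.

```python
import wave
from pathlib import Path


def summarize_wav_dir(root):
    """Return (total_hours, set_of_sampling_rates) for all .wav files under root.

    Assumption: files are PCM WAV. Other formats would need a different reader.
    """
    total_seconds = 0.0
    rates = set()
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            rate = wf.getframerate()
            total_seconds += wf.getnframes() / rate
            rates.add(rate)
    return total_seconds / 3600.0, rates
```

Calling `summarize_wav_dir("path/to/sample")` on a local copy gives the total duration to compare against the stated 227 hours, plus the set of sampling rates found.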

Provenance

Source: Nexdata
Collection Method: Recorded by Spanish native speakers reading texts via mobile phone in quiet environments.
Freshness: The dataset was last updated on 2025-04-24.
Geography: Spain, Mexico, Venezuela

This is described as a sample of a paid dataset; access to the full dataset may require purchase. The specific license terms are unknown.

Format: audiofolder. Modalities: audio, text. Size category: n<1K. Libraries: datasets, mlcroissant. Region: US.

Spanish Speech Recordings from Mobile Phones Across Three Countries
