DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Pittsburgh: Urban and Geographic Data

Pittsburgh is a dataset published on Kaggle. The specific content, size, and features are not described in the provided metadata. The actual data requires download and inspection to determine its scope and utility.

Tabular, Geospatial, Pittsburgh, City, Urban Data (+1 more)

Spotify Data for Strategic Music Insights

Spotify data aggregated for analysis, likely containing metrics related to music tracks, artists, and listener engagement. The dataset appears to be sourced from the Kaggle platform, but specific details on volume, author, and update frequency are not provided. Its purpose is to turn raw streaming data into actionable insights for the music industry.

Tabular, Audio, Music Industry, Spotify (+1 more)

Anyplace But Here: African American Migration Narratives from 1945 to 1966

Anyplace But Here is a historical text originally published in 1945 and revised in 1966. The work details the African American search for a home in the North through stories of real individuals, covering themes of hope and disappointment. It includes chapters on figures like Marcus Garvey, Malcolm X, and events in Detroit, Chicago, and Watts.

Text, Narrative, History, Migration, Social History, Computer Science, Disappointment, Psychology, World Wide Web, Theme Computing, African American, Social Psychology (+1 more)

The Hired Money: French Debt to the United States, 1917-1929

Covers the financial debt France owed to the United States from 1917 to 1929. The dataset is sourced from Papers with Code and is described in the context of a historical narrative about American expatriates in Paris. The license is closed, and other metadata such as author and update date are unknown.

Text, History, Painting, Art History, Law, Economics, Chose, Art, Economic History, Political Science, Negotiation, Debt, Politics (+1 more)

Librispeech Synth 300h: Synthetic Speech Audio from LibriSpeech

Librispeech Synth 300h is a speech synthesis dataset derived from the LibriSpeech corpus. It likely contains up to 300 hours of synthetic audio generated from a maximum of 20 speaker voices. The dataset is hosted on Kaggle.

Audio, Speech Synthesis, Audio Processing, Speech Recognition (+1 more)

XTTS Real Audio Dataset

XTTS Real Audio Dataset is a collection of audio data published on Kaggle. The dataset likely contains audio samples intended for training or evaluating text-to-speech models. Its specific contents, size, and collection methodology require verification after download.

Audio, Text To Speech, Machine Learning, Speech Generation, Audio Synthesis (+1 more)

XTTS Vietnamese Dataset for Speech Synthesis

xtts-vietnamese-dataset is a dataset hosted on Kaggle. Its title suggests it contains data for training or fine-tuning text-to-speech models for the Vietnamese language. The dataset's author, organization, size, and specific contents are not detailed in the provided metadata.

Text, Audio, Text To Speech, Speech Synthesis, Vietnamese Language (+1 more)

Latin Music Playlists Featuring Reggaeton, Salsa, Bachata, and Merengue

A collection of music tracks from Latin genres including Reggaeton, Salsa, Bachata, and Merengue. The dataset is hosted on Kaggle, but details about its author, size, and creation date are not provided. Its contents likely include track identifiers and metadata for playlist analysis.

Tabular, Audio, Latin Music, Music Genres, Audio Tracks, Playlists (+1 more)

TTS_Fluer_LJspeech_Dataset: Text-to-Speech Audio Samples

TTS_Fluer_LJspeech_Dataset is a Kaggle-hosted collection likely intended for speech synthesis research. Its title suggests it may combine or relate to LJ Speech, a widely used text-to-speech corpus, and a second source named "Fluer". Its specific content, size, and structure require verification after download.

Audio, Text To Speech, Audio Dataset, Speech Synthesis (+1 more)

Speech-to-Text Transcription Dataset with Acoustic Features

A multimodal dataset for speech recognition tasks. The description suggests it contains acoustic features relevant to speech-to-text transcription. Its origin, size, and temporal coverage are unknown.

Audio, Multimodal, Acoustic Features, Speech Recognition (+1 more)

ieugwasr: Interface to the OpenGWAS Database API

An API wrapper and interface for the OpenGWAS database, which likely contains genome-wide association study results. The interface was created by Gibran Hemani and provides convenience functions for specific queries.

Tabular, Api Interface, Interface Matter, Operating System, Computer Science, Database, Bioinformatics, Genome-wide association study (+1 more)

TTS Dataset: Text-to-Speech Audio Samples

A dataset likely containing audio samples and corresponding text transcripts for text-to-speech tasks. It is hosted on Kaggle, but its specific size, origin, and creation date are unknown. The author and organization details are not provided.

Audio, Text To Speech, Speech Synthesis, Audio Generation (+1 more)

Nepali ASR Benchmark: 100 Hours of High-Fidelity Speech

Approximately 13,500 audio segments totaling around 100 hours of Nepali speech, professionally curated for Automatic Speech Recognition research. The dataset, created by 'tonibirat', contains high-fidelity 16 kHz, 16-bit mono WAV files, with segments typically 15-20 seconds long. It was last updated on Hugging Face in January 2026.

Audio, Nepali Language, Benchmark, Large Scale, Speech Recognition, Audio Benchmark (+1 more)
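
For planning a download of the Nepali ASR Benchmark, the quoted figures (about 100 hours, about 13,500 segments, 16 kHz 16-bit mono) can be turned into a rough storage estimate. The sketch below is only a back-of-envelope calculation: the helper names are hypothetical, it assumes uncompressed PCM, and it ignores WAV headers and any archive compression, so the real download size may differ.

```python
# Rough storage and segment-length estimate from the listed figures.
# Assumption: uncompressed PCM audio; WAV headers and packaging are ignored.

SAMPLE_RATE_HZ = 16_000   # 16 kHz, as listed
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
TOTAL_HOURS = 100         # approximate total duration, as listed
NUM_SEGMENTS = 13_500     # approximate segment count, as listed


def pcm_bytes(hours, sample_rate=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Raw PCM payload size in bytes for the given duration."""
    return int(hours * 3600 * sample_rate * bytes_per_sample * channels)


def mean_segment_seconds(hours, segments):
    """Average segment length implied by total duration and segment count."""
    return hours * 3600 / segments


if __name__ == "__main__":
    total = pcm_bytes(TOTAL_HOURS)
    print(f"Estimated raw audio size: {total / 1e9:.2f} GB")
    print(f"Implied mean segment length: "
          f"{mean_segment_seconds(TOTAL_HOURS, NUM_SEGMENTS):.1f} s")
```

Under these assumptions the raw audio comes to roughly 11.5 GB. The implied mean segment length is about 26.7 seconds, which is above the 15-20 second range quoted, so the listed counts are best treated as approximate until checked against the downloaded files.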

TTS-LJSPEECH-ANIMAN: Likely Text-to-Speech Audio Data

TTS-LJSPEECH-ANIMAN is a dataset hosted on Kaggle. Its title suggests a connection to text-to-speech synthesis, potentially using or extending the LJ Speech corpus. The dataset's specific content, size, and origin are not detailed in the available metadata.

Audio, Text To Speech, Speech Synthesis, Audio Generation (+1 more)

Women Safety Distress Audio Dataset

An audio dataset focused on women's safety and distress signals, published on Kaggle. The dataset's specific content, such as the number of clips or recording conditions, is not detailed in the available metadata. Its primary purpose is likely for developing or testing audio-based safety and alert systems.

Audio, Safety, Women Safety, Distress Detection (+1 more)

Indic Total TTS Merge: 13 Languages with 3.0s Minimum Duration

RidheshBhati's collection merges text-to-speech data for 13 Indic languages, totaling between 100,000 and 1,000,000 records as of March 2026. The set is filtered so that every audio clip lasts at least 3.0 seconds.

Optimized-Parquet, Parquet, Library: polars, Library: dask, Modality: audio, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Library: datasets, Region: us (+1 more)

Music Genres Dataset

A dataset concerning music genres, likely containing labels or features for audio classification tasks. It was published on Kaggle, but its specific contents, size, and creation details are not provided in the metadata. The last update date and author are unknown.

Tabular, Music Information Retrieval, Audio Classification, Music Genres (+1 more)

Music Emotion Dataset: Multi-Genre Emotional Audio Collection

A collection of audio files tagged with emotional labels across multiple music genres. The dataset is hosted on Kaggle, but its size, specific creation date, and original author are not detailed in the provided metadata. Columns and exact data formats are unknown.

Audio, Affective Computing, Music Emotion (+1 more)

Cantonese Storytelling Audio Collection by Zhang Yuekai

This Cantonese audio dataset features storyteller Zhang Yuekai narrating four classic literary works, including 'Romance of the Three Kingdoms' and 'Water Margin'. It is designed for TTS and ASR model training, as well as linguistic and literary research. The dataset contains audio files and corresponding standardized text transcripts.

Text, Audio, Cantonese, Literature, Speech Synthesis, Storytelling, Speech Recognition (+1 more)

Sound Velocity Profiles from North Atlantic Ocean Surveys 2024-2025

Sound velocity profiles were collected in Northern Massachusetts Bay during hydrographic surveys from August 2024 to March 2025. Data were gathered from multiple vessels including MV Northstar Challenger, RV North Cove, RV South Cove, RV Twister, and RV West Cove II. Profiles were recorded at intervals of approximately 2 hours for sound speed profilers and 15 minutes for moving vessel profilers.

North Atlantic Ocean, Oceanography, RV West Cove II, Opr A325 Kr 24, Northern Massachusetts Bay, 84bbn3, RV North Cove, MV Northstar Challenger, RV Twister, RV South Cove (+1 more)

Page 66 of 130