DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

F5-TTS: Vietnamese Speech Synthesis Checkpoints

A set of pre-trained model checkpoints for a Vietnamese text-to-speech system named F5-TTS. The dataset likely contains model weights and configuration files necessary for generating synthetic Vietnamese speech. It is hosted on Kaggle under the 'Pre Trained Model' tag.

Audio, Text To Speech, Pre Trained Model, Speech Synthesis, Vietnamese Language, +1

Music Origin Prediction from Audio Features

GeographicalOriginalofMusic is a dataset for predicting the geographical origin of music from audio features. It is hosted on the OpenML platform; its size, creator, and creation date are not provided in the available metadata. The dataset's primary purpose is to link musical characteristics to specific geographic locations.

Tabular, Audio Features, Music Origin, Cultural Musicology, Geographic Analysis, +1

30 Music Instrument Audio Samples

Audio samples from 30 different musical instruments, published on Kaggle. The dataset's specific size, recording conditions, and origin are not detailed in the available metadata. Further details about the collection methodology and audio characteristics require verification after download.

Audio, Music Instruments, Audio Classification, Sound Samples, +1

Uzbek Speech-to-Text Evaluation Set with 745 Telegram Voice Messages

745 audio files totaling 1 hour and 40 minutes of Uzbek conversational speech, collected from open Telegram groups. The dataset was created by OvozifyLabs for evaluating speech-to-text models and was last updated on December 10, 2025. It features natural voice messages recorded in diverse acoustic conditions and speaking styles.

Audio, Uzbek Language, Speech To Text, Benchmark, Conversational Speech, Audio Evaluation, +1

Brazilian Diaspora Demographic and Economic Profile in the U.S. and Massachusetts

A demographic and economic profile of Brazilians in the United States and Massachusetts. The dataset likely contains aggregated statistics on population characteristics and economic indicators. It was authored by Alvaro Lima and sourced from the paperswithcode platform.

Tabular, Medicine, US Demographics, Gerontology, Geography, Finance, Demography, Sociology, Brazilian Diaspora, +1

Compiam: Computational Analysis of Indian Art Music Traditions

Compiam provides data and tools for the computational analysis of Indian Art Music (IAM), developed by the Music Technology Group (MTG). Updated in February 2026, the resource focuses on Music Information Retrieval (MIR) tasks specifically tailored for Hindustani and Carnatic musical traditions.

Computational Musicology, Music Information Retrieval, Indian Art Music, +1

Thorsten Voice: German Speech Synthesis Data with CC0 License

Thorsten-Voice provides German-language audio recordings and text transcripts for speech synthesis, created by Thorsten Müller and updated in February 2026. The dataset is designed to facilitate the creation of high-quality, offline German text-to-speech (TTS) models without licensing restrictions.

German, Text To Speech, Thorsten Voice, Speech Synthesis, Sprachsynthese, +1

French Educational Speech Transcriptions, 12.82 Hours of Audio

3,933 transcribed audio segments from the French educational domain, totaling approximately 12.82 hours of audio. The dataset was created by MEscriva using the OpenAI Whisper API and was last updated on December 17, 2025.

Audio, Transcription, Education, French Language, Speech Recognition, +1

MIDI Music Collection for Symbolic AI Training

A collection of over 6.74 million deduplicated MIDI files curated for music information retrieval and AI training. The dataset was created by 'projectlosangeles' and was last updated in December 2025. It includes normalized MIDI data and comprehensive metadata for symbolic music analysis.

Audio, Size Categories: 1M<n<10M, Mir, Language: en, Music Information Retrieval, License: CC BY-NC-SA 4.0, MIDI discovery, Music Ai, Midi Search, Midi, DOI: 10.57967/hf/7361, Region: us, Music Dataset, Large Scale, MIDI dataset, Audio Generation, Music Discovery, +1

Associations Between Color and Music Mediated by Emotion and Tempo

A dataset by Tawney Tsang, published on paperswithcode, investigating cross-modal sensory associations. The data likely contains experimental results linking color perception to musical stimuli, with emotion and tempo as mediating factors. The specific scale, row count, and collection date are not provided in the metadata.

Tabular, Audio, Psychology, Color Perception, Cognitive Science, Cognitive psychology, Communication, Emotion, Art, Social Psychology, +1

Biomusicology and Three Biological Paradoxes About Music

A paper discussing the intersection of biology and music, authored by Steven R. Brown and published on the paperswithcode platform. The content likely explores theoretical paradoxes in music from a biological perspective. The dataset's specific format, size, and structure are not detailed in the provided metadata.

Text, Audio, Biomusicology, Epistemology, Philosophy, Music Cognition, +1

Music Therapy Data for Bereaved Youth Grief Expression

A dataset on paperswithcode, authored by Katrina Skewes McFerran, related to music therapy interventions for bereaved youth. The data likely contains clinical research materials, potentially including audio recordings and text. Temporal coverage and specific data volume are not provided in the available metadata.

Text, Audio, Psychotherapist, Music Therapy, Developmental Psychology, Psychology, Feeling, Bereavement, Clinical Research, Psychoanalysis, Grief, Social Psychology, Clinical Psychology, +1

Substance Use in Popular Music Videos, Analysis Dataset

Substance Use in Popular Music Videos is a dataset by Donald F. Roberts, published on paperswithcode. It likely contains an analysis of how substance use is depicted in music videos.

Tabular, Audio, Video, Media Analysis, Substance Use, Computer Science, Psychology, Art, Music Videos, Psychiatry, +1

Barcelona Music Reward Questionnaire: Psychological Survey Data

Barcelona Music Reward Questionnaire data likely contains survey responses related to the psychological experience of music. The dataset is authored by Ernest Mas‐Herrero and is hosted on the paperswithcode platform. The specific number of participants, survey questions, and collection period are not detailed in the available metadata.

Tabular, Audio, Questionnaire, Psychology, Cognitive psychology, Music Reward, +1

Mirdata: Standardized Python Loaders for Music Information Retrieval

Mirdata provides standardized Python loaders for Music Information Retrieval (MIR) datasets, maintained by the mir-dataset-loaders organization with updates through February 2026. It enables programmatic access to audio files and musical annotations such as beats, chords, and melodies across various research collections.

Audio, Mir, Python, Mirdata, +1

Dolly-Audio: 1,000 Hours of Multi-Speaker Vietnamese Speech

Dolly-Audio contains 1,000 hours of professionally cleaned Vietnamese speech audio featuring 152 speakers from various regions. Created by the Dolly AI Team and updated in December 2024, the corpus is designed to support speech synthesis and recognition research. It includes both audio recordings and corresponding text transcripts across multiple Vietnamese dialects.

Audio, OPTIMIZED-PARQUET, Parquet, Text To Speech, Library: polars, Library: dask, Modality: audio, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Vietnamese, Library: datasets, Region: us, Language: vi, Synthetic, +1

Pittsburgh Bridges Structural and Material Attributes

Pittsburgh Bridges is a classic dataset from the UCI Machine Learning Repository containing structural and material details for bridges in Pittsburgh, Pennsylvania. It is widely used for classification and regression tasks in civil engineering and machine learning education. The original creator and exact time period are not specified.

Tabular, Transportation Infrastructure, Structural Analysis, Civil Engineering, Materials Science, +1

Balanced Audio Samples For Music Information Retrieval

3,500 balanced audio samples are provided for music information retrieval tasks. Each sample is represented by a 571-feature matrix.

Audio, Audio Classification, Multiclass Classification, Signal Processing, +1

LibriSpeech: A Large-Scale Corpus of Read English Speech

LibriSpeech is a widely used corpus of read English speech derived from public domain audiobooks. This copy is published on Kaggle, making it accessible for download and experimentation. Its specific size, version, and update details are not provided in the available metadata.

Audio, Machine Learning, Audio Data, Speech Recognition, +1

Non-Music Audio Samples for Sound Classification

NO music is a dataset published on Kaggle, likely containing audio samples for classification tasks. The dataset's specific content, size, and origin are not detailed in the provided metadata. Its platform tags suggest it is focused on audio data and classification.

Audio, Audio Classification, Classification, Non Music Audio, Sound Detection, +1
