DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,590 datasets

Multi-Instrumental MIDI Files with Monophonic Melodies and Segment Labels

Mono Segments contains over 310,000 multi-instrumental MIDI files selected from the Discover MIDI Dataset. The dataset is enriched with lead monophonic melodies and high-precision structural segment labels, created by author asigalov61.

Tags: Audio, MIDI Segmentation, Monophonic Melody, MIDI Segments, Music Segmentation, Language: en, License: CC BY-NC-SA 4.0, Size: 100K < n < 1M, Task: audio classification, Segments, MIDI, SOTA, Multi-Instrumental, Music Segments, Region: US (+1 more)

STOMA: 23 Hours of Multi-Speaker Greek Speech for TTS

STOMA is a multi-speaker Greek speech corpus containing approximately 23 hours of studio-recorded read speech. It features audio from six native speakers (three male and three female), captured under controlled studio conditions to ensure high signal quality.

Tags: OPTIMIZED-PARQUET, Parquet, Size: 10K < n < 100K, Text To Speech, Task: text-to-speech, Library: polars, Language: el, Modality: text, Library: mlcroissant, Task: audio classification, Speech Corpus, Library: datasets, Library: pandas, License: CC BY 4.0, Neural TTS, Greek Language, Region: US, Task: automatic speech recognition, Annotations Creators: expert-generated (+1 more)

Moroccan Darija Speech Recognition Dataset

Moroccan Darija ASR Dataset Split is a speech corpus for Automatic Speech Recognition, published on the Hugging Face platform by mohamedmou. The dataset was last updated on May 1, 2026, but its specific size, content, and collection methodology are not detailed in the available metadata.

Tags: Audio, Darija, Moroccan Arabic, Speech Recognition (+1 more)

Music Audio Data

"I music" is a dataset hosted on Kaggle. The available metadata does not describe its content, scope, origin, size, or creation date.

Tags: Audio, Entertainment (+1 more)

Spotify SASRec v1 Results: Sequential Recommendation Model Outputs

Results from the SASRec v1 model applied to Spotify data. The dataset is hosted on Kaggle. The specific content, size, and creation details are not provided in the metadata.

Tags: Tabular, SASRec, Spotify, Music Recommendation, Sequential Recommendation (+1 more)

Uyghur Speech Data with 2,157 Audio Files

Uyghur language speech recordings for natural language processing tasks. The dataset contains 2,157 audio files in MP3 format, totaling 3.03 GB, created by user 'anke01' and last updated on February 26, 2026.

Tags: Audio, Speech Synthesis, Uyghur Language, Audio Processing, Speech Recognition (+1 more)

CoversBR: Metadata and Features for Brazilian Cover Song Identification

CoversBR is a large audio database focused predominantly on Brazilian music for cover and live song identification tasks. It comprises metadata and extracted features from 102,298 songs, organized into 26,366 cover groups, totaling approximately 7,070 hours of audio. The dataset is provided by Dirceu G Silva via AWS Open Data, but the original audio files are not included due to copyright restrictions.

Tags: Audio, Music Information Retrieval, Copyright Monitoring, Audio Features, Cover Song Identification, Live Song Identification, Brazilian Music, Music Features Dataset, Music, Music Recognition (+1 more)

MusicX: Audio Data Collection

MusicX is a dataset hosted on Kaggle. Its specific content and scale are not detailed in the provided metadata. The dataset likely contains audio data or features related to music, based on its title.

Tags: Audio, Audio Analysis (+1 more)

Scotts Bluff National Monument Land Ownership and Boundary Data

ESRI shape files from the National Park Service Land Resources Division detail property ownership and interests. The data is intended for displaying NPS-owned lands and areas with scenic easements or rights of way. It was last updated on March 4, 2026.

Tags: Geospatial, ZIP, Parcels, Cadastre, Midwest Region, Ownership, National Park Service, Tracts, Land Status Ownership, Planning Cadastre, Land Ownership, Boundaries, Boundary, United States, Scotts Bluff National Monument, Nebraska, Scbl, Lrdcab (+1 more)

Group 8 Audio Dataset: Urban Sound Recordings

Group 8 Audio Dataset Urban is a collection of audio recordings published on Kaggle. The title suggests the dataset likely contains sounds recorded in urban environments. Specifics regarding size, format, and creation details are unavailable from the provided metadata.

Tags: Audio, Urban Sounds, Environmental Audio (+1 more)

Luganda Mental Health Dialogues: Simulated Counselling Sessions

A conversational speech dataset of simulated mental health counselling sessions in Luganda, recorded in Uganda. It features dialogues between a Helper (counsellor) and a Seeker (client) discussing mental health topics. The dataset is designed for research in automatic speech recognition, speaker diarization, and speaker role classification for a low-resource African language.

Tags: Audio, Parquet, Size: 10K < n < 100K, Library: polars, Library: dask, Modality: text, Code Switching, Library: mlcroissant, License: CC BY-SA 4.0, Library: datasets, License: CC BY 4.0, Diarization, Region: US, Task: automatic speech recognition, Gender Clasification, Automatic Speech Recogniton (+1 more)

Survey on Relative Deprivation and Working Women in Newton, Massachusetts, 1978-79

For this study on relative deprivation, 405 adults aged 25 to 40 living in Newton, Massachusetts, were interviewed. The data, collected by Abt Associates of Cambridge, include demographic information, job details, domestic arrangements, attitudes toward women's work, and depression scale scores. The study was designed by Faye J. Crosby to compare housewives with employed men and women in high- and low-prestige occupations.

Tags: Tabular, Occupational Prestige, Survey Data, Psychology, Relative Deprivation, Social Psychology, Working Women (+1 more)

KIIS Music Recommendation Dataset

The KIIS Music Recommendation Dataset is hosted on Kaggle. Its specific contents, such as the number of user interactions or song entries, are not detailed in the available metadata. The dataset likely contains information related to music listening and user preferences for recommendation tasks.

Tags: Tabular, Audio, Music Recommendation, User Behavior, Recommender Systems (+1 more)

KIIS CSV Music Recommendation Dataset

Kaggle hosts the KIIS CSV Music Recommendation dataset. The title suggests it contains data for building music recommendation systems, likely involving user interactions with songs or artists. The dataset's author, organization, and specific details are unknown.

Tags: Tabular, Audio, Music Recommendation, User Behavior, Collaborative Filtering (+1 more)

Sesotho Speech Corpus with Orthographic Transcriptions

A corpus of orthographically transcribed broadband speech for Sesotho, one of South Africa's eleven official languages. It was created by researcher Febe de Wet and the NCHLT project, with transcriptions provided in XML format. The dataset was last updated in March 2026.

Tags: Text, Audio, Speech Corpora, Annotated Monolingual Speech Corpus, Speech Corpus, Transcribed, Natural Language Processing, Sesotho, South African Languages, Transcribed Speech (+1 more)

Afrikaans Speech Corpus With Orthographic Transcriptions

Orthographically transcribed broadband speech for Afrikaans, one of South Africa's eleven official languages. Transcriptions are provided in XML format. The corpus was authored by Febe de Wet and was last updated in March 2026.

Tags: Text, Audio, Speech Corpora, Afrikaans Speech, Annotated Monolingual Speech Corpus, Speech Corpus, South Africa, Natural Language Processing, Afrikaans, Orthographic Transcription (+1 more)

NCHLT English Auxiliary Speech Corpus With Orthographic Transcriptions

Orthographically transcribed broadband speech is provided for each of South Africa's eleven official languages. Transcriptions are available in XML format. The corpus was authored by Laura Martinus and last updated in March 2026.

Tags: Text, Audio, English, Speech Corpora, Annotated Monolingual Speech Corpus, Speech Corpus, Multilingual Speech, South Africa, Natural Language Processing, Transcribed Speech (+1 more)

UA_ASR: Ukrainian Speech Recognition Data

UA_ASR is a dataset hosted on Kaggle. The title suggests it contains audio data for Ukrainian automatic speech recognition. The dataset's specific size, collection method, and detailed contents are unknown from the provided metadata.

Tags: Audio, Speech Recognition (+1 more)

ASR V3: Full Training Data for Automatic Speech Recognition

ASR V3 Full Training Data is a dataset for training automatic speech recognition models, hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the available metadata. Its intended use is likely for developing and benchmarking speech-to-text systems.

Tags: Audio, Training Data, Speech Recognition (+1 more)

Emotional Speech Data with Acoustic and Speaker Features

Kaggle hosts an Emotional Speech Dataset. It contains acoustic, speaker, and emotion-based features for adaptive speech emotion analysis. The author, organization, and specific data scale are not provided in the input metadata.

Tags: Tabular, Audio, Speech Emotion, Acoustic Features, Speaker Characteristics (+1 more)

Page 48 of 130