DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,580 datasets

Music Data for Vietnam, 2015-2026

Vietnam is the geographic focus of this dataset, which appears to contain information related to music from 2015 to 2026. The data is hosted on Kaggle, a platform for data science and machine learning projects. The specific content, collection method, and original author are not detailed in the available metadata.

Tabular, Audio, Time Series, Vietnam, +1

Music Data for Vietnam, 2015 to 2026

Vietnam is the geographic focus of this dataset, which appears to contain information related to music from 2015 to 2026. The data is hosted on Kaggle, but the specific contents, size, and collection method are not detailed in the available metadata. The author and original source of the data are unknown.

Tabular, Audio, Time Series, Vietnam, Cultural Data, +1

Indian Text-to-Speech Audio Dataset

An audio dataset for text-to-speech applications, likely containing speech samples for Indian languages. It was created by sakthivarshans and is hosted on the Hugging Face platform. The dataset was last updated on June 6, 2026.

Audio, Text To Speech, Indian Languages, Speech Synthesis, +1

Greek Parliament Speech Transcripts

Gr Parliament Speech Dataset is a collection of speech transcripts from the Greek Parliament, published on Hugging Face by ilsp. The dataset was last updated on June 11, 2026; its exact size and contents are not specified in the available metadata.

Text, Audio, Transcripts, Parliamentary Speech, Greek Language, Political Text, +1

NSFW TTS Dataset with 30 Speakers and Over 118 Hours of Audio

30 speakers contributed over 300,000 audio clips for text-to-speech synthesis, with individual speaker durations ranging from 5 to 118 hours. The dataset, titled 'Nsfw Tts Dataset 30Speakers', was created by author DMC-ykfx33 and hosted on Hugging Face. It was last updated on April 17, 2026.

Audio, Text To Speech, Audio Dataset, Speech Synthesis, Voice Cloning, +1

Lwazi English Telephone Speech Corpus for ASR

South African English audio recordings and transcriptions for developing Lwazi speech recognition systems. The corpus contains telephone-quality audio files (8 kHz, 16-bit, mono) with corresponding orthographic transcriptions in Unicode text format. Researcher Jaco Badenhorst created this dataset, which was last updated in April 2026.

Text, Audio, Telephone Speech, English Language, Natural Language Processing, Audio Transcription, Speech Recognition, +1

Taphonomic And Use Features For Mollusk Shells From El Mnasra Cave

Emilie Campmas published a dataset in 2026 detailing taphonomic and use-wear features for mollusk shells from US 8 of El Mnasra cave. The dataset records surface preservation, abrasion types, smoothing intensity, and presence of ochre or heating for species including Tritia cf. gibbosula and Columbella rustica. It is a small, specialized archaeological dataset shared as an 11.6 KB XLSX file under a CC BY 4.0 license.

Tabular, Excel, Mollusks, Taphonomy, Archaeology, Use Wear Analysis, Paleontology, +1

Shell Bead Specimen Data from El Mnasra Cave

El Mnasra cave archaeological data provides descriptive measurements and condition assessments for shell bead specimens from US 8. The dataset includes features such as morphological type, perforation details, use-wear intensity, and evidence of heating or pigment. It was created by Emilie Campmas and is available as an XLSX file under a CC BY 4.0 license.

Tabular, Excel, Anthropology, Archaeology, Material Culture, Shell Beads, +1

WFly-F5TTS: Voices for Text-to-Speech Synthesis

Kaggle hosts the wfly-f5tts-voices dataset. The title and platform tags suggest it contains audio data related to text-to-speech and voice cloning. Specific details on size, format, and origin are not provided in the available metadata.

Audio, Text To Speech, Speech Synthesis, Voice Cloning, Audio Generation, +1

Yoruba-English Codeswitch ASR Dataset

Published on Hugging Face by author UmarBaba1, this dataset appears to contain audio data for automatic speech recognition (ASR). Its title suggests a focus on code-switched speech that mixes Yoruba and English. It was last updated on 2026-05-31.

Audio, English, Multilingual, Yoruba, Codeswitching, Speech Recognition, +1

Sampleflip MIDI Chord Progressions for Music Production

Sampleflip MIDI Chord Progressions is a collection of 3,764 MIDI files containing chord progressions. The dataset was created by author ronantakizawa and was last updated on 2026-04-21. It is used by SampleFlip for melody derivation and harmonic reference.

Audio, Midi, Chord Progressions, Music Production, Audio Generation, +1

NSFW Text-to-Speech Audio Dataset with 30 Characters and 1000+ Hours

A high-quality audio dataset designed for training and fine-tuning NSFW text-to-speech models. It includes over 1000 hours of audio from 30 characters, with annotations for emotion and sound effects. The dataset was created by DMC-ykfx33 and was last updated on Hugging Face in April 2026.

Audio, Multimodal, Text To Speech, Nsfw Content, Emotion Annotation, Audio Synthesis, Sound Effects, +1

Music Listening Profiles for 500,000 Users with Top Artists and Playcounts

500,000 user profiles containing top artists, tracks, albums, and playcounts. The dataset includes rankings, user countries, and MusicBrainz IDs where available, created by GabeKahen and last updated on April 28, 2026. It is designed for modeling music taste and analyzing listening behavior.

Tabular, Audio, User Behavior, Recommender Systems, Large Scale, Music Listening, Collaborative Filtering, +1

Musicological Term Classification Examples Using Large Language Models

Paolo Bonora's dataset provides examples of classifying musicological terms of interest using Large Language Models (LLMs). The dataset is 59.6 KB in size and was last updated on April 15, 2026. It is available under a CC-BY-4.0 license on figshare.

Text, Tabular, Excel, Musicology, Term Classification, Llm Evaluation, +1

Indian English and Hindi Speech Dataset for Text-to-Speech

This dataset contains 73 minutes of curated, single-speaker speech data with human-checked transcripts and quality control metrics. It is hosted on Kaggle and appears to be designed for speech synthesis tasks. The author, organization, and license are not specified.

Audio, Hindi, Text To Speech, Audio Dataset, Speech Synthesis, Indian English, +1

God Level Music Producer Dataset: 9,941 Examples for LLM Training

9,941 high-quality examples of music production workflows and reasoning, created by author gss1147. The dataset is intended for training large language models to become elite music producers across genres like Rap, Crunk, East Coast Boom Bap, West Coast G-Funk, and Dubstep. It was last updated on April 23, 2026.

Text, Audio, Llm Training, Large Scale, Beat Making, Music Production, Audio Synthesis, +1

Ancient Music Instrument Audio for Timbre and Spectral Restoration

An audio dataset focused on timbre and spectral restoration, likely containing recordings of ancient musical instruments. It is published on Kaggle, but details about its size, creation date, and authorship are not provided. The dataset's primary purpose appears to be related to audio signal processing tasks involving historical instrument sounds.

Audio, Spectral Restoration, Music Instruments, Audio Timbre, Ancient Music, Timbre, +1

IISc Mono Hindi Female: 54-Hour Studio-Quality Speech Dataset

54 hours 54 minutes of studio-quality Hindi speech from a single professional female voice artist, recorded at 48 kHz and 24-bit. The dataset contains 22,058 utterances, split into 21,662 for training and 396 for evaluation, and was created by the Indian Institute of Science (IISc) SYSPIN project. It was uploaded to Hugging Face by user 'somu9' and last updated on April 15, 2026.

Audio, Hindi, Text To Speech, Speech Synthesis, Monolingual, Single Speaker, +1

Aerograph ASRS: 2,000 Aviation Incident Reports with Extracted Entities

2,000 real NASA Aviation Safety Reporting System (ASRS) incident reports processed for knowledge graph construction. The dataset includes structured entity and relation extractions conforming to an aviation safety ontology, with 10 entity types and 8 edge types. It was created by Aryan95614 and last updated on 2026-04-23.

Text, Graph, Incident Reports, Nlp Extraction, Aviation Safety, +1

HattSet-12: Arabic Calligraphy Style Dataset

HattSet-12 is a dataset of Arabic calligraphy styles, published on Kaggle. The dataset likely contains images of different Arabic script styles for analysis. Metadata is minimal; the exact number of images, collection method, and specific styles require verification after download.

Image, Font Style, Arabic Calligraphy, Image Dataset, Cultural Heritage, +1
