DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets

Dots.tts SOAR Models: Text-to-Speech Model Files

Official model files for the Dots.tts text-to-speech system are hosted on Kaggle. The description indicates these are Hugging Face model files from the rednote-hilab organization. Specific details on the model architecture, training data, and performance are not provided in the available metadata.

Audio, Text To Speech, Speech Synthesis, Hugging Face Models, +1 more

Dots.tts: Text-to-Speech Model Files from Rednote-Hilab

Official model files for the Dots.tts text-to-speech system, hosted on Hugging Face. The dataset appears to be a collection of model weights and configuration files for speech synthesis. It was uploaded to Kaggle by the rednote-hilab organization, though the specific creation date and update history are not provided.

Audio, Text To Speech, Speech Synthesis, Hugging Face Models, +1 more

Dots.tts: Text-to-Speech Model Files from Rednote-hilab

Official rednote-hilab Dots.tts model files for Hugging Face. The dataset consists of files for a text-to-speech model. The dataset's author, organization, and specific model details are not provided in the available metadata.

Audio, Text To Speech, Speech Synthesis, Hugging Face Models, +1 more

LibriSpeech: A Large-Scale Corpus of Read English Speech

LibriSpeech is a large-scale corpus of read English speech derived from audiobooks. The dataset is published on Kaggle, but its size, creation date, and original authors are not given in the available metadata. It likely contains audio files and corresponding transcriptions for speech processing tasks.

Audio, Machine Learning, Speech Corpus, Speech Recognition, +1 more

Sun-Illuminated Sea Floor Topography off Eastern Cape Cod, 4-Meter Resolution

Data collected in November 1998 during USGS survey 98015 aboard the Canadian Coast Guard vessel Frederick G. Creed. The dataset is a sun-illuminated topographic image of the sea floor offshore of eastern Cape Cod, Massachusetts, created from multibeam sonar data. The image has a 4-meter pixel size and was reprojected into the Massachusetts State Plane coordinate system in September 2006.

Image, Audio, Geospatial, Sun Illuminated Topography, Computer Vision, Coastal Mapping, Multibeam Sonar, Marine Geology, +1 more

Northeast US Benthic Fauna Surveys from 1881

Benthic fauna data for Northeast US coastal waters, collected from 1881 to the present by National Marine Fisheries Service laboratories. The dataset includes 21,000 sample sites with parameters such as depth, sediment type, species name, and abundance. Major studies incorporated are Ocean Pulse, the Northeast Monitoring Program, and surveys of the New York Bight and Long Island Sound.

Tabular, Audio, Time Series, Ocean Monitoring, Benthic Fauna, Northeast US Coast, Marine Biology, +1 more

Gemini 2.5 Pro TTS Voice Profiles: 21 Identities with Emotion Scores

28,946 high-quality voice acting samples generated with Gemini 2.5 Pro Preview TTS, organized into 21 voice identities. Each sample is annotated with 59 Empathic Insight Voice Plus emotion/quality scores, BUD-E Whisper audio captions, and word-level timestamps. The dataset was created by 'laion' and last updated on March 22, 2026.

Audio, Multimodal, Text To Speech, Task Categories: Text To Speech, Vocal Bursts, BUD-E Whisper, Audio Captions, Task Categories: Audio Classification, Emotion Analysis, Voice Profiles, License: CC BY 4.0, Gemini TTS, Region: US, Emotion, FAISS, Semantic Search, Empathic Insight, Voice Acting, Synthetic, +1 more

Saint Kitts and Nevis Education Indicators: UNESCO SDG 4 Metrics

Education, demographic, and socio-economic indicators for Saint Kitts and Nevis are provided by UNESCO, with the latest update in March 2026. The data aggregates national-level metrics specifically aligned with Sustainable Development Goal 4 (SDG 4) and other policy-relevant frameworks.

Indicators, Education, Sustainable Development Goals SDG, Demographics, Socioeconomics, Sustainable Development, +1 more

Global Spanish Speech Recordings from 20+ Countries

This dataset of high-quality, real-world speech recordings from native Spanish speakers includes 81 audio files of Mexican Spanish. It is provided by SilencioNetwork and was last updated on March 30, 2026. It aims to cover the global Spanish-speaking population of over 500 million people across Europe and Latin America.

Audio, Multilingual, AUDIOFOLDER, Task Categories: Text To Speech, Speech Data, Spanish Accents, Audio Classification, Size Categories: n<1K, Modality: text, Library: mlcroissant, Task Categories: Audio Classification, Library: datasets, Global Spanish, License: CC BY-NC 4.0, Spanish Speech, Region: US, Large Scale, Task Categories: Automatic Speech Recognition, Language: es, Castilian, Automatic Speech Recognition, European Spanish, +1 more

TTS-German: High-Quality German Speech Dataset for Synthesis and Recognition

TTS-German is a high-quality German speech dataset containing 670,509 audio samples totaling 1,250 hours, derived from the CML-TTS German source. The dataset was processed by datadriven-company, with the last update recorded on March 13, 2026. Audio files are standardized to 24kHz mono WAV format, segmented to a maximum of 12 seconds, and include phoneme transcriptions.

Audio, Parquet, Text To Speech, Task Categories: Text To Speech, Library: polars, Library: dask, Size Categories: 1M<n<10M, German Language, Speech Synthesis, Modality: text, Library: mlcroissant, Library: datasets, License: CC BY 4.0, Audiobooks, Region: US, Task Categories: Automatic Speech Recognition, Processed, Audio Processing, Language: de, Automatic Speech Recognition, +1 more

Phonk Music Audio Dataset for Generative Audio Model Training

Hundreds of hours of high-quality Phonk music, including Drift Phonk and Hard Phonk subgenres, have been scraped and pre-processed for machine learning. The dataset was created by Prhokbvf556 and last updated on Hugging Face in April 2026. It is formatted for efficient training on hardware like TPUs and GPUs.

Audio, Size Categories: 100K<n<1M, Task Categories: Audio Classification, Phonk Music, Generative ML, Region: US, Music Dataset, Audio Generation, +1 more

Higgs Audio v3 TTS 4B: Text-to-Speech Model Files

Model files for the Higgs Audio v3 TTS 4B text-to-speech system, hosted on Kaggle for serving via SGLang-Omni. The dataset's author, organization, and last update date are unknown. The specific contents and scale of the model files are not detailed.

Audio, Text To Speech, Machine Learning, AI Models, Audio Synthesis, +1 more

Higgs Audio v3 TTS 4B: Text-to-Speech Model Files for vLLM-Omni

Higgs Audio v3 TTS 4B model files, hosted on Kaggle, for serving text-to-speech with the vLLM-Omni framework. The dataset's author, organization, and specific data characteristics are not provided in the description. The last update date and dataset size are also unknown.

Audio, Text To Speech, Speech Synthesis, AI Models, Audio Generation, +1 more

Multi-SNR Speech Dataset for Audio Processing

A speech dataset containing audio samples at multiple signal-to-noise ratio levels. The dataset is hosted on Kaggle, but its specific size, collection method, and creator are unknown. Content and structure require verification after download.

Audio, Machine Learning, Signal To Noise Ratio, Audio Processing, +1 more

Output TTS: 100 Text-to-Speech Audio Samples

100 audio samples likely generated by a text-to-speech system. The dataset is hosted on Kaggle, but its author, creation date, and specific source are not documented. Column details and file formats are unknown, limiting initial assessment.

Audio, Text To Speech, Speech Synthesis, Audio Samples, +1 more

Merge_no_music_test2_from_ytb: Audio Samples for Speech Processing

A collection of audio data sourced from YouTube, as indicated by the title. The dataset's specific content, size, and collection methodology are not detailed in the provided metadata. Its origin from the Kaggle platform suggests it is intended for machine learning applications.

Audio, YouTube Sourced, Speech Analysis, Audio Processing, +1 more

Sound Ordinance Permit Applications for Austin Events

City of Austin records detail sound ordinance permit applications for events like concrete pourings and outdoor music venues. The data includes application status, case numbers, event descriptions, applicants, dates, and locations. Information is sourced from the city's AMANDA database managed by Development Services.

Geospatial, Development Services, Location, Transit Network, Sound Ordinance Permits, Transportation, Economy, City Of Austin, +1 more

Zindi ASR Challenge: Speech Recognition Dataset

zindiasrchallenge is a dataset hosted on Kaggle. The title suggests it is likely associated with an Automatic Speech Recognition (ASR) challenge, potentially focusing on Indian languages. The dataset's specific content, size, and origin are not detailed in the provided metadata.

Audio, Indian Languages, Challenge Dataset, Audio Processing, Speech Recognition, +1 more

Google WAXAL: Automatic Speech Recognition Dataset

Google WAXAL ASR Dataset is a collection of audio data for automatic speech recognition tasks. It was published on Kaggle, but its specific size, creation date, and detailed content are not provided in the available metadata. The dataset's author, organization, and license information are unknown.

Audio, Google, Audio Processing, Speech Recognition, +1 more

Google WAXAL ASR Challenge: Automatic Speech Recognition Data

A dataset from Kaggle associated with the Google WAXAL Automatic Speech Recognition (ASR) challenge. The dataset likely contains audio recordings and transcriptions for training and evaluating ASR models. Specific details on size, origin, and collection date are not provided in the available metadata.

Audio, Machine Learning Challenge, Speech Data, Audio Processing, Automatic Speech Recognition, +1 more
