DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio Datasets | DataSalon

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets

MIX-HI-EN-TTS: Bilingual Multi-Speaker Speech Synthesis Data

SKT AI LABS curated this multi-speaker, bilingual speech synthesis dataset. The dataset is intended for text-to-speech applications. It was last updated on 2026-05-19.

Audio, Text To Speech, AI Labs, Speech Synthesis, Multi Speaker, Bilingual, +1

ViYT-Diar: Vietnamese YouTube Audio for Speaker Diarization Benchmarking

ViYT-Diar is a manually annotated audio dataset extracted from Vietnamese YouTube videos. It is designed as a test benchmark for evaluating Speaker Diarization models on in-the-wild data. The dataset was created by author tuanduy1612 and last updated on 2026-04-03.

Audio, OPTIMIZED-PARQUET, Parquet, Library: polars, YouTube, Task Categories: voice-activity-detection, Modality: timeseries, Size Categories: n<1K, Modality: text, Library: mlcroissant, Vietnamese, Library: datasets, Benchmark, Library: pandas, Speech Processing, Region: us, Vietnamese Audio, License: apache-2.0, Speaker Diarization, Language: vi, +1

German Speech Audio with English Translations for ASR and TTS

A multi-source collection of German speech audio paired with transcriptions and English translations, curated by aman4014. The dataset is designed for training and evaluating Automatic Speech Recognition, Speech Translation, and Text-to-Speech systems. It was last updated on March 30, 2026.

Text, Audio, Multilingual, Text To Speech, Machine Translation, Large Scale, Audio Processing, Speech Recognition, +1

My Speech Dataset

My Speech Dataset is a collection of audio recordings, likely containing human speech. It is published on the Kaggle platform. The dataset's specific content, size, and origin are not detailed in the available metadata.

Audio, +1

Kallaama Wolof ASR Corpus: Speech Data for Automatic Recognition

Kallaama Wolof ASR corpus is a dataset for automatic speech recognition in the Wolof language. The dataset is hosted on Kaggle, but detailed metadata such as size, format, and collection details are not provided. Its content likely consists of audio recordings and corresponding transcriptions for training speech models.

Audio, Wolof Language, Natural Language Processing, Audio Corpus, Speech Recognition, +1

Massachusetts Coastal Public Access Sites with Photos and Amenities

GIS point data from the Massachusetts DEP Waterways Program shows locations licensed under Chapter 91 for public access. Each site includes hyperlinks to photos and licenses, as well as a list of amenities like walkways and boat ramps. The dataset supports the Commonwealth's goal to preserve public rights in tidelands and waterways.

Geospatial, GIS Points, Legal Licensing, Coastal management, +1

Massachusetts Coastal Oceanographic Time Series Data from USGS

Hourly time-series oceanographic data for the Massachusetts coast, collected by the USGS or used in its projects, is available online through the USGS Coastal Marine Time Series Browser. The data includes variables such as current, temperature, pressure, conductivity, and light transmission. Specific deployments range from July 1980 to the present, with a long-term observation series beginning in January 1990.

Time Series, Geospatial, Marine Science, USGS, Massachusetts Coast, Finance, Coastal Oceanography, +1

LaikoSet: Greek Laïko Music Metadata for Analysis

Structured metadata for Greek laïko music tracks is provided for research and machine learning. The dataset includes fields for emotion, era, and genre but does not contain audio files. It was created by author christosfouk and was last updated on 2026-04-16.

Tabular, Audio, Laiko Music, Musicology, Greek Music, Music Metadata, +1

CommonVoice22 Sidon Dacvae: Speech Audio Converted to VAE Latents

CommonVoice 22 speech data enhanced by Sidon and converted into DAC VAE latent representations. The dataset is provided by TTS-AGI and was last updated on March 22, 2026. Each sample includes original FLAC audio, a corresponding latent vector, and metadata.

Audio, Multimodal, Machine Learning, TTS Research, Speech Processing, Audio Representation, +1

Language Proficiency Speech Dataset for Assessment

Speech records intended for assessing language proficiency. The dataset is hosted on Kaggle, a platform for data science competitions and projects. Its specific content and scale require verification after download.

Audio, Audio Assessment, Language Proficiency, +1

Cleaned Audio-Text Corpus for Mooré Speech Processing

Moore Speech Corpora provides aligned audio and text data for the Mooré language (ISO 639-3: mos), curated for low-resource speech processing. The dataset is cleaned and denoised to support text-to-speech and automatic speech recognition research. It was created by goaicorp and last updated in July 2025.

OPTIMIZED-PARQUET, Parquet, Size Categories: 1K<n<10K, Task Categories: text-to-speech, Library: polars, Library: dask, Modality: text, Library: mlcroissant, Library: datasets, License: cc-by-nc-4.0, Region: us, Task Categories: automatic-speech-recognition, Language: mos, +1

MIMII Pump: Acoustic Audio for Fault Detection at All SNRs

Pump audio dataset designed for acoustic classification tasks, likely containing recordings of mechanical equipment. The dataset is hosted on Kaggle and appears to be part of the MIMII (Malfunctioning Industrial Machine Investigation and Inspection) initiative. Recordings likely include various signal-to-noise ratio (SNR) conditions to simulate real-world industrial environments.

Audio, Pump, MIMII, Mechanical Fault, Acoustic Classification, +1

WaxalNLP: Multilingual African Speech Corpus

The Waxal dataset is a large-scale multilingual speech corpus designed for African languages. It was created to support research on improving the accuracy and fluency of speech and language technologies across the continent. The dataset supports both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) tasks.

Source Datasets: original, Task Categories: text-to-speech, Language: amh, Language: ach, Language Creators: creator 1, Language: bau, Language: aka, Language: dga, Source Datasets: digital Umuganda afri Voice, Multilinguality: multilingual, Language: dag, Task Categories: automatic-speech-recognition, Source Datasets: ugspeech Data, +1

Oceanus Cruise 34: High-Resolution CTD/STD Ocean Profiles

NCEI Accession 8400047 contains CTD and STD data from R/V OCEANUS Cruise 34, which took place from September 22 to October 3, 1977. The data were received from Dr. Carl Wunsch at MIT and processed by the Woods Hole Oceanographic Institution into the NODC standard High-Resolution F022 format. The dataset provides nearly continuous vertical profiles of temperature, salinity, density, and other parameters at depth intervals as fine as 1 meter, along with station metadata and environmental conditions.

Tabular, Time Series, Oceanography, Atlantic Ocean, CTD Profiles, Sea Water Properties, +1

Music Score Arrangements with Audio Alignment Metadata

Operation Legato contains 22,060 tokenized music score arrangements sourced from MuseScore. The dataset was created by user hidude562 and was last updated on the platform in April 2026. Each record includes arrangement metadata and alignment information with original audio references.

Text, Audio, Multimodal, Parquet, Size Categories: 10K<n<100K, Task Categories: text-generation, MuseScore, Transcription, Library: polars, Library: dask, Arrangement, License: cc0-1.0, Music Arrangements, Modality: text, Modality: tabular, Library: mlcroissant, Task Categories: audio-classification, Library: datasets, Audio Alignment, Orchestra, Region: us, Piano, Score, MusicXML, +1

MusicBrainz Metadata for Analytics and Entity Tagging

Music metadata extracted from the MusicBrainz API. The dataset is intended for analytics and entity tagging tasks. Its specific size, scope, and update frequency are not detailed in the provided description.

Tabular, Audio, Audio Analytics, MusicBrainz, Music Metadata, +1

Batoul Music Genre Model: Audio Data for Genre Classification

Batoul_music_genre_model is a dataset published on Kaggle. The title suggests it contains audio data or features for music genre classification tasks. The dataset's specific contents, size, and creation details are not provided in the available metadata.

Audio, Machine Learning, Audio Classification, Music Genre, +1

Hank Hehmsoth MacDowell and Norton Stevens Fellowship Materials, 2011-2012

Documentation related to composer and pianist Hank Hehmsoth's MacDowell Colony Fellowship in 2011 and Norton Stevens Fellowship in 2012. Materials include official correspondence, photographs, musical scores, recordings, and related documentation associated with the residency and fellowship. The dataset was harvested by the Texas Data Repository from a Dataverse source.

Multimodal, Arts Archive, Fellowship Documentation, Music Composition, MacDowell Colony, +1

Desert Dances Musical Score by Hank Hehmsoth from MacDowell Collection

Hank Hehmsoth's musical score for the composition Desert Dances, which contributed to his selection as a MacDowell Colony Fellow in 2011 and a Norton Stevens Fellow in 2012. The score is part of the permanent collection of the James Baldwin Library at MacDowell in Peterborough, New Hampshire. This repository copy is provided for scholarly, archival, and research purposes.

Text, Scholarly Archive, MacDowell Collection, Musical Score, American Composer, +1

Data Management and Sharing Plan for Perovskite Solar Cell Research

A Data Management and Sharing Plan outlines the strategy for handling scientific data from a perovskite solar cell research project. Authored by Jinsong Huang, the plan was last updated on May 11, 2026. It describes the data to be generated and the framework for its management and sharing.

Text, Data Management Plan, Perovskite Solar Cells, Materials Science, Research Data, Synthetic, +1
