DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon


🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,587 datasets

Speech & Audio

Maternal Health Audio Recordings in Wolof from Senegal

Audio recordings collected in community settings in Senegal cover topics including family planning, healthcare access, pregnancy practices, and cultural beliefs around maternal and reproductive health. The dataset was created by YUXCulturalAILab and last updated on March 19, 2026. Recordings were captured using mobile devices or portable recorders in natural conversational conditions, and all transcriptions were manually verified.

Audio, Healthcare Access, Maternal Health, Healthcare, Audio Recordings, Wolof Language, SENEGAL, +1

0 views

Speech & Audio

Yoruba Multi-Speaker Speech Corpus for Text-to-Speech

A Yoruba language speech corpus likely intended for text-to-speech applications. The dataset is hosted on Kaggle and appears to be associated with a Jupyter notebook. The number of speakers, recording hours, and specific collection details are unknown.

Audio, Text To Speech, Speech Corpus, Yoruba Language, Multispeaker, Natural Language Processing, +1

0 views

Speech & Audio

SententicDataTTS: Hebrew and English Speech Synthesis Audio with Male and Female Speakers

A bilingual text-to-speech dataset containing Hebrew and English audio generated by male and female speakers. Audio files have been resampled to 44.1kHz and time-stretched to a slower speed. The dataset was created by author notmax123 and last updated on March 30, 2026.

Audio, English, Text To Speech, Hebrew, Speech Synthesis, Audio Generation, Synthetic, +1

0 views

Speech & Audio

Asru Data: Speech and Audio Dataset

Asru Data is a dataset uploaded to HuggingFace by author closerG. The dataset was last updated on 2026-05-14. Its specific content and scale are not detailed in the provided metadata.

Audio, Asru, +1

0 views

Speech & Audio

ViMedCSS: Medical Speech Recognition Dataset

ViMedCSS is a dataset for medical speech recognition, sourced from the HuggingFace platform and hosted on Kaggle. The dataset's specific size, format, and collection details are not provided in the available metadata. Its primary application appears to be in training or evaluating automated speech recognition systems for clinical or healthcare settings.

Audio, Medical Speech, Healthcare, Speech Recognition, Healthcare Ai, +1

0 views

Speech & Audio

MOSS TTSD SGLang: Text-to-Speech Assets

MOSS TTSD SGLang assets are hosted on Kaggle. The dataset likely contains audio and text assets for text-to-speech synthesis. Specific details on size, format, and creation are unavailable from the provided metadata.

Text, Audio, Text To Speech, Speech Synthesis, Language Model, Audio Assets, +1

0 views

Speech & Audio

Massachusetts Building Code Regulations and Compliance

Regulatory text covers structural integrity, fire safety, and energy conservation for all new construction, renovation, and demolition projects in Massachusetts. The code is written by the State Board of Regulations and Standards and administered locally by certified building inspectors. The dataset originates from the SCIOPS organization via the NASA Earthdata platform.

Text, Audio, Massachusetts, Construction Regulation, Building Codes, Public Safety, +1

0 views

Speech & Audio

Irodori Clones 3M: Text-to-Speech Voice Clones

Irodori TTS Voice Clones is a collection of 2.99 million voice clones for text-to-speech synthesis. It was created by SynDataLab and references the SynDataLab/irodori-refs-10k dataset for source audio. The dataset was last updated on April 23, 2026.

Audio, Text To Speech, Voice Cloning, Audio Synthesis, +1

0 views

Speech & Audio

Clean Air Act Implementation Plans and Permit Approvals

Fall 2003 documentation details the Massachusetts air quality program's implementation of federal and state Clean Air Acts. The dataset includes regulatory procedures, application forms, fee structures, and review timelines for construction permits. It was published by SCIOPS in 2003.

Text, Legal Documents, Air Quality, Government Policy, +1

0 views

Speech & Audio

Beach Water Quality Monitoring and Public Health Data

Environmental Protection Agency's BEACH Program data focuses on improving public health for beachgoers through five key areas, including pollution prediction and faster water testing. The program, sponsored by the EPA and managed by SCIOPS, provides information on coastal water quality. Specific contact information is available for data related to Massachusetts beaches.

Tabular, Geospatial, Environmental monitoring, Coastal Water Quality, Healthcare, Beach Safety, Public Health, +1

0 views

Speech & Audio

Agri STT Benchmarking: Multilingual Agricultural Speech for ASR Models

10,934 real-world audio recordings from Farmer.Chat provide a benchmark for speech-to-text models in agricultural advisory contexts. The dataset is human-annotated and focuses on three Indian languages: Hindi, Telugu, and Odia. Bullseye-4 created this resource, which was last updated on March 20, 2026.

Audio, Multilingual, Benchmarking, Benchmark, Agriculture, Speech Recognition, +1

0 views

Speech & Audio

Google Search Console Data for a Sheet Music Marketplace in May 2026

Google Search Console normalized data from the Tably.es marketplace for May 2026. The dataset likely contains aggregated search performance metrics for the platform. The author, organization, and specific data volume are unknown.

Tabular, Audio, Web Analytics, Google Search Console, Marketplace, Sheet Music, +1

0 views

Speech & Audio

Scotts Bluff National Monument Vegetation Field Plots Database

Vegetation field plots at Scotts Bluff National Monument were visited, described, and documented in a digital database. The database consists of three parts: Physical Descriptive Data, Species Listings, and Strata Descriptive Data. Information for this metadata was obtained from a USGS site and put into NASA Directory Interchange Format.

Tabular, Geospatial, Ecology, National Park, Field Plots, Vegetation Mapping, +1

0 views

Speech & Audio

DBp: Multimodal Performance Data from Dueling Brains

DBp is a multimodal dataset from the Music-in-Medicine program, recording a Dueling Brains performance. The data includes audio, video, and tabular file formats, totaling approximately 9.8 GB in size. It is openly licensed under CC-BY-4.0 and authored by Maxine Annel Pacheco-Ramírez.

Audio, Multimodal, Performance, Music In Medicine, Dueling Brains, +1

0 views

Speech & Audio

MHp: Multimodal Brain and Audio Data from a Musical Healing Performance

MHp provides a 5.8 GB multimodal dataset capturing a live 'Musical Healing' performance from the Music-in-Medicine program. It likely contains synchronized electroencephalogram (EEG) brain activity recordings and audio data, such as piano music. This dataset supports research into the neurological and physiological effects of therapeutic music interventions.

Audio, Time Series, Multimodal, Brain, Mo Bi, Eeg, Music In Medicine, Piano, +1

0 views

Speech & Audio

Block Scale Rooftop Solar Potential for Orlando

Block-scale rooftop solar technical potential estimates for the city of Orlando, Florida, derived from LiDAR and national parcel data. It includes developable roof area and technical potential in kilowatts, along with the most common building use and occupancy type per block.

Building Roof Area, Solar Technical Potential, Pv, Orlando, Rooftop Solar, Parcel Scale, +1

0 views

Speech & Audio

Common Voice Geo Cleaned: 35 Hours of Georgian Speech for TTS

21,421 cleaned Georgian speech samples totaling 35 hours were curated by NMikka from Mozilla Common Voice 19.0 in 2026. The collection features 24 kHz mono WAV audio from 12 speakers specifically filtered for speech synthesis and recognition tasks.

Parquet, Size Categories: 10K<n<100K, Common Voice, Text To Speech, Task Categories: text To Speech, Library: polars, Library: dask, Task Categories: audio To Audio, Georgian, License: cc0-1.0, Speech Synthesis, Modality: text, Library: mlcroissant, Library: datasets, Region: us, +1

0 views

Speech & Audio

Sam Wake Word Audio Clips for Keyword Spotting

Contains audio clips for training a model to recognize the keyword 'Sam'. Each clip is labeled as positive (contains 'Sam') or negative (phonetically similar words). The dataset includes varied speaking styles, speeds, and intonations.

AUDIOFOLDER, Size Categories: 1K<n<10K, Language: en, Library: mlcroissant, Task Categories: audio Classification, Library: datasets, Keyword Spotting, Region: us, Voice Assistant, License: mit, Wake Word, +1

0 views

Speech & Audio

AMDp: Multimodal Data from a Musical Dialogue Performance

Maxine Annel Pacheco-Ramírez's dataset contains multimodal recordings from a Music-in-Medicine program performance titled 'A Musical Dialogue'. The data includes brain activity, audio, and video, but emotional ratings for participant 5 are missing. It is a large dataset, approximately 7.66 GB in size, and is available under a CC-BY-4.0 license.

Audio, Time Series, Multimodal, Brain, Music Medicine, Mo Bi, Eeg, Piano, Neuroscience, +1

0 views

Speech & Audio

Gojjam Dialect Amharic Speech and Text Corpus

A parallel speech corpus containing audio recordings paired with text transcripts for the Gojjam dialect of Amharic. It is curated by leyu-amharic to support speech technology research. The dataset was last updated in March 2026.

Text, Audio, Speech Corpus, Amharic Speech, Natural Language Processing, Gojjam Dialect, Automatic Speech Recognition, +1

0 views
