DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

Speech & Audio Datasets | DataSalon

All Categories

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Speech & Audio

TTSSDSC: Speech Dataset for Text-to-Speech Synthesis

TTSSDSC is a speech dataset published on Kaggle. Its title suggests a focus on text-to-speech synthesis. The dataset's specific content, size, and origin require verification after download.

AudioText To Speech+1

0 views

Speech & Audio

EEG ADHD Nasrabadi MAT

EEG ADHD Nasrabadi MAT is a dataset of electroencephalogram (EEG) recordings related to Attention-Deficit/Hyperactivity Disorder. The dataset is hosted on Kaggle, but its specific scale, collection methodology, and authorship details are not provided in the available metadata. The title suggests it likely contains time-series brainwave data for analysis.

Time SeriesADHDEegNeuroscienceMedical Signals+1

0 views

Speech & Audio

TTS Human Preferences: 15,000 Annotations for Audio Quality Evaluation

Datapointai released this dataset in March 2026 containing 1,000 text-to-speech audio pairs and 15,000 human preference annotations. Each entry consists of a single text prompt rendered by two different TTS systems, with 15 human labels indicating which version sounds more natural.

OPTIMIZED-PARQUETParquetSize Categories1 Kn10 KText To SpeechLibrarypolarsRlhfModalityaudioLanguageenModalitytextAudio QualityLibrarymlcroissantTask Categoriesaudio ClassificationLibrarydatasetsLibrarypandasPreference DataLicensecc By 40Human PreferencesRegionusDpo+1

0 views

Speech & Audio

F5TTS Clouds: Satellite Imagery of Cloud Formations

Kaggle hosts the f5tts_clouds dataset. The title suggests it contains imagery of cloud formations, likely for meteorological or computer vision analysis. The dataset's author, organization, and specific collection details are not provided in the available metadata.

ImageGeospatialSatellite ImageryClouds+1

0 views

Speech & Audio

Indian Telecaller to US Customer Speech Recordings

Speech audio data from telemarketing calls placed by Indian agents to customers in the United States. The dataset is hosted on Kaggle, but the author, organization, and specific collection details are unknown. The size, format, and number of recordings are unspecified.

AudioCall CenterTelemarketingCross Cultural+1

0 views

Speech & Audio

Arabic Professional Voice: Single-Speaker TTS Dataset with Full Diacritics

A high-quality Arabic Text-to-Speech dataset recorded by a professional male speaker. It contains 439 utterances in Modern Standard Arabic, all transcriptions include full Tashkeel (diacritical marks). The dataset was created by NightPrince and last updated on Hugging Face in March 2026.

AudioText To SpeechAudio DatasetSpeech SynthesisArabic Language+1

0 views

Speech & Audio

Common Voice ASR Clean: Filtered Speech Recognition Samples

A filtered version of the Common Voice dataset for automatic speech recognition (ASR). Samples with fewer than three words, repetitive tokens, or chat token leaks have been removed. The dataset was created by OpenSpeechHub and was last updated on March 31, 2026.

AudioAudio ProcessingSpeech RecognitionAutomatic Speech RecognitionFiltered Dataset+1

0 views

Speech & Audio

Kazakh Songs ASR: 1,000-10,000 Manually Aligned Audio-Text Pairs

Aggregating between 1,000 and 10,000 manually aligned audio-text pairs from Kazakh commercial songs, released by yeshpanovrustem in 2026. It provides line-level vocal segments designed to investigate the utility of sung speech for low-resource automatic speech recognition (ASR) systems.

ParquetSize Categories1 Kn10 KLicenseotherLibrarypolarsLibrarydaskModalityaudioModalitytextLibrarymlcroissantLibrarydatasetsRegionusTask Categoriesautomatic Speech RecognitionArxiv260300961Languagekk+1

0 views

Speech & Audio

French Asr Quebec Eu: French Speech Recognition Data from Quebec

French Asr Quebec Eu is a speech dataset hosted on HuggingFace by the author ele-sage. The title suggests it contains audio data for automatic speech recognition (ASR) in French, likely with a focus on the Quebec dialect. The dataset was last updated on April 5, 2026.

AudioQuebec DialectFrench LanguageAudio ProcessingSpeech Recognition+1

0 views

Speech & Audio

AI-Terms: 12 Audio Samples for Technical AI Terminology ASR Evaluation

12 audio samples of spoken AI news content comprise this ASR evaluation benchmark created by Trelis in 2026. It provides reference transcriptions and entity annotations specifically for technical AI terminology like model names and benchmarks.

ParquetLibrarypolarsModalityaudioLanguageenSize Categoriesn1 KModalitytextLibrarymlcroissantEvaluationLibrarydatasetsBenchmarkLibrarypandasLicensecc By 40RegionusTask Categoriesautomatic Speech RecognitionTechnical TerminologySpeechSpeech RecognitionEntity Recognition+1

0 views

Speech & Audio

USCG Facilities in Massachusetts, August 2007

Massachusetts contains all United States Coast Guard facilities within its borders as of August 2007. The data were compiled by the Massachusetts Office of Coastal Zone Management and are provided as GIS data. The dataset shows the location of these facilities.

GeospatialMassachusettsCoast GuardFacilities+1

0 views

Speech & Audio

Real-Time Streamflow Conditions for Five Major U.S. Rivers in the Gulf of Maine

Five major U.S. rivers entering the Gulf of Maine are monitored, including the Penobscot, Kennebec, Androscoggin, Saco, and Merrimack. The dataset provides real-time discharge data for the past 7 days and current streamflow conditions in Maine and Massachusetts. It is sourced from the US Geological Survey Water Resources Division via a NASA Earthdata gateway.

Time SeriesGeospatialUs RiversHydrologyGulf Of MaineReal Time DataRiver Discharge+1

0 views

Speech & Audio

Massachusetts Coastal Zone Management Boundary

Massachusetts Coastal Zone polygons represent the official coastal management boundary as defined by state regulation 301 CMR 21.99. The boundary layer was compiled by the Massachusetts Office of Coastal Zone Management in accordance with the federal Coastal Zone Management Act of 1972.

GeospatialCoastal managementGeospatial BoundariesPolicy ZoningMarine Resources+1

0 views

Speech & Audio

Lighthouse Locations Along the Massachusetts Coastline

All extant lighthouses on the coastline of Massachusetts are mapped in this dataset. Locations reflect current positions, which may differ from original sites. The data was compiled by SCIOPS.

GeospatialMassachusettsGeospatial LocationsCoastal InfrastructureLighthouses+1

0 views

Speech & Audio

Digital Geologic Map of Cape Cod and the Islands

A digital geologic map of Cape Cod and the islands, reprojected into the Massachusetts State Plane coordinate system. The data was processed by the Massachusetts Office of Coastal Zone Management in June 2006. The original data source is the SCIOPS organization.

GeospatialGeologyCoastal managementGeographic Information System+1

0 views

Speech & Audio

Aerial Photographs of Aquatic Vegetation from U.S. Coastal Waters

NOAA NCEI Accession 0000411 contains aerial photographs of aquatic vegetation captured from aircraft over Florida Bay, the Indian River in Florida, and the Coast of Massachusetts. The photographs were scanned and geo-referenced for mapping purposes. Data is stored on a DLT tape as a secure backup copy.

ImageGeospatialGeospatial MappingAquatic VegetationAerial PhotographyCoastal Ecology+1

0 views

Speech & Audio

TTS Pretrain 1M: 1 Million Synthetic Audio Samples Across 1000 Speakers

One million synthetic audio samples for text-to-speech applications, generated across 1000 distinct speakers. The collection was created by Aynursusuz, with each speaker contributing 1000 samples derived from 100 texts and 10 voice clones. The dataset was last updated on Hugging Face on March 11, 2026.

AudioMultimodalParquetText To SpeechLibrarypolarsLibrarydaskModalityaudioSize Categories1 Mn10 MSpeech SynthesisSpeaker CloningModalitytextLibrarymlcroissantLibrarydatasetsRegionusSyntheticSynthetic Audio+1

0 views

Speech & Audio

Amharic BDU-Speech: 32,901 Paired Audio and Transcriptions

32,901 paired Amharic speech audio files and transcriptions processed from the BDU-speech dataset by Yohannes A. Ejigu. Updated in March 2026, the collection provides mono audio recordings specifically structured for automatic speech recognition research and model training.

ArrowSize Categories10 Kn100 KModalityaudioModalitytextLibrarymlcroissantArxiv250318485LibrarydatasetsLicensecc By 40RegionusLanguageamTask Categoriesautomatic Speech Recognition+1

0 views

Speech & Audio

Nepali-English Code-Switched Technical Interview Audio and Transcripts

Fewer than 1,000 audio recordings and text transcripts of Nepali-English code-switched speech from technical interviews. Developed by devrahulbanjara and updated in March 2026, it captures software engineering terminology embedded in Nepali conversational grammar.

ParquetLibrarypolarsModalityaudioLanguageenLanguageneTechnicalSize Categoriesn1 KModalitytextCode SwitchingNepali Technical InterviewLibrarymlcroissantLibrarydatasetsLibrarypandasNepali DatasetRegionusTask Categoriesautomatic Speech RecognitionNe En CodeswitchingLicenseapache 20Speech Recognition+1

0 views

Speech & Audio

zh_asr_dataset: Chinese Speech Recognition Audio Data

Chinese speech recognition data published on Kaggle. The dataset likely contains audio recordings and corresponding transcriptions for training and evaluating automatic speech recognition (ASR) systems. Specific details on size, collection method, and contributors are not provided in the available metadata.

AudioAudio DataChinese LanguageSpeech Recognition+1

0 views

PreviousPage 59 of 130Next