DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Speech & Audio

My XTTS Finetuned: Fine-Tuned Text-to-Speech Model

A dataset for a fine-tuned text-to-speech model, published on Kaggle. It likely contains audio samples and model checkpoints for a customized XTTS (eXtended Text-to-Speech) system. Its size, creation date, and author are unknown.

Audio · Text To Speech · Speech Synthesis · Fine Tuning · +1

Speech & Audio

High-Fidelity Conversational Speech in Four Asian Languages

This dataset provides high-quality conversational audio for Automatic Speech Recognition (ASR) in Vietnamese, Korean, Arabic, and Filipino. It pairs audio with transcripts of natural, unscripted speech and covers both single-speaker and dual-speaker interactions. Audio is sampled at 16 kHz to 24 kHz with a 16-bit depth.

OPTIMIZED-PARQUET · Parquet · Library: polars · Language: ar · Library: dask · Size Categories: n<1K · Modality: text · Multi Speaker · Library: mlcroissant · Task Categories: audio classification · Library: datasets · License: CC BY 4.0 · Region: us · Single Speaker · Task Categories: automatic speech recognition · Natural Speech · Speech Recognition · Language: vi · +1

Speech & Audio

AI-Generated Music Collection from Five Platforms, Over 865,000 Songs

This dataset contains approximately 865,000 AI-generated songs collected from five platforms: Mureka, Riffusion, Sonauto, Suno, and Udio. It includes the original audio files along with full platform metadata stored as JSON sidecar files. It was created by 'ai-music' and last updated on February 25, 2026.

Audio · Multimodal · WEBDATASET · Language: zh · Language: en · Library: webdataset · Multimodal Metadata · AI Generated Music · Modality: text · Size Categories: 100K<n<1M · Library: mlcroissant · Task Categories: audio classification · Task Categories: text to audio · Library: datasets · Generative Music · AI Music · Language: ko · Region: us · Large Scale · Language: ja · Audio Generation · License: MIT · Synthetic · Synthetic Audio · +1

Speech & Audio

Aviation Accident Investigation Report for 2019 Quebec Crash

This 16 June 2019 report details the loss of control and collision with the ground of an amateur-built Pitts S2E aircraft, registration C-GONV, in Saint-Jean-Port-Joli, Quebec. The investigation was conducted, and the report published, by the Transportation Safety Board of Canada. The dataset is a single HTML document containing the official safety investigation narrative.

Speech & Audio

VoxCeleb2 Dev: Speaker Identification and Audio Retrieval Training Set

VoxCeleb2 Dev is the training subset of the VoxCeleb2 dataset, used for speaker identification and audio retrieval tasks. It is an expanded version of VoxCeleb1, containing more speakers and audio samples, and includes standardized audio files with corresponding metadata. The dataset was uploaded by 'humanify' to Hugging Face and was last updated on 2026-03-05.

Tabular · Audio · Audio Dataset · Speaker Identification · Speech Processing · Audio Retrieval · +1

Speech & Audio

DAM: Clinical Mental Health Labels and Model Scores for 35,000 Users

This dataset offers clinical mental health labels and audio-based model scores for 35,000 individuals, covering 863 hours of speech. Created by KintsugiHealth in 2026, it also includes demographic metadata for the validation and test sets used in model development.

Parquet · Size Categories: 10K<n<100K · Library: polars · Modality: text · Modality: tabular · Library: mlcroissant · Library: datasets · Library: pandas · Region: us · License: Apache 2.0 · +1

Speech & Audio

Eastern Massachusetts National Wildlife Refuge Forest Inventory Data

Forest inventory outputs from the Eastern Massachusetts National Wildlife Refuge Complex likely contain measurements of tree cavities, canopy structure, and biomass. The data is managed by the Department of the Interior and was last updated in March 2026. It provides detailed metrics for forest communities and ecosystems.

Tabular · ZIP · Tree · Forest Inventory · Tree Cavity · Forbs · General Biology Communities And Ecosystems Forest · Canopy · General Management Natural Resources Management Fo · Herbaceous · Wildlife Refuge · Vegetation Structure · Regeneration · General Management Inventory · Crown Condition · Biomass · Forest · Grass · Deer Browse · Forest Management · Crown Structure · +1

Speech & Audio

Asmr-Zh-R18: Chinese R18 ASMR Audio for TTS and Voice Cloning

asmr-zh-r18 is a collection of Chinese R18 ASMR audio works for fine-tuning text-to-speech and voice cloning models. The dataset contains 5,876 works, sourced from the asmr.one API, with a total raw size of approximately 1.1 terabytes. It was uploaded by jasonfan and last updated on March 20, 2026.

Audio · Chinese · Text To Speech · Voice Cloning · ASMR · +1

Speech & Audio

Jazzmus: Jazz Lead Sheets with Expert OMR Annotations

Jazzmus provides approximately 1,000 expert-annotated jazz lead sheets for Optical Music Recognition (OMR), developed by the PRAIG research group in 2025. The dataset includes high-resolution images paired with system-level bounding boxes and musical encodings for end-to-end transcription tasks.

Text · Audio · Parquet · System Level · Library: polars · Task Categories: image to text · Annotations Creators: manually expert generated · Size Categories: n<1K · Modality: text · Task Categories: text retrieval · Library: mlcroissant · Modality: image · Library: datasets · Library: pandas · arXiv: 2509.05329 · Task Categories: image segmentation · License: CC BY-NC 4.0 · Full Page · Region: us · +1

Speech & Audio

ViMedCSS: Vietnamese Medical Speech with English Code-Switching Terms

ViMedCSS is a Vietnamese medical speech dataset designed for code-switching automatic speech recognition. It contains 11,832 training utterances totaling 24.30 hours, with each utterance embedding at least one non-Vietnamese medical term, primarily English. The dataset was created by tensorxt and is associated with the LREC 2026 conference.

Audio · Audio Dataset · Code Switching · Healthcare · Medical Terminology · Vietnamese Language · Speech Recognition · +1

Speech & Audio

Global Upper Air Meteorological Observations from 1958-1963

Historical upper air meteorological data collected globally over a 5-year period from 1958 to 1963. This small dataset was created by the Massachusetts Institute of Technology (MIT) and archived at the National Climatic Data Center. It was later incorporated into the larger, quality-controlled Comprehensive Aerological Data Set (CARDS).

Tabular · Time Series · Historical Climate · Meteorology · Global Weather · Upper Air · +1

Speech & Audio

Tts Female 70H: A Text-to-Speech Voice Model

Tts Female 70H is a text-to-speech voice model published on Hugging Face by vfdanil and last updated on April 24, 2026. Its content and scale are not described in the available metadata.

Audio · Text To Speech · Speech Synthesis · Female Voice · Audio Generation · +1

Speech & Audio

Open Source Musical Instruments In SFZ Format

A collection of open source musical instruments using the SFZ format, sourced from the sfzinstruments website. The dataset was created by 'projectlosangeles' and was last updated in March 2026.

Audio · SFZ-Instruments · SFZ Format · Task Categories: audio to audio · License: CC BY-NC-SA 4.0 · Music Instruments · Size Categories: n<1K · MIDI · SFZ · Instruments · Region: us · MIDI-Instruments · Audio Synthesis · +1

Speech & Audio

Ghana English ASR: 2,700 Hours of Transcribed News Broadcasts

The Ghana NLP Community released this 2,700-hour collection of transcribed Ghanaian English speech in March 2026. Sourced from news media broadcasts, it contains between 100,000 and 1,000,000 audio segments and is intended for modeling West African English accents.

Audio · OPTIMIZED-PARQUET · Parquet · Library: polars · Library: dask · Modality: audio · Language: en · Modality: text · Size Categories: 100K<n<1M · Library: mlcroissant · Library: datasets · Ghanaian English · Region: us · Task Categories: automatic speech recognition · Speech Recognition · West African English · +1

Speech & Audio

Electronic Music Audio Features from Beatport Top 100 Songs, November 2018

This dataset covers the November 2018 Top 100 songs for each of over 20 electronic music subgenres on Beatport. It contains audio features extracted from a two-minute sample of each song using the pyAudioAnalysis library, and it was used in a publication on automatic subgenre classification in electronic dance music.

Tabular · Audio · Music Genre Classification · Music Analysis · Audio Features · Electronic Music · Beatport · +1

Speech & Audio

Historical Ocean Temperature Profiles from Bathythermographs, 1947-1983

This dataset contains 7,537 bathythermograph observations of water depth and temperature collected from over 45 different ships. The data was submitted by Ted Dalzell of the Hydro Department in Birkenhead, UK, and spans August 1947 to October 1983. It is available online through NOAA NCEI in C116 and C128 file formats.

Tabular · Time Series · Oceanography · Historical Data · Bathythermograph · +1

Speech & Audio

Water Temperature and Depth from HMAS Cook, 1989

Water depth and temperature data collected from February 7, 1989 to December 14, 1989 as part of the Global Temperature-Salinity Pilot Project (GTSPP). The data was gathered by the Australian Oceanographic Data Center using expendable bathythermograph (XBT) instruments aboard HMAS Cook and submitted to NOAA NCEI.

Tabular · Time Series · 🇦🇺 Australia · Oceanography · Physical Oceanography · Bathythermograph · +1

Speech & Audio

Ocean Water Temperature and Depth from HMAS Cook, November-December 1988

From November 28, 1988 to December 10, 1988, water depth and temperature data were collected as part of the Global Temperature-Salinity Pilot Project (GTSPP). The data comes from expendable bathythermograph (XBT) casts taken from the vessel HMAS Cook and was submitted by the Australian Oceanographic Data Center. It is archived by NOAA's National Centers for Environmental Information under accession 9500115.

Tabular · Time Series · 🇦🇺 Australia · Oceanography · Bathythermograph · +1

Speech & Audio

Water Temperature and Depth from HMAS Cook, 1987

Expendable bathythermograph (XBT) data on water depth and temperature collected by the Australian Oceanographic Data Center. The data was gathered from HMAS Cook as part of the Global Temperature-Salinity Pilot Project (GTSPP) and covers January 20, 1987, to November 9, 1987.

Tabular · Time Series · 🇦🇺 Australia · Oceanography · Bathythermograph · +1

Speech & Audio

Massachusetts Bay Hydrophysical and Hydrochemical CTD Data, 1990-1991

Hydrophysical and hydrochemical data were collected from CTD casts in Massachusetts Bay and adjacent waters from April 1990 to June 1991. The dataset includes measurements of water depth, temperature, salinity, chlorophyll a concentration, percent light transmission, and beam attenuation. Data were gathered from the R/V Asterias and other platforms as part of the Massachusetts Bays Program.

Tabular · Time Series · Oceanography · Massachusetts Bay · CTD Casts · Hydrochemistry · +1
