DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Speech & Audio Datasets | DataSalon

🎤

Speech & Audio

Speech recognition, text-to-speech, speaker identification, music classification, audio event detection

2,602 datasets

Tricky TTS Piper: English GB Speech Synthesis Samples

A text-to-speech dataset hosted on Hugging Face and authored by Trelis. The dataset was last updated on March 31, 2026. Its specific content and scale are not detailed in the provided metadata.

Tags: Audio, Text To Speech, Speech Synthesis, Voice Cloning, Audio Generation, +1

Tricky TTS Chatterbox: Text-to-Speech Audio Samples

A text-to-speech dataset authored by Trelis and hosted on Hugging Face. The dataset was last updated on March 31, 2026. Its specific content and scale are not detailed in the available metadata.

Tags: Audio, Text To Speech, Speech Synthesis, Audio Generation, +1

Golos Balalaika: Filtered Russian Speech Corpus for Generative Modeling

49.1 hours of filtered Russian speech recordings derived from the 1,300-hour GOLOS corpus. The dataset consists of audio segments processed through the BALALAIKA pipeline specifically for generative speech modeling.

Tags: Parquet, size: 10K<n<100K, task: text-to-speech, library: polars, modality: text, arXiv: 2507.13563, modality: tabular, library: mlcroissant, license: CC BY-SA 4.0, library: datasets, library: pandas, region: us, task: automatic-speech-recognition, language: ru, +1

Biggest Ru Book Balalaika

528.2 hours of filtered Russian speech data across the audiobook genre. The corpus is processed through the BALALAIKA pipeline by the MTUCI lab260 team for generative speech tasks.

Tags: Parquet, task: text-to-speech, library: polars, modality: text, size: 100K<n<1M, arXiv: 2507.13563, modality: tabular, library: mlcroissant, library: datasets, library: pandas, region: us, task: automatic-speech-recognition, language: ru, license: Apache 2.0, +1

Global Radiosonde Observations from 1958 to 1963

Daily upper-air atmospheric observations collected worldwide using radiosonde weather balloons. The dataset covers a five-year period from May 1958 to April 1963. Data collection was managed by the Massachusetts Institute of Technology.

Tags: Tabular, Time Series, Upper Air Observations, Atmospheric Science, Historical Climate, Weather Balloons, +1

European Football Match Results for the 2024/2025 Season

1,941 matches from the 2024/2025 European football season across six major competitions. The dataset, created by Tarek Masryo, includes results, dates, referees, and detailed score breakdowns. It was last updated on Hugging Face in February 2026.

Tags: Tabular, European Sports, Football, Sports Analytics, Match Results, +1

MM 2020/W21: Music Industry Sales Over 40 Years

A dataset concerning music industry sales, likely covering a 40-year period. It was published on Kaggle, but the author, organization, and specific data collection method are unknown. The dataset's exact size, structure, and variables are not detailed in the provided metadata.

Tags: Tabular, Audio, Time Series, Music Industry, Entertainment, +1

Music Lyrics and Audio Features from 1950 to 2019

Lyrics and metadata for songs spanning 70 years from 1950 to 2019. The dataset includes features such as sadness, danceability, loudness, and acousticness. It was published on Mendeley Data in 2020 by Luan Moura, Emanuel Fontelles, Vinicius Sampaio, and Mardônio França.

Tags: Text, Tabular, Audio, Audio Features, Tabular Data, Music Lyrics, Natural Language Processing, Temporal Analysis, +1

2011 NOAA Ortho-rectified Mosaic of Merrimack River and Plum Island Sound, Massachusetts (

NOAA's Integrated Ocean and Coastal Mapping initiative produced this ortho-rectified mosaic from aerial imagery. The source data was captured in a single day on June 19, 2011, using an Applanix Digital Sensor System. The final mosaic tiles are derived from higher-resolution original photographs.

Tags: Image, Audio, Geospatial, 🌎 North America, Aircraft, Massachusetts, Aerial, NOAA, DOC/NOAA/NOS/NGS, National Geodetic Survey, Photograph, Infrared Wavelengths, Mosaic, Coastal Mapping, Cameras, Coastal, National Ocean Service, Earth Science, NGS Imagery, Rectified Image, Infrared Imagery, Continent, Digital Orthophotography, Orthophoto, +1

Aerial Orthophoto Mosaic of Merrimack River and Plum Island Sound, Massachusetts (2011)

Massachusetts coastal imagery from the NOAA Integrated Ocean and Coastal Mapping initiative. The ortho-rectified mosaic was created from aerial photographs captured on June 19, 2011, using an Applanix Digital Sensor System. The original source imagery was acquired at a higher resolution than the final mosaic product.

Tags: Image, Audio, Geospatial, 🌎 North America, Aircraft, Massachusetts, Aerial, NOAA, DOC/NOAA/NOS/NGS, National Geodetic Survey, Orthophoto Mosaic, Photograph, Infrared Wavelengths, Mosaic, Coastal Mapping, Aerial Imagery, Cameras, Coastal, National Ocean Service, Earth Science, NGS Imagery, Rectified Image, Infrared Imagery, Continent, Digital Orthophotography, Orthophoto, +1

2011 NOAA Ortho-rectified Mosaic of Merrimack River and Plum Island Sound, Massachusetts (

A 2011 ortho-rectified mosaic of the Merrimack River and Plum Island Sound in Massachusetts, created by the NOAA Integrated Ocean and Coastal Mapping initiative. The source imagery was acquired on June 19, 2011, using an Applanix Digital Sensor System (DSS). The final mosaic is derived from higher-resolution original images.

Tags: Image, Audio, Geospatial, 🌎 North America, Aircraft, Massachusetts, Aerial, NOAA, Coastal Imagery, DOC/NOAA/NOS/NGS, National Geodetic Survey, Orthophoto Mosaic, Photograph, Infrared Wavelengths, Mosaic, Cameras, Coastal, National Ocean Service, Earth Science, NGS Imagery, Rectified Image, Infrared Imagery, Continent, Digital Orthophotography, Orthophoto, +1

NOAA Orthophoto Mosaic of the Merrimack River and Plum Island Sound, Massachusetts

2011 NOAA Ortho-rectified Mosaic of Merrimack River and Plum Island Sound, Massachusetts (Mean Lower Low Water) is a set of ortho-rectified mosaic tiles produced by the NOAA Integrated Ocean and Coastal Mapping initiative. The source aerial imagery was acquired on June 19, 2011, using an Applanix Digital Sensor System. The final ortho-rectified mosaic is derived from higher-resolution original images.

XTTSv2 Final: Text-to-Speech Model Outputs

XTTSv2 Final is a dataset hosted on Kaggle. The title suggests it contains outputs or training data related to the XTTSv2 text-to-speech model. The dataset's specific content, size, and creator are not detailed in the provided metadata.

Tags: Audio, Text To Speech, Machine Learning, Speech Synthesis, +1

NV-Bench: 1,651 Samples for Nonverbal Vocalization Synthesis Benchmarking

NV-Bench is a benchmark dataset for evaluating nonverbal vocalization synthesis in text-to-speech models, created by AnonyData and last updated on March 1, 2026. It comprises 1,651 samples grounded in a functional taxonomy that treats nonverbal vocalizations as communicative acts. The dataset is hosted on Hugging Face and aims to provide standardized metrics and reliable ground truth references for this expressive TTS subfield.

Tags: Text, Audio, Parquet, size: 1K<n<10K, task: text-to-speech, library: polars, language: zh, modality: audio, language: en, Nonverbal Vocalization, license: CC BY-NC-SA 4.0, Speech Synthesis, modality: text, library: mlcroissant, library: datasets, Benchmark, library: pandas, region: us, Expressive TTS, Audio Synthesis, +1

AudioX-IFcaps: 7 Million Instruction-Following Audio Samples with Timestamps

AudioX-IFcaps contains over 7 million audio samples with instruction-following captions, developed by HKUSTAudio for ICLR 2026. The dataset provides structured annotations for audio and music generation, focusing on sound event categories, counts, and temporal ordering.

Tags: WebDataset, library: webdataset, license: CC BY-NC-ND 4.0, modality: text, size: 100K<n<1M, library: mlcroissant, task: text-to-audio, library: datasets, region: us, arXiv: 2503.10522, +1

1Hit.No Music Images: A Multimodal Dataset from HuggingFace

A multimodal dataset titled '1Hit.No Music Images', published on Hugging Face by author MySafeCode. The dataset was last updated on March 22, 2026. Its specific content and scale are not detailed in the available metadata.

Tags: Image, Audio, Multimodal, ImageFolder, size: 1K<n<10K, library: mlcroissant, modality: image, library: datasets, region: us, No Music, license: MIT, +1

Librispeech Synth 300h: Synthetic Speech Audio from Up to 20 Speakers

Librispeech Synth 300h max 20spks is an audio dataset published on Kaggle. The title suggests it contains up to 300 hours of synthetic speech audio, likely generated from the LibriSpeech corpus, featuring a maximum of 20 distinct speakers. Its specific creation method and exact content require verification after download.

Tags: Audio, Machine Learning, Audio Dataset, Speech Synthesis, Speech Recognition, +1

XTTS v2 Pretrained: Text-to-Speech Model Weights

XTTS v2 pretrained model weights published on Kaggle. The dataset likely contains the necessary files for a text-to-speech synthesis system. Its specific contents, such as model checkpoints and configuration files, require verification after download.

Tags: Audio, Text To Speech, Speech Synthesis, Pretrained Models, +1

IR-MUSIC-LYRICS: Music Lyrics Dataset

IR-MUSIC-LYRICS is a dataset of music lyrics, likely for information retrieval or natural language processing tasks. It is hosted on Kaggle, but its specific size, origin, and update history are not detailed in the available metadata. The dataset's content and structure require verification after download.

Tags: Text, Audio, IR Music, Music Lyrics, Text Corpus, +1

MTG-Jamendo: Music Autotagging Metadata with Genre and Mood Labels

MTG-Jamendo provides metadata, scripts, and baselines for music autotagging research, created by the Music Technology Group (MTG). It serves as a benchmark for audio analysis tasks using tracks sourced from the Jamendo platform under Creative Commons licenses.

Tags: Music Information Retrieval, Autotagging, Deep Learning, +1

Page 65 of 130