Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,912 datasets
XTTSv2 patch 9 is a model checkpoint for a text-to-speech system, published on Kaggle. The dataset's specific content, such as audio samples or model weights, requires verification after download. No information is provided about the author, organization, or the exact data format.
Culex pipiens mosquito populations were monitored with 450-liter water butts and light traps at a Wallingford field site in 2015. Immature life stages (eggs, larvae, pupae) were sampled three times weekly from March to October, while adult counts were taken four times weekly from April to October. The adult data also include counts for Culiseta annulata and Aedes geniculatus.
Music Foundry Vault is a dataset hosted on Kaggle. Its title suggests it contains audio samples or music production assets. The dataset's specific contents, scale, and origin require verification after download.
Kaggle hosts a dataset designed for analyzing emotional perception in piano music. The dataset's creator, size, and specific contents are not detailed in the provided metadata. Its last update date and license information are also unknown.
Professor's Fongbe Speech Dataset is a unified, high-quality collection of Fongbe speech data curated to preserve the linguistic integrity of this tonal language. It acts as a complete, unsegmented, and tone-accurate assembly of the Fongbe Continuous Speech Recognition corpora, merging the foundational ALFFA Project data from 2016 with an expanded Zenodo release from 2022. The dataset was last updated on the Hugging Face platform in February 2026.
A dataset for Environmental Sound Classification. It likely contains audio recordings of various environmental sounds. The dataset is published on Kaggle.
CoRal V3 is an Automatic Speech Recognition dataset designed to capture the diversity of spoken Danish. The dataset, created by the CoRal-project, includes variations across dialects, accents, genders, and age groups. It was last updated on February 24, 2026.
A text-to-speech dataset for Egyptian Arabic, created by AlaaSamir and hosted on Hugging Face. The dataset was last updated on April 2, 2026. Its specific size, format, and content require verification after download.
Musicskills 3.5M is a dataset published on Hugging Face by AndreasXi, with a last update timestamp of 2026-03-31. Its title suggests a collection of data related to musical skills, potentially containing audio recordings or performance metrics. Because the provided metadata is minimal, the dataset's specific content, the 3.5 million item scale implied by its title, and its intended use require verification after download.
This dataset estimates the economic impact of 1,423 independent music venues across 109 U.S. music zones. It provides regional-level estimates of annual economic output and jobs supported, calculated using a venue economic impact calculator. The analysis finds venues contribute approximately $1.4 billion annually and support 11,824 jobs.
Speech Dataset is an audio collection uploaded to Hugging Face by wonderwind271. The dataset was last updated on April 4, 2026. Its specific content, size, and structure require verification after download.
XTTSv2_checkpoint is a dataset published on Kaggle. The title suggests it contains model weights or training data for a text-to-speech system. The dataset's specific content, size, and origin are not detailed in the available metadata.
Kaggle hosts the Irodori-TTS Training Data. The dataset likely contains audio recordings and corresponding text transcripts for training text-to-speech models. Its creator, size, and specific collection date are unknown.
Otoearth released this 141-hour dataset of processed, two-speaker full-duplex conversational English speech in February 2026. It is a curated subset of the otoSpeech-full-duplex-280h collection, refined through human quality reviews and noise reduction techniques.
Librispeech Synth 300h is a synthetic speech audio dataset derived from the LibriSpeech corpus. The title suggests it contains up to 300 hours of generated audio, likely from a maximum of 10 distinct speaker profiles. It is hosted on the Kaggle platform, but detailed metadata about its creation and contents is not provided.
Tricky Tts Orpheus is a dataset authored by Trelis and hosted on Hugging Face. The dataset was last updated on March 31, 2026. Its specific content and scale are not detailed in the available metadata.
Hindi speech data created by sol9x-sagar and published on Hugging Face. The dataset is designed for speaker diarization tasks, which involve identifying and segmenting speech by different speakers. It was last updated on April 1, 2026.
A text-to-speech dataset authored by Trelis and hosted on Hugging Face. The dataset was last updated on March 31, 2026. Its specific content and scale are not detailed in the provided metadata.
A text-to-speech dataset authored by Trelis and hosted on Hugging Face. The dataset was last updated on March 31, 2026. Its specific content and scale are not detailed in the available metadata.
49.1 hours of filtered Russian speech recordings derived from the 1,300-hour GOLOS corpus. The dataset consists of audio segments processed through the BALALAIKA pipeline specifically for generative speech modeling.