Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,912 datasets
Librispeech Synth 300h is a speech synthesis dataset derived from the LibriSpeech corpus. It likely contains up to 300 hours of synthetic audio generated from a maximum of 20 speaker voices. The dataset is hosted on Kaggle.
XTTS Real Audio Dataset is a collection of audio data published on Kaggle. The dataset likely contains audio samples intended for training or evaluating text-to-speech models. Its specific contents, size, and collection methodology require verification after download.
xtts-vietnamese-dataset is a dataset hosted on Kaggle. Its title suggests it contains data for training or fine-tuning text-to-speech models for the Vietnamese language. The dataset's author, organization, size, and specific contents are not detailed in the provided metadata.
A collection of music tracks from Latin genres including Reggaeton, Salsa, Bachata, and Merengue. The dataset is hosted on Kaggle, but details about its author, size, and creation date are not provided. Its contents likely include track identifiers and metadata for playlist analysis.
TTS_Fluer_LJspeech_Dataset is a Kaggle-hosted collection likely intended for speech synthesis research. Its title suggests it may combine or relate to two speech corpora: "Fluer" (possibly a reference to FLEURS) and LJ Speech, a widely used single-speaker text-to-speech corpus. Its specific content, size, and structure require verification after download.
A multimodal dataset for speech recognition tasks. The description suggests it contains acoustic features relevant to speech-to-text transcription. Its origin, size, and temporal coverage are unknown.
An API wrapper and interface for the OpenGWAS database of genome-wide association study (GWAS) results. The interface was created by Gibran Hemani and provides convenience functions for specific queries.
A dataset likely containing audio samples and corresponding text transcripts for text-to-speech tasks. It is hosted on Kaggle, but its specific size, origin, and creation date are unknown. The author and organization details are not provided.
TTS-LJSPEECH-ANIMAN is a dataset hosted on Kaggle. Its title suggests a connection to text-to-speech synthesis, potentially using or extending the LJ Speech corpus. The dataset's specific content, size, and origin are not detailed in the available metadata.
An audio dataset focused on women's safety and distress signals, published on Kaggle. The dataset's specific content, such as the number of clips or recording conditions, is not detailed in the available metadata. Its primary purpose is likely for developing or testing audio-based safety and alert systems.
RidheshBhati's collection merges text-to-speech data for 13 Indic languages, totaling between 100,000 and 1,000,000 records as of March 2026. Every audio clip in the set is filtered to ensure a minimum duration of 3.0 seconds.
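The minimum-duration rule described above can be illustrated with a minimal sketch. It assumes each clip is represented as a record with a `duration_s` field and that the 3.0-second threshold is inclusive; the record layout, field names, and file paths are hypothetical and are not taken from the dataset itself.

```python
MIN_DURATION_S = 3.0  # threshold stated in the dataset description


def filter_by_min_duration(clips, min_duration_s=MIN_DURATION_S):
    """Keep only clips whose duration is at least min_duration_s seconds."""
    return [clip for clip in clips if clip["duration_s"] >= min_duration_s]


# Hypothetical sample records; the real dataset's schema may differ.
sample_clips = [
    {"path": "hi/clip_001.wav", "duration_s": 2.4},
    {"path": "ta/clip_002.wav", "duration_s": 3.0},
    {"path": "bn/clip_003.wav", "duration_s": 7.8},
]

kept = filter_by_min_duration(sample_clips)
```

Running the same check against the downloaded files is one way to confirm that the stated filter actually holds before training on the data.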
A dataset concerning music genres, likely containing labels or features for audio classification tasks. It was published on Kaggle, but its specific contents, size, and creation details are not provided in the metadata. The last update date and author are unknown.
A collection of audio files tagged with emotional labels across multiple music genres. The dataset is hosted on Kaggle, but its size, specific creation date, and original author are not detailed in the provided metadata. Columns and exact data formats are unknown.
A Cantonese audio dataset featuring storyteller Zhang Yuekai narrating four classic literary works, including 'Romance of the Three Kingdoms' and 'Water Margin'. It is designed for TTS and ASR model training, as well as linguistic and literary research. The dataset contains audio files and corresponding standardized text transcripts.
Sound velocity profiles were collected in Northern Massachusetts Bay during hydrographic surveys from August 2024 to March 2025. Data were gathered from multiple vessels including MV Northstar Challenger, RV North Cove, RV South Cove, RV Twister, and RV West Cove II. Profiles were recorded at intervals of approximately 2 hours for sound speed profilers and 15 minutes for moving vessel profilers.
A protocol for audio classification tasks, published on Kaggle. The dataset's specific content, size, and features are not detailed in the provided metadata. Its author, organization, and last update date are unknown.
GigXchange provides annual income data for working musicians in the United Kingdom for the year 2026. The dataset likely contains breakdowns by musical genre, geographic region, and primary performance venue type. It was sourced from the Kaggle platform, but the original author and specific collection methodology are unknown.
2,381 observations of live-music booking rates in the United Kingdom from April 2026. The data covers bookings for weddings, pubs, and corporate events. The original source is the GX Index, and it was published on Kaggle.
A parallel speech dataset designed for speech translation tasks. It appears to contain aligned audio data in multiple languages. Its source, size, and specific creation details are not provided in the available metadata.
A dataset titled 'TTS_PLShrimp' published on Kaggle. The name suggests it contains data for text-to-speech synthesis tasks. Metadata is minimal; the specific content, size, and origin require verification after download.