Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,925 datasets
Urdu TTS Corpus Processed - 5 is a dataset for text-to-speech applications, published on Kaggle. The title suggests it contains processed audio and corresponding text data for the Urdu language. The specific content, scale, and creation details require verification after download.
VoxCeleb is an audio dataset hosted on Hugging Face. The dataset was uploaded by author N02N9 and was last updated on 2026-02-24. Its specific content, scale, and collection method are not detailed in the provided metadata.
The Journal of the Musical Arts in Africa is an academic journal. This entry likely contains scholarly articles and research papers on music and related arts in an African context. The dataset is aggregated from the paperswithcode platform.
modern-tts-dataset is a dataset for text-to-speech (TTS) research, published on Kaggle. The dataset likely contains audio recordings paired with corresponding text transcripts. Specific details on size, source, and creation date are not provided in the available metadata.
LibriSpeech-subset is a dataset of audio speech recordings, likely derived from the LibriSpeech corpus. The dataset is hosted on Kaggle, but its specific size, content details, and creation date are not provided in the metadata. The original LibriSpeech corpus is a widely used benchmark for automatic speech recognition research.
Wolfram Research, Inc. provides a text dataset based on Friedrich Nietzsche's philosophical work 'The Birth of Tragedy Out of the Spirit of Music'. The dataset's description references sections like 'attempt at self-criticism' and 'preface to Richard Wagner'. The exact size, format, and update schedule are not specified.
VoxCeleb2 contains over 1 million audio-visual utterances from 6,112 celebrities, extracted from YouTube videos. This large-scale speaker identification dataset includes MP4 video files and associated metadata for training and development. It was updated in early 2026 by user Oldi451.
A dataset hosted on Hugging Face by author wangxinnan, last updated on 2026-02-24. The title suggests it contains data for text-to-speech synthesis, likely pairing text prompts with audio outputs. The specific content, scale, and intended use require verification after download.
This Spectral Forensics analysis compares the Lyria 3 and Suno v4.5 AI music generation models. The dataset contains spectral features extracted from the models' audio outputs for benchmarking purposes. The author and creation date are unknown.
IndicTTS_14 is a dataset for text-to-speech synthesis, published on Kaggle. The dataset likely contains audio samples and corresponding text transcripts for one or more Indic languages. Its specific size, creation date, and author are not detailed in the provided metadata.
A dataset titled 'datamusicRS' hosted on Kaggle. The name suggests it contains data related to music and recommendation systems, likely involving user interactions or song attributes. No further descriptive metadata, sample data, or authorship details are available in the listing.
A dataset titled 'date_music' hosted on Kaggle. The title suggests it may contain audio or music data associated with temporal information. Its specific content, size, and origin are not detailed in the provided metadata.
A dataset named 'Sargam_Music_Dataset' is available on Kaggle. The dataset's specific content, size, and origin are not detailed in the provided metadata. Users must download the dataset to verify its actual composition and suitability for their tasks.
INSWXTTS is a speech and audio dataset published on Kaggle. The dataset's specific content, size, and origin are not detailed in the provided metadata. Further verification is required to confirm its exact scope and structure.
An audio dataset hosted on Kaggle, likely containing recordings for classifying environmental sounds related to fire and forest events. The dataset's specific size, source, and creation details are not provided in the available metadata. Its content and structure require verification after download.
Audio recordings likely intended for classifying sounds related to fire events and forest environments. The dataset is hosted on Kaggle, but details on its size, origin, and specific content are not provided in the metadata. Further verification is required to confirm the exact number of files, recording conditions, and annotation quality.
Cinematic-music-src is an audio dataset hosted on Kaggle. The dataset's specific size, creator, and update date are not provided in the available metadata.
Kaggle hosts this audio dataset focused on domestic environments. The dataset's specific content, size, and collection details are not provided in the metadata. Users must download the data to verify its scope and suitability for their projects.
Kaggle hosts the VoxCeleb1-Training dataset, a collection of audio clips likely used for speaker identification tasks. The dataset appears to contain speech samples from celebrities, as suggested by its name and platform tags. Specific details on size, format, and collection methodology are not provided in the available metadata.
A dataset titled 'als-music-model' is hosted on Kaggle. The platform tags suggest it contains audio data and is intended for machine learning applications. The author, organization, and specific data characteristics are unknown.