Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,925 datasets
A 15 GB variant of the LibriVAD dataset, which is built on the LibriSpeech corpus. The dataset is noise-augmented, which suggests it is designed for training voice activity detection models in noisy acoustic environments. Its author, organization, and specific creation date are unknown.
Quantum Melody Drift GOLD 2M is a benchmark dataset for quantum machine learning and music generation. It contains 2 million records for modeling the drift from qubit states to melodies, fusing concepts from physics and music. The description reports an LSTM model achieving over 92% accuracy on a harmony-related task.
University of Massachusetts Press publications listed on Papers with Code. The dataset likely contains metadata or text related to academic books and journals. The license is listed as closed, and specific details such as size and columns are unknown.
Kaggle hosts a dataset titled 'music-features'. The dataset likely contains extracted audio features from music tracks, which are commonly used for analysis and modeling. Its specific content, scale, and origin are unknown from the provided metadata.
A dataset for detecting hate speech in the Bengali language, sourced from social media platforms. The dataset is hosted on Kaggle and likely contains text samples with classification labels. Its specific size, annotation methodology, and creation date are not detailed in the provided metadata.
A dataset titled 'vietmusicAI' hosted on Kaggle. The dataset likely contains audio data related to Vietnamese music, intended for artificial intelligence or machine learning applications. No further metadata on size, format, or creation details is available.
A dataset for text-to-speech and speech synthesis tasks, likely containing Indonesian language audio. It is published on Kaggle, but details on its creation date, author, and specific size are not provided. The platform tags indicate a focus on audio generation for the Indonesian language.
7second-commonvoice is a dataset hosted on Kaggle, likely derived from the Mozilla Common Voice project. The dataset appears to contain audio data, as suggested by the platform tags 'Audio Data' and 'Speech Recognition'. The exact number of samples, file formats, and specific content are unknown from the provided metadata.
Music-features is a dataset hosted on Kaggle, likely containing quantitative attributes extracted from audio tracks. The specific features, number of samples, and data collection methodology are not detailed in the provided metadata. Its Kaggle platform tags indicate a focus on music and audio analysis.
ASR_evaluation is a dataset hosted on Kaggle, likely containing audio files and associated transcripts for evaluating Automatic Speech Recognition systems. The dataset's specific size, origin, and update history are not detailed in the provided metadata. Its content and structure must be verified after download.
A dataset combining dance motion capture and music audio data, likely containing coordinate information for movement analysis. The dataset is published on Kaggle and includes platform tags for Dance, Motion Capture, and Audio. Specific details on the number of rows, columns, and collection methodology are not provided in the available metadata.
A dataset of 4,000 audio files across four languages, sourced from Kaggle. It likely contains speech recordings for multilingual machine learning applications. Specific details on the languages, recording conditions, and annotation are not provided in the metadata.
VoxCeleb1-Test is a dataset for speaker recognition tasks, published on Kaggle. The dataset likely contains audio samples for testing machine learning models. Its specific content, size, and origin require verification after download.
vkr_tts is a dataset for text-to-speech research, published on Kaggle. The dataset likely contains audio samples and corresponding text transcripts for training speech synthesis models. Specific details on size, format, and origin are not provided in the available metadata.
Kaggle hosts the F5TTS-VI-Model, a pre-trained model for speech synthesis. The model's architecture, training data, and performance metrics are not detailed in the provided metadata. Its release date and original author are currently unknown.
ASR recordings is a dataset hosted on Kaggle that likely contains speech audio intended for training or evaluating automatic speech recognition systems. Details on its size, origin, and collection date are not provided, and its columns and specific content are unknown, requiring verification after download.
A dataset titled 'offline_wheels_asr' is hosted on Kaggle. The dataset's specific content, size, and origin are not detailed in the provided metadata. Its title suggests a focus on automatic speech recognition, potentially for offline or embedded applications.
VoxCeleb1-Tuyen is a dataset hosted on Kaggle. The title suggests it is a variant or derivative of the VoxCeleb1 dataset, which is commonly used for speaker identification and verification tasks. The dataset's specific content, scale, and origin require verification after download.
An audio dataset of Iranian folk music, sourced from Kaggle. The dataset's specific content, size, and collection methodology are not detailed in the provided metadata. Further verification is required to determine the exact number of recordings, their formats, and the recording conditions.