Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,926 datasets
A speech corpus for the Hindi language, published on Kaggle. The dataset likely contains audio recordings and associated transcriptions. The author, organization, and specific collection details are unknown.
Call Center Audio Dataset is a collection of audio recordings from real customer service calls, published on Kaggle. The dataset likely contains audio files intended for speech and audio processing tasks. Specific details on the number of recordings, file formats, and collection methodology are not provided in the available metadata.
TTS Training Data likely contains audio files and corresponding text transcripts for training text-to-speech systems. The dataset is hosted on Kaggle, but its specific size, source, and creation date are unknown. The content is inferred to be relevant for speech synthesis research and development.
Dataset_TTS_LT is a dataset for text-to-speech (TTS) tasks, likely containing Lithuanian language audio and corresponding text transcripts. It is hosted on the Kaggle platform, but detailed metadata such as author, size, and creation date are not provided. The dataset's specific content and structure require verification after download.
XTTS-Checkpoint is a dataset published on Kaggle. The title and platform tags suggest it contains a pre-trained model checkpoint for a text-to-speech system. Specific details about its size, author, and creation date are unknown.
Massachusetts Roads is a dataset published on Kaggle. The raw description is in a non-English language; the title suggests it may contain geospatial information about road networks in the state of Massachusetts. The dataset's specific content, size, and origin require verification after download.
A single-speaker Tamil text-to-speech model named 'ta_IN-Valluvar-medium' with a 22050 Hz sample rate, created by Jeyaram-K. It is a custom model designed for the Piper TTS system and was last updated in December 2025.
A speech dataset of 556,667 audio files totaling 1,024.71 hours, with an average clip length of 6.63 seconds. The dataset includes a breakdown of clips by speaker, with the top contributor, 'Despina', accounting for 60,150 clips or 11.5% of the total duration. It was uploaded by 'setfunctionenvironment' to Hugging Face and last updated on July 18, 2025.
A dataset for Music Genre Classification tasks. The dataset is tagged for Audio Classification, Music Information Retrieval, and Music Genre analysis. Specific details on the number of audio samples, features, or temporal coverage are not provided.
Replication data from an experimental design study on saxophone mouthpieces, authored by Robert Kunstadt and hosted by Harvard Dataverse. The dataset includes tags related to mouthpiece components such as Ligature and Mouthpiece Case, indicating a focus on physical design parameters.
Kaggle hosts a dataset exploring the relationship between music and animal emotions. The description suggests it is intended for analysis and visualization. The author, organization, and specific data collection details are not provided.
A dataset of music artist records that includes popularity metrics and genre classifications; the number of records is not specified. The dataset is hosted on Kaggle and is tagged for data storytelling and analytics. The author, organization, and last update date are not specified.
Spotify Tracks DB is a music database containing 232,000 tracks. The description indicates that key, mode, and time signature audio features have been cleaned. The dataset was sourced from Kaggle, but the author, organization, and last update date are unknown.
An audio dataset containing speech samples from multiple Indian languages, sourced from Kaggle. The dataset is tagged for use in Artificial Intelligence and audio processing tasks. Specific details on the number of recordings, contributors, and collection date are not provided.
A curated dataset of voice samples designed for Text-to-Speech voice cloning applications. The dataset includes high-quality audio clips and corresponding metadata, created by sdialog and last updated on December 5, 2025.
Long-term oceanographic observations from two sites in western Massachusetts Bay, collected from 1989 to 2006. The dataset includes over 160 separate mooring deployments across about 90 research cruises, measuring parameters like current, temperature, and salinity. The U.S. Geological Survey conducted this study in cooperation with the Massachusetts Water Resources Authority and the U.S. Coast Guard.
Long-term oceanographic observations from two sites in western Massachusetts Bay, LT-A and LT-B, collected by the U.S. Geological Survey. The dataset spans 16 years from December 1989 to February 2006 and includes over 160 mooring deployments across about 90 research cruises. Measurements include current, temperature, salinity, light transmission, pressure, oxygen, fluorescence, and sediment-trapping rate.
Anime_tts_2 is a dataset for text-to-speech synthesis featuring anime character voices. The dataset's exact size, creator, and creation date are unspecified. It is hosted on the Kaggle platform.
A study dataset contains stimuli, raw data, and statistical analyses for investigating how music training modulates the temporal dynamics of musical tension. The research examines responses to both tonal and atonal music. It was authored by Jiaqi Xu and is hosted by Harvard Dataverse.
YODAS-Granary is a curated subset of the NVIDIA Granary dataset, providing high-quality pseudo-labeled speech data. It is designed for Automatic Speech Recognition and Automatic Speech Translation tasks across 23 European languages. The dataset was shared by ESPnet and last updated on August 8, 2025.