Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,912 datasets
A curated Russian speech dataset for advanced speech generative tasks. The corpus was filtered and annotated by the lab260 team at MTUCI using the BALALAIKA pipeline. It includes genres such as podcasts, public speech, YouTube content, audiobooks, phone calls, and TTS.
A collection of Slovak speech recordings from a female voice, suitable for text-to-speech (TTS) model training. It provides Slovak-language transcripts and audio at a 48,000 Hz sample rate.
Sentinel-2 ACOLITE-DSF Aquatic Reflectance for the Conterminous United States is a dataset of unitless water-leaving radiance reflectance values. The data was produced by the United States Geological Survey using the ACOLITE software's dark spectrum fitting algorithm for atmospheric correction. The dataset covers the conterminous United States.
Somali Asr Subset 68H is a speech dataset published on the Hugging Face platform by DDD-Kenya. The dataset's title suggests it contains audio data for the Somali language, likely intended for automatic speech recognition tasks. The record was last updated on March 19, 2026, but detailed metadata about its size, format, and contents is unavailable.
A speech dataset containing recordings of Nigerian-accented English and Nigerian Pidgin, intended for research and development in speech technology. The dataset includes audio files paired with transcriptions. It was authored by AlaminI and last updated on February 8, 2026.
VocalSet is a singing voice dataset containing 10.1 hours of monophonic audio recordings. It features 20 professional singers (9 male, 11 female) performing standard and extended vocal techniques on five vowels. The dataset was created by Julia Wilkins to support singing voice research.
A book analysis of military and civilian influence on decisions regarding the use of force in U.S. foreign affairs. The work examines twenty intervention decisions and ten escalation decisions during crises, including cases in Korea, Berlin, Cuba, and Vietnam. An updated edition includes a preface and epilogue discussing recent cases and declassified information.
This dataset examines changes in accompanist music for Wayang Kulit (leather puppet) performances in Java and Bali, Indonesia, from a performing art management perspective. It was authored by Setyabudhi R. Situmorang and last updated in February 2026.
This dataset examines changes in accompanist music for Wayang Kulit (leather puppet) performances from a performing art management perspective. The research focuses on innovations driven by a need to engage younger audiences in Java and Bali, Indonesia. Specific data dimensions such as row count, column count, and sample data are not provided.
Isolated guitar chord recordings are designed for audio classification tasks like chord recognition and real-time music analysis. The dataset was recorded on a Fender FA-15 3/4 acoustic guitar in realistic acoustic conditions, including minor background sounds, to improve inference robustness. The author is rodriler, with a last recorded update in February 2026.
The 'Emotion Tts' dataset was published on Hugging Face by skit-ai and last updated on 2026-03-25. It likely contains audio samples and associated metadata for text-to-speech synthesis. Its specific content, scale, and structure require verification after download.
A speech dataset designed for deepfake audio detection, containing both real and fake audio samples. The dataset was sourced from Kaggle, but the author, organization, and specific collection details are unknown. The total size, number of samples, and last update date are not provided.
22 Indian language speech subsets provided in Parquet format for the Hugging Face ecosystem. The collection includes language-specific configurations for modular access to audio data and transcriptions sourced from the AI4Bharat Nirantar project.
2502_dataset_TTS is a Kaggle-hosted collection likely containing audio data for text-to-speech applications. The dataset's specific content, size, and origin are unconfirmed due to minimal metadata. Its title suggests it may include speech samples or synthesis parameters for machine learning model training.
A dataset likely focused on the recognition of musical notes from audio signals. The title suggests it includes a cross-validation scheme, which may indicate a structured setup for model evaluation. It is published on Kaggle, but details on its size, origin, and specific content are unavailable.
A dataset titled 'hinglish-tts-src' is hosted on Kaggle. The title suggests it contains source materials for text-to-speech synthesis in Hinglish, a code-mixed language of Hindi and English. The dataset's specific content, size, and creation details are unknown from the provided metadata.
Kaggle hosts the RTTS_C0C0 dataset. The title suggests it may relate to a specific project or codename. Its content and structure require verification after download.
A 235-mile polyline feature depicting the New England National Scenic Trail from the Long Island Sound in Guilford, Connecticut, to the Massachusetts/New Hampshire border. The dataset was created by combining work from the Connecticut Forest & Park Association and the Appalachian Mountain Club. It was last updated on March 4, 2026.
Kikuyu-language audio data, preprocessed for automatic speech recognition tasks. The dataset was published on Hugging Face by InterstellarCG and was last updated on March 16, 2026. The specific content, scale, and preprocessing methods require verification after download.
Dysarthric speech data published on Kaggle. The dataset likely contains audio recordings of speech affected by motor speech disorders. Specific details on size, collection method, and origin are not provided in the available metadata.