Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,912 datasets
528.2 hours of filtered Russian speech data from the audiobook genre. The corpus was processed with the BALALAIKA pipeline by the MTUCI lab260 team and is intended for generative speech tasks.
1,941 matches from the 2024/2025 European football season across six major competitions. The dataset, created by Tarek Masryo, includes results, dates, referees, and detailed score breakdowns. It was last updated on Hugging Face in February 2026.
A dataset concerning music industry sales, likely covering a 40-year period. It was published on Kaggle, but the author, organization, and specific data collection method are unknown. The dataset's exact size, structure, and variables are not detailed in the provided metadata.
Lyrics and metadata for songs spanning 70 years, from 1950 to 2019. The dataset includes features such as sadness, danceability, loudness, and acousticness. It was published on Mendeley Data in 2020 by Moura, Luan; Fontelles, Emanuel; Sampaio, Vinicius; and França, Mardônio.
NOAA's Integrated Ocean and Coastal Mapping initiative produced this ortho-rectified mosaic from aerial imagery. The source data was captured on a single day, June 19, 2011, using an Applanix Digital Sensor System. The final mosaic tiles are derived from higher-resolution original photographs.
Massachusetts coastal imagery from the NOAA Integrated Ocean and Coastal Mapping initiative. The ortho-rectified mosaic was created from aerial photographs captured on June 19, 2011, using an Applanix Digital Sensor System. The original source imagery was acquired at a higher resolution than the final mosaic product.
A 2011 ortho-rectified mosaic of the Merrimack River and Plum Island Sound in Massachusetts, created by the NOAA Integrated Ocean and Coastal Mapping initiative. The source imagery was acquired on June 19, 2011, using an Applanix Digital Sensor System (DSS). The final mosaic is derived from higher-resolution original images.
2011 NOAA Ortho-rectified Mosaic of Merrimack River and Plum Island Sound, Massachusetts (Mean Lower Low Water) is a set of ortho-rectified mosaic tiles produced by the NOAA Integrated Ocean and Coastal Mapping initiative. The source aerial imagery was acquired on June 19, 2011, using an Applanix Digital Sensor System. The final ortho-rectified mosaic is derived from higher-resolution original images.
XTTSv2 Final is a dataset hosted on Kaggle. The title suggests it contains outputs or training data related to the XTTSv2 text-to-speech model. The dataset's specific content, size, and creator are not detailed in the provided metadata.
NV-Bench is a benchmark dataset for evaluating nonverbal vocalization synthesis in text-to-speech models, created by AnonyData and last updated on March 1, 2026. It comprises 1,651 samples grounded in a functional taxonomy that treats nonverbal vocalizations as communicative acts. The dataset is hosted on Hugging Face and aims to provide standardized metrics and reliable ground truth references for this expressive TTS subfield.
AudioX-IFcaps contains over 7 million audio samples with instruction-following captions, developed by HKUSTAudio for ICLR 2026. The dataset provides structured annotations for audio and music generation, focusing on sound event categories, counts, and temporal ordering.
A multimodal dataset titled '1Hit.No Music Images', published on Hugging Face by MySafeCode. The dataset was last updated on March 22, 2026. Its specific content and scale are not detailed in the available metadata.
IR-MUSIC-LYRICS is a dataset of music lyrics, likely for information retrieval or natural language processing tasks. It is hosted on Kaggle, but its specific size, origin, and update history are not detailed in the available metadata. The dataset's content and structure require verification after download.
XTTS v2 pretrained model weights published on Kaggle. The dataset likely contains the necessary files for a text-to-speech synthesis system. Its specific contents, such as model checkpoints and configuration files, require verification after download.
MTG-Jamendo provides metadata, scripts, and baselines for music autotagging research, created by the Music Technology Group (MTG). It serves as a benchmark for audio analysis tasks using tracks sourced from the Jamendo platform under Creative Commons licenses.
Librispeech Synth 300h max 20spks is an audio dataset published on Kaggle. The title suggests it contains up to 300 hours of synthetic speech audio, likely generated from the LibriSpeech corpus, featuring a maximum of 20 distinct speakers. Its specific creation method and exact content require verification after download.
Pittsburgh is a dataset published on Kaggle. The specific content, size, and features are not described in the provided metadata. The actual data requires download and inspection to determine its scope and utility.
Spotify data aggregated for analysis, likely containing metrics on music tracks, artists, and listener engagement. The dataset is hosted on Kaggle, but details on volume, author, and update frequency are not provided. Its stated purpose is to turn raw streaming data into actionable insights for the music industry.
Anyplace But Here is a historical text originally published in 1945 and revised in 1966. The work details the African American search for a home in the North through stories of real individuals, covering themes of hope and disappointment. It includes chapters on figures like Marcus Garvey, Malcolm X, and events in Detroit, Chicago, and Watts.
Coverage of the financial debt owed by France to the United States from 1917 to 1929. The dataset is sourced from Papers with Code and is described in the context of a historical narrative about American expatriates in Paris. The license is closed, and other metadata, such as author and update date, are unknown.