Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,908 datasets
Persian-language audio with corresponding text, likely intended for automatic speech recognition tasks. The dataset contains approximately 2.69 million entries and was published by Reza2kn on Hugging Face. It was last updated on May 12, 2026.
Voice Acting Pipeline Output is a synthetic emotional speech dataset generated by an automated, multi-GPU system. Each sample consists of 6 audio generations from a consistent speaker, scored across 59 perceptual dimensions by Empathic Insight Voice+. The dataset was created by TTS-AGI and was last updated on March 31, 2026.
A Text-to-Speech corpus for the Kashmiri language, derived from the IndicVoices-R and RASA speech datasets. It was created by GAASH-Lab and used to develop the Bolbosh neural TTS system, as documented in a 2026 paper.
28,946 high-quality voice acting samples generated with Gemini 2.5 Pro Preview TTS, organized into 21 voice identities. Each sample is annotated with 59 Empathic Insight Voice Plus emotion/quality scores, BUD-E Whisper audio captions, and word-level timestamps. The dataset was created by laion and was last updated on March 22, 2026.
Education, demographic, and socio-economic indicators for Saint Kitts and Nevis are provided by UNESCO, with the latest update in March 2026. The data aggregates national-level metrics specifically aligned with Sustainable Development Goal 4 (SDG 4) and other policy-relevant frameworks.
This dataset contains 81 audio files of Mexican Spanish: high-quality, real-world speech recordings from native Spanish speakers. The dataset is provided by SilencioNetwork and was last updated on March 30, 2026. It aims to cover the global Spanish-speaking population of over 500 million people across Europe and Latin America.
TTS-German is a high-quality German speech dataset containing 670,509 audio samples totaling 1,250 hours, derived from the CML-TTS German source. The dataset was processed by datadriven-company, with the last update recorded on March 13, 2026. Audio files are standardized to 24kHz mono WAV format, segmented to a maximum of 12 seconds, and include phoneme transcriptions.
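The format constraints described for TTS-German (24 kHz, mono, WAV, at most 12 seconds per segment) can be checked with the Python standard library. The following is a minimal sketch, not the dataset's own tooling: the helper names `make_silent_wav` and `check_wav` are hypothetical, the 16-bit sample width is an assumption, and the silent clip only stands in for a real sample.

```python
import io
import wave

TARGET_RATE = 24_000  # 24 kHz, as stated in the dataset description
MAX_SECONDS = 12      # maximum segment length stated in the description


def make_silent_wav(seconds: float, rate: int = TARGET_RATE, channels: int = 1) -> bytes:
    """Build an in-memory PCM WAV of silence (hypothetical stand-in for a real sample)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples; the bit depth is an assumption
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    return buf.getvalue()


def check_wav(data: bytes) -> dict:
    """Report whether WAV bytes match the described rate, channel count, and length."""
    with wave.open(io.BytesIO(data), "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return {
        "rate_ok": rate == TARGET_RATE,
        "mono": channels == 1,
        "length_ok": duration <= MAX_SECONDS,
        "duration": duration,
    }


if __name__ == "__main__":
    print(check_wav(make_silent_wav(3.0)))
```

For a real file, the bytes could be read from disk and passed to `check_wav`; a file that fails any check may have been resampled or re-segmented after download.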
Hundreds of hours of high-quality Phonk music, including Drift Phonk and Hard Phonk subgenres, have been scraped and pre-processed for machine learning. The dataset was created by Prhokbvf556 and last updated on Hugging Face in April 2026. It is formatted for efficient training on hardware like TPUs and GPUs.
City of Austin records detail sound ordinance permit applications for events like concrete pourings and outdoor music venues. The data includes application status, case numbers, event descriptions, applicants, dates, and locations. Information is sourced from the city's AMANDA database managed by Development Services.
An automated pipeline for collecting Egyptian Arabic text-audio pairs from YouTube videos. The dataset was created by OmarAhmedSobhy and was last updated on April 25, 2026. It uses forced alignment and automatic speech recognition models to process the audio and text.
Cantonese Audio TTS Dataset is a collection for text-to-speech applications, combining alvanlii/cantonese-radio and alvanlii/cantonese-youtube with an additional dataset of equal size. The dataset creator alvanlii applied filtering and audio enhancement techniques, including the removal of overlapped voices and music. It was last updated on April 5, 2026.
A speech database created by Nordic Language Technology for developing automatic speech recognition and dictation systems in Swedish. The dataset has been reorganized from its original version to improve its usefulness, with changes to the file and folder structure. It is hosted by KTH and was last updated on March 26, 2026.
Guangyangmusic derived this dataset from the OpenScore String Quartets corpus for evaluating Optical Music Recognition systems. It contains a subset of string quartets from the 'long 19th century' with both scanned images of real scores and corresponding MusicXML ground truth, plus clean images rendered from the MusicXML. The dataset was last updated on Hugging Face in April 2026.
OpenScore Lieder is a dataset derived from a corpus of 19th-century songs in MuseScore/MusicXML format, created by Gotham & Jonas in 2022. It is built for evaluating Optical Music Recognition systems under piano-only, full-page conditions. The dataset provides both camera page images from source PDFs and clean page images rendered from ground-truth MusicXML.
PortWatch tracks daily port call counts and shipment volume estimates in metric tons for maritime hubs in Saint Kitts and Nevis. This time-series dataset provides high-frequency monitoring of trade activity updated through March 2026.
A 2026 dataset from Rebrowser provides a satellite radio channel lineup and real-time track history across SiriusXM music channels. The full dataset contains 59.9 million records and is updated daily, though this Hugging Face version is a limited sample. It includes two primary entities: channel metadata and play-by-play track logs.
Maine's coastline from Cutts Island to Prouts Neck is covered by ortho-rectified mosaic tiles. The National Oceanic and Atmospheric Administration (NOAA) produced this data from imagery acquired June 5-7, 2011, using an Applanix Digital Sensor System (DSS). The final mosaic is derived from higher-resolution original aerial photographs.
NOAA's Integrated Ocean and Coastal Mapping initiative produced this ortho-rectified image mosaic. The source aerial photographs were captured with an Applanix Digital Sensor System between June and September 2011. The final mosaic covers ports in the Cape Cod region of Massachusetts.
Coastal Maine from Cutts Island to Prouts Neck is covered by ortho-rectified mosaic tiles from the NOAA Integrated Ocean and Coastal Mapping initiative. The source imagery was acquired on June 7, 2011, using an Applanix Digital Sensor System (DSS). The final ortho-rectified product is derived from higher-resolution original images.
NOAA's 2011 ortho-rectified mosaic tiles were created under the Integrated Ocean and Coastal Mapping initiative. The source imagery was acquired from June to September 2011 using an Applanix Digital Sensor System. The original aerial photographs were captured at a higher resolution than the final mosaic product.