Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,908 datasets
Structured metadata for Greek laïko music tracks, intended for research and machine learning. The dataset includes fields for emotion, era, and genre but does not contain audio files. It was created by author christosfouk and was last updated on 2026-04-16.
Librispeech-PC 44kHz Opus replaces the original Librispeech PC audio with higher-quality source material encoded as Opus at 64 kbps. Sampling rates are increased from 16kHz up to 48kHz, depending on the source. The dataset was created by mythicinfinity and last updated on March 28, 2026.
A SQLite database contains user votes and feedback from TTS Arena, a platform for comparing text-to-speech models. The dataset was created by Pendrokar and last updated in April 2026. It is designed to help developers identify model faults through community evaluation.
A 2016 geospatial dataset from NOAA characterizing macroalgae beds for oil spill sensitivity planning in Massachusetts and Rhode Island. Vector points represent vegetation beds, with associated tables containing species-specific abundance, seasonality, and life history information. The data is part of a larger Environmental Sensitivity Index (ESI) effort to map coastal resources.
A National Oceanic and Atmospheric Administration (NOAA) dataset for Massachusetts and Rhode Island covering sensitive biological resources for benthic species. Vector polygons represent submerged aquatic vegetation and macroalgae, with associated tables for species abundance, seasonality, and life history. This data is part of the Environmental Sensitivity Index (ESI), which characterizes coastal environments by their sensitivity to oil spills.
A speech recognition dataset sourced from YouTube, likely containing audio and corresponding transcriptions. It was published by user 'veziriii' on Hugging Face and was last updated on May 23, 2026. The specific content and scale require verification after download.
SMAPVEX19-22 field campaign collected daily mosaicked UAVSAR images at three polarization configurations from April to July 2022 near Petersham, Massachusetts. The terrain-flattened gamma-corrected radar data targets forested land cover to validate satellite-derived soil moisture estimates. This dataset supports the Soil Moisture Active Passive Validation Experiment's goal of improving remote sensing accuracy in vegetated areas.
University of Rhode Island researchers collected 450- to 760-meter temperature-depth profiles using expendable bathythermographs from two ships during the Frontal Air-Sea Interaction Experiment. Data points were recorded at non-uniform 'inflection points' to accurately define the temperature curve, rather than at fixed depth intervals. This dataset supports the study of ocean fronts and air-sea interactions in the Northwest Atlantic from 1984 to 1986.
68,677 synthetic speech clips across 9 languages, generated using the Qwen3-TTS-12Hz-1.7B-Base model with zero-shot voice cloning from 5 reference speakers. The dataset was submitted to the Uncharted Data Challenge hosted by Adaption Labs and is authored by Reubencf. It was last updated on 2026-04-15.
North Atlantic Ocean data from the ATLANTIS II research vessel cruise 31AN19810612, collected between June 12 and July 8, 1981. The dataset contains discrete sample and profile measurements of dissolved oxygen, nitrate, nitrite, phosphate, silicate, salinity, and water temperature, gathered using CTD and bottle instruments. It is part of the GLODAPv2 compilation, contributed by Carl Wunsch of the Massachusetts Institute of Technology.
Majestrino Unified Detailed Captions is a filtered subset of the laion/majestrino-data collection, containing all samples with a unified_detailed_caption field. The dataset comprises 4,658,407 samples, packaged in approximately 932 tar files totaling around 1,017 GB. It was created by TTS-AGI and last updated on March 29, 2026.
Odia Indextts2 Processed is a dataset uploaded to HuggingFace by author Akira2049. The title suggests it contains processed data for text-to-speech (TTS) tasks in the Odia language, an Indian language spoken primarily in Odisha. The dataset was last updated on 2026-05-27, but specific details on size, format, and content are not provided in the metadata.
A personal cloud storage repository for synchronizing a local music player. The dataset, created by ZHIWEI666, likely contains music files, cover art, lyrics, and user metadata. It was last updated on May 1, III.
Copernicus provides 10-day composite GEOTIFF files measuring Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) anomalies for Saint Kitts and Nevis. These biophysical measurements, derived from the Visible Infrared Imaging Radiometer Suite (VIIRS), track vegetation health and agricultural drought impacts. The records are updated through March 2026.
Teacher demographics and text metadata for Music Teachers on Television. The dataset was authored by Hugh Gundlach and last updated on April 27, 2026. It is hosted on figshare under a CC-BY-4.0 license.
A synthetic medical speech dataset contains 101,475 audio-text pairs totaling 184.1 hours of 16 kHz mono speech. It was generated by IntelMedica using the Kokoro-82M TTS system with 19 voices across three English accent groups, focusing on clinical and nursing terminology. The dataset version was noted in April 2026.
A monolingual Hindi text-to-speech dataset containing 6,926 utterances from a single female speaker. The audio data is embedded in parquet files at a 48kHz sampling rate and was extracted from the IndicTTS project by SPRINGLab at IIT Madras. The dataset was uploaded to Hugging Face by the user 'somu9'.
The Cebuano Speech Dataset provides 108 hours of audio data across 807 files in MP3 and WAV formats. It was created by Speech-data and includes balanced voice data with 49% female and 51% male speakers aged 18 to 50+ years.
Food Prices for Saint Kitts and Nevis from the FAOSTAT bulk data service. The dataset covers categories including Consumer Price Indices, Deflators, Exchange rates, and Producer Prices. It is published by the Food and Agriculture Organization (FAO) of the United Nations and was last updated on 2026-03-16.
120 Korean speech sentences were generated using the Google Gemini gemini-2.5-pro-preview-tts model with the Zephyr voice. The dataset includes categories for pronunciation, prosody, emotion, and intonation. Audio files are in 24kHz, 16-bit, mono WAV format.
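Several entries above state an audio format (for example 24kHz, 16-bit, mono WAV) that is worth confirming after download. The following is a minimal sketch of such a check using Python's standard `wave` module. The function name `matches_expected_format` and its default parameters are illustrative assumptions, not part of any listed dataset's tooling, and it only reads uncompressed PCM WAV headers.

```python
import wave


def matches_expected_format(path, rate=24000, sample_width_bytes=2, channels=1):
    """Return True if the WAV header at `path` matches the expected format.

    Defaults (hypothetical helper): 24 kHz, 16-bit (2 bytes per sample), mono.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_width_bytes
            and wav.getnchannels() == channels
        )
```

Calling `matches_expected_format("clip.wav")` on a downloaded file (the filename here is a placeholder) returns a boolean. Passing other values for `rate` lets the same check cover the 16 kHz and 48 kHz WAV datasets described above.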