Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,908 datasets
U–Pb zircon dating from tuffs in the Canning Basin reveals a 1.7-million-year age conflict between palynological zones. The dataset documents an age of 267.04 ± 0.14 Ma for the Microbaculispora villosa Zone, challenging established biostratigraphic correlations. This research by Mory et al. (2017) presents isotopic and palynological evidence from the middle Permian.
A multi-speaker bilingual speech synthesizer dataset from SKT AI LABS, intended for text-to-speech applications. It was last updated on 2026-05-19.
ViYT-Diar is a manually annotated audio dataset extracted from Vietnamese YouTube videos. It is designed as a test benchmark for evaluating Speaker Diarization models on in-the-wild data. The dataset was created by author tuanduy1612 and last updated on 2026-04-03.
A multi-source collection of German speech audio paired with transcriptions and English translations, curated by aman4014. The dataset is designed for training and evaluating Automatic Speech Recognition, Speech Translation, and Text-to-Speech systems. It was last updated on March 30, 2026.
GIS point data from the Massachusetts DEP Waterways Program shows locations licensed under Chapter 91 for public access. Each site includes hyperlinks to photos and licenses, as well as a list of amenities like walkways and boat ramps. The dataset supports the Commonwealth's goal to preserve public rights in tidelands and waterways.
Hourly time-series oceanographic data for the Massachusetts coast, collected by the USGS or used in its projects, is available online through the USGS Coastal Marine Time Series Browser. The data includes variables such as current, temperature, pressure, conductivity, and light transmission. Specific deployments range from July 1980 to the present, with a long-term observation series beginning in January 1990.
Structured metadata for Greek laïko music tracks is provided for research and machine learning. The dataset includes fields for emotion, era, and genre but does not contain audio files. It was created by author christosfouk and was last updated on 2026-04-16.
CommonVoice 22 speech data enhanced by Sidon and converted into DAC VAE latent representations. The dataset is provided by TTS-AGI and was last updated on March 22, 2026. Each sample includes original FLAC audio, a corresponding latent vector, and metadata.
Moore Speech Corpora provides aligned audio and text data for the Mooré language (ISO 639-3: mos), curated for low-resource speech processing. The dataset is cleaned and denoised to support text-to-speech and automatic speech recognition research. It was created by goaicorp and last updated in July 2025.
The Waxal dataset is a large-scale multilingual speech corpus designed for African languages. It was created to support research on improving the accuracy and fluency of speech and language technologies across the continent. The dataset supports both Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) tasks.
NCEI Accession 8400047 contains CTD and STD data from R/V OCEANUS Cruise 34, which took place from September 22 to October 3, 1977. The data were received from Dr. Carl Wunsch at MIT and processed by the Woods Hole Oceanographic Institution into the NODC standard High-Resolution F022 format. The dataset provides nearly continuous vertical profiles of temperature, salinity, density, and other parameters at depth intervals as fine as 1 meter, along with station metadata and environmental conditions.
Operation Legato contains 22,060 tokenized music score arrangements sourced from MuseScore. The dataset was created by user hidude562 and was last updated on the platform in April 2026. Each record includes arrangement metadata and alignment information with original audio references.
Documentation related to composer and pianist Hank Hehmsoth's MacDowell Colony Fellowship in 2011 and Norton Stevens Fellowship in 2012. Materials include official correspondence, photographs, musical scores, recordings, and related documentation associated with the residency and fellowship. The dataset was harvested by the Texas Data Repository from a Dataverse source.
Hank Hehmsoth's musical score for the composition Desert Dances, which contributed to his selection as a MacDowell Colony Fellow in 2011 and a Norton Stevens Fellow in 2012. The score is part of the permanent collection of the James Baldwin Library at MacDowell in Peterborough, New Hampshire. This repository copy is provided for scholarly, archival, and research purposes.
A Data Management and Sharing Plan outlines the strategy for handling scientific data from a perovskite solar cell research project. Authored by Jinsong Huang, the plan was last updated on May 11, 2026. It describes the data to be generated and the framework for its management and sharing.
615,000 hours of English speech audio from 239.7 million segments, aggregated from 11 source datasets. The corpus was constructed by KRAFTON from 8 public speech corpora and web-sourced recordings to train the RAON-OpenTTS model. The dataset page was last updated in April 2026.
Elaina Wanderingwitch Audio Ja is a collection of Japanese voice audio clips and corresponding text for the anime character Elaina from 'Majo no Tabitabi'. The dataset was created by user 'yeeko' and was last updated in April 2026.
Delay Doppler Maps (DDMs) calibrated into Power Received and Bistatic Radar Cross Section, collected by the eight-satellite CYGNSS constellation. The dataset includes daily files from up to 8 spacecraft, with a typical latency of approximately 6 days from measurement. This Version 2.1 science-quality release from NASA's POCLOUD supersedes Version 2.0 with improved calibration and coverage.
1 million hours of English audio-text data collected from the public internet by AllenAI. The dataset covers a variety of speaking styles, accents, and recording setups, and supports the training of the OLMoASR speech recognition models.
Tadabur provides between 100,000 and 1,000,000 Qur'anic recitation audio records for Arabic speech research, released by Faisal Alherran in 2026. The collection supports specialized tasks such as tajwīd-aware speech processing and reciter modeling across diverse vocal styles.