Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,943 datasets
Four satellite transmitters tracked the long-distance movements of loggerhead sea turtles after nesting on Bald Head Island, North Carolina. The project, conducted by SCIOPS in July 2003, provides insights into migratory and foraging behavior in the Atlantic Ocean. Data collection concluded in 2003, with the dataset last updated in April 2005.
616 hours of English speech audio derived from the Emilia-Dataset. The audio events are classified using the Scribe v1 tool, which employs ElevenLabs' speech-to-text technology. The dataset is a version 1 release by MrDragonFox, last updated on April 14, 2025.
2006 tidal information for Massachusetts waters, derived from National Oceanic and Atmospheric Administration (NOAA) tidal current tables. The GIS datalayer contains areas where tidal current speeds exceed 3 knots, a threshold for tidal in-stream energy conversion devices. The dataset was created by SCIOPS and last updated in 1997.
Physical oceanographic data from a glider deployment in the Mid-Atlantic Bight coastal waters from May 18 to June 6, 2016. The dataset contains measurements of properties such as temperature, salinity, conductivity, and density. It was collected by the University of Massachusetts and archived by NOAA's National Centers for Environmental Information (NCEI) via the IOOS National Glider Data Assembly Center.
Mid-Atlantic Bight coastal waters were surveyed by a glider named 'blue' deployed by the University of Massachusetts - Dartmouth. The dataset contains physical oceanographic measurements collected from June 27 to July 17, 2015, and was archived by NOAA's National Centers for Environmental Information (NCEI). It represents the first of a planned series of yearly seasonal deployments.
A collection of Hausa language audio samples sourced from Common Voice, paired with transcriptions. The dataset was created by author mide7x and was last updated on May 19, 2025. It is designed for research and applications in speech and language technology.
Presented as the first public Rohingya-language ASR dataset, this collection contains broadcast audio recordings from the Voice of America Rohingya Service. Each source file is a daily news segment, typically 30 minutes long, that has been automatically segmented into 5–15 second chunks. The dataset was created by author 'freococo' and was last updated on June 14, 2025.
41,427 audio segments from 88 source datasets, merged into a single collection for Turkish speech. The dataset includes 222 speakers and provides transcriptions alongside emotion labels such as neutral, angry, sad, and happy.
25,900 audio samples totaling 100 hours of Vietnamese speech data, originally released by FPT Corporation in 2018. This is an unofficial mirror hosted by 'doof-ferb' after the official link became inactive. The data has been pre-processed to remove nonsense strings and four files that were missing transcriptions.
6,898,333 rows of chart images paired with text queries and labels, hosted on Hugging Face by ahmed-masry and last updated in March 2024. The dataset is structured for training multimodal models, with each row containing an image name, an input query, and an output label. Its primary use appears to be pretraining models for chart understanding and generation tasks.
EdAcc (The Edinburgh International Accents of English Corpus) is an automatic speech recognition dataset composed of 40 hours of English dyadic conversations. It was created by edinburghcstr and includes speakers with a diverse set of first and second-language English accents, along with linguistic background profiles. The dataset was last updated on February 22.
AnimeVox is an English Text-to-Speech corpus containing 11,020 audio clips from 19 distinct anime characters across popular series. Each clip includes a high-quality transcription, character name, and anime title. The dataset was created by author 'taresh18' and was last updated on May 27, 2025.
Geochemical Parameters to Evaluate Aquifer Storage and Recovery Reactions with Native Water and Aquifer Materials contains water-quality data for potential ASR source and receiving waters. The dataset was produced by the CEOS_EXTRA organization to support geochemical modeling for Everglades restoration projects. It was last updated in May 2000.
1982 aerial photography of penguin colonies on islands approximately 12 km northeast of Brattstrand Bluff, Antarctica, was digitized into DXF files and later georeferenced into a shapefile. The dataset includes digitized colony boundaries and four supporting photographs from 2009. Work was contributed by Eric Woehler, John Cox, Tom Velthuis, and Ursula Harris of the Australian Antarctic Data Centre.
A multilingual speech corpus containing read speech data in three languages from Nigeria: Hausa, Shuwa Arabic, and Kanuri. It was created by CLEAR Global (formerly Translators without Borders) as part of the TWB Voice project to support automatic speech recognition development for underrepresented languages.
Vietnamese speech data comprising 100 hours of audio, released for the VLSP 2020 Automatic Speech Recognition challenge by VinBigData. The dataset is an unofficial mirror hosted on Hugging Face by the user 'doof-ferb', with the original source linked to a 2020 community-sharing event from the VinBigData Institute.
A dataset of MIDI images designed for use with diffusion models for music generation, music classification, and text-to-music tasks. The dataset was created by author asigalov61 and was last updated on August 6, 2025.
Neapolitan-Spoken-Corpus (NSC) is the first publicly available speech corpus for benchmarking Automatic Speech Recognition systems on the Neapolitan dialect. It includes 141 sentence-level audio recordings with gold-standard orthographic transcriptions. The dataset was created by anonymous-nsc-author to address the lack of computational resources for dialectological research.
CoRal V2 provides between 100,000 and 1,000,000 audio records of Danish speech for Automatic Speech Recognition (ASR) tasks. Created by the CoRal-project and updated in June 2025, the collection includes both conversational and read-aloud samples across diverse dialects, accents, and age groups.
A Web Feature Service (WFS) containing the development plan 'Mettstetter Weg Expansion 1' for the municipality of Schopfloch. The data was transformed according to INSPIRE standards and is based on an XPlanung dataset in version 5.0. The service was last updated on October 7, 2024, and is provided by the Bundesamt für Kartographie und Geodäsie.