Speech recognition, text-to-speech, speaker identification, music classification, audio event detection
1,938 datasets
Data collected from July 2001 to November 2002 during the Pittsburgh Supersite Program. The dataset contains meteorological measurements including temperature, relative humidity, precipitation, wind speed and direction, UV intensity, and solar intensity. It was produced by the NARSTO EPA partnership to characterize particulate matter and its links to public health.
A collection of 227 hours of Spanish speech data recorded on mobile phones by native speakers from Spain, Mexico, and Venezuela. The recordings, made in quiet environments, cover fields such as economy, entertainment, and news, and all texts were manually transcribed with 95% sentence accuracy.
A three-dimensional numerical model simulates circulation in Massachusetts and Cape Cod Bays, driven by tides, wind, river runoff, and thermal forcing. The U.S. Geological Survey developed this model to study the transport of nutrients, contaminants, and red tide populations. The dataset was last updated in 1992.
A collection of 9.5 hours of valid British English speech data from 349 native English speakers. Each speaker recorded about 50 sentences in a quiet environment, covering fields such as car, home, and voice assistant.
A collection of 1,796 hours of German audio data recorded by 3,442 native speakers using mobile phones. The text prompts were designed by linguistic experts and manually proofread, covering categories like generic, interactive, on-board, and home scenarios.
216,284 Irish tunes in ABC notation, split into 214,122 for training and 2,162 for validation. The Irish Massive ABC Notation (IrishMAN) dataset was compiled from the traditional music sites thesession.org and abcnotation.com. It was created by sander-wood and last updated on March 16.
The Zeroth-Korean dataset contains approximately 51.6 hours of training data and 1.2 hours of test data for Korean automatic speech recognition. It was created by kresnik and last updated in October 2024.
German Asr Mixed Whisper is a speech dataset created by user flozi00 and last updated on July 11, 2025. The dataset is a mixture of several German and multilingual speech datasets, including the TUDA-De German Speech Corpus and Mozilla Common Voice. The licensing for each component dataset is determined by its original author.
A dataset of between 1,000 and 10,000 audio-transcript pairs for Air Traffic Control speech recognition, compiled by user jacktol in 2025. It merges the UWB ATC Corpus and the ATCO2 1-Hour Test Subset into a fine-tuning-ready format. The records consist of cleanly segmented 16 kHz .wav files paired with text utterances.
UrduMegaSpeech-1M is a large-scale Urdu-English parallel speech corpus containing over one million audio-text samples. It provides high-quality audio recordings paired with Urdu transcriptions and English source text, created by author humair025 for tasks like automatic speech recognition and speech translation.
LJ Speech contains 13,100 short audio clips of a single speaker reading from seven non-fiction books, totaling approximately 24 hours of English speech. Released by Keith Ito, the dataset provides expert-generated transcriptions for every recording to support speech synthesis and recognition tasks.
A Korean multispeaker speech corpus project for text-to-speech research. It contains preprocessed speech–text pairs, metadata, and linguistic annotations for model training. The dataset was created by aanonyyy and last updated on Hugging Face in October 2025.
FLEURS Farsi contains the Persian (Farsi, fa_ir) portion of the FLEURS speech dataset created by Google. This version was processed into a format compatible with the Hugging Face datasets library by MohammadGholizadeh and was last updated on June 5, 2025. The dataset is designed for evaluating speech recognition systems, particularly in low-resource scenarios.
A French departmental map service identifies land sectors impacted by noise from major transport infrastructure, as mandated by national law. The dataset is based on a prefectural classification of roads with over 5,000 vehicles per day, intercity rail lines with over 50 trains daily, and public transport lines with over 100 buses. It was last updated by the Bureau de Recherches Géologiques et Minières on September 3, 2021.
A 2021 update of a geospatial dataset mapping the noise classification of railway and tramway infrastructure in the Hérault department of France. The classification, established by prefectural decrees in 2014 and 2007, sorts land transport infrastructure into five noise levels and defines the affected areas on either side of the tracks. The data is provided by the Bureau de Recherches Géologiques et Minières (BRGM) as a Web Map Service (WMS).
NADI2025 Subtask2 is a benchmark for developing Automatic Speech Recognition systems that transcribe Arabic speech across multiple dialects. The dataset is hosted by UBC-NLP on Codabench and was last updated on June 12, 2025. It provides training and validation data for handling phonetic and dialectal variation, with a private test set to be released.
Indictts Hinglish was published on Hugging Face by author akh99 and last updated on 2026-01-10 10:25:46. The title suggests it contains text data, likely involving the Hinglish language variety. The platform tags indicate it is a text-modality dataset stored in the optimized Parquet format.
siddiqiya's dataset is a specialized Arabic corpus combining the Quran and approximately 10,000 non-repetitive hadith from 14 books, including the 'magma'a el zawa'ed' compilation. It is intended for training and evaluating speech recognition systems so that AI does not alter sacred scriptures. The dataset also incorporates existing speech datasets such as Common Voice, Fleurs, and Media Speech.
HiFiTTS-2 is a large-scale speech dataset from NVIDIA, containing metadata for approximately 36.7 thousand hours of audio derived from LibriVox audiobooks. The metadata includes estimated bandwidth and corresponds to audio from 5 thousand speakers, recorded at a 48 kHz sampling rate. The dataset was last updated on the platform in November 2025.
Between 10K and 100K Somali audio samples with transcriptions, designed for automatic speech recognition tasks. The dataset is hosted on Hugging Face by the author 'skydheere' and was last updated on 2025-05-09. It is provided in Parquet format under a CC-BY 4.0 license.