Description
DNS5 Challenge data is a mirrored collection of audio files for speech enhancement tasks. It contains 245 hours of English, 95 hours of French, and 137 hours of German speech sourced from LibriVox, AudioSet, Freesound, OpenSLR26, and OpenSLR28. The dataset was converted to Opus format by user philgzl and last updated in May 2026.
Use Cases
Training speech denoising models on the included speech and noise recordings.
Benchmarking noise suppression algorithms using the challenge dataset structure.
Developing multilingual speech processing tools based on the included English, French, and German speech splits.
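One common way to build denoising training pairs from separate speech and noise files is to mix them at a chosen signal-to-noise ratio (SNR). The sketch below is not taken from the DNS5 challenge scripts; `rms` and `mix_at_snr` are hypothetical helper names, and the inputs are plain lists of floats standing in for already decoded audio.

```python
import math

def rms(samples):
    """Root-mean-square level of a non-empty sequence of float samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise ratio equals `snr_db`, then add it to `speech`.

    Both inputs are equal-length lists of floats (hypothetical decoded audio).
    Returns a tuple (noisy, scaled_noise).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_rms = rms(speech)
    noise_rms = rms(noise)
    if noise_rms == 0.0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # SNR_dB = 20 * log10(speech_rms / noise_rms); solve for the required noise level.
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    scaled_noise = [n * gain for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise
```

The clean `speech` list then serves as the training target and `noisy` as the model input. Real pipelines typically also handle length mismatches and clipping, which this sketch leaves out.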
Strengths
Provides a large English corpus: 245 hours of speech across 186,743 files.
Includes multilingual speech data with 95 hours of French and 137 hours of German audio.
Files are converted to Opus format, which likely reduces storage and streaming overhead.
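The figures quoted above support a quick back-of-the-envelope check. The following sketch uses only numbers stated in this description (hours per language and the English file count); because the hour totals are presumably rounded, the results are approximate.

```python
# Hours of speech per language, as stated in the description.
HOURS = {"English": 245, "French": 95, "German": 137}
# A file count is given for English only.
ENGLISH_FILES = 186_743

total_hours = sum(HOURS.values())
# Average English clip length in seconds (approximate: 245 h is likely rounded).
avg_english_clip_s = HOURS["English"] * 3600 / ENGLISH_FILES

print(f"Total speech: {total_hours} h")
print(f"Average English clip: {avg_english_clip_s:.1f} s")
```

This gives 477 hours of speech in total and an average English clip of about 4.7 seconds. The amount of noise audio is not stated, so it is not included here.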
Limitations
The description provides little metadata; audio quality can only be assessed by inspecting the files after download.
Column-level documentation is absent; field semantics must be inferred after download.
The mirror excludes several source components (VCTK, VocalSet, CREMA-D, VoxCeleb2, DEMAND), as noted in the warning that accompanies the dataset.
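Because the file layout is undocumented, a first inspection pass after download might simply count files by top-level folder and extension. This is a minimal sketch; the `summarize_tree` helper is a hypothetical name, and nothing here assumes a particular folder structure.

```python
from collections import Counter
from pathlib import Path

def summarize_tree(root):
    """Count files under `root`, keyed by (top-level folder, lowercase extension)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root)
            # Files directly under root are grouped under ".".
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[(top, path.suffix.lower())] += 1
    return counts

# Example usage (the path is a placeholder for wherever the data was downloaded):
# for (top, ext), n in sorted(summarize_tree("/path/to/dataset").items()):
#     print(top, ext or "(none)", n)
```

Comparing the resulting counts against the stated 186,743 English files is one way to confirm that a download is complete, and unexpected extensions can point to metadata files worth reading.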
Provenance
Source
Mirror of the DNS5 Challenge data, sourced from LibriVox, AudioSet, Freesound, OpenSLR26, and OpenSLR28.
Collection Method
Original WAV files were converted to Opus format.
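The description does not say which tool or settings were used for the conversion. As an illustration only, a WAV-to-Opus conversion could be driven through ffmpeg's libopus encoder; the `build_opus_command` and `convert` helpers, the 32 kbit/s bitrate, and the file names below are assumptions, not the mirror author's actual settings.

```python
import subprocess
from pathlib import Path

def build_opus_command(wav_path, bitrate="32k"):
    """Return an ffmpeg argument list that encodes `wav_path` to Opus next to the input."""
    wav_path = Path(wav_path)
    out_path = wav_path.with_suffix(".opus")
    return [
        "ffmpeg",
        "-n",               # do not overwrite an existing output file
        "-i", str(wav_path),
        "-c:a", "libopus",  # Opus encoder (needs an ffmpeg build with libopus)
        "-b:a", bitrate,    # assumed target bitrate
        str(out_path),
    ]

def convert(wav_path, bitrate="32k"):
    """Run the conversion; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_opus_command(wav_path, bitrate), check=True)
```

Opus is a lossy codec, so files produced this way are not bit-identical to the WAV originals. That may matter when comparing results against work that used the original challenge data.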
Freshness
Last updated 2026-05-12 15:14:22 (time zone not stated); freshness should be verified.
License is unknown and should be verified before use. The dataset is a mirror; users should refer to the original DNS5 challenge for authoritative context.