Name: Danish ASR Unified: 3.5 Million Speech Samples from 7 Sources
Creator: syvai
Published: 2026-04-06T08:32:13
Keywords: Audio Dataset, Parliament Speech, Broadcast Media, Audio, Danish Language, Speech Recognition

Description

This unified Danish speech recognition dataset combines approximately 3.5 million audio samples from seven distinct sources, totaling roughly 16,000 hours of speech. The collection includes European and Danish Parliament recordings, read-aloud and conversational speech, broadcast media, and crowd-sourced samples. It was created by syvai and last updated on Hugging Face on 2026-04-06.

Use Cases

Train Danish speech recognition models on the full multi-source audio collection.
Fine-tune models for parliamentary speech recognition using the VoxPopuli and ftspeech sources.
Develop models for conversational speech understanding using the CoRal-v3 conversation subset.
Benchmark ASR performance on read-aloud speech using the CoRal-v3 read_aloud and nst-da sources.
Build systems for transcribing broadcast media using the nota source.
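
Most of these use cases start by selecting one or two of the constituent sources. The following is a minimal sketch of that selection step, assuming each sample is a dictionary that carries a source label. The field name `source` and the helper `filter_by_source` are hypothetical; the card does not document the columns, so the real field name must be checked after download.

```python
from typing import Dict, Iterable, Iterator, Set


def filter_by_source(
    samples: Iterable[Dict],
    wanted: Set[str],
    source_key: str = "source",  # hypothetical column name; verify after download
) -> Iterator[Dict]:
    """Yield only the samples whose source label is in `wanted`."""
    for sample in samples:
        if sample.get(source_key) in wanted:
            yield sample


# Toy records standing in for real rows; the values are illustrative only.
toy_rows = [
    {"source": "ftspeech", "text": "a"},
    {"source": "nota", "text": "b"},
    {"source": "VoxPopuli", "text": "c"},
]
parliament_rows = list(filter_by_source(toy_rows, {"VoxPopuli", "ftspeech"}))
```

Because the helper is a generator, it can be applied to a streamed iterable without holding the full collection in memory.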

Strengths

Large scale with approximately 3.5 million audio samples.
Diverse speech sources covering parliamentary, conversational, read-aloud, and broadcast contexts.
Significant total duration of roughly 16,000 hours of Danish speech.
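
The two headline figures together imply a typical clip length. A quick back-of-the-envelope calculation, using only the approximate totals stated on this card (so the result is an estimate, not a measured statistic):

```python
SAMPLES = 3_500_000  # approximate sample count stated on the card
HOURS = 16_000       # approximate total duration stated on the card

# Convert hours to seconds, then divide by the number of samples.
avg_seconds = HOURS * 3600 / SAMPLES
print(f"Implied mean clip length: {avg_seconds:.1f} s")
```

This works out to roughly 16.5 seconds per sample on average. Individual sources may deviate considerably from that mean, since the card gives no per-source breakdown.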

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
File formats, licensing, and the detailed structure of each sample are not stated in the available metadata.
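
Since the columns are undocumented, one practical way to infer them is to stream a single record and list its fields. The sketch below assumes the dataset can be loaded with the Hugging Face `datasets` library; `repo_id` and `split` are placeholders that are not given on this card and must be taken from the dataset page. The helper names `describe_sample` and `peek_first_sample` are hypothetical.

```python
from typing import Any, Dict


def describe_sample(sample: Dict[str, Any]) -> Dict[str, str]:
    """Map each field name to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in sample.items()}


def peek_first_sample(repo_id: str, split: str = "train") -> Dict[str, str]:
    """Stream one record from the Hub and describe its fields.

    `repo_id` and `split` are placeholders; check both on the dataset page.
    """
    # Imported lazily so describe_sample works without the library installed.
    from datasets import load_dataset

    # streaming=True avoids downloading the full collection up front.
    stream = load_dataset(repo_id, split=split, streaming=True)
    first = next(iter(stream))
    return describe_sample(first)
```

Streaming keeps the inspection cheap for a collection of this size. Decoding audio fields may require extra audio dependencies, depending on how the files are stored.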

Provenance

Source: Combined from 7 sources: VoxPopuli, ftspeech, CoRal-v3 read_aloud, nst-da, CoRal-v3 conversation, nota, Common Voice 17.
Collection Method: Likely aggregated and unified from existing public speech datasets.
Freshness: Last updated 2026-04-06 13:10:45; check the repository for later revisions before use.
Geography: Denmark (Danish language focus).

License information is unknown and must be verified for each constituent source before commercial use.
