This corpus contains audio recordings and orthographic transcriptions from the Norwegian Parliament (Stortinget), categorized by written standard: Norwegian Bokmål or Norwegian Nynorsk. It serves as a benchmark for Norwegian Automatic Speech Recognition (ASR) systems, based on official parliamentary proceedings.
Use Cases
- Evaluate ASR word error rate (WER), using the provided orthographic transcriptions as ground truth.
- Compare speech recognition accuracy across Norwegian Bokmål and Norwegian Nynorsk written standards.
- Develop acoustic models for parliamentary-domain speech using the paired audio and text.
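
The first two use cases can be sketched as follows. This is a minimal illustration, not an official evaluation script: it computes WER as word-level edit distance divided by the number of reference words, pooled per written standard. The function names, the `(group, reference, hypothesis)` tuple layout, the group labels, and the lowercase/whitespace normalization are all assumptions made for the example; the corpus itself may use different field names and may call for different text normalization.

```python
from collections import defaultdict


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two lists of word tokens."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Return pooled WER per group.

    `samples` is an iterable of (group, reference, hypothesis) tuples.
    This layout is hypothetical, chosen only for the example.
    """
    errors = defaultdict(int)
    ref_word_counts = defaultdict(int)
    for group, reference, hypothesis in samples:
        # Assumed normalization: lowercase and split on whitespace.
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors[group] += word_edit_distance(ref_words, hyp_words)
        ref_word_counts[group] += len(ref_words)
    # Skip groups with no reference words to avoid division by zero.
    return {
        group: errors[group] / ref_word_counts[group]
        for group in ref_word_counts
        if ref_word_counts[group] > 0
    }


# Toy sentences, not taken from the corpus.
toy_samples = [
    ("bokmål", "jeg heter ola", "jeg heter ola"),
    ("nynorsk", "eg heiter ola", "eg heter ola"),
]
toy_wer = wer_by_group(toy_samples)
```

Pooling errors and reference word counts per group before dividing gives a corpus-level WER for each written standard, which weights long utterances more heavily than averaging per-utterance WER would.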
Strengths
- Contains orthographic transcriptions in Norwegian Bokmål and Norwegian Nynorsk.
- Features audio recordings from official Norwegian Parliament (Stortinget) sessions.
- Provides a dedicated test split for evaluating Norwegian ASR model performance.