The dataset contains 140 hours of Norwegian speech recorded over 40 days of parliamentary meetings, with 65,000 transcribed sentences (1.2 million words) in both Bokmål and Nynorsk. Each audio segment is linked to speaker metadata such as gender, age, and dialect.
Use Cases
- Train automatic speech recognition (ASR) models for Norwegian using the orthographic transcriptions in Bokmål and Nynorsk
- Analyze dialectal variation by correlating audio features with the place-of-birth metadata
- Develop speaker identification systems using the speaker_id and associated demographic labels
- Conduct linguistic research on parliamentary discourse by linking audio segments to official records via the proceedings_id
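For the ASR and speaker identification use cases, it usually helps to keep every speaker on one side of the train/test boundary so that a model is not evaluated on voices it has already heard. The following is a minimal sketch of such a speaker-disjoint split. It assumes each record is a dictionary with a speaker_id field; the `text` field, the toy records, and the `assign_split` helper are hypothetical and only serve as illustration.

```python
import hashlib


def assign_split(speaker_id: str, test_percent: int = 10) -> str:
    """Map a speaker to 'train' or 'test' deterministically.

    Hashing the speaker_id means every segment from the same speaker
    lands in the same split, and the result is stable across runs.
    """
    digest = hashlib.sha256(speaker_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # value in 0..99
    return "test" if bucket < test_percent else "train"


# Toy records standing in for transcribed sentences (hypothetical values).
rows = [
    {"speaker_id": "spk_a", "text": "first toy sentence"},
    {"speaker_id": "spk_b", "text": "second toy sentence"},
    {"speaker_id": "spk_a", "text": "third toy sentence"},
]

for row in rows:
    row["split"] = assign_split(row["speaker_id"])

# Collect the set of splits seen for each speaker; each set should have one entry.
splits_per_speaker = {}
for row in rows:
    splits_per_speaker.setdefault(row["speaker_id"], set()).add(row["split"])
```

Because the split depends only on the speaker_id, new recordings from a known speaker fall into the same partition without any stored lookup table. The actual share of test data will only approximate `test_percent`, since speakers contribute different amounts of audio.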
Strengths
- 140 hours of audio recordings covering 40 full days of parliamentary sessions
- 65,000 orthographically transcribed sentences totaling 1.2 million words
- A speaker_id on each segment links to the speaker's gender, age, and place of birth, which supports dialect analysis
- Integration with official proceedings via a proceedings_id column
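The metadata links listed above can be used to aggregate audio by speaker attributes. The sketch below joins segments to speaker records via speaker_id and sums audio duration per place of birth. The field names `place_of_birth`, `gender`, and `duration_s`, the function `seconds_by_birthplace`, and all values are hypothetical placeholders, not the dataset's actual schema or contents.

```python
from collections import defaultdict

# Hypothetical speaker table keyed by speaker_id.
speakers = {
    "spk_a": {"gender": "female", "place_of_birth": "region_1"},
    "spk_b": {"gender": "male", "place_of_birth": "region_2"},
}

# Hypothetical segment records; proceedings_id would point to the official record.
segments = [
    {"speaker_id": "spk_a", "proceedings_id": "proc_1", "duration_s": 4.0},
    {"speaker_id": "spk_b", "proceedings_id": "proc_1", "duration_s": 6.5},
    {"speaker_id": "spk_a", "proceedings_id": "proc_2", "duration_s": 3.0},
]


def seconds_by_birthplace(segments, speakers):
    """Sum segment durations per place of birth, skipping unknown speakers."""
    totals = defaultdict(float)
    for seg in segments:
        meta = speakers.get(seg["speaker_id"])
        if meta is None:
            continue  # no metadata available for this speaker
        totals[meta["place_of_birth"]] += seg["duration_s"]
    return dict(totals)


totals = seconds_by_birthplace(segments, speakers)
```

A summary like this is a quick way to check how evenly regions are represented before drawing conclusions about dialect.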