NPSC: Norwegian Parliament Speech Corpus

Name: NPSC: Norwegian Parliament Speech Corpus
Creator: NbAiLab
Published: 2022-03-02T23:29:22
Keywords: Source Datasets: original, Language Creators: found, License: CC0 1.0, Languages: no, nn, nb, Annotations Creators: no annotation, Task Categories: audio classification, automatic speech recognition, Region: us, Speech Modeling, Multilinguality: monolingual


Description

The corpus contains 140 hours of Norwegian speech recorded over 40 days of parliamentary meetings, transcribed as 65,000 sentences in both Bokmål and Nynorsk. It totals 1.2 million words and links audio segments to speaker metadata such as gender, age, and dialect.
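
As a rough sanity check, the averages implied by the figures quoted above can be worked out directly. The sketch below uses only those rounded numbers, so the results are approximate. The function and variable names are illustrative and not part of the dataset.

```python
# Rounded corpus figures quoted in the description.
HOURS_OF_AUDIO = 140
SENTENCES = 65_000
WORDS = 1_200_000


def corpus_averages(hours, sentences, words):
    """Return approximate per-sentence and per-minute averages."""
    seconds = hours * 3600
    return {
        "seconds_per_sentence": seconds / sentences,
        "words_per_sentence": words / sentences,
        "words_per_minute": words / (hours * 60),
    }


averages = corpus_averages(HOURS_OF_AUDIO, SENTENCES, WORDS)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

With these inputs the averages come out to roughly 8 seconds and 18 words per sentence, which may help when estimating segment lengths for ASR training.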

Use Cases

Train automatic speech recognition (ASR) models for Norwegian using the orthographic transcriptions in Bokmål and Nynorsk
Perform dialectal speech analysis by correlating audio features with the place of birth metadata
Develop speaker identification systems using the speaker_id and associated demographic labels
Conduct linguistic research on parliamentary discourse by linking audio segments to official records via the proceedings_id
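
The dialect-analysis and speaker-identification use cases both rest on joining each audio segment to its speaker's metadata. A minimal sketch of that join in plain Python follows. Only speaker_id and proceedings_id are names taken from the description. The record layout, the other field names (place_of_birth, gender, text), and all sample values are hypothetical stand-ins for whatever the actual dataset provides.

```python
from collections import defaultdict

# Hypothetical speaker table keyed by speaker_id (values are made up).
speakers = {
    1: {"gender": "female", "place_of_birth": "Bergen"},
    2: {"gender": "male", "place_of_birth": "Tromsø"},
}

# Hypothetical segment records; only speaker_id and proceedings_id
# are names mentioned in the dataset description.
segments = [
    {"speaker_id": 1, "proceedings_id": "p-001", "text": "eksempel en"},
    {"speaker_id": 2, "proceedings_id": "p-001", "text": "eksempel to"},
    {"speaker_id": 1, "proceedings_id": "p-002", "text": "eksempel tre"},
]


def group_by_birthplace(segments, speakers):
    """Group segments by the speaker's place of birth."""
    groups = defaultdict(list)
    for seg in segments:
        meta = speakers.get(seg["speaker_id"])
        if meta is None:
            continue  # skip segments with no speaker metadata
        groups[meta["place_of_birth"]].append(seg)
    return dict(groups)


by_place = group_by_birthplace(segments, speakers)
for place, segs in by_place.items():
    print(place, len(segs))
```

The same pattern would apply to grouping by proceedings_id in order to line segments up with the official records.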

Strengths

140 hours of audio recordings covering 40 full days of parliamentary sessions
65,000 orthographically transcribed sentences totaling 1.2 million words
Metadata includes speaker_id linked to gender, age, and place of birth for dialect analysis
Integration with official proceedings via a proceedings_id column

Quality Score

D36

Description: 40
Source: 36
Reputation: 38
Access: 22

Community

1.5K downloads
9 likes
0 views

Dataset Info

Author: NbAiLab
Created: Mar 2, 2022
Updated: Aug 14, 2024
Last synced: Apr 29, 2026
