CoRal: Danish Conversational and Read-Aloud Speech with 100K+ Records | DataSalon

Speech & Audio

Name: CoRal: Danish Conversational and Read-Aloud Speech with 100K+ Records
Creator: CoRal-project
Published: 2024-08-26T16:53:42
Keywords: library: polars, library: dask, library: mlcroissant, library: datasets, modality: audio, modality: text, language: da, size category: 100K<n<1M, task category: audio-classification, task category: automatic-speech-recognition, license: openrail, format: parquet, region: us

Available on 1 platform

Description

CoRal provides between 100,000 and 1,000,000 Danish audio recordings with transcriptions for Automatic Speech Recognition (ASR) tasks. Created by the CoRal-project and last updated in January 2025, the collection includes both conversational and read-aloud speech samples across various dialects and age groups.

Use Cases

Training ASR models using the audio and text transcription pairs
Classifying speaker demographics using the age and gender labels
Analyzing dialectal variations in Danish speech using the dialect and accent metadata
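
The use cases above all start from per-record metadata. As a minimal sketch, assuming each record is available as a dictionary with the hypothetical keys `text`, `dialect`, `age`, and `gender` (the actual column names are not documented on this page), the dialect distribution could be tallied before training or analysis:

```python
from collections import Counter

# Hypothetical records: the field names and values are illustrative
# assumptions, not the dataset's documented schema.
records = [
    {"text": "hej med dig", "dialect": "dialect_a", "age": 34, "gender": "female"},
    {"text": "god morgen", "dialect": "dialect_b", "age": 61, "gender": "male"},
    {"text": "tak for i dag", "dialect": "dialect_a", "age": 27, "gender": "male"},
]


def dialect_shares(rows):
    """Return each dialect's share of the rows as a fraction of the total."""
    counts = Counter(row["dialect"] for row in rows)
    total = sum(counts.values())
    return {dialect: n / total for dialect, n in counts.items()}


shares = dialect_shares(records)
```

A strongly skewed result would suggest stratifying the train and evaluation splits by dialect, so that less common dialects are not left out of evaluation.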

Strengths

Scale of 100,000 to 1,000,000 records
Two speaking styles: conversational and read-aloud speech
Inclusion of diverse Danish dialects and accents

Limitations

Lack of detailed column documentation in the primary metadata
Potential for label noise if transcriptions were not expert-verified

Provenance

Source: CoRal-project
Collection Method: Recorded speech (conversational and read-aloud)
Freshness: Last updated January 2025.
Geography: Denmark

The dataset is licensed under OpenRAIL and stored in Parquet format, so it can be read with Parquet-compatible libraries such as Polars, Dask, or Hugging Face Datasets.

Quality Score

D35

Community

388 downloads

20 likes

0 views

Dataset Info

Author: CoRal-project
Created: Aug 26, 2024
Updated: Jan 15, 2025
Last synced: Apr 29, 2026
