DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Mandarin Speech Recordings from Children Aged 3 to 5 | DataSalon

Speech & Audio

Name: Mandarin Speech Recordings from Children Aged 3 to 5
Creator: BAAI
Published: 2025-03-12T07:27:28
Keywords: size_categories:10K<n<100K, language:zh, license:cc-by-nc-sa-4.0, task_categories:automatic-speech-recognition, modality:text, library:webdataset, library:mlcroissant, library:datasets, region:us, arxiv:2409.18584, webdataset
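
The keywords above appear to be Hugging Face-style tags, which generally take a `key:value` form. As a minimal sketch, assuming that form, the tags can be grouped by key for filtering or display. The function name `group_tags` and the exact tag spellings in `TAGS` are illustrative assumptions reconstructed from the keyword list, not values confirmed by the listing.

```python
from collections import defaultdict


def group_tags(tags):
    """Group 'key:value' tag strings by key; tags without ':' go under 'other'."""
    grouped = defaultdict(list)
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            grouped[key].append(value)
        else:
            grouped["other"].append(tag)
    return dict(grouped)


# Assumption: spellings reconstructed from the keyword list above.
TAGS = [
    "size_categories:10K<n<100K",
    "language:zh",
    "license:cc-by-nc-sa-4.0",
    "task_categories:automatic-speech-recognition",
    "modality:text",
    "library:webdataset",
    "library:mlcroissant",
    "library:datasets",
    "region:us",
    "arxiv:2409.18584",
    "webdataset",
]

if __name__ == "__main__":
    for key, values in group_tags(TAGS).items():
        print(f"{key}: {', '.join(values)}")
```

Grouping by key keeps repeated keys such as `library` together and leaves bare tags visible under `other` rather than dropping them.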


Available on 1 platform

Description

ChildMandarin is an open-source speech dataset containing Mandarin Chinese audio from young children aged 3 to 5. Created by BAAI, it addresses a lack of public resources for this demographic, enabling research in automatic speech recognition and speaker verification.

Use Cases

Train an automatic speech recognition (ASR) model on Mandarin audio from children aged 3-5.
Develop speaker verification (SV) systems using child speech samples.
Analyze phonetic and prosodic features unique to young Mandarin-speaking children.

Strengths

Dataset specifically targets the under-resourced demographic of children aged 3 to 5.
Focuses on Mandarin Chinese speech, a major world language.
Open-source availability facilitates academic and applied research.

Limitations

Sample size and recording duration are unknown, limiting assessment of scale.
Geographic origin and recording conditions are unspecified, potentially introducing bias.
Audio quality, speaker count, and label details are not provided in the listing.
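
Because the listing does not report sample count or total duration, a downloaded copy can be measured directly. The sketch below assumes the audio has already been extracted to local `.wav` files, which the listing does not confirm; the function name `summarise_wavs` and the directory layout are hypothetical. It uses only the Python standard library.

```python
import wave
from pathlib import Path


def summarise_wavs(root):
    """Return (file_count, total_seconds) for every .wav file under root."""
    count = 0
    total_seconds = 0.0
    for path in sorted(Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
        count += 1
    return count, total_seconds


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python summarise.py /path/to/extracted/audio
    n_files, seconds = summarise_wavs(sys.argv[1])
    print(f"{n_files} files, {seconds / 3600:.2f} hours")
```

The `wave` module reads only uncompressed PCM WAV files, so other formats would need a different reader.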

Provenance

Source: BAAI via Hugging Face.
Collection Method: Not specified
Time Range: Not specified
Freshness: Last updated May 19, 2025.
Geography: Not specified

Quality Score

D37

Community

80 downloads

35 likes

0 views

Dataset Info

Author: BAAI
Created: Mar 12, 2025
Updated: May 19, 2025
Last synced: Jun 8, 2026
