FCaps: Fine-grained Speaking Style Captions

Name: FCaps: Fine-grained Speaking Style Captions
Creator: yfyeung
Published: 2026-01-09T09:59:59
Keywords: language en, size category 10M<n<100M, license CC BY-NC-SA 4.0, arXiv 2601.03065, DOI 10.57967/hf/7585, region us

Available on 1 platform

Description

FCaps contains 47,000 hours of speech audio and 19 million fine-grained speaking style captions, organized into subsets such as FCaps-PSCBase and FCaps-Emilia. The captions are open-ended descriptions of vocal characteristics, intended for large-scale speech modeling and synthesis.
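For a rough sense of scale, the two headline figures can be combined into an average amount of audio per caption. The sketch below is only back-of-the-envelope arithmetic: it assumes the rounded totals above are exact and that captions are spread evenly over the audio, neither of which the listing confirms. The names `HOURS_OF_AUDIO`, `NUM_CAPTIONS`, and `seconds_per_caption` are illustrative and not part of the dataset.

```python
# Rounded headline figures from the dataset description (assumed exact here).
HOURS_OF_AUDIO = 47_000
NUM_CAPTIONS = 19_000_000


def seconds_per_caption(hours: float, captions: int) -> float:
    """Average seconds of audio per caption, assuming an even spread."""
    return hours * 3600 / captions


avg = seconds_per_caption(HOURS_OF_AUDIO, NUM_CAPTIONS)
print(f"{avg:.1f} s of audio per caption")
```

Under these assumptions the average comes out at roughly nine seconds of audio per caption. If some clips carry several captions, the typical clip would be longer than this figure suggests.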

Use Cases

Train text-to-speech models using the fine-grained speaking style descriptions to control prosody and emotion.
Develop automated speech captioning systems to generate natural language descriptions of vocal styles.
Fine-tune audio-language models for cross-modal retrieval using the 19 million captions.

Strengths

47,000 hours of speech audio data
19 million open-ended and fine-grained speaking style captions
Includes FCaps-PSCBase, dev, test, and FCaps-Emilia subsets
Integrates English-language audio from the Emilia-Dataset

Quality Score

D (38)

Description: 43
Source: 36
Reputation: 42
Access: 22

Community

Downloads: 15
Likes: 4
Views: 0

Dataset Info

Author: yfyeung
Created: Jan 9, 2026
Updated: Feb 7, 2026
Last synced: Jun 11, 2026
