DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.



Speech & Audio

CapTTS-SFT: Style-Captioned Text-to-Speech Training Data

Name: CapTTS-SFT: Style-Captioned Text-to-Speech Training Data
Creator: OpenSound
Published: 2025-04-27T22:12:55
Keywords: Text To Speech, Speech Synthesis, Style Transfer, Text, Audio, Audio Generation


Available on 1 platform

Description

OpenSound created this dataset to train the CapTTS, EmoCapTTS, and AccCapTTS models described in the paper 'CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech'. It contains audio-text pairs drawn from multiple source datasets and was last updated on July 28, 2025.

Use Cases

Train text-to-speech models on the audio-text pairs.
Develop emotion-captioned speech synthesis (the EmoCapTTS task).
Build accent-captioned speech synthesis (the AccCapTTS task).
Study style transfer in speech synthesis using the framework of the CapSpeech paper.

Strengths

Created for a published academic paper, suggesting a research-grade purpose.
Designed for three specific, advanced text-to-speech tasks (CapTTS, EmoCapTTS, AccCapTTS).
Last updated on July 28, 2025, indicating recent maintenance.

Limitations

The row count is not reported, which makes it harder to judge the dataset's suitability before downloading.
Column-level documentation is absent; field semantics must be inferred after download.
The actual audio files are hosted separately, requiring additional steps for access.
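Because the row count and column semantics are not documented, a quick inspection after download can fill the gap. The following is a minimal sketch, assuming the metadata is available locally as a CSV file with a header row; the real distribution format is not stated on this page, and the function name `summarize_metadata` is hypothetical. Only the 'audio_path' column is documented; the 'text' column in the example is illustrative.

```python
import csv
import io
from typing import TextIO


def summarize_metadata(handle: TextIO) -> tuple[int, list[str]]:
    """Return (row_count, column_names) for a CSV metadata file.

    Assumes the metadata is a local CSV with a header row; the dataset's
    actual distribution format is not documented on this page.
    """
    reader = csv.DictReader(handle)
    # fieldnames is None for an empty file, so fall back to an empty list.
    columns = list(reader.fieldnames or [])
    # Count data rows (DictReader has already consumed the header row).
    row_count = sum(1 for _ in reader)
    return row_count, columns


if __name__ == "__main__":
    # In-memory stand-in for a downloaded metadata file. Only 'audio_path'
    # is documented; the 'text' column here is hypothetical.
    sample = io.StringIO("audio_path,text\nclip_0001.wav,hello\nclip_0002.wav,world\n")
    rows, cols = summarize_metadata(sample)
    print(rows, cols)  # 2 ['audio_path', 'text']
```

If the metadata turns out to be in another format (for example Parquet), the same two questions (how many rows, which columns) apply; only the reader changes.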

Provenance

Source: OpenSound
Collection Method: Aggregated from multiple original datasets.
Freshness: Last updated 2025-07-28 02:34:38

Audio files are hosted separately from the metadata; the 'audio_path' column provides file paths.
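Since the audio lives apart from the metadata, each 'audio_path' value has to be matched to a file on disk before training. The sketch below assumes the paths are relative to a single local directory where the separately hosted audio has been downloaded; that directory, the CSV metadata format, and the helper name `resolve_audio_paths` are all hypothetical. It splits rows into files that were found and files that are missing.

```python
import csv
import io
import tempfile
from pathlib import Path
from typing import TextIO


def resolve_audio_paths(handle: TextIO, audio_root: Path) -> tuple[list[Path], list[Path]]:
    """Join each row's 'audio_path' onto a local audio directory.

    Returns (found, missing). `audio_root` is a hypothetical local directory
    into which the separately hosted audio files have been downloaded.
    """
    found: list[Path] = []
    missing: list[Path] = []
    for row in csv.DictReader(handle):
        candidate = audio_root / row["audio_path"]
        # Sort each path by whether the file actually exists on disk.
        (found if candidate.is_file() else missing).append(candidate)
    return found, missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "clip_0001.wav").write_bytes(b"")  # placeholder audio file
        metadata = io.StringIO("audio_path\nclip_0001.wav\nclip_0002.wav\n")
        found, missing = resolve_audio_paths(metadata, root)
        print(len(found), len(missing))  # 1 1
```

Reporting missing files up front is cheaper than discovering them partway through training. If the stored paths turn out to be absolute, joining with `/` returns the absolute path unchanged, so the same check still works.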


Quality Score: D39


Community

141 downloads

3 likes

0 views

Dataset Info

Author: OpenSound
Created: Apr 27, 2025
Updated: Jul 28, 2025
Last synced: May 16, 2026
