OpenSound created this dataset for training CapTTS, EmoCapTTS, and AccCapTTS models, as described in the paper 'CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech'. The dataset was last updated on July 28, 2025. It contains audio-text pairs sourced from multiple original datasets.
Audio files are hosted separately from the metadata; the 'audio_path' column provides file paths.
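Because the audio is stored apart from the metadata, each 'audio_path' value has to be joined to wherever the audio was downloaded. The following is a minimal sketch of that step, not the dataset authors' loader. It assumes the metadata rows are already available as dictionaries, that 'audio_path' holds paths relative to a local audio directory, and that the directory name and the helper resolve_audio_paths are hypothetical.

```python
from pathlib import Path


def resolve_audio_paths(rows, audio_root):
    """Join each row's 'audio_path' onto a local audio root directory.

    Returns new dicts with two extra keys (both hypothetical names):
    'audio_file' (the joined Path) and 'audio_exists' (True if the file is present).
    """
    root = Path(audio_root)
    resolved = []
    for row in rows:
        full_path = root / row["audio_path"]
        resolved.append(
            {**row, "audio_file": full_path, "audio_exists": full_path.is_file()}
        )
    return resolved


# Hypothetical metadata row and download location, for illustration only.
rows = [{"audio_path": "example/clip_0001.wav"}]
for item in resolve_audio_paths(rows, "/data/capspeech_audio"):
    print(item["audio_file"], item["audio_exists"])
```

Checking 'audio_exists' before training makes it easy to spot rows whose audio has not been downloaded yet.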