Audio recordings from visual novels, paired with transcriptions and captions generated by Gemini 2.5 Pro. The collection includes descriptive metadata tags (emotion, speaker profile, and style) to support controllable speech synthesis.
Use Cases
- Train controllable Text-to-Speech models using the emotion and style metadata tags.
- Develop speaker-specific voice synthesis by leveraging the speaker profile descriptions.
- Perform audio-text alignment research using the provided audio-transcription pairs.
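As a rough illustration of the first use case, the sketch below shows one way the emotion, style, and speaker profile tags could be folded into a conditioning prompt for a controllable TTS model. The field names (`audio_path`, `transcription`, `caption`, `emotion`, `speaker_profile`, `style`), the prompt layout, and the two sample records are all assumptions made for the example; they are not taken from the dataset, so check the actual schema before adapting it.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical field names; the real dataset schema may differ.
    audio_path: str
    transcription: str
    caption: str
    emotion: str
    speaker_profile: str
    style: str


def build_prompt(sample: Sample) -> str:
    """Prefix the transcription with its metadata tags (assumed prompt layout)."""
    return (
        f"[emotion: {sample.emotion}] "
        f"[style: {sample.style}] "
        f"[speaker: {sample.speaker_profile}] "
        f"{sample.transcription}"
    )


def filter_by_emotion(samples: list[Sample], emotion: str) -> list[Sample]:
    """Keep only samples whose emotion tag matches, ignoring case and padding."""
    wanted = emotion.strip().lower()
    return [s for s in samples if s.emotion.strip().lower() == wanted]


# Made-up placeholder records, used only to exercise the helpers.
samples = [
    Sample("clip_0001.wav", "Good morning!", "A cheerful greeting.",
           "happy", "young woman", "energetic"),
    Sample("clip_0002.wav", "Leave me alone.", "A quiet, tired refusal.",
           "sad", "adult man", "whispered"),
]

happy_prompts = [build_prompt(s) for s in filter_by_emotion(samples, "Happy")]
```

Keeping the tags as a plain-text prefix is only one option; a model that accepts separate conditioning inputs could take the same fields as distinct features instead.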
Strengths
- Includes audio files paired with transcriptions and captions generated by Gemini 2.5 Pro.
- Features descriptive metadata tags for emotion, speaker profile, and style.
- Derived from the OOPPEENN/56697375616C4E6F76656C5F4461736574 visual novel dataset.