SonoroNova-ES is a large-scale synthetic English-to-Spanish speech-to-speech translation dataset containing 329,764 utterances. It was constructed via cascade pipelines combining text-to-text translation models with neural text-to-speech engines, using source audio derived from the HiFiTTS-2 English audiobook corpus. The dataset features 1,315 unique speakers and provides a total of 961 hours of audio.
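The headline totals imply some rough per-utterance and per-speaker averages, which can help when budgeting storage or training time. The sketch below only does that arithmetic; the function name `summarize` and its dictionary keys are hypothetical. It assumes the 961 hours is spread across all 329,764 utterances, and the description does not say whether that figure counts the English source audio, the Spanish target audio, or both.

```python
def summarize(utterances: int, speakers: int, hours: float) -> dict:
    """Derive simple averages from the dataset's headline totals."""
    total_seconds = hours * 3600
    return {
        # Mean utterance length, assuming the hours cover every utterance.
        "avg_utterance_seconds": total_seconds / utterances,
        # Mean number of utterances attributed to each speaker.
        "utterances_per_speaker": utterances / speakers,
        # Mean amount of audio per speaker, in minutes.
        "minutes_per_speaker": hours * 60 / speakers,
    }


# Totals as stated in the dataset description.
stats = summarize(utterances=329_764, speakers=1_315, hours=961)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the average utterance is roughly 10.5 seconds long, with about 251 utterances per speaker. These are means only; the actual distribution across speakers is not given and may be skewed.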
The license is unknown; verify the terms of use before using the dataset.