This dataset comprises between 100,000 and 1,000,000 parallel speech pairs for Hindi-to-English translation and was released by mahendraphd in 2026. It pairs natural English speech sourced from TED talks with synthetic Hindi speech to support research in low-resource speech-to-speech translation (S2ST).
Because the Hindi audio component is synthetic, users should be aware that models trained on it may require further fine-tuning for natural Hindi speech. The dataset is provided under a CC-BY-4.0 license.