NVIDIA's Granary dataset provides approximately 1 million hours of speech data across 25 European languages. Released in 2025, it consolidates multiple sources into a unified framework to support modeling of low-resource languages. The collection is designed for both Automatic Speech Recognition (ASR) and Automatic Speech Translation (AST) tasks.
Users should verify the license of each constituent dataset before use, as individual components may carry different restrictions.