Available on 1 platform
VoxLingua107 is a speech dataset for training spoken language identification models. It contains 6,628 hours of short speech segments extracted from YouTube videos and covers 107 languages. The dataset was created by SEACrowd and was last updated in June 2024.
The license is unknown; verify the terms of use before using the dataset.