Available on 1 platform
This derived dataset comes from Project Vaani, a large-scale multilingual speech initiative by IISc Bangalore and ARTPARK that aims to capture India's linguistic diversity across all of its districts. It contains noise event timestamps. The dataset page was last updated on 2026-06-05.
The dataset is actively being built. The current release is a subset of a planned corpus of approximately 167 hours of training data, so users should check the Hugging Face page for the latest updates and completeness.
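The page does not describe how the noise event timestamps are stored. The sketch below shows one way such annotations might be consumed. It assumes each noise event is a `(start, end)` pair in seconds; that layout, and the helper names `merge_noise_events` and `noise_fraction`, are hypothetical and not taken from the dataset. The code merges overlapping events and reports what fraction of a clip is covered by noise, which could help filter or weight clips.

```python
from typing import List, Tuple

# Hypothetical schema: each noise event is a (start_sec, end_sec) pair.
# The real field names and units in the Vaani-derived dataset may differ.
Interval = Tuple[float, float]


def merge_noise_events(events: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching noise intervals into disjoint spans."""
    merged: List[Interval] = []
    for start, end in sorted(events):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it if this event ends later.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def noise_fraction(events: List[Interval], clip_duration: float) -> float:
    """Fraction of a clip (in seconds) covered by at least one noise event."""
    if clip_duration <= 0:
        raise ValueError("clip_duration must be positive")
    covered = sum(end - start for start, end in merge_noise_events(events))
    return covered / clip_duration


if __name__ == "__main__":
    events = [(0.5, 1.5), (1.0, 2.0), (4.0, 4.5)]
    print(merge_noise_events(events))  # [(0.5, 2.0), (4.0, 4.5)]
    print(noise_fraction(events, clip_duration=10.0))  # 0.2
```

Merging first prevents overlapping events from being counted twice when summing noisy duration.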