A text-to-speech (TTS) corpus for the Kashmiri language, derived from the IndicVoices-R and RASA speech datasets. It was created by GAASH-Lab and used to develop the Bolbosh neural TTS system, as documented in a 2026 paper.
Use Cases
- Training Kashmiri speech synthesis models on this corpus
- Benchmarking TTS systems for languages with specific orthographic challenges
- Studying multi-source speech data integration, since the corpus combines material from IndicVoices-R and RASA
- Developing open-source neural TTS systems for low-resource languages
Strengths
- Derived from two established speech data sources: IndicVoices-R and RASA
- Curated specifically for Kashmiri, a low-resource language
- Used to develop a documented, open-source neural TTS system (Bolbosh)
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count is not documented, so the corpus size and its suitability for a given task cannot be assessed before download
- Description metadata is limited; actual data quality requires manual inspection after download
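Because the fields and row count are undocumented, a first inspection pass after download can be sketched as below. This is a minimal sketch, assuming the data has already been loaded into an iterable of dictionaries, one per row. The helper `summarize_records` and the field names `text`, `audio_path`, and `duration_s` are hypothetical and are not taken from the corpus.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize_records(records: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Count rows and tally the Python value types observed in each field."""
    row_count = 0
    field_types: Dict[str, Counter] = defaultdict(Counter)
    for record in records:
        row_count += 1
        for name, value in record.items():
            # Record the type name so mixed or missing values become visible.
            field_types[name][type(value).__name__] += 1
    return {
        "row_count": row_count,
        "fields": {name: dict(types) for name, types in field_types.items()},
    }


# Hypothetical rows for illustration only; the real field names are undocumented.
sample = [
    {"text": "placeholder one", "audio_path": "clip_0001.wav", "duration_s": 3.2},
    {"text": "placeholder two", "audio_path": "clip_0002.wav", "duration_s": None},
]

summary = summarize_records(sample)
print(summary["row_count"])
for field, types in summary["fields"].items():
    print(field, types)
```

A field whose tally shows more than one type, or includes `NoneType`, is a candidate for closer manual inspection before training.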
Provenance
- Source: GAASH-Lab
- Collection Method: Curated combination of Kashmiri speech data from IndicVoices-R and RASA datasets
- Freshness: Last updated 2026-04-03 17:43:37; freshness should be verified
- Geography: Kashmiri language region