700 Hours of Hindi English Hinglish TTS Audio

Name: 700 Hours of Hindi English Hinglish TTS Audio
Creator: adjaysagar
Published: 2026-02-08T13:51:27
Keywords: Region: US, License: Apache 2.0



Description

700 hours of processed speech data for Hindi, English, and Hinglish (code-mixed) text-to-speech applications. The dataset, created by adjaysagar, includes train and validation manifests and a preprocessing script. It was last updated in February 2026.

Use Cases

Train a TTS model on 700 hours of audio data for Hindi, English, and Hinglish speech synthesis.
Use the provided train.jsonl and val.jsonl manifests to manage data splits for model training and validation.
Apply the included preprocessing script to replicate the dataset preparation pipeline for custom TTS projects.
Develop multilingual or code-mixed speech synthesis systems leveraging the Hinglish audio content.
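The listing names the train.jsonl and val.jsonl manifests but does not document their schema. Below is a minimal sketch for reading them. It assumes each file is JSON Lines (one JSON object per line, as the .jsonl extension suggests) and that every entry has a per-utterance `duration` field in seconds. That field name is hypothetical, borrowed from the NeMo-style manifest convention, so adjust it to whatever keys the files actually contain. Summing durations gives a quick check against the advertised 700 hours.

```python
import json
from pathlib import Path

# Hypothetical key: the manifest schema is not documented in the listing.
DURATION_KEY = "duration"  # assumed to hold seconds per utterance


def parse_manifest_lines(lines, source="<manifest>"):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    entries = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            entries.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"{source}:{line_no}: invalid JSON") from exc
    return entries


def load_manifest(path):
    """Read a .jsonl manifest file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_manifest_lines(f, source=str(path))


def total_hours(entries, key=DURATION_KEY):
    """Sum per-utterance durations (seconds) and convert to hours."""
    return sum(float(e.get(key, 0.0)) for e in entries) / 3600.0


if __name__ == "__main__":
    for split in ("train.jsonl", "val.jsonl"):
        p = Path(split)
        if p.exists():
            rows = load_manifest(p)
            print(f"{split}: {len(rows)} utterances, {total_hours(rows):.1f} h")
```

Separating parsing from file I/O keeps `parse_manifest_lines` usable on any iterable of strings, for example a manifest streamed from inside the archive.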

Strengths

Substantial 700-hour volume of audio data suitable for training TTS models.
Includes a preprocessing script, providing transparency into the data preparation methodology.
Covers three distinct linguistic categories: Hindi, English, and code-mixed Hinglish.

Limitations

The audio files are in a password-protected archive, so users must contact the maintainer to obtain access.
Specific details on audio quality metrics, speaker demographics, or recording conditions are not provided.
No information on file formats or sample rates is available, and apart from the Apache 2.0 license tag, no licensing terms are stated.
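Once the audio is in hand, the undocumented format details can be measured directly. Below is a sketch using Python's standard `wave` module. It assumes the files are uncompressed PCM WAV, which the listing does not confirm, and other formats would need a different reader. The helper names `wav_properties` and `summarize` are illustrative.

```python
import os
import wave
from collections import Counter


def wav_properties(source):
    """Return (sample_rate_hz, channels, sample_width_bytes, duration_s).

    `source` may be a filesystem path or a binary file object.
    Assumes uncompressed PCM WAV; `wave` raises wave.Error for most other encodings.
    """
    if isinstance(source, os.PathLike):
        source = os.fspath(source)  # older `wave` versions accept only str paths
    with wave.open(source, "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getsampwidth(), w.getnframes() / rate


def summarize(sources):
    """Count files per (sample rate, channels, sample width) combination."""
    counts = Counter()
    for src in sources:
        rate, channels, width, _ = wav_properties(src)
        counts[(rate, channels, width)] += 1
    return counts
```

For example, `summarize(pathlib.Path("audio").rglob("*.wav"))` would tally every WAV under a directory. The directory name is hypothetical. A single dominant key in the result suggests the corpus was resampled to one rate during preprocessing.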

Provenance

Source: huggingface
Collection Method: Processed speech data; specific gathering method unknown.
Freshness: Last updated in February 2026.

The primary data archive (voice_data.zip) is password-protected; users must contact the dataset maintainer for access.
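After the maintainer supplies the password, the archive can be checked and unpacked with Python's `zipfile` module. This is a minimal sketch, and `encrypted_members` and `extract_with_password` are hypothetical helper names. The standard library can decrypt only legacy ZipCrypto entries, so if voice_data.zip uses AES encryption, an external tool such as 7-Zip would be needed instead.

```python
import zipfile

# Bit 0 of the ZIP general-purpose flag marks an encrypted entry.
ENCRYPTED_FLAG = 0x1


def encrypted_members(archive):
    """List member names flagged as encrypted (archive: path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        return [i.filename for i in zf.infolist() if i.flag_bits & ENCRYPTED_FLAG]


def extract_with_password(archive, dest_dir, password):
    """Extract every member, decrypting ZipCrypto entries with `password`.

    AES-encrypted entries raise NotImplementedError in the standard library.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(path=dest_dir, pwd=password.encode("utf-8"))
```

Running `encrypted_members` first shows whether the password protects every file or only some of them before any extraction is attempted.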

Quality Score

Overall: D (36)
Description: 39
Source: 36
Reputation: 41
Access: 22

Community

Downloads: 13
Likes: 3
Views: 0

Dataset Info

Author: adjaysagar
Created: Feb 8, 2026
Updated: Feb 8, 2026
Last synced: Apr 8, 2026
