Gujarati Podcast ASR Dataset: 2,471 Hours of Processed Audio

Name: Gujarati Podcast ASR Dataset: 2,471 Hours of Processed Audio
Creator: InfoBayAI
Published: 2026-06-01T10:38:13
Keywords: Multilingual Speech, Audio, Large Scale, Gujarati Language, Podcast Audio, Speech Recognition

Available on 1 platform

Description

InfoBayAI's Gujarati Podcast ASR Dataset is a large-scale collection of 2,471 hours of processed Gujarati podcast audio recordings. The broader collection contains 57,568 hours of processed audio across 12 languages, capturing real-world interactions across diverse topics and formats. The dataset was last updated on June 2, 2026.

Use Cases

Train automatic speech recognition (ASR) models on 2,471 hours of Gujarati audio.
Develop conversational AI systems using real-world podcast interactions.
Study natural speech patterns and speaker variability in authentic podcast recordings.
Build multilingual speech models by combining this dataset with the other languages in the broader 12-language collection.

Strengths

Contains 2,471 hours of processed Gujarati podcast audio.
The broader collection includes 57,568 hours of audio across 12 languages.
Captures real-world interactions and natural speech patterns from podcast recordings.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and specific file formats are unknown, which may limit suitability assessment.
Data may reflect geographic or topical bias inherent to the source podcast platforms.
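
Because file formats and counts are undocumented, a first step after download is to inventory what actually arrived. The following is a minimal sketch, assuming the dataset unpacks into a local directory tree; the function name `inventory_by_extension` and the path `gujarati_podcast_asr` are hypothetical placeholders, not part of any documented layout.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root):
    """Count files under `root` (recursively) by lowercase file extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


if __name__ == "__main__":
    # Placeholder path: replace with wherever the download was extracted.
    for ext, n in inventory_by_extension("gujarati_podcast_asr").most_common():
        print(f"{ext}\t{n}")
```

The resulting counts show which audio and metadata formats are present and roughly how many items there are, which is the information the listing itself does not provide.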

Provenance

Source: InfoBayAI
Collection Method: Processed from podcast audio recordings, likely sourced from various platforms.
Freshness: Last updated 2026-06-02 06:25:47; freshness should be verified.

License is unknown; terms of use must be verified before application.

Quality Score

Overall: C (41)
Description: 51
Source: 36
Reputation: 41
Access: 26

Community

Downloads: 55
Likes: 1
Views: 0

Dataset Info

Author: InfoBayAI
Created: Jun 1, 2026
Updated: Jun 2, 2026
Last synced: Jun 4, 2026
