Description
ESpeech's Espeech Podcasts dataset contains 3,200 hours of processed Russian-language audio segments extracted from various podcasts. The audio has a 44.1 kHz sample rate and is distributed as segmented audio files with JSON metadata. The dataset was last updated on November 25, 2025.
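For a rough sense of scale, the stated duration and sample rate can be turned into a sample count and an uncompressed size. The sketch below is illustrative only: the bit depth (16-bit) and channel count (mono) are assumptions that this listing does not state, and the actual download size depends on the file format and compression used.

```python
# Rough scale estimate. Bit depth and channel count are assumptions,
# not documented properties of the dataset.
HOURS = 3_200            # from the dataset description
SAMPLE_RATE_HZ = 44_100  # from the dataset description
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
CHANNELS = 1             # assumption: mono


def uncompressed_size_bytes(hours, sample_rate_hz, bytes_per_sample, channels):
    """Return the raw PCM size in bytes for the given duration and format."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels


total_samples = HOURS * 3600 * SAMPLE_RATE_HZ
size_bytes = uncompressed_size_bytes(
    HOURS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS
)

print(f"{total_samples:,} samples per channel")
print(f"{size_bytes / 1e12:.2f} TB uncompressed (under the assumptions above)")
```

Under these assumptions the raw audio comes to about 1 TB. Stereo or 24-bit audio would scale that figure proportionally, and compressed formats would reduce it.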
Use Cases
Train text-to-speech (TTS) models on the 3,200 hours of Russian speech audio.
Develop automatic speech recognition (ASR) systems using the segmented podcast audio.
Conduct audio quality assessment research on the processed 44.1 kHz audio samples.
Strengths
Contains 3,200 hours of Russian speech audio, providing substantial volume for model training.
Audio is processed at a consistent 44.1 kHz sample rate, the standard rate for CD-quality audio.
Includes JSON metadata for each audio segment, suggesting structured information beyond raw audio.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
Data may reflect source bias inherent to the specific podcasts used.
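Because the metadata fields are undocumented and the row count is unknown, a first step after download is to survey the JSON files directly. The sketch below is a minimal example under stated assumptions: it assumes each segment's metadata is a separate `.json` file holding one JSON object, which this listing does not confirm. The function name `summarize_metadata` and the directory path are hypothetical.

```python
import json
from collections import Counter, defaultdict
from pathlib import Path


def summarize_metadata(root):
    """Survey the JSON metadata files under `root`.

    Returns the number of object-valued JSON files, how often each
    top-level key appears, and the Python value types seen per key.
    Assumes one JSON object per file (an assumption, not a documented fact).
    """
    key_counts = Counter()
    key_types = defaultdict(set)
    n_files = 0
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            record = json.load(f)
        if not isinstance(record, dict):
            continue  # skip files whose top level is not a JSON object
        n_files += 1
        for key, value in record.items():
            key_counts[key] += 1
            key_types[key].add(type(value).__name__)
    return n_files, key_counts, {k: sorted(v) for k, v in key_types.items()}
```

A call such as `summarize_metadata("path/to/dataset")` (hypothetical path) gives a file count that can stand in for the missing row count, and the key frequencies show which fields are present on every segment and which are optional.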
Provenance
Source
ESpeech
Collection Method
Processed audio segments extracted from various podcasts.
Time Range
Not specified.
Freshness
Last updated 2025-11-25 11:16:26; freshness should be verified.
Geography
Not specified.
License
Unknown; users must verify licensing terms before use.