Open STT: Open Speech-to-Text Dataset

Name: Open STT: Open Speech-to-Text Dataset
Creator: snakers4
Published: 2019-04-11T08:26:17
Keywords: Speech To Text, Russian, STT, Speech Recognition, Automatic Speech Recognition

Description

Open STT contains more than 20,000 hours of Russian speech audio paired with text transcriptions, drawn from domains such as YouTube, audiobooks, and radio. The collection includes over 2 million utterances, categorized by source and acoustic conditions.

Use Cases

Train acoustic models for Russian speech recognition using the audio files and corresponding text labels
Develop noise-tolerant speech systems by leveraging the variety of recording conditions and source types
Evaluate speech-to-text performance across different domains like audiobooks or radio broadcasts

Strengths

20,000+ hours of audio data provided in WAV or MP3 formats
Includes metadata mapping audio segments to text transcriptions and source categories
Covers diverse acoustic environments including studio recordings, phone calls, and noisy public spaces
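
The metadata described above (audio segment, transcription, source category) lends itself to simple per-domain filtering before training or evaluation. The following is a minimal sketch only: the column names `audio_path`, `transcript`, `source`, and `duration_sec`, the sample rows, and the helper functions are all hypothetical, since this listing does not specify the actual manifest layout.

```python
import csv
import io
from collections import defaultdict

# Hypothetical manifest layout; the real Open STT metadata columns may differ.
SAMPLE_MANIFEST = """audio_path,transcript,source,duration_sec
audio/0001.wav,привет мир,audiobooks,2.5
audio/0002.wav,добрый день,radio,3.0
audio/0003.wav,как дела,audiobooks,1.5
"""


def load_manifest(handle):
    """Parse manifest rows into dicts, converting the duration to float."""
    rows = []
    for row in csv.DictReader(handle):
        row["duration_sec"] = float(row["duration_sec"])
        rows.append(row)
    return rows


def filter_by_source(rows, source):
    """Keep only the utterances that come from one source category."""
    return [row for row in rows if row["source"] == source]


def hours_by_source(rows):
    """Sum audio duration per source category, in hours."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["source"]] += row["duration_sec"] / 3600.0
    return dict(totals)


rows = load_manifest(io.StringIO(SAMPLE_MANIFEST))
audiobook_rows = filter_by_source(rows, "audiobooks")
totals = hours_by_source(rows)
```

With a real manifest, `io.StringIO(SAMPLE_MANIFEST)` would be replaced by an open file handle, and the per-source totals could be used to balance training data or to report evaluation results separately for each domain.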

Quality Score

D19

Description: 15
Source: 19
Reputation: 20
Access: 27

Community

821 likes

0 views

Dataset Info

Author: snakers4
Created: Apr 11, 2019
Updated: Mar 11, 2022
Language: Python
Last synced: Jun 14, 2026
