UA_Corpus_labels_ASR: Ukrainian Speech Recognition Corpus
Available on 1 platform
Description
A corpus of Ukrainian audio with associated labels, most likely intended for automatic speech recognition (ASR). The dataset is hosted on Kaggle, but the available metadata does not state its size, origin, or creation date. The listing suggests it pairs audio files with text transcriptions, though the column definitions are not documented.
Use Cases
Train an automatic speech recognition (ASR) model for Ukrainian (inferred from domain, verify after download)
Fine-tune a pre-trained multilingual ASR model on Ukrainian audio (inferred from domain, verify after download)
Benchmark the speech-to-text accuracy of Ukrainian-language ASR models (inferred from domain, verify after download)
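For the benchmarking use case, speech-to-text accuracy is commonly reported as word error rate (WER): the word-level edit distance between a reference transcription and a model's output, divided by the number of reference words. The sketch below is a minimal standard-library version, not something shipped with this dataset. The function name and the whitespace tokenization are assumptions made for illustration, and no text normalization (case, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no case folding or punctuation stripping.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

Calling word_error_rate on each reference/hypothesis pair gives a per-utterance score. A corpus-level WER is usually computed by summing edit distances and reference lengths across utterances rather than averaging the per-utterance values.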
Strengths
Published on Kaggle, a major platform for sharing machine learning datasets.
Limitations
Metadata is minimal; actual content requires verification after download.
Row count, file size, and column definitions are unknown, which makes it hard to assess suitability before download.
License, author, and last update information are unavailable.
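Because the row count, file size, and layout are unknown, a first verification step after download could be a simple inventory of the extracted files. The sketch below is one way to do that with the Python standard library only. The function name is hypothetical, and nothing is assumed about which file types the dataset actually contains.

```python
from collections import Counter
from pathlib import Path


def summarize_download(root):
    """Count files by extension and total their size under a directory.

    Returns (counts, total_bytes), where counts maps a lowercased file
    extension such as ".wav" to the number of files with that extension.
    """
    root = Path(root)
    counts = Counter()
    total_bytes = 0
    for path in root.rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a placeholder key.
            counts[path.suffix.lower() or "(no extension)"] += 1
            total_bytes += path.stat().st_size
    return dict(counts), total_bytes
```

Pointing summarize_download at the directory where the archive was extracted shows how many audio-like and text-like files are present, which is a quick check on whether the audio-plus-transcription assumption holds.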
Provenance
Source
Kaggle
Collection Method
Method of data gathering is unknown.
Time Range
Temporal coverage is unknown.
Freshness
Last updated date is unknown; freshness unverified.
Geography
Spatial coverage is unknown, but the Ukrainian language suggests a primary focus on Ukraine.
License
License restrictions are unknown; users must verify before commercial use.