Tajik ASR Corpus v0 is a deduplicated automatic speech recognition (ASR) dataset assembled from multiple sources. Created by Peacockery, it combines FLEURS-derived speech, Mozilla Common Voice 25 Tajik, and augmented data from Muhtasham Tajik ASR. Each data split is provided as a TSV file with an accompanying audio directory, and a SQLite version adds normalized fields.
The license is unknown. Because the corpus draws on several upstream sources, clarify the licensing terms before any use.
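
The TSV-plus-audio-directory layout described above could be read with a few lines of Python. The following is a minimal sketch only: the column names `path` and `sentence` follow the Common Voice convention and are assumptions, not confirmed for this corpus, and `load_split` is a hypothetical helper rather than part of any released tooling.

```python
import csv
from pathlib import Path


def load_split(tsv_path, audio_dir, path_col="path", text_col="sentence"):
    """Pair each TSV row with its audio file.

    path_col and text_col are ASSUMED column names (Common Voice style);
    check the actual TSV header and adjust them before use.
    Returns (rows, missing): rows holds (audio_path, transcript) tuples,
    missing holds audio paths listed in the TSV but absent on disk.
    """
    audio_dir = Path(audio_dir)
    rows, missing = [], []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks inside transcripts as literal text.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            audio_path = audio_dir / row[path_col]
            if audio_path.is_file():
                rows.append((audio_path, row[text_col]))
            else:
                missing.append(audio_path)
    return rows, missing
```

Calling `load_split("train.tsv", "audio")` (both paths hypothetical) returns the usable pairs along with any audio files that the TSV references but that are not present, which is a quick integrity check before training.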