Hakka-language audio recordings and transcriptions form a pre-training dataset for the Taiwan-Tongues-ASR-CE project. The dataset is packaged in WebDataset format for direct use with PyTorch and Hugging Face libraries. It was created by the adi-gov-tw organization and last updated in December 2025.
The data is stored as WebDataset tar files, so working with it requires familiarity with that format or with a library that reads it. License information is unavailable.
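For readers new to the format, the WebDataset convention can be sketched with the Python standard library alone: a shard is an ordinary tar archive, and consecutive files that share a basename are treated as the fields of one sample. The sketch below is illustrative only. The file names (`utt0001.wav`, `utt0001.txt`), the field extensions, and the helper names `split_key`, `iter_samples`, and `build_demo_shard` are assumptions made for the demo; the actual shard names and fields of this dataset are not stated in the listing and should be checked against the real files.

```python
import io
import tarfile
from typing import Dict, Iterator, Tuple


def split_key(name: str) -> Tuple[str, str]:
    """Split a tar member name into (sample key, field extension).

    By WebDataset convention the key is everything up to the first dot
    in the basename, and the extension is everything after it.
    """
    directory, _, base = name.rpartition("/")
    stem, _, ext = base.partition(".")
    key = f"{directory}/{stem}" if directory else stem
    return key, ext


def iter_samples(fileobj) -> Iterator[Dict[str, object]]:
    """Yield one dict per sample, grouping consecutive members by key."""
    current_key = None
    sample: Dict[str, object] = {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = split_key(member.name)
            if key != current_key:
                if sample:
                    yield sample
                current_key, sample = key, {"__key__": key}
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield sample


def build_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory shard with placeholder (not real) contents."""
    files = [
        ("utt0001.wav", b"fake-audio-bytes-1"),  # hypothetical audio field
        ("utt0001.txt", b"placeholder transcript 1"),  # hypothetical text field
        ("utt0002.wav", b"fake-audio-bytes-2"),
        ("utt0002.txt", b"placeholder transcript 2"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


samples = list(iter_samples(build_demo_shard()))
for s in samples:
    fields = sorted(k for k in s if k != "__key__")
    print(s["__key__"], fields)
```

To inspect a real shard, the same `iter_samples` function can be given a file opened in binary mode, for example `open(path, "rb")`, where `path` points to a downloaded tar file. In practice, a dedicated loader such as the `webdataset` package is likely the more convenient route, since it also handles decoding and shuffling.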