This dataset provides audio and text data for pre-training Automatic Speech Recognition (ASR) models, specifically for Taiwanese Mandarin. It is split into training and test subsets stored in the WebDataset format, which integrates directly with PyTorch and Hugging Face tools. The dataset is published by the author 'adi-gov-tw' and was last updated in December 2025.
The data is stored as WebDataset tar files, so users need to know this format, or a library that reads it, to load the data. The full description is hosted externally on Hugging Face.
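This listing does not document the shard layout. The sketch below therefore shows only the general WebDataset convention, using Python's standard `tarfile` module. Each sample is a group of consecutive tar members that share a key, such as `utt_000.flac` and `utt_000.txt`, and each file extension names a field. The file names, the `flac`/`txt` extensions, and the helpers `iter_samples` and `build_demo_shard` are hypothetical illustrations, not this dataset's actual layout.

```python
import io
import re
import tarfile
from typing import Dict, Iterator, Optional, Tuple

# WebDataset naming convention: "<key>.<ext>". The key is the path up to the
# first dot in the file's basename; the extension is everything after it.
_KEY_EXT = re.compile(r"^((?:.*/|)[^./]+)\.([^/]*)$")


def iter_samples(fileobj) -> Iterator[Tuple[Optional[str], Dict[str, bytes]]]:
    """Stream a WebDataset-style shard, yielding (key, {ext: bytes}) per sample.

    Consecutive tar members that share a key are grouped into one sample.
    """
    key: Optional[str] = None
    fields: Dict[str, bytes] = {}
    # "r|*" reads the tar as a forward-only stream (transparent compression).
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            match = _KEY_EXT.match(member.name)
            if match is None:
                continue  # not a "<key>.<ext>" file; skip it
            member_key, ext = match.groups()
            if member_key != key and fields:
                yield key, fields  # the previous sample is complete
                fields = {}
            key = member_key
            # In stream mode the member's data must be read before advancing.
            fields[ext] = tar.extractfile(member).read()
    if fields:
        yield key, fields


def build_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory shard; names and extensions are hypothetical."""
    files = [
        ("utt_000.flac", b"<fake audio 0>"),
        ("utt_000.txt", "你好".encode("utf-8")),
        ("utt_001.flac", b"<fake audio 1>"),
        ("utt_001.txt", "謝謝".encode("utf-8")),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)  # rewind so the shard can be read back
    return buf


if __name__ == "__main__":
    for sample_key, sample in iter_samples(build_demo_shard()):
        print(sample_key, sorted(sample), sample["txt"].decode("utf-8"))
```

In practice, the `webdataset` package and the Hugging Face `datasets` library's `webdataset` loader perform this grouping, along with decoding and shard handling. The standard-library version is only meant to show what those tools do with the tar files.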