Available on 1 platform
This dataset comprises between 1,000 and 10,000 audio-transcript pairs for Air Traffic Control (ATC) speech recognition, compiled by user jacktol in 2025. It merges the UWB ATC Corpus and the ATCO2 1-Hour Test Subset into a fine-tuning-ready format. Each record pairs a cleanly segmented 16 kHz .wav file with its text utterance.
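Since the audio is described as 16 kHz .wav, it can be worth confirming the sample rate of an extracted clip before fine-tuning. The following is a minimal sketch using only Python's standard `wave` module; the helper names `wav_info` and `is_16khz` are hypothetical, and it assumes the clip has already been written to disk as an uncompressed PCM .wav file.

```python
import wave

# Sample rate stated in the dataset description.
EXPECTED_RATE_HZ = 16_000


def wav_info(path):
    """Return basic header information for a PCM .wav file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        frames = wf.getnframes()
        return {
            "sample_rate": rate,
            "channels": wf.getnchannels(),
            "duration_s": frames / rate,
        }


def is_16khz(path):
    """True if the file's header reports a 16 kHz sample rate."""
    return wav_info(path)["sample_rate"] == EXPECTED_RATE_HZ
```

A clip that fails this check would likely need resampling before being passed to a model that expects 16 kHz input.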
The dataset is provided in Parquet format and is compatible with the Hugging Face datasets library, Polars, and Dask.
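As a rough illustration of what that compatibility looks like in practice, the sketch below reads a single Parquet file with each of the three libraries. The function name `load_parquet` and the file name in the usage note are hypothetical, and the dataset's actual file layout and column names are not stated here, so inspect the schema after loading instead of assuming field names. Each library is imported only when its backend is requested, so only the one in use needs to be installed.

```python
from pathlib import Path


def load_parquet(path, backend="polars"):
    """Read one Parquet file with the chosen backend (hypothetical helper)."""
    path = str(Path(path))
    if backend == "polars":
        import polars as pl
        return pl.read_parquet(path)      # eager polars.DataFrame
    if backend == "dask":
        import dask.dataframe as dd
        return dd.read_parquet(path)      # lazy dask DataFrame
    if backend == "datasets":
        from datasets import load_dataset
        # The generic "parquet" builder reads local files into a Dataset.
        return load_dataset("parquet", data_files=path, split="train")
    raise ValueError(f"unknown backend: {backend!r}")
```

For example, `load_parquet("train.parquet", backend="datasets")` would return a Hugging Face `Dataset`, assuming a local file of that name exists.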