Description
A dataset combining the ATCO2-ASR and ATCOSIM collections, likely containing air traffic control speech audio. It was created by jlvdoorn and last updated on July 7, 2023. The files are split into an 80% training partition and a 20% validation partition, and some files carry additional metadata.
Use Cases
Train automatic speech recognition (ASR) models on air traffic control communication audio.
Validate ASR model performance on the held-out 20% validation partition.
Analyze the specialized vocabulary and speech patterns of aviation communications.
Develop noise-robust speech processing techniques, since the audio likely reflects real-world radio transmission conditions.
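For the validation use case, one common way to score an ASR model is word error rate (WER). The dataset description does not name a metric, so the sketch below is only an illustration: it computes WER as word-level edit distance divided by the reference length, and the function name and sample transcripts are invented for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Invented transcripts, not taken from the dataset.
reference = "climb flight level three two zero"
hypothesis = "climb flight level three zero"
print(word_error_rate(reference, hypothesis))
```

Lowercasing before comparison is a simplification; real transcripts may need further normalization (digits, callsigns, punctuation) that depends on how the dataset's text is formatted.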
Strengths
Provides a defined 80%/20% training and validation split, so models can be trained and evaluated without constructing a custom split.
Combines two established sources (ATCO2-ASR and ATCOSIM), potentially increasing data diversity.
Some files include supplementary metadata in an 'info' file, which may add context.
Limitations
Description metadata is limited; actual data quality can only be assessed by manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and total size are unknown, which may limit suitability assessment for large-scale projects.
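Because size and structure are undocumented, a first inspection pass after download could simply tally files and bytes per file extension. The sketch below assumes the dataset has been downloaded to a local directory; the on-disk layout and file formats are not described in the source, so the function name and the path in the usage line are hypothetical.

```python
from collections import Counter
from pathlib import Path


def summarize_directory(root: str) -> dict:
    """Count files and total bytes per (lowercased) file extension under root."""
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: {"files": counts[ext], "bytes": sizes[ext]} for ext in sorted(counts)}


if __name__ == "__main__":
    # Hypothetical location of the downloaded copy; adjust to the real path.
    for ext, stats in summarize_directory("./downloaded_dataset").items():
        print(f"{ext}: {stats['files']} files, {stats['bytes']} bytes")
```

This only reports file counts and sizes; column names and field semantics still have to be read from the files themselves.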
Provenance
Source
Combination of ATCO2-ASR and ATCOSIM datasets.
Collection Method
Files were selected randomly to create the 80/20 train/validation split.
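A random 80/20 file-level split of this kind can be sketched as follows. The source does not say which tool or random seed the author used, so the function name, the seed, and the sample file names are assumptions made for illustration only.

```python
import random


def split_files(files, train_fraction=0.8, seed=0):
    """Shuffle file names reproducibly and cut them into train/validation lists."""
    shuffled = sorted(files)               # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)  # local RNG, leaves global random state untouched
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names, not taken from the dataset.
files = [f"utterance_{i:03d}.wav" for i in range(10)]
train, validation = split_files(files)
print(len(train), len(validation))
```

Splitting by file keeps each recording entirely in one partition, but it does not by itself prevent the same speaker from appearing in both partitions.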
Time Range
Not specified.
Freshness
Last updated 2023-07-07 07:06:05; check the source for any later revisions before relying on this date.
Geography
Not specified.
License
Unknown; users must verify permissions before use.