Name: Unified Tag System For Speech Music And Environmental Sounds
Creator: AudenAI
Published: 2026-03-09T08:40:15
Keywords: library:polars, Speech Music Sounds, modality:audio, language:en, Audio Captions, modality:text, size_categories:100K<n<1M, library:mlcroissant, Audio Tagging, task_categories:audio-classification, library:datasets, library:pandas, Unified Labeling, Audio, region:us, Large Scale, Unified Tag System, arxiv:2511.16757, JSON, Audio Captioning, license:mit, Synthetic, Multimodal

Description

The Unified Tag System (UTS) provides a single tag vocabulary spanning speech, music, and environmental sounds. The tags are derived from high-fidelity audio captions. AudenAI created the dataset by applying the Qwen3-Omni-Captioner and Qwen2.5-7B-Instruct models to a subset of CaptionStew. The dataset was last updated on March 11, 2026.

Use Cases

Training audio classification models on the unified tag vocabulary spanning speech, music, and environmental sounds.
Developing multi-modal models using the link between audio samples and their structured text captions.
Benchmarking model performance on a large-scale, data-driven label system derived from a 400K-sample audio caption subset.
Pre-training audio foundation models on a dataset that integrates multiple audio domains under a single taxonomy.
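
As a rough illustration of the classification use case above, the sketch below turns per-sample tag lists into multi-hot training targets. The records, the `audio_id` and `tags` field names, and the example tags are all hypothetical, since the actual column layout of the dataset is not documented here.

```python
# Hypothetical records: the field names and tag values below are assumptions
# made for illustration, not the dataset's documented schema.
records = [
    {"audio_id": "a1", "tags": ["speech", "male voice"]},
    {"audio_id": "a2", "tags": ["music", "piano"]},
    {"audio_id": "a3", "tags": ["speech", "rain"]},
]


def build_vocab(records):
    """Map each distinct tag to an index (sorted so the mapping is reproducible)."""
    tags = sorted({tag for record in records for tag in record["tags"]})
    return {tag: index for index, tag in enumerate(tags)}


def multi_hot(tags, vocab):
    """Return a 0/1 vector with a 1 at the index of every tag found in vocab."""
    vector = [0] * len(vocab)
    for tag in tags:
        index = vocab.get(tag)
        if index is not None:  # silently skip tags outside the vocabulary
            vector[index] = 1
    return vector


vocab = build_vocab(records)
targets = [multi_hot(record["tags"], vocab) for record in records]
```

Each row of `targets` can then serve as the label vector for a multi-label classifier, with one output unit per tag in `vocab`.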

Strengths

Derived from a 400,000-sample subset of the CaptionStew dataset.
Unified label vocabulary created by parsing high-fidelity captions with the Qwen2.5-7B-Instruct model.

Limitations

Exact row count and column details are not documented here; the keywords indicate only a size between 100K and 1M samples and a JSON format.
Geographic and temporal coverage of the source audio data is unspecified.
Potential label noise inherited from the automated caption parsing process.
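
One common way to limit the impact of such label noise is to drop tags that occur in very few samples. The sketch below shows that generic heuristic; it is not a step described by the dataset authors. The `min_count` threshold and the example tags are assumptions chosen for illustration.

```python
from collections import Counter


def filter_rare_tags(tag_lists, min_count=2):
    """Drop tags that appear in fewer than min_count samples.

    Each tag is counted at most once per sample, so repeats within a
    single sample do not inflate its frequency.
    """
    counts = Counter(tag for tags in tag_lists for tag in set(tags))
    return [[tag for tag in tags if counts[tag] >= min_count] for tags in tag_lists]


# Hypothetical tag lists; "pianno" stands in for a noisy, one-off parse result.
tag_lists = [["speech", "rain"], ["speech", "pianno"], ["rain", "music"]]
cleaned = filter_rare_tags(tag_lists, min_count=2)
```

A threshold like this also removes legitimately rare tags, so it trades coverage of the long tail for a cleaner vocabulary.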

Provenance

Source: 400K-sample subset of CaptionStew, processed by AudenAI.
Collection Method: Audio captions generated by Qwen3-Omni-Captioner, parsed into structured tags using Qwen2.5-7B-Instruct.
Freshness: Last updated March 11, 2026.

The license should be verified before use: the keywords include an MIT license tag, but no license text is reproduced here. The full description is hosted externally on the Hugging Face dataset page.
