UTS provides a unified tag vocabulary, derived from high-fidelity audio captions, that bridges speech, music, and environmental sounds. It was created by AudenAI using the Qwen3-Omni-Captioner and Qwen2.5-7B-Instruct models on a subset of CaptionStew. The dataset was last updated on March 11, 2026.
License information is unknown and should be verified before use. The full description is hosted externally on the Hugging Face dataset page.