A collection of short audio snippets extracted from publicly shared songs generated by the Suno AI model. All excerpts, ranging from 3 to 30 seconds, have been captioned using the Gemini Flash 2.0 model to produce human-readable audio descriptions. The dataset was created by laion and last updated on Hugging Face in November 2025.
Use Cases
- Train audio captioning models on the captioned AI-generated music snippets.
- Evaluate audio description models against the provided captions.
- Study the characteristics of AI-generated music using the Suno-sourced audio.
- Develop multimodal models that link audio and text using the paired audio-caption structure.
Strengths
- Clips are short (3 to 30 seconds) and each one is captioned, which suits training and evaluation tasks.
- Captions are generated by Gemini Flash 2.0 and written as human-readable audio descriptions.
- Dataset is licensed under Apache 2.0, permitting commercial use.
Limitations
- Row count and total dataset size are unknown, which may limit suitability assessment.
- Column-level documentation is absent; field semantics must be inferred after download.
- Data may reflect biases inherent in the source AI models (Suno and Gemini).
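Because column-level documentation is absent, a first pass over a few downloaded rows can help infer the field names and value types. The following is a minimal sketch, assuming rows are available as plain Python dictionaries. The helper `summarize_fields` and the column names `audio` and `caption` are hypothetical and are not taken from the dataset itself.

```python
from collections import defaultdict


def summarize_fields(rows):
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Hypothetical rows: the real column names are not documented in the card.
sample_rows = [
    {"audio": b"\x00\x01", "caption": "Upbeat synth pop with female vocals."},
    {"audio": b"\x02\x03", "caption": None},
]

for field, types in summarize_fields(sample_rows).items():
    print(field, types)
```

A field that reports more than one type (for example both `str` and `NoneType`) is a hint that some rows have missing values and should be checked before training.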
Provenance
- Source
- Clips are randomly cut from songs referenced in the nyuuzyou/suno repository on Hugging Face.
- Collection Method
- Audio snippets were extracted from Suno-generated songs and captioned using Gemini Flash 2.0.
- Time Range
- Not specified.
- Freshness
- Last updated 2025-11-08 21:36:55; freshness should be verified.
- Geography
- Not specified.
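The Source and Collection Method entries describe clips of 3 to 30 seconds cut at random from full songs, but the card does not document how the cut points were chosen. The sketch below shows one plausible way to pick such a window. It is an illustration under that assumption, not the dataset authors' procedure, and the function `random_clip_window` and its constants are hypothetical.

```python
import random

MIN_CLIP_S = 3.0   # shortest clip length stated in the card
MAX_CLIP_S = 30.0  # longest clip length stated in the card


def random_clip_window(song_duration_s, rng):
    """Return (start, end) in seconds for a random clip, or None if the song is too short."""
    if song_duration_s < MIN_CLIP_S:
        return None
    # Assumption: clip length is drawn uniformly, capped by the song length.
    length = rng.uniform(MIN_CLIP_S, min(MAX_CLIP_S, song_duration_s))
    # Assumption: the start is drawn uniformly so the clip fits inside the song.
    start = rng.uniform(0.0, song_duration_s - length)
    return start, start + length


rng = random.Random(0)  # seeded so the cut is reproducible
window = random_clip_window(180.0, rng)
print(window)
```

Seeding the generator makes the cuts reproducible, which matters if the same clips need to be regenerated from the source songs later.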