BASS is a benchmark dataset for evaluating music understanding and reasoning in audio language models. It comprises 2,658 questions across 12 tasks and 4 categories, covering 1,993 unique songs and over 138 hours of music. The dataset was created by author 'oreva' and last updated on 2026-04-08.
Use Cases
- Benchmark model performance on music structure reasoning across the 12 defined tasks.
- Evaluate semantic reasoning about music across the dataset's 4 question categories.
- Train or fine-tune models for music information retrieval using the 1,993 unique songs.
- Analyze model generalization across different musical genres and structures using the more than 138 hours of audio.
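As a rough illustration of the benchmarking use case, per-task accuracy could be computed along the lines sketched below. The field names `task`, `answer`, and `prediction` are assumptions made for this example, as are the toy task names; the dataset's actual column names are not documented and must be checked after download.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return {task: fraction of records whose prediction equals the answer}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        task = rec["task"]  # hypothetical field name
        total[task] += 1
        # "prediction" and "answer" are hypothetical field names as well.
        if rec["prediction"] == rec["answer"]:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records invented for illustration; not taken from the dataset.
toy = [
    {"task": "structure", "answer": "A", "prediction": "A"},
    {"task": "structure", "answer": "B", "prediction": "C"},
    {"task": "lyrics", "answer": "D", "prediction": "D"},
]
print(accuracy_by_task(toy))
```

Reporting one score per task rather than a single aggregate makes it easier to see which of the tasks a model struggles with.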
Strengths
- Contains 2,658 structured questions for systematic evaluation.
- Covers a diverse set of 1,993 unique songs.
- Includes over 138 hours of music audio for analysis.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
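- Row count is not reported in the dataset metadata, so the 2,658-question figure in the description cannot be confirmed without downloading the data; this may limit suitability assessment.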
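Because column-level documentation is absent, a first view of the fields can be obtained by summarizing the keys and value types seen in a sample of rows, and the row count can be checked at the same time. This is a minimal sketch, assuming the rows are already available as Python dictionaries (for example after loading the data with a library of your choice); the sample field names below are invented for illustration and are not the dataset's real columns.

```python
def summarize_fields(rows):
    """Map each field name to the sorted list of Python type names observed."""
    seen = {}
    for row in rows:
        for key, value in row.items():
            seen.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Invented sample rows; the real column names must be inspected after download.
sample = [
    {"question": "Which section repeats?", "options": ["A", "B"], "duration": 12.5},
    {"question": "What follows the chorus?", "options": None, "duration": 30},
]

print(len(sample))               # number of rows in the sample
print(summarize_fields(sample))  # field name -> observed type names
```

Fields that show more than one type (or `NoneType`) are worth a closer look before they are used in an evaluation pipeline.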
Provenance
- Source: Hugging Face
- Collection Method: Likely compiled for benchmarking purposes; the specific gathering method is unknown.
- Time Range: not specified
- Freshness: Last updated 2026-04-08 07:16:10; freshness should be verified.
- Geography: not specified