Description
A large-scale labeled music dataset built on the Creative-Commons subset of the Free Music Archive. Every track has been automatically annotated with attributes such as lyrics, genre, mood, instruments, tempo, and key using Google Gemini (gemini-flash-latest). The dataset was submitted to the Uncharted Data Challenge hosted by Adaption Labs and was last updated on 2026-04-15.
Use Cases
Train music genre classifiers on the provided genre labels.
Develop mood detection models from the mood annotations.
Analyze the relationship between lyrical content and musical attributes like tempo and key.
Build multi-task learning models for joint prediction of instruments, tempo, and key.
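As a rough illustration of the multi-task use case, the sketch below turns per-track annotations into training targets: an integer class for genre, a multi-hot vector for instruments, and a float for tempo. The field names (`genre`, `instruments`, `tempo`), their types, and the toy records are assumptions made up for this example; the dataset's real schema is undocumented and must be checked after download.

```python
# Hypothetical records: field names and values are invented for illustration
# and are NOT taken from the dataset, whose schema is undocumented.
toy_tracks = [
    {"genre": "rock", "instruments": ["guitar", "drums"], "tempo": 120.0},
    {"genre": "jazz", "instruments": ["piano", "drums"], "tempo": 96.0},
    {"genre": "rock", "instruments": ["guitar"], "tempo": 140.0},
]


def build_vocab(values):
    """Map each distinct label to an integer index, in sorted label order."""
    return {label: i for i, label in enumerate(sorted(set(values)))}


def encode_targets(tracks):
    """Encode genre as a class id, instruments as multi-hot, tempo as float."""
    genre_vocab = build_vocab(t["genre"] for t in tracks)
    instrument_vocab = build_vocab(
        name for t in tracks for name in t["instruments"]
    )
    encoded = []
    for t in tracks:
        # One slot per known instrument; set to 1 where the track has it.
        multi_hot = [0] * len(instrument_vocab)
        for name in t["instruments"]:
            multi_hot[instrument_vocab[name]] = 1
        encoded.append(
            {
                "genre_id": genre_vocab[t["genre"]],
                "instruments": multi_hot,
                "tempo": float(t["tempo"]),
            }
        )
    return genre_vocab, instrument_vocab, encoded


genre_vocab, instrument_vocab, encoded = encode_targets(toy_tracks)
```

Because the labels were generated by a model rather than by human annotators, it may be worth normalizing spelling and casing of label strings before building the vocabularies.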
Strengths
Large-scale dataset built on a known Creative-Commons music source (Free Music Archive).
Automatically annotated with multiple attributes (lyrics, genre, mood, instruments, tempo, key) by a single, named model (Google Gemini, gemini-flash-latest).
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count and dataset size are unknown, which makes it hard to assess suitability before download.
Data may reflect biases inherent in the source archive and in the automated labeling process.
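Since column-level documentation is absent, a first pass after download could simply profile each field. The sketch below is one minimal way to do that, assuming the data can be loaded as a list of dictionaries (for example from JSON Lines or CSV); the helper name `summarize_fields` and the sample records are hypothetical and not part of the dataset.

```python
def summarize_fields(records, max_examples=3):
    """Collect observed types, non-null counts, and a few example values per field."""
    summary = {}
    for record in records:
        for field, value in record.items():
            info = summary.setdefault(
                field, {"types": set(), "non_null": 0, "examples": []}
            )
            if value is None:
                continue  # field is present but empty for this record
            info["types"].add(type(value).__name__)
            info["non_null"] += 1
            # Keep a small set of distinct example values for manual review.
            if value not in info["examples"] and len(info["examples"]) < max_examples:
                info["examples"].append(value)
    return summary


# Made-up sample records, used only to show the shape of the output.
sample = [
    {"mood": "calm", "tempo": 90},
    {"mood": None, "tempo": 90},
    {"mood": "dark", "tempo": 128.5},
]
profile = summarize_fields(sample)
```

Fields that show mixed types or many nulls are good candidates for closer inspection before any training run.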
Provenance
Source
Creative-Commons subset of the Free Music Archive (FMA).
Collection Method
Automated annotation using Google Gemini (gemini-flash-latest).
Freshness
Last updated 2026-04-15 13:41:48 (time zone not stated); check the dataset page for a more recent version before use.
License
Unknown; users must verify the license terms on the dataset page before use.