Fattah Golden Superset is a large-scale supervised fine-tuning dataset built by Nomeda Labs for training the Fattah family of coding and agentic coding models. The dataset is described as a labeled superset with no baked-in training ratios, allowing researchers to filter on capability columns to create custom mixtures. The dataset was last updated on June 1, 2026.
Use Cases
- Training code generation models on the labeled corpus of coding examples.
- Fine-tuning agentic coding models with the help of the annotated capability columns.
- Creating custom training mixtures for specific coding capabilities by filtering on the boolean columns.
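The mixture-building use case can be sketched with pandas. This is a minimal illustration, not the publisher's method: the column names `is_code_generation` and `is_agentic`, the helper `build_mixture`, and the toy rows are all hypothetical, since the dataset's real column names are not documented here.

```python
import pandas as pd


def build_mixture(df: pd.DataFrame, fractions: dict, seed: int = 0) -> pd.DataFrame:
    """Sample a fraction of the rows flagged by each boolean capability column.

    `fractions` maps a (hypothetical) boolean column name to the share of its
    flagged rows to keep, e.g. {"is_agentic": 0.5}.
    """
    parts = []
    for column, fraction in fractions.items():
        flagged = df[df[column].astype(bool)]
        parts.append(flagged.sample(frac=fraction, random_state=seed))
    mixture = pd.concat(parts)
    # A row flagged for several capabilities can be drawn more than once;
    # keep a single copy so the mixture has no duplicate examples.
    return mixture[~mixture.index.duplicated(keep="first")]


# Toy stand-in for the real data; every name and value below is made up.
toy = pd.DataFrame(
    {
        "prompt": ["p0", "p1", "p2", "p3", "p4", "p5"],
        "is_code_generation": [True, True, True, True, False, False],
        "is_agentic": [False, False, True, True, True, True],
    }
)

# Keep all code-generation rows and half of the agentic rows.
mixture = build_mixture(toy, {"is_code_generation": 1.0, "is_agentic": 0.5})
```

Because the superset carries no baked-in ratios, the fractions are the experimenter's choice; changing the dictionary is enough to produce a different mixture from the same rows.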
Strengths
- Described as a large-scale corpus, although no row count is given to quantify this.
- Designed as a model-agnostic superset with no baked-in training ratios, allowing flexible training mixtures.
- Described as a complete, cleaned, and annotated corpus.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not stated, which makes it hard to assess suitability before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
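Since the schema and row count have to be established after download, a first inspection pass might look like the sketch below. It assumes the data has already been loaded into a pandas DataFrame; the helper `summarize` and the toy columns are hypothetical and only stand in for whatever the real files contain.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Report row count, column dtypes, and how many rows each boolean column flags."""
    bool_columns = [c for c in df.columns if pd.api.types.is_bool_dtype(df[c])]
    return {
        "rows": len(df),
        "dtypes": {c: str(df[c].dtype) for c in df.columns},
        "flagged_rows": {c: int(df[c].sum()) for c in bool_columns},
    }


# Toy stand-in for a downloaded shard; names and values are made up.
sample = pd.DataFrame(
    {
        "prompt": ["a", "b", "c"],
        "is_code_generation": [True, False, True],
        "is_agentic": [False, False, True],
    }
)

summary = summarize(sample)
```

The counts per boolean column give a quick sense of which capabilities are well represented, but they say nothing about data quality, which still needs manual inspection.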
Provenance
- Source: Nomeda Labs
- Collection Method: Built as a labeled superset for supervised fine-tuning.
- Freshness: Last updated 2026-06-01 23:21:05; freshness should be verified.