Within Us AI developed the Genesis AI Code 100K dataset, a frontier collection for AI code generation. It contains 100,000 examples split into 98,000 training and 2,000 validation records. The dataset was last updated on January 2, 2026.
Use Cases
- Training code-generation agents on the dataset's agentic loops (plan → edit → test → reflect)
- Implementing self-grading of AI-generated code using the dataset's 'tests-as-truth' supervision patterns
- Developing audit-aware AI systems that draw on the dataset's governance and policy-gate orientation
- Supervising AI tool-call execution using the dataset's tool-call trace supervision
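To make the plan → edit → test → reflect loop and the tests-as-truth idea concrete, here is a minimal Python sketch. It does not reflect the dataset's actual record format, which is undocumented. The names `Attempt`, `run_tests`, and `agentic_loop` are hypothetical, and a fixed list of candidate programs stands in for a model's successive edits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Attempt:
    """One iteration of the loop: the code tried and how the tests graded it."""
    code: str
    passed: int
    total: int

    @property
    def score(self) -> float:
        # Tests-as-truth: the grade is simply the fraction of tests that pass.
        return self.passed / self.total if self.total else 0.0


def run_tests(code: str, tests: List[Callable[[Dict], bool]]) -> Tuple[int, int]:
    """Execute candidate code and count passing tests.

    WARNING: exec() on untrusted code is unsafe; a real system would sandbox this.
    """
    namespace: Dict = {}
    try:
        exec(code, namespace)
    except Exception:
        return 0, len(tests)
    passed = 0
    for test in tests:
        try:
            if test(namespace):
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed, len(tests)


def agentic_loop(candidates: List[str],
                 tests: List[Callable[[Dict], bool]],
                 max_steps: int = 3) -> List[Attempt]:
    """Hypothetical loop: each candidate stands in for one plan+edit step."""
    history: List[Attempt] = []
    for code in candidates[:max_steps]:
        passed, total = run_tests(code, tests)       # test
        history.append(Attempt(code, passed, total))  # record a trace entry
        if passed == total:                           # reflect: stop when all pass
            break
    return history


# Illustrative task: implement add(a, b). The first candidate is buggy.
tests = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](-1, 1) == 0,
]
candidates = [
    "def add(a, b):\n    return a - b\n",
    "def add(a, b):\n    return a + b\n",
]
history = agentic_loop(candidates, tests)
for i, attempt in enumerate(history):
    print(f"step {i}: {attempt.passed}/{attempt.total} tests passed")
```

The `history` list plays the role of a simple trace: each entry records what was tried and how the tests graded it, which is the kind of signal a self-grading setup could learn from.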
Strengths
- Dataset size is explicitly stated as 100,000 total examples
- Clear split sizes: 98,000 training and 2,000 validation records
- According to its description, includes frontier features such as tool-call traces and self-grading
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row counts are known, but the data format and record structure are not documented
- Description metadata is limited; actual data quality requires manual inspection after download
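Because field semantics have to be inferred after download, a first inspection pass might look like the sketch below. It assumes the records have already been loaded as Python dictionaries (for example from JSON Lines). The function name `summarize_fields` and the sample field names `prompt` and `tests` are hypothetical and are not taken from the dataset.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Report, per field, the value types seen and the share of records containing it."""
    total = 0
    seen: Counter = Counter()
    types = defaultdict(set)
    for record in records:
        total += 1
        for key, value in record.items():
            seen[key] += 1
            types[key].add(type(value).__name__)
    return {
        key: {"types": sorted(types[key]), "coverage": seen[key] / total}
        for key in seen
    }


# Hypothetical sample records; the real field names are unknown.
sample = [
    {"prompt": "write add()", "tests": ["assert add(1, 2) == 3"]},
    {"prompt": "write sub()", "tests": None},
    {"prompt": "write mul()"},
]
summary = summarize_fields(sample)
for name, info in summary.items():
    print(name, info["types"], f"{info['coverage']:.0%}")
```

Fields with mixed types or low coverage are the ones most worth checking by hand before training on them.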
Provenance
- Source: WithinUsAI
- Freshness: Last updated 2026-01-02 04:38:21; freshness should be verified
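One simple way to check freshness is to compute how old the recorded timestamp is on a given date. The sketch below assumes the timestamp is in UTC, which the source does not state. The function name `age_in_days` and the reference date are illustrative only.

```python
from datetime import datetime, timezone

LAST_UPDATED = "2026-01-02 04:38:21"  # timestamp as recorded; UTC is an assumption


def age_in_days(last_updated: str, now: datetime) -> int:
    """Return the number of whole days between the recorded update and `now`."""
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    updated = updated.replace(tzinfo=timezone.utc)
    return (now - updated).days


# Illustrative reference date; in practice pass datetime.now(timezone.utc).
reference = datetime(2026, 2, 1, tzinfo=timezone.utc)
days = age_in_days(LAST_UPDATED, reference)
print(f"Dataset was last updated {days} days before the reference date")
```

Comparing this age against the last-modified date shown by the hosting platform would confirm whether the recorded timestamp is still current.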