Genesis AI Code 100K: Frontier Dataset for Agentic AI Development | DataSalon

Government & Legal

Name: Genesis AI Code 100K: Frontier Dataset for Agentic AI Development
Creator: WithinUsAI
Published: 2026-01-02T04:29:52
Keywords: Self Grading, Text, Audit Governance, AI Code Generation, Agentic Loops, Tool Call Traces

Available on 1 platform

Description

WithinUsAI developed Genesis AI Code 100K, a frontier dataset for AI code generation and agentic development. It contains 100,000 examples, split into 98,000 training records and 2,000 validation records. The dataset was last updated on January 2, 2026.

Use Cases

Training AI agents for code generation using the dataset's agentic loops (plan→edit→test→reflect)
Implementing self-grading of AI-generated code using its tests-as-truth supervision patterns
Developing audit-aware AI systems using its governance and policy-gate orientation
Supervising AI tool-call execution using its tool-call trace supervision
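
The plan→edit→test→reflect loop and the tests-as-truth idea named above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names (`passes_tests`, `agentic_loop`), the trace strings, and the use of a fixed list of candidate programs in place of a model's edits. None of it is taken from the dataset, whose actual record format is not documented on this page.

```python
from typing import Callable, Dict, List, Optional, Tuple

Test = Callable[[Dict], bool]


def passes_tests(source: str, tests: List[Test]) -> bool:
    """Execute candidate source, then run every test against its namespace.

    Any exception (syntax error, missing name, failing call) counts as a failure,
    so the tests alone decide whether a candidate is accepted.
    """
    namespace: Dict = {}
    try:
        # Sketch only: real systems should sandbox this, never exec untrusted code directly.
        exec(source, namespace)
        return all(test(namespace) for test in tests)
    except Exception:
        return False


def agentic_loop(
    candidates: List[str], tests: List[Test], max_steps: int = 5
) -> Tuple[Optional[str], List[str]]:
    """Try candidates in order, recording a plan/edit/test/reflect trace."""
    trace: List[str] = []
    for step, source in enumerate(candidates[:max_steps]):
        trace.append(f"plan: try candidate {step}")
        trace.append("edit: apply candidate")
        ok = passes_tests(source, tests)
        trace.append(f"test: {'pass' if ok else 'fail'}")
        if ok:
            return source, trace
        trace.append("reflect: tests failed, moving to next candidate")
    return None, trace


# Hypothetical candidates standing in for successive model edits.
candidates = [
    "def add(a, b):\n    return a - b\n",  # buggy first attempt
    "def add(a, b):\n    return a + b\n",  # corrected attempt
]
tests = [
    lambda ns: ns["add"](2, 3) == 5,
    lambda ns: ns["add"](-1, 1) == 0,
]

accepted, trace = agentic_loop(candidates, tests)
print(accepted)
print("\n".join(trace))
```

In this sketch the test results are the only acceptance signal, which is one plausible reading of "tests-as-truth"; the recorded trace is a stand-in for the kind of step-by-step supervision the listing describes.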

Strengths

Dataset size is explicitly stated as 100,000 total examples
Clear split sizes: 98,000 training and 2,000 validation records
Includes tool-call traces and self-grading, according to the dataset description

Limitations

Column-level documentation is absent; field semantics must be inferred after download
Row count is known, but specific data format and structure details are unavailable
Description metadata is limited; actual data quality requires manual inspection after download
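
Because column-level documentation is absent, a first inspection step after download might be to tally which fields appear and what value types they carry. The sketch below assumes the records are available as JSON Lines (one JSON object per line); that format is an assumption, since the listing does not state it, and the sample field names (`prompt`, `tests_passed`, `trace`) are hypothetical placeholders rather than the dataset's real columns.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set


def infer_schema(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each top-level field name to the set of Python type names seen for it.

    Assumes every non-blank line is a JSON object; other shapes would need
    extra handling.
    """
    schema: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            schema[key].add(type(value).__name__)
    return dict(schema)


# Hypothetical sample records; the real field names must be read from the download.
sample = [
    '{"prompt": "write add()", "tests_passed": true}',
    '{"prompt": "fix bug", "tests_passed": null, "trace": []}',
]

schema = infer_schema(sample)
for name, types in sorted(schema.items()):
    print(name, sorted(types))
```

Fields that show more than one type, or that are missing from some records, are the ones most worth checking by hand before training on the data.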

Provenance

Source: WithinUsAI
Freshness: Last updated 2026-01-02 04:38:21; freshness should be verified

Quality Score

D40

Description

Source

Reputation

Community

144 downloads

3 likes

0 views

Dataset Info

Author: WithinUsAI
Created: Jan 2, 2026
Updated: Jan 2, 2026
Last synced: Apr 24, 2026
