Description
CodePit released OnchainPlanBench Seed, an early dataset for evaluating small open-weight models. It tests whether a model can critique, repair, reject, or approve Web3 AI-agent action plans before a wallet executes them. The dataset was last updated on June 2, 2026.
Use Cases
Benchmarking how well models critique plans, given the user intent and wallet context supplied with each row.
Training models to approve or reject action plans according to the stated risk and privacy policies.
Developing plan-repair tools that work within the tools and policy constraints listed in each row.
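For the benchmarking use case, a scoring harness might compare a model's decision on each plan against a reference decision. The sketch below assumes each row carries one reference verdict drawn from the four functions the description names (critique, repair, reject, approve). That label scheme is hypothetical, because the dataset's columns are not documented. It reports overall accuracy and per-verdict recall. It also counts "unsafe approvals", meaning plans the reference rejects but the model approves, since that is the costliest error before wallet execution.

```python
from collections import Counter
from typing import Iterable

# Hypothetical verdict labels, taken from the four functions named in the
# dataset description; not a documented OnchainPlanBench field.
VERDICTS = ("critique", "repair", "reject", "approve")


def score_verdicts(gold: Iterable[str], predicted: Iterable[str]) -> dict[str, float]:
    """Return accuracy, per-verdict recall, and a count of unsafe approvals."""
    gold = list(gold)
    predicted = list(predicted)
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    totals: Counter = Counter()   # reference rows per verdict
    hits: Counter = Counter()     # rows where the model matched the reference
    unsafe_approvals = 0          # reference says reject, model says approve
    for g, p in zip(gold, predicted):
        if g not in VERDICTS:
            raise ValueError(f"unknown verdict: {g!r}")
        totals[g] += 1
        if g == p:
            hits[g] += 1
        if g == "reject" and p == "approve":
            unsafe_approvals += 1

    scores: dict[str, float] = {
        "accuracy": sum(hits.values()) / len(gold) if gold else 0.0,
        "unsafe_approvals": float(unsafe_approvals),
    }
    # Recall is reported only for verdicts that appear in the reference labels.
    for v in VERDICTS:
        if totals[v]:
            scores[f"recall_{v}"] = hits[v] / totals[v]
    return scores


if __name__ == "__main__":
    gold = ["approve", "reject", "repair", "reject"]
    pred = ["approve", "approve", "repair", "reject"]
    print(score_verdicts(gold, pred))
```

Keeping unsafe approvals as a separate count, rather than folding them into accuracy, stops a model that approves nearly everything from looking acceptable on a skewed label mix.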
Strengths
Public seed release for the first official CodePit model track, CodePit PlanGuard.
Designed to test multiple safety functions: critique, repair, rejection, and approval.
Limitations
Row count is not published, which makes it hard to judge whether the dataset suits a given use.
Column-level documentation is absent; field semantics must be inferred after download.
Description metadata is limited; data quality can only be assessed by manual inspection after download.
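Because neither the row count nor the column semantics are documented, a first step after download is to profile the file. The sketch below assumes the data arrives as JSON Lines (one JSON object per line). That is an assumption, since the distribution format is not stated. It counts rows and records which JSON value types appear under each top-level key. The field names in the example (`user_intent`, `plan`) are hypothetical.

```python
import json
from collections import Counter, defaultdict
from typing import Iterable


def profile_jsonl(lines: Iterable[str]) -> tuple[int, dict[str, Counter]]:
    """Count rows and tally the value types seen under each top-level key.

    Assumes JSON Lines input; blank lines are skipped.
    """
    row_count = 0
    field_types: defaultdict[str, Counter] = defaultdict(Counter)
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        row_count += 1
        for key, value in record.items():
            # e.g. "str", "list", "dict", "NoneType"
            field_types[key][type(value).__name__] += 1
    return row_count, dict(field_types)


if __name__ == "__main__":
    # Hypothetical rows; in practice pass an open file handle instead.
    sample = [
        '{"user_intent": "swap 1 ETH for USDC", "plan": ["quote", "swap"]}',
        "",
        '{"user_intent": "bridge to L2", "plan": null}',
    ]
    rows, fields = profile_jsonl(sample)
    print(rows)
    for key, types in fields.items():
        print(key, dict(types))
```

A key that shows more than one type, such as `plan` here, is worth checking by hand before relying on that field.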
Provenance
Source
CodePit
Freshness
Last updated 2026-06-02 12:16:34; freshness should be verified.
License is unknown; verify the terms of use before using the dataset in any application.