daVinci-Agency is a high-quality dataset for training agents on long-horizon software engineering tasks. The dataset was created by GAIR and last updated on February 4, 2026. It provides trajectories mined from real-world Pull Request chains to model the software evolution process.
Use Cases
- Training reinforcement learning agents for software engineering based on structured supervision from PR chains.
- Developing models for long-horizon task planning in code generation based on trajectories of software evolution.
- Benchmarking agent performance on realistic software development workflows based on mined PR data.
Strengths
- Designed specifically for long-horizon tasks, which suggests complex, multi-step scenarios.
- Provides high-quality trajectories mined from real-world Pull Request chains, indicating realistic data.
- Explicitly models the software evolution process through three interlocking mechanisms, according to the dataset description; the mechanisms are not enumerated in this summary.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, so the dataset's scale and suitability cannot be assessed before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
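Since neither column documentation nor a row count is published, a first pass after download can tally records and the value types seen per field. The following is a minimal sketch, assuming the downloaded data can be read as JSON Lines with one object per line. The file layout, the `summarize_records` helper, and the sample field names are illustrative assumptions, not part of the dataset's documentation.

```python
import json
from collections import defaultdict


def summarize_records(lines):
    """Count rows and collect the Python type names seen for each field.

    `lines` is any iterable of JSON strings, one object per line
    (an assumed layout; adapt the parsing to the real file format).
    """
    row_count = 0
    field_types = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            field_types[key].add(type(value).__name__)
    return row_count, {k: sorted(v) for k, v in field_types.items()}


# Hypothetical sample rows; the real field names are unknown until download.
sample = [
    '{"id": 1, "messages": ["a", "b"]}',
    '{"id": 2, "messages": null}',
]
rows, schema = summarize_records(sample)
print(rows, schema)
```

Passing an open file handle instead of `sample` would process a real file line by line without loading it fully into memory. Fields that show more than one type, or `NoneType`, are good candidates for closer manual inspection.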
Provenance
- Source: GAIR
- Collection Method: Mined from real-world Pull Request (PR) chains.
- Time Range: not specified
- Freshness: Last updated 2026-02-04 06:17:04; freshness should be verified.
- Geography: not specified
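One simple way to check freshness is to compute how long ago the listed last-updated timestamp was. The sketch below assumes the timestamp is in UTC, which the card does not state. The `age_in_days` helper and the reference date are hypothetical and used only for illustration.

```python
from datetime import datetime, timezone


def age_in_days(last_updated, now=None):
    """Return the days elapsed since `last_updated` ("YYYY-MM-DD HH:MM:SS").

    The timestamp is treated as UTC, which is an assumption; the card
    does not state a timezone.
    """
    updated = datetime.strptime(last_updated, "%Y-%m-%d %H:%M:%S")
    updated = updated.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - updated).total_seconds() / 86400


# Hypothetical reference date chosen for illustration only.
reference = datetime(2026, 3, 6, 6, 17, 4, tzinfo=timezone.utc)
print(age_in_days("2026-02-04 06:17:04", now=reference))
```

Omitting `now` measures against the current time. What age counts as stale depends on how quickly the mined repositories change and is left to the reader.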