Agent execution traces from the EvoClaw benchmark, covering two trial modes: end-to-end and per-milestone. The data is organized by repository and agent and contains logs and metadata files. It was created by EvoClaw-Bench and last updated on May 7, 2026.
Use Cases
- Analyzing agent performance and failure modes based on execution logs.
- Comparing end-to-end versus per-milestone trial modes.
- Studying agent behavior across different open-source repositories.
- Benchmarking new AI agents against the existing EvoClaw traces.
Strengths
- Covers two distinct trial modes: end-to-end and per-milestone.
- Includes data from seven open-source repositories.
- Contains structured metadata files like agent_stats.json and trial_metadata.json.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, so the dataset's size and suitability cannot be assessed before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
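Because field semantics have to be inferred after download, a first pass might simply list which top-level keys appear in the metadata files. The following is a minimal sketch, not an official loader: only the file names agent_stats.json and trial_metadata.json come from the description above. The function name `summarize_metadata_keys` and the idea that each file holds a JSON object are assumptions, and the search goes by file name so that no particular directory layout is assumed.

```python
import json
from collections import defaultdict
from pathlib import Path

# File names taken from the dataset description; everything else is assumed.
METADATA_FILES = ("agent_stats.json", "trial_metadata.json")


def summarize_metadata_keys(root):
    """Count how often each top-level key appears, per metadata file name.

    `root` is a local copy of the dataset. Files are located by name
    anywhere under `root`, so the exact repository/agent directory
    layout does not need to be known in advance.
    """
    key_counts = {name: defaultdict(int) for name in METADATA_FILES}
    for name in METADATA_FILES:
        for path in sorted(Path(root).rglob(name)):
            try:
                data = json.loads(path.read_text(encoding="utf-8"))
            except (OSError, json.JSONDecodeError):
                continue  # skip unreadable or malformed files
            # Assumption: each file holds a JSON object; other shapes are ignored.
            if isinstance(data, dict):
                for key in data:
                    key_counts[name][key] += 1
    return {name: dict(counts) for name, counts in key_counts.items()}
```

Calling `summarize_metadata_keys("path/to/local/copy")` returns a dictionary keyed by file name. Keys that appear in every file are likely core fields, while rare keys may be specific to one agent or trial mode; either way, their meaning still has to be checked against the logs.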
Provenance
- Source: Hugging Face
- Collection Method: Agent execution traces collected from the EvoClaw benchmark.
- Freshness: Last updated 2026-05-07 18:42:45; freshness should be verified.