DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Machine Learning

EvoCode-Bench: 26 Multi-Turn Coding Tasks for Agent Evaluation

Name: EvoCode-Bench: 26 Multi-Turn Coding Tasks for Agent Evaluation
Creator: UnipatAI
Published: 2026-05-27T18:54:17
Keywords: Software Engineering, Benchmark, Text, Code Generation, Multi Turn Interaction


Available on 1 platform

Description

EvoCode-Bench contains 26 executable tasks with 227 total rounds for evaluating coding agents in persistent software engineering interactions. The dataset, created by UnipatAI, uses the Harbor multi-step task format and includes workspaces, task metadata, and verification assets. It was last updated on June 20, 2026.

Use Cases

Benchmarking coding agents on multi-turn software tasks based on the Harbor task format.
Evaluating agent performance across the dataset's 227 rounds of interaction.
Testing AI systems against the executable verification assets included with each task.
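As a rough illustration of the round-level evaluation described above, the following sketch aggregates per-round pass/fail outcomes into per-task and overall scores. The record shape `(task_id, round_index, passed)` and the function name `summarize_rounds` are assumptions made for this example; they are not taken from the dataset or from the Harbor task format, whose actual result schema should be checked after download.

```python
from collections import defaultdict


def summarize_rounds(results):
    """Aggregate round-level outcomes into task-level and overall scores.

    `results` is an iterable of (task_id, round_index, passed) tuples.
    This shape is hypothetical; adapt it to whatever the verifier emits.
    """
    per_task = defaultdict(list)
    for task_id, _round_index, passed in results:
        per_task[task_id].append(bool(passed))

    # Fraction of rounds passed within each task.
    task_rates = {task: sum(flags) / len(flags) for task, flags in per_task.items()}

    total_rounds = sum(len(flags) for flags in per_task.values())
    passed_rounds = sum(sum(flags) for flags in per_task.values())
    overall = passed_rounds / total_rounds if total_rounds else 0.0

    # A task counts as fully solved only if every one of its rounds passed.
    fully_solved = sum(1 for flags in per_task.values() if all(flags))

    return {
        "task_rates": task_rates,
        "overall_round_pass_rate": overall,
        "tasks_fully_solved": fully_solved,
    }
```

Reporting both the round-level pass rate and the count of fully solved tasks may be useful here, since an agent can pass most rounds while still failing to carry any single task through to the end.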

Strengths

Contains 26 distinct executable tasks for structured evaluation.
Provides 227 total rounds of interaction (about 8.7 per task on average) for multi-turn analysis.
Includes task metadata, round-level instructions, and verification assets.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it harder to judge suitability before download.
Last updated 2026-06-20 03:55:59; freshness should be verified.
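Because field semantics are undocumented, a first step after download is usually to inventory what the files actually are. The sketch below is a minimal example, assuming the dataset has already been downloaded to a local directory; the function name `inventory` is hypothetical, and nothing here relies on a specific layout of the Harbor format.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Summarize a local copy of the dataset without assuming its layout."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]

    # Count files by extension; files without one are grouped under "<none>".
    by_suffix = Counter(p.suffix.lower() or "<none>" for p in files)

    # Top-level entries often correspond to tasks, but that is an assumption
    # to confirm by inspection.
    top_level = sorted(p.name for p in root.iterdir())

    return {
        "file_count": len(files),
        "by_suffix": dict(by_suffix),
        "top_level": top_level,
    }
```

Comparing the number of top-level entries against the stated 26 tasks is one quick sanity check on the download.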

Provenance

Source: UnipatAI via Hugging Face.
Collection Method: Not documented; the dataset was likely constructed specifically as a benchmark for evaluating coding agents.
Freshness: Last updated 2026-06-20 03:55:59.

License is unknown; terms of use must be verified.


Quality Score

D37


Community

227 downloads

1 like

0 views

Dataset Info

Author: UnipatAI
Created: May 27, 2026
Updated: Jun 20, 2026
Last synced: Jun 29, 2026
