TeamBench is a benchmark designed to evaluate whether LLM-based agent teams outperform single oracle agents on realistic software engineering tasks. It includes five ablation conditions for fine-grained measurement: oracle, restricted, full team, team without planning, and team without verification. The benchmark was created by ybkim95 and last updated on March 30, 2026.
Use Cases
- Benchmarking LLM-based agent teams against single agents across the five ablation conditions.
- Measuring the Teamwork Necessity Index (TNI) to quantify when teamwork helps.
- Analyzing the impact of planning and verification components on team performance.
- Evaluating agent performance on realistic software engineering tasks.
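One way to approach the planning and verification analysis is to average a per-task score within each ablation condition and compare the ablated teams against the full team. The sketch below is a minimal illustration under stated assumptions: the field names `condition` and `score`, the condition labels, and the rows themselves are hypothetical placeholders, not the TeamBench schema or results. It also does not compute the Teamwork Necessity Index, whose definition is not given here.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_condition(rows):
    """Average the 'score' field for each 'condition' label (hypothetical field names)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["condition"]].append(row["score"])
    return {cond: mean(scores) for cond, scores in grouped.items()}


def ablation_deltas(means, baseline="full_team"):
    """Mean score of each condition minus the mean score of the baseline condition."""
    base = means[baseline]
    return {cond: m - base for cond, m in means.items() if cond != baseline}


# Placeholder rows for illustration only; these are NOT TeamBench results.
rows = [
    {"condition": "full_team", "score": 1.0},
    {"condition": "full_team", "score": 0.0},
    {"condition": "team_no_plan", "score": 0.0},
    {"condition": "team_no_plan", "score": 0.0},
    {"condition": "team_no_verify", "score": 1.0},
    {"condition": "team_no_verify", "score": 0.0},
]

means = mean_score_by_condition(rows)
deltas = ablation_deltas(means)
for cond, delta in sorted(deltas.items()):
    print(f"{cond}: {delta:+.2f} vs full_team")
```

A negative delta for an ablated condition would suggest that the removed component contributes to team performance; the labels and score scale should be adapted to whatever the downloaded data actually contains.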
Strengths
- Benchmark includes five distinct ablation conditions for detailed performance analysis.
- Focuses on realistic software engineering tasks, providing a practical evaluation context.
- Last updated on March 30, 2026, indicating recent maintenance.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is unknown, which makes it hard to judge in advance whether the benchmark is large enough for a given evaluation.
- Description metadata is limited; actual data quality requires manual inspection after download.
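Because column semantics and row count have to be established after download, a quick first pass is to count the rows and list which value types each field takes. The following is a minimal sketch that assumes the data has already been loaded into a list of dictionaries; the field names and sample records are hypothetical stand-ins, not the actual TeamBench columns.

```python
from collections import defaultdict


def summarize_records(records):
    """Return the row count and, per field, the observed value types and one example value."""
    types = defaultdict(set)
    examples = {}
    for record in records:
        for key, value in record.items():
            types[key].add(type(value).__name__)
            examples.setdefault(key, value)  # keep the first value seen for each field
    columns = {
        key: {"types": sorted(types[key]), "example": examples[key]}
        for key in types
    }
    return {"num_rows": len(records), "columns": columns}


# Hypothetical records standing in for downloaded data.
records = [
    {"task_id": "example-001", "condition": "oracle", "passed": True},
    {"task_id": "example-002", "condition": "restricted", "passed": None},
]

summary = summarize_records(records)
print("rows:", summary["num_rows"])
for name, info in summary["columns"].items():
    print(name, info["types"], "e.g.", repr(info["example"]))
```

Fields that show more than one type, or a `NoneType` entry, are the ones most worth checking by hand before relying on them.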
Provenance
- Source: Hugging Face
- Freshness: Last updated 2026-03-30 15:59:54