Name: SRA-Bench: A Benchmark for Skill-Retrieval-Augmented LLM Agents
Creator: WeihangSu
Published: 2026-04-22T10:22:04
Keywords: LLM Benchmark, Agent Testing, Benchmark, Text, Skill Retrieval, Reasoning Evaluation

Description

SRA-Bench is a benchmark dataset for evaluating skill-retrieval-augmented large language model agents, created by WeihangSu and last updated on April 22, 2026. It contains 5,400 test instances and a skill library of 26,262 skills, of which 636 are gold skills and 25,626 are web-collected distractors. The dataset includes sub-benchmarks like TheoremQA and LogicBench for specific reasoning tasks.
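As a quick sanity check on the library figures quoted above, the following sketch confirms that the gold and distractor counts sum to the stated total and derives how sparse the gold skills are. The function name `library_composition` is illustrative and not part of the dataset or its code.

```python
# Counts as stated in the description; nothing here is read from the dataset itself.
TOTAL_SKILLS = 26_262
GOLD_SKILLS = 636
DISTRACTOR_SKILLS = 25_626


def library_composition(gold: int, distractors: int) -> dict:
    """Return the library size, the gold share, and distractors per gold skill."""
    total = gold + distractors
    return {
        "total": total,
        "gold_fraction": gold / total,
        "distractors_per_gold": distractors / gold,
    }


stats = library_composition(GOLD_SKILLS, DISTRACTOR_SKILLS)
assert stats["total"] == TOTAL_SKILLS  # 636 + 25,626 matches the stated 26,262
print(f"gold share: {stats['gold_fraction']:.2%}")
print(f"distractors per gold skill: {stats['distractors_per_gold']:.1f}")
```

Gold skills make up roughly 2.4% of the library, or about 40 distractors for every gold skill, which gives a sense of how noisy the retrieval setting is.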

Use Cases

Benchmarking agent performance on theorem application tasks with the TheoremQA sub-benchmark.
Evaluating agents' logical reasoning patterns with the LogicBench sub-benchmark.
Testing skill retrieval accuracy under noise, using the library's 636 gold skills mixed with 25,626 distractors.
Developing and comparing skill-retrieval augmentation methods for LLMs on the 5,400 test instances.
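One common way to score skill retrieval against a set of gold skills is Recall@k. The sketch below is a minimal illustration under stated assumptions: Recall@k is not confirmed to be the metric SRA-Bench itself reports, and the skill identifiers (`s1`, `d1`, ...) and the shape of `runs` are made up for the example.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], gold_ids: Iterable[str], k: int) -> float:
    """Fraction of gold skills that appear among the top-k retrieved skills."""
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(gold & top_k) / len(gold)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average Recall@k over (ranked list, gold set) pairs, one pair per test instance."""
    scores = [recall_at_k(ranked, gold, k) for ranked, gold in runs]
    return sum(scores) / len(scores)


# Toy data: "s*" ids stand in for gold skills, "d*" ids for distractors (hypothetical).
runs = [
    (["s3", "d1", "s7", "d2"], {"s3", "s7"}),
    (["d5", "d6", "s1", "d7"], {"s1"}),
]
print(mean_recall_at_k(runs, k=2))
print(mean_recall_at_k(runs, k=3))
```

Comparing the score at several values of k shows how deep into the ranking a retriever must go before the gold skills surface among the distractors.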

Strengths

Provides 5,400 test instances for agent evaluation.
Includes a skill library of 26,262 skills that distinguishes 636 gold skills from 25,626 distractors.
Contains structured sub-benchmarks targeting specific capability types like theorem application and logical reasoning.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
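The catalog metadata does not report a row count for the full dataset; the 5,400-instance figure comes from the description alone, which may limit suitability assessment before download.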
The description metadata is limited; actual data quality requires manual inspection after download.
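Because field semantics have to be inferred after download, a first pass might simply tabulate which fields occur and what types they hold. The sketch below assumes the records have already been loaded as a list of dictionaries; the field names in `sample` (`id`, `question`, `skill_ids`, `source`) are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> Dict[str, dict]:
    """For each field, list the value types observed and how many records contain it."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for record in records:
        for key, value in record.items():
            types[key].add(type(value).__name__)
            counts[key] += 1
    return {
        key: {"types": sorted(types[key]), "present_in": counts[key]}
        for key in sorted(types)
    }


# Hypothetical records; replace with rows read from the downloaded files.
sample = [
    {"id": "q1", "question": "...", "skill_ids": ["s1"]},
    {"id": "q2", "question": "...", "skill_ids": [], "source": "TheoremQA"},
]
summary = summarize_fields(sample)
for field, info in summary.items():
    print(field, info)
```

Fields that appear in only some records, or that hold more than one type, are the ones most worth checking by hand before relying on them.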

Provenance

Source: WeihangSu on Hugging Face, with associated code at github.com/oneal2000/SR-Agents.
Collection Method: Not documented; likely constructed for research, with a skill library that combines gold skills and web-collected distractors.
Time Range: Not specified
Freshness: Last updated 2026-04-22 10:28:53; freshness should be verified.
Geography: Not specified

License is unknown; users should verify licensing terms before use.
