Description
AFTER is a benchmark for studying skill evolution in agentic frameworks, measuring their ability to revise, specialize, and reuse skill instructions. The test split contains 129 tasks, with the full dataset scheduled for future release. It was created by DavydenkoGr and last updated on June 19, 2026.
Use Cases
Benchmarking how well an agentic framework improves its skills by revising and reusing skill instructions.
Evaluating skill transfer across roles and tasks.
Studying agent performance across different execution contexts.
Strengths
The test split contains 129 tasks available for immediate use.
The dataset is designed for a specific, measurable research goal: evaluating skill evolution frameworks.
Limitations
The full dataset is not yet available, limiting current scope.
Column-level documentation is absent; field semantics must be inferred after download.
Beyond the 129-task test split, row count and data scale are unknown, which makes suitability harder to assess.
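Because column-level documentation is absent, one way to get oriented after download is to profile the fields directly. The following is a minimal sketch, not part of the dataset's tooling: `summarize_fields` is a hypothetical helper, and the field names in `sample_rows` (`task_id`, `role`, `steps`) are invented for illustration and are not the dataset's actual columns.

```python
from collections import Counter


def summarize_fields(rows):
    """Profile a list of dict rows: value types, missing count, first example per field."""
    summary = {}
    # First pass: collect every field name that appears in any row.
    for row in rows:
        for field in row:
            summary.setdefault(
                field, {"types": Counter(), "missing": 0, "example": None}
            )
    # Second pass: count types and missing values (absent key or None).
    for row in rows:
        for field, info in summary.items():
            value = row.get(field)
            if value is None:
                info["missing"] += 1
                continue
            info["types"][type(value).__name__] += 1
            if info["example"] is None:
                info["example"] = value
    return summary


# Hypothetical rows standing in for downloaded tasks; the real schema is undocumented.
sample_rows = [
    {"task_id": "t1", "role": "planner", "steps": 3},
    {"task_id": "t2", "role": None, "steps": 5},
    {"task_id": "t3", "steps": "unknown"},
]

for name, info in summarize_fields(sample_rows).items():
    print(name, dict(info["types"]), "missing:", info["missing"])
```

With the Hugging Face `datasets` library installed, the rows could come from something like `list(load_dataset(repo_id, split="test"))`, where `repo_id` is the dataset's repository identifier as shown on its source page; the split name `test` is assumed from the description above.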
Provenance
Source
DavydenkoGr on Hugging Face.
Freshness
Last updated 2026-06-19 11:24:43 (time zone not stated); check the source page for newer revisions before relying on this date.
License
Unknown; users should verify terms before use.