Description
Agent trajectories from PostTrainBench, a benchmark measuring CLI agents' ability to post-train pre-trained LLMs. The dataset was created by aisa-group and last updated on March 16, 2026. Each agent is given a base LLM, an evaluation script, and 10 hours on an NVIDIA H100 80GB GPU to autonomously improve model performance.
Use Cases
Analyze agent decision-making patterns under the fine-tuning task and its resource constraints.
Benchmark autonomous CLI agent performance within the PostTrainBench evaluation framework.
Study which post-training strategies, such as SFT, LoRA, or RLHF, agents choose autonomously.
Evaluate how efficiently agents use the GPU within the 10-hour time limit.
Strengths
Focuses on a specific, emerging research area: autonomous CLI agents for LLM fine-tuning.
Benchmark context provides a structured evaluation framework for the agent trajectories.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes suitability hard to assess before download.
Description metadata is limited; actual data quality requires manual inspection after download.
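Since column semantics and row count are undocumented, a first inspection pass after download might look like the sketch below. It assumes the records have already been loaded into a list of dictionaries (for example with the Hugging Face `datasets` library; the repository ID is not stated here, so loading is left out). The helper name `summarize_records` and the sample field names `agent` and `steps` are hypothetical and used only for illustration.

```python
from collections import Counter, defaultdict


def summarize_records(records):
    """Count rows and tally the Python type names observed for each field."""
    field_types = defaultdict(Counter)
    row_count = 0
    for row in records:
        row_count += 1
        for key, value in row.items():
            field_types[key][type(value).__name__] += 1
    # Convert to plain dicts so the summary is easy to print or compare.
    return row_count, {field: dict(counts) for field, counts in field_types.items()}


# Hypothetical rows, for illustration only; the real field names are undocumented.
sample = [
    {"agent": "example-agent", "steps": 12},
    {"agent": "other-agent", "steps": None},
]

rows, schema = summarize_records(sample)
print("rows:", rows)
for field, types in schema.items():
    print(field, types)
```

Fields that show more than one type (here, `steps` is sometimes `None`) are good candidates for closer manual inspection before relying on them.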
Provenance
Source
aisa-group via Hugging Face, originating from the PostTrainBench project on GitHub.
Collection Method
Collected from benchmark runs where autonomous CLI agents performed LLM fine-tuning tasks.
Time Range
Not specified.
Freshness
Last updated 2026-03-16 23:26:51; freshness should be verified.
Geography
Not specified.
License is unknown; terms of use must be verified before the data is used.