DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Harbor Adapter: Agent Execution Trajectories from Large-Scale Evaluation | DataSalon

Machine Learning

Name: Harbor Adapter: Agent Execution Trajectories from Large-Scale Evaluation
Creator: kendx
Published: 2026-06-06T07:10:34
Keywords: Machine Learning Benchmark, AI Evaluation, Benchmark, Tabular, Large Scale, Agent Execution, Agent Trajectories

Available on 1 platform

Description

Agent execution trajectories from a large-scale agentic evaluation. Each trajectory captures a single (agent, model) attempt at a task, including step logs, tool calls, model outputs, and verifier scoring. The dataset was authored by kendx and last updated on 2026-06-08.

Use Cases

Benchmarking agent performance across different models based on verifier scoring.
Analyzing agent decision-making patterns based on full step logs.
Studying tool usage and environment interaction patterns based on tool calls and environment returns.
Training or fine-tuning agent models based on execution trajectories.
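
The benchmarking use case above can be illustrated with a minimal sketch that averages verifier scores per (agent, model) pair. The listing does not document the schema, so the field names `agent`, `model`, `task`, and `verifier_score`, as well as the sample values, are assumptions made up for illustration only.

```python
from collections import defaultdict

# Hypothetical records: field names and values are assumptions,
# not the dataset's documented schema.
trajectories = [
    {"agent": "agent-a", "model": "model-x", "task": "t1", "verifier_score": 1.0},
    {"agent": "agent-a", "model": "model-x", "task": "t2", "verifier_score": 0.0},
    {"agent": "agent-a", "model": "model-y", "task": "t1", "verifier_score": 1.0},
    {"agent": "agent-a", "model": "model-y", "task": "t2", "verifier_score": 1.0},
]


def mean_score_by_pair(rows):
    """Average the verifier score over all attempts of each (agent, model) pair."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [score sum, attempt count]
    for row in rows:
        key = (row["agent"], row["model"])
        totals[key][0] += row["verifier_score"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


scores = mean_score_by_pair(trajectories)
for (agent, model), score in sorted(scores.items()):
    print(f"{agent} / {model}: {score:.2f}")
```

If the real verifier score turns out to be something other than a number (for example a pass/fail label), the aggregation would need to map it to a numeric value first.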

Strengths

Captures full execution trajectories end-to-end for each (agent, model, task) combination.
Includes verifier scoring for each trajectory, providing a performance metric.
Designed for filtering to specific slices of interest before pulling data.
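
Filtering to a slice before pulling everything into memory might look like the sketch below. It works on any iterable of dict-like records, so it could in principle be applied to a streamed source. The field names `agent`, `model`, and `task` and the helper `select_slice` are hypothetical, since the listing does not document the columns.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Record = Dict[str, Any]


def select_slice(
    rows: Iterable[Record],
    agent: Optional[str] = None,
    model: Optional[str] = None,
    task: Optional[str] = None,
) -> Iterator[Record]:
    """Lazily yield rows that match every criterion that is not None."""
    wanted = {"agent": agent, "model": model, "task": task}
    for row in rows:
        if all(value is None or row.get(field) == value
               for field, value in wanted.items()):
            yield row


# Hypothetical sample rows (assumed field names, made-up values).
sample = [
    {"agent": "agent-a", "model": "model-x", "task": "t1"},
    {"agent": "agent-b", "model": "model-x", "task": "t1"},
    {"agent": "agent-a", "model": "model-y", "task": "t2"},
]

subset = list(select_slice(sample, model="model-x"))
print(len(subset))
```

Because the function is a generator, rows that do not match are discarded one at a time rather than being collected first.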

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it hard to assess suitability before download.
Description metadata is limited; actual data quality requires manual inspection after download.
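
Since field semantics have to be inferred after download, a first inspection pass could profile which fields appear and which value types they hold. This is a minimal sketch; the sample records and the helper `profile_fields` are hypothetical and do not reflect the dataset's actual columns.

```python
from collections import Counter, defaultdict


def profile_fields(rows):
    """Map each field name to a count of the Python type names observed for it."""
    observed = defaultdict(Counter)
    for row in rows:
        for field, value in row.items():
            observed[field][type(value).__name__] += 1
    return {field: dict(counts) for field, counts in observed.items()}


# Hypothetical sample rows (assumed field names, made-up values).
sample = [
    {"task": "t1", "steps": [{"tool": "bash"}], "verifier_score": 1.0},
    {"task": "t2", "steps": [], "verifier_score": None},
]

profile = profile_fields(sample)
for field, counts in profile.items():
    print(field, counts)
```

Mixed types or frequent `NoneType` entries in a field are a hint that it needs closer manual inspection before use.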

Provenance

Source: Hugging Face
Collection Method: Collected from a large-scale agentic evaluation.
Freshness: Last updated 2026-06-08 01:19:29; freshness should be verified.

License is not specified; usage rights should be confirmed before use.

Quality Score

D39

Community

1.1K downloads

1 like

0 views

Dataset Info

Author: kendx
Created: Jun 6, 2026
Updated: Jun 8, 2026
Last synced: Jun 22, 2026