Description
Proactive Agent Dataset is a collection for training and evaluating proactive IDE assistants. It contains 10,489 synthetic samples from a gym simulation and 4,346 real IDE sessions split into training, validation, and test sets. The dataset was created by jiarx and last updated on April 20, 2026.
Use Cases
Training proactive agent models based on synthetic and real IDE session data.
Evaluating assistant performance on a held-out test split of real user sessions.
Comparing model behavior between synthetic simulation data and real-world interaction data.
Analyzing agent triggers and actions using the session and operation identifiers present in the data.
Strengths
Includes a substantial synthetic training set of 10,489 samples.
Provides a real-world evaluation benchmark with 759 test samples and 1,111 validation samples.
Clearly delineates data splits by source (synthetic vs. real) and purpose (train, test, val).
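The split sizes quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the training, validation, and test splits together cover all 4,346 real sessions and that the synthetic samples are counted separately; neither assumption is confirmed by the card. Under those assumptions the real training split would hold 2,476 sessions, a figure that should be verified after download.

```python
# Counts quoted in the description and the Strengths list.
SYNTHETIC_TOTAL = 10_489
REAL_TOTAL = 4_346
REAL_TEST = 759
REAL_VAL = 1_111

# Assumption: train, validation, and test together cover every real session.
real_train = REAL_TOTAL - REAL_TEST - REAL_VAL
# Assumption: synthetic and real samples do not overlap.
overall_total = SYNTHETIC_TOTAL + REAL_TOTAL


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


split_sizes = {"train": real_train, "val": REAL_VAL, "test": REAL_TEST}
for name, size in split_sizes.items():
    print(f"real {name}: {size} ({share(size, REAL_TOTAL)}%)")
print(f"implied overall total: {overall_total}")
```

If the downloaded splits do not match these derived numbers, the partition assumption is the first thing to revisit.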
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
A verified total row count is not listed. The figures in the description imply 14,835 samples (10,489 synthetic plus 4,346 real), but this should be confirmed after download.
Data may reflect source bias inherent to the specific IDE sessions collected.
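Because column-level documentation is absent, a first step after download is usually to profile the fields directly. The following is a minimal sketch that assumes the records have already been loaded as a list of dictionaries. The helper `summarize_fields` and the field names `session_id`, `operation_id`, and `action` are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the value types seen and the number of rows missing it."""
    types = defaultdict(set)
    present = defaultdict(int)
    for row in records:
        for key, value in row.items():
            types[key].add(type(value).__name__)
            present[key] += 1
    total = len(records)
    return {
        key: {"types": sorted(types[key]), "missing": total - present[key]}
        for key in sorted(types)
    }


# Hypothetical rows for illustration; the real field names are undocumented.
sample = [
    {"session_id": "s1", "operation_id": 3, "action": "suggest"},
    {"session_id": "s2", "operation_id": 4},
]
summary = summarize_fields(sample)
for field, info in summary.items():
    print(field, info)
```

Running the same summary separately on the synthetic and real splits is one way to check whether both sources share a schema before comparing them.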
Provenance
Source
Hugging Face
Collection Method
Combines synthetic data from a gym simulation and real IDE sessions.
Time Range
Not specified
Freshness
Last updated 2026-04-20 11:59:36; freshness should be verified.
Geography
Not specified
License is unknown; restrictions should be verified before use.