Reinforcement learning bandit agents learn arm rewards over 120-step episodes. The dataset likely contains records of agent actions and outcomes per episode. Its origin and scale are unspecified.
Use Cases
- Analyze agent learning trajectories based on the 120-step episode structure
- Compare arm selection strategies across multiple agents
- Study emergent cooperation or competition based on the 'tragedy of the commons' theme
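
The first two use cases can be sketched roughly as follows. This is a minimal illustration, not the dataset's actual schema: the field names `agent`, `episode`, `step`, `arm`, and `reward` are assumptions (column documentation is absent), and the `records` list is synthetic toy data made up for the demo. Only the 120-step episode length comes from the description.

```python
from collections import Counter, defaultdict

EPISODE_LENGTH = 120  # episode length stated in the dataset description


def learning_curve(records):
    """Mean reward at each step index, averaged over all agents and episodes."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r["step"]] += r["reward"]
        counts[r["step"]] += 1
    return [totals[s] / counts[s] for s in sorted(counts)]


def arm_frequencies(records):
    """Fraction of pulls that each agent gave to each arm."""
    pulls = defaultdict(Counter)
    for r in records:
        pulls[r["agent"]][r["arm"]] += 1
    return {
        agent: {arm: n / sum(c.values()) for arm, n in c.items()}
        for agent, c in pulls.items()
    }


# Synthetic toy records (hypothetical schema, NOT taken from the dataset):
# agent "a0" always pulls arm 0 and earns 1.0; agent "a1" alternates
# between arms 0 and 1 and earns 0.0.
records = []
for step in range(EPISODE_LENGTH):
    records.append({"agent": "a0", "episode": 0, "step": step, "arm": 0, "reward": 1.0})
    records.append({"agent": "a1", "episode": 0, "step": step, "arm": step % 2, "reward": 0.0})

curve = learning_curve(records)
freqs = arm_frequencies(records)
```

With real data, `records` would be replaced by rows loaded from the downloaded files, with the keys renamed to match whatever columns are actually present.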
Strengths
- Episode length is defined as 120 steps, providing a consistent unit of analysis
- Focuses on a specific, well-known problem in multi-agent RL ('tragedy of the commons')
Limitations
- Row count is unknown, which makes it hard to judge whether the dataset is large enough for a given analysis
- Column-level documentation is absent; field semantics must be inferred after download
Provenance
- Collection Method: Likely generated from a reinforcement learning simulation.