Bandit Performance Arm Chosen Per Episode

Available on 1 platform

Description

Records from reinforcement learning bandit agents that learn arm rewards over 120-step episodes. The dataset likely contains each agent's chosen arm and the resulting outcome per episode. Its origin and size are not documented.

Use Cases

Analyze agent learning trajectories based on the 120-step episode structure
Compare arm selection strategies across multiple agents
Study emergent cooperation or competition based on the 'tragedy of the commons' theme
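
As a rough illustration of the arm-selection comparison above, the sketch below computes how often each agent picked each arm across its episodes. The column names (agent_id, episode, arm) and the toy rows are hypothetical placeholders, since the dataset's actual schema is undocumented; they would need to be replaced with the real field names after download.

```python
from collections import Counter, defaultdict

# Hypothetical schema: the dataset's real column names are undocumented,
# so "agent_id", "episode", and "arm" are placeholders for illustration.
# The rows below are made-up toy values, not taken from the dataset.
TOY_ROWS = [
    {"agent_id": "a1", "episode": 1, "arm": 0},
    {"agent_id": "a1", "episode": 2, "arm": 1},
    {"agent_id": "a1", "episode": 3, "arm": 1},
    {"agent_id": "a2", "episode": 1, "arm": 0},
    {"agent_id": "a2", "episode": 2, "arm": 0},
]


def arm_shares(rows, agent_key="agent_id", arm_key="arm"):
    """Return {agent: {arm: fraction of that agent's rows choosing the arm}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[agent_key]][row[arm_key]] += 1
    shares = {}
    for agent, counter in counts.items():
        total = sum(counter.values())
        shares[agent] = {arm: n / total for arm, n in counter.items()}
    return shares


if __name__ == "__main__":
    for agent, dist in sorted(arm_shares(TOY_ROWS).items()):
        print(agent, dist)
```

An agent whose share concentrates on one arm over time has likely converged on a preferred arm, while shares that stay spread out suggest continued exploration. Whether that reading applies here depends on how the real records are structured.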

Strengths

Episode length is defined as 120 steps, providing a consistent unit of analysis
Focuses on a specific, well-known problem in multi-agent RL ('tragedy of the commons')

Limitations

Row count is unknown, which makes it hard to judge suitability before download
Column-level documentation is absent; field semantics must be inferred after download
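
Because field semantics have to be inferred, a first pass over the file can at least list the columns and the kinds of values each one holds. The sketch below assumes the data is delivered as CSV with a header row, which this listing does not confirm; the toy text and its column names (episode, arm, reward) are invented for illustration only.

```python
import csv
import io


def guess_type(value):
    """Classify a raw CSV cell as 'empty', 'int', 'float', or 'str'."""
    if value == "":
        return "empty"
    for caster, name in ((int, "int"), (float, "float")):
        try:
            caster(value)
            return name
        except ValueError:
            continue
    return "str"


def summarize_columns(fileobj, sample_size=100):
    """Map each column name to the sorted cell types seen in a row sample."""
    reader = csv.DictReader(fileobj)
    seen = {name: set() for name in (reader.fieldnames or [])}
    for i, row in enumerate(reader):
        if i >= sample_size:
            break
        for name in seen:
            # Short rows yield None for missing cells; treat them as empty.
            seen[name].add(guess_type(row[name] or ""))
    return {name: sorted(types) for name, types in seen.items()}


# Made-up toy content with hypothetical column names, not real dataset rows.
TOY_CSV = "episode,arm,reward\n1,0,0.5\n2,1,\n"

if __name__ == "__main__":
    print(summarize_columns(io.StringIO(TOY_CSV)))
```

For a real file, passing an object opened with open(path, newline="") in place of the io.StringIO wrapper would work the same way; the path itself is left to the reader.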

Provenance

Collection Method: Likely generated from a reinforcement learning simulation.

Related Topics

Tabular
Agent Performance
Multi Armed Bandit
Reinforcement Learning
Tragedy of the commons

Related Datasets

Quality Score

D19
Description: 18
Source: 17
Reputation: 18
Access: 31

Community

0 views

Dataset Info

Last synced: May 31, 2026
