BazaarBench rollouts contain append-only SQLite event logs from a multi-agent C2C marketplace simulator. The dataset includes three escalating evaluation levels: L1 with 30-day persistent rollouts and 100 LLM agents per backbone, L2 with pressure-induced emergence, and L3. The dataset was created by BazaarBench and last updated on 2026-05-06.
Use Cases
- Analyzing spontaneous fraud-chain emergence in 30-day persistent marketplace rollouts.
- Studying how agent behavior changes under pressure from adversarial prompts and agent swaps.
- Benchmarking LLM agent performance in simulated C2C marketplaces across different evaluation levels.
Strengths
- Includes three distinct evaluation levels (L1, L2, L3) with defined experimental conditions.
- L1 contains data from 100 LLM agents per backbone over a 30-day persistent rollout.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row counts and overall data scale are not stated, so suitability is hard to assess before download.
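Both limitations can be partly worked around after download by inspecting the SQLite files directly. The following is a minimal sketch, assuming a rollout is a single SQLite file on local disk; the function name `summarize_sqlite` and the example path are hypothetical, and the actual table and column names in BazaarBench are not documented here. It uses only Python's standard `sqlite3` module and opens the file read-only so the append-only log is not modified.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(path):
    """Return {table: {"columns": [(name, declared_type), ...], "rows": int}}.

    Hypothetical helper: it makes no assumption about which tables exist.
    """
    # Open read-only via a file: URI so inspection cannot alter the log.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        summary = {}
        for name in names:
            if name.startswith("sqlite_"):
                continue  # skip SQLite's internal bookkeeping tables
            quoted = '"' + name.replace('"', '""') + '"'
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            columns = [
                (row[1], row[2])
                for row in conn.execute(f"PRAGMA table_info({quoted})")
            ]
            (row_count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            summary[name] = {"columns": columns, "rows": row_count}
        return summary
    finally:
        conn.close()


# Example usage (the path is a placeholder, not a real file name):
# for table, info in summarize_sqlite("rollout.db").items():
#     print(table, info["rows"], info["columns"])
```

This reports only declared column names and types plus row counts; what each field means still has to be inferred from the values themselves.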
Provenance
- Source: BazaarBench
- Collection Method: Generated from the BazaarBench multi-agent C2C marketplace simulator.
- Time Range: Simulated time spans include 30-day rollouts and 7-day continuations.
- Freshness: Last updated 2026-05-06 20:13:19; freshness should be verified.