Aaronrose227 provides activation and scenario data for research on detecting multi-agent collusion through interpretability. The dataset contains activations from four large language models at specified layers and accompanies a GitHub repository. It was last updated on May 12, 2026.
Use Cases
- Analyzing model activation patterns for signs of collusion, using the accompanying scenario data.
- Benchmarking interpretability methods across different LLMs at the listed model layers.
- Training classifiers to identify collusive behavior from the multi-agent scenario context.
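The classifier use case can be illustrated with a minimal sketch of a mean-difference linear probe. Because the dataset's schema is not documented, the example runs on synthetic vectors rather than the real activations. The helper names (`fit_mean_difference_probe`, `make_synthetic`), the 16-dimensional vectors, and the binary collusive/non-collusive labels are all assumptions made for illustration, not part of the dataset or its companion repository.

```python
import random

def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def fit_mean_difference_probe(activations, labels):
    """Return (direction, threshold) separating label 1 from label 0."""
    pos = [a for a, y in zip(activations, labels) if y == 1]
    neg = [a for a, y in zip(activations, labels) if y == 0]
    mu_pos, mu_neg = mean_vector(pos), mean_vector(neg)
    # The probe direction points from the label-0 mean to the label-1 mean.
    direction = [p - q for p, q in zip(mu_pos, mu_neg)]
    # Threshold at the projection of the midpoint between the class means.
    midpoint = [(p + q) / 2 for p, q in zip(mu_pos, mu_neg)]
    threshold = sum(d * m for d, m in zip(direction, midpoint))
    return direction, threshold

def predict(direction, threshold, activation):
    """Classify one vector by its projection onto the probe direction."""
    score = sum(d * a for d, a in zip(direction, activation))
    return 1 if score > threshold else 0

def make_synthetic(n, dim, shift, rng):
    """Synthetic stand-in for activations: label-1 vectors are shifted."""
    data, labels = [], []
    for i in range(n):
        y = i % 2  # hypothetical label: 1 = collusive, 0 = not
        data.append([rng.gauss(shift * y, 1.0) for _ in range(dim)])
        labels.append(y)
    return data, labels

rng = random.Random(0)
train_x, train_y = make_synthetic(200, 16, 1.5, rng)
test_x, test_y = make_synthetic(100, 16, 1.5, rng)

direction, threshold = fit_mean_difference_probe(train_x, train_y)
correct = sum(predict(direction, threshold, x) == y
              for x, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real activations, the synthetic generator would be replaced by vectors from one model and layer, and the probe would be evaluated on scenarios held out from training.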
Strengths
- Includes activations from four distinct large language models: Qwen3-32B-AWQ, Llama-3.1-70B-Instruct-AWQ-INT4, DeepSeek-R1-Distill-Qwen-32B, and GPT-OSS-20B.
- Specifies the exact layers analyzed for each model, ranging from 10 to 37 layers.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, which makes suitability harder to assess before download.
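Both limitations can be partly worked around after download by summarizing the records directly. The sketch below assumes the data has already been loaded as a list of dictionaries. The field names (`scenario_id`, `layer`, `activation`, `label`) are hypothetical placeholders, since the real schema is undocumented.

```python
from collections import defaultdict

def summarize_fields(records):
    """Map each field name to the value types seen and how often it appears."""
    summary = defaultdict(lambda: {"types": set(), "count": 0})
    for record in records:
        for key, value in record.items():
            summary[key]["types"].add(type(value).__name__)
            summary[key]["count"] += 1
    return {
        key: {"types": sorted(info["types"]), "count": info["count"]}
        for key, info in summary.items()
    }

# Hypothetical records standing in for the downloaded data.
records = [
    {"scenario_id": "s1", "layer": 10, "activation": [0.1, -0.2], "label": 1},
    {"scenario_id": "s2", "layer": 10, "activation": [0.0, 0.3], "label": 0},
    {"scenario_id": "s3", "layer": 12, "activation": [0.4, 0.1]},  # no label
]

summary = summarize_fields(records)
row_count = len(records)
print(f"rows: {row_count}")
for name, info in summary.items():
    print(name, info["types"], f"present in {info['count']} rows")
```

A field whose count is lower than the row count, like the placeholder `label` here, indicates missing values worth checking before any analysis.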
Provenance
- Source: Hugging Face
- Freshness: Last updated 2026-05-12 17:00:18; freshness should be verified.