Synthetic datasets for extracting emotion and emotion-deflection probes from large language models, built following the methodology described in Anthropic's 'Emotion Concepts and their Function in a Large Language Model' (Sofroniew et al., April 2026). The dataset includes files such as 'expression/stories.parquet', which holds 205,200 rows of emotional stories spanning 171 emotions and 100 topics, generated by models such as Gemini 3.1 Pro Preview. It was authored by ryancodrai and last updated on 2026-04-08.
Use Cases
- Analyzing how LLMs represent emotion concepts, using the synthetic-story methodology the dataset follows.
- Developing behavioral probes for emotion deflection, which is the dataset's stated purpose.
- Training or evaluating emotion classifiers over the 171 emotion categories.
- Investigating how emotional expression varies across the 100 story topics.
Strengths
- The 'expression/stories.parquet' file alone contains 205,200 rows of emotional stories.
- Covers a wide range of 171 distinct emotions and 100 topics.
- Built from a specific, documented methodology published by Anthropic in April 2026.
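The row, emotion, and topic counts quoted above can be cross-checked with a little arithmetic. The sketch below uses only the figures stated in this description; that stories are spread evenly across emotion-topic combinations is an assumption, not something the dataset documents.

```python
# Figures quoted in the dataset description.
TOTAL_ROWS = 205_200
NUM_EMOTIONS = 171
NUM_TOPICS = 100

# Number of distinct (emotion, topic) combinations.
pairs = NUM_EMOTIONS * NUM_TOPICS

# Average rows per combination; a zero remainder means the total divides
# evenly. An even split across pairs is an ASSUMPTION, not a documented fact.
stories_per_pair, remainder = divmod(TOTAL_ROWS, pairs)

print(pairs, stories_per_pair, remainder)
```

The total divides evenly into 12 rows per emotion-topic pair, which is consistent with (but does not prove) a balanced generation grid.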
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The total row count across all files is unknown, which makes it harder to assess suitability.
- Data is synthetic, generated by specific LLMs, which may not reflect natural human emotional expression.
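Because column-level documentation is absent, a quick profile of each field can help when inferring semantics after download. The following is a minimal sketch using only the standard library; the function name `profile_columns` and the field names in the toy rows ("emotion", "topic", "story") are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def profile_columns(rows: Iterable[Mapping[str, Any]], max_examples: int = 3) -> dict:
    """Summarize each field: observed Python types, distinct-value count, a few examples."""
    types: dict[str, Counter] = {}
    distinct: dict[str, set] = {}
    for row in rows:
        for name, value in row.items():
            # Count how often each Python type appears in this field.
            types.setdefault(name, Counter())[type(value).__name__] += 1
            # Track distinct values by repr so unhashable values also work.
            distinct.setdefault(name, set()).add(repr(value))
    return {
        name: {
            "types": dict(types[name]),
            "n_distinct": len(distinct[name]),
            "examples": sorted(distinct[name])[:max_examples],
        }
        for name in types
    }


# Toy rows with HYPOTHETICAL field names; the real columns are undocumented.
sample = [
    {"emotion": "joy", "topic": "travel", "story": "A short story."},
    {"emotion": "grief", "topic": "travel", "story": "Another story."},
]
profile = profile_columns(sample)
print(profile["topic"])
```

With pandas and a Parquet engine installed, and assuming the file has been downloaded locally, real rows could be supplied as `pandas.read_parquet("expression/stories.parquet").to_dict("records")`. A field with about 171 distinct values would then be a plausible candidate for the emotion label, and one with about 100 for the topic.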
Provenance
- Source: Hugging Face
- Collection Method: Synthetic generation by large language models (e.g., Gemini 3.1 Pro Preview) following a methodology from Anthropic research.
- Time Range: Methodology published April 2026.
- Freshness: Last updated 2026-04-08 11:50:35.
- Geography: Not specified.