A collection of 60 short texts generated in November 2025 by four commercial LLMs (DeepSeek, Grok, Copilot, ChatGPT) in response to a minimal narrative prompt. The dataset was created by Juha Raipola, Maria Mäkelä, Samuli Björninen and Laura Piippo for a narratological study. Each model produced five responses per prompt, and all outputs were retained without curation.
Use Cases
- Compare narrative defaults and stylistic tendencies across LLMs given an identical prompt.
- Analyze the structure and coherence of AI-generated short stories written from a minimal scenario.
- Study the reasoning process of one model (DeepSeek) through the included chain-of-thought text.
- Benchmark generative AI models on a constrained, zero-shot narrative task.
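The cross-model comparison in the first use case could be sketched roughly as below. This is only an illustration under stated assumptions: the field names `model` and `text`, the helper `mean_word_count_by_model`, and the sample rows are hypothetical placeholders, because the dataset's actual schema is not documented here.

```python
from collections import defaultdict
from statistics import mean


def mean_word_count_by_model(rows):
    """Return {model: mean word count} for rows shaped like
    {'model': ..., 'text': ...}. Both keys are assumed names."""
    counts = defaultdict(list)
    for row in rows:
        # Whitespace tokenization is a crude but transparent length measure.
        counts[row["model"]].append(len(row["text"].split()))
    return {model: mean(values) for model, values in counts.items()}


# Hypothetical placeholder rows, not taken from the dataset.
sample_rows = [
    {"model": "ModelA", "text": "one two three"},
    {"model": "ModelA", "text": "one two three four five"},
    {"model": "ModelB", "text": "one two"},
]
result = mean_word_count_by_model(sample_rows)
print(result)
```

Word count is used here only because it is easy to verify by hand; any per-text measure (sentence count, lexical diversity) could be substituted in the same grouping loop.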
Strengths
- Controlled experimental design with a single prompt ('Someone makes a mistake') used across all models.
- Includes 60 complete outputs from four distinct LLM systems.
- Contains both reasoning text and final response for DeepSeek's chain-of-thought mode.
- Zero-shot prompting and minimal conditioning keep generation conditions comparable across models.
Limitations
- Row count is not reported, which makes suitability harder to assess before download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Dataset is small (60 texts, 113.1 KB), which limits statistical analysis.
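Because row count and column semantics are undocumented, a first step after download might be a quick structural check like the sketch below. It assumes, without confirmation from the source, that the data is available as CSV text; the helper `summarize_table` and the sample content are hypothetical.

```python
import csv
import io


def summarize_table(csv_text):
    """Return the row count and column names of CSV text.
    Assumes a header row; the CSV format itself is an assumption."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {"row_count": len(rows), "columns": list(reader.fieldnames or [])}


# Hypothetical placeholder content, not taken from the dataset.
sample_csv = "model,text\nModelA,placeholder one\nModelB,placeholder two\n"
summary = summarize_table(sample_csv)
print(summary)
```

If the files turn out to use another format (for example plain text or a word-processor document), the same two questions, how many records and which fields, would need a different reader.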
Provenance
- Source: figshare
- Collection Method: Generated via zero-shot prompting of four commercial LLMs in November 2025.
- Time Range: November 2025
- Freshness: Last updated 2026-04-27 07:33:15; freshness should be verified.