Description
Tiny-Ko-Stories is a dataset of 2,003,542 original Korean short stories, created by psymon and last updated on June 13, 2026. Inspired by the English TinyStories dataset, it was generated from scratch in Korean to test whether small models can demonstrate reasoning and creativity when trained on limited, high-quality data. The dataset includes Korean-specific elements such as native names, sentence rhythm, onomatopoeia, and small event structures.
Use Cases
Training small Korean language models on original, high-quality Korean narratives.
Evaluating model reasoning and creativity in Korean using the short-story format.
Studying Korean linguistic features such as onomatopoeia and native sentence rhythm.
Benchmarking compact models on Korean text generation tasks.
Strengths
Contains 2,003,542 records, providing substantial volume for model training.
Features original Korean stories with native linguistic elements, not translations.
Designed specifically to test the TinyStories hypothesis that small models can show reasoning and creativity when trained on limited, high-quality data.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count is known, but other structural details, such as file size and file format, are not.
Freshness should be verified as the last update timestamp is in the future (2026).
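Because column-level documentation is absent, one way to infer field semantics after download is to summarize a sample of records. The following is a minimal sketch, assuming the records have already been loaded as Python dictionaries (for example, from JSON Lines). The function name `summarize_fields` and the column names `id` and `text` are hypothetical illustrations, not the dataset's confirmed schema.

```python
from collections import Counter, defaultdict


def summarize_fields(records):
    """Per field: observed value types, presence count, and longest string length."""
    summary = defaultdict(lambda: {"types": Counter(), "present": 0, "max_len": 0})
    for record in records:
        for name, value in record.items():
            info = summary[name]
            info["present"] += 1
            info["types"][type(value).__name__] += 1
            if isinstance(value, str):
                info["max_len"] = max(info["max_len"], len(value))
    return dict(summary)


# Hypothetical sample rows; the real column names are not documented.
sample = [
    {"id": 1, "text": "옛날 옛적에 작은 토끼가 살았어요."},
    {"id": 2, "text": "멍멍! 강아지가 짖었어요."},
]

report = summarize_fields(sample)
for name, info in report.items():
    print(name, dict(info["types"]), info["present"], info["max_len"])
```

Running a summary like this over the first few thousand rows is usually enough to tell identifier columns from story text before committing to a training pipeline.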
Provenance
Source
Hugging Face user psymon
Collection Method
Stories were newly generated in Korean rather than translated, and they went through multiple review stages.
Freshness
Last updated 2026-06-13 17:17:28; freshness should be verified.
License is unknown and should be verified before use.
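The freshness check recommended above can be sketched as a comparison between the listed timestamp and a reference time. This is a minimal sketch: the helper names `parse_last_updated` and `is_in_future` are hypothetical, and treating the timestamp as UTC is an assumption, since the listing gives no timezone.

```python
from datetime import datetime, timezone


def parse_last_updated(stamp):
    """Parse 'YYYY-MM-DD HH:MM:SS'. UTC is assumed; the listing gives no timezone."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


def is_in_future(stamp, now=None):
    """Return True if the listed timestamp is later than `now` (default: current UTC time)."""
    now = now or datetime.now(timezone.utc)
    return parse_last_updated(stamp) > now


listed = "2026-06-13 17:17:28"
# Fixed reference time so the check is reproducible; omit `now` to use the real clock.
reference = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(is_in_future(listed, now=reference))
```

If the check returns True against the real clock, the listed timestamp cannot be taken at face value, and the actual last-modified time should be confirmed at the source.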