Name: Tiny-Ko-Stories: 2 Million Original Korean Short Stories for Language Model Training
Creator: psymon
Published: 2026-06-13T11:30:09
Keywords: Tiny Stories, Korean Language, Text, Story Generation, Natural Language Processing, Text Corpus

Description

Tiny-Ko-Stories is a dataset of 2,003,542 original Korean short stories, created by psymon and last updated on June 13, 2026. Inspired by the English TinyStories dataset, it was generated from scratch in Korean to test whether small models can demonstrate reasoning and creativity when trained on limited, high-quality data. The stories include Korean-specific elements such as native names, sentence rhythm, onomatopoeia, and small event structures.

Use Cases

Training small Korean language models on high-quality, original Korean narratives.
Evaluating model reasoning and creativity in Korean using the short-story format.
Studying Korean linguistic features such as onomatopoeia and native sentence rhythm.
Benchmarking the performance of compact models on Korean text generation tasks.

Strengths

Contains 2,003,542 records, providing substantial volume for model training.
Features original Korean stories with native linguistic elements, not translations.
Designed specifically to test the TinyStories hypothesis in Korean: that small models can show reasoning and creativity when trained on limited, high-quality data.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The row count is known, but other structural details, such as file size and file format, are unknown.
The last-update timestamp (2026) appears to be in the future, so freshness should be verified.
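
Because the columns are undocumented, a first pass after download is usually a per-field summary of a small sample. The sketch below is one possible way to do that in plain Python. It assumes the records have already been loaded as a list of dictionaries; the field names "id" and "text" and the sample stories are hypothetical placeholders, not the dataset's actual schema or contents.

```python
from collections import defaultdict


def summarize_fields(records):
    """Summarize every field seen in a sample of records (a list of dicts).

    For each field, report the Python types observed, how many values are
    non-empty, and the mean character length of the non-empty string values.
    """
    stats = defaultdict(
        lambda: {"types": set(), "non_empty": 0, "str_count": 0, "total_chars": 0}
    )
    for record in records:
        for name, value in record.items():
            field = stats[name]
            field["types"].add(type(value).__name__)
            if value is None or value == "":
                continue  # count only values that carry content
            field["non_empty"] += 1
            if isinstance(value, str):
                field["str_count"] += 1
                field["total_chars"] += len(value)

    summary = {}
    for name, field in stats.items():
        mean_chars = (
            field["total_chars"] / field["str_count"] if field["str_count"] else 0.0
        )
        summary[name] = {
            "types": sorted(field["types"]),
            "non_empty": field["non_empty"],
            "mean_chars": mean_chars,
        }
    return summary


# Hypothetical sample: the real column names must be checked after download.
sample = [
    {"id": 0, "text": "옛날 옛적에 작은 토끼가 살았어요."},
    {"id": 1, "text": "민준이는 강아지와 공원에서 놀았어요."},
    {"id": 2, "text": ""},
]

for field_name, info in summarize_fields(sample).items():
    print(field_name, info)
```

A summary like this shows which field holds the story text (long strings, mostly non-empty) and which fields look like identifiers or metadata, which is enough to decide what to inspect by hand next.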

Provenance

Source: Hugging Face user psymon
Collection Method: Stories were newly generated in Korean, not translated, and underwent multiple review stages.
Freshness: Last updated 2026-06-13 17:17:28; freshness should be verified.

License is unknown and should be verified before use.
