Name: Turkish Light Novel Corpus with 58 Chapters and 31,924 Lines
Creator: soundstarrain
Published: 2026-03-22T15:08:44
Keywords: Size Categories: 10K<n<100K, License: other, Library: polars, Fan Translation, Literature, Curated, Modality: text, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, Light Novel, Text, Parquet, Language: tr, Region: us, Natural Language Processing, Turkish, Restricted, Text Corpus

Description

Turkish fan-translated light novels from the Baka-Tsuki project and other recoverable sources. The dataset contains 4 series, 58 chapters, and 31,924 line-level records, totaling 1,370,532 characters and 187,360 words. It was created by soundstarrain and last updated on Hugging Face in March 2026.
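The published totals imply some useful per-unit averages. The sketch below only divides the figures quoted in the description. It makes two assumptions the card does not confirm: that the character total includes whitespace and punctuation, and that the token total comes from a single tokenizer.

```python
# Published totals copied from the dataset description.
TOTALS = {
    "series": 4,
    "chapters": 58,
    "lines": 31_924,
    "characters": 1_370_532,
    "words": 187_360,
    "tokens": 481_515,
}


def corpus_ratios(t: dict) -> dict:
    """Derive per-unit averages from the published corpus totals."""
    return {
        "chars_per_line": round(t["characters"] / t["lines"], 2),
        "words_per_line": round(t["words"] / t["lines"], 2),
        "lines_per_chapter": round(t["lines"] / t["chapters"], 1),
        # Assumes the character total counts spaces and punctuation too.
        "chars_per_word": round(t["characters"] / t["words"], 2),
        # Tokenizer fertility; the card does not say which tokenizer was used.
        "tokens_per_word": round(t["tokens"] / t["words"], 2),
    }


if __name__ == "__main__":
    for name, value in corpus_ratios(TOTALS).items():
        print(f"{name}: {value}")
```

Run as written, this reports about 42.93 characters and 5.87 words per line, about 550.4 lines per chapter, 7.31 characters per word, and 2.57 tokens per word. Lines are therefore short, closer to dialogue or sentence units than to paragraphs.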

Use Cases

Train or fine-tune Turkish language models on the 31,924 line-level records.
Analyze fan-translation style and vocabulary across the 187,360 words.
Benchmark Turkish tokenizers against the reported total of 481,515 tokens (the card does not name the tokenizer that produced this count).
Study narrative structure in light novels using the hierarchical series/volume/chapter organization.
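The vocabulary and tokenization use cases need Turkish-aware lowercasing, because Python's built-in str.lower() maps "I" to "i" instead of the Turkish dotless "ı". The following is a minimal standard-library sketch, not the method used to produce the card's statistics. The regex word definition is an assumption and may not reproduce the 187,360-word total. The `tokenize` argument is a placeholder for whichever tokenizer is being benchmarked.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

# str.lower() is not Turkish-aware: "I" -> "i" and "İ" -> "i" + U+0307.
# Map the two Turkish capitals first so lower() yields "ı" and "i".
_TR_CAPITAL_I = str.maketrans({"I": "ı", "İ": "i"})

# Assumed word definition: runs of Unicode word characters (\w is
# Unicode-aware in Python 3, so ç, ğ, ı, ö, ş, ü are included).
# Apostrophe suffixes such as "Türkiye'de" split into two words.
_WORD_RE = re.compile(r"\w+")


def turkish_lower(text: str) -> str:
    """Lowercase text following Turkish dotted/dotless i rules."""
    return text.translate(_TR_CAPITAL_I).lower()


def word_frequencies(lines: Iterable[str]) -> Counter:
    """Count lowercased word forms across line-level records."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(_WORD_RE.findall(turkish_lower(line)))
    return counts


def fertility(lines: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Tokens per word: tokenizer output length divided by regex word count."""
    materialized = list(lines)
    n_words = sum(len(_WORD_RE.findall(line)) for line in materialized)
    n_tokens = sum(len(tokenize(line)) for line in materialized)
    return n_tokens / n_words if n_words else 0.0


if __name__ == "__main__":
    freqs = word_frequencies(["IŞIK geldi.", "Işık söndü, ışık yandı."])
    print(freqs.most_common(1))  # [('ışık', 3)]
    # Whitespace splitting as a trivial baseline tokenizer.
    print(fertility(["merhaba dünya"], str.split))  # 1.0
```

Because a subword tokenizer's fertility on this corpus can be compared with the roughly 2.57 tokens per word implied by the published totals, the same `fertility` helper can rank several candidate tokenizers on identical lines.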

Strengths

Contains 31,924 line-level records, so text can be sampled, filtered, or aligned at line granularity.
Includes character (1,370,532), word (187,360), and token (481,515) counts for detailed analysis.
Organized into a hierarchical structure of 4 series and 58 chapters.
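The series/volume/chapter hierarchy can be used to reassemble running chapter text from the line-level records. The card documents no columns, so the field names `series`, `volume`, `chapter`, `line_no`, and `text` below are hypothetical. Rename them to whatever the downloaded files actually contain.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL field names; the dataset's real column names are not documented.
ChapterKey = Tuple[str, str, str]  # (series, volume, chapter)


def assemble_chapters(records: Iterable[dict]) -> Dict[ChapterKey, str]:
    """Rebuild chapter text from line-level records, ordered by line number."""
    buckets: Dict[ChapterKey, List[Tuple[int, str]]] = defaultdict(list)
    for rec in records:
        key = (rec["series"], rec["volume"], rec["chapter"])
        buckets[key].append((rec["line_no"], rec["text"]))
    # Sort each chapter's lines by line number, then join with newlines.
    return {
        key: "\n".join(text for _, text in sorted(lines))
        for key, lines in buckets.items()
    }


def count_by_series(chapters: Dict[ChapterKey, str]) -> Dict[str, int]:
    """Number of assembled chapters per series."""
    counts: Dict[str, int] = defaultdict(int)
    for series, _volume, _chapter in chapters:
        counts[series] += 1
    return dict(counts)
```

If the real fields map onto this shape, `count_by_series` should report 4 series whose chapter counts sum to 58, which is a quick consistency check against the card.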

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
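The dataset metadata lists no row count. The 31,924 line-level records in the description likely correspond to rows, but this should be confirmed after download.
Content comes from a small number of fan-translation projects (chiefly Baka-Tsuki), so style, vocabulary, and translation quality may not represent Turkish prose in general.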
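Because field semantics must be inferred and the row count confirmed, profiling the downloaded records is a reasonable first step. This sketch works on any iterable of dicts, for example the output of pandas `DataFrame.to_dict(orient="records")`. It treats None and float NaN as nulls, and it does not count a field that is absent from a record as null.

```python
import math
from collections import Counter
from typing import Iterable

# From the description; treating it as the expected row count is an assumption.
EXPECTED_LINES = 31_924


def _is_null(value) -> bool:
    """None, or a float NaN as produced by pandas for missing numeric values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def profile_records(records: Iterable[dict]) -> dict:
    """Summarize each field: value types seen, null count, and one example."""
    fields: dict = {}
    n_rows = 0
    for rec in records:
        n_rows += 1
        for name, value in rec.items():
            entry = fields.setdefault(
                name, {"types": Counter(), "nulls": 0, "example": None}
            )
            if _is_null(value):
                entry["nulls"] += 1
            else:
                entry["types"][type(value).__name__] += 1
                if entry["example"] is None:
                    entry["example"] = value
    return {"row_count": n_rows, "fields": fields}


def matches_expected(summary: dict, expected: int = EXPECTED_LINES) -> bool:
    """True if the profiled row count equals the figure quoted on the card."""
    return summary["row_count"] == expected
```

The per-field type counts and example values are usually enough to tell text columns apart from series, chapter, or position identifiers.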

Provenance

Source: Baka-Tsuki Turkish project pages and other recoverable Turkish fan-translation sources.
Collection Method: Built and cleaned from linked fan-translation sources.
Time Range: Not specified.
Freshness: Last updated 2026-03-22 16:16:22 (time zone not stated); verify against the source page before relying on it.
Geography: Not specified.
