This dataset collects Turkish fan-translated light novels from the Baka-Tsuki project and other recoverable sources. It contains 4 series, 58 chapters, and 31,924 line-level records, totaling 1,370,532 characters and 187,360 words. It was created by soundstarrain and last updated on Hugging Face in March 2026.
Use Cases
- Train Turkish language models on the 31,924 line-level records.
- Analyze fan-translation style and vocabulary across the 187,360 words.
- Benchmark tokenization methods against the reported count of 481,515 tokens; the tokenizer behind that figure is not stated.
- Study narrative structure in light novels based on the hierarchical series/volume/chapter organization.
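For the tokenization use case, one common comparison is tokens per whitespace-delimited word (sometimes called fertility). The sketch below is a minimal illustration under assumptions: `fertility`, `char_tokenize`, and `whitespace_tokenize` are hypothetical helpers, the two toy tokenizers stand in for whatever tokenizer is actually being evaluated, and the sample lines are made up rather than taken from the dataset.

```python
from typing import Callable, Iterable, List


def fertility(lines: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Return tokens per whitespace-delimited word, summed over all lines."""
    total_tokens = 0
    total_words = 0
    for line in lines:
        total_words += len(line.split())
        total_tokens += len(tokenize(line))
    if total_words == 0:
        raise ValueError("no words found in input")
    return total_tokens / total_words


def char_tokenize(text: str) -> List[str]:
    """Toy tokenizer (assumption): one token per non-whitespace character."""
    return [ch for ch in text if not ch.isspace()]


def whitespace_tokenize(text: str) -> List[str]:
    """Toy tokenizer (assumption): one token per whitespace-delimited word."""
    return text.split()


# Hypothetical sample lines, not taken from the dataset.
sample_lines = ["Bu bir deneme.", "Kitap burada."]

for name, tok in [("char", char_tokenize), ("whitespace", whitespace_tokenize)]:
    print(f"{name}: {fertility(sample_lines, tok):.2f} tokens per word")
```

To compare real tokenizers, pass any callable that maps a string to a list of tokens in place of the toy functions.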
Strengths
- Contains 31,924 line-level records, providing a substantial text corpus.
- Includes character (1,370,532), word (187,360), and token (481,515) counts for detailed analysis.
- Organized into a hierarchical structure of 4 series and 58 chapters.
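The reported counts can be combined into a few derived averages that help judge the corpus shape. The sketch below uses only the figures stated in this card; the constant names and the `ratio` helper are illustrative, and the results are plain arithmetic rather than values measured on the downloaded files.

```python
# Counts as reported in this card.
SERIES = 4
CHAPTERS = 58
RECORDS = 31_924
CHARACTERS = 1_370_532
WORDS = 187_360
TOKENS = 481_515


def ratio(numerator: int, denominator: int) -> float:
    """Divide two counts, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


derived = {
    "records_per_chapter": ratio(RECORDS, CHAPTERS),
    "chars_per_record": ratio(CHARACTERS, RECORDS),
    "words_per_record": ratio(WORDS, RECORDS),
    "chars_per_word": ratio(CHARACTERS, WORDS),
    "tokens_per_word": ratio(TOKENS, WORDS),
}

for name, value in derived.items():
    print(f"{name}: {value:.2f}")
```

Under these figures a record averages roughly 43 characters and just under 6 words, which is consistent with line-level rather than paragraph-level segmentation.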
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The row count is not reported separately from the 31,924 line-level records; confirm the actual number of rows after download before assessing suitability.
- Data may reflect source bias inherent to fan-translated content from specific projects.
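Because column-level documentation is absent, a first pass after download could summarize each field's observed types, coverage, and an example value. The sketch below assumes the records have already been loaded as a list of dictionaries; `summarize_fields` is a hypothetical helper, and the field names in `rows` are invented for illustration, not the dataset's actual columns.

```python
from typing import Any, Dict, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Collect observed types, non-null coverage, and one example per field."""
    summary: Dict[str, Dict[str, Any]] = {}
    total = 0
    for record in records:
        total += 1
        for key, value in record.items():
            entry = summary.setdefault(
                key, {"types": set(), "non_null": 0, "example": None}
            )
            if value is not None:
                entry["types"].add(type(value).__name__)
                entry["non_null"] += 1
                if entry["example"] is None:
                    entry["example"] = value
    for entry in summary.values():
        entry["types"] = sorted(entry["types"])
        # Fraction of all records in which the field is present and non-null.
        entry["coverage"] = entry["non_null"] / total
    return summary


# Hypothetical rows; the real field names must be checked after download.
rows = [
    {"series": "Example Series", "chapter": 1, "text": "Bir satir."},
    {"series": "Example Series", "chapter": 2, "text": None},
]

for field, info in summarize_fields(rows).items():
    print(field, info["types"], f"{info['coverage']:.0%}", repr(info["example"]))
```

The length of the loaded record list also gives the row count directly, which can then be compared with the 31,924 figure reported in this card.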
Provenance
- Source: Baka-Tsuki Turkish project pages and other recoverable Turkish fan-translation sources.
- Collection Method: Built and cleaned from linked fan-translation sources.
- Time Range: not specified
- Freshness: Last updated 2026-03-22 16:16:22; freshness should be verified.
- Geography: not specified