This dataset contains 600 Telugu poems sourced from publicly available online archives. Each sample pairs a poem with a Telugu prose translation and an English prose translation. The dataset was created by TeluguLLMResearch and last updated in June 2026.
Use Cases
- Training machine translation models for intra-lingual (Telugu-to-Telugu) poetry-to-prose conversion.
- Training machine translation models for cross-lingual (Telugu-to-English) poetry-to-prose conversion.
- Analyzing linguistic and stylistic differences between classical poetic and modern prose forms.
- Benchmarking NLP models on the translation of historical and culturally specific texts.
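For the two translation use cases, one record can be expanded into two source/target pairs. The sketch below is a minimal illustration, not the dataset's official loader: the column names `poem`, `telugu_prose`, and `english_prose` and the task labels are hypothetical, since the real field names are undocumented and must be checked after download.

```python
from typing import Dict, List

# Hypothetical column names: the dataset's real field names are undocumented,
# so adjust these three constants after inspecting the downloaded files.
POEM = "poem"
TE_PROSE = "telugu_prose"
EN_PROSE = "english_prose"


def build_pairs(record: Dict[str, str]) -> List[Dict[str, str]]:
    """Expand one record into an intra-lingual and a cross-lingual pair."""
    poem = record[POEM].strip()
    return [
        # Telugu poem -> Telugu prose (intra-lingual)
        {
            "task": "te_poem_to_te_prose",
            "source": poem,
            "target": record[TE_PROSE].strip(),
        },
        # Telugu poem -> English prose (cross-lingual)
        {
            "task": "te_poem_to_en_prose",
            "source": poem,
            "target": record[EN_PROSE].strip(),
        },
    ]
```

Keeping the task label on each pair lets a single model be trained on both directions, or lets the pairs be split into two separate training sets.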
Strengths
- 600 curated samples provide a compact foundation for model training and evaluation.
- Each sample includes three aligned texts (original poem, Telugu prose, English prose), enabling multi-task learning.
- Poems are sourced from a specific historical period (13th-17th century), offering a focused temporal scope.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is known (600), but the total size, file formats, and specific column names are unknown.
- Data may reflect temporal and source bias inherent to the selected online archives.
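Because the column names are undocumented, a first pass after download could profile each column by script and line count to guess which one holds the poem, the Telugu prose, and the English prose. The sketch below assumes the rows have already been loaded as a list of dictionaries. It relies on the Telugu Unicode block (U+0C00 to U+0C7F), and the idea that verse has more line breaks than prose is only a heuristic to confirm by reading a few samples.

```python
from statistics import mean
from typing import Dict, List


def telugu_ratio(text: str) -> float:
    """Share of Telugu-block characters among Telugu and ASCII letters."""
    telugu = sum(0x0C00 <= ord(ch) <= 0x0C7F for ch in text)
    latin = sum(ch.isascii() and ch.isalpha() for ch in text)
    total = telugu + latin
    return telugu / total if total else 0.0


def summarize_columns(rows: List[Dict[str, str]]) -> Dict[str, Dict[str, object]]:
    """Report the dominant script and average non-empty line count per column."""
    summary: Dict[str, Dict[str, object]] = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        values = [str(r[col]) for r in rows if r.get(col) is not None]
        if not values:
            continue  # nothing to profile in this column
        ratio = mean(telugu_ratio(v) for v in values)
        avg_lines = mean(
            sum(1 for line in v.splitlines() if line.strip()) for v in values
        )
        summary[col] = {
            "script": "telugu" if ratio > 0.5 else "latin",
            # Heuristic only: verse tends to span more lines than prose.
            "avg_lines": avg_lines,
        }
    return summary
```

A column reported as `telugu` with the higher average line count is a candidate for the poem, the other `telugu` column for the Telugu prose, and the `latin` column for the English prose.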
Provenance
- Source: Publicly available online archives.
- Collection Method: Poems were sourced from archives.
- Time Range: 13th-17th century.
- Freshness: Last updated 2026-06-03 08:47:31; freshness should be verified.