Description
Seven volumes of Evliya Celebi's Seyahatname travelogue, processed via OCR and packaged for corpus exploration. The dataset was created by fatihburakkaragoz and last updated on 2026-05-12. It provides text from volumes 1, 3, 4, 6, 7, 9, and 10, with volumes 2, 5, and 8 missing from the sequence.
Use Cases
Corpus exploration of the OCR-derived text volumes.
Language modeling on historical Ottoman Turkish text.
OCR-quality analysis using the page-level OCR status information.
Historical text analysis of the Seyahatname travelogue's content.
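As an illustration of the OCR-quality use case, the sketch below computes the share of pages per OCR status within each volume. The field names `volume` and `ocr_status`, and the status values `ok` and `failed`, are assumptions made for this example; the dataset's real column names and status values are undocumented and must be checked after download.

```python
from collections import Counter, defaultdict

def ocr_status_share(pages):
    """Return, per volume, the fraction of pages having each OCR status.

    `volume` and `ocr_status` are hypothetical field names.
    """
    counts = defaultdict(Counter)
    for rec in pages:
        counts[rec["volume"]][rec["ocr_status"]] += 1
    shares = {}
    for volume, per_status in counts.items():
        total = sum(per_status.values())
        shares[volume] = {status: n / total for status, n in per_status.items()}
    return shares

# Made-up records for demonstration only, not taken from the dataset.
sample_pages = [
    {"volume": 1, "ocr_status": "ok"},
    {"volume": 1, "ocr_status": "ok"},
    {"volume": 1, "ocr_status": "failed"},
    {"volume": 1, "ocr_status": "ok"},
    {"volume": 3, "ocr_status": "ok"},
]
shares = ocr_status_share(sample_pages)
```

Volumes with a high share of non-successful statuses would be natural candidates for manual review before any downstream use.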
Strengths
Contains OCR-derived text from seven distinct volumes of a major historical work.
Offers two structured views: one per page and one per concatenated document.
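The relationship between the two views can be sketched as grouping page records by volume and joining their text in page order. This is a minimal illustration, not the dataset author's actual procedure; the field names `volume`, `page`, and `text` are hypothetical and should be replaced with the real column names once they are known.

```python
from collections import defaultdict

def concatenate_pages(pages):
    """Build one document string per volume from page-level records.

    `volume`, `page`, and `text` are hypothetical field names.
    """
    by_volume = defaultdict(list)
    for rec in pages:
        by_volume[rec["volume"]].append(rec)
    docs = {}
    for volume, recs in by_volume.items():
        ordered = sorted(recs, key=lambda r: r["page"])  # restore page order
        docs[volume] = "\n".join(r["text"] for r in ordered)
    return docs

# Made-up records for demonstration only, not taken from the dataset.
sample_pages = [
    {"volume": 1, "page": 2, "text": "second page"},
    {"volume": 1, "page": 1, "text": "first page"},
    {"volume": 3, "page": 1, "text": "only page"},
]
documents = concatenate_pages(sample_pages)
```

The per-page view suits work that needs page boundaries, while a concatenated view like `documents` is more convenient for whole-volume processing.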
Limitations
Description metadata is limited; data quality can only be assessed by inspecting the files after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which makes it harder to judge whether the dataset suits a given task.
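Because row count and field semantics are undocumented, a first inspection after download might look like the sketch below. It assumes the records have already been loaded as a list of dictionaries; the loading step is omitted because the file format is not stated. All field names in the sample are hypothetical.

```python
from collections import Counter

def summarize_records(records):
    """Report the row count and, per field, value types seen and empty values."""
    summary = {"rows": len(records), "fields": {}}
    for rec in records:
        for name, value in rec.items():
            info = summary["fields"].setdefault(
                name, {"types": Counter(), "empty": 0}
            )
            info["types"][type(value).__name__] += 1
            if value is None or value == "":
                info["empty"] += 1
    return summary

# Made-up records for demonstration only, not taken from the dataset.
sample = [
    {"volume": 1, "page": 1, "text": "placeholder text"},
    {"volume": 1, "page": 2, "text": ""},
]
report = summarize_records(sample)
```

A summary like `report` gives the row count directly and highlights fields with many empty values, which is a reasonable starting point for inferring what each column means.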
Provenance
Source
OCR processing of Evliya Celebi's Seyahatname.
Collection Method
Optical Character Recognition (OCR) applied to the source texts.
Freshness
Last updated 2026-05-12 01:03:17 (time zone not stated); freshness should be verified.
License
Unknown; terms of use must be verified.