Evliya Celebi Seyahatname: OCR Text from Seven Travelogue Volumes | DataSalon

Computer Vision

Name: Evliya Celebi Seyahatname: OCR Text from Seven Travelogue Volumes
Creator: fatihburakkaragoz
Published: 2026-05-11T17:17:57
Keywords: Language Modeling, Historical Text, Text, Natural Language Processing, Ottoman Turkish, Ocr Corpus

Available on 1 platform

Description

Seven volumes of Evliya Celebi's Seyahatname travelogue, processed via OCR and packaged for corpus exploration. The dataset was created by fatihburakkaragoz and last updated on 2026-05-12. It provides text from volumes 1, 3, 4, 6, 7, 9, and 10, with volumes 2, 5, and 8 missing from the sequence.

Use Cases

Corpus exploration based on the provided OCR-derived text volumes.
Language modeling based on historical Ottoman Turkish text.
OCR-quality analysis based on the page-level OCR status information.
Historical text analysis based on the content of the Seyahatname travelogue.
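For the OCR-quality use case, a minimal sketch of tallying page-level OCR status per volume might look like the following. The column names (`volume`, `page`, `ocr_status`) and the status values are assumptions made for illustration, since the listing does not document the schema; the rows are toy data, not taken from the dataset.

```python
from collections import Counter

# Toy page-level rows. The field names and status values are hypothetical;
# check the real column names after download.
pages = [
    {"volume": 1, "page": 1, "ocr_status": "ok"},
    {"volume": 1, "page": 2, "ocr_status": "failed"},
    {"volume": 3, "page": 1, "ocr_status": "ok"},
]

def status_counts_by_volume(page_rows):
    """Count how many pages in each volume carry each OCR status."""
    counts = {}
    for row in page_rows:
        counts.setdefault(row["volume"], Counter())[row["ocr_status"]] += 1
    return counts

counts = status_counts_by_volume(pages)
```

A per-volume tally like this makes it easy to see whether OCR failures cluster in particular volumes before using the text for modeling.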

Strengths

Contains OCR-derived text from seven distinct volumes of a major historical work.
Offers two structured views: one per page and one per concatenated document.
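To illustrate how the two views could relate, here is a rough sketch that derives a per-document view from per-page rows by joining page text in page order. The field names (`volume`, `page`, `text`) are assumptions, as is the idea that one document corresponds to one volume; the listing does not confirm either.

```python
from collections import defaultdict

# Toy page-level rows. The field names are hypothetical placeholders,
# not the dataset's documented schema.
pages = [
    {"volume": 1, "page": 2, "text": "second page"},
    {"volume": 1, "page": 1, "text": "first page"},
    {"volume": 3, "page": 1, "text": "only page"},
]

def concatenate_pages(page_rows):
    """Group page rows by volume and join their text in page order."""
    by_volume = defaultdict(list)
    for row in page_rows:
        by_volume[row["volume"]].append(row)
    documents = {}
    for volume, rows in by_volume.items():
        rows.sort(key=lambda r: r["page"])
        documents[volume] = "\n".join(r["text"] for r in rows)
    return documents

documents = concatenate_pages(pages)
```

Comparing a reconstruction like this against the provided document view is one way to infer how the concatenation was actually done.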

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.

Provenance

Source: OCR processing of Evliya Celebi's Seyahatname.
Collection Method: Optical Character Recognition (OCR) applied to the source texts.
Freshness: Last updated 2026-05-12 01:03:17; freshness should be verified.

License is unknown; terms of use must be verified.

Quality Score

D38

Community

56 downloads

1 like

0 views

Dataset Info

Author: fatihburakkaragoz
Created: May 11, 2026
Updated: May 12, 2026
Last synced: May 21, 2026
