Description
Textindiagrams is a dataset of 948 historical astronomical diagrams annotated with 10,940 oriented polygonal text regions. It spans ten centuries (8th to 18th) and covers seven major traditions: Arabic, Persian, Chinese, Byzantine, Latin, Hebrew, and Sanskrit. It was introduced with the paper "Text region detection in historical astronomical diagrams" (author listed as sonatbaltaci) and is hosted on Hugging Face.
Use Cases
Train text region detection models based on the 10,940 annotated polygonal text regions.
Study the evolution of astronomical illustration and notation across the seven traditions the dataset covers.
Benchmark multilingual OCR systems on historical documents spanning ten centuries.
Analyze document layout and text-graphic relationships in historical scientific diagrams.
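For the text region detection use case, many detection pipelines need each oriented polygon reduced to an area and an axis-aligned bounding box. The following is a minimal sketch of that step, assuming each region is available as a list of (x, y) vertices in pixel coordinates. The actual annotation format is not documented here, so the representation and the function names `polygon_area` and `bounding_box` are illustrative assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Return the absolute area of a simple polygon (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing all vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical region: a square rotated by 45 degrees (not taken from the dataset).
region = [(10.0, 0.0), (20.0, 10.0), (10.0, 20.0), (0.0, 10.0)]
area = polygon_area(region)
box = bounding_box(region)
```

For a rotated region like this one, the bounding box covers more area than the polygon itself, which is one reason oriented annotations can be preferable for tightly packed diagram labels.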
Strengths
Contains 948 annotated diagrams, providing a substantial corpus for training.
Includes 10,940 detailed text region annotations with oriented polygons.
Covers a wide temporal range of ten centuries (8th to 18th).
Represents seven distinct cultural and linguistic traditions, offering diversity.
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The row count is not reported, which makes it harder to assess suitability before download.
Data may reflect geographic, temporal, and source bias inherent to the selected historical diagrams.
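Because column-level documentation is absent, a first step after download is usually to list which fields appear and what value types they hold. The sketch below does this for records already loaded as Python dictionaries. The sample records and field names (`image_id`, `polygon`) are hypothetical placeholders, not the dataset's real schema.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def summarize_fields(records: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each field name to the sorted list of value type names observed."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical sample records; real field names must be read from the downloaded files.
sample = [
    {"image_id": 1, "polygon": [[0, 0], [4, 0], [4, 3], [0, 3]]},
    {"image_id": 2, "polygon": None},
]
summary = summarize_fields(sample)
```

A field that shows more than one type, such as a list in some records and `None` in others, is a hint that some rows lack annotations and need separate handling.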
Provenance
Source
Official repository of the paper "Text region detection in historical astronomical diagrams" by author sonatbaltaci.
Collection Method
The dataset creation method is not documented.
Time Range
Spans the 8th to 18th centuries.
Freshness
Last updated 2026-06-08 13:57:02; freshness should be verified.
Geography
Covers diagrams from Arabic, Persian, Chinese, Byzantine, Latin, Hebrew, and Sanskrit traditions.
License
Unknown; users should verify the terms of use before downloading.