SynthChartNet: 1.9M Synthetic Charts for Document AI

Name: SynthChartNet: 1.9M Synthetic Charts for Document AI
Creator: docling-project
Published: 2025-07-15T09:40:20
Keywords: Multimodal AI, Chart Understanding, Computer Vision, Synthetic Data, Document AI, Synthetic, Multimodal

Description

SynthChartNet is a multimodal dataset of 1,981,157 synthetically generated chart images with ground-truth annotations. Created by the docling-project and last updated in July 2025, it is designed for training the SmolDocling model on chart-based document understanding. Charts were rendered at 120 DPI using visualization libraries such as Matplotlib, Seaborn, and Pyecharts.
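For a rough sense of what 120 DPI means for image resolution, the sketch below converts a figure size in inches to pixels. The figure size used here (6.4 x 4.8 inches, Matplotlib's default) is an assumption for illustration only; the actual figure sizes used to generate the dataset are not stated.

```python
def pixel_size(width_in: float, height_in: float, dpi: int = 120) -> tuple[int, int]:
    """Return the (width, height) in pixels of a figure rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed figure size: Matplotlib's default of 6.4 x 4.8 inches.
# The dataset's real figure sizes are not documented on this page.
size = pixel_size(6.4, 4.8)
print(size)
```

Larger or smaller figures scale linearly, so the pixel dimensions of individual samples should be checked directly after download.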

Use Cases

Train chart recognition models on synthetically generated images of line, bar, and pie charts.
Develop chart-to-text systems using the OTSL-formatted ground-truth annotations.
Benchmark document AI models on a large-scale, diverse set of chart visualizations.
Fine-tune vision-language models for tasks requiring comprehension of data visualizations.
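As an illustration of working with OTSL-style annotations, here is a minimal sketch that turns a simplified OTSL string into rows of cell text. It assumes only three tokens, `<fcel>` (cell with content), `<ecel>` (empty cell), and `<nl>` (end of row); the exact token set and layout used in this dataset are not documented here, and the function name `otsl_to_rows` and the sample string are hypothetical.

```python
import re

# Assumed token set: <fcel>, <ecel>, <nl>. Other OTSL tokens
# (for example span or header markers) are not handled in this sketch.
TOKEN_RE = re.compile(r"<(fcel|ecel|nl)>")


def otsl_to_rows(otsl: str) -> list[list[str]]:
    """Convert a simplified OTSL string into a list of rows of cell text."""
    rows: list[list[str]] = []
    current: list[str] = []
    pending = None  # type of the cell whose text is being read
    pos = 0
    for match in TOKEN_RE.finditer(otsl):
        text = otsl[pos:match.start()].strip()
        # The text before this token belongs to the previously opened cell.
        if pending == "fcel":
            current.append(text)
        elif pending == "ecel":
            current.append("")
        tag = match.group(1)
        if tag == "nl":
            rows.append(current)
            current = []
            pending = None
        else:
            pending = tag
        pos = match.end()
    # Flush a trailing cell or row that has no closing <nl>.
    tail = otsl[pos:].strip()
    if pending == "fcel":
        current.append(tail)
    elif pending == "ecel":
        current.append("")
    if current:
        rows.append(current)
    return rows


# Hypothetical annotation for a small two-series chart.
sample = "<ecel><fcel>2023<fcel>2024<nl><fcel>Revenue<fcel>10.5<fcel>12.1<nl>"
table = otsl_to_rows(sample)
print(table)
```

The resulting rows can be compared against a model's predicted table or serialized to CSV; real annotations should be inspected first to confirm which tokens actually occur.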

Strengths

Contains 1,981,157 samples, providing a large-scale resource for model training.
Uses multiple visualization libraries (Matplotlib, Seaborn, Pyecharts), suggesting diversity in chart styles.
Specifically designed for training the SmolDocling model, indicating a clear application focus.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The catalog metadata lists the row count as unknown, even though the description reports 1,981,157 samples; verify the actual count after download.
Data may reflect bias inherent to the synthetic generation process and libraries used.
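Because the fields are undocumented, a quick pass over a handful of records can help infer their semantics. The sketch below is one possible approach, assuming the records have already been loaded as Python dictionaries; the field names `image` and `annotation` are hypothetical placeholders, not the dataset's confirmed schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the sorted type names observed across records."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical records; real field names must be checked after download.
sample_records = [
    {"image": b"\x89PNG", "annotation": "<fcel>A<nl>"},
    {"image": b"\x89PNG", "annotation": None},
]
summary = summarize_fields(sample_records)
row_count = len(sample_records)
print(summary, row_count)
```

A field that shows more than one type, such as a string mixed with `NoneType`, is a hint that some samples lack that annotation.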

Provenance

Source: docling-project
Collection Method: Synthetically generated using visualization libraries.
Freshness: Last updated 2025-07-15 14:16:24

License is unknown; users should verify terms of use before downloading.

Quality Score

Overall: D39
Description: 39
Source: 36
Reputation: 52
Access: 26

Community

Downloads: 1.0K
Likes: 15
Views: 0

Dataset Info

Author: docling-project
Created: Jul 15, 2025
Updated: Jul 15, 2025
Last synced: Jun 1, 2026
