ChartVerse-SFT-600K contains 600,000 high-quality samples for chart reasoning, each annotated with a Chain-of-Thought (CoT) rationale. The dataset was developed by opendatalab as part of the ChartVerse project and was last updated on January 23, 2026. It is filtered to exclude trivial samples, ensuring every entry provides a meaningful learning signal for model training.
Use Cases
- Training models for chart-based question answering.
- Fine-tuning models to generate step-by-step reasoning, using the per-sample CoT rationales as supervision.
- Benchmarking model performance on non-trivial visual reasoning problems, since trivial samples have been filtered out.
- Developing instruction-following capabilities in multimodal models through supervised fine-tuning (SFT).
Strengths
- 600,000 samples provide a large-scale resource for training.
- Chain-of-Thought annotations are included for each sample.
- Samples are filtered by failure rate (r > 0): entries with a zero failure rate are excluded as trivial, so each remaining sample poses a non-trivial learning challenge.
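The failure-rate filter can be illustrated with a minimal sketch. It assumes that r is the fraction of incorrect answers over several model attempts per sample; the source states only the r > 0 criterion, so that definition, the `outcomes` field, and the function names are hypothetical.

```python
def failure_rate(outcomes: list[bool]) -> float:
    """Fraction of attempts that were wrong.

    Assumption: outcomes[i] is True when attempt i answered correctly.
    """
    if not outcomes:
        raise ValueError("need at least one attempt")
    failures = sum(1 for ok in outcomes if not ok)
    return failures / len(outcomes)


def keep_non_trivial(samples: list[dict]) -> list[dict]:
    """Keep samples whose failure rate r is strictly positive (r > 0)."""
    return [s for s in samples if failure_rate(s["outcomes"]) > 0]


# Hypothetical samples with four attempts each.
samples = [
    {"id": "a", "outcomes": [True, True, True, True]},      # r = 0.0, dropped as trivial
    {"id": "b", "outcomes": [True, False, True, True]},     # r = 0.25, kept
    {"id": "c", "outcomes": [False, False, False, False]},  # r = 1.0, kept
]

kept = keep_non_trivial(samples)
print([s["id"] for s in kept])
```

Under this definition a sample that every attempt fails (r = 1.0) also passes the filter; the source does not describe an upper bound on r.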
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- The exact row count is not independently verified; the 600,000 figure comes from the dataset name and description, which may limit suitability assessment.
- Data may reflect bias inherent to the source collection and annotation process.
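Because column-level documentation is absent, a first pass over a few records can surface the field names and value types before any training code is written. The sketch below is one way to do that; the `summarize_fields` helper and the field names in the preview records (`question`, `answer`, `cot`) are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict
from itertools import islice
from typing import Any, Iterable


def summarize_fields(
    records: Iterable[dict[str, Any]], limit: int = 100
) -> dict[str, set[str]]:
    """Map each field name to the Python type names seen in the first `limit` records."""
    seen: dict[str, set[str]] = defaultdict(set)
    for record in islice(records, limit):
        for key, value in record.items():
            seen[key].add(type(value).__name__)
    return dict(seen)


# Hypothetical preview records; the real schema must be checked after download.
preview = [
    {"question": "Which bar is tallest?", "answer": "B", "cot": "Compare heights..."},
    {"question": "What is the total?", "answer": 42, "cot": None},
]

summary = summarize_fields(preview)
for name, types in sorted(summary.items()):
    print(name, sorted(types))
```

Any iterable of dict records works as input. With the Hugging Face `datasets` library, a split loaded with `streaming=True` yields such records without downloading the full dataset; the repository id and split name would need to be confirmed on the dataset page.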
Provenance
- Source: opendatalab/ChartVerse project on Hugging Face.
- Collection Method: Developed as part of the ChartVerse project; see the project page for details of the method.
- Time Range: Not specified.
- Freshness: Last updated 2026-01-23 03:20:07; check the source repository for later revisions.
- Geography: Not specified.