CSVQA is a Chinese multimodal benchmark designed to evaluate the STEM reasoning capabilities of Vision-Language Models (VLMs). The dataset was created by Skywork, and its associated paper was released on arXiv in June 2025. It focuses on scientific visual question answering, pairing images with Chinese-language text.
Use Cases
- Benchmarking VLM performance on Chinese-language multimodal STEM questions
- Training models for scientific visual question answering
- Analyzing model reasoning in physics and chemistry, the subjects indicated by the dataset's STEM education tags
- Developing educational AI tools for Chinese-language STEM learning
Strengths
- Dataset is explicitly designed for evaluating STEM reasoning, a specific and challenging domain
- Created by a named author (Skywork) and accompanied by an arXiv paper released in June 2025
- Focuses on the Chinese language, addressing a specific linguistic context
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count, file formats, and license are not stated, which makes suitability harder to assess
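Because field semantics have to be inferred after download, a quick per-field summary can help. The following is a minimal sketch, not part of the dataset's tooling: `summarize_fields` is a hypothetical helper, and the sample rows use invented field names (`question`, `options`, `answer`, `image`) that are assumptions for illustration, not the actual CSVQA schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """Summarize each field across a list of dict-like rows.

    Returns {field: {"types": [...], "missing": int, "example": value}}.
    """
    # Collect every field name first so absent keys count as missing.
    all_fields = set()
    for row in records:
        all_fields.update(row)

    summary = defaultdict(lambda: {"types": set(), "missing": 0, "example": None})
    for row in records:
        for field in all_fields:
            value = row.get(field)
            info = summary[field]
            if value is None:
                info["missing"] += 1
                continue
            info["types"].add(type(value).__name__)
            if info["example"] is None:
                info["example"] = value

    return {
        name: {
            "types": sorted(info["types"]),
            "missing": info["missing"],
            "example": info["example"],
        }
        for name, info in summary.items()
    }


# Hypothetical rows; field names are assumptions, not the real CSVQA columns.
sample_rows = [
    {"question": "Q1", "options": ["A", "B"], "answer": "A", "image": b"\x89PNG"},
    {"question": "Q2", "options": None, "answer": "42", "image": b"\x89PNG"},
]

report = summarize_fields(sample_rows)
for name in sorted(report):
    print(name, report[name]["types"], "missing:", report[name]["missing"])
```

Run against the first few rows of the downloaded files, a summary like this shows which fields hold text, which hold image data, and which are frequently empty, before any assumption is made about their meaning.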
Provenance
- Source: Skywork
- Collection Method: Likely created for research benchmarking, as indicated by the arXiv paper and leaderboard references.
- Time Range: not specified
- Freshness: Last updated 2025-06-20 03:36:37
- Geography: not specified