A benchmark dataset of 440 expert-verified scientific questions spanning 22 disciplines, partitioned into Q-Mirror-Expert (310 questions) and Q-Mirror-Grad (130 questions). Created by Q-mirror, it is intended for evaluating the transformation of text-only QA pairs into multi-modal QA pairs, and it includes JSONL annotation files and generated PNG images. It was last updated on 2026-05-06.
Use Cases
- Benchmarking multi-modal question-answering models on the 440 expert-verified scientific questions
- Evaluating text-to-multi-modal transformation techniques using the provided text-only QA pairs and generated images
- Training or fine-tuning vision-language models for scientific domains on the annotated multi-modal QA pairs
Strengths
- Contains 440 questions verified by experts, ensuring a quality benchmark
- Covers 22 distinct scientific disciplines, providing broad domain coverage
- Includes a partition into expert-level (310) and graduate-level (130) questions for targeted evaluation
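The expert/graduate split lends itself to reporting a separate score per partition. The following is a minimal sketch, assuming model outputs have already been graded into (partition, correct) pairs; the function name and the input shape are hypothetical and are not part of the dataset or any official evaluation script.

```python
from collections import defaultdict


def accuracy_by_partition(results):
    """Compute accuracy per partition.

    `results` is an iterable of (partition_name, is_correct) pairs.
    This input shape is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for partition, is_correct in results:
        totals[partition] += 1
        correct[partition] += int(bool(is_correct))
    # Every partition in `totals` has at least one entry, so no division by zero.
    return {name: correct[name] / totals[name] for name in totals}


# Toy graded results (made-up values, not real benchmark scores).
toy_results = [
    ("Q-Mirror-Expert", True),
    ("Q-Mirror-Expert", False),
    ("Q-Mirror-Grad", True),
]
scores = accuracy_by_partition(toy_results)
```

Reporting the two partitions separately avoids letting the larger expert-level subset (310 of 440 questions) dominate a single pooled number.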
Limitations
- Column-level documentation is absent; field semantics must be inferred after download
- Row count of the annotation files is not documented; with 440 questions stated, the dataset appears sized for evaluation rather than large-scale training
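Because the fields are undocumented, a first step after download is usually to inspect the annotation files directly. The following is a minimal sketch, assuming each line of a JSONL file is a single JSON object; the function name is hypothetical and the file path in the usage comment is a placeholder, since the actual file names are not given here.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl_fields(path):
    """Return (row count, key frequencies, value types per key) for a JSONL file.

    Assumes every non-blank line holds one JSON object (an assumption,
    not something the dataset documentation confirms).
    """
    key_counts = Counter()
    key_types = {}
    n_rows = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            n_rows += 1
            for key, value in record.items():
                key_counts[key] += 1
                key_types.setdefault(key, set()).add(type(value).__name__)
    return n_rows, key_counts, key_types


# Usage (placeholder path; substitute the real annotation file):
# n_rows, key_counts, key_types = summarize_jsonl_fields("annotations.jsonl")
```

Keys that appear in fewer rows than the total count are optional fields, and keys with more than one observed type deserve a closer look before use.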
Provenance
- Source: Q-mirror
- Collection Method: Expert verification and generation for the NeurIPS 2026 Evaluations and Datasets Track
- Time Range: 2026
- Freshness: Last updated 2026-05-06 16:17:51; freshness should be verified
- Geography: Not specified