FlipVQA-85K is a high-fidelity reasoning benchmark curated from a corpus of 544 college-level educational PDF documents, including expert-authored textbooks and exercise sets. The collection spans 11 academic disciplines, primarily in STEM domains where problems involve rigorous and verifiable reasoning processes. It was created by OpenDCAI and last updated on the platform in April 2026.
Use Cases
- Benchmarking multimodal reasoning models on problems drawn from college-level textbooks and exercise sets.
- Training AI for visual question answering on STEM content derived from educational PDFs.
- Evaluating the step-by-step reasoning of large language models on problems whose solution processes are rigorous and verifiable.
Strengths
- Curated from 544 college-level educational PDF documents, including expert-authored textbooks and exercise sets.
- Spans 11 academic disciplines, with a focus on STEM domains.
- Problems involve rigorous, verifiable reasoning processes.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not documented, which makes it hard to assess suitability before download.
- Description metadata is limited; actual data quality requires manual inspection after download.
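Because the fields and row count are undocumented, a first pass after download could simply tally what each record contains. The following is a minimal sketch, assuming the data has already been loaded as an iterable of dictionaries; the helper `summarize_fields` and the field names in `sample_rows` are hypothetical and are not taken from the dataset itself.

```python
from collections import defaultdict


def summarize_fields(records):
    """Return (row_count, per-field summary) for an iterable of dict rows."""
    row_count = 0
    summary = defaultdict(lambda: {"types": set(), "present": 0, "example": None})
    for row in records:
        row_count += 1
        for name, value in row.items():
            info = summary[name]  # register the field even if its value is None
            if value is None:
                continue  # None is counted as missing
            info["types"].add(type(value).__name__)
            info["present"] += 1
            if info["example"] is None:
                info["example"] = value  # keep the first non-missing value
    # "missing" counts rows where the field was absent or None.
    return row_count, {
        name: {
            "types": sorted(info["types"]),
            "missing": row_count - info["present"],
            "example": info["example"],
        }
        for name, info in summary.items()
    }


# Hypothetical rows for illustration only; the real field names must be
# checked against the downloaded files.
sample_rows = [
    {"question": "Compute the derivative of x^2.", "answer": "2x", "image": None},
    {"question": "State Ohm's law.", "answer": "V = IR"},
]

row_count, fields = summarize_fields(sample_rows)
print("rows:", row_count)
for name, info in fields.items():
    print(name, info)
```

Running the same function over the real records would give the row count plus, for each field, the observed value types, the number of missing entries, and one example value to help infer its meaning.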
Provenance
- Source: OpenDCAI
- Collection Method: Curated from a corpus of 544 college-level educational PDF documents.
- Time Range: Not specified
- Freshness: Last updated 2026-04-04 04:54:56; freshness should be verified.
- Geography: Not specified