FlipVQA-85K: A Multimodal Reasoning Benchmark from 544 College-Level STEM Documents

Name: FlipVQA-85K: A Multimodal Reasoning Benchmark from 544 College-Level STEM Documents
Creator: OpenDCAI
Published: 2026-03-30T06:37:54
Keywords: Benchmark, STEM Education, Reasoning Benchmark, Natural Language Processing, Multimodal Assessment, Visual Question Answering, Multimodal

Description

FlipVQA-85K is a high-fidelity reasoning benchmark curated from a corpus of 544 college-level educational PDF documents, including expert-authored textbooks and exercise sets. The collection spans 11 academic disciplines, primarily in STEM domains where problems involve rigorous and verifiable reasoning processes. It was created by OpenDCAI and last updated on the platform in April 2026.

Use Cases

Benchmarking multimodal reasoning models on problems drawn from college-level textbooks and exercise sets.
Training AI for visual question answering on STEM content derived from educational PDFs.
Evaluating the step-by-step reasoning of large language models on problems whose solution processes are rigorous and verifiable.

Strengths

Curated from 544 expert-authored college-level PDF documents.
Spans 11 academic disciplines, with a focus on STEM domains.
Problems are designed for rigorous and verifiable reasoning processes.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not stated in the metadata (the name suggests roughly 85K items, but this is unverified), which may limit suitability assessment.
Description metadata is limited; actual data quality requires manual inspection after download.
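
Because neither column documentation nor a row count is published, a first inspection pass after download might look like the sketch below. It assumes the data has already been loaded into a list of Python dictionaries; the field names in `sample` (`question`, `answer`, `image`, `subject`) are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import Counter, defaultdict


def profile_fields(records):
    """Summarise row count, value types, and missing values per field."""
    types = defaultdict(Counter)   # field -> Counter of Python type names
    present = Counter()            # field -> number of non-empty values
    examples = {}                  # field -> first non-empty value seen

    for rec in records:
        for key, value in rec.items():
            if value is None or value == "":
                continue  # treat None and empty strings as missing
            present[key] += 1
            types[key][type(value).__name__] += 1
            examples.setdefault(key, value)

    total = len(records)
    all_fields = set().union(*(rec.keys() for rec in records))
    return {
        "rows": total,
        "fields": {
            name: {
                "types": dict(types[name]),
                "missing": total - present[name],
                "example": examples.get(name),
            }
            for name in sorted(all_fields)
        },
    }


# Hypothetical records for illustration only; real field names must be
# read from the downloaded files.
sample = [
    {"question": "What is the net force?", "answer": "12 N",
     "image": "fig_001.png", "subject": "physics"},
    {"question": "Balance the equation.", "answer": "",
     "image": None, "subject": "chemistry"},
]

report = profile_fields(sample)
print(report["rows"])
for name, info in report["fields"].items():
    print(name, info)
```

A summary like this gives the actual row count and a starting point for guessing what each field means, but it does not replace manually reading a sample of records.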

Provenance

Source: OpenDCAI
Collection Method: Curated from a corpus of 544 college-level educational PDF documents.
Time Range: not specified
Freshness: Last updated 2026-04-04 04:54:56; freshness should be verified.
Geography: not specified

Quality Score

D38
Description: 39
Source: 39
Reputation: 41
Access: 26

Community

41 downloads
1 like
0 views

Dataset Info

Author: OpenDCAI
Created: Mar 30, 2026
Updated: Apr 4, 2026
Last synced: Apr 27, 2026
