Name: WeThink Multimodal Reasoning 120K
Creator: WeThink
Published: 2025-05-15T12:14:11
Keywords: Vision Language, Question Answering, Computer Vision, Multimodal Reasoning, VQA, Multimodal

Description

A multimodal dataset of approximately 120,000 image-text pairs for reasoning tasks, created by WeThink and last updated on May 15, 2025. According to its description, it aggregates images from multiple established sources, including COCO, Visual Genome, and TextVQA. It is hosted on Hugging Face.

Use Cases

Training vision-language models for general image understanding based on the COCO and Visual Genome subsets.
Developing specialized models for text-intensive image analysis based on the TextVQA, DocVQA, and OCR-VQA components.
Benchmarking scientific and technical reasoning capabilities based on the ScienceQA and AI2D subsets.
Fine-tuning models for chart and diagram interpretation based on the ChartQA and AI2D components.
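The use cases above each draw on a different group of source subsets. A minimal sketch of that selection step follows, assuming each record carries a field naming its source subset. The field name `source`, the use-case keys, and the sample rows are hypothetical, since the column layout is not documented; only the subset names come from the list above.

```python
# Map each use case listed above to the source subsets it relies on.
# The dictionary keys are illustrative labels, not names from the dataset.
USE_CASE_SUBSETS = {
    "general_image_understanding": {"COCO", "Visual Genome"},
    "text_intensive_images": {"TextVQA", "DocVQA", "OCR-VQA"},
    "scientific_reasoning": {"ScienceQA", "AI2D"},
    "chart_and_diagram": {"ChartQA", "AI2D"},
}


def select_for_use_case(records, use_case, source_field="source"):
    """Return the records whose source subset belongs to the given use case.

    `source_field` is an assumed column name; adjust it after inspecting
    the downloaded files.
    """
    wanted = USE_CASE_SUBSETS[use_case]
    return [r for r in records if r.get(source_field) in wanted]


# Hypothetical rows used only to exercise the function.
rows = [
    {"source": "COCO", "question": "What is in the picture?"},
    {"source": "ChartQA", "question": "Which bar is tallest?"},
    {"source": "AI2D", "question": "What does the arrow point to?"},
    {"source": "DocVQA", "question": "What is the invoice total?"},
]

chart_rows = select_for_use_case(rows, "chart_and_diagram")
science_rows = select_for_use_case(rows, "scientific_reasoning")
print(len(chart_rows), "chart/diagram rows;", len(science_rows), "science rows")
```

Note that AI2D appears under two use cases, so the selections can overlap.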

Strengths

Aggregates data from over 15 established public datasets, including COCO (25,344 images) and ChartQA (21,781 images).
Covers diverse image types such as general scenes, text-intensive documents, and scientific diagrams.
The dataset page was updated on 2025-05-15, suggesting recent maintenance.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The exact row count is not reported; only the approximate figure of 120,000 is given, which may limit suitability assessment.
The primary image data is hosted in a separate repository, requiring users to manage multiple sources.
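Because column-level documentation is absent, a first step after download is usually to summarize which fields appear and what types they hold. The sketch below shows one way to do that over a sample of records already loaded as Python dictionaries. The field names in the sample rows are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import defaultdict


def summarize_fields(records, max_examples=2):
    """Collect, per field, how often it appears, the value types seen,
    and a few example values."""
    summary = defaultdict(lambda: {"count": 0, "types": set(), "examples": []})
    for record in records:
        for key, value in record.items():
            entry = summary[key]
            entry["count"] += 1
            entry["types"].add(type(value).__name__)
            if len(entry["examples"]) < max_examples:
                entry["examples"].append(value)
    return dict(summary)


# Hypothetical rows; the real field names must be read from the downloaded files.
sample_rows = [
    {"image": "coco/0001.jpg", "question": "What is shown?", "answer": "A dog."},
    {"image": "chartqa/0042.png", "question": "Which bar is tallest?",
     "answer": "2019", "source": "ChartQA"},
]

field_summary = summarize_fields(sample_rows)
for name, info in sorted(field_summary.items()):
    print(name, sorted(info["types"]),
          f"present in {info['count']} of {len(sample_rows)} rows")
```

Fields that appear in only some rows, or that hold more than one type, are the ones most worth checking by hand before training.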

Provenance

Source: Aggregated from multiple public datasets including COCO, Visual Genome, TextVQA, ScienceQA, and others.
Collection Method: Likely a curated collection and combination of existing datasets.
Time Range: Not specified
Freshness: Last updated 2025-05-15 12:42:22.
Geography: Not specified

The image data is stored in a separate Hugging Face repository (Xkev/LLaVA-CoT-100k), requiring users to access two datasets.
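Since annotations and images come from two repositories, it can help to verify that every annotation resolves to an image file before training. The sketch below assumes both repositories have already been downloaded locally and that each record stores a relative image path. The field name `image` and the directory layout are assumptions for illustration, not documented properties of either repository.

```python
import tempfile
from pathlib import Path


def split_by_image_availability(records, image_root, image_field="image"):
    """Split records into those whose image file exists under image_root
    and those whose image is missing or unspecified."""
    image_root = Path(image_root)
    found, missing = [], []
    for record in records:
        relative_path = record.get(image_field)
        if relative_path and (image_root / relative_path).is_file():
            found.append(record)
        else:
            missing.append(record)
    return found, missing


# Demonstration against a temporary directory standing in for the
# locally downloaded image repository (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "coco").mkdir()
    (root / "coco" / "0001.jpg").write_bytes(b"")
    rows = [{"image": "coco/0001.jpg"}, {"image": "chartqa/0042.png"}]
    found, missing = split_by_image_availability(rows, root)
    print(len(found), "resolved;", len(missing), "missing")
```

Running a check like this once after download surfaces path mismatches between the two sources early, rather than as loader errors mid-training.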
