A multimodal dataset containing approximately 120,000 image-text pairs for reasoning tasks, created by WeThink and last updated on May 15, 2025. The description indicates it aggregates images from multiple established sources including COCO, Visual Genome, and TextVQA. It is hosted on the Hugging Face platform.
The image data is stored in a separate Hugging Face repository (Xkev/LLaVA-CoT-100k), so users need access to two repositories: this dataset for the text and that repository for the images.
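Because the images live in a second repository, a loader has to fetch that repository and then match each text record to a file on disk. The following is a minimal sketch of that step, not the dataset's documented loading procedure. The repository ID Xkev/LLaVA-CoT-100k comes from the description above; the record field name `image`, the local directory name, and both helper functions are assumptions made for illustration, and the image repository may ship archives that need to be extracted before paths resolve.

```python
from pathlib import Path

# Repository named in the description above.
IMAGE_REPO = "Xkev/LLaVA-CoT-100k"


def fetch_image_repo(local_dir="llava_cot_images"):
    """Download the image repository and return its local path.

    Hypothetical helper. Requires the `huggingface_hub` package and
    network access; the import is kept local so the rest of this
    sketch works without it.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=IMAGE_REPO,
        repo_type="dataset",
        local_dir=local_dir,
    )


def resolve_image_paths(records, image_root, key="image"):
    """Split records into those whose image file exists and those missing.

    `key` is an assumed field name holding a path relative to
    `image_root`; adjust it to the actual annotation schema.
    """
    root = Path(image_root)
    found, missing = [], []
    for rec in records:
        path = root / rec[key]
        target = found if path.is_file() else missing
        target.append({**rec, "image_path": str(path)})
    return found, missing
```

A typical use would be to call `fetch_image_repo()` once, then pass the returned directory and the loaded text records to `resolve_image_paths`. Reporting the `missing` list separately makes it easy to spot records whose images were not downloaded or extracted.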