Available on 1 platform
This dataset was introduced in the paper 'LLaVA-CoT: Let Vision Language Models Reason Step-by-Step' and is designed to enable Vision-Language Models to perform autonomous multistage reasoning. It combines 100,000 samples drawn from various visual question-answering sources with structured reasoning annotations. The dataset was authored by Xkev and was last updated on Hugging Face in December 2025.
The license is unknown; verify the terms of use before using the dataset.
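To make the idea of structured reasoning annotations concrete, the following is a minimal Python sketch of how a staged answer might be split into its parts. It assumes the four stage tags described in the LLaVA-CoT paper (SUMMARY, CAPTION, REASONING, CONCLUSION); this page does not confirm the exact annotation format, and the helper name `split_stages` and the sample text are hypothetical.

```python
import re

# Assumed stage names, following the LLaVA-CoT paper; not confirmed by this page.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def split_stages(annotation: str) -> dict:
    """Map each stage name to the text found between its opening and closing tags.

    Stages that are missing from the annotation are simply left out of the result.
    """
    stages = {}
    for name in STAGES:
        match = re.search(rf"<{name}>(.*?)</{name}>", annotation, flags=re.DOTALL)
        if match:
            stages[name] = match.group(1).strip()
    return stages


# Hypothetical annotation, written only for illustration.
example = (
    "<SUMMARY>Count the apples in the image.</SUMMARY>\n"
    "<CAPTION>A bowl holding three red apples.</CAPTION>\n"
    "<REASONING>Each apple is visible; counting gives three.</REASONING>\n"
    "<CONCLUSION>3</CONCLUSION>"
)

if __name__ == "__main__":
    for stage, text in split_stages(example).items():
        print(f"{stage}: {text}")
```

Keeping the stages in a plain dictionary makes it easy to check which stages are present in a given sample before relying on them.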