A multimodal dataset from the LLaVA-CoT project, likely containing image-question-answer pairs structured for visual reasoning tasks. The dataset includes a train.jsonl file with conversation data linking images to questions and answers, suggesting a format for training or evaluating vision-language models. It was authored by 'berhaan' and last updated on 2026-01-17.
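Since the exact schema of train.jsonl is not documented here, a cautious first step is to read the file generically and inspect the keys before writing any training code. The following is a minimal sketch in Python; the helper name `iter_jsonl` is hypothetical, and the code assumes only that the file holds one JSON value per line, which is what the .jsonl extension suggests.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSONL file.

    Hypothetical helper: it makes no assumption about field names,
    because the record schema is not documented in the listing.
    """
    with Path(path).open("r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a damaged file is easy to fix.
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
```

Calling `next(iter_jsonl("train.jsonl"))` and printing the result would show the actual field names of the first record, which is safer than guessing them.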
The image data is distributed as a split zip archive. The parts must be concatenated ('cat image.zip.part-* > image.zip') before the archive can be extracted, which may complicate initial setup.
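On systems without `cat`, the same concatenation can be done in Python. This is a minimal sketch, not the project's own tooling: the function name `join_parts` and its parameters are hypothetical, and it assumes the part suffixes sort correctly as plain strings (for example `aa`, `ab` or `00`, `01`), which matches the shell glob order in typical cases.

```python
import shutil
from pathlib import Path


def join_parts(directory: str,
               prefix: str = "image.zip.part-",
               output: str = "image.zip") -> Path:
    """Concatenate split archive parts into a single file.

    Hypothetical helper mirroring `cat image.zip.part-* > image.zip`.
    Parts are joined in lexicographic order of their file names.
    """
    folder = Path(directory)
    parts = sorted(
        (p for p in folder.iterdir() if p.name.startswith(prefix)),
        key=lambda p: p.name,
    )
    if not parts:
        raise FileNotFoundError(f"no files starting with {prefix!r} in {folder}")

    out_path = folder / output
    with out_path.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Stream in chunks so large parts are never fully in memory.
                shutil.copyfileobj(src, out)
    return out_path
```

After joining, the standard library's `zipfile.is_zipfile` can confirm that the result is a readable archive before extraction with `zipfile.ZipFile(...).extractall(...)`.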