Name: ULVR v2 Clean: Universal Latent Visual Reasoning Training Data
Creator: RuoliuYang
Published: 2026-05-29T08:39:22
Keywords: Question Answering, Multimodal Training, Computer Vision, Vqa, Multimodal, Visual Reasoning

Description

Over 1.2 million samples across eight categories comprise ULVR_v2_clean, a cleaned dataset for visual reasoning. Each sample includes an input image and a question, with an assistant's response containing a visual token, intermediate steps, and a boxed answer. The dataset was created by RuoliuYang and was last updated on HuggingFace in June 2026.

Use Cases

Training visual question answering models based on image-question-answer triplets.
Developing models that generate intermediate visual reasoning steps.
Benchmarking multimodal AI performance on tasks like object detection, segmentation, and scene graph generation implied by the subset names.
Fine-tuning large language models to incorporate visual reasoning tokens and structured outputs.

Strengths

Dataset is organized into eight distinct subsets, including 'text_cot', 'bbox_highlight', and 'segmentation'.
Provides separate train and validation splits for each subset, with the largest training split ('helper_interleaved') containing over 340,000 samples.
The description specifies a structured output format for each sample, including intermediate visual steps.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count for the full dataset is not aggregated, which may limit suitability assessment.
The 'scene_graph' subset row count is truncated in the description, obscuring its full size.

Provenance

Source: HuggingFace dataset repository by RuoliuYang.
Collection Method: Method of gathering is not specified in the provided input.
Freshness: Last updated 2026-06-02 11:54:48; freshness should be verified.

License is unknown; users must verify terms of use before downloading.

Multimodal Question Answering Multimodal Training Computer Vision Vqa Visual Reasoning

ULVR v2 Clean: Universal Latent Visual Reasoning Training Data

Description

Use Cases

Strengths

Limitations

Provenance

Related Topics

Related Datasets

Quality Score

Community

Dataset Info

Community

Dataset Info