Name: CapRL QA 75K: 75,285 Image-Question Pairs for Captioning Model Training
Creator: internlm
Published: 2026-04-16T09:16:05
Keywords: Multimodal Training, Computer Vision, Image Captioning, Visual Question Answering, Multimodal

Description

This dataset contains 75,285 samples, each pairing an image with multiple-choice question-answer items, and serves as training data for the CapRL-3B image captioning model. It was created by internlm and last updated on April 16, 2026. It is designed for a two-stage training objective in which caption quality is evaluated through the answerability of visual questions.
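The idea of scoring a caption by question answerability can be illustrated with a small sketch. This is not the CapRL implementation. The record fields (`question`, `options`, `answer`) are hypothetical, since the dataset's columns are undocumented, and `keyword_answerer` is a toy stand-in for whatever model would answer the questions.

```python
from typing import Callable, Sequence

# Hypothetical answerer signature: (caption, question, options) -> chosen option.
AnswerFn = Callable[[str, str, Sequence[str]], str]


def answerability_score(caption: str, questions: Sequence[dict], answer_fn: AnswerFn) -> float:
    """Return the fraction of questions answered correctly using only the caption."""
    if not questions:
        return 0.0
    correct = 0
    for q in questions:
        predicted = answer_fn(caption, q["question"], q["options"])
        if predicted == q["answer"]:
            correct += 1
    return correct / len(questions)


def keyword_answerer(caption: str, question: str, options: Sequence[str]) -> str:
    """Toy answerer: pick the first option whose text appears in the caption."""
    lowered = caption.lower()
    for option in options:
        if option.lower() in lowered:
            return option
    return options[0]  # fall back to the first option when nothing matches


# Made-up example items, using the assumed field names.
questions = [
    {"question": "What animal is shown?", "options": ["cat", "dog"], "answer": "dog"},
    {"question": "What color is the ball?", "options": ["red", "blue"], "answer": "blue"},
]
score = answerability_score("A dog chasing a blue ball.", questions, keyword_answerer)
```

Under these assumptions, a caption that mentions the relevant details scores higher than a vague one, which is the property such an objective would reward.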

Use Cases

Training image captioning models with the two-stage CapRL objective.
Evaluating caption quality through the answerability of visual questions.
Fine-tuning lightweight vision-language models initialized from architectures like Qwen2.5-VL-3B.
Research on the relationship between image captions and visual question answering performance.

Strengths

Contains 75,285 carefully filtered samples.
Designed for a specific, documented two-stage training objective for captioning.
The QA construction pipeline is stated to be fully open-sourced.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The row count is known, but other metadata, such as file format, size, and license, is unknown.
Data may reflect bias inherent to the source collection and filtering pipeline.
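Because the columns are undocumented, a first step after download might be to summarize which fields appear and what types they hold. The sketch below assumes the records have already been loaded as plain Python dictionaries. The rows shown are invented for illustration and do not reflect the dataset's actual schema.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> dict:
    """Map each field name to the sorted list of Python type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# Invented rows; the real field names and types must be checked after download.
sample_rows = [
    {"image": b"\x89PNG", "question": "What is shown?", "answer": "A"},
    {"image": b"\x89PNG", "question": "How many?", "answer": None},
]
summary = summarize_fields(sample_rows)
```

A field that reports more than one type (here `answer`) is a hint that some rows carry missing or inconsistent values and deserve a closer look.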

Provenance

Source: internlm via Hugging Face
Collection Method: Carefully filtered from an unspecified source collection; the QA construction pipeline is open-sourced.
Time Range: Not specified
Freshness: Last updated 2026-04-16 13:43:37; freshness should be verified.
Geography: Not specified

License is unknown, which may restrict commercial use or redistribution.
