Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data
1,540 datasets
SuperviseLab provides professional video annotation data for training multimodal AI models. This public sample dataset demonstrates annotation methodology and output quality across diverse video content categories. All visual assets have been abstracted to protect source privacy, and identifiable metadata has been removed.
Spectra is a multimodal question-answering training dataset designed for vision-language models. It combines three sources: graduate-level science questions from TQA and ScienceQA, open-world knowledge questions from OKVQA, and science questions spanning physics, chemistry, math, and biology from AI2D. The dataset was created by Tamalmajumder and was last updated on April 18, 2026.