Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data
1,534 datasets
1,889 images of Vincent van Gogh's works, curated from the larger OpenBrush-75K collection. All images are paired with structured captions generated by the Qwen3-VL-30B-A3B vision-language model. The dataset was created by jaddai and last updated on May 27, 2026.
221 trials from 11 novice participants manipulating objects with 20 visually encoded fragility levels (50–1000 gf). The dataset includes synchronized multimodal observations: robot joint trajectories, gripper force signals, multi-view RGB video, users' perceived fragility, confidence ratings, and trial outcomes. It was released by Jin Ong on figshare in April 2026.
A curated subset of 1,334 Claude Monet artworks from the OpenBrush-75K collection. The images are paired with structured captions generated by the Qwen3-VL-30B-A3B vision-language model. This dataset was created by jaddai and last updated on May 27, 2026.
Records from 188 septic patients, 89 of whom have sepsis-associated acute kidney injury, combining clinical variables with transcriptomic-guided blood biomarkers. Weiqin Wu developed this dataset to build a multimodal predictive model, and the data was last updated in April 2026. Predictors include SOFA score, MAP, BUN, CRP, and the immune biomarkers CD177 and IL18R1.
Data Sheet 3 presents multimodal data from a study of 110 participants with bulimia nervosa, binge eating disorder, and matched controls. The dataset integrates task-based fMRI, intrinsic connectivity, voxel-based morphometry, neuropsychological assessments, and peripheral blood biomarkers. It was authored by Lena Rommerskirchen and last updated on figshare in April 2026.
A 2026 study by Lena Rommerskirchen applied machine learning to multimodal data from 110 participants with bulimia nervosa, binge eating disorder, and matched controls. The dataset integrates task-based fMRI, intrinsic connectivity, voxel-based morphometry, neuropsychological assessments, and peripheral blood biomarkers. It was used to classify diagnostic groups and predict individual symptom variation.
OpenBrush Rembrandt is a curated subset of 776 images of Rembrandt's works from the larger OpenBrush-75K collection. The dataset includes paintings, etchings, and sketches, all with AI-generated captions. It was created by jaddai and last updated on Hugging Face in May 2026.
A curated subset of 1,400 works by Pierre-Auguste Renoir from the OpenBrush-75K collection. The dataset includes structured visual language model captions generated by Qwen3-VL-30B-A3B, focusing on the artist's figure-and-portrait style. It was created by jaddai and last updated on Hugging Face in May 2026.
NVIDIA created a large-scale synthetic video dataset containing 236,937 clips totaling approximately 5,841 hours. The dataset features digital humans rendered in diverse indoor and outdoor 3D environments, with each sample being a temporally coherent 60-120 second video clip at 1080p and 30 fps. It was last updated on May 29, 2026.
A 16.7 KB document details a single case of a 14-year-old girl with late-stage Bockenheimer disease, a rare venous malformation. The case report, authored by Zilu Wang and last updated in April 2026, describes multimodal therapy including sclerotherapy, anticoagulation, and molecular targeted medication over a 9-month follow-up period. The text covers the patient's presentation with severe anemia and coagulopathy, the treatment protocol, and outcomes, including limb volume reduction and elbow contracture as a complication.
Xinya Liang created this dataset for a manuscript submitted on June 2, 2026. The data supports research on reconstructing bedforms using RGB-D sensing and foundation models. It is a small dataset, 27.7 KB in size, and is shared under a CC-BY-4.0 license.
BALLADEER integrates EEG, eye tracking, and physiological signals from children and adolescents with ADHD and neurotypical controls. Its controlled protocol uses gamified cognitive tasks such as Attention Slackline and CogniFit to elicit responses related to attentional control and cognitive flexibility. This dataset supports the development of machine learning models for ADHD classification and research on digital biomarkers.
Indian multilingual document images and OCR transcriptions curated by MILA: MULTILINGUAL INDIC LANGUAGE ARCHIVE. This representative subset contains samples spanning 19 Indian languages and scripts, focusing on real-world documents with complex layouts and noisy scans. The full dataset, covering all 22 official languages, is scheduled for release upon paper acceptance.
Behavioral data on a large flock of flamingos collected by animal care staff after a change in their enclosure. The dataset includes a blank template for others to use. It is a 17.9 KB XLSX file authored by Paul Rose and last updated on 2026-05-19.
Jiazhe Ma's dataset contains raw experimental data supporting a published article on cholesteric liquid crystal elastomer hollow fibers. The 72.0 MB dataset is organized by figure number from the manuscript, with each dataset presented as an Excel file or image. It was last updated on 2026-05-20 and is available under a CC-BY-4.0 license.
A multimodal dataset derived from the LLaVA-Instruct-150K source, containing synthetic annotations for tasks involving text, images, and speech. It is licensed under CC-BY-4.0 and was uploaded by dreyn74. The dataset contains between 10,000 and 100,000 samples.
CapRL-Video-178K is a dataset providing file paths to over 97,000 video clips. The dataset is hosted by internlm on Hugging Face and was last updated on 2026-05-25. It serves as an index for video data from the LLaVA-Video-178K collection, which includes clips from sources like YouTube and ActivityNet.
GUIDE (GUI User Intent Detection Evaluation) is a benchmark for evaluating multimodal models on perceiving user behavior and inferring intent in open-ended GUI tasks. It consists of 67.5 hours of screen recordings from 120 novice user demonstrations with think-aloud narrations, across 10 software applications. The dataset was created by Saelyne Yang, Jaesang Yu, Yi-Hao Peng, Kevin, and others, and was last updated on Hugging Face in June 2026.
MMTutorBench is the first multimodal benchmark for AI math tutoring, containing 770 carefully curated problems paired with 1,414 images. The dataset provides structured reference answers and per-instance rubrics for evaluating large language models along three pedagogical axes: Insight, Operation Formulation, and Operation Execution. It was created by Tangchiu and last updated on May 22, 2026.
VLM Eval Videos is a benchmark dataset containing 693 short MP4 video clips for evaluating Vision-Language Models. The dataset, created by author gnitoahc, is organized into five categories, with each clip paired with a fixed question and a ground-truth short-sentence answer. It was last updated on the Hugging Face platform in June 2026.