DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,929 datasets

OpenBrush Rembrandt: 776 Artworks with AI-Generated Captions

OpenBrush Rembrandt is a curated subset of 776 images of Rembrandt's works from the larger OpenBrush-75K collection. The dataset includes paintings, etchings, and sketches, all with AI-generated captions. It was created by jaddai and last updated on Hugging Face in May 2026.

Tags: Image, Multimodal, Image Captions, Art History, Baroque Art, Rembrandt

Hard Intersection Multimodal Sample: Accident-Prone Urban Intersections in Japan

Hard Intersection Multimodal Samples is a curated multimodal dataset of accident-prone urban intersections in Japan for autonomous driving research. It provides multi-camera images, trajectory data, HD maps, semantic annotations, point cloud data, and 3DGS assets. The dataset was created by dynamic-maps and was last updated on June 10, 2026.

Tags: Point Cloud, Multimodal, 🇯🇵 Japan, Road Safety, Autonomous Driving, Urban Traffic

OpenBrush Renoir: 1,400 Impressionist Paintings with Structured VLM Captions

OpenBrush Renoir is a curated subset of 1,400 works by Pierre-Auguste Renoir from the OpenBrush-75K collection. The dataset includes structured vision-language model (VLM) captions generated by Qwen3-VL-30B-A3B, focusing on the artist's figure-and-portrait style. It was created by jaddai and last updated on Hugging Face in May 2026.

Tags: Multimodal, Impressionism, Art History, Computer Vision, Image Captioning

MMU Apogee DR17: HATS Catalog Collection from the Multimodal Universe

This dataset is the mmu_apogee_dr17 HATS catalog from the Multimodal Universe, a large-scale collection of 100 TB of astronomical scientific data described in the Multimodal Universe paper. It was authored by hugging-science and last updated on 2026-05-29.

Tags: Multimodal, HATS Catalog, Astronomy, Multimodal Universe, Large Scale, Apogee DR17

RTI Stratified Data: Stably Stratified Rayleigh-Taylor Instability Simulation

This dataset contains stably stratified Rayleigh-Taylor instability (RTI) evaluation data used in the research paper 'Emergent Transfer of a Physics Foundation Model from Simulation to Laboratory Turbulence' (arXiv:2606.01470). It is hosted by the author 'pmukhop' on Hugging Face, was last updated on 2026-06-02, and is stored in an HDF5 file named 'rti_stratified.h5'.

Tags: Multimodal, Physics Simulation, Benchmark, Turbulence, Fluid Dynamics, Rayleigh Taylor Instability

Kine2Go: Kinematic Motions for the Unitree Go2 Quadruped Robot

MIMUW-Robotics created a kinematic motion dataset for the Unitree Go2 quadruped robot. Forty reference clips from dog, horse, and synthetic robot motions were retargeted to the robot's morphology. The dataset, last updated in June 2026, includes per-clip imitation-learning policies and rendered video rollouts.

Tags: Multimodal, Quadruped Robot, Imitation Learning, Robotics, Motion Capture, Kinematic Data, Synthetic

SDG-SynHuman: Large-Scale Synthetic Video of Digital Humans

NVIDIA created a large-scale synthetic video dataset containing 236,937 clips totaling approximately 5,841 hours. The dataset features digital humans rendered in diverse indoor and outdoor 3D environments, with each sample being a temporally coherent 60-120 second video clip at 1080p and 30 fps. It was last updated on May 29, 2026.

Tags: Video, Digital Humans, Computer Vision, AI Training, Large Scale, Synthetic Video, Synthetic

Ayn-VQA-ArabicNLP26: Culturally Grounded Arabic Vision-Language Evaluation Dataset

Ayn-VQA-ArabicNLP26 is a multimodal evaluation dataset designed to test AI models on culturally specific Arabic image understanding. It is part of the ImageEval 2026 Shared Task at ArabicNLP 2026 and was created by QCRI. The dataset presents tasks in two language tracks: English and Modern Standard Arabic.

Tags: Multimodal, Hallucination Detection, Vision Language, Benchmark, Cultural Grounding, Computer Vision, Arabic NLP, Multimodal Evaluation

BiComp: Large-Scale Text-to-Image Preference Dataset with Region-Level Annotations

BiComp is a large-scale, high-quality text-to-image preference dataset containing 57,474 original and 94,502 edited images. The dataset is annotated with region-level information and filtered through a VQA-based quality control step. It was introduced by anzeameol in a 2026 paper on compositional text-to-image generation.

Tags: Multimodal, Preference Learning, Multimodal AI, Text To Image, Computer Vision, Image Editing, Large Scale

SuperMemory-VQA: Egocentric Visual Question Answering for AR Assistants

SuperMemory-VQA is a benchmark dataset containing 4,853 human-verified question-answer pairs for evaluating long-horizon memory in augmented reality assistants. The dataset is designed around practical questions a person might ask a wearable memory assistant, such as locating objects or recalling events. It was created by OSU-AIoT-MLSys-Lab and was last updated on June 5, 2026.

Tags: Multimodal, Long Term Memory, Benchmark, Egocentric Vision, Augmented Reality, Visual Question Answering

CAPRI: Cultural and Pragmatic Response Inference Dataset

CAPRI is a multimodal dataset for studying whether large language models act as pragmatic speakers by tailoring answers to a user's perceived cultural background. It was created by yisongmiao and last updated on June 17, 2026. Each item is a short conversation with varying cultural cues, followed by a visual question about an image.

Tags: Multimodal, Multimodal LLM, Pragmatic Inference, Computer Vision, Human AI Interaction, Cultural AI

KITScenes Multimodal: European Urban Driving Data with 360° Sensor Coverage

KITScenes Multimodal is a high-fidelity European urban autonomous-driving dataset built for the FiftyOne platform. Each frame contains synchronized data from a full robotaxi sensor suite, including nine global-shutter cameras, seven long-range lidars, and three 4D imaging radars. The dataset is packaged by Voxel51 and was last updated on June 8, 2026.

Tags: Point Cloud, Multimodal, Urban Scenes, HD Maps, Computer Vision, Autonomous Driving, Multimodal Sensor

Case Report: Multimodal Management of Late-Stage Bockenheimer Disease

A 16.7 KB document details a single case of a 14-year-old girl with late-stage Bockenheimer disease, a rare venous malformation. The case report, authored by Zilu Wang and last updated in April 2026, describes multimodal therapy including sclerotherapy, anticoagulation, and molecular targeted medication over a 9-month follow-up period. The text discusses the patient's presentation with severe anemia and coagulopathy, the treatment protocol, and the outcomes, which include limb volume reduction and the complication of elbow contracture.

Tags: Text, Case Report, Healthcare, Venous Malformation, Bockenheimer Disease, Medical Management, Hematologic Complications

RGB-D and Foundation Model Data for Bedform Reconstruction with Artificial Vegetation

Xinya Liang created this dataset for a manuscript submitted on June 2, 2026. The data supports research on reconstructing bedforms using RGB-D sensing and foundation models. It is a small dataset, 27.7 KB in size, and is shared under a CC-BY-4.0 license.

Tags: Multimodal, Excel, Artificial Vegetation, Foundation Models, Bedform Reconstruction, Synthetic

MOJITOO: Benchmarking Data for Multimodal Single-Cell Integration

This dataset provides MOJITOO benchmarking data for multimodal single-cell analysis and likely contains Seurat R objects. It is associated with a method developed by Mingbo Cheng from RWTH Aachen University and is published on the Papers with Code platform. The provided metadata does not detail the temporal coverage or data volume.

Tags: Multimodal, Benchmarking, Single Cell Omics, Bioinformatics, Multimodal Integration

Design FTO Bench: Cross-Modal Image Search for Patent Infringement Risk

PatSnap Design FTO Bench provides a benchmark for evaluating systems that retrieve design patents via product images. Each sample includes a query product image and a ground truth set of infringing design patents confirmed by legal proceedings. The dataset is maintained by PatSnap and was last updated in June 2026.

Tags: Multimodal, Cross Modal Retrieval, Computer Vision, Image Search, Design Patents, Patent Search, Intellectual Property

Meerkat-Safe: Implicit Cross-Modal Risk Dataset for MLLMs

Meerkat-Safe is presented as the first training dataset for implicit cross-modal risks in multimodal LLMs, introduced in the ICML 2026 paper 'Meerkat-VL: Implicit Risk Safety Alignment in MLLMs via Perceptual Reasoning and Self-Verification'. It targets implicit risks by pairing benign images with potentially harmful text, in contrast to existing datasets that focus on explicit risks. The dataset was uploaded by Tunanzzz on June 16, 2026.

Tags: Multimodal, Multimodal LLM, Implicit Risk, Safety Alignment, Perceptual Reasoning, Computer Vision

ImageEval2026 Task1: Culturally Grounded Arabic Visual Question Answering

Ayn-VQA is a multimodal evaluation dataset designed to test whether AI models can interpret culturally specific images based on Arabic questions. It is part of the ImageEval 2026 Shared Task at ArabicNLP 2026 and was created by QCRI. The dataset was last updated on June 8, 2026.

Tags: Multimodal, Hallucination Detection, Arabic Language, Benchmark, Cultural Grounding, Computer Vision, Visual Question Answering, Multimodal Evaluation

UniSER Haze Dataset: 2 Million Synthetic Haze Renderings

Released in 2026 with the CVPR paper 'UniSER: A Foundation Model for Unified Soft Effects Removal', this dataset contains approximately 80,000 unique clean images paired with around 2 million synthetic renderings of haze, fog, and smoke. It covers homogeneous, non-homogeneous, indoor, outdoor, daytime, and dense atmospheric conditions for training and benchmarking single-image dehazing models.

Tags: Image, Image Synthesis, Computer Vision, Large Scale, Dehazing, Synthetic, Atmospheric Effects

BALLADEER: Multimodal Neurophysiological Data for ADHD Research

BALLADEER integrates EEG, eye tracking, and physiological signals from children and adolescents with ADHD and neurotypical controls. Its controlled protocol uses gamified cognitive tasks such as Attention Slackline and CogniFit to elicit responses in attentional control and cognitive flexibility. The dataset supports the development of machine learning models for ADHD classification and research on digital biomarkers.

Tags: Time Series, Multimodal, ZIP, ADHD, Physiological Signals, Gamification, Healthcare, Eye Tracking, EEG, Neurophysiology
