DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,929 datasets

RoboShackles: 1,200 Safety-Critical Robotic Video Clips for Testing Embodied AI

RoboShackles is a safety benchmark for evaluating Embodied Foundation Models. The public test split contains 1,200 safety-critical robotic video clips, with 200 videos per category. The dataset was created by YZW00 and last updated on Hugging Face in June 2026.

Tabular, Video, Multimodal, Machine Learning, Benchmark, Robotics, Safety Benchmark, Video Clips, Simulation, Embodied AI, +1 more

OpenBrush Landscapes: 12,612 Landscape Paintings Across Artistic Movements

OpenBrush Landscapes is a curated subset of the OpenBrush-75K dataset containing every landscape painting from the parent collection. It includes 12,612 images across all artists, movements, and centuries, curated so users do not need to download the full 75,313-image dataset. The subset was created by jaddai and was last updated on May 27, 2026.

Multimodal, Landscape Paintings, Art History, Computer Vision, Open Data, +1 more

OpenBrush Religious Art: 6,119 Paintings from Medieval to Baroque Eras

6,119 religious paintings curated from the OpenBrush-75K collection. The dataset focuses on saints, biblical scenes, and devotional works, with a heavy emphasis on the Renaissance and Baroque eras. It was created by jaddai and last updated on May 27, 2026.

Multimodal, Painting, Renaissance, Art History, Religious Art, +1 more

Thermal Recordings of High-Impedance Faults in Medium-Voltage Covered Conductors

4.3 GB of thermal recordings from controlled laboratory experiments on high-impedance faults in medium-voltage covered conductors. Diogo Biasuz Dahlke created the dataset, which includes thermal video files (MP4) and native radiometric files (HRV) capturing temperature evolution during fault initiation. The dataset was last updated on 2026-05-05.

Time Series, Video, Multimodal, ZIP, High Impedance Faults, Thermal Imaging, Electrical Testing, Medium Voltage, +1 more

OpenBrush Portraits: 13,059 Portrait Paintings Across Art Movements

13,059 portrait paintings curated from the OpenBrush-75K dataset, spanning artistic movements from Renaissance to Realist. The subset was created by jaddai using the Qwen3-VL-30B-A3B vision-language model and last updated on May 27, 2026. It provides a focused collection of portraits under a CC0 license.

Image, Multimodal, OpenBrush, Art History, Portrait Painting, +1 more

IndustryBench-MIPU: Multi-Image Attribute Extraction Benchmark for Industrial Products

IndustryBench-MIPU is a benchmark dataset for evaluating multimodal large language models on extracting product specifications from multiple heterogeneous images. The dataset, created by alibaba-multimodal-industrial-ai, tests model capabilities in text recognition, visual reasoning, domain knowledge, and cross-image evidence integration. It was last updated on June 15, 2026.

Multimodal, Machine Learning, Product Specifications, AI Evaluation, Multimodal LLM, Industrial Benchmark, Computer Vision, Attribute Extraction, +1 more

EAC-Agent: Multimodal Emotion Recognition and Response Generation Results

A research dataset containing performance metrics for a multimodal conversational agent named EAC-Agent. The dataset likely contains results from validation on benchmark datasets IEMOCAP and MELD. It was uploaded by Shahid Jamil to figshare on 2026-04-17.

Audio, Multimodal, Excel, Benchmark, Emotion Recognition, Multimodal Conversation, Benchmark Datasets, +1 more

Dd3: Ti10Mo6Cu LPBF Visual Question Answering Dataset for Quality Assessment

A multimodal dataset for visual question answering in additive manufacturing, focusing on quality assessment of Ti10Mo6Cu alloy parts produced via Laser Powder Bed Fusion. The dataset was created by AI4Manufacturing and was last updated on July 16, 2026. Each row contains fields for a query, an image, an annotation, reasoning, category, task, and metadata.

Tabular, Multimodal, Quality Assessment, Computer Vision, Additive manufacturing, Visual Question Answering, Materials Science, +1 more

OpenBrush Baroque: 4,240 Baroque Art Images with Captions

4,240 Baroque-era artworks curated from the larger OpenBrush-75K collection. The subset focuses on the canonical Baroque visual language from approximately 1600 to 1750, characterized by chiaroscuro and dramatic lighting. It was created by jaddai and last updated on May 27, 2026.

Multimodal, Image Captions, Art History, Fine Art, Computer Vision, Baroque Art, +1 more

UniCure: Multi-modal Datasets and Weights for Personalized Cancer Therapy Prediction

UniCure is a multi-modal framework integrating omics and chemical foundation models to predict transcriptomic drug responses. This repository contains the pre-processed datasets, configuration files, and pre-trained model weights required to reproduce the results. The archive is 12.4 GB and was last updated on 2026-04-23 by Zexi Chen.

Multimodal, Transcriptomics, Cancer Therapy, Benchmark, Multi Modal AI, Omics Data, Drug Response, +1 more

Verify-or-Trust: Benchmark Data for LLM Orchestration of a Biology Foundation Model

A benchmark dataset for evaluating whether a Large Language Model correctly allocates verification when orchestrating a fallible biology foundation model. The dataset, created by jang1563, includes a substrate table from the GEARS/Norman experiment. It was last updated on June 17, 2026.

Tabular, LLM Benchmark, Biology Foundation Model, Benchmark, Perturbation Effect, Verification Trust, +1 more

CRYSTAL: Diagnostic Benchmark for Multimodal Step-by-Step Reasoning

CRYSTAL is a diagnostic benchmark that evaluates multimodal reasoning step by step rather than by the final answer alone. Each instance pairs an image and a question with an ordered sequence of natural-language reference reasoning steps, enabling step-level metrics such as Match F1 and Ordered Match F1 alongside answer accuracy. The dataset was created by waybarrios and last updated in June 2026.

Multimodal, Vision Language, Benchmark, Computer Vision, Multimodal Reasoning, Step By Step Evaluation, Diagnostic Benchmark, +1 more

OpenBrush Van Gogh: 1,889 Artworks with Structured VLM Captions

1,889 images of Vincent van Gogh's works, curated from the larger OpenBrush-75K collection. All images are paired with structured captions generated by the Qwen3-VL-30B-A3B vision-language model. The dataset was created by jaddai and last updated on May 27, 2026.

Image, Multimodal, Art History, Post Impressionism, Image Captioning, Digital Humanities, +1 more

Multimodal Teleoperation Dataset for Fragile Object Manipulation with 221 Trials

221 trials from 11 novice participants manipulating objects with 20 visually encoded fragility levels (50–1000 gf). The dataset includes synchronized multimodal observations: robot joint trajectories, gripper force signals, multi-view RGB video, users' perceived fragility, confidence ratings, and trial outcomes. It was released by Jin Ong on figshare in April 2026.

Multimodal, HDF5, Robot Manipulation, Shared Autonomy, Teleoperation, Human Behavior, Perception Modeling, +1 more

LeafBench: Visual Question Answering Benchmark for Plant Disease Diagnosis

LeafBench is a visual question answering benchmark derived from the LeafNet dataset. It is designed to evaluate Vision-Language Models on six hierarchical diagnostic tasks for plant diseases. The dataset was created by enalis and last updated on June 20, 2026.

Multimodal, Vision Language Models, Agriculture AI, Benchmark, Healthcare, Computer Vision, Large Scale, Plant Disease, Visual Question Answering, +1 more

OpenBrush Monet: Claude Monet Artworks with Structured VLM Captions

A curated subset of 1,334 Claude Monet artworks from the OpenBrush-75K collection. The images are paired with structured captions generated by the Qwen3-VL-30B-A3B vision-language model. This dataset was created by jaddai and last updated on May 27, 2026.

Multimodal, Image Captions, Impressionism, Art History, Monet, +1 more

Sepsis-AKI Predictors with CD177 and IL18R1 Biomarkers

This dataset contains 188 septic patient records, including 89 with sepsis-associated acute kidney injury, and integrates clinical variables with transcriptomic-guided blood biomarkers. Weiqin Wu developed it to build a multimodal predictive model; the data were last updated in April 2026. Predictors include SOFA score, MAP, BUN, CRP, and the immune biomarkers CD177 and IL18R1.

Tabular, Time Series, Gene Expression, Biomarkers, Clinical Predictors, Healthcare, Sepsis AKI, ICU Outcomes, +1 more

Multimodal Neurobiological Data for Binge-Type Eating Disorder Classification

A 2026 study by Lena Rommerskirchen applied machine learning to multimodal data from 110 participants: individuals with bulimia nervosa or binge eating disorder, and matched controls. The dataset integrates task-based fMRI, intrinsic connectivity, voxel-based morphometry, neuropsychological assessments, and peripheral blood biomarkers. It was used to classify diagnostic groups and predict individual symptom variation.

Multimodal, Machine Learning, Eating Disorders, Biomarkers, Neuroimaging, Psychiatry, +1 more

Multimodal Neurobiological Data for Binge-Type Eating Disorders, 110 Participants

Data Sheet 3 presents multimodal data from a study of 110 participants: individuals with bulimia nervosa or binge eating disorder, and matched controls. The dataset integrates task-based fMRI, intrinsic connectivity, voxel-based morphometry, neuropsychological assessments, and peripheral blood biomarkers. It was authored by Lena Rommerskirchen and last updated on figshare in April 2026.

Multimodal, Machine Learning, Eating Disorders, Biomarkers, fMRI, Neuroimaging, +1 more

MNIST-VQA: Synthetic Visual Question Answering Dataset with MNIST Digits

MNIST-VQA is a synthetic Visual Question Answering dataset generated from MNIST digits placed on a 3x3 grid. It is designed to test the spatial reasoning, object localization, counting, and existence verification capabilities of VQA models. The dataset was created by star092304 and last updated on Hugging Face in June 2026.

Multimodal, MNIST, Spatial Reasoning, Computer Vision, Synthetic Data, Visual Question Answering, Synthetic, +1 more
