DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,929 datasets

Trendyol Cybersecurity Instruction Tuning Dataset: 53,202 Examples for AI Assistants

53,202 instruction-tuning examples were curated by the Trendyol Security Team for training defensive cybersecurity AI assistants. The dataset covers over 200 specialized cybersecurity domains, including cloud-native threats and AI/ML security. It was expanded from an earlier version of 21,000 rows and was last updated on June 20, 2026.

Text, Cybersecurity, Cloud Security, Instruction Tuning, Large Language Models, Defensive Security, +1

SWITCH: Benchmark for Tangible Interface Actions in Egocentric Scenarios

SWITCH (Semantic World Interface Tasks for Control & Handling) is a multimodal embodied-interaction benchmark created by BAAI-Agents. It focuses on understanding, modeling, and evaluating actions over Tangible Control Interfaces (TCIs) like appliance panels and lighting controls in real-world, egocentric scenarios. The dataset was last updated on 2026-06-18.

Multimodal, Tangible Interfaces, Benchmark, Egocentric Vision, Embodied Ai, Multimodal Benchmark, +1

HakushoBench: Japanese Chart and Table VQA Benchmark from Government White Papers

HakushoBench is a Japanese visual question answering benchmark built from 33 governmental white papers. It contains 2,053 images spanning over 10 chart and table types, with manually annotated QA pairs. The dataset was created by llm-jp and last updated on Hugging Face in June 2026.

Multimodal, Government Documents, Chart Understanding, Benchmark, Computer Vision, Japanese Language, Visual Question Answering, +1

MOTOR: Multi-View Two-Wheeler Rider Behavior in Dense Indian Traffic

MOTOR is a multi-view, multimodal dataset of two-wheeler rider behavior in dense, unstructured traffic in India. The full dataset comprises 1,629 annotated sequences (~25 hours) from 16 riders, collected across diverse traffic scenarios. It was created by Voxel51.

Multimodal, 🇮🇳 India, Two Wheeler, Computer Vision, Large Scale, Traffic Behavior, +1

DVD-Bench: A Benchmark for Dialogue-Centric Video Description

DVD-Bench is a benchmark for evaluating dialogue-centric video description, focusing on 'When, Who, and What is Said'. The dataset was created by tsinghua-ee and was last updated on 2026-06-24 10:56:52. It contains video files and corresponding annotation data in Parquet format for the English test split.

Video, Multimodal, Benchmark, Video Description, Multimodal Evaluation, Dialogue Analysis, +1

GridVQA-X: Diagnostic Framework for Evaluating Cross-Modal Explainers

GridVQA-X is a diagnostic framework for evaluating the faithfulness of post-hoc cross-modal explainers. It features S × S visual grids populated by geometric objects paired with questions, using a closed-world synthesis logic with mathematically guaranteed unique ground-truth explanations. The dataset was created by Aikyam-Lab and was last updated on Hugging Face in June 2026.

Multimodal, Synthetic Data, Cross Modal Explainability, Diagnostic Benchmark, Visual Question Answering, +1

EN-ES financial multimodal translation model (DIMT): gemma-4-E4B LoRA adapters (EN image -

This fine-tuned model for Document Image Machine Translation (DIMT) is based on the annual reports of Spanish IBEX 35 companies. The model, developed by Torterolo Orta et al., translates directly from English page images to Spanish text using LoRA adapters for the google/gemma-4-E4B-it model. The work is associated with a paper to be published in late 2026.

Multimodal, Machine Translation, Spanish-language, Multimodal Ai, Benchmark, Computer Vision, Finance, Financial Documents, Fine Tuned Model, +1

RoboShackles: 1,200 Safety-Critical Robotic Video Clips for Testing Embodied AI

RoboShackles is a safety benchmark for evaluating Embodied Foundation Models. The public test split contains 1,200 safety-critical robotic video clips, with 200 videos per category. The dataset was created by YZW00 and last updated on Hugging Face in June 2026.

Tabular, Video, Multimodal, Machine Learning, Benchmark, Robotics, Safety Benchmark, Video Clips, Simulation, Embodied Ai, +1

OpenBrush Landscapes: 12,612 Landscape Paintings Across Artistic Movements

OpenBrush Landscapes is a curated subset of the OpenBrush-75K dataset containing every landscape painting from the parent collection. It includes 12,612 images across all artists, movements, and centuries, curated so users do not need to download the full 75,313-image dataset. The subset was created by jaddai and was last updated on May 27, 2026.

Multimodal, Landscape Paintings, Art History, Computer Vision, Open Data, +1

OpenBrush Religious Art: 6,119 Paintings from Medieval to Baroque Eras

6,119 religious paintings curated from the OpenBrush-75K collection. The dataset focuses on saints, biblical scenes, and devotional works, with a heavy emphasis on Renaissance and Baroque eras. It was created by jaddai and last updated on May 27, 2026.

Multimodal, Painting, Renaissance, Art History, Religious Art, +1

Thermal Recordings of High-Impedance Faults in Medium-Voltage Covered Conductors

4.3 GB of thermal recordings from controlled laboratory experiments on high-impedance faults in medium-voltage covered conductors. Diogo Biasuz Dahlke created the dataset, which includes thermal video files (MP4) and native radiometric files (HRV) capturing temperature evolution during fault initiation. The dataset was last updated on 2026-05-05.

Time Series, Video, Multimodal, ZIP, High Impedance Faults, Thermal Imaging, Electrical Testing, Medium Voltage, +1

OpenBrush Portraits: 13,059 Portrait Paintings Across Art Movements

13,059 portrait paintings curated from the OpenBrush-75K dataset, spanning artistic movements from Renaissance to Realist. The subset was created by jaddai using the Qwen3-VL-30B-A3B vision-language model and last updated on May 27, 2026. It provides a focused collection of portraits under a CC0 license.

Image, Multimodal, Openbrush, Art History, Portrait Painting, +1

IndustryBench-MIPU: Multi-Image Attribute Extraction Benchmark for Industrial Products

IndustryBench-MIPU is a benchmark dataset for evaluating multimodal large language models on extracting product specifications from multiple heterogeneous images. The dataset, created by alibaba-multimodal-industrial-ai, tests model capabilities in text recognition, visual reasoning, domain knowledge, and cross-image evidence integration. It was last updated on June 15, 2026.

Multimodal, Machine Learning, Product Specifications, Ai Evaluation, Multimodal Llm, Industrial Benchmark, Computer Vision, Attribute Extraction, +1

EAC-Agent: Multimodal Emotion Recognition and Response Generation Results

A research dataset containing performance metrics for a multimodal conversational agent named EAC-Agent. The dataset likely contains results from validation on benchmark datasets IEMOCAP and MELD. It was uploaded by Shahid Jamil to figshare on 2026-04-17.

Audio, Multimodal, Excel, Benchmark, Emotion Recognition, Multimodal Conversation, Benchmark Datasets, +1

Dd3: Ti10Mo6Cu LPBF Visual Question Answering Dataset for Quality Assessment

A multimodal dataset for visual question answering in additive manufacturing, focusing on quality assessment of Ti10Mo6Cu alloy parts produced via Laser Powder Bed Fusion. The dataset was created by AI4Manufacturing and was last updated on July 16, 2026. Each row contains fields for a query, an image, an annotation, reasoning, category, task, and metadata.

Tabular, Multimodal, Quality Assessment, Computer Vision, Additive manufacturing, Visual Question Answering, Materials Science, +1

OpenBrush Baroque: 4,240 Baroque Art Images with Captions

4,240 Baroque-era artworks curated from the larger OpenBrush-75K collection. The subset focuses on the canonical Baroque visual language from approximately 1600 to 1750, characterized by chiaroscuro and dramatic lighting. It was created by jaddai and last updated on May 27, 2026.

Multimodal, Image Captions, Art History, Fine Art, Computer Vision, Baroque Art, +1

UniCure: Multi-modal Datasets and Weights for Personalized Cancer Therapy Prediction

UniCure is a multi-modal framework integrating omics and chemical foundation models to predict transcriptomic drug responses. This repository contains the pre-processed datasets, configuration files, and pre-trained model weights required to reproduce the results. The archive is 12.4 GB and was last updated on 2026-04-23 by Zexi Chen.

Multimodal, Transcriptomics, Cancer Therapy, Benchmark, Multi Modal Ai, Omics Data, Drug Response, +1

Verify-or-Trust: Benchmark Data for LLM Orchestration of a Biology Foundation Model

A benchmark dataset for evaluating whether a Large Language Model correctly allocates verification when orchestrating a fallible biology foundation model. The dataset, created by jang1563, includes a substrate table from the GEARS/Norman experiment. It was last updated on June 17, 2026.

Tabular, Llm Benchmark, Biology Foundation Model, Benchmark, Perturbation Effect, Verification Trust, +1

CRYSTAL: Diagnostic Benchmark for Multimodal Step-by-Step Reasoning

CRYSTAL is a diagnostic benchmark for evaluating multimodal reasoning step by step, not just by the final answer. Each instance pairs an image and a question with an ordered sequence of natural-language reference reasoning steps, enabling step-level metrics like Match F1 and Ordered Match F1 alongside answer accuracy. The dataset was created by waybarrios and was last updated in June 2026.

Multimodal, Vision Language, Benchmark, Computer Vision, Multimodal Reasoning, Step By Step Evaluation, Diagnostic Benchmark, +1

OpenBrush Van Gogh: 1,889 Artworks with Structured VLM Captions

1,889 images of Vincent van Gogh's works, curated from the larger OpenBrush-75K collection. All images are paired with structured captions generated by the Qwen3-VL-30B-A3B vision-language model. The dataset was created by jaddai and last updated on May 27, 2026.

Image, Multimodal, Art History, Post Impressionism, Image Captioning, Digital Humanities, +1
