DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,932 datasets

Northeast Recreational Fishing Economic Data from 1994

Revealed preference data on recreational angler behavior and trip valuation, collected by the National Oceanic and Atmospheric Administration. The dataset includes variables such as trip length, household income, and trip purpose, and is available in PDF and JSON formats. It was last updated on the platform in April 2026.

Tabular, JSON, Recreational Fishing, NOAA, Economic Modeling, Finance, Revealed Preference, +1

WorldMemArena: A Large-Scale Multimodal Memory Benchmark for AI Systems

WorldMemArena is a large-scale multimodal benchmark designed to evaluate AI system memory across extended, multi-session interactions. It contains over 400 sessions, 16,000 conversational turns, and thousands of images and memory points across different interaction modes. The dataset was created by LCZZZZ and was last updated on Hugging Face in May 2026.

Multimodal, Real World Scenarios, AI Systems, Benchmark, Memory Evaluation, Large Scale, Multimodal Benchmark, +1

Multimodal Driver Response Data with Facial Video and EEG

EmoRoad is a multimodal dataset containing psychological, physiological, and behavioral responses collected from human subjects in diverse driving scenarios. The dataset includes identifiable facial video recordings, EEG signals, eye tracking, and other measures, requiring a signed Data Usage Agreement for controlled access. It was published by RCFCM Hong Kong and last updated in April 2026.

Time Series, Multimodal, Driver Behavior, Physiological Signals, Affective Computing, Multimodal Fusion, Human Factors, +1

EmoRoad Multimodal Dataset of Driver Responses Under Controlled Access

EmoRoad is a multimodal dataset containing psychological, physiological, and behavioral data from human subjects in diverse driving scenarios. The dataset includes identifiable facial video recordings, EEG signals, and other measures, requiring a Data Usage Agreement for access. It was created by Stephen Jia Wang and made available in 2026.

Time Series, Multimodal, Driver Behavior, Physiological Signals, Affective Computing, Multimodal Fusion, Human Factors, +1

CapRL-Video-QA-20K: 20,000 Video Question-Answer Pairs

A subset of 20,000 video question-answer pairs from the LLaVA-Video-178K dataset, hosted by internlm. The dataset was last updated on 2026-05-22. It provides relative file paths to video clips, likely intended for training or evaluating multimodal AI models.

Multimodal, Multimodal QA, Video Captioning, LLM Training, Video QA, +1

CL Vista: Multimodal Continual Instruction Tuning Benchmark

MCITlib is a unified library and benchmark for continual instruction tuning of multimodal large language models. The dataset, hosted by MLLM-CL, was last updated on May 18, 2026. It integrates diverse continual learning methods into a single framework.

Multimodal, Multimodal LLM, Benchmark, Continual Learning, Instruction Tuning, +1

Trendyol Cybersecurity Instruction Tuning Dataset with 53,202 Examples

The Trendyol Security Team curated 53,202 instruction-tuning examples for training defensive security AI assistants. The dataset covers over 200 specialized cybersecurity domains, including cloud-native threats, AI/ML security, and quantum computing risks. It was expanded from an earlier version of 21,000 rows and last updated on May 17, 2026.

Text, Cybersecurity, AI Assistants, Defensive Security, +1

Aesthetic Image Captions 10K for Diffusion Model Training

A curated set of 10,000 high-resolution image-caption pairs for training and research. Images were sourced from Pexels, and captions were generated with JoyCaption before being cleaned for use. The dataset was created by edwixx and was last updated on 2026-05-16.

Multimodal, Image Captions, Generative AI, Diffusion Training, Aesthetic Images, Computer Vision, Synthetic, +1

Lithuanian Bus Stop Images with Detailed Lithuanian Captions

Autobusu Stoteles is a multimodal dataset containing 102 PNG screen images of bus stops. Each image has a detailed Lithuanian caption, likely intended for visual language model tasks. It was created by dzeveckij and last updated on May 20, 2026.

Image, Multimodal, Multilingual, Image Captions, Transportation, Visual Language Model, Bus Stops, +1

Multimodal Fiber Material Recognition via FESEM, Raman, and NIR Spectroscopy

A multimodal dataset for material recognition, likely containing images and spectral data of fibers. Visual images were acquired with a field-emission scanning electron microscope (FESEM), Raman spectra with a Raman spectrometer, and near-infrared spectra with an NIR spectrometer. The dataset is 179.2 MB in size, was authored by Weiqin Zhu, and was last updated on 2026-05-01.

Multimodal, ZIP, Material Science, Multimodal Data, Microscopy, Spectroscopy, +1

India-Centric Image–Text Pairs for OCR and Document-VLM Research

India-Centric Image–Text Pairs Dataset is a multilingual collection of document images paired with OCR transcriptions. It includes samples from 22 Indian languages, such as Bengali, Hindi, Kannada, Malayalam, Marathi, Sanskrit, Tamil, and Telugu. The dataset was created by MILA: MULTILINGUAL INDIC LANGUAGE ARCHIVE and last updated on 2026-05-07.

Image, Text, Multilingual, Vision Language, Benchmark, Document Images, Computer Vision, OCR, +1

OpenMedReason: Medical Visual Question Answering with Structured Reasoning Traces

OpenMedReason is a multimodal dataset for medical visual question answering (VQA) containing structured chain-of-thought reasoning traces. It includes 192,619 training examples and 1,500 test examples. The dataset was created by neginb and was last updated on Hugging Face in May 2026.

Multimodal, Medical Imaging, Chain Of Thought, Healthcare, Computer Vision, Multimodal Reasoning, Medical VQA, +1

MedCTA: A Benchmark for Clinical Tool Agents

MedCTA is a benchmark dataset for evaluating clinical tool agents, created by IVUL-KAUST. Each example contains a clinical image, a user query, a reference tool-use trajectory, and a ground-truth answer. The dataset was last updated on May 24, 2026.

Multimodal, Medical Imaging, Multimodal AI, Benchmark, Tool Use Evaluation, Healthcare, Computer Vision, Clinical Benchmark, +1

PubMedVision-Alignment-VQA: Biomedical Visual Question Answering Dataset

PubMedVision-Alignment-VQA is a processed subset of the PubMedVision dataset, re-exported for easier downstream use. The dataset likely contains biomedical images paired with question-answer conversations, with single-image rows retained and multi-image rows removed. It was created by mtybilly and last updated on May 6, 2026.

Multimodal, Medical Vision, Biomedical Research, Multimodal AI, Computer Vision, Visual Question Answering, +1

Ha Multi Samples: Human Sensorimotor Intelligence Dataset

More than 2,000 hours of multimodal human sensorimotor data are collected each week, and the listing describes this as the largest dataset of its kind. The dataset is produced by Human Archive, a project backed by Y Combinator and engineers from OpenAI, BAIR, SAIL, and other organizations. The dataset page was last updated on 2026-05-18.

Multimodal, Multimodal Data, AI Training, Human Sensorimotor Intelligence, +1

NSD-VQA: Visual Question Answering Benchmark from Human fMRI Responses

NSD-VQA is a large-scale visual question answering benchmark for studying the decoding of visual and semantic information from human fMRI responses to natural images. It is built from the Natural Scenes Dataset (NSD) and provides automatically generated question-answer annotations grounded in NSD images. The dataset was created by mcosarinsky and was last updated on 2026-05-24.

Multimodal, Benchmark, fMRI, Large Scale, Neuroscience, Multimodal Benchmark, Visual Question Answering, Synthetic, +1

HumaniBench: 32,000+ Image-Question Pairs for Multimodal Model Evaluation

HumaniBench is a benchmark for evaluating large multimodal models using real-world, human-centric criteria. It consists of over 32,000 image-question pairs across seven tasks, including visual question answering, multilingual QA, and visual grounding. The dataset was created by the Vector Institute, with examples annotated using GPT-4o drafts and verified by experts.

Multimodal, Multilingual, AI Ethics, Model Evaluation, Vision Language, Benchmark, Computer Vision, Human Centric AI, Multimodal Benchmark, +1

Challenge Phase1 Dataset: Bimanual Robot Manipulation Trajectories

Posttraining-RFM-RSS2026 provides real-robot bimanual manipulation trajectories for three benchmark tasks. Data was collected on a bimanual YAM follower teleoperated by a GELLO leader arm, with timestamp-aligned frames across joint state and action. The dataset was released for the RSS 2026 Workshop & Challenge on Post-training for Robotics Foundation Models.

Multimodal, Real Robot Trajectories, Bimanual Manipulation, Benchmark, Robotics, Teleoperation, +1

PRISM Public SFT Data: Multimodal Demonstrations for Model Initialization

PRISM Public SFT Data is a collection of public multimodal demonstrations used for supervised fine-tuning in the PRISM project. The project studies distributional drift in the post-training pipeline for large multimodal models. This dataset serves as the broad SFT initialization source before distribution alignment and RLVR stages.

Multimodal, Distributional Drift, Multimodal AI, Large Scale, Large Language Models, Supervised Fine Tuning, +1

Multimodal Fatigue Profiling in Collegiate Male Runners

Yekai Wang's study integrates cardiopulmonary, neuromuscular, and biomechanical data from 20 healthy collegiate male athletes performing high-intensity treadmill exercise. The dataset includes gas exchange, heart rate, perceived exertion, sEMG metrics from four leg muscles, and plantar kinetic measures from in-shoe sensors. It provides a structured framework for analyzing fatigue-related adaptations beyond metabolic indicators.

Plantar Kinetics, Multimodal Fatigue Assessment, Neuromuscular Adaptation, Electromyography, Locomotor Biomechanics, +1
