DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,937 datasets

SmellNet: Real-World Smell Measurements from Low-Cost Gas Sensors

SmellNet is a comparatively large dataset for sensor-based machine olfaction. It contains real-world smell measurements collected from a compact array of low-cost metal-oxide gas sensors. The dataset was created by DeweiFeng and was last updated on April 13, 2026.

Tags: Time Series, Multimodal, Machine Olfaction, Gas Sensors, Multimodal Learning, Chemistry Priors, +1 more

VidLLVIP: Temporally and Spatially Aligned Infrared-Visible Video Pairs

VidLLVIP is an unofficial processed dataset derived from the raw LLVIP videos. The dataset provides temporally aligned, spatially registered, and quality-checked 5-second video pairs. It was created by user jianfeng0369 and last updated on Hugging Face in May 2026.

Tags: Video, Multimodal, Visible Video, Video Fusion, Infrared Video, Multimodal Video, +1 more

OpenWatch: A Multimodal Benchmark for Hand Gesture Recognition

OpenWatch is a multimodal wrist-worn sensor dataset for hand gesture recognition. It captures 59 discrete hand gestures using a custom smartwatch equipped with photoplethysmography (PPG), accelerometer, and gyroscope sensors. The dataset was created by pietrobonazzi and was last updated on May 6, 2026.

Tags: Multimodal, Benchmark, Smartwatch Data, Multimodal Sensors, Human Computer Interaction, +1 more

3D MRI Foundation Model Evaluation Cohort: Clinical Imaging Embeddings

John Garrett Dataverse provides a curated dataset of 3D MRI studies designed for evaluating foundation model embeddings. The dataset links patient demographics, study acquisition metadata, and series-level imaging parameters to precomputed 3D FM embeddings. It was last updated on May 5, 2026.

Tags: Multimodal, Foundation Model, Medical Imaging, Benchmark, Healthcare, Clinical Evaluation, MRI, Embeddings, +1 more

MindTopo: Multimodal Benchmark for Topological Reasoning in Foundation Models

MLL-Lab created MindTopo, a benchmark dataset containing 8,910 procedurally generated examples across 13 environments and 5 categories. It probes whether foundation models reason about topological concepts like connectivity and knottedness rather than superficial visual cues. The dataset was last updated on May 7, 2026.

Tags: Tabular, Multimodal, Foundation Models, Benchmark, Topology, Cognitive Science, Neuroscience, Multimodal Benchmark, Brain Mapping, Synthetic, +1 more

Falcon: A Large-Scale Multimodal Safety Benchmark for AI Models

Falcon is a large-scale dataset introduced to address the scarcity of multimodal safety evaluation benchmarks. It contains 71,000 multimodal samples across 13 harmful categories, including illegal activities and adversarial jailbreaking prompts. The dataset was created by author zhangrjjj and was last updated on Hugging Face in May 2026.

Tags: Multimodal, Multimodal Safety, Benchmark, Large Scale, AI Risk Evaluation, Large Language Models, Harmful Content, +1 more

Multimodal Intervention Trial Data for Children with Autism

Kazi Md Azman Hossain published a study protocol for a randomized controlled trial evaluating a five-component multimodal intervention on executive function in children with Autism Spectrum Disorder. The trial plans to enroll 130 children with ASD and 65 typically developing children as controls, with data collection spanning baseline, 12-week, and 24-week follow-up assessments. The protocol was registered in November 2025 and the dataset was last updated in March 2026.

Tags: Tabular, Audio, Excel, 18 Years Diagnosed, Autism Spectrum Disorder, Multimodal Intervention, Component Multimodal Intervention, Executive Functions, Working Memory, Benchmark, Improving Executive Functioning, Healthcare, Trial Aims, Trail Making Test, Clinical Trial, Additional Group, Two Centers, Week Follow, Large Scale, Pediatric Health, Experimental Group Receiving, Important Evidence Gap, Executive Function, Study Protocol, Behavioral Intervention, Clinical Trial Registry, +1 more

R8 Calibration SFT: Fine-Tuning Data for Qwen3.5-9B-Derived Models

cudabenchmarktest created this dataset for fine-tuning models derived from the Qwen3.5-9B architecture. The description includes a critical warning about a required inference flag to prevent high empty-answer rates when serving models via Ollama. The dataset was last updated on April 15, 2026.

Tags: Text, AI Safety, LLM Fine Tuning, Instruction Following, Model Calibration, +1 more

DoseRAD2026: Multimodal Radiotherapy Dose Calculation Dataset

DoseRAD2026 is a large-scale, multimodal dataset for radiotherapy research, created by LMUK-RADONC-PHYS-RES. It is designed to support the development and benchmarking of fast and accurate radiation dose calculation and prediction methods. The dataset was last updated on April 17, 2026.

Tags: Tabular, Multimodal, Treatment Planning, Multimodal Medical, Radiotherapy, Medical Dosimetry, Large Scale, Radiation Oncology, Synthetic, +1 more

Reader Interactions and Reviews with Preference Labels for AI Prediction

This audience preference dataset, hosted on Kaggle, contains reader interactions, reviews, and preference labels for AI prediction. Its author, organization, size, and temporal coverage are unknown.

Tags: Tabular, Reviews, AI Prediction, Audience Preference, Reader Interactions, +1 more

W-PVLMedSeg: Biomedical Image Segmentation Datasets

This is a collection of biomedical image segmentation datasets packaged for the W-PVLMedSeg project. The datasets are organized into train, validation, and test splits with corresponding image and label folders. The repository was created by DanRuguo and last updated on April 27, 2026.

Tags: Image, Multimodal, Biomedical Imaging, Image Segmentation, Computer Vision, Medical Datasets, +1 more

MM-ODIR-129: A MultiModal Ophthalmology Dataset

MM-ODIR-129 is a multimodal subset of the ODIR-5K dataset, which contains ophthalmology data. The data has been verified by a professional ophthalmologist. The specific size, format, and temporal coverage of this subset are not detailed in the available metadata.

Tags: Multimodal, Medical Imaging, Ophthalmology, Multimodal Data, Verified Dataset, +1 more

Sat Bbox Metadata Sft V1: Satellite Imagery Chips with Captions and Bounding Boxes

NuTonic/sat-bbox-metadata-sft-v1 is a metadata-first dataset built for training multimodal chat models. It likely contains Sentinel-2 satellite image chips paired with JSON metadata and optionally Mapbox stills. The dataset was created by NuTonic and last updated on April 28, 2026.

Tags: Geospatial, Multimodal, VLM Training, Satellite Imagery, Geospatial Metadata, Land Cover, +1 more

World2VLM: A Demo Dataset for Dynamic Spatial Reasoning

World2VLM is a demo dataset supporting the 2026 paper 'World2VLM: Distilling World Model Imagination into VLMs for Dynamic Spatial Reasoning' by Wanyue Zhang et al. The dataset is hosted on Hugging Face and was last updated on April 30, 2026. It is designed to address the challenge of enabling Vision-Language Models to perform dynamic spatial reasoning tasks.

Tags: Multimodal, Spatial Reasoning, Vision Language Models, Artificial Intelligence, World Models, +1 more

Figma2Code: Multimodal UI Design-to-Code Benchmark

Figma2Code is a multimodal design-to-code benchmark built from community Figma designs, integrating screenshots, structured metadata, and design assets. The dataset was created by xcodemind and last updated on April 29, 2026.

Tags: Multimodal, Benchmark, Computer Vision, Code Generation, UI Design, Multimodal Benchmark, +1 more

Room Tour Video Frames Sampled at 3 Frames per Second

RoomTour3D provides video frames subsampled at 3 frames per second from YouTube room tour videos. The frames are downsampled with a shorter side of 360 pixels. The dataset was created by author 'roomtour3d' and was last updated on April 23, 2026.

Tags: Image, Multimodal, Room Tours, Embodied Navigation, Computer Vision, Video Frames, +1 more

Minimind V: Multimodal Vision-Language Dataset for Instruction Tuning

Minimind V Dataset is a multimodal collection for training vision-language models, assembled by jingyaogong from sources including Chinese-LLaVA-Vision, llava-en-zh-300k, and LLaVA-SFT-665K. It contains approximately 570,000 pre-training images and 965,000 instruction-following data points, with content in both English and Chinese. The dataset was last updated on Hugging Face on April 4, 2026.

Tags: Multimodal, Computer Vision, Chinese NLP, Image Captioning, Multimodal Vision Language, +1 more

VisionFoundry-10K: 10,000 Synthetic VQA Samples for 10 Vision Tasks

VisionFoundry-10K is a synthetic visual question answering dataset containing 10,000 image-question-answer triples. The data was created by the VisionFoundry pipeline, which uses an LLM to generate task-aware content and a text-to-image model to synthesize images, with samples filtered by a multimodal verifier. It was authored by zlab-princeton and last updated on Hugging Face in April 2026.

Tags: Multimodal, Vision Language, Multimodal AI, Computer Vision, Synthetic Data, Visual Question Answering, Synthetic, +1 more

Sat Vl Sft Training Ready V1

This is a metadata-first, procedural VLM SFT dataset built from an existing 'sat-bbox' style dataset tree. The dataset, created by NuTonic, was last updated on April 30, 2026. It is designed to provide high-signal supervision for multimodal chat models, using Sentinel-2 satellite chips paired with JSON metadata and optional Mapbox stills.

Tags: Geospatial, Multimodal, Satellite Imagery, Multimodal Training, Land Cover, +1 more

Agentic-MME: Benchmark for Multimodal Agent Tool-Use and Reasoning

Agentic-MME is an official benchmark dataset featured in Hugging Face Daily Papers. It is designed to evaluate multimodal agents in tool-use, web searching, and multi-step reasoning through visual clues. The dataset was created by Agentic-MME and last updated on April 11, 2026.

Tags: Multimodal, Agent Evaluation, Tool Use, Benchmark, Web Search, Multimodal Benchmark, Visual Reasoning, +1 more
