DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,923 datasets

Endo-MedSAM: Pelvic MRI Dataset for Uterus Segmentation in Endometriosis

A pelvic MRI dataset of 74 subjects and 3,449 T2-weighted slices from two institutions for developing AI models for uterus segmentation in endometriosis. The dataset was used to fine-tune the Endo-MedSAM model, achieving mean 3D Dice scores of 0.81–0.88 with bounding-box prompts. The dataset was uploaded by Rawan AlSaad on figshare in May 2026.

Tags: Image, AI Model, Medical Imaging, Benchmark, Healthcare, Computer Vision, Endometriosis, Pelvic MRI, Segmentation

XRPoseSync: Synchronized Pose Trajectories for EdgeXR Benchmarking

65 sessions across 422 segments with 4,849 files provide synchronized pose trajectories for EdgeXR and VR research. The dataset includes temporally and spatially aligned pose data captured at 500 Hz from SteamVR gaming sessions via OpenXR API readings and marker-based optical motion capture. Ziyu Zhong organized the data into a cleaner structure and prepared it for confidential peer review on Harvard Dataverse.

Tags: Time Series, Multimodal, Virtual Reality, Benchmarking, Edge Computing, Benchmark, Motion Capture, Pose Prediction

Multimodal Gait Data from 17 Participants Across Indoor and Outdoor Environments

17 healthy participants (7 females, 10 males, aged 19–34) performed walking activities across diverse indoor and outdoor terrains. The dataset includes motion data from 7 inertial sensors, foot pressure from 96-point force sensors, and visual data from 3 front-facing cameras, all annotated with 16 locomotion state classes. Collected by Chen Wang and shared under a CC-BY-4.0 license, this 16.9 GB dataset was last updated on 2026-05-31.

Tags: Time Series, Multimodal, ZIP, Gait Analysis, Wearable Sensors, Multimodal Data, Biomechanics, Human Locomotion

Multimodal Ultrasound Model for Predicting Breast Cancer Axillary Nodal Burden

Renjie Lu developed a multimodal model integrating tumor radiomics and lymph node morphology for predicting axillary nodal metastasis burden in breast cancer. The dataset includes information from 583 patients with pathologically confirmed breast cancer, split into training and testing cohorts. The model was last updated on June 4, 2026.

Tags: Multimodal, Radiomics, Ultrasound, Medical Imaging, Clinical Prediction, Breast cancer

OpenCaption-UHD: 2,956 Ultra High Definition Images with Long-Form Captions

2,956 Ultra High Definition (UHD) image samples are paired with rich, long-form captions for vision-language research. The dataset, created by prithivMLmods, is designed for tasks like image understanding and dense captioning. It was last updated on July 15, 2026.

Tags: Multimodal, Vision Language, Multimodal Learning, Computer Vision, Image Captioning, Ultra High Definition

ABC-130k: The Largest Open Bimanual Robot Teleoperation Dataset

ABC-130k is a multimodal dataset of bimanual robot teleoperation episodes. It contains 134,806 episodes across 195 tasks, representing 3,553 hours of synchronized multi-camera video and robot telemetry. The dataset was created by Voxel51 and is hosted on Hugging Face.

Tags: Video, Multimodal, Robotics, Bimanual Robotics, Teleoperation, Robot Telemetry

Rat Model Data on Ropivacaine-Loaded ReproGel for Postoperative Pain

A preclinical study by Hyo Jin Kim, hosted on Harvard Dataverse and last updated in 2026, evaluates a drug delivery system for postoperative pain management. The dataset likely contains results from 64 Sprague-Dawley rats across four treatment groups, measuring mechanical withdrawal thresholds and inflammatory cytokine levels over one week. It focuses on the analgesic and anti-inflammatory effects of combining ReproGel with 0.375% ropivacaine.

Tags: Tabular, Inflammatory Markers, Benchmark, Rodent Model, Postoperative Pain, Analgesic Efficacy, Drug Delivery System

A Multimodal Dataset on Territorial Intrusiveness in Mixed Reality: Perceived Intrusivenes

48 participants completed a within-subjects mixed reality experiment using a Meta Quest Pro headset. The dataset includes perceived intrusiveness, physiological arousal, embodied avoidance behaviors, and cognitive performance metrics, all timestamp-synchronized at the trial level. Authored by Yuxuan Li and hosted on Harvard Dataverse, it was last updated in July 2026.

Tags: Multimodal, Cognitive Performance, Physiological Data, Mixed Reality, Proxemics, Human Computer Interaction

OpenCaption-Unified-10K: 10,000 Images with Long-Form Synthetic Captions

10,000 images are paired with detailed, long-form captions generated by the Qwen3.5 multimodal model. The dataset is designed for dense image captioning, with descriptions focusing on scene composition, subject attributes, and spatial relationships. It was created by prithivMLmods and last updated on July 13, 2026.

Tags: Multimodal, Multimodal AI, Computer Vision, Image Captioning, Synthetic Data, Synthetic

Micro-OD: 252 Images for Few-Shot Cell Detection in Microscopy

Micro-OD is a benchmark of 252 images curated for in-context learning, with bounding-box annotations for 11 cell types across four sources. It was created by Shreyan Ganguly and last updated in May 2026. The dataset is designed to evaluate vision-language models for few-shot object detection in biomedical microscopy.

Tags: Image, Multimodal, Vision Language Models, Biomedical Imaging, Benchmark, Few Shot Learning, Computer Vision, Microscopy, Object Detection

VIABench: Video Benchmark for Visual Impairment Assistance

A video benchmark collected from blind individuals for evaluating AI assistance models. The dataset, created by MCG-NJU, is designed for tasks like Proactive Reminder and Visual Question Answering. It was last updated on 2026-07-17.

Tags: Video, Multimodal, Multimodal AI, Benchmark, Video Benchmark, Proactive Reminder, Visual Impairment Assistance, Visual Question Answering

Foundation Model Performance for Multimodal Image Matching in Materials Science

A 2026 evaluation assesses the capabilities of foundation models like MatchAnything RoMa and ELoFTR for multimodal image matching in materials science. The analysis uses the AmalgaMatch dataset, which contains 187 image pairs across six distinct matching tasks and 19 different materials. The work was authored by Ali Riza Durmaz and is shared under a CC-BY-4.0 license.

Tags: Tabular, Excel, Multimodal Matching, Foundation Models, Benchmark, Computer Vision, Microscopy, Materials Science

Engram: Crowdsourced Typing Preference Data for Keyboard Layout Optimization

Crowdsourced typing preference data from a study that derived ergonomics objectives from user preferences. The dataset includes materials for the Engram approach to optimizing keyboard layouts for English and Spanish, created by Arno Klein and last updated in May 2026. It contains data, software, documentation, and layouts totaling 10.6 MB.

Tags: Multimodal, Excel, Crowdsourced Data, Keyboard Layouts, Optimization, Ergonomics, Human Computer Interaction

TranNhiem Vietnamese Image-Text Reasoning: Multimodal Q&A with Chain-of-Thought

Vietnamese multimodal reasoning data featuring multi-turn visual question-answering grounded on natural images. Each answer includes an explicit chain-of-thought reasoning trace, synthesized by the Qwen3.5-397B-A17B model over images from the LAION-derived Vi-Laion-gemini-VQA set. The dataset was curated by Trần Nhiệm and last updated on 2026-07-17.

Tags: Multimodal, Chain Of Thought, Computer Vision, Multimodal Reasoning, Large Scale, Vietnamese Language, Synthetic Data, Visual Question Answering

TranNhiem Vietnamese Document-Image Reasoning with Chain-of-Thought

Vietnamese document-image understanding with explicit reasoning chains for multi-turn question-answering. The dataset is based on scanned or rendered Vietnamese document pages such as textbooks, articles, and worksheets. It was curated by Trần Nhiệm and the reasoning and answers were synthesized by the Qwen3.5-397B-A17B model over the Viet-Doc-VQA-II document collection.

Tags: Multimodal, Document Understanding, Multimodal AI, Computer Vision, Reasoning, Vietnamese Language, Visual Question Answering

VSI-Super-Wild: Benchmark for Spatial Supersensing in Long-Form Wild Videos

VSI-Super-Wild is a benchmark for evaluating multimodal models on spatial supersensing capabilities in long-form, in-the-wild videos. It was created by researchers from Tsinghua University, NVIDIA, and Stanford University for the ECCV 2026 conference. The dataset moves beyond short indoor clips and object-centric settings to study world state maintenance and prediction.

Tags: Video, Multimodal, Spatial Reasoning, Multimodal AI, Benchmark, Video Benchmark, Computer Vision

Multimodal Diffusion MRI Biomarkers for Mild Traumatic Brain Injury

Twenty individuals with mild traumatic brain injury and 24 healthy controls underwent advanced diffusion MRI and cognitive assessment. The data includes multi-shell DTI, free-water corrected DTI, diffusion kurtosis imaging, and NODDI metrics, linked to MoCA and GOS-E clinical scores. Authored by Maurizio Bergamino and shared under CC-BY-4.0, this dataset was last updated on May 28, 2026.

Tags: Tabular, Diffusion MRI, Biomarkers, Healthcare, Natural Language Processing, Neuroimaging, Clinical Assessment, Traumatic brain injury

Predicting Ordinal Clinical Outcomes in At-Risk Mental States Using Multimodal Factors

Eighty-seven subjects with at-risk mental states (ARMS) were followed up, with clinical outcomes classified into four ordered categories. The dataset contains baseline measures for 15 explanatory variables, including clinical symptoms, cognitive functioning, and electrophysiological measures like P300 and mismatch negativity. The data was authored by Kazuya Nagasawa and last updated on 2026-05-28.

Tags: Tabular, Clinical Outcomes, Multimodal Biomarkers, Benchmark, Healthcare, Electrophysiology, Psychosis Prediction, At Risk Mental States

Cardiac-CT: Cardiac CT Segmentation and Phenotyping Dataset

AI-CVM's Cardiac-CT dataset accompanies a research paper on a unified framework for cardiac CT segmentation and phenotyping. The dataset was used for human-in-the-loop annotation, vision foundation model development, and multicenter evaluation. It was last updated on July 15, 2026.

Tags: Image, Multimodal, CT Scans, Clinical Validation, Benchmark, Healthcare, Cardiac Imaging, Computer Vision, Medical Segmentation

WEB-Dataset: 90 Everyday Bimanual Manipulation Tasks with Language Annotations

WorldEngineAI's WEB-Dataset is a large-scale, language-annotated real-robot bimanual manipulation dataset intended for post-training robotics foundation models. It spans 90 everyday manipulation tasks collected with a bimanual YAM follower arm teleoperated by a GELLO leader. The dataset records joint state, action, and three synchronized camera streams at 60 Hz.

Tags: Multimodal, Bimanual Manipulation, Robotics, Language Annotated, Teleoperation, Large Scale, Real Robot
