DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.


Multimodal & LLM Datasets | DataSalon


Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,925 datasets


WEB-Dataset: 90 Everyday Bimanual Manipulation Tasks with Language Annotations

WorldEngineAI's WEB-Dataset is a large-scale, language-annotated real-robot bimanual manipulation dataset intended for post-training robotics foundation models. It spans 90 everyday manipulation tasks collected with a bimanual YAM follower arm teleoperated by a GELLO leader. The dataset records joint state, action, and three synchronized camera streams at 60 Hz.

Tags: Multimodal, Bimanual Manipulation, Robotics, Language Annotated, Teleoperation, Large Scale, Real Robot (+1 more)


DeepCaption-1K-Qwen35: 1,000 Image-Caption Pairs for Vision-Language Models

A curated collection of 1,000 image and caption pairs. Each sample pairs an image with a detailed natural language description, making it suitable for training and evaluating vision-language models. The dataset was created by prithivMLmods and was last updated on July 4, 2026.

Tags: Multimodal, Vision Language, Multimodal Learning, Computer Vision, Image Captioning, Natural Language Processing (+1 more)


Experimental Data on Multimodal Chatbot Use for Digital Literacy in Elderly Women

A replication package from an experimental study evaluating a multimodal chatbot as a pedagogical mediator for digital literacy among elderly women. The dataset includes anonymized participant data, task completion times, success rates, and axial networks built from transcripts. It was authored by Amanda Sales and last updated on 2026-05-30.

Tags: Text, Tabular, Excel, Elderly Women, Digital Literacy, Chatbot Intervention, Experimental Study (+1 more)


Certified Document QA: 6,000+ Span-Verified Claims for LLM Evaluation

SovNodeAI's Certified Document QA dataset contains over 6,000 rows of machine-checkable question-answer claims for verifying large language model outputs. Every claim includes a certificate allowing item-by-item re-verification, and the data includes filings newer than major model training cutoffs. The dataset also includes a free 127,000-token verified long-context task set and a frontier failure table comparing six models on 100 questions.

Tags: Text, Document QA, LLM Evaluation, Certified Data, Machine Verification (+1 more)


AmalgaMatch: 187 Multimodal Microscopy Image Pairs for Materials Science

The AmalgaMatch dataset contains 187 image pairs, partitioned into six distinct matching tasks and 19 material subsets, for evaluating foundation models on multimodal image registration. Ali Riza Durmaz published this supplementary PDF in May 2026 under a CC-BY-4.0 license. The dataset covers metals, alloys, and ceramics imaged with diverse microscopy modalities, and it presents challenges such as limited mutual information and field-of-view ratios as low as 2%.

Tags: Multimodal, Benchmark, Computer Vision, Microscopy, Multimodal Fusion, Materials Science (+1 more)


Multimodal-USElecDeb60To16: Audio-Enhanced Political Debates for Argument Mining

Multimodal USElecDeb60To16 provides audio features and synthetic speech for U.S. presidential debates from 1960 to 2016. The dataset was created to augment pre-trained language models for argumentation mining, as described in a 2023 EACL Findings paper. It offers a version with large audio files and a lighter version without them.

Tags: Text, Audio, Multimodal, Computational Linguistics, Argumentation Mining, Audio Features, Political Debates, Synthetic (+1 more)


Postpartum Depression and Sleep Quality Data from a High-Risk Parturient Study, 2022-2024

Supplementary file 1 from a retrospective cohort study by Hui Zhang, published on figshare in 2026. The data likely contains results from 82 high-risk parturients receiving a multimodal analgesic protocol and 79 historical controls, collected between January 2023 and December 2024. Outcomes include postpartum depression incidence, Edinburgh Postnatal Depression Scale scores, Pittsburgh Sleep Quality Index scores, and opioid consumption.

Tags: Tabular, Postpartum Depression, Analgesia, Sleep Quality, Cesarean Delivery, Healthcare, Clinical Study (+1 more)


TUM Uterusreport: Paired Uterine Pathology Images and Reports

Paired uterine whole-slide images and corresponding pathology reports for multimodal computational pathology research. The dataset was created by Zhengyang-TUM and is associated with a paper published on arXiv. The dataset page was last updated on July 17, 2026.

Tags: Multimodal, Pathology Reports, Medical Imaging, Computational Pathology (+1 more)


Index Cards Eval: Digitized Catalog Metadata with Multimodal Extraction

Catalogue metadata extracted from digitised material with a multimodal model and human-reviewed. The dataset is authored by the NationalLibraryOfScotland and was last updated on July 16, 2026. It contains structured fields describing index cards, including headings, types, and cross-references.

Tags: Geospatial, Multimodal, Library Science, Multimodal Annotation, Historical Catalog, Digital Humanities (+1 more)


CC3M Semantic Subset: WCAG 2.2-Compliant Russian Image Captions

A curated subset of 35,794 image-caption pairs from the Conceptual Captions dataset, re-annotated in Russian for accessibility. The data was semantically clustered into 2,484 groups and re-annotated using teacher vision-language models. It was created by Pavel Mikheyev and last updated in May 2026.

Tags: Multimodal, Multilingual, ZIP, Vision Language, Benchmark, Computer Vision, Image Captioning, Accessibility, Semantic Clustering (+1 more)


Multimodal Model for Predicting Hepatic Encephalopathy Risk Post-TIPS

A multimodal dataset used to develop a predictive model for overt hepatic encephalopathy (OHE) within one year after a transjugular intrahepatic portosystemic shunt (TIPS) procedure. The study by Lin-Feng Zhou integrated manual CT imaging features, radiomics, and clinical data from 338 patients treated between November 2015 and January 2022. The combined model (Model MRC) demonstrated superior predictive performance with an AUC of 0.902. The dataset was last updated in May 2026.

Tags: Multimodal, Radiomics, Medical Prediction, Healthcare, Multimodal Modeling, Clinical Data, Hepatic Encephalopathy (+1 more)


Multimodal Clinical and Imaging Data for Predicting Hepatic Encephalopathy Post-TIPS

Lin-Feng Zhou's dataset supports a study developing a multimodal model to predict overt hepatic encephalopathy (OHE) within one year after a transjugular intrahepatic portosystemic shunt (TIPS) procedure. The data includes manual CT features, radiomics, and clinical data from 338 patients treated between November 2015 and January 2022. The combined model (Model MRC) achieved an area under the ROC curve of 0.902.

Tags: Multimodal, Medical Imaging, Predictive Modeling, Healthcare, Radiology, Clinical Data, Hepatic Encephalopathy (+1 more)


MODUS: Pixel-Aligned 15-Modality Dataset for Multimodal Training

MODUS is a large-scale dataset with pixel-aligned samples across 15 modalities. The dataset includes modalities covering appearance, geometry, structure, segmentation, detection, text, and learned features. It was created by epfl-vilab-modus and was last updated on July 5, 2026.

Tags: Multimodal, Multimodal Alignment, Computer Vision, Large Scale, Segmentation, Detection, Pixel Aligned (+1 more)


Last-Mile Mail Delivery Optimization with Multimodal Green Transport

A PDF supplementary file describes a decision-support framework for optimizing last-mile mail delivery in Australian regional areas. The study integrates mail demand and GIS data with an optimization engine to coordinate van, walking, and cycling routes. The system reportedly achieved reductions of up to 21.67% in delivery time and 11.36% in CO₂ emissions compared to van-only operations.

Tags: Tabular, Geospatial, Sustainability, Vehicle Routing, Optimization (+1 more)


Indo-CXR-VQA: Indonesian Visual Question Answering for Chest X-Rays

Indonesian-language Visual Question Answering dataset derived from VinDr-CXR radiologist annotations. It contains 15,991 question–answer–reason triples grounded in annotated findings. The dataset was created by Softcase and was last updated on 2026-07-09.

Tags: Multimodal, Medical Imaging, Chest X-Ray, Indonesian Language, Radiology, Visual Question Answering (+1 more)


SPHERE Challenge: Multimodal Sensor Data for Activity Recognition

The SPHERE Challenge dataset was created for a 2016 machine learning competition held in conjunction with ECML-PKDD. It provides multimodal sensor data intended for human activity recognition tasks. The data was authored by Niall Twomey and colleagues from the SPHERE research project.

Tags: Multimodal, Machine Learning Challenge, Activity Recognition, Multimodal Sensor, Human Activity (+1 more)


RESOURCE2SKILL: Executable Agent Skills from Multimodal Resources

A Microsoft dataset release for the RESOURCE2SKILL system, which distills human-created multimodal resources into reusable executable skills for software agents. The dataset was last updated on 2026-07-17 and includes structured skill entries for discovery and inspection. The project page, paper, and code are available via the provided links.

Tags: Multimodal, Software Agents, Skill Libraries, AI Agents, Multimodal Resources, Executable Skills (+1 more)


Small-Molecule Natural Product Data for Foundation Model Pretraining and Bioactivity Tasks

A collection of datasets for training and evaluating machine learning models on small-molecule natural products. The data, totaling 128.0 MB, was compiled by Zhenming Liu from multiple public databases including COCONUT, NPASS, LOTUS, and MIBiG. The collection was last updated on 2026-04-30.

Tags: Tabular, CSV, Foundation Model, Bioactivity, Natural Products, Small Molecules, Cheminformatics (+1 more)


MMGist: A Multimodal Benchmark with 7,262 Samples

MMGist is a curated multimodal evaluation benchmark built from 18 widely used vision-language benchmarks. It contains 7,262 samples spanning seven capability dimensions and is designed to make LVLM evaluation more efficient, visually grounded, discriminative, and reliable. The dataset was authored by Winston-Yuan and last updated on June 29, 2026.

Tags: Multimodal, AI Evaluation, Vision Language, Evaluation, Benchmark, Computer Vision, Multimodal Benchmark (+1 more)


Contextual Decoupling in Color Preference: Multimodal Evidence from Spatial Evaluation

A multimodal dataset from a three-stage study examining color preference stability in spatial contexts. The data includes baseline preferences for ten Munsell hues, preference and comfort ratings, and eye-tracking and pupillometric data from a simulated makerspace environment. It was authored by Hourong Yu, was last updated in May 2026, and is shared under a CC-BY-4.0 license on figshare.

Tags: Multimodal, Makerspace, Benchmark, Eye Tracking, Multimodal Research, Color Preference, Spatial Evaluation, Synthetic (+1 more)
