DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,936 datasets

Quasi-Experimental Study on Students' Speaking Performance

This dataset contains pretest and posttest speaking performance scores from a quasi-experimental study involving students. It is hosted on figshare and includes data collected to assess the impact of an instructional intervention on oral proficiency.

Pretest Posttest, Student Performance, Education, Quasi Experimental, Speaking Skills, +1 more

CMAP-Fusion Ablation Study Results on ChestX-ray14 Extended Dataset

Ablation study results for the CMAP-Fusion model on the ChestX-ray14 Extended Dataset. The data likely contains metrics comparing the impact of ViT-B/16, SmartTrim, and CMT modules on classification performance and efficiency. The dataset was authored by Chong Liu and last updated on April 24, 2026.

Tabular, Excel, Medical Imaging, Multimodal Fusion, Performance Metrics, Model Ablation, +1 more

CMAP-Fusion Ablation Study Results for ISIC Skin Cancer Classification

Ablation study results for CMAP-Fusion on the ISIC Skin Cancer datasets. The dataset compares the impact of ViT-B/16, SmartTrim, and CMT modules on classification accuracy, F1 Score, AUC, Kappa, model parameters, FLOPs, feature sparsity, and cross-modal similarity. Chong Liu published the dataset on figshare in April 2026.

Tabular, Excel, Medical Imaging, Multimodal Learning, Computer Vision, Skin Cancer, Model Ablation, +1 more

Ablation study results for CMAP-Fusion on the COVID-19 Radiography datasets: Comparison

Ablation study results for CMAP-Fusion on the COVID-19 Radiography datasets compare the impact of the ViT-B/16, SmartTrim, and CMT modules on classification accuracy, F1 Score, AUC, Kappa, model parameters, FLOPs, feature sparsity, and cross-modal similarity. The 5.5 KB Excel file was authored by Chong Liu and shared under a CC-BY-4.0 license on figshare in April 2026.

Tabular, Excel, Vision Transformers, Medical Imaging, Model Evaluation, COVID-19, Ablation Study, +1 more

MITW-KYM: A Validated Multimodal Meme Interpretation Dataset

A dataset of 105 selected meme images validated for complex interpretation. Each item was selected by a human researcher and validated using two frontier multimodal LLMs. The dataset focuses on cases where meaning emerges through image-text interaction, pragmatic inference, cultural context, ambiguity, incongruity, or potential false-positive moderation risk.

Multimodal, Cultural Context, Meme Interpretation, Computer Vision, Multimodal Memes, LLM Validation, +1 more

TextSculptor Data: Image-Text Pairs for Scene Text Editing

TextSculptor Data contains two Parquet subsets for scene text editing tasks. The subsets include columns for text captions or prompts paired with images stored as embedded bytes. The dataset is associated with a research project and was last updated on 2026-05-21.

Multimodal, Scene Text Editing, Image Text, Multimodal Training, Computer Vision, +1 more

LLaVA-OneVision-2-Data: Training Corpus for Multimodal Video and Spatial Reasoning

Training data for the LLaVA-OneVision-2 family of multimodal models, covering large-scale video and spatial reasoning corpora used in mid-training. The dataset includes subsets like 'mid_training_video/60s_rest/' with 10,809 shards of approximately 60-second video clips and JSONL files containing captions for 30-second and 60-second clips. It was created by mvp-lab and last updated on May 6, 2026.

Video, Multimodal, Vision Language, LLM Training Data, Multimodal Training, Video Captioning, Large Scale, +1 more

4DThinker: Training Data for Dynamic Latent Mental Imagery in VLMs

Training data for the 4DThinker framework, which enables Vision Language Models to 'think with 4D' through dynamic latent mental imagery. The dataset includes approximately 38,000 samples for DIFT training and 37,000 samples for 4DRL training, built upon SpatialVID and DSR_Suite-Data. It was authored by jankin123 and last updated on May 11, 2026.

Multimodal, Training Data, Vision Language Models, Multimodal Learning, Dynamic Imagery, Video Frames, +1 more

MOTOR: A Multimodal Dataset for Two-Wheeler Rider Behavior

A multimodal dataset for understanding two-wheeler rider behavior, addressing a research gap in road safety. The dataset was created by varunpaturkar and presented at ICRA 2026. It was last updated on 2026-05-21.

Multimodal, Behavior Analysis, Road Safety, Global South, +1 more

Studio Ghibli Character Images with BLIP2-Generated Captions

810 images of Studio Ghibli characters were collected from the official free-to-use gallery. The dataset includes custom captions generated for each image using the BLIP2 model. Author 9r4n4y uploaded it to Hugging Face, with a last recorded update in April 2026.

Image, Multimodal, Studio Ghibli, Anime, Image Captioning, +1 more

MoM-Augmented Dataset for Cyclic Olefin Copolymerization Performance Prediction

A multimodal dataset of 2,700 entries for predicting cyclic olefin copolymerization performance. The data and trained models were published by 俊杰姜 on figshare in April 2026. The repository includes files in CSV, PKL, and H5 formats totaling 233.0 MB.

Multimodal, CSV, HDF5, Polymer Chemistry, Machine Learning, Multimodal Data, Copolymerization, Materials Science, +1 more

FAVOR-Bench: Fine-Grained Video Motion Understanding Benchmark

FAVOR-Bench is a benchmark for fine-grained video motion understanding, accepted at NeurIPS 2025. It spans both ego-centric and third-person perspectives and includes evaluation for close-ended QA and open-ended descriptive tasks. The dataset was released by the FAVOR-Bench organization in March 2025.

Video, Multimodal, Benchmark, Video Understanding, Fine Grained Motion, Multimodal Evaluation, +1 more

Supplemental Information: Video Captions

A 29.1 KB PDF file containing video captions, published by Caleb Anderson on figshare in May 2026. The dataset's specific content and scope are not detailed in the available metadata.

Multimodal, Multimodal Data, Supplementary Material, Video Captions, +1 more

SeePhys Pro: Benchmark for Diagnosing Modality Transfer in Physics Reasoning

SeePhys Pro is a benchmark from a paper authored by Kun-Xiang, designed to diagnose modality transfer in multimodal physics reasoning. It evaluates the same underlying physics concepts across progressively more visual representations, making it useful for measuring whether a model grounds its reasoning in diagrams and images rather than text priors. The dataset was last updated on May 13, 2026.

Multimodal, Modality Transfer, AI Benchmark, Benchmark, Multimodal Reasoning, Physics Reasoning, Visual Grounding, +1 more

Children's Story Writing Dataset for Instruction Tuning

Creative short stories written for children help models learn child-friendly language and narrative instruction-following. The dataset is structured in ChatML format, making it suitable for instruction tuning. Authored by PinkPixel, it was last updated on May 11, 2026.

Text, Childrens Stories, Natural Language Processing, Creative Writing, +1 more

Chronic Musculoskeletal Pain Treatment Success Predictors for 2,204 Patients

A total of 2,204 individuals with chronic musculoskeletal pain underwent a 10-week interdisciplinary multimodal pain treatment, with success rates ranging from 28% to 52% across four different outcome measures. Michel GCAM Mertens externally validated and updated four prediction models using 63 demographic and patient-reported candidate predictors. The updated models, last shared in March 2026, demonstrated strong calibration and acceptable discrimination, with 'treatment control' emerging as the most consistent predictor across outcomes.

Tabular, Chronic Pain, Patient Reported Outcomes, Multimodal Treatment, Clinical Validation, Healthcare, Disability, Quality of Life, Prediction, Interdisciplinary Multimodal Pain Treatment, Chronic Musculoskeletal Pain, Treatment Prediction, Patients Perspective Recovery, +1 more

Med-HallMark: Medical Multimodal Hallucination Benchmark

Med-HallMark is a benchmark dataset containing 750 image-question pairs for evaluating hallucinations in medical vision-language models. It includes three task types: conventional hallucination detection (499 pairs), counterfactual prompt-induced hallucination (111 pairs), and confidence weakening hallucination (140 pairs). The dataset was created by MM-Hallu and last updated on April 30, 2026.

Multimodal, Hallucination Benchmark, Medical Imaging, Benchmark, Healthcare, Computer Vision, Medical VQA, Multimodal Evaluation, +1 more

Chinese-English Code-Mixed Speech with Prosodic Annotations

A 3-hour collection of naturalistic Chinese-English code-mixed speech sourced from social media videos. The dataset includes dual-level annotations, featuring manual token-level labeling for prosodic analysis at switch boundaries. It was created by hafsamenaz1 and last updated on Hugging Face in April 2026.

Audio, Multimodal, Chinese English, Code Mixing, Multimodal Speech, Speech Prosody, Phonology, +1 more

NWA 8171 Meteorite: Pyrite Trace Element Analysis

Data analysis files from a study of the martian meteorite Northwest Africa 8171. The work was conducted by researchers at the University of Toronto Department of Earth Sciences and the Pacific Northwest National Laboratory's Environmental Molecular Sciences Laboratory. The dataset was last updated on 2026-05-30.

Multimodal, Multimodal Analysis, Martian Regolith, Meteorite Analysis, Trace elements, Geochemistry, +1 more

ShapeCodeBench Eval V1: 150 Synthetic Images for Multimodal Program Reconstruction

150 grayscale 512x512 PNG images form a frozen evaluation split for ShapeCodeBench. This synthetic benchmark tests whether multimodal models can reconstruct executable drawing programs from rendered shape images, with 50 easy, 50 medium, and 50 hard examples. The dataset was created by shivamk3r and last updated on Hugging Face in May 2026.

Multimodal, Program Synthesis, Benchmark, Computer Vision, Synthetic Data, Multimodal Benchmark, Synthetic, +1 more
