DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,938 datasets

Human-Centric Multimodal Animation Learning Data

A dataset focused on human-centric animation, likely containing multimodal data such as video, motion capture, or pose sequences. It is hosted on Kaggle, a platform for data science and machine learning projects. The specific data volume, collection method, and creator are not detailed in the available metadata.

Multimodal, Machine Learning, Multimodal Learning, Computer Vision, Human Centric Animation, +1

ChartNet-Bench: 3,807 Chart Images for Faithful Multimodal Understanding

ChartNet-Bench is a benchmark dataset containing 3,807 chart images for evaluating faithful multimodal chart understanding. It includes 2,000 synthetic charts and 1,807 real-world charts, all human-verified. The benchmark supports tasks like chart-to-CSV extraction, summarization, and hallucination detection.

Multimodal, Data Extraction, Hallucination Detection, Vision Language, Chart Understanding, Benchmark, Computer Vision, Multimodal Benchmark, Synthetic, +1

PianoVAM: A Multimodal Piano Performance Dataset

PianoVAM v1.1 is a multimodal dataset containing piano performances, including video, audio, and MIDI data. The dataset was initially released for ISMIR 2025 and is maintained by the PianoVAM organization. The current version includes corrections for video-MIDI synchronization issues.

Audio, Multimodal, Multimodal Data, MIDI, Audio Video, Piano Performance, +1

ASVspoof-WavLM-Aug-Model: Audio Spoofing Detection Model

A machine learning model for audio spoofing detection, likely based on the WavLM architecture. It is published on the Kaggle platform. Its size, creation date, and author are unknown.

Audio, Multimodal, Machine Learning Model, Speech Processing, Audio Spoofing Detection, Deepfake Audio, +1

WavLM-Base: A Pre-Trained Speech Foundation Model

WavLM-Base is a pre-trained model for speech processing tasks, published on the Kaggle platform. Its specific architecture and training data details are not provided in the minimal metadata. The dataset likely contains model weights or related artifacts for audio representation learning.

Audio, Foundation Model, Machine Learning, Audio Model, Speech Processing, +1

Bordair Multimodal: 62,063 Labeled Prompt Injection Samples for AI Security

Bordair Multimodal Prompt Injection Dataset contains 62,063 labeled samples for training and evaluating prompt injection detectors. The dataset, created by Bordair and last updated in April 2026, includes 38,304 attack and 23,759 benign samples covering cross-modal, multi-turn, and evasion attack types. All samples are source-attributed to peer-reviewed papers or documented industry research and are labeled with an expected_detection flag.

Multimodal, Prompt Injection, Multimodal AI, Adversarial Attacks, AI Security, Detection Training, +1

MindSemantix PolyCap: Image-Grounded Captions for Neuroscience

PolyCap is a dataset of image-grounded captions for the MindSemantix project. The dataset, created by author ziqiren, was last updated on HuggingFace on 2026-05-12. It contains caption files for subjects sub01, sub02, sub05, and sub07; the corresponding COCO captions are not included and must be obtained from the separate NSD dataset.

Multimodal, Computer Vision, Image Captioning, Neuroscience, COCO, +1

sMyBP-C M-domain Protein Characterization with Pathogenic Mutation Impact

Aishwarya Iyer published multimodal characterization data for the cardiac muscle protein sMyBP-C M-domain on figshare in May 2026. The dataset likely contains structural, functional, and dynamic measurements of the protein. It specifically examines the impact of a novel pathogenic mutation.

Multimodal, Multimodal Analysis, Protein Structure, Molecular Biology, Cardiac Muscle, Pathogenic Mutation, +1

MIMII Pump: Precomputed Acoustic Features for Machine Sound Analysis

MIMII Pump Precomputed Multimodal Features provides 1D and 2D NumPy array representations of an acoustic dataset focused on pump sounds. The dataset is hosted on Kaggle, but specific details about its size, origin, and update history are not provided in the available metadata. The precomputed format suggests it is derived from the original MIMII pump dataset for machine sound analysis.

Audio, Multimodal, Acoustic Data, Machine Sound, Multimodal Features, NumPy Format, +1

Geminipro3.2 Max Distill God Seed 25K: Synthetic Dataset for LLM Distillation

WithinUsAI's 'Geminipro3.2 Max Distill God Seed 25K' is a dataset of 25,000 examples engineered for distilling large language models. The dataset aims to imbue base models with capabilities described as deep scientific reasoning, long-context understanding, and thoughtful calibration. It was last updated on the Hugging Face platform on April 23, 2026.

Text, Model Alignment, Synthetic Data, LLM Distillation, +1

VLM_tinyclip: Vision-Language Model Dataset

Kaggle hosts a dataset titled VLM_tinyclip. The name suggests it relates to Vision-Language Models, specifically a smaller-scale implementation of the CLIP architecture. Its content likely contains paired image and text data for training or evaluating multimodal models. No further metadata is available.

Video, Multimodal, Vision Language Model, +1

RvR Data: Image Refinement via Regeneration

Refinement via Regeneration (RvR) reformulates image refinement in unified multimodal models from an editing-based paradigm to a regeneration-based one. The dataset likely contains images and associated data for training or evaluating this novel framework. It was created by researchers from Tsinghua University and Tencent Hunyuan and was last updated on April 29, 2026.

Multimodal, Computer Vision, AI Training, Multimodal Models, Image Refinement, +1

MMK12: Manually Collected Multimodal Math Reasoning Questions from the Real World

MMK12 is a manually collected multimodal mathematical reasoning dataset. All questions are sourced from the real world, and the dataset is designed to ensure answer authenticity. The dataset was created by FanqingM and was last updated on April 7, 2026.

Multimodal, K12 Education, Mathematics, Question Answering, Multimodal Reasoning, +1

OmniMedVQA-V2: Medical Visual Question Answering Benchmark Across 12 Modalities

OmniMedVQA-V2 is a large-scale medical visual question answering benchmark covering 12 imaging modalities and 5 clinical question types. The v2 release introduces 13 granular named configurations for modalities and question types, with train/test partitions following the Med-R1 standard. Images are sourced from the canonical foreverbeliever/OmniMedVQA release, with restricted-access images excluded.

Multimodal, Medical Imaging, Clinical Questions, Benchmark, Healthcare, Large Scale, Medical VQA, Multimodal Benchmark, +1

Trendyol Cybersecurity Instruction Tuning Dataset with 53,202 Examples

This dataset contains 53,202 instruction-tuning examples for AI assistants, curated by the Trendyol Security Team. It covers over 200 specialized cybersecurity domains, including cloud-native threats and AI/ML security. It was expanded from 21,000 to 53,000 rows and last updated on April 14, 2026.

Text, Cybersecurity, AI Assistant, Defensive Security, +1

Strawberry Disease Images with Environmental Parameters and Variety Data

A multimodal dataset for strawberry disease detection contains image data, environmental parameters, and variety information. It was authored by Qin2006 and last updated on 2026-04-19. The dataset is intended for studying correlations between environmental factors and disease occurrence.

Multimodal, Healthcare, Computer Vision, Agriculture, Strawberry Disease, Multimodal Fusion, Environmental Factors, +1

Agentic-MME: Benchmark for Multimodal Agent Tool-Use and Reasoning

Agentic-MME is a benchmark dataset featured in Hugging Face Daily Papers. It is designed to evaluate multimodal agents on tool use, web search, and multi-step reasoning from visual clues. The dataset was created by author Crystal1047 and last updated on 2026-04-11.

Multimodal, Tool Use, Benchmark, Reasoning, Web Search, Multimodal Agents, +1

Literature Gene References for Chronic Myelogenous Leukemia Therapy

Supplementary material from a 2026 computational study on targeted gene and drug therapy for chronic myelogenous leukemia. The dataset, authored by Margaret L. Lugin, provides a curated list of literature genes and their corresponding research papers. It is a small, focused collection supporting the analysis in the primary study.

Tabular, Excel, Computational Biology, Drug Targets, Leukemia Research, Gene Literature, +1

Sora100K: A Large-Scale Multimodal Video Dataset for AI Research

Sora100K is a large-scale multimodal video dataset submitted for the ACM MM 2026 Dataset Track. The dataset was created by ysicong and its record was last updated on April 9, 2026. Its specific size, structure, and content are detailed on its dedicated Hugging Face page.

Video, Multimodal, Sora, Video Generation, Large Scale, AI Generated Content, Multimodal Video, +1

FLARE26-MLLM-3D: Multimodal Model Training for 3D Medical CT Scans

FLARE 2026 aims to train a single multimodal model for medical report generation and visual question answering. The dataset contains two subsets for abdomen and lung CT scans, sourced from projects like AMOS and RATE. It was created by FLARE-MedFM and last updated in April 2026.

Multimodal, Medical Imaging, Report Generation, Multimodal LLM, Healthcare, Computer Vision, Vision QA, 3D CT Scans, +1
