DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,936 datasets

SWE-Zero Openhands Trajectories: 318k Agent Trajectories for Software Engineering LLMs

This dataset contains 318,000 agent trajectories for instruction tuning of large language models on software engineering tasks. The trajectories were synthesized with the Qwen3-Coder-480B-A35B-Instruct model and collected via the OpenHands framework. NVIDIA authored the dataset, which was last updated on May 5, 2026.

Tags: Text, Software Engineering, LLM Fine Tuning, Agent Trajectories (+1 more)

Wireless Biosensor Data from 26 Subjects During Breathing Tasks

26 subjects performed breath-holding, paced-breathing, and mild hypercapnia tasks while wearing a low-cost multimodal wearable. Thien Nguyen collected this 53.3 MB dataset, which is stored in MAT files and was last updated in May 2026. The data is intended to support research on vital signs and tissue oxygen saturation monitoring.

Tags: Time Series, Multimodal, Wearable Sensors, Tissue Oxygenation, Physiological Monitoring, Breath Holding, Biosensor (+1 more)

Agent Knowledge Cycle: Knowledge Graph for AI Agent Behavior and Intent Alignment

This JSON-LD knowledge graph encodes the concept layer of the Agent Knowledge Cycle (AKC), a six-phase bidirectional growth loop for agent behavior and operator judgment. It mirrors the graph.jsonld file from the AKC GitHub repository and is provided for LLM training pipelines. It was uploaded by Shimo4228 and last updated on 2026-05-18.

Tags: Graph, AI Agents, Intent Alignment, LLM Training, Agent Knowledge Cycle (+1 more)

GuideDog: Egocentric Multimodal Dataset for Blind and Low-Vision Guidance

GuideDog is a real-world egocentric multimodal dataset for accessibility-aware guidance for blind and low-vision users. It contains 22,084 image-description pairs, including 2,106 human-verified gold and 19,978 VLM-generated silver annotations, collected from real walking videos across diverse cities. The dataset accompanies an ACL 2026 paper and includes derived multiple-choice subsets.

Tags: Multimodal, Computer Vision, Accessibility, Egocentric Vision, Assistive Technology, Synthetic (+1 more)

Multimodal Imaging Biomarkers for Myofascial Pain Syndrome

This research dataset, hosted on Harvard Dataverse and last updated 2026-05-26, supports a project aimed at improving myofascial pain management. The project, led by Siddhartha Sikdar, develops imaging biomarkers to distinguish healthy from diseased soft tissues such as muscle, connective tissue, nerves, and blood vessels. It compares tissue changes in individuals with myofascial pain to those in individuals without pain.

Tags: Multimodal, Medical Imaging, Soft Tissue, Biomarkers, Healthcare, Myofascial Pain (+1 more)

MIAO: Multimodal Image-Audio Onomatopoeia Dataset

MIAO is a multimodal dataset consisting of paired sound event clips and onomatopoeic images. It is designed to support research on multimodal correspondence between sounds and visual onomatopoeic expressions. The dataset was authored by KeisukeImoto and was last updated on 2026-05-19.

Tags: Audio, Multimodal, Sound Events, Audio Image Pairs, Multimodal Learning, Computer Vision, Onomatopoeia (+1 more)

VNCultureVQA: Visual Question Answering Dataset for Vietnamese Culture

VNCultureVQA is a Visual Question Answering dataset focused on Vietnamese culture, containing images with corresponding question–answer pairs. The dataset is divided into train and test sets based on difficulty levels. It was created by multimedia-synergy-lab and was last updated on 2026-05-07.

Tags: Multimodal, Multimodal AI, Computer Vision, Natural Language Processing, Visual Question Answering, Vietnamese Culture (+1 more)

PIN-200M: A Knowledge-Intensive Dataset of Paired and Interleaved Multimodal Documents

PIN-200M contains approximately 200 million samples of paired and interleaved multimodal documents and requires around 312 terabytes of storage. The dataset is a mini version of the PIN dataset introduced in a paper from June 2024. It was created by m-a-p and last updated on Hugging Face in April 2026.

Tags: Multimodal, Paired Documents, Knowledge Intensive, Interleaved Documents, Multimodal Documents (+1 more)

Svamp Rendered Vlm V1: Multimodal Vision-Language Data

Svamp Rendered Vlm V1 is a dataset published on Hugging Face by the author vlm-modality-research. The dataset was last updated on 2026-06-25. Its title suggests that it contains rendered scenes, likely intended for training or evaluating vision-language models.

Tags: Multimodal, Vision Language Models, Multimodal AI, Rendered Scenes, Svamp (+1 more)

Story Writing Dataset in ChatML Format for Instruction Tuning

PinkPixel's Story-Writing Dataset is a collection of creative writing stories based on the Writing Prompts ([WP]) format. The data is structured in ChatML format, making it suitable for instruction tuning of language models. The dataset was last updated on May 11, 2026.

Tags: Text, ChatML, Story Generation, Creative Writing (+1 more)

PCBA Standard-to-Real Challenge: Cross-Domain Visual Question Answering for Manufacturing

PCBA Standard-to-Real Challenge is the official dataset for the ACM Multimedia 2026 Grand Challenge. It focuses on cross-domain visual question answering for real-world manufacturing inspection. The dataset was created by author 'aimmifm' and was last updated on May 14, 2026.

Tags: Multimodal, Multimodal AI, Computer Vision, Manufacturing Inspection, Visual Question Answering (+1 more)

PKU-SafeRLHF-RLHF: 37,000 Reward Model Training Examples

AIPlans provides a dataset of 37,022 text examples formatted for reinforcement learning from human feedback (RLHF). The dataset, derived from PKU-Alignment/PKU-SafeRLHF, includes 33,334 training and 3,688 test examples. It was last updated on 2026-05-04.

Tags: Text, Alignment, Text Generation, Reinforcement Learning, Human Feedback, Reward Model (+1 more)

FLAIR-HUB: Large-scale Multimodal Land Cover and Crop Map of France

Over 2,500 km² of diverse French ecoclimates and landscapes are covered by this large-scale, multi-sensor land-cover resource. It features 63 billion hand-annotated pixels across 19 land-cover and 23 crop type classes, building upon the FLAIR#1 and FLAIR#2 datasets. The dataset was created by IGNF and was last updated on the platform in April 2026.

Tags: Geospatial, Multimodal, 🇫🇷 France, Satellite Imagery, Crop Mapping, Land Cover, Large Scale (+1 more)

Authorship Strategy: A Knowledge Graph for AI-Mediated Diffusion Research

This JSON-LD knowledge graph encodes the concept layer of the Authorship Strategy research line. The dataset mirrors a file from a GitHub repository and is provided for LLM training and AI research tools. It was created by Shimo4228 and last updated on 2026-05-18.

Tags: Graph, Normative Framework, AI Research, Benchmark, Authorship Strategy (+1 more)

SWE-Hero Trajectories: 34k Agent Trajectories for Software Engineering LLM Fine-Tuning

34,000 agent trajectories were synthesized using the Qwen3-Coder-480B-A35B-Instruct model for supervised fine-tuning of software engineering agents. This dataset, created by NVIDIA, was collected using the OpenHands framework and last updated on May 5, 2026. It is designed to advance the capabilities of large language models in software engineering tasks.

Tags: Text, Software Engineering, LLM Fine Tuning, Agent Trajectories (+1 more)

Xl Docbench

This benchmark comprises 1,519 questions for evaluating long-context, multimodal, and cross-document understanding. The dataset, created by 'anonymous12123' and last updated in May 2026, includes benchmark questions, answers, public source URLs for documents, and human-annotated evidence pages and snippets. A separate file contains 331 public document records.

Tags: Multimodal, Document Understanding, Benchmark, Question Answering, Long Context (+1 more)

PRISM-CoT-new: Expanded Supervised Fine-Tuning Corpus for Vision-Language Model Safety

PRISM-CoT-new is an expanded supervised fine-tuning corpus for the PRISM Vision-Language Model safety alignment framework. It supersedes the original PRISM-CoT dataset for SFT use cases and was created by andyc03, with contributions from sources like prism-cot-orig and holisafe-bedrock. The dataset was last updated on May 14, 2026.

Tags: Multimodal, Vision Language Model, Safety Alignment, Computer Vision, Natural Language Processing, Supervised Fine Tuning, Reasoning Corpus (+1 more)

Quasi-Experimental Study on Students' Speaking Performance

This dataset contains pretest and posttest speaking performance scores from a quasi-experimental study involving students. It is hosted on figshare and includes data collected to assess the impact of an instructional intervention on oral proficiency.

Tags: Pretest Posttest, Student Performance, Education, Quasi Experimental, Speaking Skills (+1 more)

CMAP-Fusion Ablation Study Results on ChestX-ray14 Extended Dataset

This dataset contains ablation study results for the CMAP-Fusion model on the ChestX-ray14 Extended Dataset. It likely contains metrics comparing the impact of the ViT-B/16, SmartTrim, and CMT modules on classification performance and efficiency. The dataset was authored by Chong Liu and last updated on April 24, 2026.

Tags: Tabular, Excel, Medical Imaging, Multimodal Fusion, Performance Metrics, Model Ablation (+1 more)

CMAP-Fusion Ablation Study Results for ISIC Skin Cancer Classification

This dataset contains ablation study results for CMAP-Fusion on the ISIC Skin Cancer datasets. It compares the impact of the ViT-B/16, SmartTrim, and CMT modules on classification accuracy, F1 score, AUC, Kappa, model parameters, FLOPs, feature sparsity, and cross-modal similarity. Chong Liu published the dataset on figshare in April 2026.

Tags: Tabular, Excel, Medical Imaging, Multimodal Learning, Computer Vision, Skin Cancer, Model Ablation (+1 more)
