DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,932 datasets

CanvasCraftSFT: Multimodal Tool-Use Trajectories for Image Creation and Editing

CanvasCraftSFT is a supervised fine-tuning subset of the CanvasCraft dataset introduced with the CanvasAgent research. It contains executable multimodal tool-use trajectories designed to teach agents to reason over user requests, call visual tools, and observe intermediate results for complex image tasks. The dataset was created by GML-FMGroup and was last updated on Hugging Face in May 2026.

Multimodal, Tool Use, Multimodal AI, Computer Vision, Supervised Fine Tuning, +1

ViMU: Benchmark for Video Metaphorical Understanding

ViMU is a benchmark for evaluating multimodal models on video metaphorical understanding. The code repository from the National University of Singapore contains evaluation scripts for four distinct tasks. The dataset page was last updated on 2026-05-16.

Multimodal, Machine Learning, Video Metaphor, AI Evaluation, Benchmark, Computer Vision, Natural Language Processing, Multimodal Benchmark, +1

HLE-MM-STEM: Multimodal STEM Benchmark for LLM Post-Training

TuringEnterprises created a multimodal STEM dataset designed to challenge state-of-the-art large language models. The dataset is described as high-value and empirically proven to push model capabilities beyond current limits. It was last updated on May 12, 2026.

Multimodal, Benchmark, LLM Evaluation, STEM, +1

SMODA: Multimodal Omics Data for Esophageal Cancer Classification

SMODA is a framework integrating multimodal omics data via heterogeneous transfer learning. The associated dataset likely contains molecular data used for disease classification and subtype discovery, as demonstrated on an esophageal cancer dataset. The framework was authored by Jinhui Zhao and last updated on April 10, 2026.

Multimodal, Excel, Multimodal Omics, Healthcare, Transfer Learning, Disease Classification, Esophageal Cancer, Subtype Discovery, +1

4am Finqna Filtered: 500 Financial Question-Answer Records

A refined JSONL version of the FinQNA dataset optimized for financial QA tasks. It contains 500 records extracted from a complex JSON source, each formatted as an independent JSON object for easy ingestion. The dataset was created by 3amthoughts and last updated on Hugging Face in May 2026.

Text, Financial QA, Question Answering, Finance, LLM Training, +1
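
Because each record in this dataset is described as an independent JSON object on its own line, ingestion can be as simple as parsing line by line. The following is a minimal sketch under that assumption; the helper name `read_jsonl` and the `question`/`answer` fields in the sample are hypothetical and are not taken from the dataset page.

```python
import io
import json


def read_jsonl(stream):
    """Yield one parsed object per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc


# Hypothetical two-record sample; the real field names may differ.
sample = io.StringIO(
    '{"question": "What was net income?", "answer": "12.5"}\n'
    '{"question": "What was revenue growth?", "answer": "3%"}\n'
)
records = list(read_jsonl(sample))
print(len(records))
```

For a real file, the same generator could be fed an open file handle instead of the in-memory sample, so records are parsed one at a time rather than loaded all at once.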

Case Report on Synchronous Lung and Thyroid Cancers with Multimodal Imaging

A medical case report PDF analyzing the management of recurrent thyroid cancer with concurrent pulmonary lesions. The document details a multimodal imaging approach using 18F-FDG PET/CT and 131I-NaI SPECT/CT to characterize metastatic disease. It was authored by Meng Yuan and last updated in April 2026.

Multimodal, Multimodal Analysis, Thyroid Cancer, Medical Imaging, Lung Cancer, Healthcare, Pathology, +1

CompaRAG Tool Votes: Human Preference Data for MCP Tool Comparison

Human preference votes collected on the CompaRAG blind comparison platform for Model Context Protocol (MCP) tools. Users submitted a task and goal, then voted for the best anonymous tool response. The data was compiled by ArthurSrz and last updated on May 16, 2026.

Tabular, Comparison Data, Human Preference, Model Context Protocol, LLM Evaluation, +1

TinyKGI: Knowledge Gap Annotations for Visual Question Answering

Knowledge Gap (KG) annotations developed for the paper 'Identifying Knowledge Gaps on the Edge for Visual Question Answering'. The dataset supports research on identifying plausible cognitive capabilities that an AI model may lack. It was created by Sarikaa-Sridhar and was last updated on May 31, 2026.

Multimodal, AI Evaluation, Knowledge Gap, Cognitive Capabilities, Visual Question Answering, +1

MultiTasks-v2: Twelve Multimodal Benchmarks in Image-QA Format

MultiTasks-v2 is a collection of twelve multimodal benchmark subsets normalized into a shared image-question-answer format. The dataset, created by LoserLi, provides a train split and a test split for each subset. It was last updated on 2026-05-28.

Multimodal, AI Evaluation, Vision Language, Benchmark, Question Answering, Computer Vision, Multimodal Benchmark, +1

Multimodal Remote Sensing Images for Cross-Modal Object Detection

A 53.3 MB collection of TIF and DOCX files, this dataset supports research on object detection with incomplete multimodal remote sensing data. It was contributed by author Hongjun Ma and last updated in April 2026. The data was used to validate a proposed cross-modal contrastive learning and knowledge distillation method.

Image, Geospatial, Multimodal, Contrastive Learning, Multimodal Learning, Computer Vision, Object Detection, Knowledge Distillation, +1

UFO Pursue Open Atlas: Declassified U.S. Government Documents with Images

161 records contain 4,153 pages of declassified U.S. Department of War documents on UFO/UAP phenomena, re-extracted into cleaned Markdown with inline image captions. The dataset includes per-page JPEG renders and interactive 3D atlas components, representing data derived from 80 years of declassified material. All data is released under a CC0 license by author alex-zhang42, with a version dated 2026-05-08.

Multimodal, Government Records, UFO UAP, Computer Vision, Declassified Documents, Historical Data, +1

Thai Astrology PDF OCR Text and Image Captions

OCR text and image descriptions extracted from Thai Astrology PDF documents using the Gemma 4 31B multimodal model. The dataset includes columns for source PDF name, page number, and original page image. The dataset was created by Phonsiri and last updated on May 25, 2026.

Multimodal, Image Captions, OCR Text, Computer Vision, Multimodal Extraction, Astrology, +1

Agent Attribution Practice: Knowledge Graph for AI Agent Accountability

A JSON-LD knowledge graph encoding the concept layer of the Agent Attribution Practice (AAP) research line. The dataset is a mirror of the graph.jsonld file from the AAP GitHub repository, provided for LLM training pipelines. It was authored by Shimo4228 and last updated on May 18, 2026.

Graph, Accountability, AI Agents, Research Line, Architecture Decision Records, +1

DisasterVQA: 1,395 Disaster Scene Images with Expert-Curated Questions

1,395 real-world disaster images and 4,405 expert-curated question–answer pairs covering floods, wildfires, and earthquakes. The dataset includes binary, multiple-choice, and open-ended questions for evaluating Vision-Language Models. It was created by QCRI and last updated in May 2026.

Multimodal, Multimodal AI, Benchmark, Computer Vision, Visual Question Answering, Disaster Response, +1

CASTER-Bench: Human-Annotated Multimodal Benchmark for Community Resonance

CASTER-Bench is a human-annotated multimodal benchmark for Community-Aware Assessment of Social Textual Engagement and Resonance (CASTER). It was introduced by IndexTeam in a paper for ACL 2026 and is hosted on Hugging Face. The benchmark evaluates whether User-Generated Content achieves positive community resonance, moving beyond traditional aesthetic-focused Video Quality Assessment.

Multimodal, User Generated Content, Benchmark, Human Annotated, Community Resonance, Social Media Analysis, Multimodal Benchmark, Synthetic, +1

KOLongDoc: Korean Long Document Benchmark for Multimodal and RAG Evaluation

A benchmark dataset for evaluating AI models on Korean long and complex documents, created by Markr-AI. It contains 136 'Long Document Problems' and 64 'Super Long Document Problems', as described on the dataset page. The dataset was last updated on 2026-05-30.

Text, Multimodal, Korean Language, RAG, Multimodal AI, Benchmark, Long Document, +1

Minecraft Crafting and Gameplay VQA Dataset in Russian and English

A bilingual, multimodal dataset designed for fine-tuning Vision-Language Models such as Qwen2.5-VL and Qwen3-VL. The dataset, created by KuroTo4ka, is structured by language locale and was last updated on 2026-05-19. It is intended to train AI models on in-game visual understanding tasks.

Multimodal, Crafting Recipes, Vision Language Models, Bilingual, Computer Vision, Minecraft, Multimodal VQA, +1

PRISM Gemini Distill: Multimodal Reasoning Data for Distribution Alignment

PRISM Gemini Distill is a self-distilled multimodal reasoning dataset collected from the Gemini 3 Flash model for the PRISM project. The dataset is intended to address distributional drift in the SFT to RLVR post-training pipeline by providing data for an intermediate Distribution Alignment stage. It was created by the prism-vlm organization and was last updated on Hugging Face in May 2026.

Multimodal, Distribution Alignment, AI Training, Multimodal Reasoning, LLM Distillation, +1

Causalphys: A Causal Reasoning VQA Dataset with 3,062 Questions

Causal-VL is a multimodal dataset for visual question answering focused on causal and physical reasoning. It contains 3,062 questions organized into 4 main categories and 16 subcategories. The dataset was created by author 'haorentang' and was last updated on May 22, 2026.

Multimodal, Multimodal AI, Causal Reasoning, Physics Reasoning, Visual Question Answering, +1

Northeast Recreational Fishing Economic Data from 1994

Revealed preference data on recreational angler behavior and trip valuation, collected by the National Oceanic and Atmospheric Administration. The dataset includes variables such as trip length, household income, and trip purpose, and is available in PDF and JSON formats. It was last updated on the platform in April 2026.

Tabular, JSON, Recreational Fishing, NOAA, Economic Modeling, Finance, Revealed Preference, +1
