DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,947 datasets

LLM Fine-Tuning and Quantization Reference Dataset for Developers

AYI-NEDJIMI's dataset covers the open-source large language model value chain from fine-tuning to production deployment. The description suggests it serves as a technical reference for mastering techniques like LoRA, QLoRA, DPO, RLHF, GPTQ, GGUF, and AWQ. It was last updated on February 13, 2026; its specific content and scale require inspection via the linked Hugging Face page.

Text, Machine Learning, Open Source Models, Model Quantization, LLM Fine Tuning

Nanbeige4 3B Blindspots: 11 Examples of Model Weaknesses

Willy08 created this dataset on February 21, 2026. It contains 11 carefully selected examples of blind spots discovered while experimenting with the Nanbeige/Nanbeige4-3B-Base model. The examples are deliberately diverse and target real weaknesses that even frontier models showed in 2026.

Text, JSON, Model Blindspots, Library: polars, AI Safety, Size Categories: n<1K, Modality: text, Library: mlcroissant, Benchmarking, Library: datasets, Library: pandas, LLM Evaluation, Region: us, Large Scale

MMSI-Video-Bench: A Benchmark for Video-Based Spatial Intelligence

MMSI-Video-Bench is a holistic benchmark for evaluating spatial intelligence in video-based multimodal models. The dataset, created by author 'rbler', includes video clips and was last updated on February 10, 2026. It is hosted on Hugging Face and has been integrated into the VLMEvalKit framework.

Video, Multimodal, Size Categories: 1K<n<10K, Task Categories: multiple-choice, Spatial Intelligence, Language: en, Task Categories: visual-question-answering, arXiv: 2512.10863, Task Categories: video-text-to-text, Benchmark, Video Benchmark, Multiple Choice, Region: us, License: cc, Visual Question Answering, Multimodal Evaluation

MMB: Counterfactual Visual Question Answering Images and Questions

A counterfactual VQA dataset constructed using CLEVR Blender assets to procedurally generate both negative and normal counterfactual images and questions. The dataset was created by author 'scholo' for the Multimodal Benchmark paper and was last updated on Hugging Face in February 2026. It contains original images, counterfactual variants, and corresponding questions.

Multimodal, Benchmark, Computer Vision, Counterfactual Reasoning, Synthetic Data, Multimodal Benchmark, Visual Question Answering

M2DGR: Multi-Modal and Multi-Scenario Ground Robot SLAM Data

SJTU-ViSYS developed M2DGR, a multi-modal and multi-scenario dataset for ground robot navigation, published in RA-L 2021 and ICRA 2022. It provides synchronized sensor data across diverse environments to support Simultaneous Localization and Mapping (SLAM) research.

Robotics

RS-VLM-Checkpoints: Vision-Language Model Weights

A collection of model checkpoints for a vision-language model, published on Kaggle. The specific architecture, training data, and performance metrics are not detailed in the available metadata. The author, organization, and last update date are unknown.

Multimodal, Machine Learning, Vision Language Models, Artificial Intelligence, Checkpoints

Theory of Space: Pre-Rendered 3D Multi-Room Environments for VLM Benchmarking

Pre-rendered 3D multi-room environments support the Theory of Space benchmark for evaluating spatial reasoning in Vision Language Models. The dataset is designed to test whether foundation models can construct spatial beliefs through active exploration. It was created by MLL-Lab and last updated on February 11, 2026.

Multimodal, Active Exploration, Spatial Reasoning, Benchmark Data, Vision Language Models, Benchmark, Computer Vision

Nexus-HH: RLHF-Enriched Preference Data for Language Models

A dataset titled 'nexus-hh-rlhf-enriched' published on Kaggle. The title suggests it contains data enriched for Reinforcement Learning from Human Feedback (RLHF), likely involving human preferences for language model outputs. Specific details on size, origin, and creation date are unavailable from the provided metadata.

Text, Tabular, Preference Data, Language Model, Reinforcement Learning, Human Feedback

Power Grid Worker Safety Behavior and Risk Data

Kaggle hosts this dataset on power-grid worker safety behavior. The raw description indicates it contains multimodal data related to risk and standard operating procedure (SOP) operations. The dataset's author, organization, and specific scale are unknown.

Multimodal, Power Grid, Risk Assessment, SOP Operations, Worker Safety

Multimodal Phishing Dataset

A dataset likely containing multiple data types related to phishing attacks. The dataset is published on Kaggle, but its specific contents, size, and creation details are not described. Further verification after download is required to confirm its scope and utility.

Multimodal, Machine Learning, Cybersecurity, Phishing

RadImgNet-VQA: Radiology Image Visual Question Answering Dataset

RadImgNet-VQA is a dataset hosted on Kaggle, likely designed for visual question answering tasks in the medical domain. The title suggests it contains pairs of radiology images and associated questions, potentially for training AI models to interpret medical scans. Its specific size, source, and creation date are not provided in the available metadata.

Multimodal, Medical Imaging, Multimodal AI, Radiology, Visual Question Answering

VoMP: Volumetric Mechanical Properties of 3D Assets

NVIDIA's PhysicalAI dataset provides pre-processed 3D assets for predicting volumetric mechanical properties. The dataset combines four individual 3D asset collections, processed to include multi-view renders, voxelized representations, and LLM-annotated material descriptions. It was last updated on February 5, 2026.

Point Cloud, Multimodal, 3D Assets, Material Properties, Voxel, Robotics, Region: us

STRIDE-QA: Visual Question Answering for Autonomous Driving in Tokyo

STRIDE-QA is a large-scale visual question answering dataset for physically grounded spatiotemporal reasoning, built from driving data collected in Tokyo. It contains 16 million question-answer pairs over 270,000 frames, constructed from 100 hours of multi-sensor driving data. The dataset was created by turing-motors and last updated on the platform in January 2026.

Multimodal, Spatiotemporal Reasoning, Computer Vision, Multi Sensor Data, Large Scale, Autonomous Driving, Visual Question Answering

Multimodal Diet Dataset

Multimodal_Diet_Dataset is a dataset hosted on Kaggle. Its title suggests it contains data related to diet and nutrition, potentially combining multiple data types. Further details regarding its size, origin, and specific contents are unavailable from the provided metadata.

Multimodal, Multimodal Data, Health Tracking, Diet Nutrition

World Knowledge Benchmark for Multimodal Language Models

WorldVQA is a benchmark dataset created by MoonshotAI to evaluate atomic vision-centric world knowledge in Multimodal Large Language Models (MLLMs). It was last updated in February 2026. The dataset decouples visual knowledge retrieval from reasoning to provide a strict measurement of a model's fundamental world knowledge.

Multimodal, AI Evaluation, World Knowledge, Benchmark, Vision Language Evaluation, Computer Vision, Multimodal LLM Benchmark

Blip3O 256: A Multimodal AI Benchmark Dataset

Blip3O 256 is a dataset authored by diffusion-bench and hosted on Hugging Face. The dataset was last updated on March 25, 2026. Its specific content and scale are not detailed in the available metadata.

Multimodal, Benchmark Data, Vision Language, Multimodal AI, Diffusion Models

McNdroid: A Longitudinal Multimodal Benchmark for Android Malware Drift Detection

A longitudinal and multimodal benchmark for robust drift detection in Android malware. The dataset is hosted on Kaggle, but specific details on its size, creation date, and authorship are not provided in the available metadata. Its primary purpose is to serve as a testbed for evaluating the robustness of machine learning models against concept drift in the malware domain.

Multimodal, Multimodal Data, Android Malware, Benchmark, Longitudinal Benchmark, Drift Detection

Testing-Multimodal: Data for Multimodal Model Evaluation

Testing-multimodal is a dataset published on Kaggle. The title suggests it is intended for evaluating machine learning models that process multiple data types. The dataset's specific content, size, and origin are not detailed in the available metadata.

Multimodal, Machine Learning, Testing

SpaVis-6M: Spatially-Aware Multimodal Data for Computational Pathology

SpaVis-6M is a multimodal dataset for computational pathology, integrating visual and molecular data. It was created by minghaofdu and is associated with research presented at ICLR 2026. The dataset page was last updated on February 12, 2026.

Multimodal, Medical Imaging, Multimodal Learning, Computational Pathology, Computer Vision, Genomics

MicroLens VQA: Microscopy Vision-Language Dataset

The dataset provides 122,000 vision-question-answer pairs across more than 145 microscopy genera. It likely contains images paired with textual questions and answers for visual question answering tasks. It is published on Kaggle.

Multimodal, Medical Imaging, Vision Language, Computer Vision, Microscopy, VQA
