DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,947 datasets

OptimusKG: A Modern Biomedical Multimodal Label Property Graph

OptimusKG is a modern biomedical multimodal Label Property Graph (LPG). The dataset was authored by Lucas Vittor and is hosted on the Harvard Dataverse platform, with a last recorded update on April 14, 2026.

Graph, Multimodal, Label Property Graph, Biomedical, +1

1 view

Zoo-Bus VQA: Synthetic Visual Question Answering for Spatial Reasoning

Zoo-Bus VQA is a synthetic visual question answering dataset built for spatial reasoning and object-centric grounding. It contains generated scenes with benches, stop signs, people, animals, and a clock object representing a bus. The dataset was created by aprilavrilivan and last updated on March 22, 2026.

Multimodal, Parquet, Size Categories: 10K<n<100K, Library: polars, Library: dask, Spatial Reasoning, Modality: text, Library: mlcroissant, Modality: image, Library: datasets, Benchmark, Computer Vision, Object Grounding, Region: us, Synthetic Data, Visual Question Answering, Synthetic, +1

0 views

Turkish Reasoning Dataset for Large Language Models

The Opus-4.6 Reasoning 3000x filtered dataset provides a Turkish translation of reasoning data for LLM training. It was created by Chan-Y to support instruction-following and alignment tasks in Turkish, and was last updated on March 22, 2026.

OPTIMIZED-PARQUET, Parquet, Size Categories: 1K<n<10K, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, Language: tr, Region: us, +1

0 views

MR-RATE: Voxel-Wise Brain MRI Segmentation Maps

MR-RATE-vista-seg contains voxel-wise multi-label brain segmentation maps predicted for center modality brain MRI volumes in native space. The dataset is part of the MR-RATE vision-language foundation model release by author Forithmus, with a last recorded update in March 2026. It is hosted on Hugging Face and includes platform tags for healthcare, radiology, and multimodal tasks.

Multimodal, Size Categories: 10K<n<100K, Huggingscience, Task Categories: question answering, Task Categories: image-to-text, MR-RATE, Language: en, Task Categories: visual question answering, License: CC BY-NC-SA 4.0, Medical Imaging, Task Categories: text-to-image, Vision Language, 3D Segmentation, Healthcare, Task Categories: image classification, Computer Vision, Brain MRI, Task Categories: zero-shot classification, Region: us, 3d-medical-imaging, Science, Radiology, Medical, +1

0 views

Cadastral-Grounded Geospatial Dataset for Fine-Grained Spatial Understanding

GroundSet is a large-scale Earth Observation dataset built on 20 cm resolution optical aerial orthophotos and legally verified cadastral vector data from the French national mapping agency (IGN). It is designed to advance fine-grained spatial understanding for multimodal models. The dataset was created by RogerFerrod and was last updated in March 2026.

Image, Geospatial, Multimodal, Text, Task Categories: text generation, Task Categories: object detection, Vision Language Model, Language: en, Task Categories: visual question answering, Vector Data, Spatial Understanding, Modality: text, Size Categories: 100K<n<1M, Task Categories: zero-shot image classification, Modality: image, Library: datasets, arXiv: 2603.14609, License: CC BY 4.0, Task Categories: image segmentation, Modality: geospatial, Earth Observation, Region: us, Large Scale, Cadastral Data, +1

0 views

VQATREC-Count-Anything: A Multimodal Dataset for Object Counting

A dataset titled 'vqatrec-count-anything' is hosted on Kaggle. The dataset's title suggests a focus on visual question answering and object counting tasks. Its specific content, size, and authorship are not detailed in the available metadata.

Multimodal, Multimodal AI, Computer Vision, Object Counting, +1

0 views

VQATRec-Assets-Small: Visual Question Answering and Text Recognition Assets

VQATRec-Assets-Small is a dataset hosted on Kaggle. Its title suggests it contains assets for visual question answering or text recognition tasks. The dataset's specific content, size, and structure are not detailed in the available metadata.

Multimodal, Assets, Computer Vision, VQA, +1

0 views

vqatrec-test-nogt: Visual Question Answering Test Set

vqatrec-test-nogt is a dataset hosted on Kaggle. Its title suggests a focus on visual question answering, likely containing test data for model evaluation. The dataset's specific content, size, and origin are not detailed in the available metadata.

Multimodal, Test, Recognition, VQA, +1

0 views

Benchmark-LGBioVLM: A Biomedical Vision-Language Model Benchmark

Benchmark-LGBioVLM appears to be a benchmark dataset for evaluating large language models on biomedical vision-language tasks. The dataset is hosted on Kaggle, but its specific contents, size, and creation details are not provided in the available metadata. Further details about the data volume, creators, and creation date require verification after download.

Multimodal, Vision Language, Evaluation, Benchmark, Large Language Model, Biomedical, +1

0 views

Anthropic HH RLHF: Human Preference Data for AI Alignment

Anthropic HH RLHF Preprocessed is a dataset published on Hugging Face by TheHassanSaud. The title suggests it contains preprocessed data from Anthropic's 'HH' (Helpful and Harmless) project, likely used for Reinforcement Learning from Human Feedback (RLHF). The dataset was last updated on April 24, 2026 at 18:40:45.

Text, RLHF, AI Safety, Text Generation, Preference Data, +1

0 views

VQA: Visual Question Answering Dataset

Kaggle hosts a dataset for Visual Question Answering (VQA), a multimodal task combining computer vision and natural language processing. The dataset likely contains images paired with questions and corresponding answer annotations. Published on Kaggle, its specific size, creator, and update date are unknown.

Multimodal, Computer Vision, Natural Language Processing, Visual Question Answering, +1

0 views

PersonaVLM Framework for Long-Term Personalized Multimodal Agents

PersonaVLM is a framework for transforming general-purpose multimodal large language models into personalized assistants. The work, authored by ClareNie, was accepted for presentation at CVPR 2026.

Multimodal, IMAGEFOLDER, Size Categories: 1K<n<10K, AI Agent, Library: mlcroissant, Modality: image, Long Term Memory, Library: datasets, Benchmark, CVPR 2026, Region: us, Personality Evolving, Personalized Multimodal LLM, License: Apache 2.0, Personalized MLLM, Multimodal Benchmark, +1

0 views

Assamese Tokenized Text Dataset for LLM Training

This dataset provides pre-tokenized `.bin` shards for efficient Assamese large language model training. It is hosted on Kaggle, but the author, organization, specific scale, and last update date are unknown.

Text, Tokenized Text, LLM Training, Assamese Language, Preprocessed Data, +1

0 views

Nemotron-RL-Super: Training Blends for Nemotron-3-Super-120B-A12B

NVIDIA released this collection of dataset blends in March 2026 to document the specific data mixtures used for Reinforcement Learning (RL) training of the Nemotron-3-Super-120B-A12B model. The data is organized into six distinct training stages, including Reinforcement Learning from Verifiable Rewards (RLVR), Software Engineering (SWE), and Reinforcement Learning from Human Feedback (RLHF).

License: CC BY 4.0, Region: us, +1

0 views

vlmDatacalibC: Vision-Language Model Calibration Data

Kaggle hosts this dataset titled 'vlmDatacalibC'. The dataset likely contains data for calibrating vision-language models. Its specific contents, size, and creation details are not provided in the available metadata.

Multimodal, Calibration, Vision Language Models, Multimodal Data, +1

0 views

vlmdataCalibCFull: Vision-Language Model Calibration Data

'vlmdataCalibCFull' is a dataset published on Kaggle. The name suggests it is related to calibration for vision-language models. No further details on size, origin, or specific content are available from the provided metadata.

Multimodal, Calibration, Vision Language Model, Computer Vision, +1

0 views

BD-HazardVLM-100-Pilot: A Vision-Language Benchmark for Hazard Detection

BD-HazardVLM-100-Pilot is a dataset hosted on Kaggle, likely designed to evaluate vision-language models on hazard recognition tasks. The dataset's specific content, size, and collection details are not provided in the available metadata. Its title suggests it may contain a pilot collection of 100 multimodal samples for benchmarking.

Multimodal, Vision Language Model, Hazard Detection, Multimodal Benchmark, +1

0 views

FusionX Multimodal Sample Data: Synchronized Vision and Tactile Glove Data

This multimodal dataset, hosted on Hugging Face, provides synchronized vision and tactile glove sensor data across distinct tasks. It includes RGB video at 30 Hz and 720p resolution, lossless 16-bit depth streams, monochrome camera views, and per-frame aligned tactile data in Parquet format. It was created by touchtronix and last updated on March 16, 2026.

Multimodal, IMAGEFOLDER, Size Categories: 1K<n<10K, Vision, Touch Tronix, Library: mlcroissant, Modality: image, Library: datasets, License: CC BY 4.0, Robotics, Computer Vision, Region: us, Sensor Data, Tactile Sensing, Fusion X, +1

0 views

DECO-50: 50 Hours of Bimanual Dexterous Robot Manipulation with Tactile Sensing

DECO-50 comprises over 5 million frames of teleoperated data for bimanual dexterous manipulation with tactile sensing. The dataset includes 50 hours of data collected on real dual-arm robots across 4 scenarios and 28 subtasks. It was created by BAAI-Humanoid and was last updated on Hugging Face in February 2026.

Multimodal, Bimanual Manipulation, Multimodal AI, Robotics, Teleoperation, Large Scale, Tactile Sensing, +1

0 views

Protein Function Training Data with GO Annotations and Interactions

This dataset is the training corpus for GO-GPT, an autoregressive transformer model for Gene Ontology term prediction. It contains proteins annotated with GO terms, InterPro domains, STRING protein-protein interactions, and metadata sourced from UniProt.

OPTIMIZED-PARQUET, Parquet, Library: polars, Language: en, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Biology, Library: datasets, Library: pandas, Bioinformatics, Region: us, License: Apache 2.0, Protein, Gene Ontology, +1

0 views
