DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,947 datasets

LLM4Pricing: E-Commerce Product Pricing Analysis Dataset

Contains 3,231 training and 809 test examples for large language model instruction tuning in e-commerce pricing scenarios. The dataset, created by jiayukk and last updated on 2026-02-25, provides target product names and reference market prices and requires models to analyze pricing logic.

Tags: Text, E-Commerce, LLM Training, Pricing Analysis, Instruction Tuning, +1 more

Human Preference Rankings for 60 Vibe-Coded Web Applications

60 web applications from the Vibe Coding Showcase were evaluated in a pairwise human preference study. The dataset contains 1,770 pairwise comparisons, with 30 human votes collected for each pair to judge visual design based on screenshots. It was created by datapointai and last updated in March 2026.

Tags: Multimodal, OPTIMIZED-PARQUET, Parquet, Size Categories: 1K<n<10K, Library: polars, Visual Design, Library: dask, Task Categories: Visual Question Answering, UI/UX, Pairwise Comparison, Vibe Coding, Web Applications, Modality: text, Human Preference, Library: mlcroissant, Modality: image, Library: datasets, Human Evaluation, License: CC BY 4.0, Task Categories: Image Classification, Region: US, Web Design, Design Preference, +1 more

Full Body Motion Capture Data with Finger Dexterity Tracking

This motion capture dataset records full-body and finger movements using 10 IMU sensors and a Phi9 Glove. It was created by phi-9 and last updated in March 2026. It is released for non-commercial research under a CC-BY-NC-4.0 license.

Tags: Multimodal, Parquet, Size Categories: 10K<n<100K, Library: polars, GMR, Rerun, IMU, Modality: text, Task Categories: Robotics, Modality: tabular, Library: mlcroissant, Library: datasets, Library: pandas, Robotics, Motion Capture, Modality: video, License: CC BY-NC 4.0, Region: US, Human Pose, Quaternion, Retargeting, General-Motion-Retargeting, Multimodal Sensors, +1 more

OSWorld File Cache for Multimodal Agent Evaluation

OSWorld File Cache provides reliable access to evaluation files for the OSWorld project. The repository, created by xlangai, hosts files previously stored on Google Drive to support scalable, real computer environment testing. It was last updated in February 2026.

Tags: Multimodal, Evaluation Files, Computer Environments, Benchmark, arXiv:2404.07972, Region: US, Multimodal Agents, License: Apache 2.0, Operating Systems, +1 more

Wind Speed and Direction Measurements for Southern Florida Coastal Systems

Wind data were collected at sites along Old Ingraham Highway near Flamingo, FL, and at C-111. The dataset includes date, time, wind speed, and wind direction, and is aimed at improving the treatment of wind forcing in hydrological models. It was collected by CEOS_EXTRA and last updated on 1997-12-31.

Tags: Time Series, Wind Direction, Wind Speed, Field Measurement, Hydrological Modeling, Coastal Systems, +1 more

Trento HSI-LiDAR: Multimodal Remote Sensing Data for Land Cover

This multimodal HSI-LiDAR dataset captures a combined rural and urban scene in Italy. The data is annotated for 6 distinct land cover classes, supporting classification tasks. The dataset's author, organization, and specific collection details are not provided in the available metadata.

Tags: Geospatial, Point Cloud, Multimodal, Multimodal Data, Land Cover, Hyperspectral Imagery, +1 more

ODA-Fin-RL-12K: 12,187 Verifiable Financial Reasoning Samples

OpenDataArena published ODA-Fin-RL-12K in March 2026, providing 12,187 hard-but-verifiable samples for reinforcement learning in the financial domain. The dataset focuses on complex reasoning tasks with concise answers optimized for automated reward modeling and distillation.

Tags: Parquet, Size Categories: 10K<n<100K, Task Categories: Text Generation, Library: polars, Language: zh, Task Categories: Question Answering, Language: en, Domain Adaptation, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, GRPO, Region: US, Reasoning, Reinforcement Learning, Finance, arXiv:2603.07223, License: Apache 2.0, +1 more

Path VQA Turkish Final: Visual Question Answering Dataset

Path VQA Turkish Final is a dataset hosted on Kaggle. The title suggests it contains visual question answering data in the Turkish language, likely pairing images with questions and answers. The dataset's specific scale, origin, and update history are not detailed in the provided metadata.

Tags: Multimodal, Turkish Language, Multimodal AI, Computer Vision, Visual Question Answering, +1 more

Synthetic Arduino Project Conversation Data for LLM Training

Contains synthetic conversation examples generated by a Java-based Arduino project suggestion system. The dataset, created by Cameron Jones, consists of structured multi-turn dialogues in which a user interacts with a bot. It has no affiliation with the Arduino brand.

Tags: Text, Microcontroller, LLM Training, Arduino, Synthetic Data, Synthetic, +1 more

Text 2 Video Human Preferences: 29,283 Motion Quality Labels

29,283 pairwise human preference labels comparing human motion quality across four frontier video generation models, released by Datapoint AI in February 2026. The dataset captures evaluations from 4,349 unique annotators focusing on three specific quality dimensions of AI-generated video.

Tags: OPTIMIZED-PARQUET, Parquet, Library: polars, Task Categories: Reinforcement Learning, Language: en, Size Categories: n<1K, Modality: text, Human Motion, Modality: tabular, Library: mlcroissant, Modality: image, Library: datasets, Library: pandas, Preference Data, License: CC BY 4.0, Video Generation, Human Preferences, Region: US, Task Categories: Text-to-Video, Task Categories: Video Classification, +1 more

ECG-Multimodal-Processed: Processed Electrocardiogram Data from CPSC 2018

Kaggle hosts this processed dataset derived from the China Physiological Signal Challenge 2018 (CPSC2018). The title indicates it contains electrocardiogram (ECG) data that has been processed and is multimodal in nature. The original CPSC2018 challenge focused on ECG signal classification and analysis.

Tags: Multimodal, Electrocardiogram, CPSC 2018, Medical Signals, +1 more

Image Caption Project: Paired Images and Text Descriptions

Image-caption-project is a dataset from Kaggle. Its title suggests it contains pairs of images and textual descriptions. The dataset's specific scale, origin, and update date are unknown.

Tags: Multimodal, Computer Vision, Image Captioning, +1 more

Animals10: 10,000 Animal Images with Captions

Kaggle hosts a dataset titled 'animals10-10k-image-caption-dataset'. The dataset likely contains 10,000 images of animals paired with descriptive text captions. Its specific source, creation date, and author are unknown from the provided metadata.

Tags: Multimodal, Computer Vision, Image Captioning, Animal Recognition, +1 more

VLMSafe-420: 420 Multimodal Counterfactual Pairs for VLM Safety Circuits

VLMSafe-420 consists of 420 multimodal counterfactual pairs across 38 safety categories, developed by ArthT and updated in March 2026. The data is designed for mechanistic interpretability research to identify and analyze safety circuits within Vision-Language Models.

Tags: Multimodal, OPTIMIZED-PARQUET, Parquet, Safety, Library: polars, Library: dask, Language: en, Task Categories: Visual Question Answering, Size Categories: n<1K, Modality: text, Library: mlcroissant, Modality: image, Library: datasets, Circuits, Compression, Region: US, VLM, Task Categories: Text Classification, Mechanistic Interpretability, License: MIT, +1 more

DeepVision-103K: 103,000 Verifiable Multimodal Math Reasoning Pairs

DeepVision-103K contains 103,000 multimodal records focused on verifiable mathematical reasoning, released by skylenage in February 2026. It utilizes image-text pairs to improve the efficiency of vision-language models in solving complex logic problems.

Tags: Multimodal, OPTIMIZED-PARQUET, Parquet, Task Categories: Image-Text-to-Text, Library: polars, Language: en, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Modality: image, arXiv:2507.18071, Library: datasets, Library: pandas, arXiv:2602.16742, Region: US, Reasoning, Reinforcement Learning, Math, License: MIT, +1 more

Multi-Turn Calendar Scheduling Conversations with Constraints

Nemotron RL Instruction Following Calendar V2 is a multi-turn conversation dataset for understanding natural language scheduling constraints and inferring conflicts. It contains events whose duration and timing constraints are mentioned in random order over the course of the conversation. The dataset was created by NVIDIA and last updated in March 2026.

Tags: JSON, Size Categories: 1K<n<10K, Library: polars, Modality: text, Library: mlcroissant, Library: datasets, Library: pandas, License: CC BY 4.0, Region: US, +1 more

Flickr8k Tamil Image Caption Dataset for Vision-Language Research

Flickr8k Tamil Image Caption Dataset provides Tamil language captions for images, intended for image captioning and vision-language research. The dataset's author, organization, size, and update history are not specified in the provided metadata. It is hosted on the Kaggle platform.

Tags: Image, Text, Vision Language, Computer Vision, Image Captioning, Multimodal Research, +1 more

MM-IMDb: Movie Posters and Synopses for Genre Classification

MM-IMDb combines visual and textual data for movies, likely sourced from IMDb. The dataset is designed for multi-label genre classification tasks. Its author, organization, and exact size are unknown.

Tags: Multimodal, Movie Posters, Text Synopsis, Movie Genre, +1 more

DGM4MultiModalDeepFake: A Multimodal Deepfake Detection Dataset

DGM4MultiModalDeepFake is a dataset hosted on Kaggle. The dataset's title suggests it contains multimodal data likely intended for deepfake detection research. The specific content, size, and origin are not detailed in the provided metadata.

Tags: Multimodal, Computer Vision, Deepfake Detection, AI Security, +1 more

Personalized Multimodal LLM Framework For Long-Term Agent Memory

PersonaVLM is a dataset supporting the development of personalized multimodal agents with long-term memory capabilities. The framework was created by ClareNie and the associated paper was accepted for CVPR 2026. The dataset is hosted on Hugging Face and was last updated in March 2026.

Tags: Multimodal, WEBDATASET, Size Categories: 10K<n<100K, Library: webdataset, AI Agent, Modality: text, Library: mlcroissant, Modality: image, Long Term Memory, Library: datasets, Benchmark, CVPR 2026, Region: US, Personality Evolving, CVPR, License: Apache 2.0, Personalized AI, Personalized MLLM, +1 more
