DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,956 datasets

Multimodal & LLM

Qwen3.5-9B-VLM-Q4_K_M: A Quantized Multimodal AI Model

Qwen3.5-9B-VLM-Q4_K_M GGUF Model is a quantized version of a large language model with vision capabilities, published on Kaggle. The dataset likely contains the model weights and architecture files for deployment. Specific details on the model's training data, original authors, and last update date are not provided in the metadata.

Multimodal · GGUF · Multimodal AI · Computer Vision · Quantized Model · Large Language Model · +1

0 views

Multimodal & LLM

Qwen3.5-9B-VLM-Q4_K_M: A Quantized Multimodal Large Language Model

A GGUF format model file for the Qwen3.5-9B-VLM, a 9-billion parameter multimodal large language model. The dataset includes the quantized model and a projection file (mmproj), likely enabling vision-language tasks. It was published on Kaggle, but the author, organization, and last update date are unknown.

Multimodal · GGUF · Multimodal AI · Computer Vision · Quantized Model · Large Language Model · +1

0 views

Multimodal & LLM

DFlash_VLM: Vision-Language Model Dataset

A dataset for vision-language model tasks, published on Kaggle. The dataset's specific content, size, and creation details are not provided in the metadata. Further details require verification after download.

Multimodal · Vision Language Model · Multimodal AI · Computer Vision · Natural Language Processing · +1

0 views

Multimodal & LLM

FSVQA Training: Visual Question Answering Dataset

FSVQA_Training is a dataset for visual question answering tasks, likely containing paired images and textual questions. It is hosted on Kaggle, a platform for open data and machine learning competitions. The dataset's specific content, size, and origin are not detailed in the available metadata.

Multimodal · Training Data · Visual Question Answering · +1

0 views

Multimodal & LLM

sEMG+pFMG Multimodal Gesture Data

The sEMG+pFMG multimodal gesture data likely contains signals from surface electromyography (sEMG) and force myography (pFMG) sensors. The dataset is hosted on Kaggle, but its size, collection method, and origin are unknown. Users should verify the actual content and structure after download.

Multimodal · Biomedical Signals · Gesture Recognition · Multimodal Sensors · Human Activity · +1

0 views

Multimodal & LLM

Cervical and Ovarian Pathology Foundation Model Features

Pathology foundation model features likely extracted from cervical and ovarian tissue images. The dataset is hosted on Kaggle, but its specific scale, creation details, and update history are not provided in the metadata. Columns and sample data are unknown, requiring download for full content verification.

Multimodal · Foundation Model · Medical Imaging · Ovarian Cancer · Cervical Cancer · Pathology · +1

0 views

Multimodal & LLM

Real-Time Multimodal Sensor Fusion for Pilot Fatigue Monitoring

Describes a monitoring system for acute pilot fatigue based on low-overhead, real-time sensor fusion. The dataset is hosted on Kaggle and is categorized for research use. Details on data volume, collection period, and authorship are not provided in the metadata.

Multimodal · Pilot Fatigue · Research · Human Factors · Sensor Fusion · Real-time Monitoring · +1

0 views

Multimodal & LLM

Nemotron Cascade RL: 108,938 Prompts for Instruction-Following Reinforcement Learning

NVIDIA's Nemotron-Cascade-RL-IF-RL dataset contains 108,938 samples designed for Instruction-Following Reinforcement Learning (IF-RL). It includes prompts and associated metadata for improving language models' instruction-following capability and is ready for commercial use with attribution. It was last updated on December 16, 2025.

Text · Parquet · Library: polars · Training Data · Language: en · Prompt Engineering · Modality: text · Size Categories: 100K<n<1M · Library: mlcroissant · Library: datasets · Library: pandas · Language Model · Region: us · Reinforcement Learning · License: odc-by · Instruction Following · +1

0 views

Multimodal & LLM

LLaVA Dataset: Vision-Language Instruction-Following Data

LLaVA_dataset is a dataset hosted on Kaggle. The dataset's title suggests it is related to the LLaVA (Large Language-and-Vision Assistant) project, which typically involves multimodal data for training vision-language models. The dataset likely contains image-text pairs or instruction-following examples, but its specific content, size, and origin require verification after download.

Multimodal · Vision Language · Multimodal AI · LLM Training · +1

0 views

Multimodal & LLM

RadImageNet-VQA: 7.5 Million VQA Samples for CT and MRI Exams

RadImageNet-VQA contains 750,000 CT and MRI images paired with 7.5 million generated visual question answering samples and 750,000 medical captions. Developed by Raidium and updated in late 2025, the dataset is built upon expert-curated anatomical and pathological annotations from the RadImageNet corpus.

Parquet · Library: polars · Library: dask · Size Categories: 1M<n<10M · Language: en · Task Categories: visual question answering · Modality: text · Library: mlcroissant · Modality: image · Library: datasets · Region: us · License: apache-2.0 · Medical · +1

0 views

Multimodal & LLM

YouTube Comedy Slam Preference Annotations

YouTube Comedy Slam Preference Data contains human judgments on comedy content from the YouTube platform. The dataset is hosted by the UCI Machine Learning Repository and is tagged for multimodal and LLM applications. Specific details on volume, creators, and recency are not provided.

Tabular · Multimodal · Human Judgment · Comedy Preference · Social Media · Multimodal LLM · +1

0 views

Multimodal & LLM

Document Understanding Multimodal Dataset

Multimodal data for document understanding tasks, sourced from the UCI Machine Learning Repository. The dataset combines visual and textual information for analysis. Specific details on volume, creation date, and authors are not provided in the available metadata.

Multimodal · Document Understanding · Multimodal Data · Computer Vision · Text Recognition · +1

0 views

Multimodal & LLM

Nemo Instruction Following Chat Translate: Multilingual Text for LLM Training

Nemo Instruction Following Chat Translate is a text dataset published on Hugging Face by author pihull. The platform tags suggest it contains multilingual text formatted for instruction following and chat translation tasks, likely intended for large language model training. The dataset was last updated on February 11, 2026.

Text · OPTIMIZED-PARQUET · Parquet · Size Categories: 10K<n<100K · Library: polars · Chat Translation · Modality: text · Library: mlcroissant · Library: datasets · Library: pandas · Region: us · LLM Training · Multilingual Text · Instruction Following · Text Corpus · +1

0 views

Multimodal & LLM

VibraVerse: Geometry-Acoustics Alignment Dataset for Multimodal Learning

VibraVerse is a large-scale multimodal dataset designed to bridge 3D geometry, material physics, and acoustics. It explicitly encodes the causal chain from geometry to acoustic signals, unlike unconstrained audiovisual recordings. The dataset was created by technetium66 and was last updated on 2026-01-03.

Audio · Multimodal · Acoustics · Multimodal Learning · Large Scale · 3D Geometry · Material Physics · +1

0 views

Multimodal & LLM

Nemotron Cascade RLHF Training Prompts and Metadata

A collection of 45,882 prompt samples designed for Reinforcement Learning from Human Feedback training. Created by NVIDIA, this dataset supports language model alignment and was last updated in December 2025.

Text · RLHF · Prompt Engineering · Language Model Alignment · Reinforcement Learning · +1

0 views

Multimodal & LLM

Nemotron RLHF Training Prompts and Metadata

This Reinforcement Learning from Human Feedback (RLHF) training dataset comprises 45,882 samples. NVIDIA created it for language model alignment, and it was last updated in December 2025.

Text · RLHF · Prompt Engineering · Language Model Alignment · Reinforcement Learning · +1

0 views

Multimodal & LLM

Multimodal for Classifying Cognitive Load

The title suggests a multimodal dataset for cognitive load classification, but no details on size, features, origin, or creation date are available.

0 views

Multimodal & LLM

GroundCUA: UI Screenshots and Annotations for Computer Use Agents

ServiceNow's GroundCUA dataset provides real UI screenshots paired with structured annotations for building multimodal computer use agents. It covers 87 software platforms across productivity, browser, creative, communication, development, and system utility categories. The dataset was last updated on December 24, 2025.

Multimodal · Human Demonstration · Multimodal AI · Computer Use Agents · GUI Interaction · +1

0 views

Multimodal & LLM

VitaSet: 5,145 Vision-Tactile QA Pairs for Physical Property Reasoning

VitaSet is a multimodal dataset for physical property reasoning, combining RGB vision and tactile sensing. It contains 5,145 human-verified question-answer pairs across three tasks: hardness classification, material property description, and surface roughness classification. The dataset was created by Bupt-Joy and last updated on 2025-12-29.

Multimodal · IMAGEFOLDER · Size Categories: 10K<n<100K · Task Categories: question answering · Language: en · Task Categories: visual question answering · Material Properties · Library: mlcroissant · Modality: image · Library: datasets · Robotics · Computer Vision · Vision and Language · Region: us · Physical Reasoning · Tactile Sensing · License: mit · Visual Question Answering · Vision Tactile · +1

0 views

Multimodal & LLM

SenseNova-SI-800K: Multimodal Training Data for Spatial Intelligence

SenseNova-SI-800K is a dataset created by SenseNova to address deficiencies in spatial intelligence for multimodal foundation models. It is built upon established models like Qwen3-VL and InternVL3 and was last updated on December 23, 2025. The dataset is hosted on Hugging Face and is categorized as containing between 100K and 1M entries.

Multimodal · Parquet · Library: polars · Task Categories: question answering · Spatial Intelligence · Language: en · Task Categories: visual question answering · Modality: text · Size Categories: 100K<n<1M · Library: mlcroissant · Foundation Models · Multimodal AI · Library: datasets · Library: pandas · Region: us · arXiv: 2511.13719 · License: apache-2.0 · Visual Question Answering · +1

0 views

Page 56 of 98