DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

ProductSearch Datasets Browse Topics Rankings Community API / MCP

ResourcesDocumentation Blog Changelog Status

LegalPrivacy Policy Terms of Service Cookie Policy

Multimodal & LLM Datasets | DataSalon

All Categories

🔗

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,947 datasets

Multimodal & LLM

KOL Decisions: Web3 Community Manager Instruction Tuning Dataset

KOL Decision-Making Dataset for Web3 Community Managers is designed for instruction tuning. The dataset likely contains examples of decisions or actions taken by Key Opinion Leaders in Web3 communities. Its origin and scale are unspecified, as the description metadata is limited.

TabularCommunity ManagementWeb3Decision MakingInstruction Tuning+1

0 views

Multimodal & LLM

Double-Delta Multi-Fidelity Aerodynamics Dataset

Double-Delta Multi-Fidelity Aerodynamics Dataset is an open-source benchmark for a parametric family of double-delta wings. It contains paired low-fidelity Vortex Lattice Method and high-fidelity simulation data. The dataset was created by yirens and last updated on 2026-03-22.

TabularComputational Fluid DynamicsBenchmarkUncertainty QuantificationAerodynamicsMulti FidelitySurrogate Modeling+1

0 views

Multimodal & LLM

Kannada Image Captioning Dataset for Multimodal AI Training

A dataset for image captioning tasks, likely containing images paired with descriptive text in the Kannada language. It is hosted on the Kaggle platform, but details about its size, creation date, and authorship are not provided in the available metadata. The dataset's content and structure require verification after download.

MultimodalMultimodal AiComputer VisionImage CaptioningKannada Language+1

0 views

Multimodal & LLM

StreamVLM-Checkpoint: Vision-Language Model Weights

Kaggle hosts a dataset titled 'streamvlm-checkpoint'. The dataset likely contains model weights or parameters for a vision-language model. No information is available regarding its author, organization, size, or last update date.

MultimodalVision Language ModelMultimodal AiCheckpoint+1

0 views

Multimodal & LLM

Verified Global Business Entities for AI Agent Supply Chains

NOO-Verified-Global-Entities provides a data infrastructure layer to prevent AI agents from hallucinating non-existent or unsuitable B2B suppliers. The dataset was created by Nooxus-AI and was last updated in March 2026. It is designed as a definitive verification source for global commercial entities.

TabularParquetSize Categories1 Kn10 KTask Categoriestext GenerationAnti HallucinationB2bLibrarypolarsLanguagezhTask Categoriesquestion AnsweringLanguageenRagModalitytextEntity VerificationLibrarymlcroissantLibrarydatasetsLibrarypandasEntity ResolutionRegionusAgentSupply ChainTask Categoriestabular ClassificationLicensemitB2b Data+1

0 views

Multimodal & LLM

Human Preference Annotations for Image-to-Video Generation

DatapointAI released a 1,000-row dataset in March 2026 for evaluating image-to-video generation models. Each row contains a reference image, two generated videos from Pika and CogVideoX models, and 10 aggregated human preference annotations. The dataset provides a total of 10,000 individual human judgments on video quality.

MultimodalOPTIMIZED-PARQUETParquetSize Categories1 Kn10 KLibrarypolarsRlhfLibrarydaskLanguageenModalitytextTask Categoriesimage To VideoLibrarymlcroissantModalityimageLibrarydatasetsLicensecc By 40Video GenerationComputer VisionI2vHuman PreferencesQuality EvaluationRegionusImage To VideoDpoTask Categoriesvideo ClassificationSynthetic+1

0 views

Multimodal & LLM

Multimodal Dance Motion Analysis Dataset for Pose and Movement

A dataset focused on dance movement and pose analysis, published on Kaggle. The dataset likely contains motion capture or video data paired with pose annotations. Specific details on size, collection method, and temporal coverage are not provided in the available metadata.

MultimodalMultimodal DataDance AnalysisMotion CapturePose Estimation+1

0 views

Multimodal & LLM

Human Preference Annotations for Image-to-Video Generation

3,000 rows of human preference data for evaluating image-to-video generation. Each row contains a reference image, two generated videos from Pika and CogVideoX models, and 10 aggregated human annotations. The dataset was created by datapointai and last updated in March 2026.

MultimodalOPTIMIZED-PARQUETParquetSize Categories1 Kn10 KLibrarypolarsRlhfLibrarydaskLanguageenModalitytextTask Categoriesimage To VideoLibrarymlcroissantModalityimageLibrarydatasetsLicensecc By 40Video GenerationComputer VisionI2vHuman PreferencesRegionusImage To VideoDpoTask Categoriesvideo ClassificationSynthetic+1

0 views

Multimodal & LLM

Visual Question Answering Pairs for Fine-Grained Multimodal Perception

ZwZ-RL-VQA is a dataset containing 74,000 high-quality visual question-answering pairs generated via Region-to-Image Distillation. The dataset was created by inclusionAI for training multimodal large language models on fine-grained perception tasks and was last updated in March 2026.

MultimodalParquetSize Categories10 Kn100 KLibrarypolarsArxiv260211858Vision Language ModelLanguageenModalitytextLibrarymlcroissantLibrarydatasetsLibrarypandasFine Grained PerceptionMultimodal TrainingComputer VisionRegionusRegion To Image DistillationVqaLicenseapache 20Visual Question AnsweringSynthetic+1

0 views

Multimodal & LLM

BovCap-5K: Annotated Cattle Images with Natural Language Descriptions

BovCap-5K provides a collection of cattle images paired with natural language descriptions for research. The dataset's author, organization, and specific scale are not detailed in the provided metadata. Its last update date and licensing terms are also unknown.

MultimodalLivestockComputer VisionImage CaptioningAgricultureNatural Language Processing+1

0 views

Multimodal & LLM

VQA-Med: Visual Question Answering for Medical Images

A dataset for Visual Question Answering (VQA) tasks in the medical domain. It is hosted on Kaggle, but its specific size, creation date, and authorship are not detailed in the provided metadata. The dataset likely contains pairs of medical images and related questions with answers.

MultimodalVision LanguageMultimodal MedicalMedical VqaHealthcare Ai+1

0 views

Multimodal & LLM

AdditiveLLM2-OA: Open Access Journal Articles for LLM Domain Adaptation

Open Access journal articles up to February 2026 used for domain-adaptive pretraining and instruction tuning of the AdditiveLLM2 model. The dataset includes text and images, and is split by source journal. It was created by ppak10 and last updated on March 25, 2026.

MultimodalParquetSize Categories10 Kn100 KTask Categoriestext GenerationLicenseotherLibrarypolarsLibrarydaskLanguageenText GenerationModalitytextLibrarymlcroissantModalityimageLibrarydatasetsOpen AccessRegionusLlm TrainingArxiv260322017Journal Articles+1

0 views

Multimodal & LLM

Everglades Salinity and Water Flow Data from 2002 Study

Tables 1-6 from USGS Open-File Report 02-59 contain data on salinity, discharge, and stage (water level) related to culverts under the main road in Everglades National Park. The data were gathered as part of a 2002 study by the South Florida Natural Resources Center and USGS to assess the road's influence on salinity intrusion into Florida Bay. Monitoring sites recorded water level, salinity, and flow during periods when water was present.

TabularTime SeriesSalinityEvergladesEnvironmental monitoringHydrologyWater Flow+1

0 views

Multimodal & LLM

Evenet Exotichiggs H2A4B: Particle Collision Data for Foundation Model Training

Avencast's dataset, associated with the arXiv preprint 'EveNet: A Foundation Model for Particle Collision Data Analysis', was last updated on March 31, III. The dataset appears to be designed for training and evaluating foundation models in the domain of particle physics, specifically for analyzing collision event data.

MultimodalFoundation ModelCollision DataRegionusArxiv260117126Particle PhysicsLicensemit+1

0 views

Multimodal & LLM

Real-World Construction Documents for Multimodal AI Benchmarking

AEC-Bench is a multimodal collection of real-world Architecture, Engineering, and Construction documents, including construction drawings and floor plans. The dataset was created by nomic-ai and was last updated in April 2026. It is structured for benchmarking tasks across scopes and task families.

ImageMultimodalTextDocument UnderstandingLanguage Creatorsexpert GeneratedTask Categoriesquestion AnsweringAec BenchLanguageenEngineeringTask Categoriesvisual Question AnsweringModalitytextVision LanguageArchitectureModalityimageBenchmarkArxiv260329199RegionusConstructionArchitecture Engineering ConstructionMultilingualitymonolingualLicenseapache 20Visual Question AnsweringAnnotations Creatorsexpert Generated+1

0 views

Multimodal & LLM

Medical VQA - 5 Datasets: Vision-Language Medical Data in LLaVA Format

Medical VQA - 5 Datasets (LLaVA format) is a collection of medical vision-language datasets aggregated on Kaggle. The datasets are formatted for the LLaVA (Large Language-and-Vision Assistant) framework, suggesting they contain paired image and text data. The specific source, size, and creation date of the datasets are not provided in the available metadata.

MultimodalMedical ImagingVision LanguageHealthcareMedical VqaLlava Format+1

0 views

Multimodal & LLM

Image-to-Video Human Preference Annotations for Two Models

A dataset of 2,000 human preference annotations for evaluating image-to-video generation. Each row contains a reference image, two generated videos from Pika and CogVideoX models, and 10 human annotations aggregated via majority vote. Created by datapointai and last updated in March 2026.

MultimodalOPTIMIZED-PARQUETParquetSize Categories1 Kn10 KLibrarypolarsRlhfLibrarydaskLanguageenModalitytextTask Categoriesimage To VideoHuman PreferenceLibrarymlcroissantModalityimageLibrarydatasetsLicensecc By 40Video GenerationComputer VisionI2vHuman PreferencesRegionusImage To VideoDpoTask Categoriesvideo ClassificationSynthetic+1

0 views

Multimodal & LLM

MIBench: A Benchmark for Multimodal Interaction Capabilities of Large Models

MIBench is a benchmark designed to evaluate the multimodal interaction capabilities of Large Multimodal Models (LMMs). It was created by an author or organization named Resurrect and was last updated on March 26, 2026. The benchmark focuses on how models integrate and utilize information across different modalities based on task demands.

MultimodalModel EvaluationLarge Multimodal ModelsBenchmarkAi AssessmentMultimodal Benchmark+1

0 views

Multimodal & LLM

BioReason-Pro: Protein Function Prediction Data for RL Optimization

Wanglab's training dataset for reinforcement learning optimization of the BioReason-Pro model. The data contains proteins with Gene Ontology term annotations, InterPro domains, STRING protein-protein interactions, and protein metadata. It was last updated on March 20, 2026.

MultimodalBioinformaticsMultimodal BiologyProtein FunctionReinforcement Learning+1

0 views

Multimodal & LLM

LLaVA FineTurn: Medical Imaging Datasets for Multimodal AI

A multimodal dataset for fine-tuning Large Language and Vision Assistant (LLaVA) models. It likely contains medical images and associated text from the HAM10000 and BCN20000 collections. The dataset is published on Kaggle, but specific details like size, format, and update date are unknown.

MultimodalMedical ImagingLlavaMultimodal AiFine Tuning+1

0 views

PreviousPage 31 of 97Next