DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,937 datasets

African Medical Multimodal Bone Fracture Dataset

A synthetic multimodal bone fracture classification dataset designed for African healthcare contexts. It is parameterized from published Sub-Saharan African literature and contains no real observations. It was created by electricsheepafrica and last updated on April 14, 2026.

Tags: Multimodal, Medical Imaging, Healthcare, Bone Fracture, Synthetic Data, Synthetic, Africa Healthcare (+1 more)

Mouse Brain MRI Dataset with Refined Atlas-Derived Labels

High-resolution-Multimodal is a mouse brain MRI dataset with refined anatomical labels derived from an atlas. Its author, organization, and last update date are unknown.

Tags: Image, Multimodal, Medical Imaging, Mouse Brain, MRI, Neuroimaging, Anatomical Labels (+1 more)

Spanish Tourism Video Annotations for Multimodal Analysis

Zeng's corpus contains ELAN annotation files and Excel coding templates for analyzing institutional Spanish tourism videos, with materials available from the author upon request. It was last updated on April 6, 2026.

Tags: Multimodal, Excel, Video Annotation, Multimodal Analysis, Tourism Promotion, Spanish Institutions, Natural Language Processing (+1 more)

MMRF CoMMpass: Multimodal Survival Prediction for Multiple Myeloma

This reproducibility package contains processed data and outputs for a dynamic-t multimodal landmark survival prediction framework for multiple myeloma. It includes processed MMRF CoMMpass resources, external validation resources, and training/evaluation outputs that support reproducible model training, benchmarking, and visualization. The framework models survival from laboratory time series, drug exposure, and imaging-derived features.

Tags: Time Series, Multimodal, Multimodal Deep Learning, Deep Insight, Medical Time Series, Benchmark, Reproducibility Package, Dynamic Survival Prediction, Multiple Myeloma (+1 more)

China VLM Censorship: Multimodal Content Moderation Data

China-related data that likely concerns content moderation and censorship, possibly involving vision-language models (VLMs). It was published on Hugging Face by AlexZZA and last updated on 2026-06-09. Its specific content, scale, and collection method are not detailed in the available metadata.

Tags: Multimodal, 🇨🇳 China, Social Media, Content Moderation, Censorship (+1 more)

FT-LLM 2026 QA: Japanese Visual Question Answering Dataset for VLM Tuning

A Japanese visual-question-answering dataset used for Stage 1-2 visual instruction tuning of the COMPASS Vision-Language Model. Each sample contains a document or natural image together with one or more Japanese question–answer pairs. The dataset was created by Yana and last updated on 2026-04-16.

Tags: Multimodal, Vision Language Model, Computer Vision, Instruction Tuning, Japanese Language, Visual Question Answering (+1 more)

LLM Training Metrics and Arena Performance Data

LLM Training Metrics + Arena Performance Data is a dataset hosted on Kaggle. Its title suggests it contains metrics related to large language model training and performance evaluations from an arena-style benchmark. The specific contents, scale, and origin are not detailed in the available metadata.

Tags: Tabular, Model Evaluation, LLM Training, Performance Metrics (+1 more)

Zebra Finch Behavioral Responses to Cross-Modal Stimuli Under Different Rearing Conditions

A study of female zebra finches tested for responses to audio and visual stimuli from mates or strangers. The dataset includes processed data for statistical analysis, plus raw coordinate data for beak tip, head, and back positions during pre-stimulus and playback periods. Data was collected by Sarah Woolley and harvested from Borealis Dataverse in April 2026.

Tags: Tabular, Audio, Time Series, Geospatial, Zebra Finch, Animal Behavior, Developmental Biology, Multisensory Integration, Pose Tracking (+1 more)

GTA VI Trailer Frames for Computer Vision Annotation

Frame-level annotations extracted from Grand Theft Auto VI promotional trailers. The dataset is intended for computer vision and multimodal AI research. It was sourced from Kaggle, but the author, organization, and specific collection details are unknown.

Tags: Image, Multimodal, Multimodal AI, Computer Vision, Game Trailers, Video Frames (+1 more)

MPDD-AVG: Multimodal Personality-Aware Depression Detection Data

A dataset for the MPDD-AVG Challenge at ACM MM 2026. It is designed for multimodal personality-aware depression detection via audio-visual interview and gait analysis. The dataset was created by author 'chasonfff' and was last updated on May 2, 2026.

Tags: Audio, Multimodal, Gait Analysis, Multimodal Depression Detection, Audio Visual Interview, Clinical AI (+1 more)

SenBen: Sensitive Content Benchmark with 13,999 Movie Frames

13,999 frames sampled from 157 movies released between 1982 and 2023, annotated with grounded scene graphs and 16 safety tags for evaluating vision-language models. The dataset was created by fcakyon and last updated on April 27, 2026.

Tags: Multimodal, Movie Frames, Scene Graphs, Vision Language Models, Benchmark, Safety Benchmark, Computer Vision, Content Moderation (+1 more)

PerturbPair: Dual-Modality Platform for Cellular Response Mapping

A 7.7 GB dataset that integrates Perturb-seq and optical pooled screening to map cellular responses and enable cross-modal inference. It is stored in H5AD format, was authored by Romain Lopez, and was last updated on 2026-04-18.

Tags: Multimodal, Cellular Response, Perturb Seq, Optical Pooled Screening, Single Cell Genomics, Cross Modal Inference (+1 more)

Vqa Bn: Bengali Visual Question Answering Dataset Translated from VQA v2.0

A Bengali translation of the VQA v2.0 dataset created for research in Visual Question Generation. The dataset contains Bangla questions and answers aligned with images, along with the original English annotator answers. It was published by Tahsin-Mayeesha in 2023 as part of the work "Visual Question Generation in Bengali" presented at MM-NLG.

Tags: Multimodal, Multilingual, Bengali, Computer Vision, Natural Language Processing, Visual Question Answering (+1 more)

Egocentric Dataset: Daily-Life Manipulation with RGB-D, EMG, and IMU

Lo6yu's Egocentric Multimodal Daily-Life RGB-D EMG IMU Dataset captures synchronized egocentric recordings of daily-life manipulation. It combines RGB-D video for scene context and hand motion, bilateral wrist EMG for muscle activation, and wrist IMU signals. It was last updated on 2026-05-02 19:30:11.

Tags: Video, Multimodal, Multimodal Sensing, Hand Object Interaction, Egocentric Vision, Human Activity (+1 more)

Multimodal Indian Sign Language Data for Computer Vision

Multimodal data for Indian Sign Language is hosted on Kaggle, a platform for data science projects. The dataset likely contains visual or video recordings of signs, potentially paired with text or other modalities. Specific details on the number of samples, collection method, and author are not provided in the available metadata.

Tags: Multimodal, Indian Languages, Indian Sign Language, Computer Vision, Sign Language, Accessibility, Gesture Recognition (+1 more)

Female Preference Data for Mate Choice Copying

The data track changes in female preference for an initially unpreferred male across learning and copying tests. The dataset includes binary indicators of whether preference increased and identifiers for each test phase. It was authored by Marina Hutchins and last updated in April 2026.

Tags: Tabular, CSV, Animal Behavior, Preference Learning, Behavioral Biology, Mate Choice (+1 more)

DeepSeek AI Thinking with Visual Primitives: Model Benchmarks and Cold-Start Data

A technical report released on 2026-04-30 describes an approach that helps multimodal large language models (MLLMs) bridge the 'Perception Gap'. The dataset, uploaded by NodeLinker, is intended to hold in-house benchmarks and a subset of cold-start data for future public release; the model weights are planned for integration into a foundation model.

Tags: Multimodal, Visual Primitives, Multimodal LLM, AI Benchmark, Computer Vision (+1 more)

Multimodal Classroom State Analytics with IoT and Behavioral Data

IoT sensor, behavioral, and learning data for intelligent classroom adaptation. The dataset likely contains multimodal signals for analyzing classroom dynamics. Its specific scale, origin, and update frequency are not detailed in the provided metadata.

Tags: Multimodal, Classroom Data, IoT Sensors, Behavioral Analytics, Educational Technology (+1 more)

Nigerian Pidgin Voice and Text Dataset for Voice AI Development

An estimated 75–100 million people speak Nigerian Pidgin English, yet no production-ready, commercially licensed dataset existed for it. The Nigerian Pidgin Voice + Text Dataset is a multimodal collection built by WAZOBIALABS to fill critical gaps that cause voice AI to fail for Nigerian users. It was last updated on Hugging Face in April 2026.

Tags: Audio, Multimodal, Multimodal Dataset, Pidgin English, Large Scale, African Languages, Speech Recognition (+1 more)

Ko Vdr Hn: Korean Visual Document Retrieval Hard Negatives for Embedding Models

Korean Visual Document Retrieval Hard Negatives is a multimodal training set for fine-tuning embedding models. The dataset, created by whybe-choi, was last updated on 2026-04-25. Each row contains a text query, a page image document, one positive match, and seven mined hard negatives.

Tags: Multimodal, Korean Language, Document Images, Embedding Training, Computer Vision, Multimodal Retrieval (+1 more)
