DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,928 datasets

AMALIA-VL-DPO: Multimodal Preference Triplets for Vision-Language Model Alignment

AMALIA-VL-DPO Dataset is a collection of Direct Preference Optimization (DPO) data for the AMALIA-VL project. It contains preference triplets where each row includes a prompt with normalized role-content turns and image placeholders, a chosen assistant response, and a rejected response. The dataset was created by author 'amalia-llm' and was last updated on June 30, 2026.

Multimodal, Preference Triplets, Multimodal LLM, Direct Preference Optimization, Computer Vision, Instruction Tuning, +1 more

21,238 YouTube Videos with Multimodal Metadata and Clickbait Labels

YTClickbait21K is a human-annotated dataset of 21,238 YouTube videos for clickbait detection research. It includes video metadata like titles, descriptions, and thumbnails, along with binary clickbait labels from three annotators per video. The dataset was uploaded by Md. Minhazul Islam to figshare on April 9, 2026.

Tabular, Multimodal, ZIP, CSV, Multimodal Data, Computer Vision, Clickbait Detection, YouTube Metadata, Human Annotated, Natural Language Processing, +1 more

ACM: Audio-Centric Multimodal Benchmark for Retrieval

ACM packages the Audio-Centric Multimodal benchmark introduced by the paper 'Efficient and High-Fidelity Omni Modality Retrieval'. The dataset contains four HuggingFace subsets, each using a single test split to preserve natural schemas for queries and candidates. It was authored by chuonghm and last updated on July 4, 2026.

Audio, Multimodal, Machine Learning, Benchmark, Audio Retrieval, Multimodal Benchmark, +1 more

SciGenEdit-10K: A Scientific Image Dataset for Generation and Editing

SciGenEdit-10K is a public subset of the S1-Omni-Image project, released by the ScienceOne team at the Chinese Academy of Sciences. It is designed for research on scientific image generation, editing, and multi-turn interactions. The dataset page was last updated on 2026-06-25.

Multimodal, Multimodal AI, Computer Vision, Image Editing, Scientific Image Generation, +1 more

X-CASE: 1,000 Multimodal Social Scenarios for AI Agent Safety Evaluation

ACL 2026 benchmark dataset of 1,000 multimodal social activity scenarios created by adonaivera. It is designed to evaluate generative AI agents' ability to detect and correct unsafe behavior during iterative plan revision. Each scenario contains a natural-language social activity description and an unsafe hourly plan spanning 11 activities from 7 PM to 5 AM.

Multimodal, Multimodal Scenarios, Benchmark, Social Simulation, Safety Benchmark, AI Agent Evaluation, +1 more

MillionST: Satellite Image Time Series for Spatiotemporal Foundation Models

A large-scale satellite image time series dataset curated for pre-training spatiotemporal foundation models for Earth observation. It contains approximately 1 million satellite images from 100,000 geographic locations, with each location observed across 10 temporal phases over five years. The dataset was introduced in the TiMo paper and is hosted by Hillui on Hugging Face.

Image, Time Series, Geospatial, Satellite Imagery, Computer Vision, Earth Observation, Large Scale, +1 more

ImREGIS: Portuguese Multimodal Image-Text Dataset from Oil & Gas Documents

Over 439,000 unique images and 581,000 image-text pairs were automatically extracted from more than 20,000 PDF documents in the REGIS collection. The dataset, created by Geologi, focuses on technical documents, theses, and reports from the Oil & Gas and Geosciences domain. It was last updated on 2026-06-21.

Multimodal, Portuguese, Oil Gas, Geosciences, Image Text, Computer Vision, +1 more

Giorgia Meloni's Multimodal Political Communication Videos, 2019-2025

47 video appearances by Italian Prime Minister Giorgia Meloni are analyzed through a 74-variable multimodal coding scheme. The dataset, created by Canan Cetin and last updated in 2026, supports a study on communicative domestication, tracking shifts in themes, gestures, and presentation from opposition to office. Analysis reveals significant changes, such as motherhood references collapsing from 43% to 2% of appearances.

Multimodal, Multimodal Analysis, Populism, Gendered Nationalism, Natural Language Processing, Political Communication, Discourse Analysis, +1 more

LapChole-FOCUS-VQA: A Benchmark for Long-Context Surgical Video Understanding

LapChole-FOCUS-VQA is a clinically grounded benchmark designed for evaluating long-context video understanding in minimally invasive surgery. The dataset is maintained by the ORena FOCUS Challenge team and was last updated on June 22, 2026. It is a gated resource, with access granted only to challenge participants after manual review.

Video, Multimodal, Benchmark, Medical Video, Video Understanding, Surgical AI, Clinical Benchmark, +1 more

SAW-Bench: A Benchmark for Egocentric Situated Awareness in AI Models

SAW-Bench evaluates observer-centric situated awareness in multimodal foundation models. The benchmark probes a model's ability to reason about space, motion, and possible actions from an evolving egocentric viewpoint. It was created by ucsbai and last updated on Hugging Face in June 2026.

Multimodal, AI Evaluation, Foundation Models, Benchmark, Egocentric Vision, Situated Awareness, Multimodal Benchmark, +1 more

SpaRRTa-Lego: Real-World Spatial-Relation Benchmark with Toy Minifigures

SpaRRTa-Lego is the real-world counterpart of a synthetic spatial-relation benchmark. Scenes are photographed with toy minifigures and everyday objects for evaluating Visual Foundation Models. The dataset is associated with a paper (arXiv:2601.11729) and code, and was last updated on 2026-06-25.

Multimodal, Sim To Real, Benchmark, Spatial Relations, Visual Foundation Models, Computer Vision, Synthetic, +1 more

Power Quality Measurements from Controlled High-Impedance Fault Experiments

1.2 GB of power quality measurements recorded during laboratory experiments on high-impedance faults in medium-voltage covered conductors. The data includes RMS voltage and current, harmonics, phase angles, and power metrics, collected using a Hioki PQ3198 analyzer following IEC standards. Author Diogo Biasuz Dahlke published this dataset on figshare in May 2026.

Time Series, Multimodal, ZIP, Text, Power Quality, High Impedance Faults, Electrical Grid, Covered Conductors, Medium Voltage, +1 more

Mouse Neocortex Activity During a Texture Discrimination Task

Simultaneously recorded neuronal population activity from the S1 and PPC regions of awake mice. Data was acquired using two-photon calcium imaging during a texture discrimination task by Shuting Han at the University of Zurich. The temporal coverage and dataset size are not specified in the provided metadata.

Multimodal, Calcium Imaging, Population Activity, Behavioral Data, Neuroscience, Mouse Neocortex, +1 more

Neutrosophic Technarrative Architecture: Symbolic–Emotional VR Evaluation Framework

A methodological framework and computational pipeline for evaluating symbolic and emotional responses to virtual architectural spaces. The framework integrates symbolic modeling of concepts like justice and identity with psychophysiological measures and AI-ready computational inference. It was authored by Jesus Rafael Hechavarria-Hernandez and published on figshare in May 2026.

Multimodal, Multimodal Analysis, Virtual Reality, Neuroarchitecture, Affective Computing, Benchmark, Computational Model, +1 more

PALL-VLM: Dental Vision-Language Dataset for Instruction Tuning

32,884 records covering 52,461 dental images, formatted as image-text conversations for LLaVA-style instruction tuning. Curated by Harisundar R, this dataset serves as the training data for the PALL-VLM model. The dataset was last updated on June 12, 2026.

Multimodal, Dental Imaging, Vision Language, Computer Vision, Medical AI, Instruction Tuning, +1 more

BBBC038: 2018 Data Science Bowl Nuclei Segmentation Benchmark

A 2D light-microscopy dataset for cell-nucleus segmentation assembled across many imaging experiments. The collection spans multiple modalities including fluorescence and brightfield histopathology, covering humans, mice, and flies across 30+ experiments. This is the official BBBC038v1 release from the Broad Bioimage Benchmark Collection, hosted on Hugging Face by MedOtter.

Image, Multimodal, Benchmark, Computer Vision, Microscopy, Benchmark Dataset, Cell Biology, Bioimage Analysis, Nuclei Segmentation, +1 more

Human-Subject Experiments on AI Image Caption Evaluation with Informational Cues

Yanru Jiang provides experimental materials and anonymized participant data supporting the paper 'Informational Cues Mitigate Heuristic Bias in Evaluating Context-Sensitive AI Image Captions'. The data documents two human-subject experiments examining how informational context shapes audience evaluations of AI-generated image captions. The dataset was last updated on July 14, 2026.

Tabular, Multimodal, Heuristic Bias, Human Subject Experiments, AI Evaluation, Benchmark, Computer Vision, Multimodal Captions, Synthetic, +1 more

HalluGuard Preferences 76K

HalluGuard-Prefs is a 76,708-entry synthetic preference dataset created by author lrsbrgrn for training models to detect hallucinations in text. It was used to fine-tune the HalluGuard-Qwen3-4B model via Odds Ratio Preference Optimization (ORPO). The dataset was introduced in a paper at the 64th Annual Meeting of the Association and was last updated on Hugging Face in June 2026.

Text, NLP Evaluation, Preference Data, Synthetic Dataset, LLM Hallucination, Synthetic, +1 more

Electrical Waveforms from High-Impedance Faults in Covered Conductors

The waveform portion of a multimodal dataset on high-impedance faults in medium-voltage covered conductors, containing high-resolution electrical recordings from controlled laboratory experiments. Diogo Biasuz Dahlke collected the data using a Hioki MR8741 waveform recorder, sampling synchronized voltage and current at 20 kS/s. The dataset was last updated on 2026-05-05.

Time Series, Multimodal, ZIP, Power Grid, Fault Analysis, Laboratory Data, Electrical Engineering, +1 more

CC12M-Camera: Predicted Camera Parameters for 10.9M Images

CC12M-Camera provides per-image camera parameter annotations for the Conceptual 12M dataset. The annotations cover approximately 10.97 million images and were predicted by the Puffin camera-centric multimodal model. The dataset was created by KangLiao and was last updated on June 24, 2026.

Multimodal, Multimodal Annotation, Computer Vision, Image Dataset, Camera Parameters, +1 more
