DataSalon

Discover quality datasets for AI training — aggregated from 40+ platforms, curated by AI.

Multimodal & LLM Datasets | DataSalon

Multimodal & LLM

Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data

1,948 datasets

Zolai LLM Training Dataset with Predefined Splits and Script

Train/val/test splits and a training script for Tedim Zolai LLM training. The dataset is hosted on Kaggle; its author, organization, and size are unknown.

Text, LLM Training, Text Corpus, Train Val Test Splits, +1

Final VQA Done: Visual Question Answering Dataset

Kaggle hosts a dataset titled 'final-vqa-done'. The dataset's content likely relates to visual question answering, a multimodal AI task. Specific details such as the number of samples, collection method, and creator are not provided in the available metadata.

Multimodal, Multimodal AI, Computer Vision, Natural Language Processing, Visual Question Answering, +1

final-vqa-rank16: Visual Question Answering Benchmark Dataset

final-vqa-rank16 is a dataset hosted on Kaggle. The title suggests it is likely related to Visual Question Answering (VQA), a multimodal AI task. Its specific content, scale, and origin are not detailed in the provided metadata.

Multimodal, Machine Learning, Multimodal AI, Image Text, Visual Question Answering, +1

MMTA-v1.0: A Multimodal Time-Series Benchmark

MMTA-v1.0 is a benchmark dataset for multimodal time-series analysis, published on Kaggle. The dataset likely contains aligned data from multiple modalities, such as sensor readings, images, or text, over time. Specific details on volume, authorship, and update recency are unavailable from the provided metadata.

Time Series, Multimodal, Machine Learning, Benchmark, +1

HER-Dataset: Reasoning-Augmented Role-Playing Dialogues from Literature

HER-Dataset is a high-quality role-playing dataset featuring reasoning-augmented dialogues extracted from literary works. It introduces dual-layer thinking for cognitive-level persona simulation. The dataset was authored by ChengyuDu0123 and last updated on February 4, 2026.

Text, Role Playing, Reasoning, LLM Training, Literary Dialogue, +1

College Student Stress, Emotion, and Anxiety Data with Intervention Analytics

Multimodal data on college student stress, emotion, and anxiety, likely collected for intervention analytics. The dataset's author, organization, size, and last update date are unknown.

Multimodal, College Stress, Emotion Analytics, Student Mental Health, Anxiety Data, Intervention Analytics, +1

Vision-Language Agent Trajectories Across 17 Interactive Environments

VisGym consists of 17 diverse, long-horizon environments for evaluating Vision-Language Models on interactive tasks. The dataset contains agent trajectories where actions are conditioned on past actions and observation history, challenging multimodal sequence handling.

Multimodal, Task Categories: image-text-to-text, Size Categories: 1M<n<10M, Language: en, Vision Language Models, arXiv:2601.16973, SFT, Region: us, Reinforcement Learning, VLM, Agent, License: apache-2.0, +1

final-vqa-rank32: Visual Question Answering Ranked Responses

final-vqa-rank32 is a dataset for Visual Question Answering (VQA) tasks, likely containing image-question pairs with multiple ranked answer candidates. It is hosted on Kaggle, but its origin, size, and creation details are not provided in the available metadata, so the actual content requires verification after download.

Multimodal, Multimodal AI, Ranked Data, Visual Question Answering, +1

final-vqa-rank8: Visual Question Answering Dataset

A dataset likely for Visual Question Answering (VQA) tasks, as suggested by the title abbreviation. It was published on the Kaggle platform. The specific content, size, and creation details are not provided in the available metadata.

Multimodal, Machine Learning, Multimodal AI, Visual Question Answering, +1

Real-Time CO2 Two-Phase Flow Metering Data

Experimental data collected on a 1-inch bore gas-liquid two-phase CO2 flow rig in real time. The dataset includes time-stamped mass flowrates, temperatures, densities, tube frequencies, and differential pressure readings from Coriolis flowmeters installed on multiple test sections.

Carbon Capture and Storage, NERC DDC, Carbon Dioxide, +1

Swedish Runestone Inscriptions with Photographs and Translations

2,615 ancient Scandinavian runic inscriptions paired with photographs of the runestones. The dataset, created by birgermoell, provides scholarly transliterations, Old Norse normalizations, and English translations for each entry. It was last updated on Hugging Face in February 2026.

Multimodal, Multimodal Data, Vision Language, Computer Vision, Ancient Texts, Runestone Inscriptions, Historical Artifacts, +1

DuwatBench: Arabic Calligraphy Benchmark for Multimodal Understanding

A benchmark dataset bridging language and visual heritage through Arabic calligraphy, developed by researchers from Mohamed bin Zayed University of AI, NUCES, NUST, and Australian National University. It was last updated on January 28, 2026.

Multimodal, OPTIMIZED-PARQUET, Parquet, Size Categories: 1K<n<10K, Library: polars, Modality: text, Library: mlcroissant, Modality: image, Library: datasets, Benchmark, Language Visual, Library: pandas, arXiv:2601.19898, Arabic Calligraphy, Region: us, License: apache-2.0, Cultural Heritage, Multimodal Benchmark, +1

STRIDE-QA: Visual Question Answering Dataset for Autonomous Driving

STRIDE-QA is a large-scale visual question answering dataset for physically grounded spatiotemporal reasoning in autonomous driving. It provides 16 million question-answer pairs over 270,000 frames, constructed from 100 hours of multi-sensor driving data collected in Tokyo. Dense annotations include 3D bounding boxes, segmentation masks, and multi-object tracks.

Multimodal, WEBDATASET, Language: en, Task Categories: visual-question-answering, Library: webdataset, Modality: text, Size Categories: 100K<n<1M, Library: mlcroissant, Modality: image, Library: datasets, Spatiotemporal Reasoning, Computer Vision, Multi Sensor Data, Region: us, Large Scale, Autonomous Driving, Visual Question Answering, arXiv:2508.10427, +1

SeeFar V0: Multi-Resolution Satellite Images for Geospatial Foundation Models

A collection of multi-resolution satellite images from both public and commercial satellites. The dataset is specifically curated for training geospatial foundation models. It is hosted on AWS Open Data and was contributed by the organization Coastal Carbon.

Image, Geospatial, 🌍 Global, Foundation Model, Machine Learning, Environmental, Climate, Sustainability, Satellite Imagery, Natural Resource, Earth Observation, Coastal, Mapping, Biodiversity, +1

VQA Vietnamese Food Dataset: Visual Question Answering for Vietnamese Cuisine

A Visual Question Answering dataset focused on Vietnamese food. The dataset likely contains images of Vietnamese dishes paired with questions and answers in text format. It is published on Kaggle, but details on size, creation date, and authorship are currently unknown.

Multimodal, Computer Vision, Natural Language Processing, VQA, +1

Multimodal Competition Dataset

Multimodal competition data published on Kaggle. The dataset likely contains multiple data types such as images, text, or audio, structured for a competitive machine learning task. Metadata is minimal; actual content and scale require verification after download.

Multimodal, Machine Learning, Competition, +1

Multimodal Physiological Data from Cycling Effort

Multimodal physiological data collected during cycling activity. The dataset is hosted on Kaggle, but the author, collection method, and specific time range are not provided in the available metadata. The title suggests it likely contains synchronized sensor readings from multiple modalities recorded during physical exertion.

Time Series, Multimodal, Sports Science, Cycling, Physiological Data, Multimodal Sensors, +1

VQA Animals: Questions and Answers for Visual Question Answering

Annotations for a Visual Question Answering dataset focused on animals. The dataset likely contains image-question-answer triplets, as suggested by the raw description. It is published on Kaggle, but details on the number of samples, collection method, and original authors are not provided in the available metadata.

Multimodal, Annotations, Multimodal QA, Visual Question Answering, Animal Images, +1

VQA Food VNK: Visual Question Answering Dataset

A dataset likely for Visual Question Answering (VQA) tasks focused on food items. The dataset is hosted on Kaggle, but detailed metadata such as column descriptions, sample data, and size are unavailable. Its content and structure require verification after download.

Image, Multimodal, Food, Visual Question Answering, +1

Multimodal Stroke Data from Kaggle

Multimodal Stroke Data is a dataset hosted on Kaggle. The dataset likely contains information related to stroke diagnosis, treatment, or outcomes. Specific details regarding its size, origin, and creation date are not provided in the available metadata.

Multimodal, Healthcare, Medical, Stroke, +1
