Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data
1,539 datasets
MIAO is a multimodal dataset consisting of paired sound event clips and onomatopoeic images. It is designed to support research on multimodal correspondence between sounds and visual onomatopoeic expressions. The dataset was authored by KeisukeImoto and was last updated on 2026-05-19.
PIN-200M contains approximately 200 million samples of paired and interleaved multimodal documents, requiring around 312 terabytes of storage. The dataset is a mini version of the PIN dataset introduced in a paper from June 2024. It was created by m-a-p and last updated on Hugging Face in April 2026.
PinkPixel's Story-Writing Dataset is a collection of creative writing stories based on the Writing Prompts ([WP]) format. The data is structured in ChatML format, making it suitable for instruction tuning of language models. The dataset was last updated on May 11, 2026.
PCBA Standard-to-Real Challenge is the official dataset for the ACM Multimedia 2026 Grand Challenge. It focuses on cross-domain visual question answering for real-world manufacturing inspection. The dataset was created by 'aimmifm' and was last updated on May 14, 2026.
AIPlans provides a dataset of 37,022 text examples formatted for reinforcement learning from human feedback (RLHF). The dataset, derived from PKU-Alignment/PKU-SafeRLHF, includes 33,334 training and 3,688 test examples. It was last updated on 2026-05-04.
Over 2,500 km² of diverse French ecoclimates and landscapes are covered by this large-scale, multi-sensor land-cover resource. It features 63 billion hand-annotated pixels across 19 land-cover and 23 crop type classes, building upon the FLAIR#1 and FLAIR#2 datasets. The dataset was created by IGNF and was last updated on the platform in April 2026.
A JSON-LD knowledge graph encoding the concept layer of the Authorship Strategy research line. The dataset is a mirror of a GitHub repository file, provided for LLM training and AI research tools. It was created by Shimo4228 and last updated on 2026-05-18.
34,000 agent trajectories were synthesized using the Qwen3-Coder-480B-A35B-Instruct model for supervised fine-tuning of software engineering agents. This dataset, created by NVIDIA, was collected using the OpenHands framework and last updated on May 5, 2026. It is designed to advance the capabilities of large language models in software engineering tasks.
PRISM-CoT-new is an expanded supervised fine-tuning corpus for the PRISM Vision-Language Model safety alignment framework. It supersedes the original PRISM-CoT dataset for SFT use cases and was created by andyc03, with contributions from sources like prism-cot-orig and holisafe-bedrock. The dataset was last updated on May 14, 2026.
This dataset contains pretest and posttest speaking performance scores from a quasi-experimental study involving students. It is hosted on figshare and includes data collected to assess the impact of an instructional intervention on oral proficiency.
Ablation study results for the CMAP-Fusion model on the ChestX-ray14 Extended Dataset. The data likely contains metrics comparing the impact of ViT-B/16, SmartTrim, and CMT modules on classification performance and efficiency. The dataset was authored by Chong Liu and last updated on April 24, 2026.
Ablation study results for CMAP-Fusion on the COVID-19 Radiography datasets compare the impact of the ViT-B/16, SmartTrim, and CMT modules on classification accuracy, F1 Score, AUC, Kappa, model parameters, FLOPs, feature sparsity, and cross-modal similarity. The 5.5 KB Excel file was authored by Chong Liu and shared under a CC-BY-4.0 license on figshare in April 2026.
Ablation study results for CMAP-Fusion on the ISIC Skin Cancer datasets. The dataset compares the impact of ViT-B/16, SmartTrim, and CMT modules on classification accuracy, F1 Score, AUC, Kappa, model parameters, FLOPs, feature sparsity, and cross-modal similarity. Chong Liu published the dataset on figshare in April 2026.
A dataset of 105 selected meme images validated for complex interpretation. Each item was selected by a human researcher and validated using two frontier multimodal LLMs. The dataset focuses on cases where meaning emerges through image-text interaction, pragmatic inference, cultural context, ambiguity, incongruity, or potential false-positive moderation risk.
TextSculptor Data contains two Parquet subsets for scene text editing tasks. The subsets include columns for text captions or prompts paired with images stored as embedded bytes. The dataset is associated with a research project and was last updated on 2026-05-21.
Training data for the LLaVA-OneVision-2 family of multimodal models, covering large-scale video and spatial reasoning corpora used in mid-training. The dataset includes subsets like 'mid_training_video/60s_rest/' with 10,809 shards of approximately 60-second video clips and JSONL files containing captions for 30-second and 60-second clips. It was created by mvp-lab and last updated on May 6, 2026.
Training data for the 4DThinker framework, which enables Vision Language Models to 'think with 4D' through dynamic latent mental imagery. The dataset includes approximately 38,000 samples for DIFT training and 37,000 samples for 4DRL training, built upon SpatialVID and DSR_Suite-Data. It was authored by jankin123 and last updated on May 11, 2026.
A multimodal dataset for understanding two-wheeler rider behavior, addressing a research gap in road safety. The dataset was created by varunpaturkar and presented at ICRA 2026. It was last updated on 2026-05-21.
A multimodal dataset of 2,700 entries for predicting cyclic olefin copolymerization performance. The data and trained models were published by 俊杰 姜 on figshare in April 2026. The repository includes files in CSV, PKL, and H5 formats totaling 233.0 MB.
FAVOR-Bench is a benchmark for fine-grained video motion understanding, accepted at NeurIPS 2025. It spans both ego-centric and third-person perspectives and includes evaluation for close-ended QA and open-ended descriptive tasks. The dataset was released by the FAVOR-Bench organization in March 2025.