Image-text pairs, instruction tuning, visual QA, cross-modal data, foundation model training data
1,540 datasets
S1-MMAlign contains 15.5 million image-text pairs extracted from 2.5 million open-access scientific papers across biology, chemistry, and physics. Developed by ScienceOne-AI and released in 2026, it provides a large-scale resource for aligning complex scientific imagery with textual descriptions. The dataset is designed to bridge the semantic gap in scientific multimodal learning using peer-reviewed literature.
BioReason-Pro Test Data is an evaluation dataset for protein function prediction containing proteins with Gene Ontology term annotations, GO-GPT predictions, InterPro domains, STRING protein-protein interactions, and protein metadata. It was created by wanglab and follows the CAFA framework's temporal holdout protocol. The dataset was last updated in March 2026.
Evaluation dataset for the BioReason-Pro model, containing proteins with Gene Ontology term annotations, GO-GPT predictions, InterPro domains, STRING protein-protein interactions, and protein metadata. The dataset was created by wanglab and follows the CAFA framework's temporal holdout methodology. It was last updated in March 2026.
A summary list of foundation models supported by the BioFuse embedding fusion framework. The dataset is a 9.5 KB Excel file created by Mirza Nasir Hossain and last updated on March 18, 2026. It is shared under a CC-BY-4.0 license on figshare.
Nemotron RL Instruction Following MultiTurnChat v1 is a benchmark dataset designed to test and improve large language models in complex, multi-turn conversations. It was created by NVIDIA and employs a 'model breaking' methodology, testing tasks against advanced models like Nemotron-Nano-V2 and Qwen3-235B-A22B-Thinking-2507 to expose failure modes. The dataset was last updated on March 11, 2026.
MMPD-3 contains images and morphological metrics for 30 species of medicinal plants. The dataset is hosted on Kaggle, but the author, organization, and specific data collection details are not provided. The last update date and data volume are unknown.
A dataset of adversarial prompts designed to explicitly conflict with an AI model's standard training instincts, such as instructions to write code without comments or to depart from conventional helpfulness norms. It focuses on 8 distinct 'anti-convention' patterns and was created by NVIDIA, with a last recorded update on March 11, 2026. The dataset uses a targeted 'model breaking' methodology to generate candidate responses for testing constraint difficulty.
Created by VLR-CVC for the ICDAR2026 competition, this multimodal dataset contains approximately 1,000 document-question pairs focused on complex reasoning. It spans eight distinct domains including business reports, scientific papers, slides, posters, maps, comics, infographics, and engineering drawings.
115,732 pairwise human preference labels compare outputs from 4 frontier video generation models. Annotators from Datapoint AI evaluated videos across 3 quality dimensions, using 417 unique prompts and 11 motion categories. The dataset was last updated in March 2026.
SimWorld-AI created SimWorld, an open-ended realistic simulation platform for developing and evaluating LLM and VLM AI agents. The platform supports complex physical and social environments, with a last recorded update on 2026-03-14. The description indicates support for importing customized environments and agents.
52,678 in-the-wild videos feature synchronized visual, audio, and text data. Ground-truth importance scores are derived from YouTube's 'Most Replayed' statistics, reflecting collective viewer engagement. The dataset was created by author hminjeong and was last updated in March 2026.
57,866 pairwise human preference labels compare 4 frontier video generation models. Datapoint AI collected these annotations across 3 quality dimensions for 417 unique prompts covering 11 motion categories. The dataset was last updated in March 2026.
Multimodal sensor data includes WiFi signals, inertial measurement units, AirPods audio, depth/IR cameras, DensePose, human pose, mesh, and action labels. The dataset appears to be designed for complex human activity analysis and sensor fusion tasks. Its origin, size, and collection methodology are not specified in the available metadata.
NVIDIA released this synthetic dialogue dataset in March 2026 to improve model interactivity and instruction following. It contains multi-turn conversations generated by an ensemble of large models, including Qwen3-235B, GLM-4.6, and Kimi-K2-Thinking.
A Chinese multimodal benchmark for e-commerce product understanding, released following a legal and privacy review aligned with China's PIPL. The dataset includes original images, product titles, and category/attribute annotations, with all personally identifiable information removed. It was created by author ZHNie and last updated on March 23, —.
A synthetic dataset of 1,070,917 agentic command operations for 36 creative, technical, and engineering software environments. Created by rAVEUK and last updated on March 15, 2026, it is engineered to stress-test and evaluate multimodal AI agents operating within complex software infrastructures.
10,000 entries support training and evaluating Multimodal Large Language Models on visual instruction following. The dataset is structured in a messages format with user instructions and assistant responses, referencing images from sources like LLaVA-Instruct and Visual Genome. It was created by KerenStone for research published in the paper 'Empowering Reliable Visual-Centric Instruction Following in MLLMs'.
Creative Professionals Agentic Tasks 1M is a large-scale synthetic dataset containing 1,070,917 agentic command operations across 36 software environments. It is designed to stress-test, evaluate, and fine-tune multimodal AI agents that interact with complex software and perform multi-step reasoning. The dataset spans creative, technical, and engineering domains, providing training data for operations deep within software infrastructure.
EveNet is a foundation model for particle collision data analysis, as described in the arXiv preprint arXiv:2601.17126. The dataset was uploaded by Avencast and last updated on March 31, 2026. Its specific content and scale are not detailed in the provided metadata.
A multimodal dataset linking wide-field calcium imaging of the mouse neocortex to behavioral measurements during a motor skill learning task. It includes 15 sessions over two weeks from 25 mice trained to pull a lever for water rewards, with simultaneous high-speed videography and environmental monitoring. The dataset is formatted in the Neurodata Without Borders (NWB) standard and adheres to FAIR principles.