OmniScience is a multi-modal dataset of between 1 million and 10 million records for scientific image understanding. UniParser released it in January 2026, after data collection was completed in September 2025. Each record pairs scientific imagery with text to support image-to-text tasks.
The dataset is distributed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license. It is optimized for use with Polars, Dask, and the Hugging Face Datasets library.
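As a rough illustration of working with a dataset of this size through the Hugging Face Datasets library, the sketch below opens it in streaming mode and inspects a few records. The repository ID `UniParser/OmniScience` and the `train` split are assumptions made for illustration, since this page does not state them, and the helper names `stream_records` and `peek` are hypothetical.

```python
from itertools import islice

# Hypothetical repository ID: this page does not state where the dataset is hosted.
REPO_ID = "UniParser/OmniScience"


def stream_records(repo_id=REPO_ID, split="train"):
    """Open the dataset lazily instead of downloading every record up front."""
    # Imported here so that peek() below stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True returns an iterable dataset that fetches records on demand.
    # The "train" split name is an assumption, not something the page confirms.
    return load_dataset(repo_id, split=split, streaming=True)


def peek(records, n=3):
    """Return the first n records from any iterable of records as a list."""
    return list(islice(records, n))
```

Calling `peek(stream_records())` would return the first three records as dictionaries, provided the repository ID and split exist and access has been granted. The column names are not documented here, so inspecting `record.keys()` on the result is a reasonable first step before building an image-to-text pipeline.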