Common-O contains between 10,000 and 100,000 image-text pairs, designed by Meta researchers in 2026 to evaluate the reasoning of multimodal LLMs. The data is organized into two subsets of household-object scenes and tests whether a model can identify the elements common to a set of 3 to 16 scenes.
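To make the task concrete, identifying what several scenes have in common can be thought of as a set intersection over per-scene object labels. The sketch below is only an illustration of that idea: the function name `common_objects` and the object labels are hypothetical and are not taken from the dataset's actual schema or annotations.

```python
def common_objects(scenes):
    """Return the labels that appear in every scene.

    `scenes` is an iterable of label collections, one per scene.
    This representation is an assumption made for illustration.
    """
    label_sets = [set(scene) for scene in scenes]
    if not label_sets:
        return set()
    return set.intersection(*label_sets)


# Hypothetical example with three scenes of household objects.
scenes = [
    {"mug", "lamp", "book"},
    {"mug", "book", "plant"},
    {"book", "mug", "clock"},
]
print(sorted(common_objects(scenes)))  # ['book', 'mug']
```

A model under evaluation has to arrive at this answer from the images themselves; the snippet only shows the shape of the expected result.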
The dataset is distributed in Parquet format. Its metadata tags place it in the 10K-100K record range, a size that libraries such as Polars or Dask handle efficiently.
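As a minimal sketch of working with the Parquet files, the snippet below lists the files in a local copy of the dataset and counts their rows lazily with Polars, so that image columns are not loaded into memory. It assumes the files have already been downloaded to a local directory; the helper names `list_parquet_files` and `count_rows` are hypothetical, and nothing here reflects the dataset's actual file layout or column names.

```python
from pathlib import Path


def list_parquet_files(data_dir):
    """Return all Parquet files under `data_dir`, sorted by path."""
    return sorted(Path(data_dir).rglob("*.parquet"))


def count_rows(files):
    """Count rows across Parquet files without loading their columns.

    Polars is a third-party dependency, imported here so that the
    file-listing helper above works without it.
    """
    import polars as pl

    total = 0
    for path in files:
        # scan_parquet builds a lazy query; only the row count is computed.
        total += pl.scan_parquet(path).select(pl.len()).collect().item()
    return total
```

With a local copy in place, `count_rows(list_parquet_files("path/to/local/copy"))` should return a total within the 10K-100K range given by the metadata tags, where the directory path is a placeholder.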