Description
The OpenMMlo dataset contains 115,800 images across 1,000 classes, with between 5 and 1,280 samples per class. It also includes 18,000 images for out-of-distribution (OOD) detection. MiaoMiaoYang constructed the dataset by extending open-source datasets such as ImageNet-LT, iNaturalist2018, and Places-LT.
Use Cases
Benchmarking long-tailed image classification across the 1,000 classes, whose sample counts range from 5 to 1,280.
Evaluating the out-of-distribution detection capabilities of multimodal large language models (MLLMs) using the dedicated 18,000 OOD images.
Studying the impact of open-world, real-world data distributions on multi-modal model generalization.
Training or fine-tuning vision-language models on a dataset with imbalanced class frequencies.
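For the long-tailed benchmarking use case, overall accuracy can hide poor performance on rare classes, so reporting a per-class (macro) average alongside it is a common choice. The following is a minimal sketch in plain Python. The function names and the toy labels are illustrative assumptions, not part of the dataset or any official evaluation protocol.

```python
from collections import defaultdict

def per_class_accuracy(y_true, y_pred):
    """Return {class_label: accuracy} for every class present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}

def overall_and_macro_accuracy(y_true, y_pred):
    """Overall accuracy weights every sample equally;
    macro accuracy weights every class equally."""
    per_class = per_class_accuracy(y_true, y_pred)
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = sum(per_class.values()) / len(per_class)
    return overall, macro

# Toy labels (hypothetical): class 0 is frequent, class 1 is rare.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A classifier that always predicts the frequent class.
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

overall, macro = overall_and_macro_accuracy(y_true, y_pred)
print(f"overall={overall:.2f} macro={macro:.2f}")
```

In this toy case the always-frequent classifier scores well on overall accuracy but only half on the macro average, which is the gap a long-tailed benchmark is meant to expose.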
Strengths
115,800 total samples provide a substantial base for training and evaluation.
The long-tailed distribution over 1,000 classes (max 1,280, min 5 per class) mimics real-world data imbalance.
Includes a dedicated subset of 18,000 images for out-of-distribution detection tasks.
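The stated extremes give a sense of how skewed the distribution is: the largest class has 1,280 / 5 = 256 times as many samples as the smallest, against an average of 115,800 / 1,000 = 115.8 per class. The sketch below computes that ratio and shows one common way to compensate during training, inverse-frequency class weights. The weighting scheme and the three-class toy counts are assumptions for illustration; the dataset card does not prescribe any reweighting method.

```python
# Figures taken from the dataset description.
max_per_class = 1280
min_per_class = 5
imbalance_ratio = max_per_class / min_per_class  # largest class / smallest class

def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count), so that
    rare classes get proportionally larger weights."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {c: total / (num_classes * n) for c, n in class_counts.items()}

# Hypothetical three-class example spanning the stated extremes.
toy_counts = {"head": 1280, "mid": 80, "tail": 5}
weights = inverse_frequency_weights(toy_counts)

print(f"imbalance ratio: {imbalance_ratio:.0f}")
for name, w in weights.items():
    print(f"{name}: {w:.4f}")
```

With these weights, the ratio between the tail and head class weights equals the imbalance ratio, so each class contributes equally to a weighted loss in aggregate.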
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
Image counts are given for the main set and the OOD subset, but on-disk size and file formats are unknown.
Data may reflect biases inherent to the source datasets (ImageNet-LT, iNaturalist2018, Places-LT).
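Because file formats and field semantics are undocumented, a first step after download is usually to inspect what is actually on disk. The following is a rough sketch using only the Python standard library. The function name `summarize_download` is hypothetical, and the idea that folders might correspond to classes is an assumption to be checked, not a documented property of the dataset.

```python
from collections import Counter
from pathlib import Path

def summarize_download(root):
    """Count files by extension and by containing folder under `root`.

    Returns (extensions, per_folder), both collections.Counter objects.
    Folder keys are paths relative to `root`; files directly in `root`
    are counted under ".".
    """
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    # Lower-case the suffix so ".JPG" and ".jpg" are counted together.
    extensions = Counter(p.suffix.lower() or "<none>" for p in files)
    per_folder = Counter(p.parent.relative_to(root).as_posix() for p in files)
    return extensions, per_folder
```

Calling `summarize_download` on the directory where the dataset was unpacked and printing `extensions.most_common()` shows which formats dominate. Comparing the smallest and largest values in `per_folder` against the stated 5 and 1,280 is one way to test whether folders map to classes.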
Provenance
Source
Extended from open-source datasets ImageNet-LT, iNaturalist2018, and Places-LT.
Collection Method
Constructed by extending existing datasets.
Freshness
Last updated 2026-05-07 04:47:33; freshness should be verified.
License
Unknown; check for potential usage restrictions before download.