Puffin-4M is a large-scale, high-quality dataset containing 4 million samples for camera-centric multimodal understanding and generation. It integrates vision, language, and camera modalities to address the scarcity of benchmarks in spatial multimodal intelligence. The dataset was created by KangLiao and was last updated in January 2026.
The full dataset description is hosted externally; users must visit the provided Hugging Face page for complete details on structure, license, and access.