VisionFoundry-10K is a synthetic visual question answering (VQA) dataset containing 10,000 image-question-answer triples. The data was generated by the VisionFoundry pipeline: an LLM produces task-aware content, a text-to-image model synthesizes the corresponding images, and a multimodal verifier filters the resulting samples. It was authored by zlab-princeton and last updated on Hugging Face in April 2026.
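The generate, synthesize, and verify loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the VisionFoundry implementation. `VQASample`, `build_dataset`, and the three stub components (`fake_generate`, `fake_synth`, `fake_verify`) are hypothetical stand-ins for the LLM, the text-to-image model, and the multimodal verifier, whose actual interfaces and prompts are not described here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class VQASample:
    """One image-question-answer triple (hypothetical record layout)."""
    image: bytes  # placeholder type; the real image format is an assumption
    question: str
    answer: str


def build_dataset(
    generate_qa: Callable[[str], Tuple[str, str, str]],
    synthesize_image: Callable[[str], bytes],
    verify: Callable[[bytes, str, str], bool],
    tasks: List[str],
    target_size: int,
    max_attempts: int,
) -> List[VQASample]:
    """Hypothetical generate -> synthesize -> verify loop."""
    samples: List[VQASample] = []
    attempts = 0
    while len(samples) < target_size and attempts < max_attempts:
        task = tasks[attempts % len(tasks)]  # cycle through task types
        attempts += 1
        # 1. LLM step: a task-aware image prompt plus a question/answer pair.
        prompt, question, answer = generate_qa(task)
        # 2. Text-to-image step: render an image from the prompt.
        image = synthesize_image(prompt)
        # 3. Verifier step: keep the sample only if it passes the check.
        if verify(image, question, answer):
            samples.append(VQASample(image, question, answer))
    return samples


# Stub components for demonstration only (all hypothetical).
def fake_generate(task: str) -> Tuple[str, str, str]:
    return (f"an image for {task}", f"What does this image show ({task})?", task)


def fake_synth(prompt: str) -> bytes:
    return prompt.encode("utf-8")


def fake_verify(image: bytes, question: str, answer: str) -> bool:
    return answer != "reject"  # simulate the verifier discarding some samples


data = build_dataset(
    fake_generate, fake_synth, fake_verify,
    tasks=["count", "reject", "color"], target_size=4, max_attempts=10,
)
print([s.answer for s in data])
```

Generating candidates and then filtering them lets the verifier discard samples whose image does not support the answer. In this sketch the cost is extra attempts, which is why `max_attempts` bounds the loop.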
The license is unknown, which may restrict commercial or research use.