VisGym consists of 17 diverse, long-horizon environments for evaluating Vision-Language Models on interactive tasks. The dataset contains agent trajectories in which each action is conditioned on the history of past actions and observations, which tests how well a model handles long multimodal sequences.
The full description is hosted externally; users must visit the dataset page for complete details on structure, format, and access.
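
To make the idea of history-conditioned actions concrete, the following is a minimal sketch of how such a trajectory might be represented and sliced into model inputs. It is not the VisGym schema: the `Step` and `Trajectory` classes, their field names, the `build_context` helper, and the `max_history` parameter are all hypothetical and chosen only for illustration. The actual structure and format are documented on the dataset page.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    # Hypothetical fields: an image reference and the action taken at that step.
    observation: str
    action: str


@dataclass
class Trajectory:
    # Hypothetical container for one episode in one environment.
    env_name: str
    steps: List[Step] = field(default_factory=list)


def build_context(traj: Trajectory, t: int, max_history: int = 4) -> Dict:
    """Collect what a policy would see when choosing the action at step t.

    The history holds up to `max_history` earlier (observation, action)
    pairs; the action at step t itself is returned separately as the target.
    """
    if not 0 <= t < len(traj.steps):
        raise IndexError(f"step index {t} out of range")
    start = max(0, t - max_history)
    past = traj.steps[start:t]
    return {
        "history": [(s.observation, s.action) for s in past],
        "current_observation": traj.steps[t].observation,
        "target_action": traj.steps[t].action,
    }


# Illustrative data only; the file names and actions are made up.
traj = Trajectory(
    env_name="example_env",
    steps=[
        Step("obs_0.png", "left"),
        Step("obs_1.png", "up"),
        Step("obs_2.png", "right"),
    ],
)
ctx = build_context(traj, t=2)
```

In this sketch the context grows with the step index until it reaches `max_history`, which is one simple way to bound sequence length on long-horizon episodes. Whether to truncate history at all is a design choice left to the evaluator.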