This is a subset of the dataset introduced in the paper 'ROMA: Real-time Omni-Multimodal Assistant with Interactive Streaming Understanding'. It is intended for training multimodal models on streaming video understanding, with a focus on proactive interaction tasks. It was authored by EurekaTian and last updated on Hugging Face in January 2026.
The license is unknown; verify the terms of use before using the dataset.