OmniVideo-R1 is a dataset of between 100,000 and 1,000,000 preprocessed records for audio-visual reasoning, published by jankin123 in March 2026. The collection supports a two-stage training framework for multimodal models, with a focus on Query-Intensive (QI) grounding and modality attention.
The data is formatted as JSON and is intended for use with the OmniVideo-R1 framework; see arXiv:2602.05847 for implementation details.
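Since the records are distributed as JSON, a first step is usually to load a file and check which fields it contains. The sketch below is a minimal, hypothetical loader, not part of the OmniVideo-R1 framework. The listing does not say whether the files are JSON arrays or JSON Lines, nor does it give a schema, so the function names `load_records` and `summarize_keys` are illustrative, and the code accepts either layout and assumes no particular field names.

```python
import json
from pathlib import Path
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load records from a JSON file (array or single object) or a JSON Lines file.

    Assumption: the file layout is not documented in the listing,
    so both common layouts are handled.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return []
    try:
        # Case 1: the whole file is one JSON document.
        data = json.loads(text)
    except json.JSONDecodeError:
        # Case 2: one JSON object per non-empty line (JSON Lines).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]


def summarize_keys(records: list[dict[str, Any]]) -> dict[str, int]:
    """Count how many records contain each top-level key."""
    counts: dict[str, int] = {}
    for record in records:
        for key in record:
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Calling `summarize_keys(load_records("sample.json"))` on a downloaded file (the file name here is a placeholder) gives a quick view of the actual schema before wiring the data into a training pipeline. For files near the upper end of the stated size range, reading line by line instead of loading the whole file into memory may be preferable.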