A multimodal dataset of 10,000 to 100,000 records for cold-start supervised fine-tuning (SFT) on reasoning tasks, released by WaltonFuture in 2025. It accompanies the research paper 'Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start' and provides the initial training data for the paper's two-stage reinforcement learning pipeline.
Associated with arXiv paper 2505.22334. Using the dataset as intended requires understanding the two-stage reinforcement learning approach described in that paper.