Available on 1 platform
Paper Conclusion RL Training is a dataset for reinforcement learning (RL) training with the EasyR1 (verl) framework. The model being trained is Qwen3-VL-8B-Thinking. An external judge model (Qwen3-4B-Instruct-2507) scores each predicted conclusion against a reference conclusion produced by a 235B teacher model. The dataset was authored by SII-ChengqiLi and last updated on 2026-04-10.
The license is unknown; verify the terms of use before using the dataset.
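The judge-based scoring described above can be illustrated with a minimal sketch. It is not the dataset's or EasyR1's actual reward code: the prompt template, the 0 to 10 scale, and the names `JUDGE_PROMPT`, `parse_score`, `judge_reward`, and `stub_judge` are all assumptions made for illustration, and the judge model is replaced by a stub so the example runs without any model.

```python
import re
from typing import Callable

# Hypothetical prompt template; the real judge prompt is not given in the source.
JUDGE_PROMPT = (
    "Rate from 0 to 10 how well the predicted conclusion matches the reference.\n"
    "Reference: {reference}\n"
    "Predicted: {predicted}\n"
    "Answer with a single integer."
)


def parse_score(judge_output: str, max_score: int = 10) -> float:
    """Extract the first integer in the judge's reply and scale it to [0, 1]."""
    match = re.search(r"\d+", judge_output)
    if match is None:
        return 0.0  # assumption: an unparseable reply earns zero reward
    score = min(int(match.group()), max_score)  # clamp out-of-range scores
    return score / max_score


def judge_reward(predicted: str, reference: str, judge: Callable[[str], str]) -> float:
    """Build the judge prompt, query the judge, and return a reward in [0, 1]."""
    prompt = JUDGE_PROMPT.format(reference=reference, predicted=predicted)
    return parse_score(judge(prompt))


def stub_judge(prompt: str) -> str:
    """Stand-in for a real judge model call (assumption); always answers 7."""
    return "7"


reward = judge_reward(
    predicted="The method improves accuracy on two benchmarks.",
    reference="The proposed method improves accuracy on both benchmarks.",
    judge=stub_judge,
)
```

In a real setup, `stub_judge` would be replaced by a call to the judge model's inference endpoint, and the reference string would come from the teacher model's conclusion stored with each sample.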