Available on 1 platform
A research paper and associated materials introducing a novel algorithm for distributional off-policy evaluation in reinforcement learning. The work by Qi Kuang presents Deep Quantile Process regression-based Off-Policy Evaluation (DQPOPE), which estimates the full return distribution rather than just its expectation. The package includes a PDF, markdown files, and a ZIP archive, totaling 12.5 MB and last updated on May 18, 2026.
Licensed under CC-BY-4.0. At 12.5 MB, the package is small and consists mainly of the paper and code files rather than large-scale training data.