This dataset contains 50 million (query, document) text pairs sampled from 34 source subsets using balanced temperature sampling: pairs are allocated in proportion to the square root of each source's size, and any surplus is redistributed. It was created by author capemox and last updated on Hugging Face in May 2026.
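The square-root allocation with surplus redistribution described above can be sketched as follows. This is a minimal illustration, not the author's actual sampling code: the function name `allocate`, the `temperature` parameter (where 2.0 gives the square-root weighting), and the reading of "surplus" as the part of a source's share that exceeds its size are all assumptions, and the source sizes in the example are made up.

```python
def allocate(sizes, total, temperature=2.0):
    """Split `total` pairs across sources with weights size ** (1 / temperature).

    Hypothetical sketch: a source whose share exceeds its size is capped,
    and the surplus is redistributed among the remaining sources.
    """
    if total > sum(sizes.values()):
        raise ValueError("total exceeds the number of available pairs")

    alloc = {name: 0.0 for name in sizes}
    active = set(sizes)        # sources that still have spare capacity
    remaining = float(total)   # budget not yet assigned

    while active and remaining > 0:
        weights = {n: sizes[n] ** (1.0 / temperature) for n in active}
        norm = sum(weights.values())
        capped = set()
        spent = 0.0
        for n in active:
            share = remaining * weights[n] / norm
            room = sizes[n] - alloc[n]
            if share >= room:      # source exhausted: cap it at its size
                share = room
                capped.add(n)
            alloc[n] += share
            spent += share
        remaining -= spent         # whatever is left over is the surplus
        if not capped:             # nothing was capped, so the budget is spent
            break
        active -= capped           # redistribute only among uncapped sources
    return alloc


# Made-up sizes for three hypothetical sources.
example = allocate({"a": 100, "b": 10_000, "c": 1_000_000}, total=20_000)
```

In this example, source `a` is too small for its square-root share, so it is capped at 100 pairs and the rest of the budget is split between `b` and `c` in a 1:10 ratio, matching the ratio of their square roots. The sketch returns fractional quotas; rounding them to whole pairs is left out.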
The license is unknown; verify the terms of use before using the dataset.