Available on 1 platform
Open-Orca's SlimOrca Dedup is a dataset of about 363,000 unique instruction-response examples derived from the SlimOrca collection. It was created by removing RLHF instances and then deduplicating the remaining examples using MinHash and Jaccard similarity. The dataset was last updated on Hugging Face on May 19, 2025.
Demo models linked in the description were trained on the full SlimOrca dataset, not this deduplicated version.
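The MinHash and Jaccard deduplication mentioned in the description can be illustrated with a minimal sketch. This is not the Open-Orca pipeline: the shingle size, signature length, similarity threshold, and every function name here are assumptions chosen for illustration, and the description does not say which text fields were compared or what cutoff was used.

```python
import hashlib
import re

NUM_HASHES = 64    # assumed signature length
SHINGLE_SIZE = 3   # assumed word n-gram size
THRESHOLD = 0.8    # assumed similarity cutoff, not taken from the dataset card


def shingles(text, size=SHINGLE_SIZE):
    """Split text into a set of lowercase word n-grams."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {" ".join(words)}
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=NUM_HASHES):
    """Keep the minimum hash value per seeded hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingle_set
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def deduplicate(examples, threshold=THRESHOLD):
    """Keep the first example of each near-duplicate group (quadratic scan)."""
    kept = []  # tuples of (text, shingle set, signature)
    for text in examples:
        sh = shingles(text)
        sig = minhash_signature(sh)
        is_duplicate = any(
            estimated_similarity(sig, kept_sig) >= threshold
            and jaccard(sh, kept_sh) >= threshold  # confirm with exact Jaccard
            for _, kept_sh, kept_sig in kept
        )
        if not is_duplicate:
            kept.append((text, sh, sig))
    return [text for text, _, _ in kept]


examples = [
    "Explain how photosynthesis works in green plants and why it matters.",
    "explain how photosynthesis works in green plants, and why it matters!",
    "Write a Python function that reverses a string.",
]
unique = deduplicate(examples)
print(len(unique))
```

The pairwise scan is only practical for small inputs. At the scale of hundreds of thousands of examples, a pipeline would more plausibly bucket signatures with locality-sensitive hashing so that only candidate pairs are compared.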