This dataset contains 300,466 high-perplexity text samples filtered from the OpenHermes 2.5 dataset by Malum0x in March 2026. It consists of the top 30% of records ranked by cross-entropy loss, as scored by Qwen2.5-3B-Instruct.
Samples were selected using the 70th-percentile loss as the threshold. High perplexity can reflect either genuinely complex text or noisy data, so users should check which applies for their specific use case.
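The selection rule described above can be illustrated with a minimal sketch. This is not the author's actual pipeline: the record layout (`id`, `loss`), the helper names `percentile` and `select_high_loss`, the linear-interpolation percentile, and the strict `>` comparison are all assumptions made for illustration. The only parts taken from the description are the 70th-percentile threshold and the use of cross-entropy loss, which relates to perplexity as `exp(loss)` when the loss is a mean per-token value in nats.

```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def select_high_loss(records, q=70.0):
    """Keep records whose loss lies above the q-th percentile of all losses.

    `records` is a hypothetical list of dicts with a precomputed "loss" field;
    how the real dataset stored its scores is not stated in the description.
    """
    threshold = percentile([r["loss"] for r in records], q)
    kept = [r for r in records if r["loss"] > threshold]
    return threshold, kept


# Toy data (made up): ten records with losses 0.1, 0.2, ..., 1.0.
toy = [{"id": i, "loss": 0.1 * i} for i in range(1, 11)]
threshold, kept = select_high_loss(toy)

# Perplexity of each kept record, assuming loss is mean per-token cross-entropy in nats.
perplexities = [math.exp(r["loss"]) for r in kept]
```

On the toy data this keeps the three highest-loss records, i.e. the top 30%. With ties or a different percentile convention the kept fraction can differ slightly from 30%, which is one reason to inspect the threshold value directly before relying on it.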