Available on 1 platform
This dataset features 200,000 text documents from The Pile, balanced for toxicity. It was created by scoring a 7-million-document subset with the Perspective API and selecting the 100,000 most toxic and the 100,000 least toxic documents. The dataset was authored by tomekkorbak and last updated in April 2022.
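The selection step described above can be illustrated with a minimal sketch. It assumes each document already carries a toxicity score; the field names `text` and `toxicity` and the function `select_extremes` are hypothetical and not taken from the dataset or the author's code.

```python
import heapq
from typing import Dict, List, Tuple

Doc = Dict[str, object]  # hypothetical record: {"text": str, "toxicity": float}


def select_extremes(docs: List[Doc], k: int) -> Tuple[List[Doc], List[Doc]]:
    """Return the k highest-scoring and k lowest-scoring documents.

    Illustrative only: the actual pipeline used for the dataset is not
    documented here, and the "toxicity" key is an assumed field name.
    """
    if 2 * k > len(docs):
        raise ValueError("k is too large: the two selections would overlap")

    def score(doc: Doc) -> float:
        return float(doc["toxicity"])

    # heapq avoids sorting the full collection when k is much smaller than len(docs).
    most_toxic = heapq.nlargest(k, docs, key=score)
    least_toxic = heapq.nsmallest(k, docs, key=score)
    return most_toxic, least_toxic


# Toy usage with made-up scores (not real Perspective API output).
toy_docs = [
    {"text": "a", "toxicity": 0.1},
    {"text": "b", "toxicity": 0.9},
    {"text": "c", "toxicity": 0.5},
    {"text": "d", "toxicity": 0.3},
    {"text": "e", "toxicity": 0.7},
    {"text": "f", "toxicity": 0.2},
]
toxic_half, nontoxic_half = select_extremes(toy_docs, k=2)
```

For the dataset as described, `k` would be 100,000 and the input would be the 7-million-document scored subset.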
The dataset is constructed from two separate Hugging Face datasets ('tomekkorbak/pile-toxic-chunk-0' and 'tomekkorbak/pile-nontoxic-chunk-0'); users should verify the join or combination method. License information is unknown and should be checked before use.
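Since the combination method is not documented, the following is only one plausible way to merge the two halves into a single labeled collection. The function `combine_balanced`, the `label` field, and the label strings are assumptions for illustration; the actual column names in the two Hugging Face datasets should be checked first.

```python
import random
from typing import Dict, List


def combine_balanced(
    toxic_texts: List[str], nontoxic_texts: List[str], seed: int = 0
) -> List[Dict[str, str]]:
    """Tag each document with its source half, then shuffle deterministically.

    Hypothetical helper: the "label" key and its values are not part of
    the original datasets.
    """
    combined = [{"text": t, "label": "toxic"} for t in toxic_texts]
    combined += [{"text": t, "label": "nontoxic"} for t in nontoxic_texts]
    # A seeded shuffle keeps the mix reproducible across runs.
    random.Random(seed).shuffle(combined)
    return combined


# Toy usage with placeholder strings standing in for real documents.
merged = combine_balanced(["t1", "t2"], ["n1", "n2"])
```

With the Hugging Face `datasets` library, the equivalent step would typically load each repository with `load_dataset` and join them with `concatenate_datasets`, provided both share the same columns.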