A curated subset of The Pile focused on toxic text, with examples balanced for training and evaluation. It was created by the researcher tomekkorbak and uploaded to Hugging Face in June 2022, as one of a series of balanced subsets derived from the 825 GB Pile corpus.
The license is unknown, so users should verify the terms before any commercial use. The data is stored in Parquet format, so reading it requires a Parquet-capable library such as pyarrow or pandas.