A filtered subset of the Pile dataset focused on text annotated with toxicity labels, curated by the researcher tomekkorbak and hosted on Hugging Face. Its size category puts it on the order of 100,000 text samples, and it was last updated in April 2022. The data is intended for training and evaluating language models on toxic content.
The license terms are unspecified, so users must verify permissible use themselves. The dataset is stored in Parquet format, which can be loaded with libraries such as Polars or pandas (pandas needs a Parquet engine such as pyarrow installed).