Jigsaw Toxicity Pred

Name: Jigsaw Toxicity Pred
Creator: google
Published: 2022-03-02T23:29:22
Keywords: Source Datasets: original, Task IDs: multi-label classification, Language: en, License: CC0 1.0, Size Category: 100K < n < 1M, Annotations Creators: crowdsourced, Region: US, Language Creators: other, Task Categories: text classification, Multilinguality: monolingual

Description

159,571 Wikipedia talk page comments labeled across six distinct categories of toxicity including toxic, severe_toxic, and identity_hate. Each record contains raw text from human discussions paired with binary indicators of offensive behavior as determined by human raters.

Use Cases

Train a multi-label text classifier with comment_text as the input and the six toxicity category columns as targets
Analyze linguistic markers of aggression using the identity_hate and threat labels
Develop content moderation algorithms that filter comments based on the obscene and insult labels
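
As a rough illustration of the first use case, the sketch below fits one binary Naive Bayes model per label (a one-vs-rest setup) using only the Python standard library. The toy rows, the tokenizer, and the helper names train and predict are assumptions made for illustration; only comment_text and the six label column names come from the dataset description. A real baseline would train on the full data with an established library.

```python
import math
import re
from collections import Counter

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def tokenize(text):
    # Lowercase word tokens; a deliberately simple, hypothetical tokenizer.
    return re.findall(r"[a-z']+", text.lower())

def train(rows):
    """Fit one binary multinomial Naive Bayes model per label (one-vs-rest)."""
    vocab = {tok for row in rows for tok in tokenize(row["comment_text"])}
    models = {}
    for label in LABELS:
        counts = {0: Counter(), 1: Counter()}  # token counts per class
        docs = {0: 0, 1: 0}                    # document counts per class
        for row in rows:
            y = int(row[label])
            docs[y] += 1
            counts[y].update(tokenize(row["comment_text"]))
        models[label] = (counts, docs)
    return vocab, models

def predict(text, vocab, models):
    """Return a 0/1 decision for each of the six labels."""
    tokens = [t for t in tokenize(text) if t in vocab]
    result = {}
    for label, (counts, docs) in models.items():
        scores = {}
        for y in (0, 1):
            total = sum(counts[y].values())
            # Add-one smoothing on the class prior and the token likelihoods.
            score = math.log((docs[y] + 1) / (docs[0] + docs[1] + 2))
            for t in tokens:
                score += math.log((counts[y][t] + 1) / (total + len(vocab)))
            scores[y] = score
        result[label] = int(scores[1] > scores[0])
    return result

def make_row(text, *positive):
    # Build a row shaped like the dataset: comment_text plus six 0/1 columns.
    return {"comment_text": text, **{l: int(l in positive) for l in LABELS}}

# Invented toy rows, not taken from the dataset.
rows = [
    make_row("thanks for fixing the citation"),
    make_row("please see the talk page for sources"),
    make_row("you are a stupid idiot", "toxic", "insult"),
    make_row("stupid idiot edit, go away", "toxic", "insult"),
    make_row("i will hurt you", "toxic", "threat"),
]

vocab, models = train(rows)
pred = predict("what a stupid idiot", vocab, models)
print(pred)
```

Because each label gets its own model, a comment can receive several labels at once, which matches the multi-label structure of the six columns.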

Strengths

159,571 rows of Wikipedia comment text
Six binary label columns: toxic, severe_toxic, obscene, threat, insult, and identity_hate
Human-labeled ground truth for subjective text classification
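
A quick way to inspect the six binary columns is to count positives per label and the number of comments with no label at all. The sketch below assumes the data is available as CSV with a comment_text column and the six label columns; the inline sample rows and the helper name label_summary are hypothetical, used here so the example runs without the real file.

```python
import csv
import io

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Invented sample in the assumed CSV layout; not taken from the dataset.
SAMPLE_CSV = """comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
"Thanks, that source looks fine.",0,0,0,0,0,0
"You are an idiot.",1,0,0,0,1,0
"Stop or I will hurt you.",1,0,0,1,0,0
"See the talk page archive.",0,0,0,0,0,0
"""

def label_summary(rows):
    """Return (row count, positives per label, rows with all six labels 0)."""
    rows = list(rows)
    positives = {label: sum(int(r[label]) for r in rows) for label in LABELS}
    unlabeled = sum(1 for r in rows if not any(int(r[l]) for l in LABELS))
    return len(rows), positives, unlabeled

total, positives, unlabeled = label_summary(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(total, positives, unlabeled)
```

On the real data, the same summary would show how unevenly the labels are distributed, which matters when choosing evaluation metrics for a multi-label classifier.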

Quality Score

Overall: D (29)
Description: 20
Source: 41
Reputation: 29
Access: 22

Community

970 downloads

32 likes

0 views

Dataset Info

Author: google
Created: Mar 2, 2022
Updated: Jan 18, 2024
Last synced: Jul 22, 2026
