Toxicity Detection in Online Comments With and Without Context
by John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos
Format: ARFF
Description
John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, and Ion Androutsopoulos created this dataset for binary classification of toxic online comments. Comments carry two sets of toxicity annotations: one collected with the surrounding context (such as the parent comment and the discussion topic) and one collected without it. The dataset is hosted on OpenML and originates from research documented in an arXiv paper.
Use Cases
Training binary classifiers to detect toxic comments, using the provided 'label' column as the target.
Comparing model performance on toxicity detection with and without contextual information.
Studying the impact of discussion context and parent comments on toxicity perception.
Benchmarking content moderation algorithms for online platforms.
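The first two use cases, training a binary classifier and comparing it with and without context, can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' method: it swaps in a multinomial Naive Bayes baseline purely for brevity, and build_input with its comment, parent and topic arguments are hypothetical names, because the dataset's columns are not documented. Context is modelled in the simplest possible way, by prepending the topic and parent text to the comment.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence


def build_input(comment: str, parent: Optional[str] = None, topic: Optional[str] = None) -> str:
    """Prefix the target comment with whatever context is available.

    Hypothetical argument names: map them to the real columns after download.
    Passing only `comment` gives the out-of-context variant.
    """
    parts = [field for field in (topic, parent) if field]
    parts.append(comment)
    return " ".join(parts)


def tokenize(text: str) -> List[str]:
    # Deliberately crude: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing for labels 0/1."""

    def fit(self, texts: Sequence[str], labels: Sequence[int]) -> "NaiveBayes":
        self.word_counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
        self.doc_counts: Counter = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text: str) -> int:
        total_docs = sum(self.doc_counts.values())
        scores: Dict[int, float] = {}
        for label in (0, 1):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Add-one smoothed log prior, so a class missing from training is not log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for token in tokenize(text):
                if token in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return max(scores, key=lambda k: scores[k])


def accuracy(model: NaiveBayes, texts: Sequence[str], labels: Sequence[int]) -> float:
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)
```

To compare the two settings, fit one model on build_input(comment) and another on build_input(comment, parent, topic) over the same training split, then compare their accuracy on the same held-out rows. A serious benchmark would use a stronger model; the sketch only shows where context enters the input.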
Strengths
Provides dual annotation sets for in-context and out-of-context analysis.
Includes contextual features like parent comments and discussion topics.
Target variable is clearly defined as a binary label (1=toxic, 0=non-toxic).
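Because comments are annotated both in and out of context, a natural first analysis is to count how often the label changes once context is shown. The sketch below assumes, hypothetically, that each annotation set can be loaded as a mapping from a shared comment identifier to its 0/1 label; that identifier and the function name context_flip_stats are illustrative, not fields confirmed by the dataset.

```python
from typing import Dict, Hashable


def context_flip_stats(with_ctx: Dict[Hashable, int],
                       without_ctx: Dict[Hashable, int]) -> Dict[str, float]:
    """Compare binary labels (1=toxic, 0=non-toxic) for comments in both sets.

    Assumes a hypothetical shared comment id as the dictionary key.
    """
    shared = with_ctx.keys() & without_ctx.keys()
    if not shared:
        raise ValueError("the two annotation sets have no comment ids in common")
    # Count label changes in each direction when context is added.
    to_toxic = sum(1 for k in shared if without_ctx[k] == 0 and with_ctx[k] == 1)
    to_clean = sum(1 for k in shared if without_ctx[k] == 1 and with_ctx[k] == 0)
    n = len(shared)
    return {
        "n": n,
        "agreement": (n - to_toxic - to_clean) / n,
        "became_toxic_with_context": to_toxic / n,
        "became_non_toxic_with_context": to_clean / n,
    }
```

Comments present in only one set are ignored, so the "n" field shows how many pairs the rates are based on.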
Limitations
Row count and file size are not reported, which makes it harder to judge suitability before download.
Column-level documentation is absent; field semantics must be inferred after download.
The last update date is not reported, so freshness cannot be verified.
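Since field semantics have to be inferred after download, a reasonable first step is to list the attributes declared in the ARFF header. The sketch below is a minimal standard-library reader written for illustration (read_arff_header is a hypothetical name, and the sample attribute names are invented); it handles quoted attribute names and nominal types such as {0, 1} but is not a full ARFF parser, so a dedicated ARFF library is preferable for loading the data section itself.

```python
import io
import shlex
from typing import List, TextIO, Tuple


def read_arff_header(stream: TextIO) -> List[Tuple[str, str]]:
    """Return (name, type) pairs declared in an ARFF header.

    Stops at the @data line, so the data section is never read.
    """
    attributes: List[Tuple[str, str]] = []
    for raw in stream:
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith("%"):
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@data":
            break
        if keyword == "@attribute":
            # shlex keeps quoted names such as 'parent comment' together.
            tokens = shlex.split(line)
            # Nominal types like {0, 1} may contain spaces, so re-join the rest.
            attributes.append((tokens[1], " ".join(tokens[2:])))
    return attributes


if __name__ == "__main__":
    # Invented sample header; the real file's attributes are undocumented.
    sample = io.StringIO(
        "% hypothetical header\n"
        "@relation toxicity\n"
        "@attribute text string\n"
        "@attribute 'parent comment' string\n"
        "@attribute label {0, 1}\n"
        "@data\n"
        "'thanks','',0\n"
    )
    print(read_arff_header(sample))
    # [('text', 'string'), ('parent comment', 'string'), ('label', '{0, 1}')]
```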
Provenance
Source
Athens University
Collection Method
Annotated online comments; the source platform is not documented here (possibly forum or social media discussion threads).
Time Range
Not specified
Freshness
Not specified
Geography
Not specified
License
Attributed to Athens University; verify the specific terms before use.