Name: Toxicity Detection in Online Comments With and Without Context
Creator: John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos
License: Athens University
Keywords: Toxicity Detection, Computational Linguistics, Text Classification, Text, Content Moderation

Description

John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, and Ion Androutsopoulos created this dataset for binary classification of toxic online comments. The data includes annotations for comments both with and without their surrounding context, such as parent comments and discussion topics. The dataset is hosted on OpenML and originates from research documented in an arXiv paper.

Use Cases

Training binary classifiers to detect toxic comments, using the provided 'label' field as the target.
Comparing model performance on toxicity detection with and without contextual information.
Studying the impact of discussion context and parent comments on toxicity perception.
Benchmarking content moderation algorithms for online platforms.
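The with/without-context comparison can be sketched with a deliberately simple baseline. The same classifier is trained twice, once on the comment alone and once with the parent comment prepended, and F1 on the toxic class is compared. This is a minimal Python sketch, not the method from the original paper. Multinomial naive Bayes is a swapped-in baseline. The field names `comment_text` and `parent_text` are hypothetical, since the actual column names are undocumented. Labels are assumed to be the integers 0 and 1.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def build_input(row, use_context):
    # "comment_text" and "parent_text" are HYPOTHETICAL field names; the real
    # columns must be identified after download.
    if use_context and row.get("parent_text"):
        return row["parent_text"] + " " + row["comment_text"]
    return row["comment_text"]


def train_nb(texts, labels):
    """Fit multinomial naive Bayes with add-one smoothing (labels: ints 0/1)."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict_nb(model, text):
    """Return the label (0 or 1) with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = 0, -math.inf
    for label in (0, 1):
        if doc_counts[label] == 0:
            continue  # class absent from the training data
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f1_toxic(y_true, y_pred):
    """F1 score for the toxic class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def compare_context(train_rows, test_rows):
    """Train and score the same baseline without and with parent context."""
    results = {}
    for use_context in (False, True):
        model = train_nb(
            [build_input(r, use_context) for r in train_rows],
            [r["label"] for r in train_rows],
        )
        preds = [predict_nb(model, build_input(r, use_context)) for r in test_rows]
        key = "with_context" if use_context else "without_context"
        results[key] = f1_toxic([r["label"] for r in test_rows], preds)
    return results
```

With real data, `train_rows` and `test_rows` would be disjoint splits. Toxic comments are often the minority class, so a threshold-free metric such as ROC AUC may be more informative than F1 at a fixed decision rule.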

Strengths

Provides dual annotation sets for in-context and out-of-context analysis.
Includes contextual features like parent comments and discussion topics.
Target variable is clearly defined as a binary label (1=toxic, 0=non-toxic).
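Because the target is binary, a quick sanity check after loading is to confirm that every label is 0 or 1. It also helps to inspect the class balance, which affects the choice of evaluation metric. A minimal Python sketch follows. It assumes the labels have already been extracted into a list. `check_binary_labels` and `toxic_rate` are hypothetical helpers, not part of any dataset loader.

```python
from collections import Counter


def check_binary_labels(labels):
    """Count toxic (1) and non-toxic (0) labels, rejecting anything else."""
    counts = Counter()
    for value in labels:
        # Loaders may return labels as strings ("0"/"1") rather than ints,
        # so compare on the normalized string form.
        normalized = str(value).strip()
        if normalized not in ("0", "1"):
            raise ValueError(f"unexpected label value: {value!r}")
        counts[int(normalized)] += 1
    return {"non_toxic": counts[0], "toxic": counts[1]}


def toxic_rate(labels):
    """Fraction of examples labeled toxic (0.0 for an empty list)."""
    summary = check_binary_labels(labels)
    total = summary["toxic"] + summary["non_toxic"]
    return summary["toxic"] / total if total else 0.0
```

For example, `toxic_rate([0, 0, 0, 1])` returns 0.25.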

Limitations

Row count and dataset size are unknown, which makes it harder to assess suitability before download.
Column-level documentation is absent; field semantics must be inferred after download.
Last update date is unknown, so freshness cannot be verified.
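Since the column meanings are undocumented, a first step after download is to profile each field. A minimal Python sketch follows. It assumes the data has been exported to CSV and read with the standard-library `csv` module. `profile_columns` is a hypothetical helper. A low distinct count only hints that a field is categorical (for example, the label). It does not prove it.

```python
import csv
import io


def profile_columns(rows, max_examples=3):
    """Summarize each field: non-empty count, distinct values, a few examples."""
    stats = {}
    for row in rows:
        for name, value in row.items():
            info = stats.setdefault(name, {"non_empty": 0, "values": set(), "examples": []})
            if value in (None, ""):
                continue  # treat missing and empty cells alike
            info["non_empty"] += 1
            if value not in info["values"]:
                info["values"].add(value)
                if len(info["examples"]) < max_examples:
                    info["examples"].append(value)
    return {
        name: {"non_empty": i["non_empty"], "distinct": len(i["values"]), "examples": i["examples"]}
        for name, i in stats.items()
    }


if __name__ == "__main__":
    # Tiny inline sample; the column names here are hypothetical.
    sample = "comment_text,label\nhello,0\nyou idiot,1\n"
    for name, summary in profile_columns(csv.DictReader(io.StringIO(sample))).items():
        print(name, summary)
```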

Provenance

Source: Athens University
Collection Method: Annotations of online comments, likely from forum or social media threads.
Time Range: not specified
Freshness: not specified
Geography: not specified

License is attributed to Athens University; users should verify specific terms.
