Thai Toxicity Tweet

Name: Thai Toxicity Tweet
Creator: tmu-nlp
Published: 2022-03-02T23:29:22
Keywords: Source Datasets: original; Size Categories: 1K<n<10K; Language: th; Language Creators: found; Region: us; Task Categories: text classification; Multilinguality: monolingual; Task IDs: sentiment classification; Annotations Creators: expert generated

Description

The corpus contains 3,300 human-annotated Thai tweets: 2,027 labeled toxic and 1,273 labeled non-toxic. Labels come from three annotators guided by a 44-word dictionary. For the 506 tweets that are no longer publicly available, the text field holds the placeholder TWEET_NOT_FOUND instead of the tweet text.

Use Cases

Train a binary classification model to distinguish toxic from non-toxic content using the tweet_text column and the human labels
Evaluate the accuracy of keyword-based moderation systems by comparing the 44-word dictionary against actual toxicity labels
Research linguistic features of sarcasm and word sense ambiguity in Thai text that lead to annotator disagreement
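The keyword-baseline use case above can be sketched as follows. This is a minimal illustration, not the dataset authors' method: the label field name is_toxic, the toy rows, and the single keyword "badword" are all assumptions made for the example (the actual 44-word dictionary is not reproduced here). Matching is by plain substring, an assumed simplification chosen because Thai text is not whitespace-delimited. Rows whose text is the TWEET_NOT_FOUND placeholder are skipped, since they carry no text to match.

```python
PLACEHOLDER = "TWEET_NOT_FOUND"


def keyword_flag(text, keywords):
    """Return True if any dictionary entry occurs as a substring of the text."""
    return any(word in text for word in keywords)


def evaluate_keyword_baseline(rows, keywords,
                              text_field="tweet_text",
                              label_field="is_toxic"):  # label_field is an assumed name
    """Compare keyword matches against human toxicity labels."""
    tp = fp = fn = tn = skipped = 0
    for row in rows:
        text = row[text_field]
        if text == PLACEHOLDER:
            skipped += 1  # tweet text is unavailable, nothing to match
            continue
        predicted = keyword_flag(text, keywords)
        actual = bool(row[label_field])
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        "skipped": skipped,
    }


# Toy rows standing in for the real data (hypothetical content).
toy_rows = [
    {"tweet_text": "you are a badword", "is_toxic": 1},
    {"tweet_text": "badword is my dog's name", "is_toxic": 0},
    {"tweet_text": "oh great, another genius", "is_toxic": 1},
    {"tweet_text": "nice weather today", "is_toxic": 0},
    {"tweet_text": "TWEET_NOT_FOUND", "is_toxic": 1},
]
toy_keywords = ["badword"]  # placeholder, not an entry from the real dictionary

result = evaluate_keyword_baseline(toy_rows, toy_keywords)
print(result)
```

The second and third toy rows mimic the two failure modes named in the use cases: a dictionary word used in a non-toxic sense, and sarcasm that contains no dictionary word at all.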

Strengths

3,300 total tweets labeled by three human annotators for binary toxicity
Contains 2,027 toxic and 1,273 non-toxic samples
Marks the 506 unavailable tweets with the string TWEET_NOT_FOUND in the tweet_text column
Annotators were guided by a 44-word dictionary
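Because unavailable tweets are marked with a fixed string, they are easy to filter out before training or analysis. The sketch below assumes rows are dictionaries with a tweet_text field (named in this listing) and a label field called is_toxic (an assumed name); the toy rows are invented for illustration. If the 506 placeholder rows are counted among the 3,300 tweets, filtering would leave 2,794 rows with usable text.

```python
from collections import Counter

PLACEHOLDER = "TWEET_NOT_FOUND"


def drop_missing(rows, text_field="tweet_text"):
    """Keep only rows whose tweet text is still available."""
    return [row for row in rows if row[text_field] != PLACEHOLDER]


def class_counts(rows, label_field="is_toxic"):  # label_field is an assumed name
    """Count toxic and non-toxic rows."""
    counts = Counter(bool(row[label_field]) for row in rows)
    return {"toxic": counts[True], "non_toxic": counts[False]}


# Toy rows standing in for the real data (hypothetical content).
sample_rows = [
    {"tweet_text": "first example tweet", "is_toxic": 1},
    {"tweet_text": "TWEET_NOT_FOUND", "is_toxic": 1},
    {"tweet_text": "second example tweet", "is_toxic": 0},
    {"tweet_text": "TWEET_NOT_FOUND", "is_toxic": 0},
    {"tweet_text": "third example tweet", "is_toxic": 1},
]

usable_rows = drop_missing(sample_rows)
balance = class_counts(usable_rows)
print(len(usable_rows), balance)
```

Recomputing the class balance after filtering is worthwhile because the toxic to non-toxic ratio among the remaining rows may differ from the 2,027 to 1,273 split reported for the full corpus.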

Quality Score

D39

Description: 42
Source: 49
Reputation: 23
Access: 22

Community

92 downloads
10 likes
0 views

Dataset Info

Author: tmu-nlp
Created: Mar 2, 2022
Updated: Jan 18, 2024
Last synced: Jun 3, 2026
