TweepFake Synthetic: Machine-Generated Tweets for Controlled Detection Studies

Name: TweepFake Synthetic: Machine-Generated Tweets for Controlled Detection Studies
Creator: KirillNik
Published: 2026-06-04T16:36:16
Keywords: Social Media, NLP Research, Text, Natural Language Processing, LLM-Generated Content, Synthetic Text, Synthetic

Description

KirillNik created a corpus of synthetic tweets generated by three open or API-based large language models. The dataset is designed for controlled-variable studies on detecting machine-generated social-media text, with topics extracted from real human tweets. It was last updated on June 4, 2026.

Use Cases

Training and evaluating detectors of machine-generated text on a controlled-variable corpus.
Studying stylistic differences between the outputs of multiple LLMs under the described prompting strategies.
Analyzing topic-conditioned text generation by holding the subject matter constant across model outputs.
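
If the corpus is used to train or evaluate a detector, one precaution that follows from the shared-topic design is to keep every row for a given topic in the same split, so that a topic seen in training never reappears in testing. The sketch below is a minimal illustration, not part of the dataset's tooling. The column name `topic` is an assumption, since the card does not document any fields, and `split_by_topic` is a hypothetical helper.

```python
import hashlib


def split_by_topic(rows, topic_key="topic", test_fraction=0.2):
    """Split rows so that all rows sharing a topic land in the same split.

    `rows` is an iterable of dicts. `topic_key` is a hypothetical column
    name; replace it with the real field once the schema is known.
    """
    train, test = [], []
    for row in rows:
        # Hash the topic so the assignment is stable across runs and
        # does not depend on row order.
        digest = hashlib.sha256(str(row[topic_key]).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        (test if bucket < test_fraction else train).append(row)
    return train, test
```

Because the assignment is made per topic rather than per row, the realized test share only approximates `test_fraction`, and the approximation is rough when the number of distinct topics is small.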

Strengths

Dataset design holds topics constant across models and prompts, allowing attribution of output differences to the model and prompt.
Built specifically as a controlled-variable corpus for studying machine-generated text detection.
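
The claim that topics are held constant across models can be checked after download. The following is a minimal sketch under stated assumptions: the column names `topic` and `model` are hypothetical, because the card does not list the actual fields, and `topics_missing_models` is an illustrative helper rather than anything shipped with the dataset.

```python
from collections import defaultdict


def topics_missing_models(rows, topic_key="topic", model_key="model"):
    """Return {topic: [models with no row for that topic]}.

    An empty result means every topic has at least one row from every
    model that appears anywhere in `rows`. Both key names are assumed.
    """
    models_by_topic = defaultdict(set)
    all_models = set()
    for row in rows:
        models_by_topic[row[topic_key]].add(row[model_key])
        all_models.add(row[model_key])
    return {
        topic: sorted(all_models - seen)
        for topic, seen in models_by_topic.items()
        if seen != all_models
    }
```

A non-empty result would indicate topics where the design is unbalanced, which matters when output differences are to be attributed to the model or prompt alone.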

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is not reported, which makes it hard to judge in advance whether the corpus is large enough for a given study.
Description metadata is limited; actual data quality requires manual inspection after download.
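
Since neither the fields nor the row count are documented, a first pass over the downloaded data might simply count rows and summarize each column. The sketch below assumes the rows have already been loaded as dicts (for example with `csv.DictReader` or by parsing JSON lines); the file format is not stated on the card, and `profile_rows` is a hypothetical helper.

```python
from collections import Counter


def profile_rows(rows):
    """Count rows and, per column, how often it is present or empty.

    `rows` is an iterable of dicts. A value counts as empty when it is
    None or the empty string. The first non-empty value seen for each
    column is kept as an example to help infer field semantics.
    """
    row_count = 0
    present = Counter()
    empty = Counter()
    examples = {}
    for row in rows:
        row_count += 1
        for key, value in row.items():
            present[key] += 1
            if value is None or value == "":
                empty[key] += 1
            else:
                examples.setdefault(key, value)
    columns = {
        key: {"present": present[key], "empty": empty[key], "example": examples.get(key)}
        for key in present
    }
    return {"row_count": row_count, "columns": columns}
```

The example values are only a starting point for inferring what each field means; they do not replace manual inspection of data quality.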

Provenance

Source: Hugging Face
Collection Method: Synthetic tweets generated by three open/API LLMs, conditioned on topics from real human tweets.
Freshness: Last updated 2026-06-04 16:46:04; verify against the source before use.

Quality Score

Overall: D37
Description: 42
Source: 36
Reputation: 39
Access: 26

Community

Downloads: 10
Likes: 1
Views: 0

Dataset Info

Author: KirillNik
Created: Jun 4, 2026
Updated: Jun 4, 2026
Last synced: Jun 11, 2026
