Uncivil Reddit is a text dataset hosted on the figshare platform and published under a CC-BY-4.0 license. The dataset is 242.2 MB in size and is available in CSV and R file formats. It was last updated on May 4, 2026, by an author listed as Anonymous Anon.
Use Cases
- Training models to detect uncivil or toxic language in online comments (inferred from domain, verify after download)
- Analyzing discourse patterns and community interactions on social media platforms (inferred from domain, verify after download)
- Benchmarking natural language processing tools for sentiment or conflict analysis (inferred from domain, verify after download)
Strengths
- Published on the figshare platform with a permissive CC-BY-4.0 license.
- Dataset size is 242.2 MB, which suggests a medium-scale collection, although the row count is not stated.
- Available in both CSV and R formats, so it can be read with general-purpose tools or directly in R.
Limitations
- Metadata is minimal; actual content requires verification after download.
- Row count and column definitions are unknown, limiting suitability assessment.
- Data may reflect temporal or source bias inherent to its collection from Reddit.
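Because the row count and column definitions are not documented, a quick inspection after download can help fill in the gaps. The following is a minimal sketch, not the publisher's method: it assumes the CSV has a header row and is UTF-8 encoded, and the file name `uncivil_reddit.csv` and the helper `summarize_csv` are hypothetical placeholders, since the actual file names are not listed in the metadata.

```python
import csv
from pathlib import Path


def summarize_csv(path, sample_rows=5):
    """Return column names, data-row count, and a few sample rows from a CSV file."""
    # Assumption: the file is UTF-8 encoded and its first row is a header.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows at all
        sample = []
        row_count = 0
        for row in reader:
            row_count += 1
            if len(sample) < sample_rows:
                sample.append(row)
    return {"columns": header, "row_count": row_count, "sample": sample}


if __name__ == "__main__":
    # Hypothetical file name; replace it with the actual CSV from the download.
    csv_path = Path("uncivil_reddit.csv")
    if csv_path.exists():
        summary = summarize_csv(csv_path)
        print("Columns:", summary["columns"])
        print("Rows:", summary["row_count"])
        for row in summary["sample"]:
            print(row)
    else:
        print(f"{csv_path} not found; download the dataset first.")
```

The sketch streams the file row by row rather than loading it whole, which keeps memory use low for a file of roughly 242 MB. The `csv` module is used instead of splitting on newlines because comment text may contain quoted line breaks.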
Provenance
- Source: figshare
- Freshness: Last updated 2026-05-04 10:51:15.