Vitiligo-Related Social Media Posts from Baidu and Reddit, 2021-2025
by Hongjie Luo · Updated 24d ago
1.3 MB · 1 file
Available on 1 platform
Description
This dataset contains 2,305 publicly available posts about vitiligo from Baidu and Reddit, published between October 2021 and October 2025. An AI-assisted workflow extracted 6,375 keywords, normalized them into 2,419 unique terms, and categorized those terms into nine themes, including treatment and diagnosis. Hongjie Luo created the dataset and shared it on figshare under a CC-BY-4.0 license.
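The reduction from 6,375 extracted keywords to 2,419 unique terms works out to roughly 2.6 extracted keywords per unique term. The listing does not describe the normalization rules, so the following is only a minimal sketch of what such a step might look like, assuming simple whitespace trimming and case folding. The function names and the sample keywords are illustrative and do not come from the dataset.

```python
import re
from collections import Counter


def normalize(keyword):
    """Assumed rules: trim, collapse internal whitespace, case-fold."""
    return re.sub(r"\s+", " ", keyword.strip()).casefold()


def unique_terms(keywords):
    """Count how often each normalized term occurs, skipping blank entries."""
    return Counter(normalize(k) for k in keywords if k.strip())


# Illustrative input only, not taken from the dataset.
raw = ["Phototherapy", "phototherapy ", "UVB  phototherapy", "Diagnosis"]
terms = unique_terms(raw)
print(len(raw), "->", len(terms))  # 4 -> 3
```

The actual workflow was AI-assisted and included translation, so it likely merged synonyms and cross-language variants that string rules like these would miss.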
Use Cases
Compare patient concerns across different cultural platforms based on the categorized thematic content.
Analyze high-frequency questions about vitiligo treatment and diagnosis based on the extracted keywords.
Train NLP models for classifying health-related social media posts based on the thematic categories.
Study cross-platform differences in discussions of psychological impact and stigma based on the described results.
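The cross-platform comparisons above could start from per-platform theme shares. The sketch below assumes each record can be reduced to a (platform, theme) pair, which the listing does not confirm. The `theme_shares` helper, the theme labels, and the toy records are hypothetical placeholders, not values from the dataset.

```python
from collections import Counter, defaultdict


def theme_shares(records):
    """Map each platform to {theme: share of that platform's records}."""
    counts = defaultdict(Counter)
    for platform, theme in records:
        counts[platform][theme] += 1
    return {
        platform: {theme: n / sum(c.values()) for theme, n in c.items()}
        for platform, c in counts.items()
    }


# Placeholder records for illustration only, NOT taken from the dataset.
toy = [
    ("Baidu", "treatment"),
    ("Baidu", "treatment"),
    ("Baidu", "diagnosis"),
    ("Reddit", "treatment"),
    ("Reddit", "psychological impact"),
]
shares = theme_shares(toy)
for platform, by_theme in shares.items():
    print(platform, {t: round(s, 2) for t, s in by_theme.items()})
```

Normalizing within each platform keeps the comparison fair when the platforms contribute different numbers of posts.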
Strengths
Includes 2,305 posts with a clear breakdown by platform (Baidu: 1,414; Reddit: 891).
Covers a defined time range from October 2021 to October 2025.
Content was processed through an AI-assisted workflow and researcher-led validation.
Results are structured into nine thematic categories with quantified proportions.
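As a quick consistency check on the platform breakdown, the two counts sum to the stated total, and the shares follow directly from them. Only the counts are taken from the listing; the variable names are illustrative.

```python
# Post counts per platform, as stated in the listing.
counts = {"Baidu": 1414, "Reddit": 891}

total = sum(counts.values())
assert total == 2305  # matches the stated number of posts

# Percentage of all posts contributed by each platform, to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
print(shares)  # {'Baidu': 61.3, 'Reddit': 38.7}
```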
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
The number of extracted keywords (6,375) is known, but the structure of the final dataset file is unspecified.
Data may reflect geographic and platform bias inherent to the sources (Baidu and Reddit).
Provenance
Source
Publicly available posts from Baidu and Reddit.
Collection Method
Stratified random sampling by publication year, AI-assisted keyword extraction and translation, followed by researcher-led validation and categorization.
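Stratified random sampling by publication year can be sketched as follows. The listing does not state the allocation rule or the per-year sample sizes, so this example assumes an equal number of posts drawn from each year. The `stratified_sample` function, the `year` field, and the toy posts are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(posts, per_year, seed=0):
    """Draw up to `per_year` posts at random from each publication year."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group posts into strata keyed by publication year.
    by_year = defaultdict(list)
    for post in posts:
        by_year[post["year"]].append(post)

    sample = []
    for year in sorted(by_year):
        stratum = by_year[year]
        k = min(per_year, len(stratum))  # small strata are taken whole
        sample.extend(rng.sample(stratum, k))
    return sample


# Toy posts for illustration only: 10 per year for 2021 through 2025.
posts = [{"id": i, "year": 2021 + i % 5} for i in range(50)]
sample = stratified_sample(posts, per_year=3)
print(len(sample))  # 15
```

Sampling within each year keeps any single year from dominating the sample. A proportional allocation would replace `per_year` with a fraction of each stratum's size.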
Time Range
October 2021 to October 2025
Freshness
Last updated 2026-05-13 05:35:46; freshness should be verified.
Geography
Posts are from Chinese (Baidu) and international (Reddit) online communities.
File Format
Primary data file is a 1.3 MB DOCX document; the structure of the contained data is unknown.
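Because the internal structure of the DOCX file is undocumented, a first look after download might simply count paragraphs and tables. A DOCX file is a ZIP archive whose main body lives in `word/document.xml`, so the sketch below uses only the Python standard library. The `summarize_docx` helper and the file name in the usage comment are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def summarize_docx(source):
    """Return paragraph and table counts plus a short text preview.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    # Non-empty paragraphs, including those inside table cells.
    paragraphs = []
    for p in root.iter(W + "p"):
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if text.strip():
            paragraphs.append(text)

    # All tables, including nested ones; row counts include nested rows.
    tables = list(root.iter(W + "tbl"))
    return {
        "paragraphs": len(paragraphs),
        "tables": len(tables),
        "rows_per_table": [len(list(tbl.iter(W + "tr"))) for tbl in tables],
        "preview": paragraphs[:5],
    }


# Usage (hypothetical file name):
# print(summarize_docx("vitiligo_posts.docx"))
```

If the summary shows one or more large tables, the data is probably tabular and can be exported row by row. A high paragraph count with no tables would suggest free-text records instead.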