Model Performance Summary for Cancer Support Text Emotional Tone Classification
by Shuo Xu·Updated 8d ago
5.5 KB · 1 file
Available on 1 platform
Description
Shuo Xu's dataset, last updated May 29, 2026, summarizes test-set performance for five model families that classify emotional tone in cancer peer-support text. The 5.5 KB Excel file reports results from a study that used the 'Mental Health Insights: Vulnerable Cancer Survivors & Caregivers' dataset to compare TF-IDF Logistic Regression, Random Forest, LightGBM, GRU, and fine-tuned ALBERT. Reported metrics include weighted F1 and macro one-vs-rest AUC with bootstrap confidence intervals.
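A minimal pure-Python sketch of the two headline metrics and a percentile bootstrap interval, assuming string class labels and per-class probability scores. The function names are hypothetical, and the study's actual bootstrap settings (number of resamples, interval type) are not given in this listing. Weighted F1 averages per-class F1 with weights proportional to each class's support; macro one-vs-rest AUC takes the unweighted mean of each class's AUC against all other classes.

```python
import random
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        score += (2 * tp / denom if denom else 0.0) * n / total
    return score


def binary_auc(is_pos, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for flag, s in zip(is_pos, scores) if flag]
    neg = [s for flag, s in zip(is_pos, scores) if not flag]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positives and negatives")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, probs, classes):
    """Unweighted mean of one-vs-rest AUCs; probs[i][k] is the score of sample i for classes[k]."""
    aucs = [
        binary_auc([t == cls for t in y_true], [row[k] for row in probs])
        for k, cls in enumerate(classes)
    ]
    return sum(aucs) / len(aucs)


def bootstrap_ci(metric, y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample test rows with replacement and recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
        except ValueError:
            continue  # e.g. a resample missing a class leaves one-vs-rest AUC undefined
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    k = len(stats)
    return stats[int(alpha / 2 * k)], stats[min(k - 1, int((1 - alpha / 2) * k))]
```

For the AUC interval, pass a closure such as `lambda t, p: macro_ovr_auc(t, p, classes)` and the probability rows as `y_pred`. The pairwise AUC is quadratic in the test-set size; it keeps the sketch short, but a rank-based formula scales better to large test sets.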
Use Cases
Benchmarking NLP models on emotional tone classification against the reported comparison of five model families.
Studying how LLM-based annotation changes the label distribution, given the reported shift in label prevalence.
Evaluating token augmentation, in which LLM-extracted context variables are prepended to the input text.
Analyzing error patterns in multi-class sentiment classification, including the reported polarity-reversing and adjacent errors.
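The shift in label prevalence can be measured by comparing each label's share under the original annotations with its share under the LLM-assigned annotations. A minimal sketch, assuming both annotation sets are plain lists of label strings; the function names and example labels are hypothetical.

```python
from collections import Counter


def prevalence(labels):
    """Share of each label in a list of labels."""
    total = len(labels)
    return {label: count / total for label, count in Counter(labels).items()}


def prevalence_shift(original, llm_annotated):
    """Percentage-point change in each label's share (LLM-annotated minus original)."""
    p_orig = prevalence(original)
    p_llm = prevalence(llm_annotated)
    labels = sorted(set(p_orig) | set(p_llm))  # include labels present in only one source
    return {label: 100 * (p_llm.get(label, 0.0) - p_orig.get(label, 0.0)) for label in labels}
```

Reporting the change in percentage points, rather than as a ratio, keeps rare labels from dominating the comparison.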
Strengths
Performance metrics include weighted F1 and macro one-vs-rest AUC with bootstrap confidence intervals, which quantify the sampling uncertainty of the test-set estimates.
Model comparison is based on a 60/20/20 stratified train/validation/test split with hyperparameters selected on validation data only.
The study introduces two methodological extensions: LLM-based annotation and token-based augmentation with structured variables.
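Token-based augmentation with structured variables can be sketched as prepending each variable to the text as a bracketed token, so a text-only classifier sees the context alongside the post. This is a minimal sketch under stated assumptions: the `[KEY=value]` token format, the variable names in the usage, and the default sorted ordering are illustrative, not the study's documented scheme.

```python
def augment_with_tokens(text, variables, order=None):
    """Prepend structured variables to text as "[KEY=value]" tokens.

    variables: mapping of variable name -> value (None means nothing was extracted).
    order: explicit key order; defaults to sorted keys so every example shares one layout.
    """
    keys = list(order) if order is not None else sorted(variables)
    tokens = []
    for key in keys:
        value = variables.get(key)
        if value is None:
            continue  # skip variables the LLM could not extract
        clean = "_".join(str(value).split())  # collapse whitespace so each value stays one token
        tokens.append(f"[{key.upper()}={clean}]")
    return " ".join(tokens + [text])
```

For example, `augment_with_tokens("Feeling hopeful today.", {"role": "caregiver", "topic": None})` yields `"[ROLE=caregiver] Feeling hopeful today."`. A fixed key order keeps each variable in the same position across examples; whether such tokens are also added to the tokenizer vocabulary is a separate design choice.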
Limitations
Row count is unknown, which makes it hard to judge suitability before downloading.
Column-level documentation is absent; field semantics must be inferred after download.
At only 5.5 KB, the file is a summary of results rather than a primary data corpus.
Provenance
Source
figshare, author Shuo Xu
Collection Method
Derived from a study using the 'Mental Health Insights: Vulnerable Cancer Survivors & Caregivers' dataset; models were trained and tested on a stratified split.
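A 60/20/20 stratified split can be sketched in pure Python by shuffling each class's indices separately and cutting them at the target fractions, which keeps class proportions close to those of the full data in every split. The function name, seed, and rounding rule are assumptions; the study's own splitting code is not described here.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=42):
    """Return (train, val, test) index lists that preserve per-class proportions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # shuffle within the class, then cut at the target fractions
        n_train = round(len(idx) * fractions[0])
        n_val = round(len(idx) * fractions[1])
        train += idx[:n_train]
        val += idx[n_train:n_train + n_val]
        test += idx[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

Hyperparameters would then be tuned using only the `val` indices, and the `test` indices scored once at the end.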
Freshness
Last updated 2026-05-29 17:34:09; freshness should be verified.
The license is CC-BY-4.0. The data is in XLS (Excel) format, so it must be opened with spreadsheet software or a library that reads .xls files.