This dataset contains 150,000 unfiltered samples for training models to generate concise titles from a user's first message in a conversation. It was created by SupraLabs and is derived from the training pipeline for their experimental Supra Title model family. The dataset was last updated on June 14, 2026.
Use Cases
- Fine-tune a language model for chat title generation based on first-message context.
- Benchmark model performance on the specific task of generating descriptive titles from conversation starters.
- Train a sequence-to-sequence model to transform user queries into concise, descriptive titles.
- Evaluate the quality and conciseness of generated titles against a large-scale, task-specific corpus.
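The conciseness evaluation mentioned above could be approximated with a few simple statistics. The following is a minimal sketch, not a method documented by SupraLabs: the function name `conciseness_report`, the `max_words` threshold, and the example pairs are all hypothetical, and it assumes each sample can be reduced to a (first message, title) pair of strings.

```python
from statistics import mean


def conciseness_report(pairs, max_words=8):
    """Summarize title length for (first_message, title) string pairs.

    `max_words` is an arbitrary, hypothetical threshold for "too long".
    Raises statistics.StatisticsError if `pairs` is empty.
    """
    word_counts = []
    ratios = []
    too_long = 0
    for message, title in pairs:
        n_words = len(title.split())
        word_counts.append(n_words)
        # Character-length ratio of title to message; guard against empty messages.
        ratios.append(len(title) / max(len(message), 1))
        if n_words > max_words:
            too_long += 1
    return {
        "mean_title_words": mean(word_counts),
        "mean_length_ratio": mean(ratios),
        "share_over_limit": too_long / len(word_counts),
    }


# Made-up pairs for illustration only; these are not taken from the dataset.
example_pairs = [
    ("How do I reverse a linked list in Python without recursion?",
     "Reversing a linked list in Python"),
    ("What's a good recipe for vegan banana bread?",
     "Vegan banana bread recipe"),
]
report = conciseness_report(example_pairs)
print(report)
```

Length statistics like these only capture conciseness; judging whether a title is actually descriptive still needs a reference comparison or human review.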
Strengths
- Contains 150,000 samples, providing a substantial corpus for model training.
- Focuses on the single task of generating titles from a user's first message, which gives a clear task definition.
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Samples are unfiltered, so noisy or low-quality entries may be present; the stated count of 150,000 samples should be confirmed after download.
- The description metadata is limited; actual data quality requires manual inspection after download.
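Because the fields are undocumented, a quick pass over a sample of records can help infer what each one holds. The sketch below is one possible approach under stated assumptions: it assumes the records can be loaded as an iterable of dictionaries, and the helper name `summarize_fields` and the field names `input` and `output` are hypothetical placeholders, not the dataset's actual schema.

```python
from collections import Counter


def summarize_fields(records, sample_size=1000):
    """Report, per field: how often it appears, its value types, and mean string length."""
    seen = Counter()
    types = {}
    lengths = {}
    for index, record in enumerate(records):
        if index >= sample_size:
            break
        for key, value in record.items():
            seen[key] += 1
            types.setdefault(key, Counter())[type(value).__name__] += 1
            if isinstance(value, str):
                lengths.setdefault(key, []).append(len(value))
    return {
        key: {
            "present_in": seen[key],
            "types": dict(types[key]),
            # Mean length is only defined for fields that held at least one string.
            "mean_chars": sum(lengths[key]) / len(lengths[key]) if key in lengths else None,
        }
        for key in seen
    }


# Hypothetical rows; the real field names must be checked after download.
rows = [
    {"input": "How do I reverse a linked list in Python?", "output": "Reverse a linked list"},
    {"input": "Best way to learn Rust as a beginner?", "output": "Learning Rust"},
]
summary = summarize_fields(rows)
for field, info in summary.items():
    print(field, info)
```

A field whose strings are consistently much shorter than another's is a plausible candidate for the title column, though that guess should be confirmed by reading a handful of rows.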
Provenance
- Source: SupraLabs
- Collection Method: Derived from the training pipeline used for the experimental Supra Title model family.
- Freshness: Last updated 2026-06-14 12:41:43; freshness should be verified.