TPBench is a dataset for evaluating long-dialogue compression around lifecycle turning points. It was submitted as an artifact for the NeurIPS 2026 Evaluations and Datasets Track by the author '4papersubmission'. The dataset includes probe JSONL files, result aggregates, scorer/reader code, license disclosures, and Croissant metadata with Responsible AI fields.
Use Cases
- Benchmarking dialogue compression models on how they handle lifecycle turning points.
- Evaluating model performance on long-dialogue tasks using the provided probe files and scorer code.
- Studying Responsible AI practices in dataset construction using the included Croissant metadata.
- Analyzing compression quality around the specific lifecycle events the benchmark targets.
Strengths
- Croissant metadata includes Responsible AI fields.
- Provides evaluation code (scorer/reader) alongside the data artifacts.
- Targets a well-defined NLP task: evaluating long-dialogue compression around turning points.
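Because Croissant metadata is JSON-LD, the Responsible AI fields can be listed with the standard library alone. The sketch below is minimal. It assumes the Croissant file has been downloaded and parsed as JSON, and that its RAI properties appear as top-level keys with the `rai:` prefix. The function name `list_rai_fields` and the inline example metadata are hypothetical placeholders, not content from TPBench.

```python
import json
from typing import Any


def list_rai_fields(metadata: dict[str, Any]) -> dict[str, Any]:
    """Return the top-level properties whose keys carry the "rai:" prefix.

    Assumption: RAI properties appear as top-level keys in the Croissant JSON-LD;
    nested or differently prefixed keys are not searched.
    """
    return {key: value for key, value in metadata.items() if key.startswith("rai:")}


if __name__ == "__main__":
    # Hypothetical placeholder metadata; in practice, json.load() the downloaded Croissant file.
    example = json.loads(
        '{"@type": "sc:Dataset", "name": "TPBench",'
        ' "rai:dataCollection": "placeholder", "rai:dataLimitations": "placeholder"}'
    )
    for key in sorted(list_rai_fields(example)):
        print(key)
```

Matching on the key prefix avoids hard-coding a list of RAI property names. The trade-off is that it will miss fields if the file uses a different prefix or a nested layout.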
Limitations
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, which makes it harder to judge suitability before downloading.
- Provenance details beyond the author handle, and the publishing organization, are listed as unknown.
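The row count and column semantics are undocumented, so profiling each probe file after download is a reasonable first step. The sketch below is minimal. It assumes the probe files follow the usual JSONL convention of one JSON object per line. The function `profile_jsonl` and the script usage are hypothetical. Key presence counts only hint at field semantics; they do not document them.

```python
import json
import sys
from collections import Counter
from typing import Iterable


def profile_jsonl(lines: Iterable[str]) -> tuple[int, Counter]:
    """Count records and how many records contain each top-level key.

    Assumes the JSONL convention of one JSON object per line; blank lines are skipped.
    """
    rows = 0
    key_counts: Counter = Counter()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank or trailing lines
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        rows += 1
        key_counts.update(record.keys())
    return rows, key_counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python profile_jsonl.py path/to/probe_file.jsonl
    with open(sys.argv[1], encoding="utf-8") as handle:
        n_rows, counts = profile_jsonl(handle)
    print(f"rows: {n_rows}")
    for key, seen in counts.most_common():
        print(f"{key}: present in {seen}/{n_rows} rows")
```

Keys present in fewer rows than the total may indicate optional fields or mixed record types. Confirm which applies by reading the provided reader code.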
Provenance
- Source
- Author '4papersubmission' on Hugging Face.
- Collection Method
- Created as an artifact for the NeurIPS 2026 Evaluations and Datasets Track.
- Freshness
- Last updated 2026-05-07 06:22:21 (timezone not stated); verify freshness before relying on the data.