Name: When CoT Knows Better: Turn-Level Diagnostics for Multi-Turn Reasoning Models
Creator: UVSKKR
Published: 2026-06-04T03:44:56
Keywords: Reasoning Models, Chain Of Thought, Benchmark, Tabular, Failure Modes, Adversarial Evaluation, AI Diagnostics

Description

UVSKKR's dataset provides evaluation artifacts for the ICML 2026 Workshop paper "When the Chain of Thought Knows Better: Failure Modes in Multi-Turn Reasoning Models". It offers a granular, turn-level diagnostic of how distilled reasoning models behave under prolonged adversarial pressure. The dataset was last updated on June 4, 2026.

Use Cases

Diagnose failure modes in reasoning models using turn-level data collected under adversarial pressure.
Benchmark model robustness using granular, multi-turn evaluation artifacts.
Analyze the behavior of distilled reasoning models under prolonged adversarial conditions.
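
A turn-level diagnosis of this kind could be sketched as below. This is a minimal illustration only: the dataset's schema is undocumented, so the field names `turn` and `failed` and the sample rows are hypothetical and would need to be replaced with whatever the downloaded files actually contain.

```python
from collections import defaultdict


def failure_rate_by_turn(records, turn_key="turn", failed_key="failed"):
    """Return {turn index: fraction of records at that turn marked as failed}.

    `turn_key` and `failed_key` are hypothetical field names, not taken
    from the dataset.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for rec in records:
        turn = rec[turn_key]
        totals[turn] += 1
        if rec[failed_key]:
            failures[turn] += 1
    return {turn: failures[turn] / totals[turn] for turn in sorted(totals)}


# Made-up rows for illustration; they do not come from the dataset.
sample = [
    {"turn": 1, "failed": False},
    {"turn": 1, "failed": False},
    {"turn": 2, "failed": True},
    {"turn": 2, "failed": False},
    {"turn": 3, "failed": True},
]
rates = failure_rate_by_turn(sample)
```

Grouping by turn index rather than by conversation shows whether failures become more frequent as adversarial pressure accumulates across turns.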

Strengths

Dataset is associated with a paper accepted at the ICML 2026 Workshop on Failure Modes in Agentic AI.
Provides a granular, turn-level diagnostic of model behavior.
Focuses on a specific and current research area: failure modes in multi-turn reasoning.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and file formats are not stated, which makes it hard to assess suitability before download.
The full description is hosted externally, requiring a click-through for complete context.
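
Because field semantics have to be inferred after download, a first pass over the data might look like the sketch below. It assumes the files have already been parsed into a list of dictionaries, which is itself an assumption since the file format is not stated; the function name `summarize_fields` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_fields(rows):
    """For each field, count the value types seen and the rows lacking it."""
    type_counts = defaultdict(Counter)
    for row in rows:
        for key, value in row.items():
            type_counts[key][type(value).__name__] += 1
    total = len(rows)
    summary = {}
    for key, counts in type_counts.items():
        present = sum(counts.values())
        summary[key] = {"types": dict(counts), "missing": total - present}
    return summary


# Made-up rows for illustration; real field names are unknown.
rows = [
    {"turn": 1, "response": "a"},
    {"turn": 2, "response": None},
    {"turn": 3},
]
summary = summarize_fields(rows)
```

A summary like this only reveals which fields exist and what types they hold; what each field means still has to be checked against the paper.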

Provenance

Source: UVSKKR on Hugging Face, associated with an ICML 2026 Workshop paper.
Collection Method: Not documented; the files are likely evaluation artifacts generated for the research paper.
Freshness: Last updated 2026-06-04 11:34:38

