LongCoT: Benchmark for Long-Horizon Reasoning Across Multiple Domains

Name: LongCoT: Benchmark for Long-Horizon Reasoning Across Multiple Domains
Creator: LongHorizonReasoning
Published: 2026-04-16T16:04:02
Keywords: Computer Science, Mathematics, Benchmark, Tabular, Reasoning Benchmark, Long Context

Description

LongCoT is a benchmark dataset designed to measure a model's ability to sustain coherent reasoning across long chains of thought. The data covers multiple domains including logic, computer science, chemistry, chess, and mathematics. It is released by LongHorizonReasoning on Hugging Face, with the canonical codebase hosted on GitHub.

Use Cases

Benchmarking model performance on long-context reasoning tasks drawn from the multi-domain problem set.
Training models for improved chain-of-thought reasoning using the structured benchmark data.
Analyzing failure modes in logical and mathematical reasoning across extended sequences.
Developing specialized verifiers or evaluation harnesses for long-horizon tasks.
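
As a rough illustration of the benchmarking and failure-analysis use cases above, the sketch below scores predictions per domain. It is a minimal sketch under stated assumptions, not the benchmark's own evaluation code: the record keys `id`, `domain`, and `answer` are hypothetical (the dataset's columns are undocumented), and exact match after normalization stands in for whatever domain-specific verifier the canonical codebase may use.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(str(text).lower().split())


def per_domain_accuracy(records, predictions):
    """Return {domain: (correct, total, accuracy)} using normalized exact match.

    `records` is an iterable of dicts with the hypothetical keys
    "id", "domain", and "answer". `predictions` maps a record id to the
    model's final answer string; a missing id is counted as wrong.
    """
    counts = defaultdict(lambda: [0, 0])  # domain -> [correct, total]
    for rec in records:
        tally = counts[rec["domain"]]
        tally[1] += 1
        pred = predictions.get(rec["id"])
        if pred is not None and normalize(pred) == normalize(rec["answer"]):
            tally[0] += 1
    return {d: (c, t, c / t) for d, (c, t) in counts.items()}


if __name__ == "__main__":
    # Toy records invented for illustration; they are not taken from the dataset.
    records = [
        {"id": "a", "domain": "chess", "answer": "Qh5"},
        {"id": "b", "domain": "chess", "answer": "Nf3"},
        {"id": "c", "domain": "logic", "answer": "True"},
    ]
    predictions = {"a": " qh5 ", "b": "e4"}
    for domain, (correct, total, acc) in per_domain_accuracy(records, predictions).items():
        print(f"{domain}: {correct}/{total} ({acc:.0%})")
```

Reporting results per domain rather than as one aggregate number makes it easier to see whether failures concentrate in, say, chess or chemistry, which is the kind of failure-mode analysis listed above.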

Strengths

Focuses on a specific and challenging AI task: long-horizon reasoning.
Covers a diverse set of domains including logic, computer science, chemistry, chess, and mathematics.
Data is provided in a viewer-friendly Parquet format for easy browsing and loading.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and dataset size are not listed, which makes it hard to assess suitability before download.
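
Because column documentation and row counts are missing, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded into a list of dicts (for example from the Parquet files with a reader of your choice); the column names in the usage example are hypothetical and only illustrate the output shape.

```python
from collections import Counter


def summarize_columns(rows):
    """Summarize a list of dict rows column by column.

    Returns {column: {"types": Counter of Python type names,
                      "missing": count of None or absent values,
                      "example": first non-missing value}}.
    """
    summary = {}
    for row in rows:
        for col, value in row.items():
            info = summary.setdefault(
                col, {"types": Counter(), "missing": 0, "example": None}
            )
            if value is None:
                info["missing"] += 1
                continue
            info["types"][type(value).__name__] += 1
            if info["example"] is None:
                info["example"] = value
    # A column that is absent from a row counts as missing for that row.
    for info in summary.values():
        seen = sum(info["types"].values()) + info["missing"]
        info["missing"] += len(rows) - seen
    return summary


if __name__ == "__main__":
    # Toy rows with hypothetical column names, not real dataset content.
    rows = [
        {"question": "q1", "steps": 3},
        {"question": "q2", "steps": None},
        {"question": "q3"},
    ]
    print(f"rows: {len(rows)}")
    for col, info in summarize_columns(rows).items():
        print(col, dict(info["types"]), "missing:", info["missing"])
```

The type counts and example values give a starting point for inferring what each field means, and `len(rows)` supplies the row count that the listing omits.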

Provenance

Source: LongHorizonReasoning
Collection Method: Not documented; likely constructed as a benchmark for evaluating AI models.
Time Range: Not specified
Freshness: Last updated 2026-04-16 16:17:15; verify against the source before use.
Geography: Not specified

Quality Score

Overall: D36
Description: 39
Source: 36
Reputation: 35
Access: 26

Community

1 like

0 views

Dataset Info

Author: LongHorizonReasoning
Created: Apr 16, 2026
Updated: Apr 16, 2026
Last synced: May 14, 2026