Oolong Synth: A Synthetic Benchmark for Long Context Reasoning Evaluation | DataSalon

Machine Learning

Name: Oolong Synth: A Synthetic Benchmark for Long Context Reasoning Evaluation
Creator: oolongbench
Published: 2025-09-21T03:16:43
Keywords: AI Evaluation, Long Context Reasoning, Evaluation Benchmark, Benchmark, Text, Time Series, Synthetic Data

Available on 1 platform

Description

A synthetic dataset created by oolongbench for evaluating long-context reasoning and aggregation capabilities in AI models. Each example pairs an input context window with a question that the model must answer. The dataset was last updated on June 20, 2026, with corrections to some temporal queries.

Use Cases

Benchmarking model performance on long-context reasoning tasks, using the dataset's context-window and question structure.
Evaluating information aggregation across extended text passages.
Testing model robustness on temporal queries within long contexts, the query type corrected in the latest update.
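
The benchmarking use case can be sketched as a simple scoring loop. This is a minimal illustration, not the benchmark's official harness: the field names (`context`, `question`, `answer`), the exact-match metric, and the `toy_model` stand-in are all assumptions, since the listing documents neither the dataset's columns nor a scoring method.

```python
from typing import Callable, Iterable, Mapping


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def exact_match_accuracy(
    records: Iterable[Mapping[str, str]],
    model: Callable[[str, str], str],
) -> float:
    """Return the fraction of records whose prediction matches the reference answer."""
    total = 0
    correct = 0
    for record in records:
        # Assumed field names; the real columns must be checked after download.
        prediction = model(record["context"], record["question"])
        if normalize(prediction) == normalize(record["answer"]):
            correct += 1
        total += 1
    return correct / total if total else 0.0


# Hypothetical records that mimic a counting-style aggregation question.
sample_records = [
    {"context": "a b a c a", "question": "How many times does 'a' appear?", "answer": "3"},
    {"context": "x y x", "question": "How many times does 'y' appear?", "answer": "1"},
]


def toy_model(context: str, question: str) -> str:
    """Stand-in for a real model: counts the quoted token in the context."""
    token = question.split("'")[1]
    return str(context.split().count(token))


accuracy = exact_match_accuracy(sample_records, toy_model)
```

In practice `toy_model` would be replaced by a call to the model under evaluation, and exact match may need to give way to a more tolerant metric if answers are free-form.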

Strengths

Designed specifically for evaluating long-context reasoning, a key capability for modern AI.
Includes corrections for temporal queries, indicating maintenance and refinement.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
The dataset is synthetic, so it may not reflect real-world data distributions.
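
Because column documentation and row count are missing, a first step after download is usually to profile the records directly. The sketch below assumes the data has already been loaded as a list of dictionaries; the `summarize_fields` helper and the sample field names are hypothetical and only illustrate the idea.

```python
from typing import Any, Iterable, Mapping


def summarize_fields(records: Iterable[Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """Collect, per field, the observed value types, a count, and one example value."""
    summary: dict[str, dict[str, Any]] = {}
    for record in records:
        for name, value in record.items():
            entry = summary.setdefault(
                name, {"types": set(), "count": 0, "example": None}
            )
            entry["types"].add(type(value).__name__)
            entry["count"] += 1
            if entry["example"] is None:
                entry["example"] = value
    return summary


# Hypothetical rows; the real field names are unknown until the data is inspected.
rows = [
    {"context": "long passage ...", "question": "Which label is most common?", "answer": "spam"},
    {"context": "another passage ...", "question": "How many entries in May?", "answer": 4},
]

field_summary = summarize_fields(rows)
row_count = len(rows)
```

A field that shows more than one type, such as `answer` in the sample rows, is a hint that answer formats vary and that scoring code should normalize values before comparing them.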

Provenance

Source: oolongbench
Collection Method: Synthetic generation for evaluation, as described in the associated research paper.
Freshness: Last updated 2026-06-20 23:54:02

License is unknown; users should verify terms before use.

Quality Score

D37

Community

3.5K downloads

5 likes

0 views

Dataset Info

Author: oolongbench
Created: Sep 21, 2025
Updated: Jun 20, 2026
Last synced: Jun 24, 2026
