

Oolong-Pairs: A Long-Context Pairwise-Aggregation Reasoning Benchmark

Name: Oolong-Pairs: A Long-Context Pairwise-Aggregation Reasoning Benchmark
Creator: mit-oasys
Published: 2026-06-01T01:44:04
Keywords: NLP Evaluation, Benchmark, Question Answering, Text, Reasoning Benchmark, Long Context


Available on 1 platform

Description

Oolong-Pairs is a long-context, pairwise-aggregation reasoning benchmark built on top of the oolongbench/oolong-synth dataset. Each task presents a long context of thousands of general-knowledge questions, each implicitly labeled with one of six TREC coarse categories. The dataset was created by mit-oasys and was last updated on June 1, 2026.
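To make "pairwise aggregation" concrete, here is a minimal Python sketch. It assumes a task asks for all pairs of context entries whose categories satisfy some condition. The six coarse labels are the standard TREC ones; the sample entries, the function name pairs_with_categories, and the query shape are hypothetical illustrations and are not taken from the dataset. In the benchmark the category of each question is implicit, so a model would have to infer it before aggregating.

```python
from itertools import combinations

# The six TREC coarse question categories.
TREC_COARSE = ("ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM")

# Hypothetical context entries: (entry_id, question, category).
# The category is spelled out here only for illustration; the listing
# describes the labels as implicit in the actual benchmark.
entries = [
    (1, "What does NASA stand for?", "ABBR"),
    (2, "Who wrote Hamlet?", "HUM"),
    (3, "Where is Mount Everest?", "LOC"),
    (4, "Who discovered penicillin?", "HUM"),
    (5, "How many moons does Mars have?", "NUM"),
]


def pairs_with_categories(entries, cat_a, cat_b):
    """Return sorted (id, id) pairs where one entry is cat_a and the other is cat_b."""
    if cat_a not in TREC_COARSE or cat_b not in TREC_COARSE:
        raise ValueError("unknown TREC coarse category")
    result = []
    for (id_x, _, c_x), (id_y, _, c_y) in combinations(entries, 2):
        # Comparing sets makes the check order-independent and also
        # handles the same-category case (cat_a == cat_b).
        if {c_x, c_y} == {cat_a, cat_b}:
            result.append((min(id_x, id_y), max(id_x, id_y)))
    return sorted(result)


hum_loc = pairs_with_categories(entries, "HUM", "LOC")
hum_hum = pairs_with_categories(entries, "HUM", "HUM")
```

With thousands of entries the number of candidate pairs grows quadratically, which is one reason this kind of query is a demanding test of long-context reasoning.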

Use Cases

Benchmarking long-context reasoning over contexts that contain thousands of general-knowledge questions.
Evaluating model performance on pairwise-aggregation tasks that depend on implicit category labels.
Training models to produce exact-match answers to aggregation queries over the six TREC coarse categories.
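The exact-match use case can be sketched as follows. This is a minimal Python illustration, assuming answers are compared as strings after trimming, lowercasing, and collapsing whitespace. The listing does not describe the benchmark's actual scorer, so the normalization rule and the function names normalize, exact_match, and exact_match_rate are assumptions.

```python
def normalize(answer: str) -> str:
    """Lowercase the answer and collapse runs of whitespace (assumed rule)."""
    return " ".join(answer.strip().lower().split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def exact_match_rate(predictions, references) -> float:
    """Fraction of predictions that exactly match their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = sum(exact_match(p, r) for p, r in zip(predictions, references))
    return hits / len(references)


rate = exact_match_rate(["(2, 3)", "hum"], [" (2, 3) ", "LOC"])
```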

Strengths

Built on a long-context structure containing thousands of general-knowledge questions.
Tasks involve implicit labeling across six distinct TREC coarse categories.

Limitations

Description metadata is limited; actual data quality requires manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count is unknown, which may limit suitability assessment.
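Because column documentation and the row count are missing, a first pass after download might tally the record count and the value types observed per field. The sketch below is in Python and assumes the data has been exported as JSON Lines, which the listing does not confirm. The function name summarize_records and the sample fields "question" and "answer" are hypothetical.

```python
import io
import json
from collections import Counter, defaultdict


def summarize_records(lines):
    """Count records and tally the Python type names seen for each field."""
    row_count = 0
    field_types = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        row_count += 1
        for key, value in record.items():
            field_types[key][type(value).__name__] += 1
    return row_count, {key: dict(counts) for key, counts in field_types.items()}


# Hypothetical two-record sample standing in for a downloaded file.
sample = io.StringIO(
    '{"question": "Who wrote Hamlet?", "answer": 1}\n'
    '{"question": "Where is Everest?", "answer": null}\n'
)
row_count, schema = summarize_records(sample)
```

For a real file, the same function can be given an open file handle in place of the in-memory sample.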

Provenance

Source: mit-oasys
Collection Method: Built on top of the oolongbench/oolong-synth dataset.
Freshness: Last updated 2026-06-01 02:03:23; freshness should be verified.


Quality Score

D40

Community

197 downloads

1 like

0 views

Dataset Info

Author: mit-oasys
Created: Jun 1, 2026
Updated: Jun 1, 2026
Last synced: Jun 20, 2026
