Oolong-Pairs is a long-context, pairwise-aggregation reasoning benchmark built on top of the oolongbench/oolong-synth dataset. Each task presents a long context of thousands of general-knowledge questions, each implicitly labeled with one of six TREC coarse categories. The dataset was created by mit-oasys and was last updated on June 1, 2026.
Use Cases
- Benchmarking long-context reasoning over contexts that contain thousands of general-knowledge questions.
- Evaluating pairwise aggregation, where the answer depends on the implicit category labels of the questions in the context.
- Training models to produce exact-match answers to category-level aggregation queries under the TREC coarse classification scheme.
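To make the pairwise-aggregation idea concrete, here is a minimal sketch under stated assumptions. It treats a task as "count the unordered pairs of questions that share a coarse category" and scores a prediction by exact match. This task reading, the function names, and the toy labels are illustrative assumptions, not the benchmark's documented format. Only the six TREC coarse category codes are standard.

```python
from collections import Counter

# The six TREC coarse categories: abbreviation, description, entity,
# human, location, numeric.
TREC_COARSE = ("ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM")


def count_same_category_pairs(labels):
    """Count unordered pairs of items that share a coarse category.

    Hypothetical task reading: a category with n items yields
    n * (n - 1) / 2 same-category pairs.
    """
    counts = Counter(labels)
    unknown = set(counts) - set(TREC_COARSE)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {cat: counts[cat] * (counts[cat] - 1) // 2 for cat in TREC_COARSE}


def exact_match(predicted, gold):
    """Return True only if every per-category count agrees."""
    return predicted == gold


# Toy labels, invented for illustration (not taken from the dataset).
labels = ["LOC", "NUM", "LOC", "HUM", "LOC", "NUM"]
gold = count_same_category_pairs(labels)
```

In the benchmark the labels are implicit, so a model would first have to infer each question's category before any such aggregation is possible. The sketch skips that step and starts from known labels.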
Strengths
- Each context is long, containing thousands of general-knowledge questions.
- Category labels are implicit, drawn from the six TREC coarse categories rather than stated in the context.
Limitations
- Description metadata is limited; actual data quality requires manual inspection after download.
- Column-level documentation is absent; field semantics must be inferred after download.
- Row count is not reported, so the dataset's size and suitability cannot be assessed before download.
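Because row count and column semantics are undocumented, a first inspection pass after download might look like the sketch below. It assumes the rows have already been loaded as a list of dictionaries. The field names in the sample are hypothetical placeholders, not the dataset's actual columns.

```python
from collections import defaultdict


def summarize_records(records):
    """Report row count plus, per field, how often it appears and with which types."""
    present = defaultdict(int)
    types = defaultdict(set)
    n_rows = 0
    for row in records:
        n_rows += 1
        for name, value in row.items():
            present[name] += 1
            types[name].add(type(value).__name__)
    fields = {
        name: {"present": present[name], "types": sorted(types[name])}
        for name in present
    }
    return {"n_rows": n_rows, "fields": fields}


# Hypothetical rows; real field names must be read from the downloaded data.
sample = [
    {"context": "Q1 ... Q2 ...", "question": "How many pairs ...?", "answer": "3"},
    {"context": "Q1 ... Q2 ...", "question": "How many pairs ...?", "answer": 5},
]
summary = summarize_records(sample)
```

Mixed types in a single field, such as the string and integer answers in the sample, are the kind of irregularity this pass is meant to surface before any exact-match scoring.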
Provenance
- Source: mit-oasys
- Collection Method: built on top of the oolongbench/oolong-synth dataset.
- Freshness: last updated 2026-06-01 02:03:23; verify against the source before use.