DHSA RULER: Long-Context NLP Evaluation Benchmark | DataSalon

Category: NLP & Text
Name: DHSA RULER: Long-Context NLP Evaluation Benchmark
Creator: sxiong
Published: 2026-06-19T02:41:31
Keywords: NLP Evaluation, Benchmark, Question Answering, Text, Retrieval, Long Context, Synthetic

Available on 1 platform

Description

RULER is a benchmark designed to evaluate effective context length and long-context behavior beyond simple retrieval. The dataset contains pre-generated JSONL files organized by target context lengths of 4096, 8192, 16384, 32768, and 49152 tokens. It was authored by sxiong and last updated on June 19, 2026.
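As a rough illustration of working with files laid out this way, the sketch below reads JSONL records and groups record counts by target context length. It assumes the target length appears as a number somewhere in each file's directory or file name; the listing does not confirm the actual layout, and the helper names (`read_jsonl`, `target_length_from_path`, `count_records_by_length`) are hypothetical.

```python
import json
import re
from pathlib import Path

# Target context lengths listed in the description above.
TARGET_LENGTHS = (4096, 8192, 16384, 32768, 49152)


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def target_length_from_path(path):
    """Return the single target length found as a number in the path, else None.

    ASSUMPTION: the length is encoded in a directory or file name.
    The listing does not document the real layout.
    """
    numbers = {int(n) for n in re.findall(r"\d+", str(path))}
    matches = [length for length in TARGET_LENGTHS if length in numbers]
    return matches[0] if len(matches) == 1 else None


def count_records_by_length(root):
    """Count JSONL records under root, grouped by inferred target length."""
    counts = {}
    for path in sorted(Path(root).rglob("*.jsonl")):
        # Use the path relative to root so digits in parent folders are ignored.
        key = target_length_from_path(path.relative_to(root))
        counts[key] = counts.get(key, 0) + sum(1 for _ in read_jsonl(path))
    return counts
```

Files whose path does not name exactly one of the listed lengths are grouped under `None`, which makes a mismatch between this assumed layout and the real one easy to spot.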

Use Cases

Benchmarking model performance on long-context retrieval tasks.
Evaluating multi-hop reasoning through the multi-hop tracing tasks.
Testing information aggregation through the aggregation tasks.
Assessing question-answering performance in long-context settings.

Strengths

Structured for specific target context lengths up to 49152 tokens.
Covers multiple task types including retrieval, multi-hop tracing, aggregation, and question answering.
Last updated on June 19, 2026, suggesting recent maintenance.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
Row count and sample data are not listed, which makes suitability hard to assess before download.
License information is unknown, which may restrict usage.
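Because the fields are undocumented, one way to start inferring their meaning after download is to summarize which top-level field names occur and which value types they hold. The sketch below is a minimal example of that idea; the sample records and their field names (`input`, `outputs`, `length`) are invented for illustration and are not taken from the dataset.

```python
import json
from collections import defaultdict


def summarize_fields(records):
    """Map each top-level field name to the sorted type names seen for it."""
    seen = defaultdict(set)
    for record in records:
        for name, value in record.items():
            seen[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in seen.items()}


# HYPOTHETICAL sample lines: the real field names are not documented in the listing.
sample_lines = [
    '{"input": "some long context", "outputs": ["answer"], "length": 4096}',
    '{"input": "another context", "outputs": [], "length": null}',
]

summary = summarize_fields(json.loads(line) for line in sample_lines)
for field, types in summary.items():
    print(field, types)
```

A field that reports more than one type, such as a number that is sometimes null, is a useful early signal that records need validation before being used for evaluation.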

Provenance

Source: Hugging Face
Collection Method: Pre-generated for the RULER benchmark; specific collection method not detailed.
Freshness: Last updated 2026-06-19 02:42:01

License is unknown; users must verify terms of use before downloading.

Quality Score

D37

Community

4 downloads

1 like

0 views

Dataset Info

Author: sxiong
Created: Jun 19, 2026
Updated: Jun 19, 2026
Last synced: Jun 25, 2026
