Name: Daily Oracle: A Continuous Benchmark for LLM Future Prediction Using News QA Pairs
Creator: agentic-learning-ai-lab
Published: 2025-07-10T03:58:12
Keywords: News QA, Temporal Benchmark, Benchmark, LLM Evaluation, Text, Prediction Capability, Synthetic

Description

The dataset contains 20,085 true/false and 18,262 multiple-choice questions automatically generated from daily news headlines. Created by agentic-learning-ai-lab, it spans January 1, 2020, to May 26, 2026, and is designed to evaluate how the future-prediction capabilities of large language models evolve over time.

Use Cases

Benchmarking LLM performance on future-oriented questions generated automatically from daily news.
Tracking how model capabilities change over time, using the dataset's continuous evaluation design.
Comparing model performance on true/false versus multiple-choice questions.
Studying how prediction accuracy varies with the date of the underlying news events across the dataset's time span.
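
The tracking and format-comparison use cases above can be sketched as a small grouping routine. This is only an illustration: the field names `date`, `question_type`, `answer`, and `prediction` are hypothetical (the dataset's actual columns are not documented here), and the sample records are invented to show the shape of the computation, not taken from the dataset.

```python
from collections import defaultdict


def accuracy_by_month(records):
    """Return accuracy per (month, question type) for graded predictions."""
    totals = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for r in records:
        # Assumes ISO dates ("YYYY-MM-DD"); the first 7 characters give "YYYY-MM".
        key = (r["date"][:7], r["question_type"])
        totals[key][0] += int(r["prediction"] == r["answer"])
        totals[key][1] += 1
    return {key: correct / total for key, (correct, total) in sorted(totals.items())}


# Hypothetical records; field names and values are illustrative only.
sample = [
    {"date": "2020-01-03", "question_type": "TF", "answer": "Yes", "prediction": "Yes"},
    {"date": "2020-01-15", "question_type": "TF", "answer": "No", "prediction": "Yes"},
    {"date": "2020-01-20", "question_type": "MC", "answer": "b", "prediction": "b"},
    {"date": "2020-02-02", "question_type": "MC", "answer": "a", "prediction": "c"},
]

monthly = accuracy_by_month(sample)
for (month, qtype), acc in monthly.items():
    print(month, qtype, f"{acc:.2f}")
```

Keeping the question type in the grouping key lets the two formats be compared month by month, since their chance baselines are likely to differ.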

Strengths

Contains 38,347 total questions (20,085 TF and 18,262 MC) for evaluation.
Covers a significant time span of over six years, from 2020-01-01 to 2026-05-26.
Designed for continuous evaluation, allowing for tracking capability changes over time.

Limitations

Column-level documentation is absent; field semantics must be inferred after download.
The dataset page lists a 'Current Version' whose end date (2026-05-26) is later than the listed publication date (2025-07-10); the stated time range should be checked against the downloaded data.
The raw description is brief, and full details require visiting an external link.
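
Because column-level documentation is absent, a first pass after download might simply list each field with its observed types and a couple of example values. The sketch below assumes the rows have already been loaded as a list of dictionaries; the helper name `summarize_fields` and the sample rows are hypothetical and do not reflect the dataset's real schema.

```python
from collections import defaultdict


def summarize_fields(records, max_examples=2):
    """Collect observed value types and a few distinct example values per field."""
    summary = defaultdict(lambda: {"types": set(), "examples": []})
    for row in records:
        for name, value in row.items():
            info = summary[name]
            info["types"].add(type(value).__name__)
            if value not in info["examples"] and len(info["examples"]) < max_examples:
                info["examples"].append(value)
    return dict(summary)


# Hypothetical rows; the real field names must be read from the downloaded files.
rows = [
    {"question": "Will X happen by March?", "answer": "Yes", "date": "2020-01-03"},
    {"question": "Which option occurs?", "answer": "b", "date": "2020-01-04"},
    {"question": "Will Y be announced?", "answer": "Yes", "date": None},
]

field_summary = summarize_fields(rows)
for name, info in field_summary.items():
    print(name, sorted(info["types"]), info["examples"])
```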

Provenance

Source: Hugging Face dataset by agentic-learning-ai-lab.
Collection Method: Questions are automatically generated from daily news.
Time Range: 2020-01-01 to 2026-05-26
Freshness: Last updated 2026-05-27 15:43:01; freshness should be verified.

The dataset page suggests viewing the full description on an external link for complete details.
