Description
20,085 true/false and 18,262 multiple-choice questions automatically generated from daily news headlines. The dataset, created by agentic-learning-ai-lab, spans January 1, 2020 to May 26, 2026 and is designed to evaluate how large language models' forecasting capabilities evolve over time.
Use Cases
Benchmarking LLM performance on forward-looking questions using automatically generated news QA pairs.
Tracking how model capabilities change over time, using the dataset's continuous evaluation design.
Analyzing model performance differences between true/false and multiple-choice question formats.
Studying the relationship between news events and model prediction accuracy across the dataset's date range.
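The format and time-based comparisons above can be sketched as a grouped accuracy calculation. This is a minimal sketch, not the dataset authors' evaluation code: the field names `date`, `format`, and `correct` are hypothetical (the dataset's actual columns are undocumented), and the records are toy values for illustration only.

```python
from collections import defaultdict


def accuracy_by_month_and_format(records):
    """Return {(YYYY-MM, format): accuracy} for graded question records."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for rec in records:
        # Hypothetical fields: ISO date string, question format, graded result.
        key = (rec["date"][:7], rec["format"])  # "2020-01-15" -> "2020-01"
        totals[key] += 1
        hits[key] += int(rec["correct"])
    return {key: hits[key] / totals[key] for key in totals}


# Toy records for illustration only; not drawn from the dataset.
sample = [
    {"date": "2020-01-03", "format": "TF", "correct": True},
    {"date": "2020-01-17", "format": "TF", "correct": False},
    {"date": "2020-01-17", "format": "MC", "correct": True},
    {"date": "2020-02-02", "format": "MC", "correct": False},
]

result = accuracy_by_month_and_format(sample)
for (month, fmt), acc in sorted(result.items()):
    print(month, fmt, acc)
```

Keying on both month and format keeps the true/false and multiple-choice series separate, which matters because their chance baselines differ.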
Strengths
Contains 38,347 total questions (20,085 TF and 18,262 MC) for evaluation.
Covers more than six years, from 2020-01-01 to 2026-05-26.
Designed for continuous evaluation, so capability changes can be tracked over time.
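The counts and date range stated above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given on this page. The per-day average is a derived estimate that assumes questions are spread over every day in the range, which the page does not confirm.

```python
from datetime import date

# Figures as stated on the dataset page.
tf_questions = 20_085
mc_questions = 18_262
total_questions = tf_questions + mc_questions

start, end = date(2020, 1, 1), date(2026, 5, 26)
days_covered = (end - start).days + 1  # count both endpoints

# Derived estimates; assume uniform daily coverage (an assumption).
questions_per_day = total_questions / days_covered
tf_share = tf_questions / total_questions

print(f"total questions: {total_questions}")
print(f"days covered: {days_covered}")
print(f"average questions per day: {questions_per_day:.1f}")
print(f"true/false share: {tf_share:.1%}")
```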
Limitations
Column-level documentation is absent; field semantics must be inferred after download.
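The dataset page labels a 'Current Version' ending 2026-05-26 and the dataset is designed for continuous evaluation, so the counts and date range given here likely describe a snapshot that later versions may not match.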
The raw description is brief, and full details require visiting an external link.
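Because column semantics have to be inferred after download, a quick field summary is a reasonable first step. The sketch below is one way to do that, assuming the downloaded rows are available as a list of dictionaries. The helper `summarize_fields` and the sample field names (`question`, `answer`, `date`, `choices`) are hypothetical and do not reflect the dataset's real schema.

```python
from collections import defaultdict


def summarize_fields(rows, max_examples=2):
    """Map each field name to its observed value types and a few distinct examples."""
    summary = defaultdict(lambda: {"types": set(), "examples": []})
    for row in rows:
        for name, value in row.items():
            info = summary[name]
            info["types"].add(type(value).__name__)
            if len(info["examples"]) < max_examples and value not in info["examples"]:
                info["examples"].append(value)
    return dict(summary)


# Placeholder rows for illustration only; not drawn from the dataset.
rows = [
    {"question": "Will X happen by Friday?", "answer": "yes", "date": "2020-01-03"},
    {"question": "Which option is correct?", "answer": "b", "choices": ["a", "b"]},
]

fields = summarize_fields(rows)
for name, info in fields.items():
    print(name, sorted(info["types"]), info["examples"])
```

Fields that appear in only some rows (here, `choices`) hint at format-specific columns, which is worth checking before mixing true/false and multiple-choice items.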
Provenance
Source
Hugging Face dataset published by agentic-learning-ai-lab.
Collection Method
Questions are automatically generated from daily news headlines.
Time Range
2020-01-01 to 2026-05-26
Freshness
Last updated 2026-05-27 15:43:01 (time zone not stated); verify freshness against the dataset page before use.
The dataset page suggests viewing the full description on an external link for complete details.