Description
MINTEval is an analytical benchmark for evaluating memory under multi-target interference in long-horizon agent systems. It was introduced in a research paper and published on Hugging Face by the author 'dinobby' on 2026-05-19. Each example presents a sequence of contexts followed by questions that require reasoning over that history, covering four domains: state tracking, multi-turn dialogue, Wikipedia revisions, and GitHub commits.
Use Cases
Benchmarking agent memory over long sequences of events and edits.
Evaluating reasoning over long multi-turn dialogue histories.
Testing a model's ability to track changes and answer questions about Wikipedia revision histories.
Assessing memory of code evolution across GitHub commit histories.
Strengths
Covers four distinct domains (state tracking, multi-turn dialogue, Wikipedia revisions, GitHub commits) for varied evaluation.
Designed specifically for evaluating memory under multi-target interference, a defined research challenge.
Introduced in a dedicated research paper, suggesting a formal methodological foundation.
Limitations
Description metadata is limited; actual data quality, size, and structure require manual inspection after download.
Column-level documentation is absent; field semantics must be inferred after download.
Row count and file formats are unknown, which may limit suitability assessment.
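Because the fields are undocumented, a first inspection pass after download might look like the sketch below. It assumes the examples have already been loaded into a list of Python dictionaries by whatever loader matches the actual file format. The field names in `sample` (`domain`, `contexts`, `question`, `answer`) are hypothetical placeholders, not the real MINTEval schema.

```python
from collections import defaultdict


def summarize_fields(records):
    """Map each field name to the value types seen and the number of rows containing it."""
    summary = defaultdict(lambda: {"types": set(), "present": 0})
    for record in records:
        for name, value in record.items():
            summary[name]["types"].add(type(value).__name__)
            summary[name]["present"] += 1
    return {
        name: {"types": sorted(info["types"]), "present": info["present"]}
        for name, info in summary.items()
    }


# Hypothetical rows for illustration only; replace with the downloaded examples.
sample = [
    {"domain": "dialogue", "contexts": ["turn 1", "turn 2"], "question": "q1", "answer": "a1"},
    {"domain": "commits", "contexts": ["diff 1"], "question": "q2"},
]

for name, info in summarize_fields(sample).items():
    print(name, info["types"], f'{info["present"]}/{len(sample)} rows')
```

A field that appears in fewer rows than the total, or with more than one value type, is a good candidate for closer manual inspection before the benchmark is used for evaluation.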
Provenance
Source
Hugging Face dataset uploaded by author 'dinobby'.
Collection Method
Introduced as a benchmark in a research paper; specific data collection method is not detailed in the provided input.
Freshness
Last updated 2026-05-19 15:56:00; freshness should be verified.
License
Unknown; users must verify the terms of use before using the dataset.